Debian on Dell Kace M300

Posted by JDS420 
Re: Debian on Dell Kace M300
February 20, 2020 07:58AM
bodhi Wrote:
-------------------------------------------------------
>We welcome any Linux
> distro users.

Thanks bodhi!

> I did not recall Mike has "bus error" ?

He didn't, but the /proc/cpu/alignment output from his M300 running LMS listed 3 alignment faults. I have no real idea how to interpret what caused them. Since they are in the "system" category rather than the "user" category, I'm guessing they are corrected automatically by the kernel and the entries are just informational, as opposed to user faults, where you can control how faults are handled by echoing one of the numeric options to /proc/cpu/alignment?

For the "user" category faults that I'm seeing on Alpine, I agree with you: these are software related and specific to applications which aren't coded to always perform aligned memory access (again, not sure if this is the 100% accurate way to word it, but you get my meaning).

What continues to nag me, though, is that based on Mike's output it seems the Debian image you guys are using defaults /proc/cpu/alignment to 0 for user faults (as my Alpine system does). I'm pretty sure, then, that you guys would also get alignment faults and strange application behavior depending on the software you choose to use. In other words, the fact that Mike has 0 user faults in his output could just be luck so far. If he installed something new in the future, he could have issues.

I was trying to figure out if debian has been defaulting to doing something differently than Alpine which would explain why you guys haven't had issues with this before. I did find this old post:

https://lists.debian.org/debian-arm/2011/06/msg00072.html

One part of that post states:
"The default /proc/cpu/alignment mode seems to be 2 (fixup), on v6/v7,
provided that the v6 unaligned access model (CR_U) is supported by the CPU:"

Which I think is related to the kernel, and not Debian specifically. The CPU on the M300 is v5, right? That might explain why it defaults to 0 (ignore), as seen in Mike's output. The top part of that post suggests adding an alignment= kernel option to perform user fixup, so maybe that is one option to consider.

There are a few small programs on github to induce/test memory alignment faults so someone could pick one and test with it and we could compare results. I'm planning on doing it for Alpine but I'll be compiling for musl libc so I don't believe the resulting binary will run on your glibc debian builds.



Edited 3 time(s). Last edit at 02/20/2020 12:00PM by sodface.
Re: Debian on Dell Kace M300
February 20, 2020 05:05PM
Quote

He didn't, but the /proc/cpu/alignment output from his M300 running LMS listed 3 alignment faults. I have no real idea how to interpret what caused them. Since they are in the "system" category rather than the "user" category, I'm guessing they are corrected automatically by the kernel and the entries are just informational, as opposed to user faults, where you can control how faults are handled by echoing one of the numeric options to /proc/cpu/alignment?

True. I don't think the kernel has anything to do with it in this case. Some applications have bugs regarding alignment, and it is best that they fix the code. We really should not mess with system settings. As long as the kernel was built correctly (GCC also had this type of bug before, but it got fixed quickly) and the rootfs architecture is correct, there is no need to do anything.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Re: Debian on Dell Kace M300
February 20, 2020 05:57PM
Out of curiosity, I booted back up the factory Dell M300 image and checked the alignment setting:

root@dellkaceM300:~# cat /proc/cpu/alignment 
User:		0
System:		0
Skipped:	0
Half:		0
Word:		0
DWord:		0
Multi:		0
User faults:	2 (fixup)

So that's kind of interesting. It's not being set via the cmdline:

root@dellkaceM300:~# cat /proc/cmdline 
console=ttyS0,115200 mtdparts=spi_flash:0x7f000@0(uboot),0x1000@0x7f000(u-boot-env)

I thought it was being set in the kernel config:

root@dellkaceM300:/# grep -i alignment /boot/config-2.6.32-5-kirkwood 
CONFIG_ALIGNMENT_TRAP=y

but I see that this setting is the same in your kernel config, though Mike's output shows /proc/cpu/alignment set to 0 (ignore) for user faults.

So how is it being set to 2 on the factory load? Odds are that had my Alpine load defaulted to 2 somehow, I would have never had any issues with apk and been blissfully ignorant that all those alignment faults were happening behind the scenes. Not sure what's better. Would have saved me many hours of head scratching.

//edit, just verified that adding alignment=2 to the kernel command line options does set /proc/cpu/alignment to 2 (fixup)

earlyprintk root=/dev/sda2 alignment=2 console=ttyS0,115200 mtdparts=spi_flash:0x7f000@0(uboot),0x1000@0x7f000(u-boot-env)
localhost:~# cat /proc/cpu/alignment 
User:		0
System:		0 (0x0)
Skipped:	0
Half:		0
Word:		0
DWord:		0
Multi:		0
User faults:	2 (fixup)
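
For anyone who wants to compare these outputs across boxes, here is a minimal Python sketch that parses the text of /proc/cpu/alignment into a dictionary. The function name `parse_alignment` and the `sample` string are illustrative assumptions, not part of any existing tool; the field layout is taken only from the outputs pasted in this thread.

```python
import re

def parse_alignment(text):
    """Parse /proc/cpu/alignment text into {field: (count, note)}.

    `note` is the parenthesised suffix, e.g. "fixup" or "0x0", or None.
    """
    fields = {}
    for line in text.splitlines():
        # Lines look like "User faults:\t2 (fixup)" or "System:\t\t0 (0x0)".
        match = re.match(r"\s*([^:]+):\s*(\d+)(?:\s*\(([^)]*)\))?\s*$", line)
        if match:
            name, value, note = match.groups()
            fields[name.strip()] = (int(value), note)
    return fields

# Hypothetical sample, shaped like the output shown in this thread.
sample = "User:\t\t0\nSystem:\t\t0 (0x0)\nSkipped:\t0\nUser faults:\t2 (fixup)\n"

if __name__ == "__main__":
    info = parse_alignment(sample)
    mode, label = info["User faults"]
    print(f"user fault mode {mode} ({label})")
```

On a real box the input would come from reading /proc/cpu/alignment instead of the sample string.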



Edited 4 time(s). Last edit at 02/20/2020 06:51PM by sodface.
Re: Debian on Dell Kace M300
February 20, 2020 07:14PM
sodface,

2.6.32-5-kirkwood is a very old kernel. And being a stock kernel, it could have settings enabled in some patched code, too. I don't think it should be compared to a modern kernel.

GCC and the kernel have evolved a great deal since then.

-bodhi
Re: Debian on Dell Kace M300
February 20, 2020 07:37PM
bodhi, while I agree with your statements, I think that the following is also true (even today) as far as the alignment setting goes:

- The debian config you guys are currently using sets /proc/cpu/alignment to User faults: 0 (ignore)
- Due to this setting, applications that perform unaligned memory operations may exhibit unexpected behavior because the kernel will not correct it and the program will continue running with unpredictable and undesirable results
- Setting /proc/cpu/alignment to User faults: 2 (fixup) will allow the same applications to operate normally because the kernel will correct the memory accesses, at the cost of extra cpu cycles and therefore degraded performance

I decided to check the setting on one of my Raspberry Pi Zero W's that runs the official armhf release of Alpine Linux and it was set to 2 (fixup).

I think I will probably go with the 2 (fixup) setting going forward, just to be safe.



Edited 3 time(s). Last edit at 02/20/2020 07:58PM by sodface.
Re: Debian on Dell Kace M300
February 20, 2020 09:45PM
sodface,

> I think I will probably go with the 2 (fixup)
> setting going forward, just to be safe.

That would be a good workaround for badly behaved applications!

However, I think Debian is more correct in that such application bugs should be fixed. Unless you can make the problem apparent so that users can submit a bug report, it is like sweeping dirt under the rug :) Besides, there is no guarantee that the kernel can fix it up successfully. Some of the alignment problems could still cause unpredictable behavior. I mean... pick your poison :)

In real world mission critical applications, these problems must be fixed.

-bodhi



Edited 1 time(s). Last edit at 02/20/2020 09:46PM by bodhi.
Re: Debian on Dell Kace M300
February 20, 2020 10:41PM
bodhi Wrote:
-------------------------------------------------------
> However, I think Debian is more correct in that
> such application bugs should be fixed. Unless you
> can make the problem apparent so that users can
> submit a bug report, it is like sweeping dirt under
> the rug :) Besides, there is no guarantee that the
> kernel can fix it up successfully. Some of the
> alignment problems could still cause
> unpredictable behavior. I mean... pick your
> poison :)
>
> In real world mission critical applications, these
> problems must be fixed.

Agree, except when User faults is set to 0 (ignore) the misbehaving application doesn't necessarily crash or exit but continues to run with unpredictable results!

To detect faulty applications, it should be set to 4 or 5 so the process exits with Bus error and doesn't continue to run. 0 is probably the worst setting, which is what you have now!
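
As a rough reference for the numeric values being discussed, the sketch below decodes a user-fault mode number into its component actions. It assumes the bit layout given in the kernel's ARM memory alignment documentation (bit 0 = warn, bit 1 = fixup, bit 2 = signal, valid values 0 to 5); `describe_mode` is a hypothetical helper name.

```python
def describe_mode(mode):
    """Decode an ARM /proc/cpu/alignment user-fault mode (0-5) into a label."""
    if not 0 <= mode <= 5:
        raise ValueError("expected a mode between 0 and 5")
    if mode == 0:
        return "ignored"
    parts = []
    if mode & 2:
        parts.append("fixup")    # kernel emulates the unaligned access
    if mode & 4:
        parts.append("signal")   # process is sent SIGBUS ("Bus error")
    if mode & 1:
        parts.append("warn")     # kernel logs a warning for each fault
    return "+".join(parts)

if __name__ == "__main__":
    for mode in range(6):
        print(mode, describe_mode(mode))
```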
Re: Debian on Dell Kace M300
February 21, 2020 01:34AM
> Agree, except when User faults is set to 0
> (ignore) the misbehaving application doesn't
> necessarily crash or exit but continues to run
> with unpredictable results!

That's not the behavior in Debian as far as I know. But I guess it's all academic. Glad you figured out what to do in Alpine!

-bodhi
Re: Debian on Dell Kace M300
February 21, 2020 06:30AM
bodhi Wrote:
-------------------------------------------------------
> That's not the behavior in Debian as far as I know.
> But I guess it's all academic. Glad you
> figured out what to do in Alpine!

I'm still trying to figure out what the default behavior in Debian is supposed to be. Based on Mike's output, it looked like it defaults to 0 (ignore), if that's the only place where that setting is set and checked.

The Debian wiki page that I linked to earlier, last updated in 2017 (though wikis are hard to keep everything current), says:

Quote

word accesses must be aligned to a multiple of their size

Accesses from non-aligned locations give garbled results

For example, loading a 32-bit word from a non-aligned pointer reads an aligned 32-bit word from the next lowest 32-bit-aligned location (ignoring the lower 2 bits of the pointer) and then rotates the result so that the byte indicated by the pointer ends up in the least significant byte.

This applies to 16-bit, 32-bit and 64-bit data.

You can test for this kind of error without recompiling, using the pseudo-file /proc/cpu/alignment. Catting it tells you whether misaligned user accesses are detected and fixed up or not (the default setting is usually no, 0).

Another mode is "fixup", in which the kernel will quietly perform correct unaligned access for user processes.

https://wiki.debian.org/ArmEabiFixes#word_accesses_must_be_aligned_to_a_multiple_of_their_size
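
The rotation described in the wiki quote can be simulated in a few lines. This is only a sketch under stated assumptions: little-endian memory, a 32-bit load, and the "ignore" behavior exactly as the wiki words it. The function names and the `memory` test bytes are made up for illustration.

```python
def armv5_unaligned_load32(memory, addr):
    """Simulate the rotated result of an unaligned 32-bit little-endian load."""
    base = addr & ~3                    # the low 2 address bits are ignored
    word = int.from_bytes(memory[base:base + 4], "little")
    shift = 8 * (addr & 3)              # rotate right one byte per offset step
    return ((word >> shift) | (word << (32 - shift))) & 0xFFFFFFFF

def correct_load32(memory, addr):
    """What the program expected: the four bytes starting at addr."""
    return int.from_bytes(memory[addr:addr + 4], "little")

# Hypothetical memory contents, chosen so each byte is recognisable.
memory = bytes([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17])

if __name__ == "__main__":
    for addr in range(4):
        print(addr,
              hex(armv5_unaligned_load32(memory, addr)),
              hex(correct_load32(memory, addr)))
```

At offset 1 the simulated load yields 0x10131211 while the program wanted 0x14131211: the low three bytes are right and the top byte is pulled from the wrong end of the aligned word, which is the kind of silent garbling that never raises an error in mode 0.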

Maybe running a test program on Debian would confirm or deny the default behavior, something like:
https://github.com/gavingolden/Unaligned-Memory-Access

But I will drop this now unless I come up with any other info. I think I have a way ahead for Alpine and one of the developers for the application I was having issues with has just committed another code fix for it so hopefully that should be good now anyway.

Thanks for the help!
Re: Debian on Dell Kace M300
March 22, 2020 07:09PM
Hello again, I wanted to follow up on this thread with a couple of things related to the M300 (still with Alpine Linux).

First is, I started having some issues with network transfers of larger files being corrupt (md5sums not matching, crc errors when extracting). The test file I was using was around 1.6GB, but I also had corruption in a 200MB file. I tried a bunch of different things but I'm pretty sure that the problem was using ext4 for the root filesystem. I'm using two partitions, a small ext2 /boot and I was using ext4 on / (I had xfs on root on another m300 but had switched it to ext4).
This is an example of what I was seeing:
abuilder:~$ ls
abuilder-sodface-home.tar.gz
abuilder:~$ rm abuilder-sodface-home.tar.gz
abuilder:~$ scp sodface@10.0.0.11:abuilder-sodface-home.tar.gz .
sodface@10.0.0.11's password:
abuilder-sodface-home.tar.gz                  100% 1671MB   9.5MB/s   02:55
abuilder:~$ md5sum abuilder-sodface-home.tar.gz
4f3641e12692c6a1ae74a6b314fd27b0  abuilder-sodface-home.tar.gz
abuilder:~$ tar tvf abuilder-sodface-home.tar.gz | wc -l
gzip: crc error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
44546
abuilder:~$ ls
Illegal instruction
abuilder:~$ ls
Illegal instruction
abuilder:~$ sudo reboot
Illegal instruction

Long story short, I switched back to xfs on root and all is working well again so far. The filesystems were created with no arguments (e.g. mkfs.ext4 /dev/sda2), so maybe there's some user error involved here, but I was using ext4 for weeks on one of the M300s before running into the problem once I started handling some larger files.

The second thing is I'm wondering whether bodhi's patch to disable tso in mv643xx_eth.c is still necessary?
See:
https://lore.kernel.org/patchwork/patch/639426/

I thought the file corruption might have been nic related so as part of my troubleshooting I recompiled the kernel without applying the patch for mv643xx_eth.c. It didn't fix the corruption problem (I saw the same thing with and without the patch) but I did see a small increase in network performance with the unpatched driver. I'm currently running 5.5.11 with the unpatched nic driver and xfs on /. Again, so far at least, everything looks ok.
Re: Debian on Dell Kace M300
March 22, 2020 08:31PM
sodface,

> First is, I started having some issues with
> network transfers of larger files being corrupt
> (md5sums not matching, crc errors when
> extracting). The test file I was using was around
> 1.6GB, but I also had corruption in a 200MB file.
> I tried a bunch of different things but I'm pretty
> sure that the problem was using ext4 for the root
> filesystem.

It is troublesome sometimes. Once in a while there is an Ext4 regression that gets fixed. So I don't use Ext4 for the rootfs, only for data partitions.


> The second thing is I'm wondering whether bodhi's
> patch to disable tso in mv643xx_eth.c is still
> necessary?
> See:
> https://lore.kernel.org/patchwork/patch/639426/

It is no longer necessary. But since it's easy to turn on TSO with ethtool and the performance gain is not that significant, I don't see an urgent need to remove it yet. I will definitely remove it sometime in the future.

Thanks for reporting your findings!

-bodhi
Re: Debian on Dell Kace M300
March 24, 2020 05:31PM
All,

I'm building a new u-boot for this box. So far I have an initial version. This version cannot be used yet: only the network is working. There is no USB or SATA activation, since I have not figured out which GPIOs turn on power for these.

It would serve as a rescue u-boot, in that it can be used with kwboot to start the box.

I thought it would be of interest if you have a serial console and want to see whether you can rescue the box should that be needed.

The command to run kwboot:
kwboot -t -B 115200 /dev/ttyUSB0 -b uboot.2017.07-tld-2.m300.mtd0.kwb  -p

It is best that you interrupt the boot at the serial console. Look around for fun, perhaps some profit, and then do a reset to get back to the stock u-boot and start booting:

reset

Note: don't do any saveenv or resetenv while you're at this u-boot prompt. Be careful.

-bodhi



Edited 2 time(s). Last edit at 03/24/2020 05:34PM by bodhi.
Attachments: uboot.2017.07-tld-2.m300.mtd0.kwb (384 KB)
Re: Debian on Dell Kace M300
March 24, 2020 06:29PM
Cool! Bodhi you are the best! Unfortunately, I'll be away from my M300 for a few weeks. :-( Maybe I'll pack one if I have space.

-JT
Re: Debian on Dell Kace M300
March 26, 2020 09:57PM
OK here is more info.

I've tried poking every single GPIO to find one that would turn on USB or HDD, but apparently none of them did. Then I cleared my mind and thought about it some more.

It seems that the GPIOs are active low (the GPIOs for the LEDs are that way), meaning the bit must be cleared to activate :) So it's back to the drawing board to try another testing approach.
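
To make the active-low point concrete, here is a small Python sketch of the bit arithmetic only. It does not talk to any hardware; the register value and pin number are hypothetical, since the actual GPIO for USB/SATA power on this box is still unknown.

```python
def set_active_low(register, pin, active):
    """Return the register value with an active-low pin switched on or off.

    For an active-low line, "on" means the bit is cleared (0).
    """
    mask = 1 << pin
    return register & ~mask if active else register | mask

if __name__ == "__main__":
    reg = 0xFFFFFFFF                      # hypothetical start: every line off
    reg = set_active_low(reg, 12, True)   # pin 12 is an arbitrary example
    print(hex(reg))
```

Poking a GPIO by setting its bit to 1 would therefore leave an active-low power line switched off, which would be consistent with the first round of poking finding nothing.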

-bodhi
Re: Debian on Dell Kace M300
April 10, 2020 07:57AM
I'll see if I can make this clear and concise. I'm still having an issue with data integrity running on Alpine.

Basically:

- Boot off internal sata, login and immediately scp a ~500MB tar.gz file.
- md5sum the file, it's correct.
- scp the same file again to a new local name so I have two local copies of the same file.
- md5sum the second file, it's correct.
- cmp the two files, cmp runs for a bit and then says there's a difference at char x, line x
- md5sum files again and now they are incorrect.

I've also seen it where I've got two known identical files on disk and on a fresh boot, before the files have been read once, I can md5sum them and they are correct. I can then run md5sum again on both of them and one or both of them will change to a different checksum.

Filesystem type doesn't actually seem to matter - I noted above that I thought xfs was working, and it does seem to take a bit longer to trigger an error but I can make it happen fairly quickly by doing enough md5sum, scp, cmp operations.

I've tried multiple kernels in the 5.4.x and 5.5.x line.

It seemed like it was something to do with the way the kernel was buffering the files after the initial read from disk, that is, reading from disk was ok but a read from the buffer could be incorrect.

Except... I can pull the exact same disk off the internal SATA, connect it to a USB-to-SATA adapter, boot from USB, run all the same tests and everything is fine. I can't make it break. Again, I tried multiple filesystem types and they all work OK when booted from USB.

I thought maybe that pointed more to the sata_mv driver or something but I don't get why the first read is ok, before any kernel file buffering occurs, and then subsequent reads can give different results?? So I started to suspect something with the file buffering or bad RAM but then, as I said, the same disk booted from USB works fine, which should be the exact same buffering mechanisms right?

Unrelated note, there seems to be a kernel file size limit, at least with the stock u-boot env settings. As I was trying different kernels, the sizes were growing as I added in more non-module filesystem support. I had one kernel around 5.1 MB that caused the M300 to freeze right after detecting the storage device and attempting to read the uImage. I truncated the kernel uImage file to match the file size of another image I knew to work and then u-boot would read it - it wouldn't boot of course since I'd truncated it but it did seem to prove that file size was the issue.



Edited 6 time(s). Last edit at 04/10/2020 08:03AM by sodface.
Re: Debian on Dell Kace M300
April 10, 2020 03:44PM
I decided to test with bodhi's 5.5.1 kernel and 5.2.9 rootfs.

I wanted to configure it closely to what I've been doing with Alpine, using a small ext2 partition for /boot and a second partition for /.

I'm not changing any of the default M300 uboot environment variables so I've been setting the root=/dev/sda2 in the .dts file and then changing the kernel config to "Extend with bootloader kernel arguments" so the kernel command line options are appended to the uboot ones rather than being replaced by them.

So for bodhi's kernel, I got the zImage file, appended my m300 dtb file and then created the uImage file. Since bodhi's kernel config ignores the root=/dev/sda2 set in the appended .dtb, I have to stop uboot and set it:

set bootargs_console 'console=ttyS0,115200 mtdparts=spi_flash:0x7f000@0(uboot),0x1000@0x7f000(u-boot-env) earlyprintk root=/dev/sda2'

So I've got the uImage and uInitrd files on a small ext2 partition and the Debian rootfs on ext4. I chose ext4 mainly because it was the most problematic filesystem on Alpine and I wanted to try the worst case first.

I extracted the modules out of the .deb file and added them to the rootfs.

My uInitrd file is an empty archive, just to give uboot something to load and not error out. The kernel just says it's not a valid initrd and presses on and finds the root filesystem based on the command line setting.

Boot seems fine.

root@debian:~# uname -a
Linux debian 5.5.1-kirkwood-tld-1 #1.0 PREEMPT Sat Feb 1 22:28:36 PST 2020 armv5tel GNU/Linux

root@debian:~# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

It's definitely a lot more stable. Initially I thought I wasn't going to be able to duplicate the problems I'm seeing on Alpine, but eventually I did. The routine: transfer the 500MB file via scp, md5sum it, transfer it again to a new filename, md5sum, md5sum them all at the same time, delete a couple, re-transfer them, md5sum, etc.

Things were looking good after multiple transfers:
root@debian:~# md5sum m300-kernel-5.4.28-build-dir*
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir2.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir3.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir4.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir5.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir6.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir7.tar.gz

Then I deleted file #7:

root@debian:~# rm m300-kernel-5.4.28-build-dir7.tar.gz

root@debian:~# ls -alh
total 3.5G
drwx------  5 root root 4.0K Apr 10 13:13 .
drwxr-xr-x 22 root root 4.0K Apr 10 11:41 ..
-rw-------  1 root root 2.6K Apr 10 11:58 .bash_history
-rw-r--r--  1 root root  570 Jan 31  2010 .bashrc
drwx------  3 root root 4.0K Apr 10 08:18 .config
drwxr-xr-x  2 root root 4.0K Aug 24  2019 .nano
-rw-r--r--  1 root root  481 Jul 20  2017 .profile
drwx------  2 root root 4.0K Apr 10 09:35 .ssh
-rw-r--r--  1 root root 592M Apr 10 09:36 m300-kernel-5.4.28-build-dir.tar.gz
-rw-r--r--  1 root root 592M Apr 10 10:04 m300-kernel-5.4.28-build-dir2.tar.gz
-rw-r--r--  1 root root 592M Apr 10 12:02 m300-kernel-5.4.28-build-dir3.tar.gz
-rw-r--r--  1 root root 592M Apr 10 12:05 m300-kernel-5.4.28-build-dir4.tar.gz
-rw-r--r--  1 root root 592M Apr 10 12:15 m300-kernel-5.4.28-build-dir5.tar.gz
-rw-r--r--  1 root root 592M Apr 10 12:26 m300-kernel-5.4.28-build-dir6.tar.gz

And scp'd again and then md5sum'd them all, uh-oh:

root@debian:~# md5sum m300-kernel-5.4.28-build-dir*
e739deca3e8ff148c95d7dab917a7b48  m300-kernel-5.4.28-build-dir.tar.gz
064411bf30702de3c170ecbea247d598  m300-kernel-5.4.28-build-dir2.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir3.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir4.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir5.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir6.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir7.tar.gz

The files on disk haven't actually been corrupted though. I can reboot (or clear the page cache), run md5sum again and they all match.
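
The post does not say how the page cache was cleared. One common way on Linux is to write to /proc/sys/vm/drop_caches, sketched here in Python. The `drop_page_cache` name and the `control_file` parameter are assumptions for illustration; the parameter exists only so the function can be pointed at a scratch file for a dry run.

```python
import os

def drop_page_cache(control_file="/proc/sys/vm/drop_caches"):
    """Flush dirty data, then ask the kernel to drop the clean page cache."""
    os.sync()                        # write dirty pages to disk first
    with open(control_file, "w") as handle:
        handle.write("1\n")          # 1 = free the page cache only
```

Calling `drop_page_cache()` with the default path needs root. Re-running md5sum afterwards forces the data to be read from disk again rather than from RAM, which helps separate "bad data on disk" from "bad data in cache".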

At one point I was able to get rm to crash the system:

root@debian:~# rm m300-kernel-5.4.28-build-dir3.tar.gz 
[ 4615.695975][ T1627] BUG: Bad page state in process rm  pfn:2b14d
[ 4615.702020][ T1627] page:ef55b9a0 refcount:0 mapcount:-2 mapping:00000000 index:0x1
[ 4615.709712][ T1627] raw: 00000000 00000100 00000122 00000000 00000001 00000000 fffffffd 00000000
[ 4615.718529][ T1627] page dumped because: nonzero mapcount
[ 4615.723944][ T1627] Modules linked in: ipv6 nf_defrag_ipv6 sg marvell_cesa orion_wdt kirkwood_thermal uio_pdrv_genirq uio
[ 4615.734962][ T1627] CPU: 0 PID: 1627 Comm: rm Not tainted 5.5.1-kirkwood-tld-1 #1.0
[ 4615.742650][ T1627] Hardware name: Marvell Kirkwood (Flattened Device Tree)
--- snip ---

Not really sure where to go from here. Is it something I'm doing wrong or is there actually a problem?



Edited 4 time(s). Last edit at 04/10/2020 03:55PM by sodface.
Re: Debian on Dell Kace M300
April 10, 2020 05:25PM
sodface,

> something I'm doing wrong or is there actually a
> problem?

Try Ext3 with the same md5 test, see what'll happen.

-bodhi
Re: Debian on Dell Kace M300
April 11, 2020 08:39AM
bodhi Wrote:
> Try Ext3 with the same md5 test, see what'll
> happen.

I switched over to ext3 for the rootfs and wrote a little test script to scp the same file from the server 10 times, each time changing the local file name to get a new local copy and also running md5sum against all files before starting the next transfer.
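
The test script itself was not posted, so the following is only a rough Python equivalent of the loop as described: fetch a copy, then re-hash every copy fetched so far. All names here (`md5_of`, `scp_fetch`, `run_test`, `name_pattern`) are hypothetical, and `fetch` is a parameter so that another copy method can be swapped in.

```python
import hashlib
import subprocess

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scp_fetch(remote, local):
    """Copy one file with scp; `remote` is a user@host:path string."""
    subprocess.run(["scp", remote, local], check=True)

def run_test(remote, expected_md5, copies=10, fetch=scp_fetch,
             name_pattern="copy{}.tar.gz"):
    """Fetch `copies` local copies, re-hashing every copy after each fetch.

    Returns a list of (loop, path, md5) tuples, one per mismatch seen.
    """
    mismatches = []
    paths = []
    for loop in range(1, copies + 1):
        local = name_pattern.format(loop)
        fetch(remote, local)
        paths.append(local)
        for path in paths:
            digest = md5_of(path)
            if digest != expected_md5:
                mismatches.append((loop, path, digest))
    return mismatches
```

Recording the loop number alongside each mismatch makes it easier to spot a file whose checksum is wrong in one pass and correct again in a later one.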

That completed successfully last night. I left the M300 up and idle over night and then started the test script again this morning. I did not delete the existing files but let the script just replace them as it went.

So the first block of md5sums is the result of the final test loop from last night, then you see me start the script and the second block of md5sums is loops 1-3 of the new test run:

m300-kernel-5.4.28-build-dir.tar.gz           100%  592MB  10.5MB/s   00:56    
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir1.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir10.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir2.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir3.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir4.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir5.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir6.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir7.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir8.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir9.tar.gz

root@debian:~# ./test.sh 
m300-kernel-5.4.28-build-dir.tar.gz           100%  592MB  11.3MB/s   00:52    
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir1.tar.gz
93e8bb8048807f11be9aee66ff355f72  m300-kernel-5.4.28-build-dir10.tar.gz
250c148216d7b57df3791a1c3aced746  m300-kernel-5.4.28-build-dir2.tar.gz
9fd5871b67477da8b2341667eea24c9c  m300-kernel-5.4.28-build-dir3.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir4.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir5.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir6.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir7.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir8.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir9.tar.gz
m300-kernel-5.4.28-build-dir.tar.gz           100%  592MB  11.3MB/s   00:52    
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir1.tar.gz
93e8bb8048807f11be9aee66ff355f72  m300-kernel-5.4.28-build-dir10.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir2.tar.gz
9fd5871b67477da8b2341667eea24c9c  m300-kernel-5.4.28-build-dir3.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir4.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir5.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir6.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir7.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir8.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir9.tar.gz
m300-kernel-5.4.28-build-dir.tar.gz           100%  592MB  11.7MB/s   00:50    
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir1.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir10.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir2.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir3.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir4.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir5.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir6.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir7.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir8.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir9.tar.gz

root@debian:~# uptime
 06:40:14 up 13:04,  1 user,  load average: 0.00, 0.00, 0.24

You can see though, that the md5sums for those three files are only incorrect for two loops of the script and then they are correct again and remained correct from then on. Files "dir1", "dir2", and "dir3" were newly downloaded after three loops but "dir10", whose md5sum is also showing as incorrect in the first loops, is again correct by loop 3 even though it obviously won't be re-downloaded until the last loop of the script.



Edited 4 time(s). Last edit at 04/11/2020 08:49AM by sodface.
Re: Debian on Dell Kace M300
April 11, 2020 04:34PM
sodface,

Do you run sync each time, and how long do you wait between copies?

-bodhi
Re: Debian on Dell Kace M300
April 11, 2020 05:10PM
bodhi Wrote:
> Do you do sync each time, and how long you wait
> in between copying?

No sync. The md5sum takes ~23 seconds to complete for each file, though if I run it again immediately on the same file it takes ~5 seconds, so there is somewhere between 5 and 23 seconds minimum between transfers. Really it's a lot more than that, because the script md5sums all the files each time and I haven't been deleting them, so it's probably more like: transfer, 3-4 minutes of md5sum, then the next transfer.

I have the same drive booted from my USB adapter and have run the script several times already without a hiccup. I'll leave it up overnight again and try it in the morning. I noticed that with Alpine too: things would be fine, it would sit idle overnight, and then I'd log in to do something and hit some weird issue that would force me to reboot.



Edited 2 time(s). Last edit at 04/11/2020 06:27PM by sodface.
Re: Debian on Dell Kace M300
April 12, 2020 11:51AM
Left the hard drive connected up to USB over night. I've run the test again a couple of times today, no issues. I have yet to duplicate the problem when booted from USB.
Re: Debian on Dell Kace M300
April 12, 2020 05:18PM
sodface,

If you could, use a different way to copy, such as rsync.

-bodhi
Re: Debian on Dell Kace M300
April 12, 2020 10:52PM
sodface Wrote:
-------------------------------------------------------
> Left the hard drive connected up to USB over
> night. I've run the test again a couple of times
> today, no issues. I have yet to duplicate the
> problem when booted from USB.

This looks more and more to me like something weird happens with the RAM when booting from other than a USB drive. Very strange to be sure.
Re: Debian on Dell Kace M300
April 12, 2020 11:42PM
> This looks more and more to me like something
> weird happens with the RAM when booting from other
> than a USB drive. Very strange to be sure.

Could be.

I think, for the time being, if you boot from SATA you should have a USB drive permanently plugged in, to be safe.

For investigation, without the USB drive plugged in, I would look at the power source for the SATA port to see if it supplies enough voltage and current.

Kernel-wise, I could make sure the power is correct with regulators. But I don't know which GPIO is used in this box to trigger that. The problem with having no GPL source code is that we have to either brute force it (poking GPIOs in u-boot, which I've done one round of without success, with many more needed) or take a magnified picture of the board and trace the logic to find it.

I thought I'd have more free time in self-isolation, but it turned out I have to do even more, working from home :)) I'll get back to this subject in about a week, I hope.

-bodhi



Edited 1 time(s). Last edit at 04/12/2020 11:44PM by bodhi.
Re: Debian on Dell Kace M300
April 13, 2020 08:43AM
I concur with ensuring a USB drive is plugged in when booting from SATA. The RAM suggestion seems interesting, as there was no consistent error in the boot failures I had with the original 16GB SATA flash card with the latest Debian filesystem and kernel. Assuming that the stock filesystem and kernel were reliable, would there be any way to "dissect" the original kernel to see how it managed the SATA? Or was something different in the old kernels relative to this issue? I blew away both of my original systems, so I can't be of any help.
Re: Debian on Dell Kace M300
April 13, 2020 09:55AM
>> Assuming that the stock filesystem and kernel was reliable

Mike, I still have the old load and did some brief testing with it. I did _not_ see the issue, though scp speed was much worse: I was only seeing ~6MB/s with the stock kernel/rootfs versus ~12MB/s with the newer loads. I don't think this is relevant to the issue, just a comment.

I didn't do extensive testing with the factory load though, just a few runs of the script and that's it. I didn't leave it up overnight and run it again in the morning; I might still try that.
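For anyone who wants to repeat this kind of test without the script itself: the check boils down to copying one file several times and comparing the MD5 of each copy against the original. Here is a minimal Python sketch of just the comparison step. The function names and the idea of passing in a reference file plus a list of copies are assumptions for illustration, not the actual test script used in this thread.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading in chunks so large files are fine."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_corrupt_copies(reference, copies):
    """Return the copies whose MD5 differs from the reference file's MD5."""
    expected = md5_of(reference)
    return [str(path) for path in copies if md5_of(path) != expected]

# Hypothetical usage, with file names chosen only as an example:
#   bad = find_corrupt_copies("original.tar.gz",
#                             ["copy1.tar.gz", "copy2.tar.gz"])
#   print("corrupt copies:", bad or "none")
```

An empty result means every copy matched the reference; any name in the list is a copy that was corrupted somewhere between the network, RAM, and the disk.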

I sure hope it's not something stupid I'm doing; I apologize in advance if this ends up being a wild goose chase. I already discovered that I was using 0x10008000 for mkimage to make the uImage instead of 0x00008000. Both seem to work and I'm not really sure what issues that could cause. I remade the uImage with 0x00008000 and tested again and still had the problem when booted from SATA.
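To double-check which addresses actually ended up in a uImage, the 64-byte legacy U-Boot header can be read back directly (`mkimage -l` prints the same information). The sketch below assumes a legacy-format uImage, not a FIT image; the function names and the `uImage` path in the usage comment are placeholders for illustration.

```python
import struct

UIMAGE_MAGIC = 0x27051956      # magic number of the legacy U-Boot image header
HEADER_FORMAT = ">7I4B32s"     # big-endian: 7 words, 4 bytes, 32-byte name = 64 bytes

def parse_uimage_header(header_bytes):
    """Decode a 64-byte legacy uImage header and return the interesting fields."""
    if len(header_bytes) < 64:
        raise ValueError("need at least 64 bytes of header")
    (magic, _hcrc, _time, size, load, entry, _dcrc,
     _os, _arch, _type, _comp, name) = struct.unpack(HEADER_FORMAT, header_bytes[:64])
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a legacy uImage (bad magic)")
    return {
        "name": name.split(b"\0", 1)[0].decode("ascii", "replace"),
        "size": size,
        "load": load,     # load address written by mkimage
        "entry": entry,   # entry point written by mkimage
    }

def read_uimage_header(path):
    """Read and decode the header from a uImage file on disk."""
    with open(path, "rb") as fh:
        return parse_uimage_header(fh.read(64))

# Hypothetical usage:
#   info = read_uimage_header("uImage")
#   print(hex(info["load"]), hex(info["entry"]))
```

This only confirms what was written into the header; it says nothing about whether one address or the other is the right choice for this box.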
Re: Debian on Dell Kace M300
April 13, 2020 12:13PM
Sodface,

The last M300 I bought, a month ago, has been a basket case as far as reliability goes. Using the same methods and procedures I used on the other 4 I have, this one is unreliable for reasons I cannot understand. It doesn't seem to matter whether I boot from USB or SATA: I get the same kernel panics, either during boot or after boot completes.

I'm convinced some of these have hardware issues which cause them to be unreliable and that sucks because you don't find out until after you buy it and try to load Debian.

So in other words, you might have one that doesn't work correctly no matter what you do.

LME
Re: Debian on Dell Kace M300
April 13, 2020 02:55PM
Thanks for all the input, gents. I'll do some more testing and update the thread. @LME, I have five M300s also. One I'm using as my main firewall/router, running shorewall on Alpine, connected to a managed switch (a router-on-a-stick type setup using the single M300 interface). It's been working great with the stock 16GB SSD:

router:~$ uptime
 15:42:46 up 21 days, 19:05,  load average: 0.04, 0.01, 0.00

Other than logging though, this one shouldn't really be writing to local storage much.

M300 #2 is the one I posted a pic of which I was going to use for LMS / file server duties. It's been up but I haven't really been using it until I get the issues sorted out. I haven't been taking great notes but I don't think I've really seen any issues with that box. I'll run some more tests on it.

M300 #3 was the box I first started having problems with as I was using it the most, compiling packages for Alpine. It should be up right now but I can't get into it, which is one of the problems I've seen with it - all of a sudden I can't ssh to it and I have to cycle power on it, which I just did and now I can log in again.

M300 #4 is the one I'm currently doing all the testing with and I only pulled it out of the box because I was having issues with #3.

M300 #5 I pulled out yesterday so I could back up the stock factory OS load and do some testing with. I'm going to take my test hard drive over to that one and re-run all the tests.

It could be that #3 and #4 both have hardware problems, and that assuming it's a software issue just because I see it on more than one box is a faulty assumption.



Edited 1 time(s). Last edit at 04/13/2020 02:57PM by sodface.
Re: Debian on Dell Kace M300
April 13, 2020 03:16PM
I would assume Kace used different RAM chips in the board over the course of manufacturing. Obviously they would vary in specification depending on sourcing. Maybe they ended up with some that were out of spec enough to necessitate lowered performance settings in the OS that of course we would never know about. The other core chips on the motherboard would be all Marvell so I would suspect them less to be out of spec.
Re: Debian on Dell Kace M300
April 13, 2020 03:57PM
Well, I moved my test spinning hard disk over to M300 #5 and ran the test script before going out to walk the dog. Just got back and reviewed the results. Looked OK. I started the script again and below is what happened; the md5sums at the top are from the end of the first script run:

m300-kernel-5.4.28-build-dir.tar.gz           100%  592MB  11.1MB/s   00:53    
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir1.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir10.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir2.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir3.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir4.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir5.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir6.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir7.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir8.tar.gz
617c938c2053d6ebfac73397b729896b  m300-kernel-5.4.28-build-dir9.tar.gz
root@debian:~# free
              total        used        free      shared  buff/cache   available
Mem:        1803460       16596        9404         272     1777460     1763292
Swap:       1048572           0     1048572
root@debian:~# ./test.sh 
m300-kernel-5.4.28-build-dir.tar.gz            11%   71MB  11.5MB/s   00:45 ETA
[ 2551.413646][  T112] BUG: Bad page state in process kswapd0  pfn:656a9
[ 2551.420125][  T112] page:efca6520 refcount:0 mapcount:-8192 mapping:00000000 index:0x1
[ 2551.428079][  T112] raw: 00000000 00000100 00000122 00000000 00000001 00000000 ffffdfff 00000000
[ 2551.436897][  T112] page dumped because: nonzero mapcount
[ 2551.442311][  T112] Modules linked in: ipv6 nf_defrag_ipv6 sg marvell_cesa orion_wdt kirkwood_thermal uio_pdrv_genirq uio
[ 2551.453330][  T112] CPU: 0 PID: 112 Comm: kswapd0 Not tainted 5.5.1-kirkwood-tld-1 #1.0
[ 2551.461365][  T112] Hardware name: Marvell Kirkwood (Flattened Device Tree)
[ 2551.468369][  T112] [<8010f668>] (unwind_backtrace) from [<8010b9f4>] (show_stack+0x10/0x14)
[ 2551.476853][  T112] [<8010b9f4>] (show_stack) from [<80241b50>] (bad_page+0x100/0x138)
[ 2551.484812][  T112] [<80241b50>] (bad_page) from [<80241bec>] (free_pages_check+0x64/0x88)
[ 2551.493113][  T112] [<80241bec>] (free_pages_check) from [<80242ae4>] (free_pcppages_bulk+0x130/0x284)
[ 2551.502462][  T112] [<80242ae4>] (free_pcppages_bulk) from [<80243e14>] (free_unref_page_list+0x12c/0x184)
[ 2551.512158][  T112] [<80243e14>] (free_unref_page_list) from [<80218c98>] (shrink_page_list+0x88/0xac4)
[ 2551.521591][  T112] [<80218c98>] (shrink_page_list) from [<80219d7c>] (shrink_inactive_list+0x1d0/0x3d4)
[ 2551.531113][  T112] [<80219d7c>] (shrink_inactive_list) from [<8021aae8>] (shrink_node+0x7b8/0x8b4)
[ 2551.540199][  T112] [<8021aae8>] (shrink_node) from [<8021b5b8>] (kswapd+0x40c/0x73c)
[ 2551.548065][  T112] [<8021b5b8>] (kswapd) from [<801371bc>] (kthread+0x100/0x10c)
[ 2551.555582][  T112] [<801371bc>] (kthread) from [<801010e0>] (ret_from_fork+0x14/0x34)
[ 2551.563530][  T112] Exception stack(0xeeec7fb0 to 0xeeec7ff8)
[ 2551.569295][  T112] 7fa0:                                     00000000 00000000 00000000 00000000
[ 2551.578208][  T112] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2551.587119][  T112] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 2551.594459][  T112] Disabling lock debugging due to kernel taint
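
One detail in the trace above may be worth a closer look: the reported mapcount of -8192. A rough reading, assuming the seventh word of the "raw:" line (ffffdfff) is the page's stored map count on this 32-bit kernel, and that a free page normally holds -1 (ffffffff) there with the kernel printing the stored value plus one, is that the value differs from the expected one by a single bit. The short Python sketch below just does that arithmetic; the helper names are made up for illustration.

```python
def to_signed32(word):
    """Interpret a 32-bit word as a signed two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def flipped_bits(observed, expected):
    """Return the bit positions where two 32-bit words differ."""
    diff = (observed ^ expected) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]

raw_mapcount = 0xFFFFDFFF   # seventh word of the "raw:" line in the trace
unmapped = 0xFFFFFFFF       # -1, the assumed stored value for an unmapped page

print(to_signed32(raw_mapcount) + 1)          # -8192, matches "mapcount:-8192"
print(flipped_bits(raw_mapcount, unmapped))   # [13]
```

A single cleared bit would be consistent with the RAM suspicion raised earlier in the thread, but one trace is a hint at best, not proof of a hardware fault.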