Debian on Dell Kace M300

Posted by JDS420 
Re: Debian on Dell Kace M300
February 20, 2020 07:58AM
bodhi Wrote:
-------------------------------------------------------
>We welcome any Linux
> distro users.

Thanks bodhi!

> I did not recall Mike has "bus error" ?

He didn't, but the /proc/cpu/alignment output from his m300 running LMS listed 3 alignment faults. I have no real idea how to interpret what caused them. Since they are in the "system" category rather than the "user" category, I'm guessing they are corrected automatically by the kernel and the entries are just informational, as opposed to the user faults, where you can control how faults are handled by echoing one of the numeric options to /proc/cpu/alignment?

For the "user" category faults that I'm seeing on Alpine, I agree with you: these are software related and specific to applications which aren't coded to always perform aligned memory access (again, not sure if this is the 100% accurate way to word it, but you get my meaning).

What continues to nag me, though, is that based on Mike's output it seems the Debian image you guys are using defaults /proc/cpu/alignment to 0 for user faults (as my Alpine system does). I'm pretty sure, then, that you guys would also get alignment faults and strange application behavior depending on the software you choose to use. In other words, that Mike has 0 user faults in his output could just be luck so far. If he installed something new in the future, he could have issues.

I was trying to figure out if debian has been defaulting to doing something differently than Alpine which would explain why you guys haven't had issues with this before. I did find this old post:

https://lists.debian.org/debian-arm/2011/06/msg00072.html

One part of that post states:
"The default /proc/cpu/alignment mode seems to be 2 (fixup), on v6/v7,
provided that the v6 unaligned access model (CR_U) is supported by the CPU:"

Which I think is related to the kernel, and not Debian specifically. The CPU on the m300 is v5, right? That might explain why it defaults to 0 (ignore) as seen in Mike's output. The top part of that post suggests adding an alignment= kernel option to perform user fixup, so maybe that is one option to consider.

There are a few small programs on github to induce/test memory alignment faults so someone could pick one and test with it and we could compare results. I'm planning on doing it for Alpine but I'll be compiling for musl libc so I don't believe the resulting binary will run on your glibc debian builds.



Edited 3 time(s). Last edit at 02/20/2020 12:00PM by sodface.
Re: Debian on Dell Kace M300
February 20, 2020 05:05PM
Quote

He didn't, but the /proc/cpu/alignment output from his m300 running LMS listed 3 alignment faults. I have no real idea how to interpret what caused them. Since they are in the "system" category rather than the "user" category, I'm guessing they are corrected automatically by the kernel and the entries are just informational, as opposed to the user faults, where you can control how faults are handled by echoing one of the numeric options to /proc/cpu/alignment?

True. I don't think the kernel has anything to do with it in this case. Some applications have bugs regarding alignment. It is best that they fix the code. We really should not mess with system settings. As long as the kernel was built correctly (GCC also had this type of bug before, but it got fixed quickly), and the rootfs architecture is correct, there is no need to do anything.

-bodhi
===========================
Forum Wiki
bodhi's corner
Re: Debian on Dell Kace M300
February 20, 2020 05:57PM
Out of curiosity, I booted back up the factory Dell M300 image and checked the alignment setting:

root@dellkaceM300:~# cat /proc/cpu/alignment 
User:		0
System:		0
Skipped:	0
Half:		0
Word:		0
DWord:		0
Multi:		0
User faults:	2 (fixup)

So that's kind of interesting. It's not being set via the cmdline:

root@dellkaceM300:~# cat /proc/cmdline 
console=ttyS0,115200 mtdparts=spi_flash:0x7f000@0(uboot),0x1000@0x7f000(u-boot-env)

I thought it was being set in the kernel config:

root@dellkaceM300:/# grep -i alignment /boot/config-2.6.32-5-kirkwood 
CONFIG_ALIGNMENT_TRAP=y

but I see that this setting is the same in your kernel config, though Mike's output shows /proc/cpu/alignment set to 0 (ignore) for user faults.

So how is it being set to 2 on the factory load? Odds are that had my Alpine load defaulted to 2 somehow, I would have never had any issues with apk and been blissfully ignorant that all those alignment faults were happening behind the scenes. Not sure what's better. Would have saved me many hours of head scratching.

//edit, just verified that adding alignment=2 to the kernel command line options does set /proc/cpu/alignment to 2 (fixup)

earlyprintk root=/dev/sda2 alignment=2 console=ttyS0,115200 mtdparts=spi_flash:0x7f000@0(uboot),0x1000@0x7f000(u-boot-env)
localhost:~# cat /proc/cpu/alignment 
User:		0
System:		0 (0x0)
Skipped:	0
Half:		0
Word:		0
DWord:		0
Multi:		0
User faults:	2 (fixup)
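
For comparing these outputs across the different loads (factory image, Debian, Alpine), a small parser can help. The following is a minimal Python sketch, assuming the file keeps the "Name: value" layout shown above; the function name parse_alignment and the SAMPLE constant are illustrative only and are not part of any tool mentioned in this thread.

```python
import re

# Sample text copied from the output above (tabs written as \t).
SAMPLE = """\
User:\t\t0
System:\t\t0 (0x0)
Skipped:\t0
Half:\t\t0
Word:\t\t0
DWord:\t\t0
Multi:\t\t0
User faults:\t2 (fixup)
"""

def parse_alignment(text):
    """Parse /proc/cpu/alignment style text into counters plus the user fault mode."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            continue  # skip blank or unexpected lines
        if key == "User faults":
            # e.g. "2 (fixup)" -> mode 2 with label "fixup"
            match = re.match(r"(\d+)\s*\((.*)\)", value)
            if match:
                result["mode"] = int(match.group(1))
                result["mode_label"] = match.group(2)
        else:
            # Counters may carry a trailing annotation such as "(0x0)".
            first = value.split()[0]
            if first.isdigit():
                result[key] = int(first)
    return result

if __name__ == "__main__":
    info = parse_alignment(SAMPLE)
    print(info["mode"], info["mode_label"], info["User"])
```

On a live system the same function could be fed the contents of /proc/cpu/alignment instead of SAMPLE.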



Edited 4 time(s). Last edit at 02/20/2020 06:51PM by sodface.
Re: Debian on Dell Kace M300
February 20, 2020 07:14PM
sodface,

2.6.32-5-kirkwood is a very old kernel. And being a stock kernel, it could have settings enabled in some patched code, too. I don't think it should be compared to a modern kernel.

The GCC and the kernel have evolved so much from that day to the present.

-bodhi
Re: Debian on Dell Kace M300
February 20, 2020 07:37PM
bodhi, while I agree with your statements, I think that the following is also true (even today) as far as the alignment setting goes:

- The debian config you guys are currently using sets /proc/cpu/alignment to User faults: 0 (ignore)
- Due to this setting, applications that perform unaligned memory operations may exhibit unexpected behavior because the kernel will not correct it and the program will continue running with unpredictable and undesirable results
- Setting /proc/cpu/alignment to User faults: 2 (fixup) will allow the same applications to operate normally because the kernel will correct the memory accesses, at the cost of extra cpu cycles and therefore degraded performance

I decided to check the setting on one of my Raspberry Pi Zero W's that runs the official armhf release of Alpine Linux and it was set to 2 (fixup).

I think I will probably go with the 2 (fixup) setting going forward, just to be safe.



Edited 3 time(s). Last edit at 02/20/2020 07:58PM by sodface.
Re: Debian on Dell Kace M300
February 20, 2020 09:45PM
sodface,

> I think I will probably go with the 2 (fixup)
> setting going forward, just to be safe.

That would be a good workaround for badly behaved applications!

However, I think Debian is more correct in that such application bugs should be fixed. Unless you can make it apparent so that users can submit a bug report, it is like sweeping dirt under the rug :) Besides, there is no guarantee that the kernel could fix it up successfully. Some out-of-alignment problems could also cause unpredictable behavior. I mean... pick your poison :)

In real world mission critical applications, these problems must be fixed.

-bodhi



Edited 1 time(s). Last edit at 02/20/2020 09:46PM by bodhi.
Re: Debian on Dell Kace M300
February 20, 2020 10:41PM
bodhi Wrote:
-------------------------------------------------------
> However, I think Debian is more correct in that
> such application bugs should be fixed. Unless you
> can make it apparent so that users can submit a
> bug report, it is like sweeping dirt under the
> rug :) Besides, there is no guarantee that the
> kernel could fix it up successfully. Some
> out-of-alignment problems could also cause
> unpredictable behavior. I mean... pick your
> poison :)
>
> In real world mission critical applications, these
> problems must be fixed.

Agree, except when User faults is set to 0 (ignore) the misbehaving application doesn't necessarily crash or exit but continues to run with unpredictable results!

To detect faulty applications, it should be set to 4 or 5 so the process exits with Bus error and doesn't continue to run. 0 is probably the worst setting, which is what you have now!
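
To make the numeric settings easier to compare, here is a minimal Python sketch of how the mode values 0 through 5 can be read. It assumes, based on my understanding of the ARM kernel's alignment documentation and not on anything confirmed in this thread, that the value is a small bit field: 1 = warn, 2 = fixup, 4 = signal (SIGBUS). The names below are illustrative.

```python
# Assumed bit meanings of the "User faults" mode value (not confirmed in this thread).
WARN = 1    # log a warning for each user-space alignment fault
FIXUP = 2   # have the kernel emulate the unaligned access
SIGNAL = 4  # deliver SIGBUS to the faulting process

def describe_mode(mode):
    """Split a mode number in the range 0-5 into its three assumed flags."""
    if not 0 <= mode <= 5:
        raise ValueError("expected a mode between 0 and 5, got %r" % (mode,))
    return {
        "warn": bool(mode & WARN),
        "fixup": bool(mode & FIXUP),
        "signal": bool(mode & SIGNAL),
    }

if __name__ == "__main__":
    for mode in range(6):
        flags = [name for name, on in describe_mode(mode).items() if on]
        print(mode, "+".join(flags) or "ignore")
```

Under this reading, 0 leaves the fault unhandled, 2 hides it by correcting the access, and 4 or 5 make it visible as a Bus error.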
Re: Debian on Dell Kace M300
February 21, 2020 01:34AM
> Agree, except when User faults is set to 0
> (ignore) the misbehaving application doesn't
> necessarily crash or exit but continues to run
> with unpredictable results!

That's not the behavior in Debian as far as I know. But I guess it's all academic. Glad you figured out what to do in Alpine!

-bodhi
Re: Debian on Dell Kace M300
February 21, 2020 06:30AM
bodhi Wrote:
-------------------------------------------------------
> That's not the behavior in Debian as far as I know.
> But I guess it's all academic. Glad you
> figured out what to do in Alpine!

I'm still trying to figure out what the default behavior in Debian is supposed to be. Based on Mike's output it looks like it defaults to 0 (ignore), assuming that's the only place where the setting is set and checked.

The Debian wiki page that I linked to earlier, last updated in 2017 (though wikis are hard to keep everything current), says:

Quote

word accesses must be aligned to a multiple of their size

Accesses from non-aligned locations give garbled results

For example, loading a 32-bit word from a non-aligned pointer reads an aligned 32-bit word from the next lowest 32-bit-aligned location (ignoring the lower 2 bits of the pointer) and then rotates the result so that the byte indicated by the pointer ends up in the least significant byte.

This applies to 16-bit, 32-bit and 64-bit data.

You can test for this kind of error without recompiling, using the pseudo-file /proc/cpu/alignment. Catting it tells you whether misaligned user accesses are detected and fixed up or not (the default setting is usually no, 0).

Another mode is "fixup", in which the kernel will quietly perform correct unaligned access for user processes.

https://wiki.debian.org/ArmEabiFixes#word_accesses_must_be_aligned_to_a_multiple_of_their_size
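
The "garbled results" described in that quote can be modelled in a few lines. This is a minimal Python sketch that simulates the quoted rule for a little-endian 32-bit load (read the aligned word, then rotate so the addressed byte lands in the least significant byte). It is only a model of the wiki's description, not code that runs on or probes the M300, and the function names are illustrative.

```python
def rotated_load32(mem, addr):
    """Model the quoted behaviour: aligned read followed by a rotate right."""
    aligned = addr & ~3                       # ignore the lower 2 bits of the pointer
    word = int.from_bytes(mem[aligned:aligned + 4], "little")
    shift = (addr & 3) * 8                    # rotate by the byte offset
    return ((word >> shift) | (word << (32 - shift))) & 0xFFFFFFFF

def true_load32(mem, addr):
    """What a correct (or kernel-fixed-up) unaligned load would return."""
    return int.from_bytes(mem[addr:addr + 4], "little")

if __name__ == "__main__":
    mem = bytes([0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77])
    for addr in range(4):
        print(addr, hex(rotated_load32(mem, addr)), hex(true_load32(mem, addr)))
```

For an aligned address the two functions agree; for any other offset the modelled result mixes bytes from the wrong word, which is the kind of silent corruption an application would see with user faults ignored.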

Maybe running a test program on Debian would confirm or deny the default behavior, something like:
https://github.com/gavingolden/Unaligned-Memory-Access

But I will drop this now unless I come up with any other info. I think I have a way ahead for Alpine and one of the developers for the application I was having issues with has just committed another code fix for it so hopefully that should be good now anyway.

Thanks for the help!
Re: Debian on Dell Kace M300
March 22, 2020 07:09PM
Hello again, I wanted to follow up on this thread with a couple of things related to the M300 (still with Alpine Linux).

First is, I started having some issues with network transfers of larger files being corrupt (md5sums not matching, crc errors when extracting). The test file I was using was around 1.6GB, but I also had corruption in a 200MB file. I tried a bunch of different things but I'm pretty sure that the problem was using ext4 for the root filesystem. I'm using two partitions, a small ext2 /boot and I was using ext4 on / (I had xfs on root on another m300 but had switched it to ext4).
This is an example of what I was seeing:
abuilder:~$ ls
abuilder-sodface-home.tar.gz
abuilder:~$ rm abuilder-sodface-home.tar.gz
abuilder:~$ scp sodface@10.0.0.11:abuilder-sodface-home.tar.gz .
sodface@10.0.0.11's password:
abuilder-sodface-home.tar.gz                  100% 1671MB   9.5MB/s   02:55
abuilder:~$ md5sum abuilder-sodface-home.tar.gz
4f3641e12692c6a1ae74a6b314fd27b0  abuilder-sodface-home.tar.gz
abuilder:~$ tar tvf abuilder-sodface-home.tar.gz | wc -l
gzip: crc error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
44546
abuilder:~$ ls
Illegal instruction
abuilder:~$ ls
Illegal instruction
abuilder:~$ sudo reboot
Illegal instruction

Long story short, I switched back to xfs on root and all is working well again so far. Filesystems were created with no arguments (e.g. mkfs.ext4 /dev/sda2), so maybe there's some user error involved here, but I was using ext4 for weeks on one of the m300's before running into the problem once I started handling some larger files.
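
For anyone wanting to repeat this kind of transfer test, the md5sum and gzip checks from the session above can be scripted. The following is a minimal Python sketch using only the standard library; the function names and the chunk size are illustrative choices, and it reads files in chunks so that a 1.6GB archive does not need to fit in the M300's RAM.

```python
import gzip
import hashlib
import zlib

def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Decompress the whole file and report whether gzip raised an error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Covers CRC mismatches, truncated files and corrupt deflate data.
        return False
    return True
```

Comparing md5_of() on the source and destination hosts and then calling gzip_is_intact() on the received copy reproduces the two checks shown in the session.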

The second thing is I'm wondering whether bodhi's patch to disable tso in mv643xx_eth.c is still necessary?
See:
https://lore.kernel.org/patchwork/patch/639426/

I thought the file corruption might have been nic related so as part of my troubleshooting I recompiled the kernel without applying the patch for mv643xx_eth.c. It didn't fix the corruption problem (I saw the same thing with and without the patch) but I did see a small increase in network performance with the unpatched driver. I'm currently running 5.5.11 with the unpatched nic driver and xfs on /. Again, so far at least, everything looks ok.
Re: Debian on Dell Kace M300
March 22, 2020 08:31PM
sodface,

> First is, I started having some issues with
> network transfers of larger files being corrupt
> (md5sums not matching, crc errors when
> extracting). The test file I was using was around
> 1.6GB, but I also had corruption in a 200MB file.
> I tried a bunch of different things but I'm pretty
> sure that the problem was using ext4 for the root
> filesystem.

It is troublesome sometimes. Once in a while there is an Ext4 regression that gets fixed. So I don't use Ext4 for rootfs, only for data partitions.


> The second thing is I'm wondering whether bodhi's
> patch to disable tso in mv643xx_eth.c is still
> necessary?
> See:
> https://lore.kernel.org/patchwork/patch/639426/

It is no longer necessary. But since it's easy to turn on TSO with ethtool and the performance gain is not that significant, I don't see an urgent need to remove it yet. I will definitely remove it sometime in the future.

Thanks for reporting your findings!

-bodhi
Re: Debian on Dell Kace M300
March 24, 2020 05:31PM
All,

I'm building a new u-boot for this box. So far I have an initial version. This version cannot be used yet. Only the network is working. There is no USB or SATA activation, since I have not figured out which GPIOs turn on power for these.

It can serve as a rescue u-boot that can be used with kwboot to start the box.

I thought it would be of interest if you have serial console and want to see if you can rescue the box should it be needed.

The command to run kwboot:
kwboot -t -B 115200 /dev/ttyUSB0 -b uboot.2017.07-tld-2.m300.mtd0.kwb -p

It is best that you interrupt the boot at the serial console. Look around for fun, perhaps some profit, and do a reset to get back to stock u-boot to start booting:

reset

Note: don't do any saveenv, resetenv, while you're at this u-boot prompt. Be careful.

-bodhi



Edited 2 time(s). Last edit at 03/24/2020 05:34PM by bodhi.
Attachments: uboot.2017.07-tld-2.m300.mtd0.kwb (384 KB)
Re: Debian on Dell Kace M300
March 24, 2020 06:29PM
Cool! Bodhi you are the best! Unfortunately, I'll be away from my M300 for a few weeks. :-( Maybe I'll pack one if I have space.

-JT
Re: Debian on Dell Kace M300
March 26, 2020 09:57PM
OK here is more info.

I've tried poking every single GPIO to find one that would turn on USB or HDD, but apparently none of them did. Then I cleared my mind and thought about it some more.

It seems that the GPIOs are active low (the GPIOs for the LEDs are that way), meaning the bit must be cleared to activate :) So back to the drawing board to use another approach in testing.
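
To illustrate the active-low point, here is a minimal Python sketch of the bit arithmetic only. The register values and pin numbers are hypothetical and are not the M300's real GPIO assignments, and the helper names are illustrative.

```python
def assert_line(reg, pin, active_low=True):
    """Return the register value with the given line in its active state."""
    mask = 1 << pin
    # Active low: clear the bit to activate. Active high: set it.
    return reg & ~mask if active_low else reg | mask

def deassert_line(reg, pin, active_low=True):
    """Return the register value with the given line in its inactive state."""
    mask = 1 << pin
    return reg | mask if active_low else reg & ~mask

if __name__ == "__main__":
    reg = 0xFF                    # hypothetical starting value, all lines inactive
    reg = assert_line(reg, 3)     # hypothetical pin 3, active low
    print(hex(reg))
```

Writing a 1 to an active-low line therefore switches it off, which would explain why setting each bit in turn did not power anything up.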

-bodhi