Kernel 4.12.4 MVEBU armhf: ethernet unstable!

Posted by ptosch 
September 06, 2017 03:45AM
Hello,

Unfortunately, Ethernet occasionally stops working. It usually happens while large files are being transferred: the nas326 box stops responding to any network traffic. On the box itself everything seems fine and the IP address is still bound to the eth port, but there is no network response. If I then run /etc/init.d/network restart, the kernel crashes:

[84945.616863] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[84947.288412] Alignment trap: not handling instruction e1932f9f at [<c0729abc>]
[84947.295584] Unhandled fault: alignment exception (0x001) at 0xd3794776
[84947.302157] pgd = c0004000
[84947.304874] [d3794776] *pgd=1361141e(bad)
[84947.308915] Internal error: : 1 [#1] PREEMPT SMP ARM
[84947.313901] Modules linked in: uio_pdrv_genirq uio sg uas
[84947.319334] CPU: 0 PID: 5690 Comm: kworker/0:1 Not tainted 4.12.4-mvebu-tld-1 #1
[84947.326760] Hardware name: Marvell Armada 380/385 (Device Tree)
[84947.332711] Workqueue: ipv6_addrconf addrconf_dad_work
[84947.337873] task: df50b2c0 task.stack: de77c000
[84947.342424] PC is at consume_skb+0x30/0x11c
[84947.346627] LR is at mvneta_txq_bufs_free+0xd4/0x138
[84947.351613] pc : [<c0729ac0>]    lr : [<c0631948>]    psr: a00e0013
[84947.351613] sp : de77dc70  ip : c0901a80  fp : deeb8580
[84947.363138] r10: 00000000  r9 : 00000000  r8 : d37946be
[84947.368384] r7 : 249ba050  r6 : 00000003  r5 : de4c9700  r4 : df4d65ec
[84947.374938] r3 : d3794776  r2 : d37946be  r1 : 00000001  r0 : d37946be
[84947.381493] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[84947.388658] Control: 10c5387d  Table: 1dff804a  DAC: 00000051
[84947.394428] Process kworker/0:1 (pid: 5690, stack limit = 0xde77c220)
[84947.400894] Stack: (0xde77dc70 to 0xde77e000)
[84947.405270] dc60:                                     df4d65ec de4c9700 00000003 c0631948
[84947.413484] dc80: ff7f5d08 dec16b00 00000000 00000005 c0a20558 deeb8000 000001dc de4c9700
[84947.421696] dca0: df4d65ec 00000005 de4c9700 000000b1 df4d6410 c0634d34 00000001 c062fef0
[84947.429910] dcc0: df4d65ec deeb8580 000000b1 de4c9740 00000040 de4c9000 00000046 ff7f5d08
[84947.438123] dce0: 0000001d ff7f5d00 000001dc df4d6410 df451e40 00000000 c0a20558 ff7f5d08
[84947.446337] dd00: dfbe0c80 c0c70c80 00000001 1ef70000 de77c000 0000000a 00000040 c073f178
[84947.454550] dd20: 00000000 00000000 0000012c 00812a59 de77dd30 de77dd30 de77dd38 de77dd38
[84947.462764] dd40: c0a5c96f de77c000 00000003 c0d0308c 00000008 04208060 00000101 0000000a
[84947.470978] dd60: de77dd68 c012aa5c df451e40 00000000 00812a59 00000004 c0d03d00 c0a95374
[84947.479191] dd80: 00000001 600e0013 00000200 c0d91440 de702600 00000000 0000001d c2525cb3
[84947.487405] dda0: 00001000 c012ad34 de77c000 c012ade0 00000000 dd465e00 c0d91440 c07eacc8
[84947.495619] ddc0: 00000044 c0c74830 01080020 01080020 00000044 c07485f8 000005dc c07fb314
[84947.503832] dde0: df451e40 00000001 df451e40 c0d91440 df4d33c0 deeb8000 df4d33c0 000000ff
[84947.512046] de00: 00001000 c07eeac4 de77de50 de77dec8 df4d33c0 df46fa38 c0da14f4 c07fea2c
[84947.520259] de20: 00000000 de77de50 df451e40 00000001 c0d91440 df618000 000e0013 c0805184
[84947.528472] de40: 776864b5 00000002 01080020 00000085 00000002 00000000 00000000 003a0000
[84947.536685] de60: 00000000 00000000 00000000 00000000 00000000 00000000 000002ff 00000000
[84947.544898] de80: 00000000 02000000 000080fe 00000000 ff2ec9cc ad9a63fe 00000000 00000085
[84947.553110] dea0: 00000000 de6f9600 deeb8000 de6f9620 00000001 00000000 00000000 00000001
[84947.561323] dec0: c0da14f4 c07f6464 000080fe 00000000 ff2ec9cc ad9a63fe de6f9640 df618078
[84947.569536] dee0: de6f9620 de6f9600 000000c0 00000000 de6f961c c07f79a0 00000000 df50b648
[84947.577749] df00: de77df4c c086f6f4 00000201 dfbdfec0 de73a680 de6f9640 dfbdfec0 00000000
[84947.585963] df20: ff7f6900 c0d03d00 00000000 c013c11c de73a680 de6f9640 00812a58 de73a680
[84947.594177] df40: dfbdfec0 dfbdfec0 de77c000 dfbdfed8 c0d03d00 de73a698 00000008 c013ce58
[84947.602391] df60: df50b2c0 ded11680 dec2ab00 de77c000 00000000 de73a680 c013cbac dd433ee8
[84947.610604] df80: ded1169c c01412a0 de77c000 dec2ab00 c0141170 00000000 00000000 00000000
[84947.618817] dfa0: 00000000 00000000 00000000 c010c338 00000000 00000000 00000000 00000000
[84947.627030] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[84947.635242] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[84947.643462] [<c0729ac0>] (consume_skb) from [<c0631948>] (mvneta_txq_bufs_free+0xd4/0x138)
[84947.651766] [<c0631948>] (mvneta_txq_bufs_free) from [<c0634d34>] (mvneta_poll+0x1b4/0xa20)
[84947.660157] [<c0634d34>] (mvneta_poll) from [<c073f178>] (net_rx_action+0x168/0x418)
[84947.667938] [<c073f178>] (net_rx_action) from [<c012aa5c>] (__do_softirq+0x13c/0x36c)
[84947.675805] [<c012aa5c>] (__do_softirq) from [<c012ad34>] (do_softirq+0x50/0x60)
[84947.683235] [<c012ad34>] (do_softirq) from [<c012ade0>] (__local_bh_enable_ip+0x9c/0xe4)
[84947.691363] [<c012ade0>] (__local_bh_enable_ip) from [<c07eacc8>] (ip6_finish_output2+0x4c8/0x5e0)
[84947.700364] [<c07eacc8>] (ip6_finish_output2) from [<c07eeac4>] (ip6_output+0x100/0x178)
[84947.708493] [<c07eeac4>] (ip6_output) from [<c0805184>] (ndisc_send_skb+0x268/0x320)
[84947.716273] [<c0805184>] (ndisc_send_skb) from [<c07f6464>] (addrconf_dad_completed+0x130/0x240)
[84947.725099] [<c07f6464>] (addrconf_dad_completed) from [<c07f79a0>] (addrconf_dad_work+0x358/0x40c)
[84947.734186] [<c07f79a0>] (addrconf_dad_work) from [<c013c11c>] (process_one_work+0x244/0x484)
[84947.742750] [<c013c11c>] (process_one_work) from [<c013ce58>] (worker_thread+0x2ac/0x3d0)
[84947.750966] [<c013ce58>] (worker_thread) from [<c01412a0>] (kthread+0x130/0x150)
[84947.758399] [<c01412a0>] (kthread) from [<c010c338>] (ret_from_fork+0x14/0x3c)
[84947.765654] Code: f57ff05b e28030b8 f593f000 e1932f9f (e2422001) 
[84947.771790] ---[ end trace c69c43e9909d26d8 ]---
[84947.776428] Kernel panic - not syncing: Fatal exception in interrupt
[84947.782813] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

This is very annoying, since network service is the main task of the box.
Only powering the box off and on again by hand brings the network back.

Cheers,
Peter
Re: Kernel 4.12.4 MVEBU armhf: ethernet unstable!
September 06, 2017 04:29AM
Peter,

> unfortunately ethernet does not work occasionally.
> It usually happens when large files are transferred.

I have not experienced this problem on my box. Can you tell me what type of files they are, how big they are, and which file transfer protocol you used (e.g. NFS, Samba, FTP)?

I could run a test to see if I can recreate your problem.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)



Edited 1 time(s). Last edit at 09/06/2017 04:30AM by bodhi.
Re: Kernel 4.12.4 MVEBU armhf: ethernet unstable!
September 06, 2017 09:26AM
Hi Bodhi,

After several tests, it seems to be related to CIFS and files of around 4096000000 bytes.
When connecting with FTP (I installed an ftpd server on the nas326), I can transfer files without reproducing the problem.
The problem only occurs when copying over a CIFS share (CIFS server on the nas326, copy direction nas326 -> client).
Getting files with "smbclient -U ...." also makes the NIC stop responding.
Anyway, it is a mystery how a NIC stops responding only because of CIFS problems ;)
Peter
Re: Kernel 4.12.4 MVEBU armhf: ethernet unstable!
September 06, 2017 05:30PM
Peter,


> Anyway it is a mystery, how a nic stops responding
> only because of cifs problems ;)

That could happen. Have you tried NFS for the same files?

-bodhi
Re: Kernel 4.12.4 MVEBU armhf: ethernet unstable!
September 06, 2017 07:48PM
I think this might be a bug in the mvneta driver's buffer handling.

Quote

mvneta_txq_bufs_free

ftp usually consumes a much smaller buffer.

This does look like a bug in mvneta.
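
The register dump seems consistent with that. Here is a rough sketch of the arithmetic, assuming r0 holds the skb pointer that mvneta_txq_bufs_free handed to consume_skb (that is my reading of the dump, not something the log states). By my decoding, the trapped instruction e1932f9f is an exclusive load (ldrex r2, [r3]), which needs a word-aligned address:

```shell
#!/bin/sh
# Values copied from the oops above.
r0=0xd37946be      # r0 (also r2 and r8): assumed to be the skb pointer
fault=0xd3794776   # "alignment exception (0x001) at 0xd3794776", also r3

# Distance of the faulting address from r0.
offset=$(( $fault - $r0 ))
# Remainder modulo 4: non-zero means the address is not word aligned.
fault_misalign=$(( $fault % 4 ))
ptr_misalign=$(( $r0 % 4 ))

printf 'fault - r0 = 0x%x\n' "$offset"
printf 'fault %% 4  = %d\n' "$fault_misalign"
printf 'r0 %% 4     = %d\n' "$ptr_misalign"
```

The offset comes out as 0xb8 and both remainders are 2, so the pointer in r0 is itself only 2-byte aligned. That would be unusual for a properly allocated sk_buff, and suggests the TX queue held a stale or corrupted buffer pointer.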

-bodhi
Edited 1 time(s). Last edit at 09/06/2017 08:02PM by bodhi.
Re: Kernel 4.12.4 MVEBU armhf: ethernet unstable!
September 06, 2017 11:53PM
Peter,

I've tried using NFS to copy a 4 GB file, and it worked without incident.

This is the file size:
root@Nas326:/media/rootfs_sata/localdisk# dd if=/dev/zero of=largefile.txt count=4096 bs=1048576 
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 62.5208 s, 68.7 MB/s

Copy direction: NAS326 hosts the file, and cp command was used to copy to another Linux box:
root@tldDebian:/localdisk# cp -a /mnt/nfs/nas326/media/rootfs_sata/localdisk/largefile.txt . &

So perhaps CIFS has triggered some bug in the mvneta driver.
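
One difference worth noting: the dd above writes 4294967296 bytes (4 GiB), while the files reported earlier were around 4096000000 bytes. Below is a minimal sketch for creating a file of that decimal size to retest with. The function names and the example file name are made up for illustration:

```shell
#!/bin/sh
# make_test_file PATH COUNT: write COUNT blocks of 1000000 bytes
# (decimal megabytes, not MiB) of zeros to PATH.
make_test_file() {
    dd if=/dev/zero of="$1" bs=1000000 count="$2" 2>/dev/null
}

# file_size PATH: print the size of PATH in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Example (hypothetical file name): 4096 blocks give 4096000000 bytes.
# make_test_file largefile_4096e6.bin 4096
# file_size largefile_4096e6.bin
```

Copying that file over both NFS and a CIFS share would show whether the size matters or only the protocol does.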

-bodhi
Re: Kernel 4.12.4 MVEBU armhf: ethernet unstable!
September 07, 2017 07:16AM
I did not check NFS because I think the FTP test is sufficient to show that the problem lies within Samba.

As a workaround, I set socket options in smb.conf to SO_RCVBUF=32768 and SO_SNDBUF=32768.
Now transfers are reliable; unfortunately, speed dropped from 100MB/sec to 70MB/sec.
I am going to play with the buffer sizes to find the highest throughput that is still reliable.
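
For reference, a small sketch of what the setting looks like as an smb.conf fragment, with a helper that prints it for different buffer sizes. The helper name and the larger sizes (65536, 131072) are only illustrative assumptions; 32768 is the only value actually tried so far:

```shell
#!/bin/sh
# socket_options_fragment BYTES: print an smb.conf [global] fragment
# that sets both the receive and the send socket buffer to BYTES.
socket_options_fragment() {
    printf '[global]\n'
    printf '    socket options = SO_RCVBUF=%s SO_SNDBUF=%s\n' "$1" "$1"
}

# Candidate sizes to step through, smallest first.
# Only 32768 is known to be reliable; the others are untested guesses.
for size in 32768 65536 131072; do
    socket_options_fragment "$size"
    echo
done
```

Samba has to reload its configuration after each change, and each size needs a full large-file transfer over CIFS before it can be called reliable.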

Strange that nobody else has these issues...

Cheers,
Peter
Re: Kernel 4.12.4 MVEBU armhf: ethernet unstable!
September 07, 2017 04:30PM
Peter,

> Strange that nobody else has these issues...

I think it's a bug somewhere in the TX code.

-bodhi