Kernel DSA driver seems broken on armada 370 and kirkwood devices

Posted by wacke 
Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 19, 2021 09:26AM
Hi,
Has anyone noticed that the kernel DSA driver for the 88e6171r and 88e6161 switch chips has been broken since kernel 5.10?

The issue is that the mvebu device cannot transmit data (e.g. via scp); the transfer just hangs like this:

root@debian:~# scp /tmp/rescue.img Wacke@192.168.1.2:/tmp/
Password: 
rescue.img                                      5% 1920KB   1.9MB/s   00:16 ETA

But data can be transferred from another device to the mvebu or Kirkwood device:

Wacke@HOME-Server:/srv/tftpboot> scp openwrt-ddnasv3-Generic-ubi-sysupgrade.img root@192.168.1.200:/tmp
root@192.168.1.200's password: 
openwrt-ddnasv3-Generic-ubi-sysupgrade.img

I've tested on my Armada 370 device (with the 88e6171r switch chip) using kernels 5.10, 5.11 and 5.12 on both OpenWrt and Debian, and the result is the same on all of them.
Another Kirkwood device (with the 88e6161 switch), tested with OpenWrt on a 5.10 kernel, shows the same issue.

I also posted this issue to the openwrt forum:
https://forum.openwrt.org/t/5-10-kernel-dsa-driver-issues-on-armada-370-devices/91432
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 19, 2021 05:26PM
wacke,

> And I've tested on my armada 370 device (with
> 88e6171r switch chip), and kernel 5.10 5.11 5.12
> with openwrt and debian, all the same.

I don't have any Armada 370 with a DSA switch.

> Another kirkwood device (with 88e6161 switch),
> tested on openwrt 5.10 kernel, got the same
> issue.

I do have a Linksys EA4500, but I'm not using it for anything other than testing. I will boot it with kernel 5.11.4 to see if I can reproduce the problem.

Instead of using scp, can you test it with rsync or cp to see if the same problem occurs?

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 19, 2021 07:23PM
Hi bodhi,
I tested with rsync and it shows the same issue; it just hangs:

root@debian:/tmp# rsync -v -a rescue.img Wacke@192.168.1.2:/tmp
Password: 
sending incremental file list
rescue.img

Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 20, 2021 02:32AM
Yes, indeed it is broken.

From the Linksys EA4500 (switch chip 6171), I can copy a file (pull) from another Kirkwood box. But when I push a file from the EA4500 to that Kirkwood box, it hangs.

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 20, 2021 03:54AM
bodhi Wrote:
-------------------------------------------------------
> Yes, indeed it is broken.
>
> From the Linksys EA4500 (switch chip 6171), I can
> copy a file (pull) from another Kirkwood box. But
> when I push a file from the EA4500 to that
> Kirkwood box, it hangs.

Hi bodhi,
Then is there any way to fix this?
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 20, 2021 04:23AM
wacke,

I logged into the box in another ssh session and looked at dmesg. It shows blocked tasks, but I have not figured out why. It is not apparent why NFS reported that it did not receive a response (from the other Kirkwood box) when I pushed a file to it.

The EA4500 is basically dead, just alive enough for me to see the log. You could try to ssh in from another terminal session and look at dmesg.

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 20, 2021 06:29AM
Hi bodhi,
Actually, I can't get any kernel messages at all while transferring data.
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 20, 2021 04:13PM
wacke,

> Actually, I can't get any kernel messages at all
> while transferring data.

After 2 minutes or more, the kernel will eventually report a blocked task. So after you see it hang, wait for 2 minutes, then open another terminal and ssh in. The first thing you should do is run dmesg; don't do anything else before you get the log.

I accidentally closed that terminal window, so I cannot post the log.

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 21, 2021 08:26AM
Hi bodhi,
I've confirmed that the issue actually started with 5.9.0; it is related to the port MTU setting.

The change log of kernel 5.9.0:
commit dfecd3e00cd32b2a6d1cfdb30b513dd42575ada3
Merge: 9b964f1654616 1baf0fac10fbe
Author: David S. Miller <davem@davemloft.net>
Date:   Fri Jul 24 20:03:28 2020 -0700

    Merge branch 'net-dsa-mv88e6xxx-port-mtu-support'
    
    Chris Packham says:
    
    ====================
    net: dsa: mv88e6xxx: port mtu support
    
    This series connects up the mv88e6xxx switches to the dsa infrastructure for
    configuring the port MTU. The first patch is also a bug fix which might be a
    candiatate for stable.
    
    I've rebased this series on top of net-next/master to pick up Andrew's change
    for the gigabit switches. Patch 1 and 2 are unchanged (aside from adding
    Andrew's Reviewed-by). Patch 3 is reworked to make use of the existing mtu
    support.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 1baf0fac10fbe3084975d7cb0a4378eb18871482
Author: Chris Packham <chris.packham@alliedtelesis.co.nz>
Date:   Fri Jul 24 11:21:22 2020 +1200

    net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU
    
    Some of the chips in the mv88e6xxx family don't support jumbo
    configuration per port. But they do have a chip-wide max frame size that
    can be used. Use this to approximate the behaviour of configuring a port
    based MTU.
    
    Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit e8b34c67d6c10ee3f187469958af3fb36c9c3361
Author: Chris Packham <chris.packham@alliedtelesis.co.nz>
Date:   Fri Jul 24 11:21:21 2020 +1200

    net: dsa: mv88e6xxx: Support jumbo configuration on 6190/6190X
    
    The MV88E6190 and MV88E6190X both support per port jumbo configuration
    just like the other GE switches. Install the appropriate ops.
    
    Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 0f3c66a3c7b4e8b9f654b3c998e9674376a51b0f
Author: Chris Packham <chris.packham@alliedtelesis.co.nz>
Date:   Fri Jul 24 11:21:20 2020 +1200

    net: dsa: mv88e6xxx: MV88E6097 does not support jumbo configuration
    
    The MV88E6097 chip does not support configuring jumbo frames. Prior to
    commit 5f4366660d65 only the 6352, 6351, 6165 and 6320 chips configured
    jumbo mode. The refactor accidentally added the function for the 6097.
    Remove the erroneous function pointer assignment.
    
    Fixes: 5f4366660d65 ("net: dsa: mv88e6xxx: Refactor setting of jumbo frames")
    Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>


I tried removing the following lines from kernel 5.9.0 (linux-5.9/drivers/net/dsa/mv88e6xxx/chip.c, in static const struct dsa_switch_ops mv88e6xxx_switch_ops):
	.port_max_mtu		= mv88e6xxx_get_max_mtu,
	.port_change_mtu	= mv88e6xxx_change_mtu,

The kernel log then shows:
[   29.999597][   T20] mv88e6085 f1072004.mdio-mii:00: nonfatal error -95 setting MTU on port 0
[   30.032961][   T20] mv88e6085 f1072004.mdio-mii:00 lan1 (uninitialized): PHY [!soc!internal-regs!mdio@72004!switch0@0!mdio:00] driver [Generic PHY] (irq=POLL)
[   30.077756][   T20] mv88e6085 f1072004.mdio-mii:00: nonfatal error -95 setting MTU on port 1
[   30.117423][   T20] mv88e6085 f1072004.mdio-mii:00 lan2 (uninitialized): PHY [!soc!internal-regs!mdio@72004!switch0@0!mdio:01] driver [Generic PHY] (irq=POLL)
[   30.163016][   T20] mv88e6085 f1072004.mdio-mii:00: nonfatal error -95 setting MTU on port 2
[   30.198685][   T20] mv88e6085 f1072004.mdio-mii:00 lan3 (uninitialized): PHY [!soc!internal-regs!mdio@72004!switch0@0!mdio:02] driver [Generic PHY] (irq=POLL)
[   30.251670][   T20] mv88e6085 f1072004.mdio-mii:00: nonfatal error -95 setting MTU on port 3
[   30.288805][   T20] mv88e6085 f1072004.mdio-mii:00 lan4 (uninitialized): PHY [!soc!internal-regs!mdio@72004!switch0@0!mdio:03] driver [Generic PHY] (irq=POLL)
[   30.342700][   T20] mv88e6085 f1072004.mdio-mii:00: nonfatal error -95 setting MTU on port 4
[   30.381677][   T20] mv88e6085 f1072004.mdio-mii:00 wan (uninitialized): PHY [!soc!internal-regs!mdio@72004!switch0@0!mdio:04] driver [Generic PHY] (irq=POLL)
[   30.430544][   T20] mv88e6085 f1072004.mdio-mii:00: configuring for fixed/rgmii-id link mode
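
A note on the -95 in the log above: on Linux, errno 95 is EOPNOTSUPP ("operation not supported"). A plausible reading, not confirmed anywhere in this thread, is that once .port_change_mtu is removed the DSA core has no callback left to call, reports that code, and the port setup carries on because the error is treated as nonfatal. Below is a minimal user-space sketch of that pattern in C. Every sketch_* name is hypothetical and none of this is the kernel's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a switch callback table; only the MTU hook is modelled. */
struct sketch_switch_ops {
	int (*port_change_mtu)(int port, int new_mtu);
};

/* Hypothetical driver callback that always succeeds. */
int sketch_driver_change_mtu(int port, int new_mtu)
{
	(void)port;
	(void)new_mtu;
	return 0;
}

/* Core-side helper (assumed behaviour): a missing optional callback yields -EOPNOTSUPP. */
int sketch_core_change_mtu(const struct sketch_switch_ops *ops, int port, int new_mtu)
{
	if (ops->port_change_mtu == NULL)
		return -EOPNOTSUPP;
	return ops->port_change_mtu(port, new_mtu);
}

/* Port setup: warn on failure but keep going, as the "nonfatal" wording suggests. */
int sketch_setup_port(const struct sketch_switch_ops *ops, int port)
{
	int ret = sketch_core_change_mtu(ops, port, 1500);

	if (ret != 0)
		printf("nonfatal error %d setting MTU on port %d\n", ret, port);
	return ret;
}
```

Calling sketch_setup_port() with a table whose port_change_mtu is NULL prints a warning of the same shape as the log lines above and returns -EOPNOTSUPP, while a table with the callback set returns 0 silently.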

But the data transfer no longer hangs:

root@debian:~# rsync -a rescue.img Wacke@192.168.1.2:/NAS/Software/Software/
Password: 
root@debian:~#

I'm not good with source code; how can this issue be fixed?
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 21, 2021 10:05AM
Hi bodhi,
I tried modifying mv88e6171_ops in linux-5.9/drivers/net/dsa/mv88e6xxx/chip.c.

Removed:
	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,

Added:
	.set_max_frame_size = mv88e6185_g1_set_max_frame_size,

That seems to work:

root@debian:~# scp rescue.img Wacke@192.168.1.2:/NAS/Software/Software/
Password: 
rescue.img                                                                                                                                          100%   32MB   8.8MB/s   00:03    
root@debian:~# scp rescue.img Wacke@192.168.1.2:/NAS/Software/Software/
Password: 
rescue.img                                                                                                                                          100%   32MB   9.4MB/s   00:03    
root@debian:~# 
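
The effect of swapping the two callbacks can be pictured with a small user-space model. It assumes, without the thread confirming it, that the MTU change handler prefers a per-port jumbo callback when the ops table provides one and only falls back to the chip-wide max frame size otherwise. The field names port_set_jumbo_size and set_max_frame_size come from the thread; all sketch_* names, the enum, and the -EOPNOTSUPP fallback are illustrative assumptions, not the driver's real code.

```c
#include <errno.h>
#include <stddef.h>

/* Which hardware knob a call ended up touching; only used to make the path visible. */
enum sketch_mtu_path {
	SKETCH_PATH_NONE,
	SKETCH_PATH_PER_PORT_JUMBO,
	SKETCH_PATH_CHIP_WIDE,
};

struct sketch_chip {
	enum sketch_mtu_path last_path;
};

/* Simplified per-chip ops table; the two field names follow the thread. */
struct sketch_chip_ops {
	int (*port_set_jumbo_size)(struct sketch_chip *chip, int port, int size);
	int (*set_max_frame_size)(struct sketch_chip *chip, int mtu);
};

/* Stand-in for a per-port jumbo setter: records the path instead of writing registers. */
int sketch_port_set_jumbo_size(struct sketch_chip *chip, int port, int size)
{
	(void)port;
	(void)size;
	chip->last_path = SKETCH_PATH_PER_PORT_JUMBO;
	return 0;
}

/* Stand-in for a chip-wide max frame size setter. */
int sketch_set_max_frame_size(struct sketch_chip *chip, int mtu)
{
	(void)mtu;
	chip->last_path = SKETCH_PATH_CHIP_WIDE;
	return 0;
}

/* Assumed dispatch order: per-port jumbo first, chip-wide second. */
int sketch_change_mtu(struct sketch_chip *chip, const struct sketch_chip_ops *ops,
		      int port, int new_mtu)
{
	if (ops->port_set_jumbo_size != NULL)
		return ops->port_set_jumbo_size(chip, port, new_mtu);
	if (ops->set_max_frame_size != NULL)
		return ops->set_max_frame_size(chip, new_mtu);
	chip->last_path = SKETCH_PATH_NONE;
	return -EOPNOTSUPP; /* simplification: no knob available at all */
}
```

Under this model, an ops table shaped like the original mv88e6171_ops takes the per-port path, and the edited table takes the chip-wide path. That matches the observation that the change avoids whatever the per-port jumbo setting was doing, but it does not explain why that setting broke transmission.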
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 21, 2021 05:03PM
Good work, wacke!

I need to read that thread to see why it seems they've missed the 6171 switch. Apparently the 6097 was taken into consideration.

Quote

net: dsa: mv88e6xxx: MV88E6097 does not support jumbo configuration

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 22, 2021 11:45PM
wacke,

This seems to be a common problem for every DTS that includes the 6171 chip.

The DTS files for your RT NAS V3, the qizhitong_501m_v2, and my Linksys EA4500 all have this same field:

&mdio {
switch
....
		compatible = "marvell,mv88e6085";
		#address-cells = <1>;


Note that using mv88e6085 worked in the past, because it is really compatible. But the chip was detected as 6171.

[    8.643897] mv88e6085 f1072004.mdio-bus-mii:10: switch 0x1710 detected: Marvell 88E6171, revision 2

I think this is the root cause of where it went wrong. The 6085 ops table has no per-port jumbo frame callback, only set_max_frame_size (which is correct):

static const struct mv88e6xxx_ops mv88e6085_ops = {
	/* MV88E6XXX_FAMILY_6097 */
	.ieee_pri_map = mv88e6085_g1_ieee_pri_map,
	.ip_pri_map = mv88e6085_g1_ip_pri_map,
	.irl_init_all = mv88e6352_g2_irl_init_all,
	.set_switch_mac = mv88e6xxx_g1_set_switch_mac,
	.phy_read = mv88e6185_phy_ppu_read,
	.phy_write = mv88e6185_phy_ppu_write,
	.port_set_link = mv88e6xxx_port_set_link,
	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
	.port_tag_remap = mv88e6095_port_tag_remap,
	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
	.port_set_ether_type = mv88e6351_port_set_ether_type,
	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
	.port_pause_limit = mv88e6097_port_pause_limit,
	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
	.port_get_cmode = mv88e6185_port_get_cmode,
	.port_setup_message_port = mv88e6xxx_setup_message_port,
	.stats_snapshot = mv88e6xxx_g1_stats_snapshot,
	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
	.stats_get_strings = mv88e6095_stats_get_strings,
	.stats_get_stats = mv88e6095_stats_get_stats,
	.set_cpu_port = mv88e6095_g1_set_cpu_port,
	.set_egress_port = mv88e6095_g1_set_egress_port,
	.watchdog_ops = &mv88e6097_watchdog_ops,
	.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
	.pot_clear = mv88e6xxx_g2_pot_clear,
	.ppu_enable = mv88e6185_g1_ppu_enable,
	.ppu_disable = mv88e6185_g1_ppu_disable,
	.reset = mv88e6185_g1_reset,
	.rmu_disable = mv88e6085_g1_rmu_disable,
	.vtu_getnext = mv88e6352_g1_vtu_getnext,
	.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
	.phylink_validate = mv88e6185_phylink_validate,
	.set_max_frame_size = mv88e6185_g1_set_max_frame_size,
};

But it looks like, after the kernel has initialized the chip, the ops definition actually used is mv88e6171_ops:

static const struct mv88e6xxx_ops mv88e6171_ops = {
	/* MV88E6XXX_FAMILY_6351 */
	.ieee_pri_map = mv88e6085_g1_ieee_pri_map,
	.ip_pri_map = mv88e6085_g1_ip_pri_map,
	.irl_init_all = mv88e6352_g2_irl_init_all,
	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
	.phy_read = mv88e6xxx_g2_smi_phy_read,
	.phy_write = mv88e6xxx_g2_smi_phy_write,
	.port_set_link = mv88e6xxx_port_set_link,
	.port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
	.port_set_speed_duplex = mv88e6185_port_set_speed_duplex,
	.port_tag_remap = mv88e6095_port_tag_remap,
	.port_set_frame_mode = mv88e6351_port_set_frame_mode,
	.port_set_egress_floods = mv88e6352_port_set_egress_floods,
	.port_set_ether_type = mv88e6351_port_set_ether_type,
	.port_set_jumbo_size = mv88e6165_port_set_jumbo_size,
	.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting,
	.port_pause_limit = mv88e6097_port_pause_limit,
	.port_disable_learn_limit = mv88e6xxx_port_disable_learn_limit,
	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
	.port_get_cmode = mv88e6352_port_get_cmode,
	.port_setup_message_port = mv88e6xxx_setup_message_port,
	.stats_snapshot = mv88e6320_g1_stats_snapshot,
	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
	.stats_get_sset_count = mv88e6095_stats_get_sset_count,
	.stats_get_strings = mv88e6095_stats_get_strings,
	.stats_get_stats = mv88e6095_stats_get_stats,
	.set_cpu_port = mv88e6095_g1_set_cpu_port,
	.set_egress_port = mv88e6095_g1_set_egress_port,
	.watchdog_ops = &mv88e6097_watchdog_ops,
	.mgmt_rsvd2cpu = mv88e6352_g2_mgmt_rsvd2cpu,
	.pot_clear = mv88e6xxx_g2_pot_clear,
	.reset = mv88e6352_g1_reset,
	.atu_get_hash = mv88e6165_g1_atu_get_hash,
	.atu_set_hash = mv88e6165_g1_atu_set_hash,
	.vtu_getnext = mv88e6352_g1_vtu_getnext,
	.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
	.phylink_validate = mv88e6185_phylink_validate,
};

So nobody would notice this if they only reviewed the code and trusted the DTS compatible field, without actually testing on a box with the 6171 chip and kernel 5.10 or later. They might simply not have had the real hardware to test with.

However, the fix is only temporary.
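
A minimal sketch of how the mismatch above can arise, assuming the driver picks its ops table from the switch ID read back from the hardware rather than from the DTS compatible string. The product number 0x1710 and revision 2 come from the log line quoted earlier in this post; the split of the ID into a 12-bit product number and a 4-bit revision, the single-entry table, and all sketch_* names are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per known chip: product number, display name, and the ops table it maps to. */
struct sketch_chip_info {
	uint16_t prod_num;
	const char *name;
	const char *ops_name;
};

/* Only the entry that can be grounded in the thread's log line is listed here. */
const struct sketch_chip_info sketch_table[] = {
	{ 0x1710, "Marvell 88E6171", "mv88e6171_ops" },
};

/*
 * Look up a chip by the ID read from the hardware.
 * Assumed layout: upper 12 bits are the product number, low 4 bits the revision.
 * Returns NULL when the product number is not in the table.
 */
const struct sketch_chip_info *sketch_detect(uint16_t switch_id, unsigned int *rev)
{
	uint16_t prod_num = (uint16_t)(switch_id & 0xfff0u);
	size_t i;

	*rev = switch_id & 0x000fu;
	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
		if (sketch_table[i].prod_num == prod_num)
			return &sketch_table[i];
	}
	return NULL;
}
```

With an ID of 0x1712 this lookup yields the 88E6171 entry and revision 2, regardless of what the compatible string in the DTS says. In this model the compatible string only decides which driver binds, not which ops table ends up in use.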

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 22, 2021 11:55PM
Continuing from my post above.

As I've learned in the past, the 6171 switch chip does support Jumbo Frame!

So there is more code to be fixed. Somewhere, the settings for this chip are incorrect.

If we want this chip to have Jumbo frame capability, we need to dig deeper.

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 23, 2021 01:18AM
bodhi Wrote:
-------------------------------------------------------
> Continue with my post above.
>
> As I've learned in the past, the 6171 switch chip
> does support Jumbo Frame!
>

Hi bodhi,

Maybe the 6171 switch actually doesn't support per-port jumbo configuration, as the change log says?

    Some of the chips in the mv88e6xxx family don't support jumbo
    configuration per port. But they do have a chip-wide max frame size that
    can be used. Use this to approximate the behaviour of configuring a port
    based MTU.
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
April 23, 2021 02:25AM
wacke,

> Maybe the 6171 switch actually doesn't support
> per-port jumbo configuration, as the change log
> says?
>
>     Some of the chips in the mv88e6xxx family don't support jumbo
>     configuration per port. But they do have a chip-wide max frame size that
>     can be used. Use this to approximate the behaviour of configuring a port
>     based MTU.

I think your fix is pretty good. Perhaps that's all we need, because jumbo frames are hard to use in a personal environment anyway: every node in the local network must support them for it to work.

Regarding jumbo frame support in Marvell switches, the comment above is correct, of course (it came from experts). However, it is a general statement that applies only to some of the chips.

The 6171 Datasheet is still under NDA with Marvell, I believe. But I happened to come across a small excerpt of the datasheet that described jumbo frame receive/transmit.

I think you should submit a patch to OpenWrt.

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
June 08, 2021 07:52PM
Hi bodhi,
There are now some patches for this problem; you can follow them here:
https://www.spinics.net/lists/netdev/msg743188.html
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
June 08, 2021 08:38PM
Cool! Thanks wacke!

Btw, I've included your RT NAS V3 support in the MVEBU kernel release thread.

-bodhi
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
June 10, 2021 07:14PM
bodhi Wrote:
-------------------------------------------------------
> Cool! Thanks wacke!
>
> Btw, I've included your RT NAS V3 support in the
> MVEBU kernel release thread.

Hi bodhi,
Many thanks. I've attached my self-built U-Boot to that post; without it, the box won't boot into Debian at all.
Re: Kernel DSA driver seems broken on armada 370 and kirkwood devices
June 10, 2021 11:31PM
wacke,

Cool! I've updated the release thread to mention the modified stock u-boot.

-bodhi