armada 370 L2 cache

Posted by Koen 
armada 370 L2 cache
November 11, 2020 05:33PM
Hi,

I just noticed this topic on the OpenWrt forum. Not sure if this is relevant to Debian on mvebu devices but decided to post here just in case.

Koen

https://forum.openwrt.org/t/mvebu-low-performance-on-armada-370-level-2-cache-disabled/78979
Re: armada 370 L2 cache
November 11, 2020 06:37PM
Koen,

Thanks! I'll look at this to verify. But I do remember that it was enabled (in the boot log).

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Re: armada 370 L2 cache
November 12, 2020 05:16AM
Hi Koen,

This seems legit for Armada 370 on any distro.

On my Thecus N2350 (Armada 385), the mvebu-pmsu driver disabled CPU idle automatically (its DTS does not have the "broken-idle" property in the coherencyfab node).

[    0.608363] mvebu-pmsu: CPU idle is currently broken on Armada 38x: disabling

But on my Mirabox (Armada 370) it does not show that message from mvebu-pmsu. So it looks like the driver does not recognize this is a problem for Armada 370, too.

I don't understand yet why this would cause the L2 cache to be disabled! It must be deep in the logic somewhere, in how the L2 cache is handled when the CPU cannot idle.

I guess I need to run some benchmark before and after to see if this has caused some performance issue like danitool has seen. My observation has been that the Mirabox does underperform other Armada 38x boxes quite a bit and we did not know why. This could be a breakthrough.

Thanks!

-bodhi
Re: armada 370 L2 cache
November 12, 2020 08:29AM
I'll test this out on the LS421DE (same device he's using on openwrt) under Debian and see if I can reproduce.
Re: armada 370 L2 cache
November 12, 2020 09:01AM
I tested it against Debian's current kernels:
4.9.0-14-armmp (Stretch)
4.19.0-12-armmp (Buster)
5.8.0-0.bpo.2-armmp (Buster-backports)

For all three the cache appears to be enabled already (the devmem values look correct and dmesg says it is)

[    0.000000] Aurora cache controller enabled, 4 ways, 256 kB
[    0.000000] Aurora: CACHE_ID 0x00000100, AUX_CTRL 0x1a086302

root@ls421de-Buster:~# busybox devmem 0xd0008100
0x00000001
root@ls421de-Buster:~# busybox devmem 0xd0008104
0x1A086302
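
For anyone repeating this check, here is a minimal Python sketch for reading the first devmem value. It assumes that 0xd0008100 is the L2 controller's control register and that bit 0 is the enable bit, which is how the Linux L2x0/Aurora cache code treats offset 0x100; that is not stated in this thread. The function name `l2_enabled` is made up for illustration.

```python
# Assumption: bit 0 of the L2 control register (offset 0x100 in the
# cache controller block) is the "cache enabled" bit.
L2_CTRL_ENABLE = 0x1


def l2_enabled(ctrl_text: str) -> bool:
    """Interpret the text printed by `busybox devmem <control register>`."""
    return bool(int(ctrl_text.strip(), 16) & L2_CTRL_ENABLE)


# Value read on the LS421DE above.
print("L2 enabled:", l2_enabled("0x00000001"))
# A reading of all zeroes would mean the cache is off.
print("L2 enabled:", l2_enabled("0x00000000"))
```

The second register (0xd0008104) is then the auxiliary control register, which is why its value matches the AUX_CTRL figure in dmesg.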

I don't have any entries in my DTB for the coherencyfab currently:
https://github.com/1000001101000/Debian_on_Buffalo
Re: armada 370 L2 cache
November 12, 2020 02:11PM
Cool! So this might be a red herring. IOW, perhaps it is only applicable to that device. I also see this in my Mirabox log:

[    0.000000] L2C: DT/platform modifies aux control register: 0x12086300 -> 0x1a086302
[    0.000000] Aurora cache controller enabled, 4 ways, 256 kB
[    0.000000] Aurora: CACHE_ID 0x00000100, AUX_CTRL 0x1a086302

I have not logged in to this Mirabox to do the devmem.

-bodhi
Re: armada 370 L2 cache
November 12, 2020 02:58PM
One of these days I should take a closer look at the openwrt work on this device. danitool has posted some really interesting stuff about the hardware and even figured out why a subset of these devices have network issues.

I wonder if there's something about the openwrt kernel or their DTB that is causing this problem and is subsequently fixed by that DTB entry on the post.
Re: armada 370 L2 cache
November 12, 2020 02:59PM
Now it gets interesting!

root@Mirabox:~# busybox devmem 0xd0008100
0x00000000

root@Mirabox:~# busybox devmem 0xd0008104
0x12086300

So the Mirabox behaves the same way that mamba box does.
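
The mismatch between what the boot log reports and what the register holds later can be checked mechanically. This is only a sketch: the helper names are hypothetical, and it reports which bit positions differ without claiming what those bits mean.

```python
import re


def aux_ctrl_from_dmesg(line: str) -> int:
    """Pull the AUX_CTRL value out of an 'Aurora: CACHE_ID ...' dmesg line."""
    match = re.search(r"AUX_CTRL (0x[0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("no AUX_CTRL value in line")
    return int(match.group(1), 16)


def changed_bits(a: int, b: int) -> list:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(32) if (diff >> bit) & 1]


# Boot log line and devmem readback, both from the Mirabox posts above.
boot_line = "[    0.000000] Aurora: CACHE_ID 0x00000100, AUX_CTRL 0x1a086302"
logged = aux_ctrl_from_dmesg(boot_line)
readback = int("0x12086300", 16)

if logged != readback:
    print(f"AUX_CTRL changed after boot, differing bits: {changed_bits(logged, readback)}")
```

The readback is the same value the kernel reported as the starting point in "DT/platform modifies aux control register", so the register appears to have gone back to its pre-kernel state.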

-bodhi
Re: armada 370 L2 cache
November 12, 2020 03:14PM
So I guess the performance may improve by adding the same thing to the dts definition? I find it odd how there is different behaviour of the soc for different devices.

Koen
Re: armada 370 L2 cache
November 12, 2020 03:23PM
Koen Wrote:
-------------------------------------------------------
> So I guess the performance may improve by adding
> the same thing to the dts definition? I find it
> odd how there is different behaviour of the soc
> for different devices.
>
> Koen

Yes, quite strange. I'm doing some tests with that broken-idle node.

-bodhi
Re: armada 370 L2 cache
November 12, 2020 04:47PM
After adding that coherencyfabric node to the DTB:

root@Mirabox:~# busybox devmem 0xd0008100
0x00000000
root@Mirabox:~# busybox devmem 0xd0008104
0x12086300

Looks like it did not work.

-bodhi
Re: armada 370 L2 cache
November 13, 2020 10:40AM
I don't know enough about the hardware/software, but if the kernel doesn't change the 0x00000000 value at 0xd0008100, would it be possible to set it from U-Boot before the kernel loads to get the performance improvement?

Koen
Re: armada 370 L2 cache
November 13, 2020 12:13PM
Depends on whether the kernel will just change it again after you do or not.

As far as I can tell, danitool and I are using the same hardware and uboot image (I flashed the one they posted) and are getting different results. I think the main difference is that I'm using Debian's kernel and my DTB, and they are using a custom openwrt kernel and DTB.

Seems to me a strong indication this is set when the kernel comes up.



Edited 1 time(s). Last edit at 11/13/2020 12:13PM by 1000001101000.
Re: armada 370 L2 cache
November 13, 2020 04:38PM
> as far as I can tell danitool and I are using the
> same hardware and uboot image (I flashed the one
> they posted) and are getting different results. I
> think the main difference is I'm using Debian's
> kernel and my DTB and they are using a custom
> openwrt kernel and DTB.

Add to that, I also use different custom kernel and the mainline DTB!

This seems to indicate that the kernel has set it:

[    0.000000] L2C: DT/platform modifies aux control register: 0x12086300 -> 0x1a086302
[    0.000000] Aurora cache controller enabled, 4 ways, 256 kB
[    0.000000] Aurora: CACHE_ID 0x00000100, AUX_CTRL 0x1a086302

But after that, something reverted the register to 0x12086300.

-bodhi
Re: armada 370 L2 cache
November 13, 2020 06:39PM
@1000001101000,

> Seems to me a strong indication this is set when
> the kernel comes up.

Could you post your kernel config, or can I find it at your GitHub?

-bodhi
Re: armada 370 L2 cache
November 13, 2020 07:33PM
I suppose technically it's available from Debian in the kernel-image or kernel-config package. Here's the one I just grabbed from the device.
Attachments: config-4.19.0-12-armmp (204.4 KB)
Re: armada 370 L2 cache
November 13, 2020 09:19PM
1000001101000 Wrote:
-------------------------------------------------------
> I suppose technically it's available from Debian
> in the kernel-image or kernel-config package.
> Here's the one I just grabbed from the device.

Thanks!

-bodhi
Re: armada 370 L2 cache
November 13, 2020 11:52PM
OK, this looks quite promising! My previous change in the DTS was not good.

I've tried again with a modified DTB, and this time the L2 cache register shows the correct value.

[    0.000000] L2C: DT/platform modifies aux control register: 0x12086300 -> 0x1a086302
[    0.000000] Aurora cache controller enabled, 4 ways, 256 kB
[    0.000000] Aurora: CACHE_ID 0x00000100, AUX_CTRL 0x1a086302


root@Mirabox:/localdisk# dmesg | grep -i 'cpu idle'
[    0.559609] mvebu-pmsu: CPU idle is currently broken: disabling

root@Mirabox:/localdisk# busybox devmem 0xd0008100
0x00000001
root@Mirabox:/localdisk# busybox devmem 0xd0008104
0x1A086302

==========

Test: copy a 489 MB file pulled from an NFS server.

-rw-r--r-- 1 root root 489M Mar 26  2017 bigfile_dockstar


Before:
root@Mirabox:/localdisk# time cp -av /mnt/nfs/tldplug/localdisk/bigfile_dockstar .
'/mnt/nfs/tldplug/localdisk/bigfile_dockstar' -> './bigfile_dockstar'

real	0m47.378s
user	0m0.029s
sys	0m23.374s

Note that if I repeat the above test several times, the system time goes down to 18 seconds, but not any lower than that.

After:

root@Mirabox:/localdisk# time cp -a  /mnt/nfs/tldplug/localdisk/bigfile_dockstar .

real	0m25.415s
user	0m0.019s
sys	0m10.592s

Note that it starts with a system time of 14 seconds. When I repeat the test, the system time drops to 11 seconds, as shown above.

=====

Conclusion:

Yes, it does result in almost 70% kernel mode improvement!
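
To show where a figure like "almost 70%" can come from, here is a small Python sketch using the system times quoted above. Pairing the first runs (23.374 s vs 14 s) and the repeated runs (18 s vs 10.592 s) is an interpretation of the notes, not something spelled out in the test itself, and the function names are made up for illustration.

```python
def speedup_percent(before_s: float, after_s: float) -> float:
    """Throughput gain: how much more work per second of CPU time, in percent."""
    return (before_s / after_s - 1.0) * 100.0


def time_saved_percent(before_s: float, after_s: float) -> float:
    """Reduction in elapsed time, in percent of the 'before' time."""
    return (before_s - after_s) / before_s * 100.0


# (before, after) in seconds, taken from the timings and notes above.
runs = {
    "sys, first run": (23.374, 14.0),
    "sys, repeated runs": (18.0, 10.592),
    "real, as posted": (47.378, 25.415),
}

for label, (before, after) in runs.items():
    print(
        f"{label}: {speedup_percent(before, after):.1f}% faster, "
        f"{time_saved_percent(before, after):.1f}% less time"
    )
```

Read as a throughput gain on system time, both pairings land just under 70%; read as time saved, the same numbers come out lower, so it is worth saying which measure is meant when comparing results.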

I will post the new Mirabox and Netgear RN102 DTS/DTB in the next few posts for anybody who wants to try them.

-bodhi



Edited 2 time(s). Last edit at 11/14/2020 07:17AM by bodhi.
Re: armada 370 L2 cache
November 14, 2020 07:14AM
It looks like danitool on the OpenWrt forum has found and possibly fixed another performance issue. However his fix looks like an OpenWrt specific patch.

https://forum.openwrt.org/t/mvebu-low-performance-on-armada-370-level-2-cache-disabled/78979/3



Edited 1 time(s). Last edit at 11/14/2020 07:17AM by Koen.
Re: armada 370 L2 cache
November 14, 2020 07:15AM
armada-370-mirabox.dtb

-bodhi
Attachments:
armada-370-mirabox.dtb (12.9 KB)
armada-370-mirabox.dts (3.3 KB)
Re: armada 370 L2 cache
November 14, 2020 07:16AM
armada-370-netgear-rn102.dtb

-bodhi
Attachments:
armada-370-netgear-rn102.dtb (14.4 KB)
armada-370-netgear-rn102.dts (14.4 KB)
Re: armada 370 L2 cache
November 14, 2020 07:36AM
Koen,

> It looks like danitool on the OpenWrt forum has
> found and possibly fixed another performance
> issue. However his fix looks like an OpenWrt
> specific patch.
>
> https://forum.openwrt.org/t/mvebu-low-performance-on-armada-370-level-2-cache-disabled/78979/3


That's very good. The IO Coherency problem has been there for Armada 370 for quite a while. It was for the reason danitool mentioned:

Quote

The Hardware coherency cannot be enabled in kernel upstream

This IO Coherency patch would not be accepted upstream. I did attempt to fix IO Coherency with wacke briefly, I think several years ago, but I ran out of time to make it conform to upstream code.

In retrospect, danitool's finding about CPU idle is more important. It was not apparent that the L2 cache was disabled after the kernel had set up the Aurora cache! Probably because CPU idle works on the Armada XP (?), nobody noticed (XP and 370 use mostly the same code).

-bodhi



Edited 1 time(s). Last edit at 11/14/2020 07:38AM by bodhi.
Re: armada 370 L2 cache
November 14, 2020 12:06PM
Hmmm, I'll have to try and reproduce that. It might give me a reason to move to a custom kernel for the armhf devices.

I tried to avoid that for years but the CI/CD process I put together for the MV78100 kernel is way less painful than I expected.
Re: armada 370 L2 cache
November 14, 2020 02:44PM
Used the new dtb on my mirabox, iperf to an i3-machine changed from about 4xxMb/s to 8xxMb/s
Re: armada 370 L2 cache
November 14, 2020 04:01PM
daviddyer Wrote:
-------------------------------------------------------
> Used the new dtb on my mirabox, iperf to an
> i3-machine changed from about 4xxMb/s to 8xxMb/s

Cool!

-bodhi
Re: armada 370 L2 cache
November 14, 2020 04:17PM
@1000001101000,

> Hmmm, I'll have to try and reproduce that. It
> might give me a reason to move to a custom kernel
> for the armhf devices.

Your box's L2 cache seems to work fine without having the broken-idle property in coherencyfab. That's a puzzle.

I guess you meant to reproduce the IO Coherency improvement?

Now if you are going to investigate IO Coherency, perhaps you could implement it a little cleaner. I recall that I studied the is_smp() function to see if I can patch it sooner (so that is_smp() works differently running on Armada 370).

danitool patch:

-	if (is_smp()) {
-		if (cachepolicy != CPOLICY_WRITEALLOC) {
-			pr_warn("Forcing write-allocate cache policy for SMP\n");
-			cachepolicy = CPOLICY_WRITEALLOC;
-		}
-		if (!(initial_pmd_value & PMD_SECT_S)) {
-			pr_warn("Forcing shared mappings for SMP\n");
-			initial_pmd_value |= PMD_SECT_S;
-		}
+	if (cachepolicy != CPOLICY_WRITEALLOC) {
+		pr_warn("Forcing write-allocate cache policy for Armada 370\n");
+		cachepolicy = CPOLICY_WRITEALLOC;

IIRC, is_smp() eventually invokes an assembly routine that reads some status to determine whether SMP is supported. At that point, perhaps we could either use an ifdef to change the logic so it recognizes the Armada 370, or parse the bootargs to find "armada-370" and use that (not sure if the bootargs are readable at this point, though).

-bodhi



Edited 1 time(s). Last edit at 11/14/2020 05:33PM by bodhi.
Re: armada 370 L2 cache
November 14, 2020 04:56PM
Exactly!

Quote

Aurora cache disabled: 42 MB/s
Aurora cache enabled: 82 MB/s
Aurora cache enabled and CPU coherency enabled: 99 MB/s

If the 82MB/s->99MB/s checks out I could see going through the trouble of moving to a custom kernel based around this patch and a few others I'm aware of (phy voltage fix, patch to read MAC from atag etc). I suppose I'd need at least two so that I could also have an lpae version for the Armada-xp devices... if the fix is needed there too.

It seems as I learn more my backlog increases exponentially these days.
Re: armada 370 L2 cache
November 14, 2020 05:31PM
> It seems as I learn more my backlog increases
> exponentially these days.

:) Mine is quite long too.

-bodhi
Re: armada 370 L2 cache
November 15, 2020 10:52PM
Hi bodhi,
My box got this:

root@OpenWrt:/# devmem 0xd0008100
[   46.565761] 8<--- cut here ---
[   46.568848] Unhandled fault: external abort on non-linefetch (0x1018) at 0xb6f31100
[   46.576541] pgd = cdebe8d4
[   46.579259] [b6f31100] *pgd=2d01d831, *pte=d0008383, *ppte=d0008a33
Bus error



Device tree:
// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
/*
 * Device Tree file for Marvell Armada 370 Reference Design board
 * (RD-88F6710-A1)
 *
 *  Copied from arch/arm/boot/dts/armada-370-db.dts
 *
 *  Copyright (C) 2013 Florian Fainelli <florian@openwrt.org>
 *
 * Note: this Device Tree assumes that the bootloader has remapped the
 * internal registers to 0xf1000000 (instead of the default
 * 0xd0000000). The 0xf1000000 is the default used by the recent,
 * DT-capable, U-Boot bootloaders provided by Marvell. Some earlier
 * boards were delivered with an older version of the bootloader that
 * left internal registers mapped at 0xd0000000. If you are in this
 * situation, you should either update your bootloader (preferred
 * solution) or the below Device Tree should be adjusted.
 */

/dts-v1/;
#include <dt-bindings/input/input.h>
#include <dt-bindings/interrupt-controller/irq.h>
#include <dt-bindings/gpio/gpio.h>
#include "armada-370.dtsi"

/ {
	model = "RTNAS V3";
	compatible = "marvell,armada-370-rtnasv3", "marvell,armada370", "marvell,armada-370-xp";

	chosen {
		stdout-path = "serial0:115200n8";
	};

	memory@0 {
		device_type = "memory";
		reg = <0x00000000 0x40000000>; /* 1024 MB */
	};

	soc {
		ranges = <MBUS_ID(0xf0, 0x01) 0 0xf1000000 0x100000
				  MBUS_ID(0x01, 0xe0) 0 0xfff00000 0x100000
				  MBUS_ID(0x09, 0x01) 0 0xf1100000 0x10000>;

		internal-regs {
			serial@12000 {
				status = "okay";
			};

			sata@a0000 {
				nr-ports = <2>;
				status = "okay";
			};

			mvsdio@d4000 {
				pinctrl-0 = <&sdio_pins1>;
				pinctrl-names = "default";
				status = "disabled";
				/* No CD or WP GPIOs */
				broken-cd;
			};

			usb@50000 {
				status = "okay";
			};

			usb@51000 {
				status = "okay";
			};

			gpio-keys {
				compatible = "gpio-keys";
				pinctrl-0 = <&reset_button_pin &pwr_button_pin>;
				pinctrl-names = "default";

				reset_button {
					label = "Reset Button";
					linux,code = <KEY_RESTART>;
					gpios = <&gpio1 30 GPIO_ACTIVE_LOW>;
				};

				wps_button {
					label = "Software Button";
					linux,code = <KEY_POWER>;
					gpios = <&gpio1 20 GPIO_ACTIVE_LOW>;
				};
			};

			gpio-leds {
				compatible = "gpio-leds";
				pinctrl-names = "default";
				pinctrl-0 = <&pwr_led_pin &wps_led_pins>;

				blue_pwr_led {
					label = "rtnasv3:blue:pwr";
					gpios = <&gpio0 6 GPIO_ACTIVE_HIGH>;
					default-state = "keep";
				};

				blue_wps_led {
					label = "rtnasv3:blue:wps";
					gpios = <&gpio1 18 GPIO_ACTIVE_HIGH>;
					default-state = "off";
				};
			};
		};
	};
};

&mdio {
	pinctrl-0 = <&mdio_pins>;
	pinctrl-names = "default";
	status = "okay";

	switch: switch0@0 {
		compatible = "marvell,mv88e6085";
		#address-cells = <1>;
		#size-cells = <0>;
		reg = <0x0>;
		interrupt-controller;
		#interrupt-cells = <2>;

		ports {
			#address-cells = <1>;
			#size-cells = <0>;

			port@0 {
			       reg = <0>;
			       label = "lan1";
			};

			port@1 {
			       reg = <1>;
			       label = "lan2";
			};

			port@2 {
			       reg = <2>;
			       label = "lan3";
			};

			port@3 {
			       reg = <3>;
			       label = "lan4";
			};

			port@4 {
			       reg = <4>;
			       label = "wan";
			};

			port@5 {
				reg = <5>;
				label = "cpu";
				ethernet = <&eth1>;

				fixed-link {
					speed = <1000>;
					full-duplex;
				};
			};
		};

		mdio {
			#address-cells = <1>;
			#size-cells = <0>;

			switchphy0: switchphy@0 {
				reg = <0>;
				interrupt-parent = <&switch>;
				interrupts = <0 IRQ_TYPE_LEVEL_HIGH>;
			};

			switchphy1: switchphy@1 {
				reg = <1>;
				interrupt-parent = <&switch>;
				interrupts = <1 IRQ_TYPE_LEVEL_HIGH>;
			};

			switchphy2: switchphy@2 {
				reg = <2>;
				interrupt-parent = <&switch>;
				interrupts = <2 IRQ_TYPE_LEVEL_HIGH>;
			};

			switchphy3: switchphy@3 {
				reg = <3>;
				interrupt-parent = <&switch>;
				interrupts = <3 IRQ_TYPE_LEVEL_HIGH>;
			};

			switchphy4: switchphy@4 {
				reg = <4>;
				interrupt-parent = <&switch>;
				interrupts = <4 IRQ_TYPE_LEVEL_HIGH>;
			};
		};
	};
};

&pciec {
	status = "okay";

	/* Internal mini-PCIe connector */
	pcie@1,0 {
		/* Port 0, Lane 0 */
		status = "okay";
	};

	/* Internal mini-PCIe connector */
	pcie@2,0 {
		/* Port 1, Lane 0 */
		status = "okay";
	};
};


&pinctrl {
	compatible = "marvell,mv88f6710-pinctrl";

	pwr_button_pin: pwr-button-pin {
		marvell,pins = "mpp52";
		marvell,function = "gpio";
	};

	reset_button_pin: reset-button-pin {
		marvell,pins = "mpp62";
		marvell,function = "gpio";
	};

	pwr_led_pin: pwr-led-pin {
		marvell,pins = "mpp6";
		marvell,function = "gpio";
	};

	wps_led_pins: wps-led-pins {
		marvell,pins = "mpp50";
		marvell,function = "gpio";
	};
};

&nand_controller {
	status = "okay";

	nand@0 {
		reg = <0>;
		label = "pxa3xx_nand-0";
		nand-rb = <0>;
		marvell,nand-keep-config;
		nand-on-flash-bbt;
		nand-ecc-strength = <4>;
		nand-ecc-step-size = <512>;

		partitions {
			compatible = "fixed-partitions";
			#address-cells = <1>;
			#size-cells = <1>;

			partition@0 {
				label = "u-boot";
				reg = <0x0 0x400000>;
			};

			partition@400000 {
				label = "uboot_env";
				reg = <0x400000 0x400000>;
			};

			partition@800000 {
				label = "vendor";
				reg = <0x800000 0x400000>;
			};

			partition@c00000 {
				label = "unused";
				reg = <0xc00000 0xc00000>;
			};

			partition@1800000 {
				label = "kernel";
				reg = <0x1800000 0x400000>;
			};

			partition@1c00000 {
				label = "ubi";
				reg = <0x1c00000 0x3e400000>;
			};

			partition@40000000 {
				label = "syscfg";
				reg = <0x40000000 0xbbc00000>;
			};
		};
	};
};

&coherencyfab {
	broken-idle;
};

&eth0 {
	status = "disabled";
};

/* eth1 is connected to a Marvell 88E6171 switch, without a PHY. So set
 * fixed speed and duplex.
 */
&eth1 {
	pinctrl-names = "default";
	pinctrl-0 = <&ge1_rgmii_pins>;
	status = "okay";
	phy-mode = "rgmii-id";

	fixed-link {
		   speed = <1000>;
		   full-duplex;
	};
};

&rtc {
	status = "disabled";
};



Edited 1 time(s). Last edit at 11/15/2020 10:53PM by wacke.
Re: armada 370 L2 cache
November 15, 2020 11:03PM
wacke,

Your box has a newer u-boot so the MBUS memory is different.

Try

devmem 0xf0008100
devmem 0xf0008104


If that did not work then post your dmesg output here.
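
A rough Python sketch for working out which addresses to pass to devmem for a given internal-register base. The 0x8100/0x8104 offsets are inferred from the addresses used earlier in this thread (0xd0008100 minus the default 0xd0000000 base), and the function name is hypothetical. Note that the comment in the DTS above gives the remapped base as 0xf1000000, which would put the registers at 0xf1008100 and 0xf1008104 rather than the 0xf000xxxx addresses, so those may be worth trying as well if the read still faults.

```python
# Offsets inferred from this thread: 0xd0008100 - 0xd0000000 = 0x8100.
L2_CTRL_OFFSET = 0x8100
L2_AUX_CTRL_OFFSET = 0x8104


def l2_register_addresses(internal_regs_base: int) -> dict:
    """Physical addresses of the L2 control and aux control registers."""
    return {
        "ctrl": internal_regs_base + L2_CTRL_OFFSET,
        "aux_ctrl": internal_regs_base + L2_AUX_CTRL_OFFSET,
    }


# 0xd0000000: default base (older bootloaders).
# 0xf1000000: remapped base, as described in the DTS comment.
for base in (0xD0000000, 0xF1000000):
    addrs = l2_register_addresses(base)
    print(f"base {base:#010x}: devmem {addrs['ctrl']:#010x}")
    print(f"base {base:#010x}: devmem {addrs['aux_ctrl']:#010x}")
```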

-bodhi



Edited 1 time(s). Last edit at 11/15/2020 11:05PM by bodhi.