TUN/TAP problems with Kirkwood kernel

Posted by tufkal 
Re: TUN/TAP problems with Kirkwood kernel
March 16, 2015 03:35PM
tufkal,

> As long as the kernel has the tun.ko driver
> enabled or as a module, the binary takes care of
> the rest.

Based on what you mentioned above and everybody's suggestions:

- On a fresh 3.18.5-kirkwood-tld-1 rootfs
- Add the tun module to /etc/initramfs-tools/modules so that the initrd loads it during boot.
echo tun >> /etc/initramfs-tools/modules

- Regenerate initramfs
cd /boot
cp -a initrd.img-3.18.5-kirkwood-tld-1 initrd.img-3.18.5-kirkwood-tld-1.bak
update-initramfs -u
cp -a uInitrd uInitrd.bak
mkimage -A arm -O linux -T ramdisk -C gzip -a 0x00000000 -e 0x00000000 -n initramfs-3.18.5-kirkwood-tld-1 -d /boot/initrd.img-3.18.5-kirkwood-tld-1 /boot/uInitrd
sync

- Reboot
- Run the Install script

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)



Edited 1 time(s). Last edit at 03/16/2015 05:30PM by bodhi.
Re: TUN/TAP problems with Kirkwood kernel
March 16, 2015 05:23PM
@bodhi - there's a typo in your instructions

echo tun >> /etc/initramfs-toolsmodules should read
echo tun >> /etc/initramfs-tools/modules


@all

I hope the OP tries the above, that the installation then runs without error, and that he feeds back the findings.



On a side note: this thread did go "all Torvalds" at one point ;)



Edited 1 time(s). Last edit at 03/16/2015 05:25PM by Gravelrash.
Re: TUN/TAP problems with Kirkwood kernel
March 16, 2015 05:30PM
@Gravelrash,

Thanks! corrected.

-bodhi
Re: TUN/TAP problems with Kirkwood kernel
March 16, 2015 07:44PM
Gravelrash Wrote:
-------------------------------------------------------
> @tufkal
> the output you gave from the commands i listed
> above, looks like i would expect from a standard
> output....
>
> and i think one of your comments earlier may
> have hit the nail on the head.

>
> "As long as the kernel has the tun.ko driver
> enabled or as a module"

> quote : As long as the kernel has the tun.ko
> driver enabled or as a module, the binary takes
> care of the rest. The binary create the routes,
> the interface settings, adds entries to the arp
> table, etc automatically. No additional work is
> needed by the end user beyond running the binary.
> :quote
>
> @bodhi, does your kernel have this option
> set? if it does then we should check its actually
> loaded on the "offending" system.
>
> @tufkal - can you do an lsmod and check if its
> loading and functional??

Yup, TUN is set =m in the config, and lsmod shows it is loaded. I tried to rmmod and modprobe it back in manually multiple times as well; there is no change whether I do it manually or on boot.

By default it was loading with boot, and I have tried using rmmod/modprobe and running the script before and after those commands.
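
A loaded module is not quite the same as a working device node. As a minimal sketch (not taken from the N2N sources), the C function below asks the tun driver for a TAP interface through /dev/net/tun, the standard Linux TUN/TAP entry point. The function name `open_tap_device` and the interface name in the usage note are made up for illustration, and the call normally needs root.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Try to create a TAP interface through /dev/net/tun.
 * ifname: on entry the requested name (may be ""), on success the name the
 * kernel actually assigned. Returns an open descriptor, or -1 with errno set. */
int open_tap_device(char *ifname, size_t ifname_len)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;   /* no device node, driver not available, or no permission */

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;   /* Ethernet frames, no extra packet header */
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        int saved = errno;                 /* keep the ioctl error across close() */
        close(fd);
        errno = saved;
        return -1;
    }

    if (ifname_len > 0) {
        strncpy(ifname, ifr.ifr_name, ifname_len - 1);
        ifname[ifname_len - 1] = '\0';
    }
    return fd;
}
```

Calling it with a buffer such as `char name[IFNAMSIZ] = "n2nchk0";` should return a descriptor of 0 or more when the driver is usable, and -1 with errno set when it is not. The test interface disappears again when the descriptor is closed.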



Edited 1 time(s). Last edit at 03/16/2015 07:46PM by tufkal.
Re: TUN/TAP problems with Kirkwood kernel
March 16, 2015 08:12PM
bodhi Wrote:
-------------------------------------------------------
> tufkal,
>
> > As long as the kernel has the tun.ko driver
> > enabled or as a module, the binary takes care
> of
> > the rest.
>
> Based on what you mentioned above and everybody's
> suggestions:
> [...]

I will give this a try first chance I get.
Re: TUN/TAP problems with Kirkwood kernel
March 17, 2015 05:50AM
Assuming the following:

a. The preloading of the module makes no difference to the behaviour
b. The networking is set as expected and the configs are applied on the fly by the application
c. tun.ko is functional and works, as evidenced by OpenVPN installations.

We are left with the following options, which are not kernel related:
1. Hardware incompatibility, i.e. these devices are ARMv5TE while the working devices are ARMv6 and x86/amd64
2. The software isn't written to cope with this architecture

The recourse then, as I see it, is to
I. Feed this back to the application developers and seek their input
II. Use a workaround

Just my thoughts, for what they are worth.

EDIT: I wonder if installing from mainline Debian versus installing from source makes a positive difference... just a thought. https://packages.debian.org/search?keywords=n2n



Edited 1 time(s). Last edit at 03/17/2015 06:25AM by Gravelrash.
Re: TUN/TAP problems with Kirkwood kernel
March 17, 2015 07:33PM
Bodhi:

Created the new initramfs, no change :/ The module is definitely loading properly, and before the edge binary runs from rc.local.

Gravel:

The binary in the repo is very old, v1 of N2N, which is incompatible with the current version I'm using.

Is there a possibility that the architecture is the problem? I did this exact process on an RPi last night, but as you pointed out, not only is that ARMv6, it's armhf, whereas the Kirkwood is armel. I assumed I would see some kind of error or warning during compile if some hardware prerequisite wasn't met, but the make output is identical to that on an x86 machine. It uses fewer components than OpenVPN, basically just SSL and an AES/Twofish cipher.



Edited 1 time(s). Last edit at 03/17/2015 09:58PM by tufkal.
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 03:09AM
tufkal,

> Is there a possibility in the architecture being
> the problem? I did this exact process on a RPi
> last night, but as you pointed out not only is
> that ARM6, its armhf, whereas the kirkwood is
> armel.

If you can compile it with that script on the Pogo, then it is an armel binary. So I don't think it is the architecture.

-bodhi
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 03:50AM
Hi tufkal,

There is no problem in the rootfs or the other settings. There is no need to set up iptables, routes, sysctl, etc. in your case. (Setting up iptables to allow traffic from the n2n LAN IP is required, but that traffic is usually allowed by default.) N2N is really very simple as far as I understand it, and I like the idea.

I just started playing with N2N after seeing your post here, and I came across the same issue as you: "arping" works fine while everything else fails. After a lot of trial and error, I checked the logs from the n2n edge process, and I believe this is a bug in the n2n software. The workaround is to run edge with the "-r" option enabled, like below:

edge -d n2n0 -c aaa -k secret -a 10.0.1.5 -m 00:10:75:1a:96:e1 -l a.b.c.d:40086 -r

Yes, it is weird that this option is not needed on my other ARM box (armhf); I still do not know why it fails on the Dockstar.

Maybe you can try the old 1_X_X version, but I will stick to the workaround.

Hope this helps

Yong
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 03:54AM
tufkal,

One more note for you: you'd better do a fresh restart of the supernode and all the edge processes if you add "-r" to the Dockstar edge process.

good luck.

Yong
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 07:49PM
Confirming Yong's findings: adding the -r option to edge makes things work on the Kirkwood rootfs!

I did a fresh install of 3.18, ran my script, and as expected it didn't work. Then I added -r and it works.

Now, the -r option should only be needed in N2N if you are using a node as a routing point. For example, if I wanted to connect two networks completely, I would put edge binaries on their gateways, bridge the interfaces, and set up the appropriate routing. None of that is happening here; this is just one node pinging another node, so the -r option should not be needed.

I have been in contact with the N2N team, and they have given me a lengthy list of things to try to debug the problem. From all the results seen in this thread, they suspect it has to do with the way this kernel handles ARP. Having to force -r and effectively turn on edge's own internal ARP processing is more evidence of this.

I will start combing through the tests they gave me, and see if I can narrow down what in this particular kernel/rootfs is to blame for this oddity.

And thanks so much to Yong for double-checking my findings, doing the work to test it, and coming up with a solution that makes sense and will most likely lead to the real problem. You, sir, are a scholar and a gentleman!



Edited 1 time(s). Last edit at 03/18/2015 07:53PM by tufkal.
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 08:44PM
@Yong,

Kudos!

@tufkal,

Please post the kernel config for this (after installing on the working box).

-bodhi
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 09:18PM
Frankly speaking, I do not think it is related to a kernel oddity. It is more likely an application bug. I did some debugging on this issue by turning on the "-v" option on the edge process. Without "-r", you always see log lines like "Discarding routed packet....". Following the source code, I dug into the file "edge.c":

/* Discard IP packets that are not originated by this hosts */
if(!(eee->allow_routing)) {
    if(ntohs(eh.type) == 0x0800) {
        /* This is an IP packet from the local source address - not forwarded. */
#define ETH_FRAMESIZE 14
#define IP4_SRCOFFSET 12
        uint32_t *dst = (uint32_t*)&tap_pkt[ETH_FRAMESIZE + IP4_SRCOFFSET];

        /* Note: all elements of the_ip are in network order */
        if( *dst != eee->device.ip_addr) {
            /* This is a packet that needs to be routed */
            traceEvent(TRACE_INFO, "Discarding routed packet [%s]",
                       intoa(ntohl(*dst), ip_buf, sizeof(ip_buf)));

            return;
        } else {
            /* This packet is originated by us */
            /* traceEvent(TRACE_INFO, "Sending non-routed packet"); */
        }
    }
}

It seems that the source IP is not extracted correctly, which causes the packet to be dropped. In my case, if I ping from 10.0.1.5 to another node, the error log shows it as coming from 10.0.34.56; I am not sure why it extracted this weird address (wrong offset?). Using "-r" bypasses this check. You can pass the debug log to the N2N team; it should be a simple bug for them to fix :-)
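
One possible explanation, offered as a hypothesis rather than anything confirmed by the N2N maintainers in this thread: ETH_FRAMESIZE + IP4_SRCOFFSET is 26, which is not a multiple of 4, so unless tap_pkt itself starts at an odd offset, the cast yields a misaligned uint32_t pointer. Dereferencing such a pointer is undefined behaviour in C. x86 and ARMv7 cores usually tolerate it in hardware, while ARMv5TE cores such as the Kirkwood do not guarantee the expected result. A minimal sketch of an alignment-safe read uses memcpy instead of the cast; the function name `ipv4_src_from_frame` is hypothetical and this is not the upstream fix.

```c
#include <stdint.h>
#include <string.h>

#define ETH_FRAMESIZE 14   /* Ethernet header length, as in the excerpt above */
#define IP4_SRCOFFSET 12   /* offset of the source address inside the IPv4 header */

/* Copy the IPv4 source address out of an Ethernet frame byte by byte, so the
 * read works regardless of how the frame buffer is aligned.
 * The result is in network byte order, exactly as it sits in the packet. */
uint32_t ipv4_src_from_frame(const uint8_t *frame)
{
    uint32_t src;
    memcpy(&src, frame + ETH_FRAMESIZE + IP4_SRCOFFSET, sizeof(src));
    return src;
}
```

The returned value is in network byte order, so it could be compared against `eee->device.ip_addr` the same way `*dst` is in the excerpt.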

Cheers!

Yong
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 09:24PM
javatmn Wrote:
-------------------------------------------------------
> Frankly speaking I do not think it is related with
> kernel oddity. It is more likely an application
> bug. I did some debug on this issue by turning on
> "-v" option on edge process. Without "-r", you can
> always see error logs like "Discarding routed
> packet....". Following source codes i gid into
> file "edge.c":
>
> [...]
>
> It seems that the source ip is not extracted
> corrected and cause the packet to be dropped. In
> my case, if i ping from 10.0.1.5 to another node,
> the error log show it is from 10.0.34.56, not sure
> why it extracted this weired address(offset
> wrong?). Using "-r" will by-pass this check. You
> can pass the debug log to N2N team, should be a
> simple bug for them to fix :-)
>
> Cheers!
>
> Yong

That makes perfect sense, and it does indeed look like a bug in the code because of the weird offset. But if it's a software bug, why does it work on x86 and other ARM builds? It has to be a combination of something in the way this kernel does things that N2N doesn't compensate for correctly, or vice versa.

The fact that the exact same process works on vanilla wheezy x86 is what throws me so much...
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 09:36PM
Yes, I am also puzzled, because it works fine on my other armhf box...
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 11:03PM
@Yong,

> Frankly speaking I do not think it is related with
> kernel oddity. It is more likely an application
> bug.

Agreed. It is still good to see the config difference because it might give some hints, especially since you guys said that armhf 32-bit systems work fine.

> It seems that the source ip is not extracted
> corrected and cause the packet to be dropped. In
> my case, if i ping from 10.0.1.5 to another node,
> the error log show it is from 10.0.34.56, not sure
> why it extracted this weired address(offset
> wrong?). Using "-r" will by-pass this check. You
> can pass the debug log to N2N team, should be a
> simple bug for them to fix :-)

Is 10.0.34.56 in the network or is it a bogus number?

-bodhi
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 11:39PM
That is a bogus number; I cannot relate it to any network parameters.
Re: TUN/TAP problems with Kirkwood kernel
March 18, 2015 11:51PM
javatmn Wrote:
-------------------------------------------------------
> That is a bogus number, i can not relate that with
> any network params.

Sounds like a coding error :) 3456 must be some default value somewhere. On your armhf system, which gcc version is used for compiling?

-bodhi
Re: TUN/TAP problems with Kirkwood kernel
March 19, 2015 12:11AM
That number is not fixed, seems to be related with packet content.
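
A value that starts with the right two octets and then tracks the packet content is what a misaligned 32-bit load on an older ARM core could produce. This is a hedged sketch, not a confirmed diagnosis: ARMv5-class cores, when the kernel neither fixes up nor traps the access (see /proc/cpu/alignment), load the aligned word and rotate it. Assuming tap_pkt starts on a 4-byte boundary, a load at offset 26 would then return frame bytes 26, 27, 24, 25, that is the first two octets of the source address followed by the two IPv4 header checksum bytes. The function names and the checksum bytes 0x22 0x38 used in the note below are chosen purely for illustration.

```c
#include <stdint.h>

/* Emulate how an ARMv5-class core performs a 32-bit little-endian load from a
 * misaligned address when nothing fixes the access up: the aligned word that
 * contains the address is loaded, then rotated right by 8 bits for every byte
 * of misalignment. ASSUMPTION: buf itself starts on a 4-byte boundary. */
uint32_t armv5_style_load32(const uint8_t *buf, unsigned offset)
{
    unsigned base = offset & ~3u;          /* round down to the aligned word */
    unsigned rot  = 8u * (offset & 3u);    /* 0, 8, 16 or 24 bits */
    uint32_t word = (uint32_t)buf[base]
                  | ((uint32_t)buf[base + 1] << 8)
                  | ((uint32_t)buf[base + 2] << 16)
                  | ((uint32_t)buf[base + 3] << 24);

    if (rot == 0)
        return word;                       /* aligned: nothing to rotate */
    return (word >> rot) | (word << (32u - rot));
}

/* Split a value loaded on a little-endian machine into its bytes in memory
 * order, i.e. the order in which the octets of an IPv4 address are printed. */
void value_to_octets(uint32_t value, uint8_t octets[4])
{
    octets[0] = (uint8_t)(value & 0xff);
    octets[1] = (uint8_t)((value >> 8) & 0xff);
    octets[2] = (uint8_t)((value >> 16) & 0xff);
    octets[3] = (uint8_t)((value >> 24) & 0xff);
}
```

With a frame whose bytes 24 to 29 are 0x22 0x38 10 0 1 5, `armv5_style_load32(frame, 26)` followed by `value_to_octets` gives 10.0.34.56, whereas reading bytes 26 to 29 directly gives 10.0.1.5.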

Also note that my Dockstar does not use your rootfs; I set it up from scratch with debootstrap. (That is another reason why I do not suspect your kernel.)

The details on my armhf box (AllWinner A20 chip):
[root@mele ~]# uname -a
Linux mele 3.4.103-javatmn+ #32 PREEMPT Mon Mar 16 16:25:58 CST 2015 armv7l GNU/Linux
[root@mele ~]# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/4.6/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.3-14' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 4.6.3 (Debian 4.6.3-14) 
[root@mele ~]#

Details on my dockstar:
[root@ds-emdebian /]# uname -a
Linux ds-emdebian 3.2.0-4-kirkwood #1 Debian 3.2.46-1 armv5tel GNU/Linux
[root@ds-emdebian /]# gcc -v 
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabi/4.6/lto-wrapper
Target: arm-linux-gnueabi
Configured with: ../src/configure -v --with-pkgversion='Debian 4.6.3-14' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-sjlj-exceptions --with-arch=armv4t --with-float=soft --enable-checking=release --build=arm-linux-gnueabi --host=arm-linux-gnueabi --target=arm-linux-gnueabi
Thread model: posix
gcc version 4.6.3 (Debian 4.6.3-14) 
[root@ds-emdebian /]# 

Re: TUN/TAP problems with Kirkwood kernel
March 19, 2015 10:40AM
Wow, now I'm really confused. Custom compiled kernel and userspace on 2 different arm devices, with the same software, compiler, kernel series, etc.

It is my understanding that the changes from armel to armhf are simply optimizations to provide better floating point math, and any code relating to these functions is simply slower on armel, not broken. Is that correct?



Edited 2 time(s). Last edit at 03/19/2015 10:58AM by tufkal.
Re: TUN/TAP problems with Kirkwood kernel
March 19, 2015 02:56PM
I'm no expert, but I believe there is more to it than that, particularly in terms of compatibility.
Re: TUN/TAP problems with Kirkwood kernel
March 19, 2015 03:46PM
tufkal,

When you recompile on a different platform, different code is generated (and different libraries are used), so it might mask the bug on that platform.

-bodhi
Re: TUN/TAP problems with Kirkwood kernel
March 25, 2015 03:16PM
To complete this post -

I take it that the issue actually is with the package in question and not with the kernel.
Re: TUN/TAP problems with Kirkwood kernel
March 25, 2015 05:20PM
Gravelrash Wrote:
-------------------------------------------------------
> To complete this post -
>
> I take it that the issue actually is with the
> package in question and not with the kernel.

I think javatmn has proven that above. Because the installation script actually compiles the code on the platform, the bug (if any) in the code might surface on one platform but not on the others. It does look like a bug, judging from javatmn's debug log shown above.

As I've mentioned, I think if we dig deeper, we might find something about the code generated in armhf or x86 that changed the behavior (or armel code generated in this kernel did shift things a little and surface the bug), IMO. But since there is a workaround, thanks to javatmn, we should just let the maintainers troubleshoot the code. My WAG is the bogus IP address seems to point to an uninitialized variable somewhere being used.

-bodhi