Unhandled fault: page domain fault (0x81b) at 0x00465000 when starting pihole-FTL on recent kernel

Posted by kennywest 
All,

This week I upgraded my PogoPlug Pro to Debian Bullseye. The plug is running pi-hole which has been running fine for more than a year now. This was on kernel linux-4.14.198-oxnas-tld-1. Bullseye is running the same kernel and pi-hole runs fine as well.
Today I decided to upgrade to the most recent kernel I can find here (linux-5.4.224-oxnas-tld-1). Everything works but pihole-FTL fails with the following error at startup:

root@pogo01:/# /etc/init.d/pihole-FTL start
Not running

[  103.153164][ T1224] 8<--- cut here ---
[  103.156915][ T1224] Unhandled fault: page domain fault (0x81b) at 0x00465000
[  103.163927][ T1224] pgd = f9cf97d2
[  103.167308][ T1224] [00465000] *pgd=661c0831, *pte=63ae859f, *ppte=63ae8e6e
[  103.174243][ T1224] Internal error: : 81b [#1] PREEMPT SMP ARM
[  103.180042][ T1224] Modules linked in: rt2800pci rt2800mmio rt2800lib rt2x00pci rt2x00mmio rt2x00lib mac80211 cfg80211 rfkill dwmac_generic uas
[  103.192875][ T1224] CPU: 0 PID: 1224 Comm: pihole-FTL Not tainted 5.4.210-oxnas-tld-1 #1.0
[  103.201093][ T1224] Hardware name: Generic DT based system
[  103.206561][ T1224] PC is at v6_coherent_kern_range+0x4/0x2c
[  103.212195][ T1224] LR is at arm_syscall+0x188/0x3b4
[  103.217131][ T1224] pc : [<c010e6b0>]    lr : [<c010635c>]    psr: 60000013
[  103.224056][ T1224] sp : c61aff90  ip : 00000000  fp : bebdfb1c
[  103.229944][ T1224] r10: 00000000  r9 : c61ae000  r8 : c01011e4
[  103.235832][ T1224] r7 : 00466000  r6 : 00760000  r5 : c61ae000  r4 : 00000002
[  103.243018][ T1224] r3 : 00001000  r2 : c7189c00  r1 : 00466000  r0 : 00465000
[  103.250201][ T1224] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  103.257991][ T1224] Control: 00c5787d  Table: 6617400a  DAC: 00000051
[  103.264396][ T1224] Process pihole-FTL (pid: 1224, stack limit = 0x2d7865ad)
[  103.271408][ T1224] Stack: (0xc61aff90 to 0xc61b0000)
[  103.276436][ T1224] ff80:                                     bebdfa70 00000000 00000000 000f0002
[  103.285271][ T1224] ffa0: c01011e4 c0101000 bebdfa70 00000000 00465000 00760000 00000000 00000014
[  103.294105][ T1224] ffc0: bebdfa70 00000000 00000000 000f0002 00000016 b6d8baac b6fcd9b8 bebdfb1c
[  103.302940][ T1224] ffe0: 00000000 bebdfa60 b6fa8fbc b6fa8fd4 60000010 00465000 00000000 00000000
[  103.311779][ T1224] [<c010e6b0>] (v6_coherent_kern_range) from [<c010635c>] (arm_syscall+0x188/0x3b4)
[  103.320957][ T1224] [<c010635c>] (arm_syscall) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
[  103.329351][ T1224] Exception stack(0xc61affa8 to 0xc61afff0)
[  103.335073][ T1224] ffa0:                   bebdfa70 00000000 00465000 00760000 00000000 00000014
[  103.343905][ T1224] ffc0: bebdfa70 00000000 00000000 000f0002 00000016 b6d8baac b6fcd9b8 bebdfb1c
[  103.352734][ T1224] ffe0: 00000000 bebdfa60 b6fa8fbc b6fa8fd4
[  103.358458][ T1224] Code: ee070f15 e12fff1e e12fff1e e3c0001f (ee070f3a)
[  103.365217][ T1224] ---[ end trace 85d702e36acf4e5d ]---
Segmentation fault

I'm not sure whom to ask for help on this subject: here, or the pi-hole people? This could be a kernel issue.

Anyway, I would appreciate any thoughts on this.
Many thanks in advance ... and happy new year ;)
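
Editor's note: the two register values in the oops above can be read directly. What follows is a minimal sketch, not kernel code, assuming the ARM short-descriptor fault status layout (status in bits 3:0 plus bit 10, domain in bits 7:4, write flag in bit 11) and two bits per domain in the DAC. The reading of domain 1 as the user-space domain is an assumption about how this kernel numbers its domains.

```python
# Illustrative decoder for the fault status (0x81b) and DAC (0x00000051)
# values printed in the oops. Field layouts are assumed as described in the
# lead-in; only a handful of status codes are listed.

FAULT_STATUS = {
    0b00101: "section translation fault",
    0b00111: "page translation fault",
    0b01001: "section domain fault",
    0b01011: "page domain fault",
    0b01101: "section permission fault",
    0b01111: "page permission fault",
}

DOMAIN_ACCESS = {0b00: "no access", 0b01: "client", 0b10: "reserved", 0b11: "manager"}


def decode_fsr(fsr: int) -> dict:
    """Split a fault status value such as 0x81b into its fields."""
    # FS[4] lives in bit 10, FS[3:0] in bits 3:0.
    fs = (((fsr >> 10) & 1) << 4) | (fsr & 0xF)
    return {
        "status": FAULT_STATUS.get(fs, f"other (FS={fs:#07b})"),
        "domain": (fsr >> 4) & 0xF,      # bits 7:4
        "write": bool((fsr >> 11) & 1),  # bit 11
    }


def decode_dac(dac: int) -> dict:
    """Return the access setting of each of the 16 domains (2 bits each)."""
    return {d: DOMAIN_ACCESS[(dac >> (2 * d)) & 0b11] for d in range(16)}


fault = decode_fsr(0x81B)
dac = decode_dac(0x00000051)
print(fault)
print({d: access for d, access in dac.items() if d < 4})
```

Under these assumptions the fault is a page domain fault on domain 1, and the DAC value gives domain 1 "no access" while domains 0, 2 and 3 are "client". That would be consistent with the kernel having switched off access to user memory at the moment it touched address 0x00465000.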
Re: Unhandled fault: page domain fault (0x81b) at 0x00465000 when starting pihole-FTL on recent kernel
December 31, 2022 02:54PM
kennywest,

I think you should try pihole community (and of course users here who use pihole), to see which kernel versions people have reported running, and whether a segfault was seen lately.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Thank you for your reply. I asked the pi-hole community for help ... let's see if we can find a solution. In the meantime I reverted to the 4.14.198 kernel which seems to work without any problems.
Maybe it's too late for a reply, but I've got the same error using kernel 5.4.224-oxnas-tld-1, and after some Google searching I found (one) explanation for the page domain fault.

I changed kernel configuration and removed
CONFIG_CPU_SW_DOMAIN_PAN

After installing the modified kernel I do not see this error anymore.
I'm unsure if this is the right solution but for the moment pihole seems to work.
Hi hej,

> After installing the modified kernel I do not see
> this error anymore.
> I'm unsure if this is the right solution but for
> the moment pihole seems to work.

That's good to know. Looks like it is the right diagnosis.

Because CONFIG_CPU_SW_DOMAIN_PAN is a security feature, I think it is the pi-hole code that needs to be updated.

The poster Stefan recommended a solution:

Quote
https://stackoverflow.com/questions/39515407/how-to-handle-a-page-domain-fault-in-a-self-written-character-device-kernel-modu
As Tsyvarev mentioned, the input buffer needs to be copied from user space to kernel space via copy_from_user. After memcpy is replaced by copy_from_user the module works fine.

Perhaps update your pi-hole package for Debian to see if it has been fixed?

-bodhi



Edited 1 time(s). Last edit at 04/17/2023 05:10PM by bodhi.
Hi Bodhi,

after a little more googling I found CONFIG_CPU_SW_DOMAIN_PAN breakage on ARM11 MPCore.

Quote
https://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/400752.html
Russell King:
Having thought about this some more, I'm coming to the conclusion that
the only sane solution here is to change the help text for SW_PAN such
that if you want to run a kernel on ARM11 MPcore, you must disable
SW_PAN.

It's a thread from 2016 and I don't know the current state, especially for OpenWrt.
A quick check shows that CONFIG_CPU_SW_DOMAIN_PAN is enabled for the oxnas target, but CONFIG_UACCESS_WITH_MEMCPY is disabled. I've read something about an incompatibility between these two config options, but that information may be outdated, too. https://patchwork.kernel.org/project/linux-arm-kernel/patch/20151203110041.GK8644@n2100.arm.linux.org.uk/

For now, disabling CONFIG_CPU_SW_DOMAIN_PAN is my workaround.


-hermann

BTW, I could install pihole from the pihole repository without compiling a private version of pihole-FTL.
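
Editor's note: the "quick check" of the two options can be scripted. This is a minimal sketch, assuming the running kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); the function names are made up for illustration. The sample text mirrors the state described above, with PAN enabled and UACCESS_WITH_MEMCPY not set.

```python
import gzip
from pathlib import Path

OPTIONS = ("CONFIG_CPU_SW_DOMAIN_PAN", "CONFIG_UACCESS_WITH_MEMCPY")


def parse_kconfig(text: str, wanted=OPTIONS) -> dict:
    """Map each wanted option to its value, or None when it is not set."""
    found = {name: None for name in wanted}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments, including "# CONFIG_X is not set"
        name, value = line.split("=", 1)
        if name in found:
            found[name] = value
    return found


def read_running_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config (only works if the file exists)."""
    return gzip.decompress(Path(path).read_bytes()).decode()


# Sample input instead of the live file, so the sketch runs anywhere.
sample = "CONFIG_CPU_SW_DOMAIN_PAN=y\n# CONFIG_UACCESS_WITH_MEMCPY is not set\n"
result = parse_kconfig(sample)
print(result)
```

On the device itself, `parse_kconfig(read_running_config())` would report the live settings.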
Hi hermann,

> after a little more googling I found
> CONFIG_CPU_SW_DOMAIN_PAN breakage on ARM11 MPCore.
>
> Quote
> https://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/400752.html
> Russell King:
> Having thought about this some more, I'm coming to the conclusion that
> the only sane solution here is to change the help text for SW_PAN such
> that if you want to run a kernel on ARM11 MPcore, you must disable
> SW_PAN.

Ah. That's more like it. It must be part of the reason that OXNAS has been removed from mainline: they don't want to fix the SMP support for OXNAS anymore. ARM11 MPCore has quite a few problems. Kernel developers kept finding that they had to patch the SMP code to run on ARM11 MPCore, so they gave up on it. There has also been no activity on OXNAS in mainline for a long time.

I'll give it a close read.

>
> It's a thread from 2016 and I don't know the
> current state, especially for openWRT.
> A quick check shows that
> CONFIG_CPU_SW_DOMAIN_PAN is enabled for
> oxnas target, but
> CONFIG_UACCESS_WITH_MEMCPY is disabled.
> I've read something about an incompatibility
> between these two config variables, but this is a
> (maybe) outdated information, too.
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/20151203110041.GK8644@n2100.arm.linux.org.uk/

I think removing memcpy is a sound solution. Sooner or later, pihole should clean up their code to use the kernel standard.

> For me disabling CONFIG_CPU_SW_DOMAIN_PAN
> is my current workaround.

I'll take a look at OpenWrt kernel config.

-bodhi
Hi hermann,

Could you rebuild your kernel with these:

# CONFIG_UACCESS_WITH_MEMCPY is not set
CONFIG_CPU_SW_DOMAIN_PAN=y
And run pihole.

Without looking at pihole code, I hope it might switch to a different kernel call.

-bodhi
At the risk of hijacking the thread: I personally would migrate away from pihole and move to a better solution, as shown below.

Adguardhome : a self-contained binary for the arm5 architecture is available if you don't want to do the compiling yourself

or

Blocky : this would need some tweaking to the code

I appreciate that this doesn't contribute to the finding and reporting of optimisation of the Bhodi-acious kernel, but IMHO they are far superior to pihole.

:HIJACK

============================
Breaking stuff since 1994 :-)
============================



Edited 1 time(s). Last edit at 04/18/2023 04:15PM by Gravelrash.
Gravelrash,

> at the risk of hijacking the thread

Not at all :)

> - i personally
> would migrate away from pihole and move to a
> better solution - as shown below
>
> Adguardhome
> : a self contained binary for arm5 architecture is
> available if you dont want to do the compiling
> yourself

It should run on armv6 which the Pogo Pro V3 is based on.

> or
>
> Blocky
> : this would need some tweaking to the code
>
> I appreciate that this doesnt contribute to the
> finding and reporting of optimisation of the
> Bhodi-acious kernel. but IMHO they are far
> superior to pihole
>
> :HIJACK

:)) thanks for the suggestion.

-bodhi
bodhi Wrote:
-------------------------------------------------------
> It should run on armv6 which the Pogo Pro V3 is
> based on.
>
It runs on just about any hardware architecture you can imagine :)

If I were running it on the arm6 versions you support, I'd be looking to up the swap file to around 756MB and set the swappiness quite aggressively. It is a bit of a RAM hog when you do a blocklist update, so I would make allowance for it. I've been testing it on the L-50 and it runs fine and dandy, with plenty of RAM to spare; I only see spikes in RAM usage when I ask it to go get updates. Otherwise it should run well on 128M. It runs like a champ on the L-50 :))




Edited 1 time(s). Last edit at 04/18/2023 04:47PM by Gravelrash.
Hi bodhi,

no success!

root@pihole:~# zcat < /proc/config.gz | egrep 'UACCES|DOMAIN_PAN'
CONFIG_CPU_SW_DOMAIN_PAN=y
# CONFIG_UACCESS_WITH_MEMCPY is not set
root@pihole:~#

and

root@pihole:~# strace pihole-FTL
execve("/usr/bin/pihole-FTL", ["pihole-FTL"], 0xbeb10dc0 /* 14 vars */) = 0
brk(NULL)                               = 0x1a75000
uname({sysname="Linux", nodename="pihole.fritz.box", ...}) = 0
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/tls/v6l/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat64("/usr/local/lib/tls/v6l", 0xbedc51b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/tls/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat64("/usr/local/lib/tls", 0xbedc51b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/v6l/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat64("/usr/local/lib/v6l", 0xbedc51b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/local/lib/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat64("/usr/local/lib", {st_mode=S_IFDIR|S_ISGID|0775, st_size=4096, ...}) = 0
openat(AT_FDCWD, "tls/v6l/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "v6l/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=29167, ...}) = 0
mmap2(NULL, 29167, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb6fd9000
close(3)                                = 0
openat(AT_FDCWD, "/lib/arm-linux-gnueabi/libm.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\300u\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=595472, ...}) = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6fd7000
mmap2(NULL, 659588, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb6f0e000
mprotect(0xb6f9f000, 61440, PROT_NONE)  = 0
mmap2(0xb6fae000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x90000) = 0xb6fae000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/lib/librt.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/v6l/librt.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/librt.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "v6l/librt.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "librt.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/arm-linux-gnueabi/librt.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0@\26\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=30668, ...}) = 0
mmap2(NULL, 94720, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb6ef6000
mprotect(0xb6efc000, 65536, PROT_NONE)  = 0
mmap2(0xb6f0c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0xb6f0c000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/lib/libgcc_s.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/v6l/libgcc_s.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libgcc_s.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "v6l/libgcc_s.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libgcc_s.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/arm-linux-gnueabi/libgcc_s.so.1", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\370\323\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=124628, ...}) = 0
mmap2(NULL, 188840, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb6ec7000
mprotect(0xb6ee5000, 61440, PROT_NONE)  = 0
mmap2(0xb6ef4000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0xb6ef4000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/lib/libpthread.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/v6l/libpthread.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libpthread.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "v6l/libpthread.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libpthread.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/arm-linux-gnueabi/libpthread.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\230M\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=141496, ...}) = 0
mmap2(NULL, 180824, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb6e9a000
mprotect(0xb6eb4000, 61440, PROT_NONE)  = 0
mmap2(0xb6ec3000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19000) = 0xb6ec3000
mmap2(0xb6ec5000, 4696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb6ec5000
close(3)                                = 0
openat(AT_FDCWD, "/usr/local/lib/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/v6l/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "tls/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "v6l/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/arm-linux-gnueabi/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\\y\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1340264, ...}) = 0
mmap2(NULL, 1409596, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb6d41000
mprotect(0xb6e85000, 61440, PROT_NONE)  = 0
mmap2(0xb6e94000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x143000) = 0xb6e94000
mmap2(0xb6e97000, 8764, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb6e97000
close(3)                                = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6fd5000
set_tls(0xb6fd54d0)                     = 0
mprotect(0xb6e94000, 8192, PROT_READ)   = 0
mprotect(0xb6ec3000, 4096, PROT_READ)   = 0
mprotect(0xb6ef4000, 4096, PROT_READ)   = 0
mprotect(0xb6f0c000, 4096, PROT_READ)   = 0
mprotect(0xb6fae000, 4096, PROT_READ)   = 0
mprotect(0x4bb000, 3141632, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0x4bb000, 3141632, PROT_READ|PROT_EXEC) = 0
cacheflush(0x4bb000, 0x7ba000, 0)       = ?
+++ killed by SIGSEGV +++
Segmentation fault

so I will stay with CONFIG_CPU_SW_DOMAIN_PAN disabled

-hermann
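
Editor's note: the trace ends in a call that never returns (`= ?`), which is the one in flight when the process was killed. Below is a minimal sketch that picks that call out of an strace log; the function name and the regular expression are illustrative and only handle the simple one-line call format seen above.

```python
import re

# name(args) = return-value, as printed by strace for a completed line
CALL = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>.+)$")


def last_unfinished_call(trace: str):
    """Return (name, args) of the last syscall that strace printed with '= ?'."""
    hit = None
    for line in trace.splitlines():
        m = CALL.match(line.strip())
        if m and m.group("ret").strip() == "?":
            hit = (m.group("name"), m.group("args"))
    return hit


# Last three lines of the trace above.
sample = """\
mprotect(0x4bb000, 3141632, PROT_READ|PROT_EXEC) = 0
cacheflush(0x4bb000, 0x7ba000, 0)       = ?
+++ killed by SIGSEGV +++
"""

name, args = last_unfinished_call(sample)
start, end, _flags = (int(a, 0) for a in args.split(","))
print(name, hex(start), hex(end), end - start)
```

The flushed range (0x7ba000 - 0x4bb000 = 3141632 bytes) is exactly the region that the two preceding mprotect calls made writable and then executable again. The value 000f0002 on the stack in the kernel oops at the top of the thread is the ARM-private syscall number for cacheflush (base 0x0f0000 plus 2), so both logs point at the same syscall rather than at anything pi-hole-specific.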
> so I will stay with CONFIG_CPU_SW_DOMAIN_PAN
> disabled

Sounds fine. Trading that security feature for more flexibility is not too bad. As long as you don't run strange software (strictly run Debian packages), then it's OK.

-bodhi
Hello bodhi,

I have to resurrect this thread. Today I updated my Pogoplug Pro from Debian Bullseye 11.8 to 11.9 via normal apt update & upgrade. I encountered several segfaults during the installation (configuration) of the Debian packages. With dmesg I got several errors like this:

[  278.326932] 8<--- cut here ---
[  278.330680] Unhandled fault: page domain fault (0x81b) at 0xb6cf9000
[  278.337692] pgd = 492709c5
[  278.341076] [b6cf9000] *pgd=626a1831, *pte=6121855f, *ppte=61218c7e
[  278.348011] Internal error: : 81b [#22] PREEMPT SMP ARM
[  278.353896] Modules linked in: dwmac_generic uas
[  278.359191] CPU: 1 PID: 1867 Comm: grep Tainted: G      D           5.4.224-oxnas-tld-1 #1.0
[  278.368273] Hardware name: Generic DT based system
[  278.373742] PC is at v6_coherent_kern_range+0x4/0x2c
[  278.379376] LR is at arm_syscall+0x188/0x3bc
[  278.384312] pc : [<c010e6d0>]    lr : [<c0106364>]    psr: 40000013
[  278.391237] sp : c269ff90  ip : 00000000  fp : 00000000
[  278.397125] r10: 00000000  r9 : c269e000  r8 : c01011e4
[  278.403013] r7 : b6cf9168  r6 : b6cf9168  r5 : c269e000  r4 : 00000002
[  278.410196] r3 : 00000160  r2 : c68f1c00  r1 : b6cf9168  r0 : b6cf9000
[  278.417382] Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  278.425172] Control: 00c5787d  Table: 6269800a  DAC: 00000051
[  278.431577] Process grep (pid: 1867, stack limit = 0xd954a5e2)
[  278.438069] Stack: (0xc269ff90 to 0xc26a0000)
[  278.443097] ff80:                                     b6cf9008 ea000000 b6cf9168 000f0002
[  278.451932] ffa0: c01011e4 c0101000 b6cf9008 ea000000 b6cf9008 b6cf9168 00000000 00000160
[  278.460766] ffc0: b6cf9008 ea000000 b6cf9168 000f0002 005150c8 bedb6ac0 00000000 00000000
[  278.469601] ffe0: 000003ff bedb6a8c b6ecd1dc b6ef5a0c 60000010 b6cf9008 00000000 00000000
[  278.478443] [<c010e6d0>] (v6_coherent_kern_range) from [<c0106364>] (arm_syscall+0x188/0x3bc)
[  278.487626] [<c0106364>] (arm_syscall) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
[  278.496017] Exception stack(0xc269ffa8 to 0xc269fff0)
[  278.501739] ffa0:                   b6cf9008 ea000000 b6cf9008 b6cf9168 00000000 00000160
[  278.510571] ffc0: b6cf9008 ea000000 b6cf9168 000f0002 005150c8 bedb6ac0 00000000 00000000
[  278.519400] ffe0: 000003ff bedb6a8c b6ecd1dc b6ef5a0c
[  278.525125] Code: ee070f15 e12fff1e e12fff1e e3c0001f (ee070f3a)
[  278.531883] ---[ end trace 416116106cc2f7ee ]---

I found this thread and realised I was running kernel 5.4.101. I updated to your latest 5.4.224 but these segfaults remain! With Debian 11.8 everything went smoothly, even with the old kernel, but the update to 11.9 seems to have broken something. I'm using only packages from the Debian repositories, so your advice to strictly run Debian packages no longer seems to be enough?

Maybe you can provide a custom 5.4 LTS kernel with CONFIG_CPU_SW_DOMAIN_PAN disabled? I think most people are running these devices in private networks without a direct connection to the internet. For example, I back up a public internet server via ssh/rsync to a USB drive on the Pogoplug. No need for higher security...

Thanks,
Andreas
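
Editor's note: when several of these oopses pile up in dmesg, a short summary per oops makes them easier to compare. This is a minimal sketch with made-up function and key names; the regular expressions only target the line formats visible in the log above and assume a process name without spaces.

```python
import re


def summarize_oops(log: str) -> dict:
    """Pull the fault type, address, oops counter, process and kernel out of one oops."""
    fault = re.search(r"Unhandled fault: (.+?) \((0x[0-9a-f]+)\) at (0x[0-9a-f]+)", log)
    count = re.search(r"Internal error: .*\[#(\d+)\]", log)
    cpu = re.search(
        r"Comm: (\S+) (?:Not tainted|Tainted: [A-Z ]+?)\s+(\d+\.\d+\.\d+\S*)", log
    )
    return {
        "fault": fault.group(1) if fault else None,
        "address": fault.group(3) if fault else None,
        "oops_number": int(count.group(1)) if count else None,
        "comm": cpu.group(1) if cpu else None,
        "kernel": cpu.group(2) if cpu else None,
    }


# Three lines copied from the dmesg output above.
sample = """\
[  278.330680] Unhandled fault: page domain fault (0x81b) at 0xb6cf9000
[  278.348011] Internal error: : 81b [#22] PREEMPT SMP ARM
[  278.359191] CPU: 1 PID: 1867 Comm: grep Tainted: G      D           5.4.224-oxnas-tld-1 #1.0
"""

summary = summarize_oops(sample)
print(summary)
```

Here the faulting process is grep rather than pihole-FTL, and the `[#22]` counter indicates this was the 22nd oops since boot, which fits the run of "Segmentation fault" lines seen during the upgrade.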
Re: Unhandled fault: page domain fault (0x81b) at 0x00465000 when starting pihole-FTL on recent kernel
February 10, 2024 10:47PM
Andreas,

Quote

I have to resurrect this thread. Today I updated my Pogoplug Pro from Debian Bullseye 11.8 to 11.9 via normal apt update & upgrade. I encountered several segfaults during the installation (configuration) of the Debian packages. With dmesg I got several errors like this

Sounds like a swap problem. What is your swap space? You do need swap to run apt upgrade.
free -h

-bodhi
Hi bodhi,

that's a fast reply. But it's not that easy, output from "free -h":

               total        used        free      shared  buff/cache   available
Mem:           115Mi        15Mi        23Mi       0.0Ki        75Mi        89Mi
Swap:          127Mi       1.0Mi       126Mi

Swap as well as general memory was never a problem – there is almost nothing running on this device. Maybe the flash stick is dying and the coincidence with the update is just bad luck.

That's the terminal output of the update to 11.9:

Log started: 2024-02-11  04:15:22
Preparing to unpack .../base-files_11.1+deb11u9_armel.deb ...
Unpacking base-files (11.1+deb11u9) over (11.1+deb11u8) ...
Setting up base-files (11.1+deb11u9) ...
Installing new version of config file /etc/debian_version ...
Preparing to unpack .../libperl5.32_5.32.1-4+deb11u3_armel.deb ...
Unpacking libperl5.32:armel (5.32.1-4+deb11u3) over (5.32.1-4+deb11u2) ...
Preparing to unpack .../perl_5.32.1-4+deb11u3_armel.deb ...
Unpacking perl (5.32.1-4+deb11u3) over (5.32.1-4+deb11u2) ...
Preparing to unpack .../perl-base_5.32.1-4+deb11u3_armel.deb ...
Unpacking perl-base (5.32.1-4+deb11u3) over (5.32.1-4+deb11u2) ...
Setting up perl-base (5.32.1-4+deb11u3) ...
Preparing to unpack .../perl-modules-5.32_5.32.1-4+deb11u3_all.deb ...
Unpacking perl-modules-5.32 (5.32.1-4+deb11u3) over (5.32.1-4+deb11u2) ...
Preparing to unpack .../libc6_2.31-13+deb11u8_armel.deb ...
Unpacking libc6:armel (2.31-13+deb11u8) over (2.31-13+deb11u7) ...
Setting up libc6:armel (2.31-13+deb11u8) ...
Preparing to unpack .../tar_1.34+dfsg-1+deb11u1_armel.deb ...
Unpacking tar (1.34+dfsg-1+deb11u1) over (1.34+dfsg-1) ...
Setting up tar (1.34+dfsg-1+deb11u1) ...
Preparing to unpack .../libc-bin_2.31-13+deb11u8_armel.deb ...
Unpacking libc-bin (2.31-13+deb11u8) over (2.31-13+deb11u7) ...
Setting up libc-bin (2.31-13+deb11u8) ...
Preparing to unpack .../libgnutls-openssl27_3.7.1-5+deb11u4_armel.deb ...
Unpacking libgnutls-openssl27:armel (3.7.1-5+deb11u4) over (3.7.1-5+deb11u3) ...
Preparing to unpack .../libgnutls30_3.7.1-5+deb11u4_armel.deb ...
Unpacking libgnutls30:armel (3.7.1-5+deb11u4) over (3.7.1-5+deb11u3) ...
Setting up libgnutls30:armel (3.7.1-5+deb11u4) ...
Preparing to unpack .../0-tzdata_2024a-0+deb11u1_all.deb ...
Unpacking tzdata (2024a-0+deb11u1) over (2021a-1+deb11u11) ...
Preparing to unpack .../1-libc-l10n_2.31-13+deb11u8_all.deb ...
Unpacking libc-l10n (2.31-13+deb11u8) over (2.31-13+deb11u7) ...
Preparing to unpack .../2-locales_2.31-13+deb11u8_all.deb ...
Unpacking locales (2.31-13+deb11u8) over (2.31-13+deb11u7) ...
Preparing to unpack .../3-debian-security-support_1%3a11+2024.01.30_all.deb ...
Unpacking debian-security-support (1:11+2024.01.30) over (1:11+2023.05.04) ...
Preparing to unpack .../4-libglib2.0-0_2.66.8-1+deb11u1_armel.deb ...
Unpacking libglib2.0-0:armel (2.66.8-1+deb11u1) over (2.66.8-1) ...
Preparing to unpack .../5-libzmq5_4.3.4-1+deb11u1_armel.deb ...
Unpacking libzmq5:armel (4.3.4-1+deb11u1) over (4.3.4-1) ...
Preparing to unpack .../6-usb.ids_2024.01.20-0+deb11u1_all.deb ...
Unpacking usb.ids (2024.01.20-0+deb11u1) over (2023.01.16-0+deb11u1) ...
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Setting up libc-l10n (2.31-13+deb11u8) ...
Setting up libgnutls-openssl27:armel (3.7.1-5+deb11u4) ...
Setting up libzmq5:armel (4.3.4-1+deb11u1) ...
Setting up libglib2.0-0:armel (2.66.8-1+deb11u1) ...
No schema files found: doing nothing.
Setting up perl-modules-5.32 (5.32.1-4+deb11u3) ...
Setting up locales (2.31-13+deb11u8) ...
Generating locales (this might take a while)...
  en_US.UTF-8... done
Generation complete.
Setting up tzdata (2024a-0+deb11u1) ...

Current default time zone: 'Europe/Berlin'
Local time is now:      Sun Feb 11 04:21:04 CET 2024.
Universal Time is now:  Sun Feb 11 03:21:04 UTC 2024.
Run 'dpkg-reconfigure tzdata' if you wish to change it.

Setting up usb.ids (2024.01.20-0+deb11u1) ...
Setting up libperl5.32:armel (5.32.1-4+deb11u3) ...
Setting up debian-security-support (1:11+2024.01.30) ...
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Setting up perl (5.32.1-4+deb11u3) ...
Processing triggers for libc-bin (2.31-13+deb11u8) ...
Processing triggers for man-db (2.9.4-2) ...
Processing triggers for mailcap (3.69) ...
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Log ended: 2024-02-11  04:21:25
Re: Unhandled fault: page domain fault (0x81b) at 0x00465000 when starting pihole-FTL on recent kernel
February 11, 2024 12:44AM
Hi Andreas,

> Swap as well as general memory was never a problem
> – there is almost nothing running on this
> device. Maybe the flash stick is dying and the
> coincidence with the update is just bad luck.

It's hard to say what's the problem here. Debian 11.x is considered too old now.

So I guess it's time for me to release a new 5.4.x kernel for this box! It'll be a few days before I can get back to my normal routine after a long trip, and will release the latest kernel (with CONFIG_CPU_SW_DOMAIN_PAN deselected).

-bodhi
Hi bodhi,

Thank you again for your work. I upgraded to your new kernel release 5.4.268 and the segfaults went away.

I saw the last one while installing the kernel deb package. After a reboot, the Debian updates I had postponed since February went through smoothly. The new kernel seems to do the job, and I suppose deselecting the CONFIG_CPU_SW_DOMAIN_PAN option is the solution now.

But I'm still wondering what changed between Debian 11.8 and 11.9 to make these errors suddenly kick in. I'm also afraid that my Debian image is somewhat broken, because these segfaults occurred during package updates. The file system is unaffected and every new and updated file seems to be in place, but what can go wrong when some package setups and triggers have failed?

But never mind. If needed I will use the last rootfs to renew my flash drive.

It was a pleasure to get your support again.
Andreas
@Andreas Heyer,

> Hi bodhi,
>
> thank you again for your work. I upgraded to your
> new kernel release 5.4.268 and the seg faults went
> away.
>
> I saw the last when installing the kernel
> deb-package. After a reboot, since February
> postponed Debian updates went smooth through. The
> new kernel seems to do the job and I suppose the
> deselection of the CONFIG_CPU_SW_DOMAIN_PAN option
> is the solution now.
>
> But I'm still wondering what changed between
> Debian 11.8 and 11.9 that these errors suddenly
> kicked in? And I'm afraid that my Debian image is
> somewhat broken because these seg faults occurred
> during package updates. The file system is
> unaffected and every new and updated file seems to
> be in place but what can go wrong when some
> package set ups and triggers failed?

I think we are OK. CONFIG_CPU_SW_DOMAIN_PAN triggered the problem in the OXNAS kernel SMP code, and that could happen at any time. A Debian upgrade in particular is quite CPU- and memory-intensive.

It is a systemic problem with the ARM11 core, and that was one of the main reasons the OXNAS SoC has been removed from mainline Linux.

Just making a rootfs backup before upgrading would be enough to be safe with kernel 5.4.x.

-bodhi