Welcome! Log In Create A New Profile

Advanced

System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus: Host is down

Posted by bluzfanmr1 
Quote
System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus: Host is down

I have a few Pogoplug Pro's and I keep having this same issue as stated above. It seemed to start when things went from 4.xx to 5.xx and hopefully someone smarter than I can help with a solution. After I get that message I can no longer ssh to the Pogoplug.

I have started from scratch many times and install Debian-4.14.180-oxnas-tld-1-rootfs-bodhi.tar.bz2 on the Pogoplug Pro. Everything seems to run well and I upgrade to Bullseye and the latest kernel, linux-5.4.224-oxnas-tld-1.bodhi.tar.bz2. I am using the Pogoplug Pro for Time Machine backups so I install netatalk and avahi-daemon and the backups work fine.

The message above starts after I create a swap file and reboot the machine. All this is done via ssh. The Pogoplug seems to reboot and come up, but I can no longer connect to it via ssh. My router shows it is connected and has an ip, though I cannot successfully ping that ip or log in. I end up starting from scratch again, hoping to pin down what it is but I'm not getting anywhere. I can connect the drive to another box and access everything but I don't know what to do to fix this.

Anybody run across this? Thanks for any help you may provide.
bluzfanmr1,

> I have started from scratch many times and install
> Debian-4.14.180-oxnas-tld-1-rootfs-bodhi.tar.bz2
> on the Pogoplug Pro. Everything seems to run well
> and I upgrade to Bullseye and the latest kernel,
> linux-5.4.224-oxnas-tld-1.bodhi.tar.bz2. I am
> using the Pogoplug Pro for Time Machine backups so
> I install netatalk and avahi-daemon and the
> backups work fine.

At this point, you could make a backup of the USB rootfs and then restore it later if and when the next step go bad.

> The message above starts after I create a swap
> file and reboot the machine. All this is done via
> ssh. The Pogoplug seems to reboot and come up,
> but I can no longer connect to it via ssh.

If you have serial console connected, the boot log might give some hints why. If you have not done this, it is quite easy (no soldering needed, just buy a serial module converter). And if you already have an rPi then you just need 3 wires (it has built-in converter).

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Forgot I could do that! Here's what's showing. The really weird thing is everything is normal via serial and I can log in just fine, but I cannot log in via SSH from my Macbook any more. I mentioned the swap file earlier but I don't think that's it. I removed the swap file via serial but still cannot SSH.

Just to note, I could login via SSH from my Macbook Air all along for years until I updated the kernel to the latest versions of 5.xx. I also have a few Pogoplug E02's using Kirkwood and they are working just fine and I can log into them via my Macbook Air.

Thank you for the help.

OX820> printenv
bootcmd=run bootcmd_uenv; run scan_disk; run set_bootargs; run bootcmd_exec; reset
bootcmd_exec=run load_uimage; if run load_initrd; then if run load_dtb; then bootm 0x60500000 0x60e00000 0x62c00000; else bootm 0x60500000 0x60e00000; fi; else if run load_dtb; then bootm 0x60500000 - 0x62c00000; else bootm 0x60500000; fi; fi
bootcmd_uenv=run uenv_load; if test $uenv_loaded -eq 1; then run uenv_import; fi
bootdelay=10
bootdev=usb
device=0:1
devices=usb ide
disks=0 1 2 3
dtb_file=/boot/dts/ox820-pogoplug-pro.dtb
ethact=dwmac.40400000
ethaddr=00:25:31:xx:xx:xx
if_netconsole=ping $serverip
ipaddr=10.1.1.197
load_dtb=echo loading DTB $dtb_file ...; load $bootdev $device 0x62c00000 $dtb_file
load_initrd=echo loading uInitrd ...; load $bootdev $device 0x60e00000 /boot/uInitrd
load_uimage=echo loading uImage ...; load $bootdev $device 0x60500000 /boot/uImage
mainlineLinux=yes
mtdids=nand0=41000000.nand
mtdparts=mtdparts=41000000.nand-controller:14m(boot),-(data)
nc_ready=0
preboot=run preboot_nc
preboot_nc=setenv nc_ready 0; for pingstat in 1 2 3 4 5; do; sleep 1; if run if_netconsole; then setenv nc_ready 1; fi; done; if test $nc_ready -eq 1; then run start_netconsole; fi
scan_disk=echo running scan_disk ...; scan_done=0; setenv scan_usb "usb start";  setenv scan_ide "ide reset";  setenv scan_mmc "mmc rescan"; for dev in $devices; do if test $scan_done -eq 0; then echo Scan device $dev; run scan_$dev; for disknum in $disks; do if test $scan_done -eq 0; then echo device $dev $disknum:1; if load $dev $disknum:1 0x60500000 /boot/uImage 1; then scan_done=1; echo Found bootable drive on $dev $disknum; setenv device $disknum:1; setenv bootdev $dev; fi; fi; done; fi; done
serverip=10.1.1.1
set_bootargs=setenv bootargs console=ttyS0,115200 root=LABEL=rootfs rootdelay=10 $mtdparts $custom_params
start_netconsole=setenv ncip $serverip; setenv bootdelay 10; setenv stdin nc; setenv stdout nc; setenv stderr nc; version;
stderr=serial
stdin=serial
stdout=serial
uenv_import=echo importing envs ...; env import -t 0x60510000  $filesize
uenv_init_devices=setenv init_usb "usb start";  setenv init_ide "ide reset";  setenv init_mmc "mmc rescan"; for devtype in $devices; do run init_$devtype; done;
uenv_load=run uenv_init_devices; setenv uenv_loaded 0; for devtype in $devices;  do for disknum in 0; do run uenv_read_disk; done; done;
uenv_read=echo loading envs from $devtype $disknum ...; if load $devtype $disknum:1 0x60500000 /boot/uEnv.txt; then setenv uenv_loaded 1; fi
uenv_read_disk=if test $devtype -eq mmc; then if $devtype part; then run uenv_read;  fi; else if $devtype part $disknum; then run uenv_read; fi;  fi
usb_custom_params=zswap.enabled=1
usb_ready_retry=15

Environment size: 2761/131068 bytes
OX820>

U-Boot SPL 2013.10-tld-4 (Sep 07 2014 - 14:10:12)
  Boot device: NAND
Attempting to set PLLA to 850 MHz ...
  plla_ctrl0 : 0000020a
  plla_ctrl1 : 00330000
  plla_ctrl2 : 0065008b
  plla_ctrl3 : 000000f1

PLLA Set


U-Boot 2015.10-tld-2 (Oct 21 2017 - 22:00:02 -0700)
OXNAS OX820

DRAM:  128 MiB
NAND:  128 MiB
In:    serial
Out:   serial
Err:   serial
Net:   dwmac.40400000
Speed: 1000, full duplex
Using dwmac.40400000 device
ping failed; host 10.1.1.1 is not alive
Speed: 1000, full duplex
Using dwmac.40400000 device
host 10.1.1.1 is alive
Speed: 1000, full duplex
Using dwmac.40400000 device
host 10.1.1.1 is alive
Speed: 1000, full duplex
Using dwmac.40400000 device
host 10.1.1.1 is alive
Speed: 1000, full duplex
Using dwmac.40400000 device
host 10.1.1.1 is alive
Uncompressing Linux... done, booting the kernel.
[    0.000000][    T0] Booting Linux on physical CPU 0x0
[    0.000000][    T0] Linux version 5.4.224-oxnas-tld-1 (root@tldDebian) (gcc version 10.2.1 20210110 (Debian 10.2.1-6)) #1.0 SMP PREEMPT Mon Nov 14 15:48:26 PST 2022
[    0.000000][    T0] CPU: ARMv6-compatible processor [410fb025] revision 5 (ARMv7), cr=00c5787d
[    0.000000][    T0] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000][    T0] OF: fdt: Machine model: Pogoplug V3 Pro
[    0.000000][    T0] Memory policy: Data cache writealloc
[    0.000000][    T0] percpu: Embedded 15 pages/cpu s30636 r8192 d22612 u61440
[    0.000000][    T0] Built 1 zonelists, mobility grouping on.  Total pages: 32512
[    0.000000][    T0] Kernel command line: console=ttyS0,115200 root=LABEL=rootfs rootdelay=10 mtdparts=41000000.nand-controller:14m(boot),-(data)
[    0.000000][    T0] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000][    T0] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[    0.000000][    T0] mem auto-init: stack:off, heap alloc:on, heap free:off
[    0.000000][    T0] Memory: 110676K/131072K available (8192K kernel code, 234K rwdata, 2068K rodata, 1024K init, 258K bss, 20396K reserved, 0K cma-reserved)
[    0.000000][    T0] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000][    T0] rcu: Preemptible hierarchical RCU implementation.
[    0.000000][    T0]  Tasks RCU enabled.
[    0.000000][    T0] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000][    T0] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[    0.000000][    T0] FPGA IRQ chip 0 "interrupt-controller" @ (ptrval), 32 irqs, parent IRQ: 16
[    0.000020][    T0] sched_clock: 24 bits at 390kHz, resolution 2560ns, wraps every 21474835200ns
[    0.000051][    T0] clocksource: oxnas_rps_clocksource_timer: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 19112603332 ns
[    0.000066][    T0] timer_oxnas_rps: Registered clocksource rate 390625Hz
[    0.000094][    T0] timer_oxnas_rps: Registered clock event rate 6250000Hz prescaler 0 period 62500
[    0.000256][    T0] GIC: PPI13 is secure or misconfigured
[    0.000314][    T0] GIC: PPI13 is secure or misconfigured
[    0.000634][    T0] Console: colour dummy device 80x30
[    0.000711][    T0] Calibrating delay loop... 339.14 BogoMIPS (lpj=1695744)
[    0.050403][    T0] pid_max: default: 32768 minimum: 301
[    0.051115][    T0] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.051141][    T0] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.051412][    T0] *** VALIDATE tmpfs ***
[    0.052769][    T0] *** VALIDATE proc ***
[    0.052989][    T0] *** VALIDATE cgroup1 ***
[    0.053015][    T0] *** VALIDATE cgroup2 ***
[    0.053212][    T0] CPU: Testing write buffer coherency: ok
[    0.055677][    T1] Setting up static identity map for 0x60100000 - 0x60100038
[    0.056010][    T1] rcu: Hierarchical SRCU implementation.
[    0.057044][    T1] smp: Bringing up secondary CPUs ...
[    0.058337][    T0] GIC: PPI13 is secure or misconfigured
[    0.110668][    T1] smp: Brought up 1 node, 2 CPUs
[    0.110707][    T1] SMP: Total of 2 processors activated (678.29 BogoMIPS).
[    0.111892][    T1] devtmpfs: initialized
[    0.118371][    T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.118435][    T1] futex hash table entries: 512 (order: 2, 16384 bytes, linear)
[    0.118589][    T1] prandom: seed boundary self test passed
[    0.122892][    T1] prandom: 100 self tests passed
[    0.122913][    T1] pinctrl core: initialized pinctrl subsystem
[    0.125012][    T1] NET: Registered protocol family 16
[    0.126796][    T1] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.129912][    T1] cpuidle: using governor ladder
[    0.129991][    T1] cpuidle: using governor menu
[    6.082112][    T1] vgaarb: loaded
[    6.083072][    T1] SCSI subsystem initialized
[    6.083921][    T1] usbcore: registered new interface driver usbfs
[    6.084078][    T1] usbcore: registered new interface driver hub
[    6.084160][    T1] usbcore: registered new device driver usb
[    6.084462][    T1] pps_core: LinuxPPS API ver. 1 registered
[    6.084480][    T1] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    6.084513][    T1] PTP clock support registered
[    6.086236][    T1] clocksource: Switched to clocksource oxnas_rps_clocksource_timer
[    6.086292][    T1] *** VALIDATE bpf ***
[    6.086809][    T1] *** VALIDATE ramfs ***
[    6.106667][    T1] NET: Registered protocol family 2
[    6.106928][    T1] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    6.107947][    T1] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    6.108267][    T1] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    6.108318][    T1] TCP bind hash table entries: 1024 (order: 1, 8192 bytes, linear)
[    6.108354][    T1] TCP: Hash tables configured (established 1024 bind 1024)
[    6.108595][    T1] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    6.108651][    T1] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    6.108966][    T1] NET: Registered protocol family 1
[    6.110220][    T1] RPC: Registered named UNIX socket transport module.
[    6.110246][    T1] RPC: Registered udp transport module.
[    6.110259][    T1] RPC: Registered tcp transport module.
[    6.110272][    T1] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    6.110295][    T1] PCI: CLS 0 bytes, default 32
[    6.110753][    T1] Unpacking initramfs...
[    6.845189][    T1] Freeing initrd memory: 6288K
[    6.846673][    T1] Initialise system trusted keyrings
[    6.846801][    T1] Key type blacklist registered
[    6.847234][    T1] workingset: timestamp_bits=30 max_order=15 bucket_order=0
[    6.860152][    T1] zbud: loaded
[    6.865914][    T1] NFS: Registering the id_resolver key type
[    6.866004][    T1] Key type id_resolver registered
[    6.866022][    T1] Key type id_legacy registered
[    6.866053][    T1] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    6.866071][    T1] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
[    6.866086][    T1] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[    6.867765][    T1] fuse: init (API version 7.31)
[    6.867985][    T1] *** VALIDATE fuse ***
[    6.868016][    T1] *** VALIDATE fuse ***
[    6.868986][    T1] orangefs_debugfs_init: called with debug mask: :none: :0:
[    6.869347][    T1] orangefs_init: module version upstream loaded
[    6.918599][    T1] Key type asymmetric registered
[    6.918632][    T1] Asymmetric key parser 'x509' registered
[    6.918727][    T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[    6.926295][    T1] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[    6.927918][    T1] printk: console [ttyS0] disabled
[    6.928089][    T1] 44200000.serial: ttyS0 at MMIO 0x44200000 (irq = 55, base_baud = 390625) is a 16550A
[    7.566059][    T1] printk: console [ttyS0] enabled
[    7.586265][    T1] loop: module loaded
[    7.592476][    T1] scsi host0: sata_oxnas
[    7.597358][    T1] ata1: SATA max UDMA/133 irq 57
[    7.602268][   T75] sata_oxnas: resetting SATA core
[    7.666593][    T1] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it
[    7.676006][    T1] ONFI parameter recovery failed, aborting
[    7.681715][    T1] nand: device found, Manufacturer ID: 0xad, Chip ID: 0xf1
[    7.688770][    T1] nand: Hynix NAND 128MiB 3,3V 8-bit
[    7.693890][    T1] nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    7.702228][    T1] Scanning device for bad blocks
[    7.788794][    T1] 2 cmdlinepart partitions found on MTD device 41000000.nand-controller
[    7.796974][    T1] Creating 2 MTD partitions on "41000000.nand-controller":
[    7.804001][    T1] 0x000000000000-0x000000e00000 : "boot"
[    7.810485][    T1] 0x000000e00000-0x000008000000 : "data"
[    7.818137][    T1] oxnas-dwmac 40400000.ethernet: IRQ eth_lpi not found
[    7.825021][    T1] oxnas-dwmac 40400000.ethernet: PTP uses main clock
[    7.831608][    T1] oxnas-dwmac 40400000.ethernet: no reset control found
[    7.906677][    T1] oxnas-dwmac 40400000.ethernet: User ID: 0x12, Synopsys ID: 0x35
[    7.914321][    T1] oxnas-dwmac 40400000.ethernet:   DWMAC1000
[    7.920102][    T1] oxnas-dwmac 40400000.ethernet: DMA HW capability register supported
[    7.928107][    T1] oxnas-dwmac 40400000.ethernet: RX Checksum Offload Engine supported
[    7.936079][    T1] oxnas-dwmac 40400000.ethernet: COE Type 2
[    7.941829][    T1] oxnas-dwmac 40400000.ethernet: TX Checksum insertion supported
[    7.948116][   T75] ata1: SATA link down (SStatus 0 SControl 310)
[    7.949396][    T1] oxnas-dwmac 40400000.ethernet: Wake-Up On Lan supported
[    7.962393][    T1] oxnas-dwmac 40400000.ethernet: Enhanced/Alternate descriptors
[    7.969873][    T1] oxnas-dwmac 40400000.ethernet: Enabled extended descriptors
[    7.977185][    T1] oxnas-dwmac 40400000.ethernet: Ring mode enabled
[    7.983513][    T1] oxnas-dwmac 40400000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    7.992064][    T1] oxnas-dwmac 40400000.ethernet: device MAC address 72:2b:2e:7a:56:4a
[    8.044894][    T1] mdio_bus stmmac-0:00: attached PHY driver [unbound] (mii_bus:phy_addr=stmmac-0:00, irq=POLL)
[    8.055144][    T1] mdio_bus stmmac-0:03: attached PHY driver [unbound] (mii_bus:phy_addr=stmmac-0:03, irq=POLL)
[    8.066493][    T1] PPP generic driver version 2.4.2
[    8.071854][    T1] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    8.079170][    T1] ehci-pci: EHCI PCI platform driver
[    8.286295][    T1] oxnas-ehci 40200100.ehci: EHCI Host Controller
[    8.292561][    T1] oxnas-ehci 40200100.ehci: new USB bus registered, assigned bus number 1
[    8.301199][    T1] oxnas-ehci 40200100.ehci: irq 52, io mem 0x40200100
[    8.336296][    T1] oxnas-ehci 40200100.ehci: USB 2.0 started, EHCI 1.00
[    8.343303][    T1] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
[    8.352296][    T1] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    8.360217][    T1] usb usb1: Product: EHCI Host Controller
[    8.365767][    T1] usb usb1: Manufacturer: Linux 5.4.224-oxnas-tld-1 ehci_hcd
[    8.372989][    T1] usb usb1: SerialNumber: 40200100.ehci
[    8.379386][    T1] hub 1-0:1.0: USB hub found
[    8.383895][    T1] hub 1-0:1.0: 2 ports detected
[    8.389872][    T1] usbcore: registered new interface driver cdc_wdm
[    8.396492][    T1] usbcore: registered new interface driver usb-storage
[    8.403397][    T1] usbcore: registered new interface driver usbserial_generic
[    8.410688][    T1] usbserial: USB Serial support registered for generic
[    8.418048][    T1] mousedev: PS/2 mouse device common for all mice
[    8.424422][    T1] i2c /dev entries driver
[    8.430187][    T1] hidraw: raw HID events driver (C) Jiri Kosina
[    8.436523][    T1] usbcore: registered new interface driver usbhid
[    8.442762][    T1] usbhid: USB HID core driver
[    8.449259][    T1] NET: Registered protocol family 10
[    8.456340][    T1] Segment Routing with IPv6
[    8.460851][    T1] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    8.469806][    T1] NET: Registered protocol family 17
[    8.475353][    T1] Key type dns_resolver registered
[    8.481052][    T1] registered taskstats version 1
[    8.485831][    T1] Loading compiled-in X.509 certificates
[    8.491627][    T1] zswap: loaded using pool lzo/zbud
[    8.497351][    T1] Key type ._fscrypt registered
[    8.502044][    T1] Key type .fscrypt registered
[    8.596016][    T1] Key type big_key registered
[    8.639994][    T1] Key type encrypted registered
[    8.676318][   T21] usb 1-1: new high-speed USB device number 2 using oxnas-ehci
[    8.878182][   T21] usb 1-1: New USB device found, idVendor=05e3, idProduct=0608, bcdDevice=77.64
[    8.887098][   T21] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    8.894924][   T21] usb 1-1: Product: USB2.0 Hub
[    8.899601][   T87] oxnas-pcie 47c00000.pcie: PCIe version/deviceID 0x82510b5
[    8.907220][   T87] oxnas-pcie 47c00000.pcie: link up
[    8.912849][   T87] oxnas-pcie 47c00000.pcie: PCI host bridge to bus 0000:00
[    8.920230][   T21] hub 1-1:1.0: USB hub found
[    8.924736][   T87] pci_bus 0000:00: root bus resource [mem 0x48000000-0x49ffffff]
[    8.932392][   T21] hub 1-1:1.0: 4 ports detected
[    8.937185][   T87] pci_bus 0000:00: root bus resource [mem 0x4a000000-0x4bdfffff pref]
[    8.945277][   T87] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff]
[    8.952916][   T87] pci_bus 0000:00: root bus resource [bus 00-7f]
[    8.959278][   T87] pci 0000:00:00.0: [1814:3090] type 00 class 0x028000
[    8.966100][   T87] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x0000ffff]
[    8.975088][   T87] PCI: bus0: Fast back to back transfers disabled
[    8.981447][   T87] pci 0000:00:00.0: BAR 0: assigned [mem 0x48000000-0x4800ffff]
[    8.989752][    T1] hctosys: unable to open rtc device (rtc0)
[    8.997611][    T1] Freeing unused kernel memory: 1024K
[    9.004121][    T1] Checked W+X mappings: passed, no W+X pages found
[    9.010508][    T1] Run /init as init process
Loading, please wait...
Starting version 247.3-7+deb11u1
[    9.286336][   T87] usb 1-1.3: new high-speed USB device number 3 using oxnas-ehci
[    9.518727][   T87] usb 1-1.3: New USB device found, idVendor=152d, idProduct=0578, bcdDevice= 2.04
[    9.536345][   T87] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    9.544335][   T87] usb 1-1.3: Product: External USB 3.0
[    9.566359][   T87] usb 1-1.3: Manufacturer: JMicron
[    9.571317][   T87] usb 1-1.3: SerialNumber: 201703310007F
[    9.695828][  T117] scsi host1: uas
[    9.704294][   T26] scsi 1:0:0:0: Direct-Access     External USB3.0           0204 PQ: 0 ANSI: 6
[    9.714342][  T117] usbcore: registered new interface driver uas
[    9.730552][    T7] sd 1:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[    9.752389][    T7] sd 1:0:0:0: [sda] 4096-byte physical blocks
[    9.761305][    T7] sd 1:0:0:0: [sda] Write Protect is off
[    9.778920][    T7] sd 1:0:0:0: [sda] Disabling FUA
[    9.783810][    T7] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    9.831672][    T7] sd 1:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[   10.358120][    T7]  sda: sda1 sda2
[   10.369431][    T7] sd 1:0:0:0: [sda] Attached SCSI disk
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Begin: Will now check root file system ... fsck from util-linux 2.36.1
[/sbin/fsck.ext3 (1) -- /dev/sda1] fsck.ext3 -a -C0 /dev/sda1 
rootfs: recovering journal
rootfs: clean, 21864/1048576 files, 382514/4194304 blocks
done.
[   21.007705][  T153] EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
[   21.078942][  T153] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[   21.736583][    T1] systemd[1]: System time before build time, advancing clock.
[   21.779799][    T1] systemd[1]: Inserted module 'autofs4'
[   21.808593][    T1] cgroup2: Unknown parameter 'memory_recursiveprot'
[   21.841556][    T1] systemd[1]: systemd 247.3-7+deb11u1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[   21.872317][    T1] systemd[1]: Detected architecture arm.

Welcome to Debian GNU/Linux 11 (bullseye)!

[   21.917760][    T1] systemd[1]: Set hostname to <PogoplugPro2>.
[   23.663196][    T1] systemd[1]: Queued start job for default target Graphical Interface.
[   23.674874][    T1] random: systemd: uninitialized urandom read (16 bytes read)
[   23.682769][    T1] systemd[1]: system-getty.slice: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
[   23.695884][    T1] systemd[1]: (This warning is only shown for the first unit using IP firewalling.)
[   23.707192][    T1] systemd[1]: Created slice system-getty.slice.
[  OK  ] Created slice system-getty.slice.
[   23.746636][    T1] random: systemd: uninitialized urandom read (16 bytes read)
[   23.755328][    T1] systemd[1]: Created slice system-modprobe.slice.
[  OK  ] Created slice system-modprobe.slice.
[   23.796715][    T1] random: systemd: uninitialized urandom read (16 bytes read)
[   23.805427][    T1] systemd[1]: Created slice system-serial\x2dgetty.slice.
[  OK  ] Created slice system-serial\x2dgetty.slice.
[   23.847777][    T1] systemd[1]: Created slice User and Session Slice.
[  OK  ] Created slice User and Session Slice.
[   23.887395][    T1] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Dispatch Password …ts to Console Directory Watch.
[   23.927339][    T1] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[  OK  ] Started Forward Password R…uests to Wall Directory Watch.
[   23.969464][    T1] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[  OK  ] Set up automount Arbitrary…s File System Automount Point.
[   24.017303][    T1] systemd[1]: Reached target Local Encrypted Volumes.
[  OK  ] Reached target Local Encrypted Volumes.
[   24.057548][    T1] systemd[1]: Reached target Paths.
[  OK  ] Reached target Paths.
[   24.097187][    T1] systemd[1]: Reached target Slices.
[  OK  ] Reached target Slices.
[   24.137052][    T1] systemd[1]: Reached target Swap.
[  OK  ] Reached target Swap.
[   24.179507][    T1] systemd[1]: Listening on RPCbind Server Activation Socket.
[  OK  ] Listening on RPCbind Server Activation Socket.
[   24.217845][    T1] systemd[1]: Listening on fsck to fsckd communication Socket.
[  OK  ] Listening on fsck to fsckd communication Socket.
[   24.257431][    T1] systemd[1]: Listening on initctl Compatibility Named Pipe.
[  OK  ] Listening on initctl Compatibility Named Pipe.
[   24.332382][    T1] systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
[   24.342855][    T1] systemd[1]: Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket (/dev/log).
[   24.378245][    T1] systemd[1]: Listening on Journal Socket.
[  OK  ] Listening on Journal Socket.
[   24.418401][    T1] systemd[1]: Listening on udev Control Socket.
[  OK  ] Listening on udev Control Socket.
[   24.457832][    T1] systemd[1]: Listening on udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[   24.497912][    T1] systemd[1]: Condition check resulted in Huge Pages File System being skipped.
[   24.513167][    T1] systemd[1]: Mounting POSIX Message Queue File System...
         Mounting POSIX Message Queue File System...
[   24.551767][    T1] systemd[1]: Mounting RPC Pipe File System...
         Mounting RPC Pipe File System...
[   24.592724][    T1] systemd[1]: Mounting Kernel Debug File System...
         Mounting Kernel Debug File System...
[   24.627630][    T1] systemd[1]: Condition check resulted in Kernel Trace File System being skipped.
[   24.637568][    T1] systemd[1]: Condition check resulted in Kernel Module supporting RPCSEC_GSS being skipped.
[   24.655569][    T1] systemd[1]: Starting Create list of static device nodes for the current kernel...
         Starting Create list of st…odes for the current kernel...
[   24.697715][    T1] random: systemd: uninitialized urandom read (16 bytes read)
[   24.711132][    T1] systemd[1]: Starting Load Kernel Module configfs...
         Starting Load Kernel Module configfs...
[   24.747681][    T1] random: systemd: uninitialized urandom read (16 bytes read)
[   24.760952][    T1] systemd[1]: Starting Load Kernel Module drm...
         Starting Load Kernel Module drm...
[   24.797619][    T1] random: systemd: uninitialized urandom read (16 bytes read)
[   24.811046][    T1] systemd[1]: Starting Load Kernel Module fuse...
         Starting Load Kernel Module fuse...
[   24.850634][    T1] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[   24.861227][    T1] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
[   24.877831][    T1] systemd[1]: Starting Journal Service...
         Starting Journal Service...
[   24.928330][    T1] systemd[1]: Starting Load Kernel Modules...
         Starting Load Kernel Modules...
[   24.953822][    T1] systemd[1]: Starting Remount Root and Kernel File Systems...
         Starting Remount Root and Kernel File Systems...
[   25.014200][    T1] systemd[1]: Starting Coldplug All udev Devices...
         Starting Coldplug All udev Devices...
[   25.166630][  T193] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
[   25.204239][    T1] systemd[1]: Mounted POSIX Message Queue File System.
[  OK  ] Mounted POSIX Message Queue File System.
[   25.258641][    T1] systemd[1]: Mounted RPC Pipe File System.
[  OK  ] Mounted RPC Pipe File System.
[   25.279206][    T1] systemd[1]: Mounted Kernel Debug File System.
[  OK  ] Mounted Kernel Debug File System.
[   25.300746][    T1] systemd[1]: Finished Create list of static device nodes for the current kernel.
[  OK  ] Finished Create list of st… nodes for the current kernel.
[   25.385292][    T1] systemd[1]: modprobe@configfs.service: Succeeded.
[   25.394229][    T1] systemd[1]: Finished Load Kernel Module configfs.
[  OK  ] Finished Load Kernel Module configfs.
[   25.438056][    T1] systemd[1]: Started Journal Service.
[  OK  ] Started Journal Service.
[  OK  ] Finished Load Kernel Module drm.
[  OK  ] Finished Load Kernel Module fuse.
[  OK  ] Finished Load Kernel Modules.
[  OK  ] Finished Remount Root and Kernel File Systems.
         Mounting FUSE Control File System...
         Mounting Kernel Configuration File System...
         Starting pNFS block layout mapping daemon...
[   25.725196][  T189] random: systemd-journal: uninitialized urandom read (16 bytes read)
[   25.748167][    T1] random: systemd: uninitialized urandom read (16 bytes read)
         Starting Flush Journal to Persistent Storage...
[   25.787929][    T1] random: systemd: uninitialized urandom read (16 bytes read)
         Starting Load/Save Random Seed...
         Starting Apply Kernel Variables...
         Starting Create System Users...
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Mounted Kernel Configuration File System.
[   26.048737][  T189] systemd-journald[189]: Received client request to flush runtime journal.
[  OK  ] Started pNFS block layout mapping daemon.
[   26.105571][  T189] systemd-journald[189]: File /var/log/journal/c4334d5b2fba8bce5294b56551baee86/system.journal corrupted or uncleanly shut down, renaming and replacing.
[  OK  ] Finished Apply Kernel Variables.
[  OK  ] Finished Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Finished Coldplug All udev Devices.

[  OK  ] Finished Create Static Device Nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
         Mounting /tmp...
         Starting Rule-based Manage…for Device Events and Files...
[  OK  ] Mounted /tmp.
[  OK  ] Reached target Local File Systems.
[   26.727349][    T1] random: systemd: uninitialized urandom read (16 bytes read)
         Starting Preprocess NFS configuration...
[  OK  ] Finished Flush Journal to Persistent Storage.
[  OK  ] Finished Preprocess NFS configuration.
[  OK  ] Reached target NFS client services.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting Create Volatile Files and Directories...
[  OK  ] Started Rule-based Manager for Device Events and Files.
[  OK  ] Finished Create Volatile Files and Directories.
         Starting Update UTMP about System Boot/Shutdown...
[   28.125923][  T213] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[  OK  ] Found device /dev/ttyS0.
[  OK  ] Finished Update UTMP about System Boot/Shutdown.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Daily apt download activities.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Started Periodic ext4 Onli…ata Check for All Filesystems.
[  OK  ] Started Discard unused blocks once a week.
[  OK  ] Started Daily man-db regeneration.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timers.
[  OK  ] Listening on Avahi mDNS/DNS-SD Stack Activation Socket.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting Avahi mDNS/DNS-SD Stack...
         Starting LSB: Starts syslogd...
[  OK  ] Started D-Bus System Message Bus.
         Starting Remove Stale Onli…t4 Metadata Check Snapshots...
         Starting Network Time Service...
         Starting /etc/rc.local Compatibility...
         Starting Self Monitoring a…g Technology (SMART) Daemon...
         Starting OpenBSD Secure Shell server...
[   29.653588][  T213] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
         Starting User Login Management...
[   29.701473][  T128] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[   29.715768][  T128] cfg80211: failed to load regulatory.db
         Starting Permit User Sessions...
[  OK  ] Started /etc/rc.local Compatibility.
[   29.928025][  T213] rt2800pci 0000:00:00.0: enabling device (0140 -> 0142)
[   29.971699][  T213] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3090, rev 3213 detected
[   30.000599][  T213] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0005 detected
[   30.116119][  T218] rt2800pci 0000:00:00.0 wlp0s0: renamed from wlan0
[  OK  ] Started N268] proc: Bad value for 'hidepid'
1;39mPermit User Sessions.
[   31.036380][    C1] random: crng init done
[   31.040494][    C1] random: 123 urandom warning(s) missed due to ratelimiting
[  OK  ] Finished Load/Save Random Seed.
[  OK  ] Started LSB: Starts syslogd.
[  OK  ] Started Avahi mDNS/DNS-SD Stack.
[  OK  ] Started Self Monitoring an…ing Technology (SMART) Daemon.
[  OK  ] Started User Login Management.
[  OK  ] Listening on Load/Save RF …itch Status /dev/rfkill Watch.
         Starting LSB: Starts klogd...
[  OK  ] Started Getty on tty1.
         Starting Netatalk AFP fileserver for Macintosh clients...
[  OK  ] Started Serial Getty on ttyS0.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started OpenBSD Secure Shell server.
[  OK  ] Finished Remove Stale Onli…ext4 Metadata Check Snapshots.
[  OK  ] Started LSB: Starts klogd.
         Starting Load/Save RF Kill Switch Status...
[  OK  ] Started Netatalk AFP fileserver for Macintosh clients.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Load/Save RF Kill Switch Status.
[  OK  ] Finished Update UTMP about System Runlevel Changes.

Debian GNU/Linux 11 PogoplugPro2 ttyS0

PogoplugPro2 login:
bluzfanmr1.

I think because you are running systemd, it is harder to get evrything going. My basic rootfs is running sysvinit.

However, I can see that the network is not running. You don't have Ethernet.

> PogoplugPro2 login:

Log in through serial console at this prompt and,

cat /etc/initramfs-tools/modules
cat /etc/network/interfaces
ifconfig -a

And then
ifdown eth0
ifup eth0
Note that you might not have eth0 when you are running systemd. It is likely named something else.

=====

Try that, and there is another step that I can show you after I see the info above.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)



Edited 2 time(s). Last edit at 03/15/2023 06:38PM by bodhi.
System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus

I've just walked this route, it's the other way around. He is using sysv.

You get this message when you try to use systemctl tool to start things, if you booted with sysv.
But, it doesn't show up in the logs posted anywhere, so need more evidence to sort out the problem.

In my case it was because
- A program required systemd
- I was installing with APT
- Program installs fine, since systemd is available, just not used..
- Apt tries to start/stop/restart services since it thinks we have systemd it runs the systemctl tool.

In this case i don't understand
Quote

The message above starts after I create a swap file and reboot the machine. All this is done via ssh.
How are you making the swap file? Where, and what are your steps to do this?
Does it show the error at swapon ? mkswap? fallocate?

You wrote you update the kernel, did this all go fine, and did you reboot a few times to check afterwards?
I suggest to break it down in steps. Working situation? Reboot few times, all is ok? Upgrade kernel. Reboot few times, check if it's ok.
In this case it could be kernel upgrade breaks, but you don't reboot until after the swap file.. :)
I got pulled away from things but am about to work on this. I will post my findings shortly. Thank you!
Asure Wrote:
-------------------------------------------------------
>
> System has not been booted with systemd as init
> system (PID 1). Can't operate. Failed to connect
> to bus
>
>
> I've just walked this route, it's the other way
> around. He is using sysv.
>
> You get this message when you try to use systemctl
> tool to start things, if you booted with sysv.
> But, it doesn't show up in the logs posted
> anywhere, so need more evidence to sort out the
> problem.
>
> In my case it was because
> - A program required systemd
> - I was installing with APT
> - Program installs fine, since systemd is
> available, just not used..
> - Apt tries to start/stop/restart services since
> it thinks we have systemd it runs the systemctl
> tool.
>
> In this case i don't understand
>
Quote

The message above starts after I create a
> swap file and reboot the machine. All this is done
> via ssh.
> How are you making the swap file? Where, and what
> are your steps to do this?
> Does it show the error at swapon ? mkswap?
> fallocate?
>
> You wrote you update the kernel, did this all go
> fine, and did you reboot a few times to check
> afterwards?
> I suggest to break it down in steps. Working
> situation? Reboot few times, all is ok? Upgrade
> kernel. Reboot few times, check if it's ok.
> In this case it could be kernel upgrade breaks,
> but you don't reboot until after the swap file..
> :)

My experience is similar to yours. I don't believe it has anything to do with the swap file. I DO believe it starts after I install Netatalk. I didn't do anything other than install Netatalk and I did not install anything else or change anything else like sysvinit or systemv. I just used the instructions here with default settings, other than installing Netatalk. You are right that I had to restart Netatalk with either systemctl, which didn't work and I had to use
service netatalk restart
Once I do that I got that error message and can no longer log in via SSH. Like I said yesterday, I CAN log in via serial console without issue.

Just wanted to get this out there. I am about to work on bodhi's suggestions now and will post my findings.

Thanks for the help.



Edited 3 time(s). Last edit at 03/16/2023 03:46PM by bluzfanmr1.
bodhi Wrote:
-------------------------------------------------------
> bluzfanmr1.
>
> I think because you are running systemd, it is
> harder to get evrything going. My basic rootfs is
> running sysvinit.
>

I am using your rootfs and did not change this. Do you know how this would happen?
bodhi Wrote:
-------------------------------------------------------
> bluzfanmr1.
>
> I think because you are running systemd, it is
> harder to get evrything going. My basic rootfs is
> running sysvinit.
>
> However, I can see that the network is not
> running. You don't have Ethernet.
>
> > PogoplugPro2 login:
>
> Log in through serial console at this prompt and,
>
>
> cat /etc/initramfs-tools/modules
> cat /etc/network/interfaces
> ifconfig -a
>
>
> And then
>
> ifdown eth0
> ifup eth0
>
> Note that you might not have eth0 when you are
> running systemd. It is likely named something
> else.
>
> =====
>
> Try that, and there is another step that I can
> show you after I see the info above.

I followed your commands and that worked! I was able to log in via ssh! Did you need to see any of the results? That last command caused the ethernet to come up. I really think this happens after I install Netatalk, so I'm wondering if that's the cause? I did not have ethernet issues and could log in via ssh before installing Netatalk.

The question now is how do I make this permanent?

root@PogoplugPro2:~# ifdown eth0
ifdown: interface eth0 not configured
root@PogoplugPro2:~# ifup eth0
[  350.094814][  T342] oxnas-dwmac 40400000.ethernet eth0: PHY [stmmac-0:00] driver [Generic PHY]
[  350.108526][  T342] oxnas-dwmac 40400000.ethernet eth0: No Safety Features support found
[  350.116782][  T342] oxnas-dwmac 40400000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[  350.128261][  T342] oxnas-dwmac 40400000.ethernet eth0: registered PTP clock
[  350.135813][  T342] oxnas-dwmac 40400000.ethernet eth0: configuring for phy/rgmii link mode
root@PogoplugPro2:~# [  353.277288][  T295] oxnas-dwmac 40400000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[  353.286251][  T295] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready



Edited 1 time(s). Last edit at 03/16/2023 05:16PM by bluzfanmr1.
bluzfanmr1,

> The question now is how do I make this permanent?

You might not need to do ifdown/up if you can avoid the problem with systemd started running somehow.

Instead of using the above commands to start Netatalk, you can do the sysv initscripts way.
/etc/init.d/netatalk restart
See if it will behave the same way.

If you still have problem, then perhaps remove netatalk and then reinstall it. While installing, keep the log so we can see whether systemd plays a part in it.

I would reboot right after installing netatalk, and then see if you have network and can ssh in.

======

To force the network up (as a workaround only), you would add these to the beginning of /etc/rc.local
ifdown eth0
ifup eth0

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
I am experiencing a similar issue.

Operating System: Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-21-amd64

I have a headless mini-PC running Deb 11 that is used as a router and also for running various services inside LXC containers.

It all works OK, except after a few days or maybe a few weeks go by, ssh will stop working, and when I try running "systemctl restart ssh" I get the message "System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus: Host is down"

It's completely random. For example I was working away on the machine last night without any issues, I go to bed and when I wake up in the morning I could keep using my previous ssh connection left open in a terminal window, and only notice there's a problem when I try a new connection from somewhere else and ssh is refused (I have several machines - none with this issue). On the still active ssh connection, I try "systemctl restart ssh" but no luck, I get the "System has not ..." error. If I run /etc/init.d/ssh restart that will get ssh working again but it does not solve the underlying problem.

Rebooting is my temporary solution, but rebooting safely becomes difficult because the usual commands will not work without systemd functioning correctly so I end up doing a hard reset. On reboot everything is fine again until days maybe weeks later.

Anyway, I've been digging high and low to see who else has this problem and if anyone has found a solution. No luck so far.

I suppose I'll have to perform a full wipe and installation from scratch and hope it fixes whatever has gone wrong. It could be faulty hardware, I just have no idea. All I know is nothing else seems to break. For example I can connect to one of the LXC containers running on the same broken machine, and it's all good, including systemd. All my other machines work fine, I've been doing this stuff for years and have never seen this problem before.
robertt,

> Operating System: Debian GNU/Linux 11 (bullseye)
> Kernel: Linux 5.10.0-21-amd64

Thanks for the info. That's the mainline kernel and you are running systemd, not sysvinit I suppose?. If that's true this is a Debian problem.

But that error is suspicious. On this box, did you make sure that systemd is PID 1. Use htop or top to see this info. I would use htop and display-by-thread.

Whar bluzfanmr1 experienced above might be a different but related issue.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
This what top says - but I've reset the device since reporting the problem here, so ATM it's working correctly.

1 root 20 0 164224 10724 7980 S 0.0 0.0 0:01.30 systemd

I'm definitely running systemd rather than sysvinit.

To rule out installation weirdness I've also run ps -A | grep sysvinit and nothing shows up.

The next time it fails, I'll check to see if the process ID has changed or if systemd has dropped out. It could be a few days from now or a few weeks though. These kind of errors are always a time consuming process to track down and resolve.

What bothers me most is that no one else seems to have a similar problem, the OP in this forum is the closest match I've found so far, most reports of the issue revolve around WSL and Docker which appear to be unrelated.

In my case, the mini-PC is a new machine acquired in late Feb 2023, and my concern is there's a HW fault with it, but the kind of problem showing up does not suggest a HW fault, I'd expect other services to fail rather than only this one, and so far it's been very consistent affecting systemd and nothing else that I'm aware of. What's even weirder is that all of the LXC containers keep chugging along just fine, including systemd inside the containers. It's been only the host instance that is affected. If it was not for ssh refusing new connections on the host, I would not have noticed the problem until I running a systemctl command on the host.

The one thing I can think of that may have something to do with the problem, is I'm using a privileged LXC container with xfce4 and x2goserver installed. On my main workstation, using the x2go client I can connect to the LXC container with xfce4 as the interface to the whole thing, it simplifies configurations of all the LXC containers and it gets around having to install xfce4 on the host directly which I prefer not to do. If all I have to do is copy over an LXC container on to a minimal Debian host that has all I need pre-installed, that's a huge win for me. The technique is relatively new, but I've done the same thing on two other machines prior to this one and it's been working fine.

BTW all other containers are unprivileged except for the one I use to manage the entire operation.

I'd post a bug to let the Debian devs know about it, but I cannot reproduce it reliably and have no idea what conditions are responsible for it.

Any hints or suggestions to investigate are appreciated!
I've been unable to pin this down either. First I thought it was related to swap, then Netatalk. It does seem to be random. SSH works fine at first but stops within a few hours and seems to take out the ethernet availability. I don't understand why I would now have to do these special things just to get ethernet to work as that's never been the case before.
After the SSH lockout occurs, log in with serial console and see if there are recent change in ssh.

ls -lart /etc/ssh/

And if the timestamp shows there are recent updates, then look for anything fishy in these 2 files
sshd_config
ssh_config

Also see if sshd is running
ps -deaf | grep -i ssh
And ping the router (if is 192.168.0.1)
ping 192.168.0.1

If sshd is not running, or the ping fails, post or attach 2 log files here (/var/log/message is quite long).

dmesg
cat /var/log/message

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)



Edited 3 time(s). Last edit at 03/23/2023 03:34PM by bodhi.
I think I found the cause, at least in my case, but the exact reason why it happens is not yet known.

As weird as it may be, this is what happened (see below) and it even though I'm taking about LXC specifically, I think there's a root cause that can affect other situations that do not involve LXC containers.

Everything was operating normally, in fact I successfully ssh'd into the host machine just before sysemd + ssh broke.

After ssh'ing in I copied an LXC container using rsync from a remote machine to the host I ssh'd into. I'm lazy so I just copied over the raw container files in /var/lib/lxc/<containername>/ over to the different machine's /var/lib/lxc/<containername>/. I do this often without issues. In this case I forgot to shut it down which may be an important detail or maybe not (to be confirmed later), but I've also copied running containers before many times without issues.

Command is
rsync -Pav root@<machineip>:/var/lib/lxc/<containername>/ /var/lib/lxc/<containername>/

The rsync copy went as expected, nothing unusual, but when I checked the folder size taken up by the container, it was huge, about 2x what it is on the remote machine. I've vaguely remember seeing the 2x increase happen before and it is only on the machine that has the issue with systemd failing.

I tried to figure out what increased the file size, it appeared to be with the /usr/ and /var folders that doubled up, I figure the file counts and sizes were including linked folders or something.

Since the container size was 2x I decided to delete the whole thing and try a different method to copy the container.

But here's probably an important detail, I used the LXC container with xfce4 on it to perform all of this work. My long detailed post above explained what I am doing with this xfce4 container (it's a relatively new thing I started doing). What it does, it uses sshfs to mount the hosts root file system, that way the container will have access to the hosts file system to make it easy for me to manipulate things. Generally speaking it's been working really well but it may end up the cause of the problems I'm having.

So, using xfce4's Thunar on the LXC container, I selected the container folder (over the hosts sshfs mount), and told it to delete the folder, which it did although it complained that some folders could not be removed.

I had a bad feeling about doing the delete, and I thought maybe this is what's causes systemd to break. I remembered the last time systemd broke, I was copying over containers and setting up new ones, so maybe the folder deletion was responsible as I likely did delete at least one container folder before it happened that last time. I do not think it's Thunar, because I can remember deleting a container folder using rm -R from the command line, it's also unlikely to be due to sshfs because I would do the rm -R on the host via the ssh connection.

Anyway, after the deletion I had a bad feeling and quickly checked if things broke again, and sure enough running systemd I get "System has not been booted with systemd as init system (PID 1). Can't operate. Failed to connect to bus: Host is down"!

Also ssh is busted, I can no longer ssh in from another machine.
I can get ssh working again by not using systemctl and instead restart the old way:
/etc/init.d/ssh restart

It's very clear now what the actions were prior to the failure, and what is described I do not do very often, it explains why the system works fine for days on end before I notice a failure. It's because it is a new machine and it's been slowly being configured replacing an older machine with containers being moved over one or two at a time, so once in a while I'll do a big effort to copy over a few containers from the old machine to the new one and set them up, this is why it only happens once in a while.

BTW, I did not start the copied container, I only copied the files over then deleted them soon after before doing anything with it.

How can this affect situations that do not involve LXC? That's a question I cannot yet answer, but I suspect it's not an issue specific to LXC alone, for example there are reports that Docker is affected with a similar issue. It seems to be specific to the deletion of the container, but why? We know containers share files with the host, and this is likely what caused the problem somehow.

For your situation, are there any similarities you can think of that compare with my situation? I'd love to pin this item down to something more specific so a bug report can be filed.



Edited 4 time(s). Last edit at 03/23/2023 06:07PM by robertt.
Here are the commands your suggested I run

ls -lart /etc/ssh/

host.gw.h0:~# ls -lart /etc/ssh/
total 600
drwxr-xr-x 1 root root      0 Jul  1  2022 sshd_config.d
drwxr-xr-x 1 root root      0 Jul  1  2022 ssh_config.d
-rw-r--r-- 1 root root   1650 Jul  1  2022 ssh_config
-rw-r--r-- 1 root root 577771 Jul  1  2022 moduli
-rw-r--r-- 1 root root    561 Jan 14 19:48 ssh_host_rsa_key.pub
-rw------- 1 root root   2590 Jan 14 19:48 ssh_host_rsa_key
-rw-r--r-- 1 root root    169 Jan 14 19:48 ssh_host_ecdsa_key.pub
-rw------- 1 root root    492 Jan 14 19:48 ssh_host_ecdsa_key
-rw-r--r-- 1 root root     89 Jan 14 19:48 ssh_host_ed25519_key.pub
-rw------- 1 root root    399 Jan 14 19:48 ssh_host_ed25519_key
-rw-r--r-- 1 root root   3309 Jan 14 21:33 sshd_config
drwxr-xr-x 1 root root    344 Jan 14 21:33 .
drwxr-xr-x 1 root root   2962 Mar 22 11:51 ..


I do not see anything was changed.

host.gw.h0:~# ps -deaf | grep -i ssh
root        2073       1  0 Mar22 ?        00:00:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
100000      2537    2372  0 Mar22 ?        00:00:01 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
100000      2974    2474  0 Mar22 ?        00:00:01 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root        3394    2924  0 Mar22 ?        00:00:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
sshd        3746    2924  0 Mar22 ?        00:00:00 /usr/sbin/exim4 -bd -q30m
root        3785    2073  0 Mar22 ?        00:00:00 sshd: root@pts/3
100000      3984    3855  0 Mar22 ?        00:00:01 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
100000      4443    3997  0 Mar22 ?        00:00:01 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
100000      4883    4459  0 Mar22 ?        00:00:01 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
100000      6222    6069  0 Mar22 ?        00:00:08 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
100000      6565    6418  0 Mar22 ?        00:00:02 sshd: /usr/sbin/sshd -D [listener] 1 of 10-100 startups
root       16733    3394  0 Mar22 ?        00:07:55 sshd: root@notty
root       17540    2924  0 Mar22 ?        00:00:00 /usr/bin/ssh-agent -s
root       17642    2924  0 Mar22 ?        00:00:19 ssh -x -a -oClearAllForwardings=yes -ocompression=no -2 root@controlhost.internal -s sftp
root       17647    2073  0 Mar22 ?        00:00:16 sshd: root@notty
root       17713   17647  0 Mar22 ?        00:00:13 /usr/lib/openssh/sftp-server
root       17714    2924  0 Mar22 ?        00:00:37 sshfs root@controlhost.internal:/ /mnt/mount_controlhost/ -o reconnect,compression=no,follow_symlinks,transform_symlinks
root       17961   17907  0 Mar22 pts/5    00:00:00 ssh root@controlhost.internal
root       17964    2073  0 Mar22 ?        00:00:00 sshd: root@pts/9
root      531151  531036  0 Mar22 pts/7    00:00:00 ssh root@controlhost.internal
root      531152    2073  0 Mar22 ?        00:00:00 sshd: root@pts/11
100000    690050  689849  0 Mar22 ?        00:00:05 sshd: /usr/sbin/sshd -D [listener] 2 of 10-100 startups
root     1622342   17907  0 14:05 pts/4    00:00:02 ssh root@controlhost.internal
root     1622343    2073  0 14:05 ?        00:00:02 sshd: root@pts/12
100000   1688875  690050  0 15:09 ?        00:00:00 sshd: [accepted]
100000   1689596    6565  0 15:10 ?        00:00:00 sshd: [accepted]
100000   1690327  690050  0 15:11 ?        00:00:00 sshd: [accepted]
root     1690338 1622350  0 15:11 pts/12   00:00:00 grep -i ssh


Note that the ID 10000 are root processes running on unprivilaged LXC containers, however there's also one privileged containers running with root ID.

robertt@hrt8:~$ ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=0.831 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.215 ms


(works fine as expected since I'm still connected on the previous ssh connection)

I've made a backup copy of all the log files, a quick look at the tail end showed no obvious errors or anything unusual. I can post details later.

host.gw.h0:~# ps -A | grep systemd <- still running several containers so a lot shows up
1 ?        00:00:03 systemd
    338 ?        00:00:00 systemd-journal
   2372 ?        00:00:01 systemd
   2442 ?        00:00:02 systemd-journal
   2458 ?        00:00:00 systemd-udevd
   2474 ?        00:00:01 systemd
   2495 ?        00:00:00 systemd-logind
   2564 ?        00:00:00 systemd
   2755 ?        00:00:02 systemd-journal
   2894 ?        00:00:00 systemd-udevd
   2924 ?        00:00:01 systemd
   2942 ?        00:00:00 systemd-logind
   3025 ?        00:00:00 systemd
   3237 ?        00:00:00 systemd-journal
   3377 ?        00:00:00 systemd-logind
   3378 ?        00:00:00 systemd-machine
   3422 ?        00:00:00 systemd
   3424 ?        00:00:00 systemd
   3855 ?        00:00:01 systemd
   3927 ?        00:00:02 systemd-journal
   3943 ?        00:00:00 systemd-udevd
   3967 ?        00:00:00 systemd-logind
   3997 ?        00:00:01 systemd
   4030 ?        00:00:00 systemd
   4374 ?        00:00:02 systemd-journal
   4393 ?        00:00:00 systemd-udevd
   4416 ?        00:00:00 systemd-logind
   4459 ?        00:00:01 systemd
   4496 ?        00:00:00 systemd
   4831 ?        00:00:02 systemd-journal
   4846 ?        00:00:00 systemd-udevd
   4873 ?        00:00:00 systemd-logind
   4953 ?        00:00:00 systemd
   5290 ?        00:00:00 systemd
   6069 ?        00:00:02 systemd
   6167 ?        00:00:20 systemd-journal
   6181 ?        00:00:00 systemd-udevd
   6182 ?        00:00:02 systemd-network
   6210 ?        00:00:00 systemd-logind
   6418 ?        00:00:02 systemd
   6527 ?        00:00:08 systemd-journal
   6536 ?        00:00:00 systemd-udevd
   6539 ?        00:00:03 systemd-network
   6554 ?        00:00:00 systemd-logind
 689849 ?        00:00:04 systemd
 689987 ?        00:00:19 systemd-journal
 690001 ?        00:00:00 systemd-udevd
 690002 ?        00:01:30 systemd-network
 690047 ?        00:00:00 systemd-logind
1699735 ?        00:00:00 systemd-udevd

====
bodhi's edit: please use code tags (formatted code) to post log.



Edited 1 time(s). Last edit at 03/23/2023 05:41PM by bodhi.
Quote
bodhi's edit: please use code tags (formatted code) to post log.

OK, I will use the tags from now on, it looks much better that way.
> The rsync copy went as expected, nothing unusual,
> but when I checked the folder size taken up by the
> container, it was huge, about 2x what it is on the
> remote machine. I've vaguely remember seeing the
> 2x increase happen before and it is only on the
> machine that has the issue with systemd failing.

The fact that the container is 2x its source on the remote machine could be the /usr merge incermentally affect Debian. You might be running the old version of the binaries that have been migrated in Debian.

What is the Debian version for these containers? Are thay still bullseye or having some bookworm mixed in?

See the /usr merge in progress in bullseyes and will be required in bookworm
https://wiki.debian.org/UsrMerge

The tell is is if a binary foo exist on both folders as a real file (if the /usr merge is going on, /bin/foo should be a symlink).
ls -l /usr/bin
ls -l /bin

Might be a systemd bug.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Deb 11 version on the remote host that the existing proxy container runs on

root@host.gw:~# hostnamectl
   Static hostname: host.gw
         Icon name: computer-desktop
           Chassis: desktop
        Machine ID: aad73c1b4418460d8376b84cfcb6c1d1
           Boot ID: 92cffc5af66b4bc5b1571d18a5541e2b
  Operating System: Debian GNU/Linux 11 (bullseye)
            Kernel: Linux 5.10.0-19-amd64
      Architecture: x86-64

Deb 11 version that is installed on the remote proxy container that was copied over

root@proxy:~$ hostnamectl
   Static hostname: proxy
         Icon name: computer-container
           Chassis: container
        Machine ID: 93dbc710e79343bda4e1fd39b36d68cf
           Boot ID: 3c6fca9c06a64b3e90ecedb3040c9bbf
    Virtualization: lxc
  Operating System: Debian GNU/Linux 11 (bullseye)
            Kernel: Linux 5.10.0-19-amd64
      Architecture: x86-64

Deb 11 version that is on the new host, this is the one that is now messed up

host.gw.h0:~# hostnamectl
hostnamectl
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down

Try other methods ...

host.gw.h0:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye

host.gw.h0:~# uname -a
Linux host.gw.h0 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

Quote
The tell is is if a binary foo exist on both folders as a real file (if the /usr merge is going on, /bin/foo should be a symlink).

A check shows that on the remote host with older containers /bin is symlink

root@host.gw:~# ls -l /bin
lrwxrwxrwx 1 root root 7 Aug 28  2021 /bin -> usr/bin

/bin is also a symlink on the new host and also on the new containers created directly on it
host.gw.h0:~# ls -l /bin
lrwxrwxrwx 1 root root 7 Jan 14 19:44 /bin -> usr/bin

This is what I see on the remote host proxy container
root@proxy:~$ ls -l /bin
total 11976
-rwxr-xr-x 1 root root 1234376 Mar 27  2022 bash
-rwxr-xr-x 1 root root   38984 Jul 20  2020 bunzip2
-rwxr-xr-x 1 root root  715480 Jul 25  2021 busybox
-rwxr-xr-x 1 root root   38984 Jul 20  2020 bzcat
lrwxrwxrwx 1 root root       6 Jul 20  2020 bzcmp -> bzdiff
-rwxr-xr-x 1 root root    2225 Jul 20  2020 bzdiff
lrwxrwxrwx 1 root root       6 Jul 20  2020 bzegrep -> bzgrep

(truncated)

On the older container /bin is NOT a symlink, and probably explains why the files appeared to double up.

Edit: But how? All I can think of are there's other symlinks under the conatiners /bin/ that may point to a larger number of files than are on the remote host, or it got into a loop when checking - it did take a very long time to count the files and I killed the count.

What I'll do is make brand new containers rather than copy over the older ones.

In any case, it's a mystery why doing the rsync and the deletion put systemd into a bad state. I'll have to do experiments to recreate the rsync step first, then see if that alone broke systemd, then do a delete of the container folder as before and see if that breaks systemd. For all I know it was pure coincidence things broke until I can reproduce it reliably.

Trouble is I'm about to go away on a trip for a few weeks and won't have time to experiment, so this will have to wait until I get back. I also have to build a test system to reproduce the situation because it's too disruptive to experiment on the production host.



Edited 2 time(s). Last edit at 03/23/2023 07:12PM by robertt.
Quote

On the older container /bin is NOT a symlink, and probably explains why the files appeared to double up.

I think that probably mess up somehow. It is probably still an issue to be resolved in this "/usr merge"

Quote

Edit: But how? All I can think of are there's other symlinks under the conatiners /bin/ that may point to a larger number of files than are on the remote host, or it got into a loop when checking - it did take a very long time to count the files and I killed the count.

No, I think you need to check your rsync command. Can it replace the files with symlinks?

Quote

In any case, it's a mystery why doing the rsync and the deletion put systemd into a bad state.

It could, if systemd runs binaries older than when it assumes new version.

Quote

What I'll do is make brand new containers rather than copy over the older ones.

Good idea to prove this was the cause.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)



Edited 2 time(s). Last edit at 03/23/2023 07:40PM by bodhi.
Not sure if this will help but the only way I get SystemD to load correctly on my POGO was to install the systemd helper - systemd-sysv as well as systemd. That fixed the problem where systemd was started first with PID 1.

I use the pogo as NAS and gerbera media servers and they seem pretty stable until I break them :( - BobM
BobM Wrote:
-------------------------------------------------------
> Not sure if this will help but the only way I get
> SystemD to load correctly on my POGO was to
> install the systemd helper - systemd-sysv as well
> as systemd. That fixed the problem where systemd
> was started first with PID 1.
>
> I use the pogo as NAS and gerbera media servers
> and they seem pretty stable until I break them :(
> - BobM

Thanks for the tips, but these items were already installed. I believe my issues have been fixed and I will cover it in another post.
bodhi Wrote:
-------------------------------------------------------
> bluzfanmr1,
>
> > The question now is how do I make this
> permanent?
>
> You might not need to do ifdown/up if you can
> avoid the problem with systemd started running
> somehow.
>
> Instead of using the above commands to start
> Netatalk, you can do the sysv initscripts way.
>
> /etc/init.d/netatalk restart
>
> See if it will behave the same way.
>
> If you still have problem, then perhaps remove
> netatalk and then reinstall it. While installing,
> keep the log so we can see whether systemd plays a
> part in it.
>
> I would reboot right after installing netatalk,
> and then see if you have network and can ssh in.
>
> ======
>
> To force the network up (as a workaround only),
> you would add these to the beginning of
> /etc/rc.local
>
> ifdown eth0
> ifup eth0
>

This last part seems to have fixed things. After modifying the /etc/rc.local file the network no longer disappears, nor ssh access, and everything is working as expected. I'm just really confused as to why this happened. I never had to modify that file in the 10 or more years of using Debian on my Pogoplug's and never had a situation where the network would just disappear.

Thanks for all the tips and help, I really appreciate it!
bluzfanmr1,

> This last part seems to have fixed things. After
> modifying the /etc/rc.local file the network no
> longer disappears, nor ssh access, and everything
> is working as expected. I'm just really confused
> as to why this happened. I never had to modify
> that file in the 10 or more years of using Debian
> on my Pogoplug's and never had a situation where
> the network would just disappear.

I think there is a problem somewhere in the init scripts that got a systemd leak. The Devuan project maintains and upstream those scripts regularly. You might want to report this as a bug to either Debian or Devuan.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
bodhi Wrote:
-------------------------------------------------------
> bluzfanmr1,
>
> > This last part seems to have fixed things.
> After
> > modifying the /etc/rc.local file the network no
> > longer disappears, nor ssh access, and
> everything
> > is working as expected. I'm just really
> confused
> > as to why this happened. I never had to modify
> > that file in the 10 or more years of using
> Debian
> > on my Pogoplug's and never had a situation
> where
> > the network would just disappear.
>
> I think there is a problem somewhere in the init
> scripts that got a systemd leak. The Devuan
> project maintains and upstream those scripts
> regularly. You might want to report this as a bug
> to either Debian or Devuan.

I will check into doing that. I really appreciate your guidance and finding the issue!
Looks like I spoke too soon. After working fine for the past week, this morning I'm unable to ssh in again, even after multiple reboots. I'll have to dig back into it when I have more time.
Another update on this. I was gone for about 10 days and when I came home I decided to move things from the 2GB ssd in a usb enclosure to a random usb flash drive. To my surprise it fired right up and has been working for a couple of weeks now without any issues.

I'm wondering if my issues were caused by the large ssd and/or the enclosure being the main drive. The ssd and enclosure seems to work fine as a data drive for backups, etc but the Pogoplug doesn't seem to like it as the main drive. Hopefully this lasts and doesn't come back again!



Edited 1 time(s). Last edit at 05/10/2023 01:52PM by bluzfanmr1.
> Another update on this. I was gone for about 10
> days and when I came home I decided to move things
> from the 2GB ssd in a usb enclosure to a random
> usb flash drive. To my surprise it fired right up
> and has been working for a couple of weeks now
> without any issues.

> I'm wondering if my issues were caused by the
> large ssd and/or the enclosure being the main
> drive. The ssd and enclosure seems to work fine
> as a data drive for backups, etc but the Pogoplug
> doesn't seem to like it as the main drive.
> Hopefully this lasts and doesn't come back again!

It could be the issue with power. If the SSD is inside a USB enclosure then it is suspected. It might be a good idea to use powered eSATA enclosure for this SSD with Pogo Pro.

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Author:

Subject:


Spam prevention:
Please, enter the code that you see below in the input field. This is for blocking bots that try to post this form automatically. If the code is hard to read, then just try to guess it right. If you enter the wrong code, a new image is created and you get another chance to enter it right.
Message: