"fixable bit-flip" error after rebooting

Posted by callidus 
"fixable bit-flip" error after rebooting
December 05, 2011 01:52AM
Hi,
I rebooted my Dockstar for the first time in two months (ssh/reboot). Afterwards it did not boot up; the LED is flashing green. The USB drive does not seem to be the cause, as the box boots neither my old drive nor the rescue system. Here's the error log I could read via netconsole:
U-Boot 2010.09 (Oct 23 2010 - 11:49:22)
Marvell-Dockstar/Pogoplug by Jeff Doozan
Hit any key to stop autoboot:  0 
(Re)start USB...
USB:   Register 10011 NbrPorts 1
USB EHCI 1.00
scanning bus for devices... 3 USB Device(s) found
       scanning bus for storage devices... 1 Storage Device(s) found
Loading file "/rescueme" from usb device 0:1 (usbda1)
** File not found /rescueme
reading /rescueme.txt

** Unable to read "/rescueme.txt" from usb 0:1 **
Creating 1 MTD partitions on "nand0":
0x000002500000-0x000010000000 : "mtd=3"
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI: fixable bit-flip detected at PEB 1

After spending some time with Google and a few "dockstar rescue" pages, I'm still not sure what to do to recover my (half-bricked?) Dockstar. I hope you can fill in the gaps or missing links, as I've never worked with the Dockstar in this state.

I think I have to
  • open a read/write netconsole to the box
  • stop autoboot (start the Marvell system?)
  • reflash U-Boot (how, from within the Marvell system? Which image/commands?)
  • somehow ensure that netconsole access survives the reflashing
  • pray that everything works and my whole system comes up again as before.

-Sascha
Re: "fixable bit-flip" error after rebooting
December 07, 2011 01:26PM
Ok, I've managed to boot my system... But it will only last until the next reboot.

I got to the Marvell prompt via netconsole and ran the commands from bootcmd manually. The error mentioned above comes from "run ubifs_bootcmd". Running "usb start; run usb_bootcmd" booted the OS from my USB drive, but that fix does not survive a reboot.
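
A minimal sketch of that manual procedure, assuming the bootcmd value from the environment listed in this post is available as a plain string on a Linux host: splitting it at the semicolons gives the individual steps, which can then be typed one at a time at the Marvell>> prompt to see which one fails. The name split_bootcmd is made up for this illustration, and the naive split only suits the top-level bootcmd, not variables that contain if/then blocks.

```shell
#!/bin/sh
# Hypothetical helper: split a U-Boot command chain into one command per line.
# Assumption: a plain split at ';' is enough, which holds for this bootcmd
# but not for variables containing if/then/fi blocks.
split_bootcmd() {
    printf '%s\n' "$1" \
        | tr ';' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# bootcmd as shown by printenv on this box
BOOTCMD='usb start; run force_rescue_bootcmd; run ubifs_bootcmd; run usb_bootcmd; usb stop; run rescue_bootcmd; run pogo_bootcmd; reset'

# Print the steps, numbered, in boot order
split_bootcmd "$BOOTCMD" | nl -w1 -s'. '
```

On this box the third step, run ubifs_bootcmd, is the one that produced the bit-flip message.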

Here's the environment:
Marvell>> printenv
printenv
ethact=egiga0
baudrate=115200
mainlineLinux=yes
console=ttyS0,115200
led_init=green blinking
led_exit=green off
led_error=orange blinking
mtdparts=mtdparts=orion_nand:1M(u-boot),4M(uImage),32M(rootfs),-(data)
mtdids=nand0=orion_nand
partition=nand0,2
rescue_set_bootargs=setenv bootargs console=$console ubi.mtd=2 root=ubi0:rootfs ro rootfstype=ubifs $mtdparts $rescue_custom_params
rescue_bootcmd=if test $rescue_installed -eq 1; then run rescue_set_bootargs; nand read.e 0x800000 0x100000 0x400000; bootm 0x800000; else run pogo_bootcmd; fi
pogo_bootcmd=if fsload uboot-original-mtd0.kwb; then go 0x800200; fi
force_rescue=0
force_rescue_bootcmd=if test $force_rescue -eq 1 || ext2load usb 0:1 0x1700000 /rescueme 1 || fatload usb 0:1 0x1700000 /rescueme.txt 1; then run rescue_bootcmd; fi
ubifs_mtd=3
ubifs_set_bootargs=setenv bootargs console=$console ubi.mtd=$ubifs_mtd root=ubi0:rootfs rootfstype=ubifs $mtdparts $ubifs_custom_params
ubifs_bootcmd=run ubifs_set_bootargs; if ubi part data && ubifsmount rootfs && ubifsload 0x800000 /boot/uImage && ubifsload 0x1100000 /boot/uInitrd; then bootm 0x800000 0x1100000; fi
usb_scan=usb_scan_done=0;for scan in $usb_scan_list; do run usb_scan_$scan; if test $usb_scan_done -eq 0 && ext2load usb $usb 0x800000 /boot/uImage 1; then usb_scan_done=1; echo "Found bootable drive on usb $usb"; setenv usb_device $usb; setenv usb_root /dev/$dev; fi; done
usb_scan_list=1 2 3 4
usb_scan_1=usb=0:1 dev=sda1
usb_scan_2=usb=1:1 dev=sdb1
usb_scan_3=usb=2:1 dev=sdc1
usb_scan_4=usb=3:1 dev=sdd1
usb_device=0:1
usb_root=/dev/sda1
usb_rootfstype=ext2
usb_rootdelay=10
usb_set_bootargs=setenv bootargs console=$console root=$usb_root rootdelay=$usb_rootdelay rootfstype=$usb_rootfstype $mtdparts $usb_custom_params
usb_bootcmd=run usb_init; run usb_set_bootargs; run usb_boot
usb_boot=mw 0x800000 0 1; ext2load usb $usb_device 0x800000 /boot/uImage; if ext2load usb $usb_device 0x1100000 /boot/uInitrd; then bootm 0x800000 0x1100000; else bootm 0x800000; fi
bootcmd=usb start; run force_rescue_bootcmd; run ubifs_bootcmd; run usb_bootcmd; usb stop; run rescue_bootcmd; run pogo_bootcmd; reset
ethaddr=00:10:75:1A:CD:EE
arcNumber=2097
rescue_installed=1
serverip=192.168.1.190
ipaddr=192.168.1.20
if_netconsole=ping $serverip
start_netconsole=setenv ncip $serverip; setenv bootdelay 10; setenv stdin nc; setenv stdout nc; setenv stderr nc; version;
preboot=run if_netconsole start_netconsole
usb_init=if usb start; then version; else usb stop; usb start;fi;
ncip=192.168.1.190
bootdelay=10
stdin=nc
stdout=nc
stderr=nc

Environment size: 2551/131068 bytes
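
For reading the boot log next to this environment, the numbers can be cross-checked with a little shell arithmetic. This is only a sketch: the function names are invented for the illustration, and it assumes the sizes in mtdparts are MiB. The start of the data partition is the sum of the three partitions in front of it, and the logical eraseblock size is the physical eraseblock size minus the data offset reported by UBI.

```shell
#!/bin/sh
# Hypothetical helpers for cross-checking the numbers in the boot log.

# Start offset (hex) of the partition that follows partitions of the given
# sizes in MiB, e.g. 1M(u-boot),4M(uImage),32M(rootfs) -> start of "data".
part_start_hex() {
    total=0
    for size_mib in "$@"; do
        total=$((total + size_mib * 1024 * 1024))
    done
    printf '0x%x\n' "$total"
}

# Logical eraseblock size = physical eraseblock size - data offset.
leb_size() {
    echo $(( $1 - $2 ))
}

part_start_hex 1 4 32    # log: 0x000002500000-0x000010000000 : "mtd=3"
leb_size 131072 2048     # log: logical eraseblock size: 129024 bytes
```

Both results (0x2500000 and 129024) match the log, so the partition layout derived from mtdparts is consistent with what UBI reports. The numbers alone do not explain the bit-flip message.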

Any idea what's broken and how to fix it?
Andy123
Re: "fixable bit-flip" error after rebooting
January 09, 2012 12:41PM
Hi,

After a power failure I have the same error :-/. Have you found a solution?
Re: "fixable bit-flip" error after rebooting
January 10, 2012 01:31AM
@Andy123
Not really. I've modified my bootcmd (with my netconsole access) and swapped the positions of "run ubifs_bootcmd" and "run usb_bootcmd". Now my Dockstar boots successfully from my thumb drive and everything seems to work. But an uneasy feeling remains, as I don't know whether the workaround is OK (what does "ubifs_bootcmd" do?) or whether I have to reflash something (somehow).
Andy123
Re: "fixable bit-flip" error after rebooting
January 10, 2012 04:10AM
Can you explain how you switched the positions? I'm a little confused: did you switch the physical position or the logical one in a script?

A related question: is the NAND broken, or is this only a logical error?
Andy123
Re: "fixable bit-flip" error after rebooting
January 10, 2012 05:30AM
I was able to fix the issue on my box.

I erased all /dev/mtdX partitions and reinstalled U-Boot and the rescue system. Now it boots correctly.
Re: "fixable bit-flip" error after rebooting
January 11, 2012 06:38AM
Andy123 Wrote:
-------------------------------------------------------
> Can you explain how you switched the positions?
> I'm a little confused: did you switch the
> physical position or the logical one in a
> script?
I've connected to my box and executed
fw_setenv bootcmd 'usb start; usb stop; usb start; run force_rescue_bootcmd; run usb_bootcmd; run ubifs_bootcmd; usb stop; run rescue_bootcmd; run pogo_bootcmd; reset'
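
The reordering can be sketched as a small shell helper, assuming the old bootcmd is available as a string. swap_boot_order is a hypothetical name, and the helper only prints the resulting fw_setenv command instead of running it, so that it can be checked first. Note that the command above additionally inserts "usb stop; usb start" at the front, which this sketch does not do.

```shell
#!/bin/sh
# Hypothetical helper: swap "run ubifs_bootcmd" and "run usb_bootcmd"
# in a bootcmd string, using a temporary placeholder for the swap.
swap_boot_order() {
    printf '%s\n' "$1" | sed \
        -e 's/run ubifs_bootcmd/@SWAP@/' \
        -e 's/run usb_bootcmd/run ubifs_bootcmd/' \
        -e 's/@SWAP@/run usb_bootcmd/'
}

# Original bootcmd from the printenv output earlier in the thread
OLD='usb start; run force_rescue_bootcmd; run ubifs_bootcmd; run usb_bootcmd; usb stop; run rescue_bootcmd; run pogo_bootcmd; reset'
NEW=$(swap_boot_order "$OLD")

# Print the command for review; run it by hand on the box once it looks right.
printf "fw_setenv bootcmd '%s'\n" "$NEW"
```

With this order U-Boot tries the USB drive first and only falls through to the UBIFS boot attempt if no bootable drive is found.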

> I was able to fix the issue on my box.
>
> I erased all /dev/mtdX partitions and reinstalled
> U-Boot and the rescue system. Now it boots correctly.
I tried to reinstall the rescue system as described in http://forum.doozan.com/read.php?4,831. What exactly did you do in addition?
Andy123
Re: "fixable bit-flip" error after rebooting
January 11, 2012 07:35AM
Hello callidus,

First boot into Debian and install mtd-utils and mtd-tools.

Test the flash device:
nandtest -p 10 /dev/mtd0
nandtest -p 10 /dev/mtd1
nandtest -p 10 /dev/mtd2
nandtest -p 10 /dev/mtd3

This shows whether the device has bad blocks. Then erase all partitions:
flash_erase /dev/mtd0
flash_erase /dev/mtd1
flash_erase /dev/mtd2
flash_erase /dev/mtd3

After that I simply reinstalled everything:

#uboot
wget http://jeff.doozan.com/debian/uboot/install_uboot_mtd0.sh
chmod +x install_uboot_mtd0.sh 
./install_uboot_mtd0.sh   --no-uboot-check

#rescue
wget http://jeff.doozan.com/debian/rescue/install_rescue.sh
chmod +x install_rescue.sh 
./install_rescue.sh 
Re: "fixable bit-flip" error after rebooting
January 13, 2012 02:26AM
OK, the best news first: my box is up and running, and the bit-flip error is gone.

But getting there required staying calm ;-)

First: nandtest did not show any errors. Strange.
Second: If you run nandtest without the -k option (as I did), it erases everything on the device you point it at. The test was therefore a point of no return...
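
To make that difference explicit, here is a cautious sketch that only prints the commands for a content-preserving test instead of running them. It assumes the mtd-utils versions of nanddump and nandtest; nandtest_keep_cmds is a made-up helper name, and whether the nanddump image includes OOB data depends on the mtd-utils version, so treat the backup as a best-effort safety net.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands for testing one MTD
# partition while keeping its contents.
nandtest_keep_cmds() {
    dev=$1
    name=${dev##*/}                               # /dev/mtd3 -> mtd3
    echo "nanddump -f ${name}.backup.img ${dev}"  # copy the partition to a file first
    echo "nandtest -k -p 10 ${dev}"               # -k restores existing contents after the test
}

nandtest_keep_cmds /dev/mtd3
```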

The rest worked without problems (erase, U-Boot and rescue installation). Before rebooting I restored the netconsole settings:
fw_setenv serverip 192.168.1.190
fw_setenv ipaddr 192.168.1.160
fw_setenv if_netconsole 'ping $serverip'
fw_setenv start_netconsole 'setenv ncip $serverip; setenv bootdelay 10; setenv stdin nc; setenv stdout nc; setenv stderr nc; version;'
fw_setenv preboot 'run if_netconsole start_netconsole'
After reboot: shock! No connection! A quick check with netconsole: everything's fine... A look at my DHCP log showed that the box did not get its configured static address because the MAC address had changed. Luckily this was a known problem and could be fixed quickly. Problem solved.
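
The post does not say how the address was fixed. One plausible way, stated here purely as an assumption, is to write the original MAC address back into the U-Boot environment with fw_setenv ethaddr, since erasing the flash also discards the stored environment. The sketch below uses a hypothetical helper, ethaddr_cmd, that validates the address format and prints the command rather than running it. The address shown is the ethaddr from the environment dump earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: validate a MAC address and print (not run) the
# fw_setenv command that would store it as ethaddr.
ethaddr_cmd() {
    if printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        printf 'fw_setenv ethaddr %s\n' "$1"
    else
        echo "not a valid MAC address: $1" >&2
        return 1
    fi
}

# ethaddr as listed in the printenv output before the reflash
ethaddr_cmd 00:10:75:1A:CD:EE
```

The address printed on the label of the device is the safest source if no old environment dump is at hand.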