Friday, July 25, 2008

Hard Drive Failure

A few days ago we had an electrical storm, so I powered off my computer. After the storm was over I was unable to boot into Linux. It had been shut down cleanly, with no signs of errors, but it refused to boot. I immediately suspected the hard drive, a Seagate 160GB, which had been giving me trouble since I bought it about a year ago. My BIOS has an option to check the SMART status of the installed drives, and can even run short and long self-tests. Sure enough, the short self-test was failing.
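The same short self-test the BIOS ran can also be started from Linux, for example from a Live CD, with smartctl from the smartmontools package (which may need to be installed first). A minimal sketch, assuming the suspect drive is /dev/sda; that device name is a placeholder, not a detail from this post. The DRY_RUN switch is an addition for previewing the commands.

```shell
#!/bin/sh
# Sketch: check SMART health and start a short self-test with smartctl
# (smartmontools). Run as root; the device name is a placeholder.
# Set DRY_RUN=1 to print the commands instead of running them.
smart_short_test() {
    dev=$1                        # e.g. /dev/sda (hypothetical)
    run=""
    if [ -n "$DRY_RUN" ]; then run="echo"; fi
    $run smartctl -H "$dev"       # overall health assessment
    $run smartctl -t short "$dev" # starts the test; it runs inside the drive
}
# A few minutes later, the result can be read with:
#   smartctl -l selftest /dev/sda
```

For example, `smart_short_test /dev/sda` as root. The short test runs in the drive's own firmware, so smartctl returns right away and the log has to be checked afterwards.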

It was my belief that the boot was failing while probing the hardware, and my hope that I could just plug in a new drive, copy my files over, and be done with it. I should have known better of course.

I got a new hard drive, a 500GB Western Digital on sale at Circuit City. It installed without a hitch, and I was able to boot an Ubuntu Live CD with both the old drive and the new drive installed. That bothered me a little; if the bad disk was causing the hardware probe to crash, then it should have crashed even when booting from a Live CD. Maybe it was booting from the bad disk that was causing the problem.

I partitioned and formatted the new drive, and copied all the files over from the old drive. Since the old drive had an IDE interface and the new drive was SATA, I needed to update /etc/fstab and GRUB's menu.lst. After that, I should have been able to just install GRUB on the new drive and reboot.
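If fstab and menu.lst refer to partitions by device name, the rename can be scripted. A minimal sketch, assuming the old IDE disk appeared as /dev/hda, the new SATA disk will be /dev/sda once it boots on its own, and the copied system is mounted at /mnt; all three are placeholders rather than details from this post. If the files use UUID= entries instead, it is the new partitions' UUIDs (as reported by blkid) that need to go in.

```shell
#!/bin/sh
# Sketch: point fstab and GRUB legacy's menu.lst at a new disk name.
# The disk names and mount point are hypothetical examples.
retarget_disk() {
    root=$1    # where the copied system is mounted, e.g. /mnt
    old=$2     # old disk name without /dev/, e.g. hda
    new=$3     # new disk name without /dev/, e.g. sda
    for f in "$root/etc/fstab" "$root/boot/grub/menu.lst"; do
        [ -f "$f" ] || continue
        cp "$f" "$f.bak"                      # keep a backup of each file
        # /dev/hda1 -> /dev/sda1, /dev/hda3 -> /dev/sda3, etc. (GNU sed -i)
        sed -i "s|/dev/$old|/dev/$new|g" "$f"
    done
}
```

For example, `retarget_disk /mnt hda sda` as root. Because it is a plain text substitution, it also updates the `# kopt=root=...` line that Ubuntu's update-grub reads from menu.lst, while other IDE devices such as a /dev/hdc CD drive are left alone.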

That's where I hit my first real snag. I planned to chroot into the new disk to install GRUB, but that didn't work. The chroot command kept giving me an error, saying it couldn't execute /bin/bash. That made no sense at all; I could execute bash on the new disk without problems. Google was no help with the error, but I did find an alternate method for installing GRUB on the Ubuntu Forums. I installed GRUB and chalked the chroot failure up to a quirk of the Live CD.
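The post doesn't say which method the forum suggested. One approach that was commonly recommended for GRUB legacy at the time, shown here only as an illustration and not necessarily the one used, is grub-install with --root-directory, which installs the boot loader for a system mounted elsewhere without needing chroot. A sketch, assuming the new root filesystem is mounted at /mnt and the new disk is /dev/sdb while running from the Live CD (both placeholders):

```shell
#!/bin/sh
# Sketch: install GRUB legacy (0.97) to a disk's MBR without chroot.
# The mount point and disk are placeholders; run as root from the Live CD.
# Set DRY_RUN=1 to print the command instead of running it.
install_grub_from_livecd() {
    mountpoint=$1   # where the new root filesystem is mounted, e.g. /mnt
    disk=$2         # whole disk, not a partition, e.g. /dev/sdb
    run=""
    if [ -n "$DRY_RUN" ]; then run="echo"; fi
    # --root-directory makes grub-install use $mountpoint/boot/grub
    $run grub-install --root-directory="$mountpoint" "$disk"
}
```

The disk argument is the whole drive, so GRUB goes into its MBR. Later GRUB 2 releases use --boot-directory for this instead.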

Finally, it was time to boot my new disk. I restarted and got to the GRUB splash screen, which was a good sign. The first entry wouldn't boot, but that was no big deal, just the wrong hard drive number in menu.lst. I edited the entry and booted, and got the familiar Ubuntu splash screen. Success! Except that it froze, in exactly the same way it had with the old drive. Dammit.
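GRUB legacy numbers disks and partitions from zero, in BIOS order, so the drive number in menu.lst is easy to get wrong when a second disk is attached. An illustrative entry, assuming the root filesystem is on the third partition of the first BIOS disk; the kernel version and device names are placeholders, not values from this machine:

```
# GRUB legacy menu.lst entry; all values are illustrative.
# (hd0,2): hd0 is the first BIOS disk, 2 is its third partition.
title   Ubuntu, kernel 2.6.24-19-generic
root    (hd0,2)
kernel  /boot/vmlinuz-2.6.24-19-generic root=/dev/sda3 ro quiet splash
initrd  /boot/initrd.img-2.6.24-19-generic
```

The two names for the root partition come from different schemes: (hd0,2) is GRUB's BIOS-order name, while root=/dev/sda3 is the kernel's, and they only line up if the BIOS and the kernel enumerate the disks in the same order.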

Now, I've been working with computer hardware long enough that I wasn't surprised by this outcome. And, looking back, I really should have known better. So I tried booting into single-user mode to see if that made any difference. It didn't, of course, but at least I got to see the actual error instead of a frozen splash screen: a kernel panic, because the kernel couldn't execute /sbin/init. So it was a corrupted filesystem preventing the boot, not an errant hardware probe.
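With GRUB legacy, one way to get single-user mode is to press e on the menu entry, select the kernel line, press e again, append single, then press Enter and b to boot. Ubuntu's generated "recovery mode" entries already carry it. The edited line would look something like this (placeholder kernel version and root device):

```
# Kernel line edited for single-user mode; values are placeholders.
kernel  /boot/vmlinuz-2.6.24-19-generic root=/dev/sda3 ro single
```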

So, back to the Live CD. Copy /sbin/init to the new hard drive, reboot. Nope. Live CD again, copy everything in /sbin. Everything in /bin too, what the hell. No dice. Maybe Google can help. No, not really. Time to bite the bullet and reinstall. *sigh*

Google did help here. I found a howto for getting a list of installed packages, and reinstalling after just such a disaster. So, back to the Live CD, again, and get the list of all installed packages:
$ sudo mount /dev/sdb3 /mnt
$ sudo mount /dev/sdb4 /mnt/home
$ sudo su
# dpkg --root=/mnt --get-selections | grep -v deinstall > /mnt/home/ubuntu-files

Then I ran the installer, taking care not to format my home partition. When that finished, I edited the new /etc/apt/sources.list and enabled the universe and multiverse repositories, then I chrooted into the fresh install to reinstall my packages:
$ sudo mount -t proc none /mnt/proc
$ sudo mount -o bind /dev /mnt/dev
$ sudo chroot /mnt /bin/bash
# apt-get update
# dpkg --set-selections < /home/ubuntu-files
# apt-get dselect-upgrade
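The sources.list edit can be scripted as well. A minimal sketch, assuming, as the post implies, that the fresh install's file lists the universe and multiverse entries as commented-out deb lines; the file path is a placeholder. It uncomments only binary deb lines that mention either component and leaves deb-src lines and "##" notes alone.

```shell
#!/bin/sh
# Sketch: uncomment "deb" lines for the universe and multiverse components.
# Assumes GNU sed (-i, -E); keeps a .bak copy of the original file.
enable_universe_multiverse() {
    list=$1    # e.g. /mnt/etc/apt/sources.list (placeholder path)
    cp "$list" "$list.bak"
    # "# deb http://... hardy universe" -> "deb http://... hardy universe"
    # Lines starting with "##" or "# deb-src" do not match and stay as-is.
    sed -i -E 's/^#[[:space:]]*(deb[[:space:]].*[[:space:]](universe|multiverse)([[:space:]].*)?)$/\1/' "$list"
}
```

Run it before `apt-get update`, for example as `enable_universe_multiverse /mnt/etc/apt/sources.list` from outside the chroot.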

That got me most of the way back. I unmounted everything, rebooted, and was eventually greeted by the GNOME login. Finally! Of course, I forgot to add my user account or change the root password, so I couldn't log in. But that was easy enough to fix.
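Unmounting everything after the chroot means undoing the mounts in reverse order, after leaving the chroot with exit. A sketch, assuming the new root is at /mnt, the home partition is mounted at /mnt/home (needed for /home/ubuntu-files to be visible inside the chroot), and /mnt/proc and /mnt/dev are mounted as in the chroot step; set DRY_RUN=1 to print the commands instead of running them.

```shell
#!/bin/sh
# Sketch: unmount the chroot's filesystems, innermost first.
# The mount layout is an assumption based on the commands in this post.
undo_chroot_mounts() {
    target=${1:-/mnt}
    run=""
    if [ -n "$DRY_RUN" ]; then run="echo"; fi
    # Everything mounted under $target has to go before $target itself.
    for m in "$target/dev" "$target/proc" "$target/home" "$target"; do
        $run umount "$m" || return 1
    done
}
```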
