
51. FS Recovery: How to fix LILO and file system problems

Let's say that one day you have to reboot your machine to install new hardware, or you find that your machine has CRASHED, etc. Upon reboot, you see an error like:

- LI (LILO never fully loads.. it just sits there)

or

- The kernel loads up fine but then says: "VFS: Cannot open root device 08:11" followed by "Kernel panic: VFS: Unable to mount root fs on 08:11"
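The "08:11" in that message is the root device as the kernel sees it: the major and minor device numbers, printed in hex. As a rough sketch (decode_rootdev is a made-up helper name, and it only knows the classic IDE and SCSI disk majors), the numbers can be turned back into a device name like this:

```shell
#!/bin/sh
# decode_rootdev is a hypothetical helper, not a standard tool.
# Usage: decode_rootdev MAJOR:MINOR   (both in hex, as the kernel prints them)
decode_rootdev() {
    major=$((0x${1%%:*}))
    minor=$((0x${1##*:}))
    case $major in
        3)  letters=ab; per_disk=64; prefix=hd ;;                # first IDE channel
        22) letters=cd; per_disk=64; prefix=hd ;;                # second IDE channel
        8)  letters=abcdefghijklmnop; per_disk=16; prefix=sd ;;  # SCSI disks
        *)  echo "major $major not handled by this sketch" >&2; return 1 ;;
    esac
    # Each drive owns a fixed block of minors; minor 0 of a block is the whole drive.
    disk=$(echo "$letters" | cut -c$((minor / per_disk + 1)))
    part=$((minor % per_disk))
    if [ "$part" -eq 0 ]; then part=""; fi
    echo "/dev/$prefix$disk$part"
}

decode_rootdev 08:11   # prints /dev/sdb1
```

So a kernel that panics on 08:11 was told to look for its root on the first partition of the SECOND SCSI disk. If that drive has since been renamed or removed, the mount fails.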

--

First, ask yourself:

A. What has changed recently? Did you add/remove any hard drives recently? Keep this in mind:

IDE drives ALWAYS get the same name, no matter what else is installed. IDE0-Drive0 (primary master) is always /dev/hda and IDE1-Drive1 (secondary slave) is always /dev/hdd.

SCSI drives get their names dynamically. So if you have drives on SCSI ID 0, 4 and 5, you would have /dev/sda, /dev/sdb, and /dev/sdc (NOTE the lack of correspondence between the SCSI ID # and the drive name). NOW, let's say ID #4 DIED. Upon reboot, you would NOW see only /dev/sda and /dev/sdb. Notice that the old /dev/sdc is now "b". Sucks, huh? This can really screw things up, especially for software RAID setups!!! Hopefully, this naming issue will be fixed in the 2.4.x kernels.
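To make the renaming concrete, here is a toy sketch. It does not talk to any hardware. scsi_names is a made-up function that simply hands out letters in ascending SCSI ID order, which is an assumption about a single controller and matches the example above:

```shell
#!/bin/sh
# scsi_names is a hypothetical helper: it mimics how names are handed out
# when the drives on one controller are detected in ascending SCSI ID order.
scsi_names() {
    letters="a b c d e f g h"
    for id in $(printf '%s\n' "$@" | sort -n); do
        letter=${letters%% *}     # take the next free letter
        letters=${letters#* }     # and remove it from the pool
        echo "ID $id -> /dev/sd$letter"
    done
}

echo "All three drives present:"
scsi_names 0 4 5
echo "After the drive on ID 4 dies:"
scsi_names 0 5
```

In the second run, ID 5 comes out as /dev/sdb instead of /dev/sdc, so any lilo.conf or fstab entry that still says /dev/sdc now points at nothing.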

B. What drive do you boot from?

/dev/hda or /dev/sda

C. What drive is your / partition on?

/dev/hdaX, /dev/sdaX, etc

** For this example, I'm going to assume /dev/hda5 **

First, create a set of Linux RESCUE diskettes. This is done using "RAWRITE" or "dd" from images on your CDROM, an FTP server on the Inet, etc. You will need the BOOT and RESCUE images put onto diskettes.
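On a working Linux box, writing the diskettes with "dd" might look like the sketch below. The image names boot.img and rescue.img are assumptions (check your distribution's CD for the real names), and write_image is just a made-up wrapper:

```shell
#!/bin/sh
# write_image is a hypothetical wrapper around dd.
# $1 = image file, $2 = target (normally the floppy device, e.g. /dev/fd0)
write_image() {
    dd if="$1" of="$2" bs=1440k && sync
}

# Assumed image names; adjust them to what your distribution ships.
# write_image boot.img /dev/fd0      # first diskette
# write_image rescue.img /dev/fd0    # swap diskettes, then write the second one
```

The two calls are commented out on purpose: dd overwrites whatever target you give it, so double check the device name before you run them.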

Next, after you load up the rescue disks:

1. Mount your suspected "/" [root] partition

(mkdir /mnt/mnt; mount -t ext2 /dev/hda5 /mnt/mnt) Is everything there in /mnt/mnt as you expect?
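If you are not sure what "as you expect" should look like, a quick sanity check along these lines can help. looks_like_root is a made-up helper, and the list of paths is only my guess at what a typical / partition contains:

```shell
#!/bin/sh
# looks_like_root is a hypothetical helper: it reports which of a few
# files and directories you would expect on a / partition are missing.
# $1 = where the partition is mounted (defaults to /mnt/mnt)
looks_like_root() {
    mnt=${1:-/mnt/mnt}
    missing=0
    for path in etc/fstab etc/lilo.conf boot bin sbin dev lib; do
        if [ ! -e "$mnt/$path" ]; then
            echo "missing: $mnt/$path"
            missing=1
        fi
    done
    return $missing
}

# After: mkdir /mnt/mnt; mount -t ext2 /dev/hda5 /mnt/mnt
# looks_like_root /mnt/mnt && echo "looks like a root partition"
```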

A. No? Make sure you mounted the right partition. If you are *SURE* this is the right partition, umount this partition (umount /mnt/mnt).

Run "fdisk /dev/hda" and make sure all your partitions are there. If they are, good. If they aren't, make sure the partition is unmounted, reboot, and go into the CMOS setup.

Now, make SURE that your CMOS setup for the HDs (number of cylinders, heads, sectors, TRANSLATION) is configured the SAME way as when you installed Linux. I have seen a few cases where the TRANSLATION setting was toggled from LBA to NORMAL, or where AUTO was unreliable. For large HDs (> 1GB), it should be set to LBA.

NOTE: I do NOT recommend the use of "AUTO".

Upon reboot, re-run fdisk and hopefully your partition tables are ok. If not, I hope you documented your partition tables much like I did in the first chapters here in TrinityOS. If you didn't, you have a few last options.

Email me and I can give you some notes on how to rebuild a FS from the SuperBlocks or you can try some of the tools below. Please note that these tools might not be around anymore or there are now newer/better ones. If you know of other disk tools for Linux, please let me know.

Thanks to Harondel Sibble for this list

-----------------------------------------------------------------

(i) findsuper is a small utility that finds blocks with the ext2 superblock signature, and prints out their location and some info. It is in the non-installed part of the e2fsprogs distribution.

(ii) rescuept is a utility that recognizes ext2 superblocks, FAT partitions, swap partitions, and extended partition tables; it prints out information that can be used with fdisk or sfdisk to reconstruct the partition table. It is in the non-installed part of the util-linux distribution.

(iii) fixdisktable ( http://bmrc.berkeley.edu/people/chaffee/fat32.html) is a Linux utility that handles ext2, FAT, NTFS, ufs, BSD disklabels (but not yet old Linux swap partitions); it will actually rewrite the partition table, if you give it permission.

(iv) gpart ( http://home.pages.de/~michab/gpart/) is a utility that handles ext2, FAT, Linux swap, HPFS, NTFS, FreeBSD and Solaris/x86 disklabels, minix, reiser fs; it prints a proposed contents for the primary partition table, and is well-documented. Recommended!

-----------------------------------------------------------------

Reboot into the rescue disk and try again. If things still aren't right, you are in a last ditch situation. The filesystem is probably a mess. Cross your fingers NOW and follow the next step.

B. Yes? Now unmount it (umount /mnt/mnt) and run a file system check on it (e2fsck /dev/hda5). Make sure everything is cleaned up. You might be asked whether you want to fix things along the way. Say "Yes". If "e2fsck" cannot complete, email me again and I can tell you how to do some final last tricks before you have to just format and restore from tape or completely re-install the OS.

C. Remount the / partition as shown in step 1.
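One thing worth trying when e2fsck gives up because the primary superblock is bad: ext2 keeps backup copies of the superblock at the start of other block groups, and "e2fsck -b" lets you point it at one of them. The sketch below only does the arithmetic. backup_sb is a made-up helper, and it assumes the filesystem was made with the mke2fs default of 8 * blocksize blocks per group:

```shell
#!/bin/sh
# backup_sb is a hypothetical helper. It assumes the mke2fs default of
# 8 * blocksize blocks per group; a filesystem made with other settings
# keeps its backups elsewhere.
# $1 = block size in bytes, $2 = block group number
backup_sb() {
    blocksize=$1
    group=$2
    # With 1k blocks the first data block is block 1, otherwise block 0.
    if [ "$blocksize" -eq 1024 ]; then first=1; else first=0; fi
    echo $((group * blocksize * 8 + first))
}

backup_sb 1024 1    # prints 8193  (first backup, 1k blocks)
backup_sb 4096 1    # prints 32768 (first backup, 4k blocks)

# Then, with the partition unmounted:
# e2fsck -b 8193 /dev/hda5
```

If you don't know the block size, the findsuper tool from the list above is the safer way to locate the copies.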

2. In /mnt/mnt/etc/lilo.conf, make sure that the "boot" line points to the correct boot drive (boot=/dev/hda).

NOTE: there should not be any NUMBER after the drive letter. This means it's using the Master Boot Record (MBR) to boot.

3. In the TOP most "image" section, make sure that:

- the specified "image" file exists in /mnt/mnt/boot
- the specified "root" line is your actual partition for the / drive.
- Exit out of the editor and save any changes
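The checks from steps 2 and 3 can also be done without eyeballing the file. This is only a sketch: lilo_values and check_lilo_conf are made-up helpers (they are not part of LILO), and they assume simple "keyword=value" lines without quoting:

```shell
#!/bin/sh
# lilo_values is a hypothetical helper: print the value of every
# "keyword = value" line. $1 = keyword, $2 = lilo.conf file
lilo_values() {
    awk -F= -v key="$1" \
        '$0 ~ "^[ \t]*" key "[ \t]*=" { gsub(/[ \t]/, "", $2); print $2 }' "$2"
}

# check_lilo_conf is a hypothetical helper.
# $1 = where the / partition is mounted (defaults to /mnt/mnt)
check_lilo_conf() {
    mnt=${1:-/mnt/mnt}
    conf=$mnt/etc/lilo.conf
    status=0

    # Step 2: boot= should name a whole drive, so no trailing number.
    boot=$(lilo_values boot "$conf" | head -n 1)
    echo "boot=$boot"
    case $boot in
        *[0-9]) echo "warning: boot= names a partition, not a whole drive"
                status=1 ;;
    esac

    # Step 3: every image= file must exist under the mounted root.
    for image in $(lilo_values image "$conf"); do
        if [ -f "$mnt$image" ]; then
            echo "ok: $image"
        else
            echo "missing: $image"
            status=1
        fi
    done

    # Print the root= lines so you can compare them with your real / partition.
    for root in $(lilo_values root "$conf"); do
        echo "root=$root"
    done
    return $status
}

# check_lilo_conf /mnt/mnt
```

It cannot tell you whether root= is RIGHT, only what it is set to. Comparing it against your real / partition (/dev/hda5 in this example) is still up to you.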

4. In /mnt/mnt/etc/fstab, make sure that the line that has the "/" in the second column reflects the correct drive and partition of your / partition. You should also confirm this for the possible other partitions like /var, /usr, /tmp, /home, etc.
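A quick way to pull the device for a given mount point out of fstab is a one-line awk filter. fstab_device is a made-up name for it:

```shell
#!/bin/sh
# fstab_device is a hypothetical helper.
# $1 = mount point to look up, $2 = fstab file
fstab_device() {
    # Skip comment lines, then match the mount point in the second column.
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $1 }' "$2"
}

# fstab_device / /mnt/mnt/etc/fstab       # should print /dev/hda5 in this example
# fstab_device /usr /mnt/mnt/etc/fstab
```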

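5. OK, here comes the magic. If you DID make any changes to /etc/lilo.conf, run the following command from the rescue diskette:

lilo -r /mnt/mnt -C /etc/lilo.conf

NOTE: the "-r" option makes LILO chroot into /mnt/mnt before it does anything else, so the "-C" path is given relative to that new root.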

If everything goes well, you should see LILO run and print out all of your configured kernels with the top-most one with a "*" next to it.

6. Reboot and hopefully things are ok now.

