Thursday 2 December 2010

How to fix VMware error fsck.ext3 Unable to resolve UUID with no data loss

This happened to me after a power failure damaged a server in our VMware farm. The resolution is presented below:

ESX 4.0 host fails to boot after power operation with the error: fsck.ext3: Unable to resolve UUID

Symptoms

  • After power-cycling or rebooting an ESX 4.x server, the following error message is produced during boot:

    fsck.ext3: Unable to resolve 'UUID=34d192db-17eb-442e-9613-c5c24c6fa9fa'


    And

    *** An error occurred during the file system check.
    *** Dropping you to a shell; the system will reboot 
    *** when you leave the shell.


     

  • After encountering this error, you are unable to boot into ESX or Troubleshooting mode.
  • The unresolvable EXT file systems or partitions are most commonly found, on later inspection, to have mount points such as /var, /opt and /tmp.

Resolution


 

This issue occurs when the boot-time file system check utility (FSCK) for EXT-3 file systems cannot resolve a file system (by UUID) defined in /etc/fstab.

Possible causes include:

  • The default roll-back option is left enabled when a subsequent upgrade is being performed.
  • The device is not present during system boot.
  • The unresolvable EXT file systems appear to reside on disks/devices that are initialized later during system boot (e.g. the last LUN).

Note: If you are experiencing an outage with virtual machines down, consider restoring service quickly by reinstalling VMware ESX. Troubleshooting may take longer than a reinstallation, which takes approximately 20 minutes.

Otherwise, refer to the instructions below for submitting information to VMware Technical Support for technical analysis.

Further troubleshooting is available in the shell:

  • Confirm the UUIDs which were not resolvable, and remain so, by running fsck again without additional arguments. Information similar to the following is displayed:


    # fsck

    fsck 1.39 (29-May-2006)
    e2fsck 1.39 (29-May-2006)
    esx-root: clean, 32953/641280 files, 414801/1281175 blocks
    e2fsck 1.39 (29-May-2006)
    /dev/sdt1: clean, 35/140832 files, 25323/281596 blocks
    fsck.ext3: Unable to resolve 'UUID=34d192db-17eb-442e-9613-c5c24c6fa9fa'
    e2fsck 1.39 (29-May-2006)
    /dev/sdt6: clean, 31/250368 files, 27851/500220 blocks
    e2fsck 1.39 (29-May-2006)
    /dev/sdt7: clean, 22/250368 files, 16815/500220 blocks


     

  • Record the UUID or UUIDs which failed to resolve. You may take a screen shot of your System Management Interface, take a picture, or write the values down.
  • Confirm these same values in the /etc/fstab file.

    # cat /etc/fstab

    UUID=79815890-f11c-4907-80fe-d1cd6bf061f8 /        ext3    defaults                  1 1
    UUID=45460133-027b-40b6-8b4d-e52aaf4c417f /boot    ext3    defaults                  1 2
    None                    /dev/pts                   devpts  defaults                  0 0
    /dev/cdrom              /mnt/cdrom                 udf,iso9660 noauto,owner,kudzu,ro 0 0
    /dev/fd0                /mnt/floppy                auto    noauto,owner,kudzu        0 0
    None                    /proc                      proc    defaults                  0 0
    None                    /sys                       sysfs   defaults                  0 0
    UUID=34d192db-17eb-442e-9613-c5c24c6fa9fa /var/log ext3    defaults,errors=panic     1 2
    UUID=e32ec5f4-d795-414a-8d73-a2bb3ea86342 swap     swap    defaults                  0 0


    Note: In this example, the unresolvable UUID is 34d192db-17eb-442e-9613-c5c24c6fa9fa and its mount point is /var/log.

  • Verify what UUIDs the system is currently aware of by running the following command:

    # ls -l /dev/disk/by-uuid

    total 0
    lrwxrwxrwx 1 root root 10 Nov  9 14:36 45460133-027b-40b6-8b4d-e52aaf4c417f -> ../../sdm1
    lrwxrwxrwx 1 root root 10 Nov  9 14:36 e32ec5f4-d795-414a-8d73-a2bb3ea86342 -> ../../sdr1
    lrwxrwxrwx 1 root root 10 Nov  9 14:36 34d192db-17eb-442e-9613-c5c24c6fa9fa -> ../../sdr2
    lrwxrwxrwx 1 root root 10 Nov  9 14:36 79815890-f11c-4907-80fe-d1cd6bf061f8 -> ../../sdr5

    Notes:

    • This output reveals the UUID-to-partition relationship for all discovered EXT partitions in the system. Affected mount points or content can be associated using the previous step.
    • It is possible in some environments that none of the known partitions reported by listing /dev/disk/by-uuid match the unresolved UUID. This is correctable; for additional instructions, proceed to the following sections and correct the content of the /etc/fstab file.
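
The comparison described in the steps above can be sketched as a small shell function. This is only an illustration, not part of the VMware article: the function name unresolved_uuids and its two parameters are hypothetical, and it only catches the case where a UUID from /etc/fstab is missing from /dev/disk/by-uuid altogether. When the device is merely initialized late, as in the example output above where the UUID is listed, an empty result is expected.

```shell
# Hypothetical helper: list UUIDs that an fstab-style file references
# but that have no entry in the by-uuid directory.
# Usage: unresolved_uuids [FSTAB_FILE] [BY_UUID_DIR]
unresolved_uuids() {
    fstab_file=${1:-/etc/fstab}
    uuid_dir=${2:-/dev/disk/by-uuid}

    # Take the first field of every entry that starts with UUID=.
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab_file" |
    while read -r uuid; do
        # No file or symlink with that name means the UUID was not discovered.
        [ -e "$uuid_dir/$uuid" ] || [ -L "$uuid_dir/$uuid" ] || echo "$uuid"
    done
}
```

Called without arguments in the maintenance shell, it reads /etc/fstab and /dev/disk/by-uuid; any UUID it prints needs the /etc/fstab correction described in the following sections.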


     

Solution

VMware is currently investigating further for a full root-cause and solution. Workarounds are available below.

If you are able to reproduce this issue while maintaining production via alternate servers, contact VMware Technical Support after completing the following:

  1. Log into the terminal of the affected ESX server.
  2. Remount the root partition in read-write mode:

    # mount / -o remount,rw


     

  3. Configure Serial Line Logging per the section Configuring the Service Console for VMware ESX 3.x and 4.x in KB article: Enabling serial-line logging for an ESX and ESXi host (1003900).
  4. Reboot the ESX server and log the results via your listening serial terminal.
  5. Contact VMware Technical Support and file a Support Request. For additional information, see Filing a Support Request (1021619).

Workarounds

Both recommended workarounds involve the modification of the /etc/fstab file. You may either:

  • Generate a new UUID for the affected file system(s) and update /etc/fstab to match the new value(s).
  • Update /etc/fstab to incorporate the correct UUID from the file system.
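
For the second option, the UUID the file system actually carries has to be compared with the one /etc/fstab lists. The following is a minimal sketch under stated assumptions: the helper names fs_uuid_from_tune2fs and fstab_uuid_for are hypothetical, and the device /dev/sdr2 and mount point /var/log are simply the values from the example in this article.

```shell
# Hypothetical helper: print the UUID found in `tune2fs -l` output read on stdin.
fs_uuid_from_tune2fs() {
    awk -F': *' '/^Filesystem UUID:/ { print $2 }'
}

# Hypothetical helper: print the UUID that an fstab-style file records
# for a given mount point.
# Usage: fstab_uuid_for FSTAB_FILE MOUNT_POINT
fstab_uuid_for() {
    awk -v mp="$2" '$1 ~ /^UUID=/ && $2 == mp { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Example on the affected host (values taken from this article's example):
#   actual=$(tune2fs -l /dev/sdr2 | fs_uuid_from_tune2fs)
#   listed=$(fstab_uuid_for /etc/fstab /var/log)
#   [ "$actual" = "$listed" ] || echo "fstab lists $listed, file system has $actual"
```

If the two values differ, the one reported by tune2fs is the value that belongs in /etc/fstab.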

Applying a new UUID

Apply a new UUID to the EXT-3 file systems which fail to resolve and update the /etc/fstab file.

  1. Run tune2fs against each Linux partition on the suspected disk device. For example:

    # tune2fs -l /dev/sdr2 | grep UUID
    Filesystem UUID:          34d192db-17eb-442e-9613-c5c24c6fa9fa


    # tune2fs -U random /dev/sdr2
    tune2fs 1.39 (29-May-2006)

    # tune2fs -l /dev/sdr2 | grep UUID
    Filesystem UUID:          25a18c70-ffcb-4b15-9d2d-1cfab1754d86

  2. Update /etc/fstab with the new UUID. In the earlier steps, the /dev/sdr2 partition was determined to hold the /var/log mount point:


     

    1. Remount the root partition in read-write mode:

      # mount / -o remount,rw


       

    2. Open the /etc/fstab file for editing. For more information, see Editing configuration files in VMware ESX (1017022).
    3. Find the original UUID and replace it with the newly generated UUID from step 1 above.
    4. Save the file and remount the root partition in read-only mode:

      # mount / -o remount,ro

    5. Reboot the server using shutdown -r now.
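
The manual edit of /etc/fstab in sub-steps 2 and 3 above could also be scripted, before the reboot and while the root partition is still mounted read-write. The sketch below is an assumption-laden illustration rather than the procedure from the KB article: replace_fstab_uuid is a hypothetical helper, and it relies on UUIDs consisting only of hexadecimal digits and hyphens so that they are safe inside a sed pattern.

```shell
# Hypothetical helper: swap one UUID for another in an fstab-style file,
# keeping a backup of the original as FILE.bak.
# Usage: replace_fstab_uuid FSTAB_FILE OLD_UUID NEW_UUID
replace_fstab_uuid() {
    fstab_file=$1
    old_uuid=$2
    new_uuid=$3

    # Refuse to continue if the old UUID is not referenced at all.
    grep -q "UUID=$old_uuid" "$fstab_file" || return 1

    # Keep a backup copy next to the original before changing anything.
    cp -p "$fstab_file" "$fstab_file.bak" || return 1

    # Rewrite from the backup so the original file is updated in place.
    sed "s/UUID=$old_uuid/UUID=$new_uuid/" "$fstab_file.bak" > "$fstab_file"
}

# Example with the values from this article:
#   replace_fstab_uuid /etc/fstab 34d192db-17eb-442e-9613-c5c24c6fa9fa \
#       25a18c70-ffcb-4b15-9d2d-1cfab1754d86
```

Keeping the .bak copy means the original file can be restored from the maintenance shell if the edited entry turns out to be wrong.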


 

You can read the full document at the link below (double-check the "mount" syntax there):

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1017162&sliceId=1&docTypeID=DT_KB_1_1&dialogID=127160699&stateId=0%200%20138435051