Friday, February 10, 2012

Ubuntu Lucid 10.04 at EC2: cloudimg-rootfs does not exist

I've been having a seemingly intermittent problem where I can't boot a custom AMI built from one of Ubuntu's default AMIs.

My process:

  • Start with stock 10.04 LTS, instance-store, 64-bit, us-east-1 image: ami-35de095c
  • Fire it up: works
  • Install a bunch of stuff I need: no prob
  • Create a new AMI for next time with ec2-bundle-vol: "works"
  • Upload it, register it
  • Launch a new instance with this new custom AMI: box fails to come up!

The error message, available via Get System Log in the AWS Console, is this:

Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-label/cloudimg-rootfs does not exist. Dropping to a shell!

Some Googling turned up a similar error for EBS instances using the XFS filesystem, but I was using ext3 and instance-store, so it didn't really apply.

The issue appeared to be that /dev/disk/by-label/cloudimg-rootfs is expected but not present. What does this really mean? It took some help from a friend for me to realize that the filesystem label is stored on the physical (or virtual...) hard drive itself. When a new machine is imaged, its filesystem won't automatically carry "cloudimg-rootfs" as its label. Apparently the AMI I was creating was not capturing the filesystem label. I found some evidence supporting this theory. I think the upshot is that my version of ec2-ami-tools is missing the patch that records the filesystem label. I have 1.3.49953-0ubuntu1~lucid1 (update: this is an old backport, which is probably the source of the issues). Its version number suggests it should have the patch, but I guess it does not (different versions listed here).
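To make the failure concrete: the boot log shows that root=LABEL=cloudimg-rootfs ends up being looked up as /dev/disk/by-label/cloudimg-rootfs, so if the filesystem doesn't carry that label, the path never appears and the boot stalls. Below is a minimal, hypothetical Python model of that lookup. It is not the real initramfs logic, it handles only the LABEL= form and plain device paths, and the function names are my own.

```python
import os
from typing import Optional


def resolve_root_device(cmdline: str) -> Optional[str]:
    """Return the device path implied by the root= boot argument, or None if absent.

    Simplified, hypothetical model: handles only root=LABEL=... and plain device paths.
    """
    for arg in cmdline.split():
        if arg.startswith("root="):
            value = arg[len("root="):]
            if value.startswith("LABEL="):
                # A label is found through the /dev/disk/by-label/ symlinks, which only
                # exist if the filesystem actually carries that label.
                return "/dev/disk/by-label/" + value[len("LABEL="):]
            # Direct notation such as /dev/sda1 is used as-is.
            return value
    return None


def root_device_present(cmdline: str) -> bool:
    """True if the device that the root= argument points at exists on this machine."""
    path = resolve_root_device(cmdline)
    return path is not None and os.path.exists(path)
```

On a running box, `root_device_present(open("/proc/cmdline").read())` returning False corresponds to the "does not exist" situation in the error above.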

To deal with all this, I decided to stop relying on LABEL=foo to refer to drives and instead use the good old direct notation /dev/sda1 (or whatever).

So the fix is: edit /etc/fstab, /boot/grub/menu.lst, and /boot/grub/grub.cfg and replace uses of LABEL=cloudimg-rootfs with /dev/sda1. This is less robust, but has the side-effect of actually working.
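The edit itself is just a literal search-and-replace across those three files. Here is a minimal Python sketch of it, assuming the root volume really does show up as /dev/sda1 on your instance (check `ls /dev` first). `replace_label` and `patch_file` are hypothetical helper names, and a `.bak` copy of each file is kept before it is rewritten.

```python
import shutil
from pathlib import Path

OLD = "LABEL=cloudimg-rootfs"
NEW = "/dev/sda1"  # assumption: this is where the root volume appears on your instance


def replace_label(text: str, old: str = OLD, new: str = NEW) -> str:
    """Swap every literal occurrence of the label reference for the device path."""
    return text.replace(old, new)


def patch_file(path: str) -> bool:
    """Rewrite one file in place; return True only if something was changed."""
    p = Path(path)
    if not p.exists():
        return False
    original = p.read_text()
    updated = replace_label(original)
    if updated == original:
        return False  # nothing to replace, leave the file untouched
    shutil.copy2(p, str(p) + ".bak")  # keep a backup of the original file
    p.write_text(updated)
    return True
```

Run it as root, e.g. `for f in ["/etc/fstab", "/boot/grub/menu.lst", "/boot/grub/grub.cfg"]: print(f, patch_file(f))`, and eyeball each file afterwards before bundling. A kernel line such as `root=LABEL=cloudimg-rootfs ro` becomes `root=/dev/sda1 ro`.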

Work-around process:

  • Start with stock 10.04 LTS, instance-store, 64-bit, us-east-1 image: ami-35de095c
  • Fire it up: works
  • Edit /etc/fstab, /boot/grub/menu.lst, and /boot/grub/grub.cfg as above
  • Install a bunch of stuff I need: no prob
  • Create a new AMI for next time with ec2-bundle-vol: "works"
  • Upload it, register it
  • Launch a new instance with this new custom AMI: boots!

I'll update this if I get to the bottom of the ec2-ami-tools version numbers.

Update: It looks like 1.3-45758-0ubuntu1.1 is the correct version of ec2-ami-tools (which provides ec2-bundle-vol) to avoid all this nonsense. I haven't tested it yet.