Pete Cheslock

DevOps, RelEng, DevTools, Automation, Randomness

VMware Fault Tolerance

vSphere was just released to general availability today, and one of the best features of this upgrade is the addition of VMware Fault Tolerance.   From the VMware site:

VMware Fault Tolerance is leading edge technology that provides continuous availability for applications in the event of server failures,  by creating a live shadow instance of a virtual machine that is in virtual lockstep with the primary instance. By allowing instantaneous failover between the two instances in the event of hardware failure, VMware Fault Tolerance eliminates even the smallest of data loss or disruption.

At VMworld 2008 they let us play with a demo of VMware FT, and it really is an amazing technology. It is almost like watching your first VMotion (“You mean the VM moved from this server to that server?”). VMware FT lets you keep two running instances of the same virtual machine. If you lose a host, the VM continues running with no data loss and minimal downtime (technically a couple of pings drop, but your users are unlikely to notice a disruption of service). VMware FT does this by sending the same CPU instructions to both CPUs via an FT logging NIC, which is a dedicated gigabit-or-better Ethernet NIC on your vSphere hosts.

With any software that gives you that kind of power, there are some caveats and requirements to make FT work in your environment. I felt it was a good idea to start a blog post that I could update with the various requirements for using FT with vSphere. This list is by no means all-inclusive; it is simply a place where I can keep track of the needs and caveats of FT. Below is my listing of the requirements that I’ve found thus far.

Host Requirements

CPU

  • AMD Barcelona (Series 13xx, 23xx, 83xx)
  • Intel Harpertown (Intel 31xx, 33xx, 52xx, 54xx, 74xx)
  • Specifically a Hardware Virtualization (HV) enabled CPU
  • Intel VT or AMD-V enabled in the BIOS
  • Disable any power management in the BIOS (recommended)
  • Disable Hyper Threading (recommended)

Network

  • 2 FT logging NICs (suggested)
  • 1Gbps or better

Storage

  • Shared
  • Fibre Channel, iSCSI, or NAS

ESX

  • Same build version of ESX on each host.
  • VMware HA must be enabled on the primary and secondary hosts in the cluster.
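The host requirements above lend themselves to a quick pre-flight check. Here is a minimal Python sketch of that idea. It does not call any VMware API; the `Host` class, its field names, and the `host_pair_issues` function are all hypothetical names I made up for illustration, and you would have to fill in the values from your own inventory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    """Hypothetical record of the facts the checklist cares about."""
    name: str
    esx_build: str                 # build identifier of ESX on this host
    hv_enabled_in_bios: bool       # Intel VT or AMD-V switched on in the BIOS
    shared_storage: bool           # Fibre Channel, iSCSI, or NAS visible to the host
    ha_enabled: bool               # VMware HA enabled for this host
    ft_logging_nic_gbps: List[float] = field(default_factory=list)


def host_pair_issues(primary: Host, secondary: Host) -> List[str]:
    """Return a list of checklist items the host pair fails (empty if none)."""
    issues = []
    for host in (primary, secondary):
        if not host.hv_enabled_in_bios:
            issues.append(f"{host.name}: Intel VT / AMD-V not enabled in the BIOS")
        if not any(speed >= 1.0 for speed in host.ft_logging_nic_gbps):
            issues.append(f"{host.name}: no FT logging NIC at 1 Gbps or better")
        if not host.shared_storage:
            issues.append(f"{host.name}: no shared storage")
        if not host.ha_enabled:
            issues.append(f"{host.name}: VMware HA is not enabled")
    # Both hosts must run the same ESX build.
    if primary.esx_build != secondary.esx_build:
        issues.append("ESX build differs between primary and secondary host")
    return issues
```

An empty list means the pair passes the items this sketch knows about. It deliberately leaves out the CPU family check and the "recommended" items (power management, Hyper-Threading), since those depend on details I have not modeled here.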

Guest Requirements

  • All ESX-supported guest OSes, 32-bit or 64-bit
  • One (1) vCPU  - vSMP is not yet supported.
  • Thin-provisioned disks are not supported (they will be converted to thick)
  • Paravirtualization is not supported
  • Physically attached CD-ROM and floppy drives are not supported
  • Physical RDMs (Raw Device Mappings) are not supported - Virtual RDM is supported.
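The guest requirements can be sketched the same way. The snippet below is only an illustration in plain Python: `GuestVM`, its fields, and `guest_ft_report` are hypothetical names, not part of any VMware tooling. It separates hard blockers from the one item in the list that is converted for you (thin disks).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GuestVM:
    """Hypothetical description of a VM you want to protect with FT."""
    name: str
    vcpus: int = 1
    paravirtualized: bool = False
    physical_cdrom_or_floppy: bool = False
    physical_rdm: bool = False
    thin_disks: bool = False


def guest_ft_report(vm: GuestVM) -> Tuple[List[str], List[str]]:
    """Return (blockers, warnings) for the guest requirements listed above."""
    blockers, warnings = [], []
    if vm.vcpus != 1:
        blockers.append("vSMP is not supported; the VM needs exactly one vCPU")
    if vm.paravirtualized:
        blockers.append("paravirtualization is not supported")
    if vm.physical_cdrom_or_floppy:
        blockers.append("physically attached CD-ROM or floppy is not supported")
    if vm.physical_rdm:
        blockers.append("physical RDM is not supported (virtual RDM is)")
    if vm.thin_disks:
        # Not a blocker per the list: thin disks get converted to thick.
        warnings.append("thin-provisioned disks will be converted to thick")
    return blockers, warnings
```

For example, `guest_ft_report(GuestVM("web01", vcpus=2, thin_disks=True))` returns one blocker (the vCPU count) and one warning (the disk conversion).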

Caveats

  • Storage VMotion not supported
  • N-Port ID Virtualization (NPIV) not supported
  • No single points of failure in any part of the environment (not strictly required, but a non-redundant environment defeats the point of FT).
  • DRS cannot be enabled on the protected VMs (you can still run manual VMotions)
  • Hot add of devices to the protected VMs is not supported
  • Snapshots are not supported (must be deleted before protecting)
  • VM Hardware must be at v7
  • Remove third-party clustering solutions prior to enabling FT.

Anything else that I’m missing?  Let me know in the comments, and I’ll keep the above info updated.