I ran into a very interesting issue today with a client who uses Veeam Backup and Replication to keep their virtual machines replicated to a remote ESX server for disaster recovery. When Veeam starts a replication job, it takes a snapshot of the virtual machine and then replicates the main VMDK disk file to the remote site. When the job finishes, Veeam tells VMware to remove the snapshot until the next scheduled replication runs. Since we are replicating our VMs across a slow WAN connection (600Kbps, optimized with Citrix WANScalers), the replication can often time out or hang. Today I noticed that the replication had not updated since last night, so I needed to stop the replication and restart it. Since the Citrix WANScalers can cache as well as compress, restarting a failed replication job is usually pretty quick, as most of the data was previously cached on the Citrix boxes. Here are the details of what I found, and how I fixed it…
To make snapshot management easier, I store the VM configuration files on a separate LUN, because the snapshot deltas are created wherever the VM configuration files are stored. This lets us keep the LUNs holding the main VMDKs fairly static, without worrying about snapshots filling up our available space. When looking at a specific VM today, I noticed that the datastores listed showed only the snapshot LUN. This meant that a snapshot had been taken and never removed. This particular VM was not currently replicating, so I knew that snapshot should not have existed. In normal operation, both the snapshot LUN and the VMDK LUN should be listed.
When I opened the Snapshot Manager, however, it showed no snapshots on that VM.
I accessed the Datastore Browser to see if there were any delta VMDKs on the disk; there turned out to be two deltas on my datastore.
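If you would rather script that check than click through the Datastore Browser, here is a minimal Python sketch of the same idea. It assumes you can reach the VM's folder as an ordinary directory from a machine with Python 3, and that delta files follow the "name-000002-delta.vmdk" pattern seen in this post. The function name find_delta_disks is my own, not part of any VMware tool.

```python
import re
from pathlib import Path

# Assumption: snapshot delta extents are named like "exch-000002-delta.vmdk",
# i.e. a six-digit sequence number followed by "-delta.vmdk".
DELTA_PATTERN = re.compile(r"-\d{6}-delta\.vmdk$")


def find_delta_disks(vm_dir):
    """Return the sorted names of delta files found directly inside vm_dir."""
    return sorted(
        entry.name
        for entry in Path(vm_dir).iterdir()
        if entry.is_file() and DELTA_PATTERN.search(entry.name)
    )
```

Calling find_delta_disks on the VM's folder returns a list of file names. An empty list means no delta files matched the pattern in that folder, which is what you would expect for a VM with no snapshots.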
I wanted to confirm that the virtual machine was indeed running off the delta disk. To check that, I simply went to edit the settings of this virtual machine, and looked at the virtual disk object. In this instance it was accessing the disk “exch-000002-delta.vmdk”, which was one of my delta disks.
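The same check can be roughed out against the VM's .vmx file instead of the settings dialog. This is only a sketch: it assumes the disk entries look like scsi0:0.fileName = "exch-000002.vmdk", and it treats any disk name ending in a six-digit sequence number (with or without "-delta") as snapshot-backed. That is a naming heuristic, so a base disk that happens to be named that way would be flagged too. The function name snapshot_backed_disks is hypothetical.

```python
import re

# Assumption: disk entries in the .vmx look like
#   scsi0:0.fileName = "exch-000002.vmdk"
DISK_LINE = re.compile(
    r'^\s*((?:scsi|ide)\d+:\d+)\.fileName\s*=\s*"([^"]+)"', re.IGNORECASE
)
# Heuristic: snapshot disks carry a six-digit sequence number before ".vmdk".
SNAPSHOT_DISK = re.compile(r"-\d{6}(?:-delta)?\.vmdk$", re.IGNORECASE)


def snapshot_backed_disks(vmx_text):
    """Return (device, file name) pairs for disks that point at a snapshot disk."""
    hits = []
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if match and SNAPSHOT_DISK.search(match.group(2)):
            hits.append((match.group(1), match.group(2)))
    return hits
```

Pass it the contents of the .vmx file as a string. Any pair it returns is a virtual disk that appears to be running off a snapshot, even when the Snapshot Manager shows nothing.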
There is a command you can run on the service console to try to remove snapshots if you are unable to with the VI Client.
When I ran this command, I received the following error:
VMControl error -3: Invalid arguments: Virtual machine has no snapshots
Doing some research on the VMware Communities website, I found a recommendation to create a new snapshot, excluding the VM memory, and then remove it. When I created a new snapshot on my virtual machine, I saw something very interesting: an additional snapshot called “Consolidate-Helper-0”.
At this point I deleted all the snapshots from the VM and waited for the process to finish. A couple of my snapshots were pretty large, so vCenter timed out before the deletions completed. I waited an hour, and then confirmed the snapshots were gone by checking the virtual disk resource in the VM settings.