33HOPS, IT Consultants


ESXi Snapshot Errors and Solutions

One of the most common sources of support requests from our registered users is problems with snapshots: creation, deletion, quiescing, and so on.

We have already covered the most common scenarios, such as known bugs with MBR Windows x64 OSs, missing services and so on.

In any case, the most commonly reported errors have to do with snapshot creation and deletion, and with the accumulation of snapshots in the VM folders. In the vast majority of cases, this problem arises because the VM administrator did not consolidate the VM in time when requested to do so. One may wonder why ESXi asks the admin to take action instead of consolidating the VM automatically. That could be a topic for another post, but the plain fact is that it does not consolidate VMs automatically, and thus the administrator (you) must do it.

So the basic message of this post, in the first instance, is to remind you to keep an eye on your VMs and consolidate them from time to time, or whenever you are asked to. From here, things get interesting: "what if I did not consolidate in time and now my VM is piling up snapshots and refusing to create new ones or delete the existing ones?".

Follow these points as a simple procedure. Most of the time, running this checklist will suffice:

0 - Check that you have enough space in the virtual disks.
1 - Check that your datastore is not full and that it has sufficient space to maneuver.
2 - Consolidate manually and try again. If this step fails go to point number 3.
3 - Delete all snapshots, consolidate and try again. If this step fails go to point number 4.
4 - Stop the VM, delete all snapshots and consolidate with the VM in a stopped state, then switch it on. If this step fails go to point number 5.
5 - Stop the VM, delete all snapshots, restart the host daemon and try again: /etc/init.d/hostd restart. If this step fails go to point number 6 (only if you can discard snapshot data).
6 - Only if you can discard snapshot data. Stop the VM, delete all snapshot files: VM-000000-delta.vmdk, VM-000000.vmdk, *.vmsn, .vmsd, restart the host daemon and try again: /etc/init.d/hostd restart. If this step fails go to point number 7.
7 - Reboot ESXi and keep on reading.
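Steps 2 to 5 above can be sketched with ESXi's built-in command-line tools. This is a minimal sketch: the VM id 42 is hypothetical, so look up the real id of the affected VM first, and note that the availability of some subcommands varies with the ESXi build.

```shell
# List registered VMs and note the Vmid of the affected one
vim-cmd vmsvc/getallvms

# Steps 2-3: delete all snapshots, which also commits their data to the base disks
vim-cmd vmsvc/snapshot.removeall 42
# Recent builds also expose an explicit consolidate call:
vim-cmd vmsvc/snapshot.consolidate 42

# Step 4: the same operation, with the VM powered off
vim-cmd vmsvc/power.off 42
vim-cmd vmsvc/snapshot.removeall 42
vim-cmd vmsvc/power.on 42

# Step 5: restart the host daemon
/etc/init.d/hostd restart
```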


First of all, you must understand what those snapshots mean from a conceptual point of view. They are a chain of data: it is as if you had split the last I/O operations to disk into chunks and kept them there, waiting for something to happen. How extensive that data is and how many snapshot files there are is just a matter of volume; it does not change the concept. They are waiting there for somebody to commit them to disk or discard them.
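You can see the chain in the snapshot descriptor files themselves: each delta descriptor points at its parent through parentFileNameHint, all the way down to the base disk. The fragment below is illustrative, not a complete descriptor:

```ini
# yourVM-000002.vmdk (topmost snapshot) - illustrative fragment
createType="vmfsSparse"
parentFileNameHint="yourVM-000001.vmdk"
# yourVM-000001.vmdk in turn points back at the base disk:
# parentFileNameHint="yourVM.vmdk"
```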

A general system error occurred:

There is a problem that arises from time to time in a given system: you cannot create snapshots or delete any pre-existing one, and the event log shows "A general system error occurred". Sometimes this problem persists even after discarding the snapshot data files manually as explained in point number 6, and even after rebuilding the data from the chain of snapshots as explained further below. This drives users crazy and seems impossible to fix.

We have been able to reproduce this problem, which has to be considered an ESXi bug, on 5.x and 6.x systems. For some unknown reason, the VM regenerates a .vmsd file with invalid information, even after all the snapshot files, including the .vmsd file itself, have been deleted manually. The bad .vmsd file, which recreates itself without any apparent reason, contains information about a snapshot that no longer exists.

It does not matter how many times you turn the VM off and delete the .vmsd file; the wrong information reappears over and over. Even if you clone the VM from the topmost snapshot, the wrong information keeps being written into the .vmsd file.
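For reference, the stale entries look something like the fragment below; the names and uids are made up for illustration. A healthy .vmsd for a VM with no snapshots contains little more than the .encoding line:

```ini
.encoding = "UTF-8"
snapshot.lastUID = "1"
snapshot.current = "1"
snapshot0.uid = "1"
snapshot0.filename = "yourVM-Snapshot1.vmsn"
snapshot0.displayName = "ghost snapshot"
snapshot.numSnapshots = "1"
```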

It is clearly not something in the ESXi host itself, as unregistering the VM and registering it again with a different id does not solve the problem. Thus, it has to be something related to VMware Tools.

Solution:

We have found that deleting the .vmsd file once the VM has been turned on, and the wrong .vmsd file has been recreated, allows a new snapshot to be created. From this point on, the problem seems to be resolved.

As some of the snapshot descriptor files may be damaged if this problem affects you, the best remedy is to clone from the topmost snapshot, switch the newly created VM on, delete the .vmsd file and take a new snapshot.
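That sequence can be sketched from the ESXi shell as follows. The VM id 42 and the datastore path are hypothetical; substitute the id and directory of the newly cloned VM:

```shell
# Power the cloned VM on so the bad .vmsd gets regenerated one last time
vim-cmd vmsvc/power.on 42
# Delete the regenerated .vmsd while the VM is running
rm /vmfs/volumes/datastore1/yourVM/yourVM.vmsd
# Take a fresh snapshot: name, description, includeMemory (0/1), quiesce (0/1)
vim-cmd vmsvc/snapshot.create 42 clean-start "first snapshot after fix" 0 0
```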

Rebuild the VM data from the chain of snapshots:

If you are lucky, the consolidation will work and you will be able to commit those snapshots to the base vmdk files. If you aren't, then you will need to rebuild everything into a consolidated base vmdk file, or a set of them if you have more than one virtual disk in that VM. By doing that you will lose your snapshots, but you will save your data. This means you won't be able to go back to a previous state of the VM, but at least you will keep your valuable data; at this point you should consider yourself fortunate to be able to do so with only minor hassle.

If your set of snapshots is in a good state, you may choose the snapshot from which you will consolidate the broken VM, and thus the point in time to which you will revert it. In any case, that procedure is much more complex than a simple full consolidation starting with the topmost snapshot, especially if you want to preserve the remaining snapshots in the chain. In this post we will cover the simplest scenario and recover the VM from the topmost snapshot present. If you want to recover from an earlier one, just follow the same procedure from a previous snapshot in the chain and discard the rest of the data.

The standard procedure to remedy this is to clone the .vmdk files one by one from the topmost currently available snapshot. To do that you will need to use vmkfstools as follows: find the topmost snapshot .vmdk files; they will be named something like yourVM-00000N.vmdk, and you will find one 00000N.vmdk file per disk in the VM (if you only have one disk, there will be only one set of snapshot .vmdk files). Locate them all and clone each one of them (from the highest N present) by using vmkfstools this way:
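A minimal sketch of the clone command; replace N with the highest snapshot number present. The source datastore name is an assumption here, and the destination matches the "/vmfs/volumes/datastore2/yourVM" directory used in the rest of this example:

```shell
# Clone (and thereby consolidate) the topmost snapshot of the first disk
# into a new base disk on the destination datastore
vmkfstools -i /vmfs/volumes/datastore1/yourVM/yourVM-00000N.vmdk \
           /vmfs/volumes/datastore2/yourVM/yourVM.vmdk
# Repeat for each additional disk: yourVM_1-00000N.vmdk -> yourVM_1.vmdk, etc.
```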



This will create a new .vmdk file containing all the consolidated information from the base disk plus all the snapshots in the chain. The next step is to copy the .vmx file to the destination dir ("/vmfs/volumes/datastore2/yourVM" in the example) and edit it with the vi editor to reflect the new paths to each disk. That's all: you can switch your VM on, and if all steps were taken adequately, you'll have a new, sanitized VM.
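Inside the copied .vmx, the disk entries you need to touch look roughly like the fragment below. The scsi0:0 label is the typical first SCSI disk; your VM may use different controller and device numbers, so adjust each fileName to the newly cloned .vmdk:

```ini
scsi0:0.fileName = "yourVM.vmdk"
scsi0:1.fileName = "yourVM_1.vmdk"
```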

Daniel J. García Fidalgo
33HOPS


