Last updated on Monday 28th of February 2022 08:52:48 PM

How to use snapshots to increase your backup density

(*) This technique can't be used in every case. ©ESXi 7.0, for instance, forces the removal of all snapshots before the VM can be backed up.

Snapshots are one of the key features of the ©VMWare hypervisor OS, ©ESXi. They let you create points in time to which you can later return, should something go wrong in your VM setup.

Even though they are one of the key features of any virtualization technology, they are not well understood by users, or even by some sysadmins. That frequently leads to undesirable situations in which data is compromised, if not lost.

The "snapshot" concept has been around the storage world for some time now, but each technology has its own particularities, both conceptually and, of course, functionally.

Image #1: VMWare ESXi Snapshot Chain

You can find snapshots in LVM and LVM2, a logical volume management layer that is nowadays solidly integrated into most Linux distros. Microsoft developed its own take on the concept, which it calls Volume Shadow Copy Service. Similar concepts have been built into most virtualization platforms, including the ©VMWare ©vSphere Hypervisor, namely ©ESXi.

In ©ESXi, snapshots work both as intermediate restore points and as a way to keep the state of a given VM consistent at some point in time. In versions prior to 7.0 they also served as a way to free the base disks so that they could be copied while the VM kept running. This is no longer possible since ©ESXi 7.0, as all files in a chain of snapshots are locked for reading, even the -flat.vmdk files.

In Image #1 you can see a representation of a virtual machine with three snapshots. Since from ©ESXi 7.0 onwards all files are locked for reading while the VM is on, our examples assume a VM that is always off. In previous versions, nonetheless, you could access all files in the chain except for Snapshot #3, which is the file the VM is temporarily saving all writes to or, as we like to say, the snapshot the VM is running on.

The Image #1 scenario can be created by taking a snapshot of some VM, copying file1 to it, taking a second snapshot, copying file2, and lastly taking a third snapshot and copying file3 to some place in the guest's file system. By doing so, file1 will be contained in the first snapshot, file2 in the second snapshot and the last file, file3, in the third snapshot.

Cloning from the snapshot tree

In the above example (Image #1), you could use vmkfstools to copy the VM from any of the available restore points, represented by the green arrows, simply by using any of the three snapshot .vmdk files as the source of the clone.

Thus, if we used the base .vmdk file, the one named after the VM with no -00000N.vmdk part (the blue token in our case), we would generate a clone of the VM as it was prior to taking the first snapshot, with none of the three files in it.

If we cloned the VM from the first snapshot, typically file -000001.vmdk, we would generate a clone of the VM as it was before taking the second snapshot: file1 would be present, but not file2. If we cloned from the second snapshot we would find file1 and file2, but not file3, as the clone would reflect the VM prior to taking the third snapshot, and so on.
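The cloning behaviour described above can be sketched as a small model. This is only an illustration of which guest files end up in each clone, not a description of how ©ESXi stores data internally. The file names (vm.vmdk, vm-000001.vmdk and so on) and the helper functions are hypothetical, and the command builder only assembles a vmkfstools clone command line (-i for the source disk, -d for the destination disk format) without running it.

```python
# Toy model of the Image #1 chain. File names are hypothetical.
# Each link holds only the files written while the VM was running on it.
CHAIN = [
    ("vm.vmdk", []),                # base disk, before the first snapshot
    ("vm-000001.vmdk", ["file1"]),  # writes made after snapshot 1
    ("vm-000002.vmdk", ["file2"]),  # writes made after snapshot 2
    ("vm-000003.vmdk", ["file3"]),  # writes made after snapshot 3
]


def files_in_clone(source):
    """Return the guest files visible in a clone taken from `source`."""
    visible = []
    for name, written in CHAIN:
        visible.extend(written)
        if name == source:
            return visible
    raise ValueError(f"{source} is not part of the chain")


def clone_command(source, destination):
    """Build (but do not run) a vmkfstools clone command line.

    The thin disk format is an assumption made for this example.
    """
    return ["vmkfstools", "-i", source, destination, "-d", "thin"]


if __name__ == "__main__":
    for name, _ in CHAIN:
        print(name, "->", files_in_clone(name))
```

Cloning from the base disk yields none of the three files, while each later link adds the file written after its snapshot was taken.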

Removing snapshots

The action of removing a snapshot from some UI is a bit misleading. When we think of the verb "to remove", we visualize some form of deletion. That's not, however, what happens when you remove a snapshot from an ©ESXi virtual machine.

Image #2: ©VMWare ©ESXi deleting a snapshot

Removing a snapshot integrates the data the snapshot is temporarily holding into the base disk. Image #2 illustrates this operation: each pending I/O operation temporarily stored in the Snapshot #1 file is written to the base disk. When you delete a snapshot you don't lose the data contained in it, although the operation can't be undone.

You could delete the other snapshots as well. Each would be integrated into the previous one, consolidating the data into fewer files.
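As a rough sketch of the idea, removing a snapshot can be modelled as merging one link of the chain into the link before it. The chain layout and the remove_snapshot helper below are hypothetical and only illustrate that the data survives while the number of files shrinks.

```python
def remove_snapshot(chain, index):
    """Merge the link at `index` into its parent and return the new chain.

    `chain` is a list of (name, files) tuples with the base disk first.
    The input list is left untouched.
    """
    if not 0 < index < len(chain):
        raise IndexError("only snapshot links (index >= 1) can be removed")
    parent_name, parent_files = chain[index - 1]
    _, snapshot_files = chain[index]
    merged = (parent_name, parent_files + snapshot_files)
    return chain[: index - 1] + [merged] + chain[index + 1:]


if __name__ == "__main__":
    chain = [
        ("base", []),
        ("snapshot1", ["file1"]),
        ("snapshot2", ["file2"]),
        ("snapshot3", ["file3"]),
    ]
    # Removing snapshot 1 moves file1 into the base disk.
    chain = remove_snapshot(chain, 1)
    print(chain)
```

After the call the chain has three links instead of four, and file1 now lives in the base disk.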

As described so far, ©ESXi snapshots are, on their own, a great way to keep different restore points in time.

As a result, if you are a sysadmin testing different versions of some software, such as OS patches, or any kind of software that needs to be rolled back cleanly, snapshots are definitely the best tool you will ever find. They are by no means a way to perform backups though, as any hardware or human failure could make you lose your data.

©XSIBackup is our solution for ©ESXi backups. It can copy your data differentially, using compression and deduplication, to any local or remote file system (the latter via SSH). It's extremely fast and reliable, and you can download a free version which is fully functional with some limitations.

Backups and snapshots

©XSIBackup lets you perform backups that contain previous snapshots. Unlike some other backup solutions, which consolidate all snapshots before backing up, ©XSIBackup will preserve any pre-existing chain of snapshots, so that when you restore your VMs, you can still decide at which snapshot point you want to recover the state of your VM. It's like being able to pack multiple restore points per backup restore point in a fractal fashion.

On top of the above, ©XSIBackup lets you back up to deduplicated repositories with unlimited restore points. As each restore point may in turn contain multiple snapshots, you can enrich your restore possibilities virtually ad infinitum.

Let's describe the following scenario:

You have an ©ESXi server with a SAP One installation plus a VoIP PBX and a file server. This is a typical configuration for a call center, for instance. Imagine the requirement from the head of the IT department is to have one restore point every two hours.

You probably couldn't perform a full backup every two hours. Maybe you could manage a differential backup, but your hardware would still need to be sized for the task, for example with a dedicated controller or network segment. That is not always possible, and this is where snapshots come in handy.

Keeping a large number of snapshots in a production VM is not a good idea. Nevertheless, if your server is not overloaded and you still have some margin to play with, you could configure a snapshot to be taken every two hours. Then, every six hours (three snapshots), you can perform a differential backup that will contain four restore points: the initial one plus one per snapshot, as described above. You then delete all snapshots, consolidating the data, and start again with no snapshots.

That will give you 16 restore points per day instead of four, with about the same noticeable load as one differential backup every six hours.
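The arithmetic behind that figure can be checked with a short sketch. The function name and its parameters are made up for this example; it simply assumes that each backup carries the initial state plus one restore point per snapshot taken since the previous backup.

```python
def restore_points_per_day(backup_every_h, snapshot_every_h):
    """Count the restore points captured in 24 hours.

    Each backup holds the initial state plus one restore point per
    snapshot taken during the backup interval.
    """
    if 24 % backup_every_h or backup_every_h % snapshot_every_h:
        raise ValueError("intervals must divide evenly")
    backups_per_day = 24 // backup_every_h
    snapshots_per_backup = backup_every_h // snapshot_every_h
    points_per_backup = 1 + snapshots_per_backup
    return backups_per_day * points_per_backup


if __name__ == "__main__":
    # Backup every six hours, snapshot every two hours.
    print(restore_points_per_day(6, 2))  # 4 backups x 4 points = 16
```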
Image #3: Fractal restore points

In Image #3 we have depicted a scenario in which three backup restore points, each with 5 snapshot-based restore points, produce a restore structure of 3 x 5 = 15 different effective restore points. Each green arrow represents an additional restore point provided by the snapshot structure.

You can stretch this concept to fit your needs, which will in turn depend on your hardware's capabilities, the amount of data to back up and your internal requirements.

Adding the guest to the equation

But wait, what if the guest OS also allowed you to take snapshots from inside it, as LVM2-based systems do, or Windows Server through the Volume Shadow Copy Service (VSS)?

That would allow you to schedule inner snapshots throughout the day in such a way that you could also use them to restore the guest to some state in between the ©ESXi snapshots. Let's say that you take one snapshot every half an hour and then delete them to start from scratch.

If you were able to sync those inner snapshots with the hypervisor-level ones, you would multiply your available effective restore points by an additional factor of 4, yielding 60 different restore points with the load of just three differential backup cycles. But I'll leave that for the next post...
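To make the nesting concrete, the sketch below enumerates every combination of backup, hypervisor snapshot and guest snapshot as one effective restore point. The function and its parameters are hypothetical, and the counts are the ones used in the examples above (3 backups, 5 snapshot points, 4 guest snapshots).

```python
from itertools import product


def restore_points(backups, snapshot_points, guest_points=1):
    """List every (backup, snapshot, guest snapshot) combination."""
    return list(
        product(
            range(1, backups + 1),
            range(1, snapshot_points + 1),
            range(1, guest_points + 1),
        )
    )


if __name__ == "__main__":
    print(len(restore_points(3, 5)))     # 3 x 5 = 15
    print(len(restore_points(3, 5, 4)))  # 3 x 5 x 4 = 60
```

Each tuple identifies one state you could return to: which backup to restore, which snapshot in its chain to pick, and which guest-level snapshot to roll back to.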