Last updated on Tuesday 9th of August 2022 09:52:37 AM
©VMWare ©ESXi Backup Ten Commandments.
Ten basic things you must keep in mind at all times when backing up ©VMWare ©ESXi virtual machines
- Meta data
A running VM is not proof enough that the VM is healthy. Each .vmdk disk has meta-data associated with it, which you can inspect with the vmkfstools -t0 option.
It records where each chunk of data is located on the physical hard disk, and ©XSIBackup uses it to detect zero zones and the data to back up.
Should this information become corrupt, your backups will collect random data and will be useless. This rarely happens; we have detected this sort of issue
very seldom. It can be caused by a badly performed disk conversion or by faulty hardware. The solution is straightforward: clone your .vmdk disks with
vmkfstools and the meta-data will be rebuilt and will continue to be updated correctly.
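As a minimal sketch of the cloning step, the helper below builds the vmkfstools command line that clones one virtual disk into another. The -i (clone) and -d (disk format) options belong to vmkfstools; the function names, the thin default and the paths in the usage note are assumptions made for illustration. The command is only executed when dry_run is set to False, and the VM would normally be powered off first.

```python
import subprocess


def clone_vmdk_command(source_vmdk, target_vmdk, disk_format="thin"):
    """Return the argv list for cloning source_vmdk into target_vmdk."""
    # -i clones a virtual disk, -d selects the format of the new disk.
    return ["vmkfstools", "-i", source_vmdk, target_vmdk, "-d", disk_format]


def clone_vmdk(source_vmdk, target_vmdk, disk_format="thin", dry_run=True):
    """Build the clone command and run it unless dry_run is True."""
    cmd = clone_vmdk_command(source_vmdk, target_vmdk, disk_format)
    if not dry_run:
        # Only meaningful on an ESXi host where vmkfstools is available.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, clone_vmdk("/vmfs/volumes/datastore1/VM01/VM01.vmdk", "/vmfs/volumes/datastore1/VM01/VM01-clone.vmdk") returns the command without running it, so it can be reviewed first. Both paths are hypothetical.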
- VMFS is great to host VMs, not so great to host deduplicated data
VMFS-5 has approximately 130,000 available inodes. You don't need more than that to host VMs, as they consist of a limited number of big files. Storing deduplicated repositories
is basically the opposite kind of task, namely storing millions of small files in a hierarchy. This makes VMFS-5 unsuitable for storing deduplicated repositories.
©VMWare has not officially published the number of available inodes in VMFS-6. Nevertheless, it's easy to benchmark it by creating a lot of small files with a script. We did so and found
that it can host millions of files, but it is still not very fast. Thus, if you host repositories with the smallest block size (1MB) offered by ©XSIBackup-DC, you will find it
sluggish and not well suited to the purpose.
That's not a problem though, as ©ESXi offers many more possibilities, like NFS datastores, which can sit on top of virtually any FS you want to use. This is the kind of storage media you
should use to store ©XSIBackup-DC generated repositories. The file systems you should use are ext4 or XFS: they are very fast and can store a virtually unlimited number of files.
You are probably wondering whether you could use some other FS. Sure you can; we have used other FSs to store repositories, such as NFS, ext3, BTRFS, etc. In any case, we recommend that you use
ext4 or XFS, as we have taken the time to verify that they are the best for the task. If you have plenty of time and want to investigate, great, but use ext4 or XFS in production; we guess you don't
want to play Russian roulette with your valuable data.
If you have a second disk, you can run an ext4/XFS NAS VM on that secondary disk and mount it over NFS. This way you get the advantage of a 10 Gb virtual NIC, which will allow faster backups.
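The small-file benchmark mentioned above can be sketched in a few lines. This is not the script we used, just an illustration of the idea: it creates many small files spread over subdirectories and reports how many were created per second. The function name, the 256-subdirectory layout and the default sizes are assumptions.

```python
import os
import time


def benchmark_small_files(directory, count=10_000, size=1024):
    """Create `count` files of `size` bytes; return (files created, files/second)."""
    payload = b"\0" * size
    created = 0
    start = time.perf_counter()
    try:
        for i in range(count):
            # Spread the files over 256 subdirectories, roughly the way a
            # block repository keeps its blocks in a hierarchy.
            subdir = os.path.join(directory, f"{i % 256:02x}")
            os.makedirs(subdir, exist_ok=True)
            with open(os.path.join(subdir, f"{i:08x}.blk"), "wb") as handle:
                handle.write(payload)
            created += 1
    except OSError as error:
        # An exhausted inode table usually shows up here as "No space left".
        print(f"stopped after {created} files: {error}")
    elapsed = time.perf_counter() - start
    rate = created / elapsed if elapsed > 0 else float("inf")
    return created, rate
```

Point it at an empty directory on the file system you want to evaluate and compare the rate, and the point at which file creation fails, between VMFS and an ext4 or XFS volume exported over NFS.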
- Snapshots
Snapshots are one of the most celebrated features of ©ESXi: they allow you to keep multiple states of a given VM. For snapshots to work well, you need to understand what they are and how they interact
with and depend on the guest OS.
You can take a snapshot without any further ado and it will work well most of the time. It will produce a new set of snapshot files, which will contain the data written to disk since the moment
you took the snapshot. That data will be committed to the base disk (the -flat.vmdk file or some previous snapshot in the stack) or discarded when you delete that snapshot, depending on your
position relative to the snapshot cut point in the snapshot chain. You can move along the chain of snapshots to choose that position.
Things get a bit more complicated when you have a database service running in your guest. Why? Because when you ask ©ESXi to take a snapshot, it just does it. If some file is being
written to disk while the snapshot is created, that file may not end up fully written to either of the disks: the base disk or the newly created snapshot.
This is not an issue for user files, like office documents, photos, audio, etc., as the file will be consistent the next time you commit the data to disk. When you run a database service
of any kind, however, writes need to be atomic. That is: all of the data is written or no data is written at all. This ensures that DBs are consistent and continue to work properly. Imagine what it would mean to
have a transaction partially written to the DB, where a new invoice is only partly created and can't be related to any client.
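The all-or-nothing behaviour described above can be illustrated with SQLite, which ships with Python. This is only a sketch of atomicity inside a database; the table names, columns and the make_db helper are assumptions invented for the example, and it is unrelated to how ©ESXi takes snapshots.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one client and empty invoice tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute(
        "CREATE TABLE invoice_clients ("
        " invoice_id INTEGER NOT NULL REFERENCES invoices(id),"
        " client_id INTEGER NOT NULL REFERENCES clients(id))"
    )
    conn.execute("INSERT INTO clients (id) VALUES (1)")
    conn.commit()
    return conn


def create_invoice(conn, client_id, amount):
    """Insert an invoice and its client link as a single transaction."""
    # "with conn" commits if the block succeeds and rolls back on any error,
    # so the invoice row never survives without its client link.
    with conn:
        cursor = conn.execute("INSERT INTO invoices (amount) VALUES (?)", (amount,))
        conn.execute(
            "INSERT INTO invoice_clients (invoice_id, client_id) VALUES (?, ?)",
            (cursor.lastrowid, client_id),
        )


def invoice_count(conn):
    """Return the number of rows in the invoices table."""
    return conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

Calling create_invoice with a client that does not exist raises sqlite3.IntegrityError on the second insert, and the first insert is rolled back with it, so no orphan invoice is left behind.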
To avoid partially written transactions, ©ESXi snapshots have a special mode in which they are "quiesced" when taken. Quiescing means nothing more than making sure that any I/O operation pending when the snapshot is requested is
consistently written to disk before the snapshot is actually taken. Consistently, in this case, means whatever the underlying DB service needs to complete in order to have usable data.
For that to be possible, the snapshot system has to communicate with the guest service through ©VMWare Tools, so that any busy service can finish writing its data before the snapshot is actually taken.
You should read our dedicated post on troubleshooting snapshots.
Most Linux distros work well out of the box with the most popular database software. Windows servers, however, can be especially tricky to quiesce: they need additional software in
the form of a helper service to coordinate the DB service with the Volume Shadow Copy Service (VSS), which is ©Microsoft's volume snapshot framework.
- Keep your server in a healthy state
We constantly find users with servers raising all kinds of errors: virtual machines that have been deleted from disk but are still registered on the ESXi host, faulty hardware, old virtual
disks that have not been updated to a fairly recent HW version or, even worse, disks that contain corrupt meta-data tables. Again, vmkfstools is the tool to use when you suspect that some virtual disk is not
in perfect working condition.
©VMWare will offer you the best virtualization platform, but your system will only be as reliable as you make it.
- Check your backups and newly available ©XSIBackup versions from time to time
You don't need to restore every backup to know it works; ©XSIBackup is reliable and backs up thousands of ©ESXi hosts every day. We also provide the tools to check the backups (--check action),
which let you know whether all recorded blocks for a given VM are present.
Still, some circumstances may produce unusable backups, like meta-data corruption or silent disk corruption. You should deploy procedures that ensure your system is working well, such as running
SMART tests on your disks regularly and restoring some random VM from your backup sets from time to time.
We are constantly maintaining the software. We do our best to release bug-free versions; nonetheless, we or our users may detect some bug from time to time. We publish bug information in the "Bug tracker" section of our forum, and we publish bug fixes and improvements in the change log of each product:
•XSIBackup-DC change log
•XSIBackup-Classic Free&Pro change log
- Use pruning mechanisms (--prune & --rotate) cautiously
Before deploying your backup topology, think twice about what your goal is. ©XSIBackup-DC will compress your backups by well over 95% as soon as you accumulate some backup cycles in the repository.
This is enough to multiply your storage capacity many times over when compared to storing full backup sets.
When you are in a corporate environment, you will typically want to keep as many restore points as possible, and even archive old repositories to be kept for some time.
Pruning (--prune) and rotating (--rotate, an extension of the --prune command) are destructive operations by definition: they locate and delete blocks which are no longer used by the repository.
This is an extremely intensive operation that will use around 60 KB of memory per stored non-zero gigabyte of data. This isn't much, but take into account that you may host many terabytes of data.
Thus a repo storing 30 days of a 3TB set would require around 5 GB of RAM just to hold the block info prior to pruning. That isn't much on a decent ESXi server, but some users tend to overload their servers, and
you can't rely on overcommitting resources when you are performing a destructive operation.
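The arithmetic behind that figure is easy to reproduce. The sketch below uses the 60 KB per gigabyte estimate given above and decimal units; the function name is an assumption, and it counts every restore point as a full set of non-zero data, which is how the 5 GB figure is reached.

```python
def prune_memory_estimate_gb(set_size_tb, restore_points, kb_per_gb=60):
    """Estimate the RAM (in GB) needed to hold block info before pruning."""
    # Total gigabytes referenced by all restore points (1 TB = 1000 GB).
    stored_gb = set_size_tb * 1000 * restore_points
    # Kilobytes of block info, converted to gigabytes (1 GB = 1,000,000 KB).
    return stored_gb * kb_per_gb / 1_000_000


# 30 restore points of a 3 TB set: 90,000 GB * 60 KB = 5.4 GB of RAM.
estimate = prune_memory_estimate_gb(set_size_tb=3, restore_points=30)
```

Compare the result with the memory that is really free on the host at prune time, not with the memory that is installed.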
Don't get us wrong, we prune our own repositories every day, not only to save space, but to constantly test all functions. It is a mature function that has recently been fully revised. Still, pay double attention when pruning. When you back up data you are just copying data from one place to another: you read and write. When you prune data, on the contrary, you compare block logs and then delete the unused ones, so there is
a fundamental conceptual difference between the two operations. When you prune data, run a --check on the repository from time to time, or at least on the most recently added data, just to make sure that everything is working as you expect.
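To make the difference concrete, here is a conceptual sketch of pruning and checking as operations on sets of block identifiers. It is not the ©XSIBackup-DC implementation; the function names and the dictionary of restore points are assumptions used only to show the idea of comparing block logs.

```python
def find_unused_blocks(stored_blocks, restore_points):
    """Return the stored blocks that no remaining restore point references."""
    referenced = set()
    for block_ids in restore_points.values():
        referenced.update(block_ids)
    # Only these blocks are safe to delete when pruning.
    return set(stored_blocks) - referenced


def find_missing_blocks(stored_blocks, restore_points):
    """Return, per restore point, the referenced blocks absent from the store."""
    stored = set(stored_blocks)
    missing = {}
    for name, block_ids in restore_points.items():
        absent = set(block_ids) - stored
        if absent:
            missing[name] = absent
    return missing
```

A backup only ever adds to the set of stored blocks, whereas a prune removes from it. That is why a mistake in the list of referenced blocks is harmless during a backup and fatal during a prune, and why a check after pruning is worth the time.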
- Don't use worn out disks to store backups
Worn out disks are excellent storage media for your photos and movies. Please don't use old disks to store your virtual machine backups. We know this sounds obvious, but many SMEs tend to over-amortize their assets; curiously, cars are renewed more often than backup disks in many places.
Use new disks to store backups and replace them every two years. Are they still usable? Great, use them in some desktop or for some other non-critical task. Run SMART tests on your disks daily and replace them as soon as a damaged segment is detected, even if they are below the two-year threshold.
- Always leave at least 20% of your SSDs free
This is a rule that will probably be waived shortly, as SSD manufacturers are constantly introducing improvements that reduce the need to keep some space free in order not to hurt performance.
Still, if you don't know the internals of your SSDs well enough to know how far you can push their space utilization, keep 20% free at all times.
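A check for the 20% rule is simple to automate. The sketch below is one possible way to do it with Python's standard library on the machine that hosts the file system, for instance a Linux NAS; the function names and the 0.20 default are assumptions taken from the rule above.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Return the fraction of the volume that is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_enough_free_space(path, minimum_free=0.20):
    """Return True if the file system holding `path` keeps minimum_free available."""
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) >= minimum_free
```

Run it from a scheduled job before each backup and raise an alert, or skip the backup, when it returns False.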
- Use SSDs as cache if your main storage media is still regular HDs
SSD prices have dropped so much that HDs will soon be a thing of the past. Nonetheless, at the time of writing, HDs are still widely used, especially when it comes to storing huge amounts of data, as in the
case of VM backups.
©ESXi includes the possibility of using an SSD as cache for the server. This is especially useful, as SSDs are much faster than HDs and can be used to store data temporarily before finally committing
it to the HDs. The bigger the SSD the better, and the faster it is the better it will work; thus big M.2 devices will be a better solution than small SATA ones.
Of course, if you can afford to have all your storage in the form of big and fast SSD devices, so much the better.
- Keep things tidy and use a reasonable naming convention
People don't usually care much about how they name things, when it is in fact extremely important. For instance, the space character is part of the default IFS (Internal Field Separator) in *NIX shells, which means
that every command line utility will try to split things at the space character by default. Of course we have made sure, especially in the DC edition, that all spaces are handled adequately when selecting VMs and locating any possible datastore or path. Still, being a Sysadmin consists in part of identifying possible issues and preventing them from happening.
Naming a VM "Linux server of Mr. Häring's HHRR department" and putting it inside the "New iSCSI datastore" might be extremely descriptive of what that VM is and where it is stored, but "/vmfs/volumes/iSCSI01/L01HHRR" is equally descriptive. Not only does it offer the same information, but the chances that some process crashes are drastically reduced.
Then, what's the best approach? The best approach would be to create a naming convention that assigns things a name of a fixed length with no spaces. That would greatly simplify your scripting and would also reduce the chances of hitting some bug.
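A convention like this is easy to enforce with a small validator. The sketch below is only an illustration: the fixed length of 7 characters is an assumption taken from the "iSCSI01" and "L01HHRR" examples above, and the path with spaces is hypothetical. It also shows how whitespace splitting, similar to what a shell does with an unquoted value, breaks a path that contains spaces.

```python
import re

# Assumed convention: exactly 7 ASCII letters or digits, no spaces.
NAME_LENGTH = 7
NAME_PATTERN = re.compile(rf"[A-Za-z0-9]{{{NAME_LENGTH}}}")


def is_valid_name(name):
    """Return True if `name` follows the fixed-length, no-space convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def split_like_unquoted_shell(value):
    """Split on whitespace, as a shell does with an unquoted variable."""
    return value.split()


# A path with spaces falls apart into several words; a tidy one stays whole.
messy = split_like_unquoted_shell("/vmfs/volumes/New iSCSI datastore/Linux server")
tidy = split_like_unquoted_shell("/vmfs/volumes/iSCSI01/L01HHRR")
```

Here messy ends up as four separate words and tidy as a single one, which is the difference between a script that needs careful quoting everywhere and one that just works.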