Last updated on Wednesday 30th of March 2022 04:31:04 PM

How to use Onediff differential features to lower the backup load

Please note that this post refers to the old, deprecated ©XSIBackup-Classic software. Some of the facts contained herein may still apply to more recent versions, though.

For new installations please use the new ©XSIBackup, which is far more advanced than ©XSIBackup-Classic.

In the previous post we saw how to rewire our hardware to bring it closer to its full potential. This is basic: no matter what you do with it, you first need to know how to use all the pieces on your particular chessboard. Still, the proposed backup was simple, since it copied all data to a timestamped folder. This basic setup has two main disadvantages. First, it has to copy all data every time, so it uses a lot of I/O throughput. Second, each backup is stored independently, so together they use as much space as one full backup multiplied by the number of backups we keep, which in our previous post was 9. This second disadvantage actually has two sides: storing the data many times is less efficient from a disk usage perspective, but it is also safer, as the data is stored many times and thus we have redundancy. We'll see later on how we can reach a good trade-off between both extremes.
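The space cost of timestamped full copies can be put into numbers with a very small sketch. The 100 GB VM size is a hypothetical figure chosen for illustration; only the 9 retained backups come from the previous post.

```python
def full_copy_footprint(vm_size_gb: float, retained_backups: int) -> float:
    """Disk space used when every retained backup is a complete copy."""
    return vm_size_gb * retained_backups


vm_size_gb = 100.0   # hypothetical VM size, not a figure from the post
retained = 9         # number of backups kept in the previous post

# Every run also has to read and write the whole VM, so the I/O per run
# equals vm_size_gb regardless of how little data actually changed.
print(full_copy_footprint(vm_size_gb, retained))  # 900.0
```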

For now we'll focus on finding a way to carry out the same backup proposed in the previous post in a more efficient way, both in regard to I/O and used space.

Improving I/O efficiency

©XSIBackup Classic offers different ways to improve I/O. Let's analyze them so that you end up knowing what to choose for your particular needs.

• OneDiff: this is one of the additional backup programs that ©XSIBackup-Pro Classic offers. It stores the blocks changed since the last backup in a separate file, so that only those changed blocks need to be sent next time. This makes OneDiff very fast in comparison to copying all .vmdk disks every time. It has only one drawback, which can easily be compensated for by combining OneDiff with other ©XSIBackup-Pro tools: data is always copied to the same mirrored VM, so you only ever have one single copy of your backup. Whatever problem you have in your VM will be mirrored to the OneDiff copy, and you won't be able to recover the version of your VM from a given date.

Some ©XSIBackup experts, like Jeff Kaminski in Vancouver, combine OneDiff with an LVM2 snapshot taken at the datastore OS level, which is a very smart and very fast approach, and are thus able to keep a historic set of backups of up to hundreds of versions. It is recommended to perform a post-backup checksum check via --certify-backup before taking the LVM snapshot; this ensures the LVM snapshot chain will be consistent. This is just an idea, not a documented procedure on our part, but this kind of synergy is worth exploring, as it leverages the potential of ©XSIBackup by making it interact with powerful and very well tested tools like LVM2.

Another approach, which does not necessarily involve managing tasks outside ESXi itself, is to make a full copy of the _XSIBAK mirrored VM every day, as explained in this post. It's not efficient in terms of disk usage, but it offers a fast backup of the production VM plus the redundancy achieved through the daily copy, which can be made without disturbing the production set of VMs. You can also combine OneDiff with the next program in the list, XSITools. This lets your disk usage benefit from block level de-duplication plus compression and multiplies the number of versions you can store.
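The idea behind a differential mirror of this kind can be illustrated with a toy model. This is only a sketch of the general technique (remember which blocks were written, then copy just those blocks to a single mirror), not OneDiff's actual implementation; the class, the function and the 4-byte block size are all hypothetical names and values chosen for illustration.

```python
BLOCK_SIZE = 4  # toy block size, far smaller than anything used in practice


class TrackedDisk:
    """Toy disk that remembers which blocks were written since the last sync."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.changed = set()  # indexes of blocks modified since the last sync

    def write(self, offset: int, payload: bytes) -> None:
        # Writes are assumed to stay within the current size of the disk.
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.changed.update(range(first, last + 1))


def sync_changed_blocks(disk: TrackedDisk, mirror: bytearray) -> int:
    """Copy only the changed blocks to the mirror; return how many were copied."""
    copied = 0
    for index in sorted(disk.changed):
        start = index * BLOCK_SIZE
        mirror[start:start + BLOCK_SIZE] = disk.data[start:start + BLOCK_SIZE]
        copied += 1
    disk.changed.clear()
    return copied


disk = TrackedDisk(b"AAAABBBBCCCCDDDD")
mirror = bytearray(disk.data)          # the first run is a full copy

disk.write(5, b"xy")                   # touches block 1 only
disk.write(11, b"zz")                  # spans blocks 2 and 3
print(sync_changed_blocks(disk, mirror))   # 3
print(mirror == disk.data)                 # True
```

Only three of the four blocks travel to the mirror, which is where the speed comes from. The sketch also shows the drawback described above: the mirror holds nothing but the latest state, so earlier versions are gone unless they are kept by some other means.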

• XSITools: XSITools was introduced in ©XSIBackup-Pro 9.0.0. It performs block level de-duplicated backups of your .vmdk disks and stores the chunks in any VMFS volume, or any other FS available through your attached datastores, in the form of an XSITools repository. This is just a regular FS folder with some configuration files inside that hold some parameters. ©XSIBackup-Pro will automatically create an XSITools repo, if it does not exist yet, simply on parsing the --backup-prog=xsitools argument and a --backup-point argument pointing to some folder in any datastore. You can safely delete XSITools repositories and create new ones whenever you want; they are self-contained and hold everything needed to restore a backup. XSITools provides not only block de-duplication but also LZO compression on top of it, which allows much better use of disk space than regular full backups. The factor by which you multiply your effective disk space will depend on the volume of daily data you manage, but it can easily range from x5 to x7 on typical production machines. Using compression is recommended, as it will barely affect backup speed.

De-duplication works by reusing single blocks of data, which lets it store information very compactly, but as the classical myths teach us, its power is at the same time its weakness. As every block of data is stored only once, losing a single block shared by many versions of a virtual machine inside a deduplicated repository can render all of them useless. This of course does not happen very often, as an I/O error must occur, and fortunately hard disks are very reliable nowadays. In any case, we must always think about every possibility and take action before it happens. A good way to reduce the impact of a block loss is to back up alternately to two different repos, or to open a new one every month, or, even better, to combine both procedures. Another reason to rotate XSITools repositories is that the underlying file system limits the number of files that can be stored in a volume. In the case of VMFS, that limit is above 130,000 files, so with the default 50M block size you can store many terabytes of data. Still, "a lot" does not equal "infinite".
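Both the compactness and the fragility of a de-duplicated repository show up in a minimal sketch. It is an illustration of the general technique, not XSITools' on-disk format: the repository is a plain dictionary, SHA-1 is an arbitrary choice of hash, zlib stands in for LZO (which is not in the Python standard library), and the 8-byte block size and function names are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 8  # toy value; the post mentions a default block size of 50M


def backup(disk: bytes, repo: dict) -> list:
    """Store each unique block once in repo; return the list of block hashes."""
    manifest = []
    for start in range(0, len(disk), BLOCK_SIZE):
        block = disk[start:start + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in repo:               # block already stored: reuse it
            repo[digest] = zlib.compress(block)
        manifest.append(digest)
    return manifest


def restore(manifest: list, repo: dict) -> bytes:
    """Rebuild a disk image from its manifest."""
    return b"".join(zlib.decompress(repo[digest]) for digest in manifest)


repo = {}
monday = b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC"
tuesday = b"AAAAAAAA" + b"XXXXXXXX" + b"CCCCCCCC"   # only the middle block changed

m1 = backup(monday, repo)
m2 = backup(tuesday, repo)
print(len(repo))                      # 4 unique blocks instead of 6
print(restore(m2, repo) == tuesday)   # True

# Losing one shared block breaks every version that references it.
del repo[m1[0]]
for name, manifest in (("monday", m1), ("tuesday", m2)):
    try:
        restore(manifest, repo)
    except KeyError:
        print(name, "cannot be restored")
```

Deleting the single shared block makes both versions unrecoverable, which is the reason for alternating or rotating repositories. As a rough order of magnitude for the file limit, 130,000 chunk files of 50 MB each correspond to about 6.5 TB of unique block data before compression.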

• Replicating XSITools repos: another interesting property of XSITools repositories is that they not only deduplicate blocks of data and store the chunks compressed but, by doing so, also split the information into smaller chunks, which lets Rsync overcome its weakness when calculating delta checksums. In fact, when replicating XSITools chunks you can have Rsync compare just names and sizes: as blocks are unique, there is no need to run the delta algorithm on individual blocks, so Rsync is used only to sync new blocks. This makes it possible to synchronize XSITools repositories very quickly and keep a redundant set of VMs distributed across WAN links. Performing a check on every distributed repository after every synchronization is a must. This is done off site with respect to our production VMs, so it won't affect their performance. Rsync can't manage VMFS file properties, so this kind of sync must be run from within the NAS OS; otherwise Rsync won't be able to compare file times and sizes properly.
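The name-and-size comparison can be sketched in a few lines. This is a stand-in for what Rsync does in that mode, not a replacement for it, and it makes assumptions: the function name and the chunk file names are hypothetical, and the repository is treated as a plain folder of chunk files.

```python
import shutil
import tempfile
from pathlib import Path


def sync_new_chunks(src: Path, dst: Path) -> list:
    """Copy chunks that are missing in dst or differ in size; return their names."""
    copied = []
    for chunk in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = chunk.relative_to(src)
        target = dst / rel
        # Same name and same size: assume the chunk is already replicated.
        if target.exists() and target.stat().st_size == chunk.stat().st_size:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(chunk, target)
        copied.append(rel.as_posix())
    return copied


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    replica = Path(tmp) / "replica"
    repo.mkdir()
    replica.mkdir()

    (repo / "aa11").write_bytes(b"x" * 10)     # hypothetical chunk name
    print(sync_new_chunks(repo, replica))      # ['aa11']

    (repo / "bb22").write_bytes(b"y" * 10)     # a new chunk from the next backup
    print(sync_new_chunks(repo, replica))      # ['bb22']
```

The second run transfers only the new chunk. Since no content is compared, the sketch cannot detect a chunk that was damaged without changing size, which is why each replica still needs to be checked after every synchronization.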

Advanced backup case studies

Daniel J. García Fidalgo