33HOPS, IT Consultants


© XSIDiff, what is it?

XSIBackup was born a couple of years ago to offer an easy way to back up ESXi VMs from within the ESXi OS, without any additional dependencies. As it does not use any proprietary VMware API, it works on any commercial or free edition of ESXi from version 5.1 onward.

XSIBackup copies VMs in two ways: vmkfstools and Rsync. Both are excellent tools of proven reliability, and they provide a trustworthy method for copying .vmdk files. Each of them, however, lacks something important. Vmkfstools can copy .vmdk files at optimal speed while ignoring holes, but it cannot make differential copies and cannot read from or write to stdin/stdout. Obviously, VMware provides this tool as a basic mechanism to copy .vmdk files and doesn't want to give "too much". On the other side we have Rsync. It's probably the best differential backup tool out there, and XSIBackup uses it mainly to back up over an IP network.

Rsync is an awesome tool, all of us can swear to that, but it has a big drawback: it was not designed with virtual disks in mind. It is a general-purpose tool, and that can be a serious inconvenience. Checksum calculations are so heavy that synchronizing two huge, terabyte-sized files takes, most of the time, hours just to find the differing bytes. CPU time is the price to pay to save bandwidth. On top of that, the ESXi environment will only use one of the available cores, so we have limited CPU power.

Nevertheless, on a virtualization host we can make some assumptions in order to reach an optimal tradeoff between CPU and memory/bandwidth usage:

1 - We don't want to copy all of the bytes again. We want to optimize memory/bandwidth usage, but we don't mind too much if we copy a bit more than we should, especially if getting the exact differing bytes would cost a lot of CPU.

2 - Files to mirror will be of the same size. If they are not, we will truncate or expand them, aligning both at the first byte, in the hope that we will find big blocks of unchanged data.

3 - We know that a number of bytes will be new, and that some others have probably been moved around. It's okay if we fail to detect some of the unchanged data and copy it again: as long as we end up with an exact copy, it is not that important to reduce the transfer to the exact number of changed bytes.
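
Assumption 2 can be illustrated with a short Python sketch. It is not XSIDiff's actual code: the function name `align_size` is hypothetical, and it assumes that the target file is the one being resized to match the source, which the text does not state explicitly.

```python
import os


def align_size(src_path: str, dst_path: str) -> int:
    """Give dst the same length as src, keeping dst's leading bytes intact.

    Hypothetical helper for illustration only, not part of XSIDiff.
    """
    src_size = os.path.getsize(src_path)
    # Create the target if it does not exist yet, without altering its content.
    with open(dst_path, "ab"):
        pass
    # os.truncate cuts the file if it is longer than src_size,
    # or extends it with zero bytes if it is shorter.
    os.truncate(dst_path, src_size)
    return src_size
```

Both files then start at the same first byte and have the same length, so any region that has not moved can be compared position by position.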

We could make an analogy with the stock market: "let someone else win the last dollar". Computer science is an empirical matter. We want to make optimal use of our resources to keep our enterprise competitive. I could understand that a mathematician would not be considered a serious professional if he went like: "hey man, I added 2 plus 2 and I got five, come on, I only missed by one". By contrast, it might be of great benefit to transfer 10% more data than the exact number of changed bytes for the sake of keeping CPU and memory usage low.

XSIDiff has been designed to find that optimal tradeoff. It compares files by applying basic algorithms that assume that, most of the time, our bytes will be in the same place, and it pays the price in bandwidth whenever that assumption turns out to be wrong.
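
A minimal sketch of this kind of positional comparison, written in Python for illustration. It is not XSIDiff's implementation: the function name `sync_changed_blocks` and the 1 MiB default block size are assumptions, and the sketch expects both files to already have the same length.

```python
import os


def sync_changed_blocks(src_path: str, dst_path: str,
                        block_size: int = 1024 * 1024) -> int:
    """Copy only the blocks that differ at the same offset.

    Returns the number of bytes written to dst. Hypothetical sketch;
    the block size is an arbitrary illustrative value.
    """
    src_size = os.path.getsize(src_path)
    if os.path.getsize(dst_path) != src_size:
        raise ValueError("source and target must have the same size")

    written = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        offset = 0
        while offset < src_size:
            src_block = src.read(block_size)
            if not src_block:
                break  # source shrank while we were reading it
            dst_block = dst.read(len(src_block))
            # A plain byte comparison: no rolling checksums, no search
            # for blocks that may have moved to a different offset.
            if src_block != dst_block:
                dst.seek(offset)
                dst.write(src_block)
                written += len(src_block)
            offset += len(src_block)
    return written
```

A single changed byte costs a whole block, and data inserted in the middle of the file shifts everything after it, so every following block gets copied again. That is the bandwidth price of the wrong assumption, paid in exchange for very little CPU work per block.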

In time, XSIDiff will turn into a more general Linux tool and will allow diffing over the network too. Stay tuned.

Daniel J. García Fidalgo
33HOPS


