Last updated on Monday 28th of February 2022 08:52:48 PM

©XSIBackup Classic - Different checks to verify backup integrity

Please note that this post relates to the old, deprecated ©XSIBackup-Classic software. Some of the facts it contains may still apply to more recent versions, though.

For new installations please use the new ©XSIBackup, which is far more advanced than ©XSIBackup-Classic.

You've probably noticed that we use terms such as Trivial Check, disk checksum or --certify-backup all over the website. All of these terms relate to checking whether a -flat.vmdk disk that has just been backed up is identical, bit by bit, to its original counterpart.

This type of check is fundamental if you want to sleep soundly, that is to say, absolutely sure that the backed up files are a working backup set.

There is only one way to be 100% sure that two files are identical, which is to compare the strings of zeros and ones that compose them. Needless to say, when we are working with files in the order of hundreds of gigabytes, like virtual disks, a full comparison is not very efficient, if we take into account the time it would take and the resources we would use to accomplish the task.

Luckily, there is a different approach to comparing files that does not involve comparing every bit against its counterpart. We can instead compare their checksums, and if the checksums match, we can be pretty sure that both files are identical. Being pretty sure here means that the odds of two different files producing the same checksum are astronomically small. The exact figure varies with the checksum algorithm used, but it will still keep you in a comfort zone.
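As a rough illustration of the checksum approach, here is a minimal Python sketch. It is not the code XSIBackup runs: the function names file_checksum and same_checksum are made up for this example, and SHA-1 is only an assumed algorithm, since the post does not say which one is used. The file is read in chunks so that a disk of hundreds of gigabytes never has to fit in memory.

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    The algorithm name is an assumption made for this sketch.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checksum(original, backup, algorithm="sha1"):
    """True when both files produce the same checksum."""
    return file_checksum(original, algorithm) == file_checksum(backup, algorithm)
```

Calling same_checksum("disk-flat.vmdk", "/backup/disk-flat.vmdk") (hypothetical paths) would read both files in full, which is exactly the cost that a faster check tries to avoid.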

Some of you may be thinking: "this guy is crazy, he's saying that we can be pretty sure the files are equal". Well, we are talking about probabilities in the order of one particular atom, out of the whole universe, hitting your window. When the chance of being wrong is in that order of figures, being pretty sure is more than enough for most people.

Trivial Check is used in OneDiff backups, which require some additional checks to ensure data integrity. It may be used in the near future to check other types of backup, such as XSIDiff, Rsync, ...

But still, calculating hashes for big files can be time consuming, especially if you don't have fast SSD disks and powerful CPUs. This is where our Trivial Check comes into play. How does it work? Well, it uses four simple checks to find facts about your virtual disks. Its goal is to reduce the time it takes to compare disks, compared with performing a full check on the whole disks.

Four checks of Trivial Check:

1 - No data transfer errors are detected. Should we get a TCP error, the rest of the Trivial Check would be invalidated.

2 - File sizes are the same. This is not a very strong check on its own, as you may imagine, but it makes the other checks more consistent.

3 - First N megabytes checksum. N is a configurable value in conf/xsiopts. This checks that the initial parts of both disks (original and backup) are identical. Many operating systems will store one or more copies of the data structure in what is called a superblock. Any minimal change in the data of the guest OS would render the superblock and its redundant copies different, resulting in a different checksum for the initial part of the disk. This check can't determine on its own whether some data was changed, but it gives us very valuable insight.

In the particular case of Microsoft OSs and NTFS, the FS meta-data is stored in the MFT (Master File Table). The MFT and its replicas are not necessarily stored at the beginning of the disk, so in this case the Trivial Check is not as consistent as in the case of Linux operating systems.

4 - Total amount of used space in both disks is the same. This check will only work when using VMFS as the underlying file system. Nevertheless, VMFS is the recommended file system when working with OneDiff, Rsync or XSIDiff.

5 - In the case of OneDiff backups, not only are the above checks performed, but ESXi also checks the VM structure and coherence in order to be able to integrate snapshots with base files.
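The first four checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual XSIBackup code: the names head_checksum, used_bytes and trivial_check are hypothetical, SHA-1 is an assumed algorithm, the transfer result is simply passed in as a flag, and used space is read from st_blocks, which is only available on POSIX systems. The fifth, ESXi-side check is not modelled.

```python
import hashlib
import os


def head_checksum(path, megabytes=500, algorithm="sha1"):
    """Checksum of only the first `megabytes` MB of a file (assumed algorithm)."""
    remaining = megabytes * 1024 * 1024
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while remaining > 0:
            chunk = handle.read(min(1024 * 1024, remaining))
            if not chunk:  # file is shorter than the requested head
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def used_bytes(path):
    """Space actually allocated on disk (POSIX: st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512


def trivial_check(original, backup, transfer_ok=True, megabytes=500):
    """Run the four quick checks and return (all_passed, per_check_results)."""
    results = {
        "transfer_ok": bool(transfer_ok),
        "same_size": os.path.getsize(original) == os.path.getsize(backup),
        "same_head": head_checksum(original, megabytes)
        == head_checksum(backup, megabytes),
        "same_used_space": used_bytes(original) == used_bytes(backup),
    }
    return all(results.values()), results
```

Note that a change beyond the first N megabytes that leaves the size and the allocated space untouched would pass every one of these checks, which is why the result is an indication and not a proof.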

O.K., got it, so what?

Well, it's obvious that this set of checks is not equivalent to a full checksum comparison. In any case, the key is to look at it the other way around: it's a really fast check that will rapidly identify most copy/transfer errors and put us on alert. On top of that, it's dirty and looks awful, but it's much, much better than nothing.

How much better than nothing is it?

Well, I'd say it's somewhere in between nothing and a full checksum. Calculating the exact probability of a false positive is something that I will leave for some probability enthusiast. I'm really very busy working on XSIBackup development and cannot dust off my school books. Anyway, there's probably no way to come up with a fixed figure, as there are too many assumptions in play. Try to look at it this way:

This is like looking at a transparent cage where you can see a mouse's head, a moving tail and four legs, while the rest of it is covered with black paint. If you see all those parts and the tail is moving, then you most probably have a working mouse.

But the reductio ad absurdum point of view is even more interesting. If you see a mouse head, a moving tail and four legs and the mouse is dead, then your computer is trying to cheat you.

Jokes aside, if you have two files which have the same checksum for their initial 500 MB (default), have the same sparse size and also use the same amount of space inside the sparse file, and, on top of that, you didn't receive any runtime errors when transferring the file, then the only way that the backed up file is not an exact copy of the original is that some bytes were silently swapped in the unchecked zone, which is a very unlikely event. Still, you can always perform a full checksum comparison, which, on the other hand, is not that time consuming.


Daniel J. García Fidalgo