Last updated on Tuesday 2nd of August 2022 11:17:19 AM

©XSIBackup: Repository Backup

How to quickly keep off-site copies of your backup repositories

Some people like to add an extra level of security to their data, for instance by copying their backup repositories to an offline server or a USB disk.

Back in the days of the ©XSIBackup-Classic edition (only two years ago at the time of writing), we recommended using Rsync to keep mirrored sets of your deduplicated repositories. The block size was 50MB, so 5TB of data amounted to a set of 100.000 data blocks, which was still very manageable.

©XSIBackup deduplicated repositories now consist of millions of files in a multilevel hexadecimal hierarchy. The default block size is 1MB, which helps keep repositories smaller and achieve a higher data density.

How to copy an XSIBackup repository

Therefore the number of data blocks is multiplied by 50, so each TB of data uses 1.000.000 files. This is no longer that manageable for a tool like Rsync working at the File System level, especially when calculating hashes for all the data.
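The file counts above follow from dividing the amount of data by the block size. As a quick check, here is a minimal sketch. It assumes decimal units (1 TB = 1.000.000 MB), which is what the round figures in this post imply, and it counts unique data only, since deduplication would lower the real number. The function name is ours, purely for illustration.

```python
MB_PER_TB = 1_000_000  # decimal units: an assumption, not stated in the post


def block_count(data_mb: int, block_size_mb: int) -> int:
    """Number of blocks needed to hold data_mb of unique data, rounded up."""
    return -(-data_mb // block_size_mb)  # ceiling division on integers


classic = block_count(5 * MB_PER_TB, 50)  # 5TB stored as 50MB blocks
current = block_count(1 * MB_PER_TB, 1)   # 1TB stored as 1MB blocks

print(classic, current)
```

The same 5TB that fit in 100.000 files with 50MB blocks would take around 5.000.000 files with 1MB blocks, which is where the extra load on Rsync comes from.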

We don't mean that it can't handle it. With some extra RAM and CPU, Rsync will in the end handle the extra load. You will nonetheless find that it starts to behave sluggishly and that calculating the differential data takes a long time.

We can make things a lot easier by adding the Rsync option --size-only, which makes Rsync skip every file whose size already matches on the destination, so no hashes need to be recalculated for those blocks. We can very reasonably assume that the probability of some data corruption yielding two different blocks with the same size is negligible, as the corrupted data would need to pass not only the TCP checksum but also the SSH integrity check.
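As an illustration, this is roughly what such an Rsync invocation could look like when built from a script. It is only a sketch: the paths and the remote host are made-up placeholders, and -a (archive mode) is our own choice of companion flag, not something this post prescribes. Only --size-only comes from the text above.

```python
import shlex
from typing import List


def rsync_size_only_cmd(src: str, dest: str) -> List[str]:
    """Build an rsync command that skips files whose size already matches."""
    # -a           archive mode (recursive, preserves times, permissions, etc.)
    # --size-only  treat files of equal size as unchanged, ignoring mtime
    return ["rsync", "-a", "--size-only", src, dest]


# Hypothetical locations. The trailing slash on the source makes rsync copy
# the contents of the directory rather than the directory itself.
cmd = rsync_size_only_cmd(
    "/backup/repo/",
    "user@offsite.example.com:/backup/repo/",
)

print(shlex.join(cmd))
```

To actually run it, the list can be passed to subprocess.run(cmd, check=True). Printing it first is a cheap way to review the command before touching a large repository.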

Rsync will still be your best option, and sometimes the only option, if you need to sync a lot of data through a relatively narrowband WAN with high latency (100-200 ms). If latency goes beyond that, chances are that you will be producing more blocks than you can copy over the WAN link in a given amount of time.
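Whether a link can keep up is easy to estimate beforehand. The sketch below uses invented figures (20.000 new 1MB blocks per day, 10 Mbit/s of effective throughput and an 8 hour nightly window), none of which come from this post. It also treats every block as a full 1MB and takes the effective throughput as a given number that you would have to measure on your own link, since latency can push it well below the nominal bandwidth.

```python
def transfer_hours(changed_blocks: int, block_mb: float, effective_mbit_s: float) -> float:
    """Hours needed to move changed_blocks of block_mb each at the given rate."""
    megabits = changed_blocks * block_mb * 8  # 1 MB = 8 Mbit
    return megabits / effective_mbit_s / 3600


# Hypothetical workload and link, for illustration only
hours = transfer_hours(changed_blocks=20_000, block_mb=1.0, effective_mbit_s=10.0)
window_hours = 8.0
keeps_up = hours <= window_hours

print(f"{hours:.2f} h needed, fits in window: {keeps_up}")
```

If the result does not fit in the window, the off-site copy falls a little further behind every day, which is the situation described above.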

We have devised a method to overcome the limitations imposed by the TCP protocol over high latency links. You can read the details in this post: Narrowband off-site backup over high latency link

Apart from Rsync, there is a simple remedy that makes syncing your ©XSIBackup repositories a simple task with no load at all on the File System.

It can be as straightforward as synchronizing a virtual disk, provided you use a virtual appliance instead of a real File System on top of a hardware server. When you do so, your NAS File System is encapsulated in a .vmdk file and ©XSIBackup can handle it as a single virtual disk file.

As always, it will traverse the -flat.vmdk file and copy the changed blocks. You may also take advantage of the CBT (Changed Block Tracking) feature and optimize the synchronization by copying just the changed blocks in a few seconds.
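To make the idea of copying only changed blocks more tangible, here is a generic sketch of block-level synchronization of one big file. It is not ©XSIBackup's implementation: it simply reads both files block by block and rewrites the blocks that differ. The function name and the 1 MiB block size are our own choices for the example.

```python
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB, chosen for illustration only


def sync_changed_blocks(src_path: str, dst_path: str, block_size: int = BLOCK_SIZE) -> int:
    """Copy only the blocks that differ from src to dst; return blocks written."""
    if not os.path.exists(dst_path):
        open(dst_path, "wb").close()  # start from an empty destination

    written = 0
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.seek(offset)
            if dst.read(len(block)) != block:
                # Block is missing or different on the destination: rewrite it
                dst.seek(offset)
                dst.write(block)
                written += 1
            offset += len(block)
        dst.truncate(offset)  # make the destination the same length as the source
    return written
```

A second run over an unchanged file writes nothing, and a run after a small change writes only the affected blocks. This sketch still has to read both files in full to find the differences. With CBT the list of changed blocks is already known, so that comparison pass can be skipped.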

Not only that: your resulting mirrored repository will be 100% functional. You can use XSIGR on it to access individual guest-level files in any of the available restore points.

On top of that, you can keep any number of CBT replicas on multiple hosts. You just need to use a virtual disk big enough to fit your backup data.

We also recommend Rocky Linux as the Operating System for your virtual appliance. A minimal install of the OS is all it takes to have a fully functional appliance that will work with ©XSIBackup.

Is it a good idea to have so many layered encapsulated File Systems?

In theory, having many encapsulated FS structures requires more resources to access the data in them. In this case, however, the additional overhead is very small, and in return you obtain a conceptual improvement that will in the end save far more resources than it uses.

You will keep at least 90% of the write throughput. You may lose a bit more efficiency in reads when compared to a real FS on top of a HD, but you will still be able to read data at very good sustained rates, well above 100MB/s on modern hardware. If you take into account that reads will be infrequent, needed only to recover data, it's clear that the pros outnumber the cons many times over.

The different levels of encapsulation are:

For writes: VMFS + [LVM2: optional] + Guest OS FS (XFS, ext4)
For reads: VMFS + [LVM2: optional] + Guest OS FS (XFS, ext4) + XSIFS (©XSIBackup File System)

You can format the virtual disk backed by your -flat.vmdk file directly with XFS (recommended) or ext4 from within Rocky Linux, skipping LVM2, to remove one of the encapsulation layers. You may still find it useful to keep LVM2 though, as it offers many advantages in terms of storage management and its overhead is minimal.