When using the deduplicated backup functionality of XSIBackup-DC with the --backup option, I understand that the integrity of the backups in the chosen repo directory depends on the entire chain of backups performed into that repo.
Lately, I discovered some XSIBackup-DC bugs and tried some different settings, where I realized immediately after starting a deduplicated backup that the backup made no sense. Therefore, I simply terminated ./xsibackup with the SIGTERM kill signal.
Now the question arises whether such a termination garbles the repo, or whether SIGTERM causes ./xsibackup to clean up correctly, i.e. the last backed-up VM is correctly removed from the repo so that there is nothing to worry about.
Up to now, I have deleted the entire repo and started over again. But this is very time-consuming; the initial backup alone takes more than 24 hours. So I would appreciate knowing whether it is safe to simply terminate the backup, or whether I should make some manual adjustments to keep the repo clean and tidy.
(c)XSIBackup repositories need two sources of information to be consistent:
1/ The data directory, where the blocks holding the data are stored in a hierarchy of SHA-1 hashes.
2/ The disks' manifests (.map files), stored in the timestamped folders where the VM folders containing these manifest files are created.
When you perform a backup you start adding blocks to the data folder. Once each disk backup ends, the corresponding manifest is copied over to the VM folder.
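The deduplicating write can be pictured with the following sketch. This is illustrative Python, not XSIBackup's actual code, and the exact on-disk layout (nested subdirectories derived from the SHA-1 hex digest) is an assumption:

```python
import hashlib
import os

def store_block(data_dir: str, block: bytes) -> str:
    """Store a data block under a path derived from its SHA-1 hash.

    Hypothetical layout: the first hex digits of the digest form
    nested subdirectories, so identical blocks always map to the
    same path and are written only once.
    """
    digest = hashlib.sha1(block).hexdigest()
    path = os.path.join(data_dir, digest[:2], digest[2:4], digest)
    if not os.path.exists(path):  # block is new to the repo
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(block)
    return digest  # the digest is what the disk's .map manifest records
```

The point of the sketch is that a block's identity is its hash: whichever backup writes it first creates it, and later backups simply find it in place.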
If you interrupt a backup via Ctrl+C, for instance, you are going to have new blocks. Some of them will be associated with a manifest file, should the disk copy have completed; others will be orphan blocks without any manifest file.
Assuming the simplest case, namely a VM with a single .vmdk disk: if you interrupt the backup in the middle of copying the only -flat.vmdk file, you are going to leave orphan blocks in the data folder. The timestamped folder is usually removed automatically. Doing the same with the copied blocks would take a lot of time and would make Ctrl+C'ing a backup job a really annoying experience.
These blocks can't be matched to any VM, as they are the blocks that were new to that restore point. They will remain in the data folder and will be overwritten if they are found again in some subsequent backup cycle.
They won't be an issue, but they will occupy some space. This space will be minimal, as the increment of data from one backup cycle to the next is usually limited, and most of the blocks will be present again in subsequent backups. Maybe a few MB could be regained by pruning them; still, it's not worth doing.
In fact, we haven't even provided a mechanism to prune orphan blocks. Maybe in the future, although this is not a priority. The reason this type of pruning is not a priority for us can be deduced from the above: only some minimal space would be regained, if any at all, while pruning this data would require traversing all blocks in the FS to check whether or not they are present in the .blocklog file. That takes approximately the same time as a prune process, given that the most time-consuming task would be the stat() system call itself for every block in the repo.
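Such a prune pass, if it existed, would have to look roughly like the sketch below (hypothetical helper names; the real .blocklog format is not documented here). Walking the whole hierarchy touches every block entry once, which is exactly where the per-file stat()/readdir cost comes from:

```python
import os

def find_orphan_blocks(data_dir: str, referenced: set) -> list:
    """Return paths of block files whose hash does not appear in the
    set of referenced hashes (as would be collected from the .blocklog
    or the .map manifests). Every entry in the repo must be visited,
    so this costs about as much as a full prune run.
    """
    orphans = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name not in referenced:
                orphans.append(os.path.join(root, name))
    return orphans
```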
One possibility to avoid leaving these orphan blocks would be to copy the blocks to a temp hierarchy and then rename them to their final positions. We have evaluated this possibility from the beginning; still, it could take quite a bit of time to rename them to their final destination folders. We'll keep this in mind and may eventually incorporate it as an improvement, if we find the average time required to rename the blocks is worth the benefit.
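The temp-then-rename idea is the classic atomic-write pattern. A minimal sketch (a hypothetical helper, not XSIBackup code) shows both the benefit and the cost:

```python
import os
import tempfile

def write_block_atomically(dest_path: str, block: bytes) -> None:
    """Write to a temp file on the same filesystem, then rename it.

    An interrupted backup would then leave only *.tmp files behind,
    which are trivial to sweep up; the price is one extra rename()
    system call per block.
    """
    dest_dir = os.path.dirname(dest_path)
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(block)
        os.rename(tmp, dest_path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```

With millions of small blocks per backup, that extra rename() per block is precisely the overhead being weighed against the benefit.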
To those of you getting sick thinking about those orphan blocks, we just remind you that engineering is about results, not perfect beauty. When beauty can be achieved it's a bonus, but it's not the aim of the project. If you still can't get those blocks out of your mind, we would suggest you start a career in mathematics ;-)