XSIBackup-DC Manual

Read the "Ten Commandments Of ©ESXi backup" to get an overview of the most critical things to keep in mind at all times.
Installation

Just use the installer in the package. There is a Youtube video available.

Introduction

©XSIBackup-Datacenter, AKA ©XSIBackup-DC, is a software solution developed in plain C, having the GLibC library as its main dependency. It follows the same basic design principles as previous ©XSIBackup releases, namely: an efficient backup and replication tool that can be used directly in any ESXi or Linux host. ©XSIBackup-DC can replicate & backup Virtual Machines hosted in ESXi servers just like ©XSIBackup-Free and ©XSIBackup-Pro do, and can also backup & replicate file structures in Linux servers, both as replicas and as backups to deduplicated repositories. Both operations may be performed to a locally accessible file system (any accessible disk or share: NFS, iSCSI, SMB (©XSIBackup-NAS)) or over IP. The nomenclature employed for IP transfers is: user@[FQDN or IP]:Port:/path/in/remote/system

In the future we will extend ©XSIBackup-DC functionality to operate on different virtualization platforms like XEN or KVM. ©XSIBackup-DC is capable of performing two kinds of operations on data: A/ Replicas. B/ Backups. Both of them are performed by using a simple algorithm that compares data present on the host being backed up against the data eventually already present on the remote volume. ©XSIBackup-DC first detects holes in virtual disks and jumps over them; it can also detect zeroed zones in real data and jump over them as well. Only real data present on disk is actually processed. The SHA-1 checksum algorithm is used to compare chunks of data and decide whether the block being processed must be sent to the target folder or is already present on the other side. When zero awareness is combined with the SHA-1 differential algorithm, maximum speed is reached, that is, on data operations subsequent to the first run, which obviously must process all non-zero data.
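The zero-aware differential comparison just described can be sketched as follows. This is a simplified Python illustration of the idea, not the actual C implementation; the function name and in-memory layout are assumptions made for clarity:

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MB, the default chunk size


def changed_blocks(data: bytes, remote_hashes: set) -> list:
    """Return the (offset, digest) pairs that must be sent to the target.

    Zero-filled blocks are skipped entirely; for the rest, the SHA-1
    digest decides whether the block already exists on the other side.
    """
    to_send = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        if block.count(0) == len(block):   # zeroed zone: jump over it
            continue
        digest = hashlib.sha1(block).hexdigest()
        if digest not in remote_hashes:    # unknown block: transfer it
            to_send.append((off, digest))
    return to_send
```

On a first run every non-zero block is "unknown" and gets transferred; on subsequent runs only blocks whose hashes are absent from the remote side are sent, which is where the speed gain comes from.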
©XSIBackup-DC downloads data definitions stored on the remote side, so that all comparison operations of the XSIBackup algorithm are performed locally.

REPLICAS: Since version 1.1.0.0, all remote .vmdk disks' SHA-1 hashes of --replica VMs are compared with their stored values before actually performing the --replica job itself. Should some change be detected, the hash tables for every disk will be rebuilt. This allows you to switch the VMs on, test them and keep the --replica jobs without any further operation. Rebuilding the hash tables will take some time, nonetheless it will be much less than sending the full VM again from its primary location. You will know that the remote hash tables are being rebuilt because you will see this message on screen:
Target VM at <root@192.168.0.20:22:/repl/W01> has changed, hash table must be rebuilt...

Some time will pass without any progress information until the remote tables are refreshed. How long that takes will depend on the size of the disks and the real data contained in them. Detecting changes in VMware ESXi VMs is possible because a disk's CID is changed every time a VM is switched on, thus a .vmdk file checksum mismatch will detect it. You may also run the --check action on a local VM replica folder (a folder containing some .vmx file or .vmdk disk) from the server's command line. This is equivalent to rehashing, which will be done implicitly while running the --check action. When you perform a rehash operation through the --check action you will be presented with a progress text UI showing some basic statistics: files affected, changed blocks detected and repaired. While the virtual disk files are being rehashed you will see a KO in red, should some bad blocks be detected (blocks that have changed with regard to the previously stored hash table), along with the bad block count. Once the operation ends the KO will change to RE (repaired). When you perform a rehash operation through the --check action on a --replica folder, the next time you run the replica job from the client side no .vmdk file change will be detected and the --replica job will continue normally, as if you had not switched the VM on. You may also run a --check action on a VM --replica folder that hasn't been modified by switching it on; in that case the check will return no changes.

Q: How do I know that a replica is actually valid? A: You may use the --check action on a replica and a full re-hash check will be performed. This guarantees that the files contained in the replica are an exact copy of the original ones.
./xsibackup --check /path/to/your/replica/YOUR-VM

BACKUPS: In the case of backups, which are always performed to a deduplicated repository, you can choose to compress data by employing the acclaimed LZJB algorithm used by Solaris and ZFS. This allows you to compress data as well as deduplicate it. The use of data compression is recommended (just add the --compression argument to your backup job); it offers some 45% compression ratio. If you are backing up to an already compressed file system you may remove the --compression flag to improve effective transfer speed and free your CPU from the compression load.

Over IP Operations (SSH options)

To be able to operate with any compatible remote server over IP, you first need to exchange keys to allow passwordless SSH communication, using the exchanged key to authenticate to the remote end. The --add-key action will allow you to do so from the command line. Please be aware that regular OpenSSH behavior is to raise an error should just any of the ciphers in the cipher challenge list not be available in the remote server. This can lead to errors when running over IP actions when the OpenSSH versions are too distant in time, as some ciphers are deprecated while others are newly added to OpenSSH as time goes on. You can edit ./etc/xsibackup.conf to customize the list of ciphers to use. ©XSIBackup-DC may operate in client/server mode. When you transfer data over IP, you must invoke the xsibackup binary on the other end. If you omit the --remote-path argument, the client will look for the binary in the /usr/bin folder of the remote host. You may as well indicate the remote path by explicitly stating the remote installation path, just like you do with Rsync.
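The combination of deduplication and per-block compression can be sketched like this. This is a toy in-memory illustration, not the real on-disk format; zlib stands in for the LZJB algorithm (Python has no LZJB in its standard library), and all names are hypothetical:

```python
import hashlib
import zlib


def store_block(block: bytes, repo: dict, compress: bool = True) -> str:
    """Deduplicate a chunk, optionally compressing it before storage.

    `repo` is a dict standing in for the on-disk data folder, keyed by
    SHA-1 hash. Only blocks not already present are stored, so repeated
    data costs nothing beyond the hash lookup.
    """
    digest = hashlib.sha1(block).hexdigest()
    if digest not in repo:                 # only new blocks are stored
        repo[digest] = zlib.compress(block) if compress else block
    return digest
```

Note the ordering: the hash is taken over the raw block, so deduplication works identically whether or not compression is enabled.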
--remote-path=/vmfs/volumes/datastore1/xsi-dir/xsibackup

©XSIBackup-DC needs components in the ./bin folder, thus the contents of this directory must be present in the root installation dir and be executable by the user running the software. ©XSIBackup-DC can tunnel data through SSH to a remote server. The cipher algorithm used for this tunnel can greatly affect transfer speed and CPU usage. The default set of ciphers in use is:

aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc

The above set should work well even between distant OpenSSH versions, i.e.: 5.6 => 7.5 and the other way around. Its downside is that these ciphers are not very fast, unless your CPU has a special set of instructions to handle this workload. Should you encounter some speed limiting issue, we recommend that you take advantage of the --ssh-ciphers argument and use chacha20-poly1305@openssh.com, so that it's used instead of the AES cipher family. If you have a server grade CPU, i5 or above, you probably won't notice the difference, unless you are short of CPU at the time of performing the backup. This cipher will greatly improve speed due to its efficient design; it was created by Prof. Daniel Bernstein and is much faster than AES, assuming that you don't have some sort of hardware cryptographic co-processing help. You can optionally enable the L switch in --options=L (Less Secure Algorithm). It will try to use:

arcfour,blowfish-cbc,aes128-ctr

This last set is comprised of deprecated algorithms; you may want to use them when you don't need that much security in encryption, like in a controlled LAN, or when you need compatibility with older OpenSSH versions.
In addition to the above, you may pass your own preferred algorithms to be used to cipher SSH data: --ssh-ciphers=your-preferredalgorithm1,your-preferredalgorithm2

As warned in the heading notice, should just one of the ciphers in our cipher list be missing at the remote end, you might receive an error stating so. It's a bit misleading, as the whole cipher list is presented in the error message. To avoid this situation, just choose one single common cipher present on both the client and server side of the OpenSSH tunnel. Of course, using the same OpenSSH version on both sides minimizes the chances that you run into this kind of problem.

Backing up to ©Synology NAS devices

You can use any Linux OS as a remote backend for your backups; you can even perform concurrent backups. You must take into account that concurrent backups will limit the speed of every individual backup and that the locking mechanism as per v. 1.0.0.0 is lock files; nonetheless, the worst thing a failed lock can produce is a duplicate hash, which does not affect consistency and can be easily fixed. In the case of Linux servers the xsibackup binary alone is enough to run the server side. Given the fact that ©Synology devices use Linux, you can easily turn your NAS device into an ©XSIBackup-DC server by just copying any unlicensed ©XSIBackup-DC binary to the /usr/bin folder in the ©Synology appliance. You will, of course, need to enable SSH access to the ©Synology box, but that's trivial; you can read this post for more details. In order for your ©Synology ©XSIBackup-DC server to operate correctly, you need to assign execute permissions to the xsibackup binary and have write permissions on the volume you want to use. For now only the use of the root user is supported, thus you should not have much trouble setting the server up.
Once the SSH server is running and the binary is installed, you just need to run the --add-key action from within the ©XSIBackup-DC client to exchange the key of the client OS with the server and start using over IP operations as you would with any other Linux server.

Using other NAS appliances

As long as the Linux kernel running in the server is able to run the xsibackup binary, you can use whatever you want to act as a server. Nevertheless, the paths to SSH configuration files can vary from one system to another. We support ©Synology systems because we have tried them and we have tweaked ©XSIBackup-DC to be able to inter-operate with them. Should you want to use any other manufacturer's hardware, you do so at your own risk, and you may need to exchange keys manually, which will not be guided by our support helpdesk.

Folder structure

©XSIBackup-DC consists of a single executable (xsibackup), plus an additional library, bin/xsilib, which is only needed when installed to an ESXi host. The first time it is executed, ©XSIBackup-DC will create a set of folders to store its logs, jobs and service files. This structure will vary depending on whether you install it to an ESXi host or to a Linux server. In the case of ESXi hosts the folder structure will be created under the installation directory and will be as follows: /bin/ : stores all binaries. When installed to a Linux server, the folders used will be those in the Filesystem Hierarchy Standard (FHS). ©XSIBackup-DC uses temp files to store block maps, variables and different structures while it is running, thus it depends on a /tmp folder with sufficient space to hold this data. While working with files and VMs in the order of hundreds of gigabytes to one terabyte, these files' size will be on the order of some hundreds of KB. Should the files grow beyond that, even the ESXi /tmp file system, which is 100 MB by default, should be able to handle it.
In the case of Linux file systems, which may have an arbitrary /tmp partition size, this will never be a problem, even for exabyte VMs. A folder is created under /tmp for every running job as /tmp/xsi/PID, where PID is the process identification number assigned by the OS. All /tmp/xsi data is deleted both when a process finishes and when a new one starts. ©XSIBackup-DC's --replica feature depends on the remote replica set not being touched (modified); should you change a single bit on the remote end, the replica mirror will break and the resulting mirrored set of files may become unusable. This is due to the fact that all the program knows about the remote files is their hash map, which is not updated when you modify files by means other than the --replica action.

Scheduling jobs

Job scheduling is performed by means of the crond service and its corresponding crontab. Just place backup jobs in files, assign them execute permissions and add them to the crontab. You can use the argument --save-job=NNN, which will facilitate the creation of backup job files in the etc/jobs folder. There are two main cron related arguments: --install-cron: makes your scheduled cron jobs persistent across reboots by adding a command to the etc/rc.local.d/local.sh file. Run this after setting your cron jobs up. The command that is added to the etc/rc.local.d/local.sh file is nothing but an --update-cron command. To set a cron schedule up, create the file <install-dir>/var/spool/cron/root-crontab if it does not exist. Add your cron schedules as you would add them to any other crontab:

0 6 * * * /scratch/XSI/XSIBackup-DC/etc/jobs/001 > /dev/null 2>&1

Then run the --update-cron command like this: ./xsibackup --update-cron. It will take the contents of the root-crontab file and add them to the ©ESXi crontab at /var/spool/cron/crontabs/root.

Concurrency

©XSIBackup-DC's aim is to become a heavy duty backup & replication utility, and its basic structural design principles are oriented to that goal.
Nevertheless, in version 1.0.0.0, locking of the data/.blocklog file (the main block manifest, which is in turn shared by different backup folders and backup processes) is provided via .lock files; this is not the most efficient way to manage concurrency. You could in fact hit some circumstance in which information is written to a .blocklog file which is supposed to be locked. This would be quite rare though: only if you try to write from many different processes to the same repository at the same time might you be able to run over some lock. Even if this circumstance happened, nothing serious would occur, as duplicating some block info in the manifest file is harmless. The block manifest file can be rebuilt from the underlying data quite fast by using the --repair action, which would eliminate any duplicates. The files that allow you to restore some backup are the .map files, stored in the backup folders, and the data blocks themselves, which are kept in the /data directory. You could even delete the manifest file (/data/.blocklog) and still be able to rebuild it via the --repair action.

Things to take into account

©ESXi 7.0

©ESXi 7.0 has introduced some drastic changes in VM behaviour. Since this version, when a VM is on, ALL files are read locked on the ©ESXi shell. It does not matter if you take a snapshot: still ALL files are read locked, including any eventual existing snapshot. As a result, only -flat.vmdk files (and also all other basic configuration files) are backed up. We may introduce some code to delete any eventual pre-existing snapshot when ©ESXi 7.0 or above is detected; for now, handling this is left to the Sysadmin: if you want all data to be backed up, do not keep snapshots in production virtual machines. If you back up some VM with previous snapshots in ©ESXi 7.0, you will have to, for now, edit the .vmx file to point it to the base disks before using the restored file set.
Subfolders

It's worth noting that each backup job maintains a set of temporary files in an exclusive and independent directory, and that it backs data up to an exclusive directory on the server repository, which is uniquely identified by a timestamp and an eventual subfolder set by the --subfolder=somesubfolder argument. If you don't differentiate backups from different servers by using the --subfolder argument, i.e.: --subfolder=CURRENT-SERVER, you are taking the small risk that some jobs triggered at the same time are stored to the same time stamped folder. This is unlikely to happen; on top of that, the VM being backed up would need to have the same name in both servers for files to mix up. Nevertheless, always use the subfolder option when backing up from different servers. This is a must, not only because of the situation treated above, but also from a simple organizational point of view. Take into account that if you trigger multiple simultaneous backups from different servers without having first designed a system to support it, you will most likely clog your network, your disk controller and your server. As known blocks start to accumulate in the block manifest (/data/.blocklog), the traffic will be reduced to blocks that have changed since the last backup cycles, and the backups will as a result be performed much faster. You can think of ©XSIBackup-DC as some "Incredible Hulk" that grows in power as you load it with tons of data. Of course the results you get will be bound by your hardware limits and the limits of our software, but you should easily accumulate many terabytes of real data, which will normally correspond to some exabytes in backups.
Design

©XSIBackup-DC stores backups to proprietary repositories; nevertheless, the structure and format of these repositories have been designed to be "eye friendly" to the system administrator. Data chunks are stored in raw format in subfolders of the backup volume file system, along with hash maps corresponding to the files in the backup. Thus you could very well rebuild your files from the data on disk by just adding up the data chunks as described in the human friendly .map files, which are nothing but a manifest of the chunks encountered in the file when it was backed up. ©XSIBackup-DC uses a default block size of 1MB, but it can be as big as 50MB. As you may imagine, this can accumulate a big number of chunks in the data folder structure, in the order of millions. As you probably already know, the ESXi VMFS file system has around 130,000 possible inodes, thus it is not very convenient for storing deduplicated backups, as you will soon run out of inodes. Any regular Linux file system will do, but if you are willing to achieve great results we recommend that you use XFS or ext4, as they will allow you to store millions of files and are, at the same time, the fastest file systems. Speed is an important factor when you accumulate a lot of data, as blocks need to be sought in the file system. A regular Linux system mounted over NFS3 is the ideal target for your backups. It can also be a specialized device like the popular Synology and QNap NAS boxes. Data chunks are stored in the data folder inside the repository in a hierarchical subfolder manner. Each subfolder corresponds to a hexadecimal character, up to 5 levels under the root of it, and blocks are stored in the folder matching their first 5 characters. Given the robustness of the SHA-1 hash algorithm, which offers astronomically collision free unique IDs, and the fact that the .map files are stored in unique folders, the probability of losing data due to some collision or repository corruption is very low.
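The five-level layout described above maps each chunk's hash to a deterministic path. A minimal sketch of that mapping (the helper name is ours; the layout itself is as documented):

```python
import os


def block_path(repo_root: str, sha1_hex: str) -> str:
    """Map a chunk's SHA-1 hash to its place in the data folder.

    The first five hex characters of the hash become five nested
    subfolder levels; the file itself is named after the full hash.
    """
    levels = os.path.join(*sha1_hex[:5])
    return os.path.join(repo_root, "data", levels, sha1_hex)
```

For a hash starting with a0f30 this yields <repo>/data/a/0/f/3/0/<full-hash>, matching the example path shown later in this manual.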
Even if you completely delete the .blocklog manifest file, it can always be rebuilt from the constituent .map files and the deduplicated chunks in the data folder by using the --repair argument. The .blocklog file in the root of the /data folder is a mere way for the client backup processes to know about the preexisting deduplicated chunks. This file is downloaded to the client temp folder prior to every backup cycle, thus the check on the existence of a block is performed locally. This has a small disadvantage, namely not knowing about blocks pertaining to ongoing backup jobs, but offers the huge advantage of performing block lookups locally at local speed. Once every backup cycle finishes, the newly generated data, that is, data which was not found in the downloaded .blocklog manifest file, is added to the repository's shared .blocklog file. This process locks the .blocklog file for the time it takes to complete, generating a /data/.blocklog.lock file, which is removed once the integration of the differential data completes. The differential data is stored temporarily in the /tmp/xsi/%PID%/.blocklog.diff file of the client while the backup is taking place. The whole temp folder is deleted upon each backup cycle. ©XSIBackup-DC is a low level tool. It's as secure as dd or rm are in your Linux server, so make sure that you assign it adequate permissions. You may use remote users other than root, which is very convenient, especially when backing up to remote Linux servers, but trying to run it in an ESXi server under a user other than root will require you to configure permissions accordingly. Also please note that when opening up execute permissions on the ©XSIBackup-DC binary to users other than root, you are opening a potential security breach. IMPORTANT: everything about the .blocklog manifest, the .diff files and the integration of the differential data constitutes a different and isolated subsystem with regard to the backup itself.
Losing differential metadata, registering duplicate block hashes or, as said, deleting the whole .blocklog manifest is unimportant, as it can always be regenerated accurately from the constituent blocks. Even in the worst of cases, by receiving a totally corrupt .blocklog file (which of course should never happen) and by messing up all differential data, your files will still be backed up accurately and you will be able to repair your repository afterwards. The worst possible situation in regard to the logic of the deduplication is that some block is reported as nonexistent and is copied again. All this assuming that the backup completes and there aren't any hardware or communication issues.

Designed to be useful

©XSIBackup-DC has been designed with you in mind: a datacenter system administrator who needs a tool which is easy to use and extremely powerful at the same time. As you already know (if you read the previous chapters), ©XSIBackup-DC stores deduplicated and eventually compressed chunks of data to the backup volume file system. Map files are stored to folders like the following: <root of repo>/subfolder/timestamp/VM/vm-disk1-flat.vmdk.map. Blocks are stored in the already explained five level subfolder structure under /data, something like: <root of repo>/data/a/0/f/3/0/a0f307f7abb76d7...bc3576adef5299a. Just as long as you keep this data intact, you can easily rebuild it by using the --repair command. Then it's easy to realize that you can merge preexisting repositories into a single one and still keep data intact. This is useful in case you need to consolidate data into a single backup volume. You can of course duplicate your repositories' contents somewhere else. Thanks to the fact that data is split into thousands of deduplicated chunks, you can use Rsync to keep copies of your repositories offsite and use ©XSIBackup-DC to rebuild your VMs or any other data anywhere.
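Why the manifest is expendable becomes obvious once you notice it is just the union of every hash referenced by every .map file. A sketch of the core of what --repair does, under our own simplified in-memory representation:

```python
def rebuild_blocklog(map_files: dict) -> set:
    """Regenerate the block manifest from the constituent .map files.

    `map_files` maps each .map file path to its ordered list of chunk
    hashes. The manifest is simply the deduplicated union of all of
    them, which is why deleting or corrupting it never loses backup
    data: the .map files plus the chunks are the source of truth.
    """
    manifest = set()
    for hashes in map_files.values():
        manifest.update(hashes)
    return manifest
```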
The xsibackup.conf file

This file is located in the etc/ directory in case of ESXi systems. It contains default values that can be tweaked by the user. Some of these values may also have a command line argument that may in turn modify the default values. As a general rule, XSIBackup Datacenter will use these values if no superseding argument is provided. You may for instance omit the --compression argument if you have activated it in the xsibackup.conf file.

# These are the default values for some variables. Most of them may be also set

The variables supported in the xsibackup.conf file are:
The smtpsrvs.conf file

This is the file (etc/smtpsrvs.conf) that holds the SMTP servers configured to be used with XSIBackup-DC. It works exactly the same as in previous editions of XSIBackup: one server per line, preceded by an integer number that is in turn the unique Id for the SMTP server itself. You will use this Id when calling or referencing the SMTP server. That is how the etc/smtpsrvs.conf file is laid out. There's a short explanation of the fields and the order they have to be set in. All fields are separated by a semicolon (;), except the server and port, which are separated by a colon (:). Each SMTP server entry is composed of 9 fields, the last of which (--smtp-delay) is optional.
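A sketch of how one such line splits into its fields. Only the leading ordinal Id and the server:port pair are fixed by the format described above; the content and order of the remaining fields in the example are our assumptions for illustration:

```python
def parse_smtp_line(line: str) -> dict:
    """Split one smtpsrvs.conf entry into its fields.

    Fields are separated by ';' except server and port, which share the
    second field separated by ':'. The remaining fields are returned
    unparsed, since their exact meaning is defined by the manual's
    field table.
    """
    parts = line.strip().split(";")
    server, port = parts[1].split(":")
    return {
        "id": int(parts[0]),   # the ordinal used by --use-smtp=N
        "server": server,
        "port": int(port),
        "rest": parts[2:],     # remaining 7 fields, last one optional
    }
```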
To use this newly configured server in your backup jobs, just append --mail-to=email@address.com --use-smtp=N to the rest of the arguments. You don't need to set anything but these two values: an e-mail address where to mail the results and an ordinal number (N), corresponding to the first field ORDINAL(integer) in the above configured SMTP server. Since version 1.1.1.0 two new actions allow you to add and test servers through a command line user interface. --smtp-add: call it without any argument. It will sequentially ask you for the SMTP server and port, mail from address, username, password, security options and optional delay. It has the advantage of probing that the SMTP server is reachable and that the e-mail addresses are correctly written before saving the data. It will also preformat the data, so you are safe from inadvertently pasting invisible characters with a different code page. ./xsibackup --smtp-add. --smtp-test: call it without any argument. This action will present you a list of the SMTP servers available in the etc/smtpsrvs.conf file; just select one of the Ids and then provide an e-mail address where to send a test. ./xsibackup --smtp-test

E-mail reports

E-mail reports provide a way to know how a given backup or replication job behaved. You may activate them by just adding a --mail-to address to the job. Of course you need to have previously added at least one SMTP server to the etc/smtpsrvs.conf file. You can specify which e-mail server you want to use by employing the --use-smtp argument and passing it an SMTP server ordinal number. If you don't use this argument, the backup job will fall back to using the first available SMTP server. E-mails are sent using a template stored in /var/html; name them 000-999[.html]. The default Classic XSIBackup e-mail template is provided as 000.html. You may create your own and store the HTML in this folder.
Just add the <!-- PLACEHOLDER REPORT --> HTML comment wherever you would like the table containing the backup information to appear. To use your user created template just add the --html-template=NNN argument.

Creating Backup & Replica Jobs

Basic usage consists of passing an action first, plus one or two paths depending on the type of action being performed, then the rest of the arguments:

./xsibackup [action] [source] [target] [options]

Quick examples: ATTENTION: don't copy directly from this document into your SSH client. The chances that some character substitution happens are high.

Actions

The action comes in first place after the call to the binary. It can be one of these: --backup : this action will perform a deduplicated backup, optionally compressed with the LZJB compression algorithm, to the directory specified in the target argument.

Source

This is the second argument in case of performing a --backup or --replica action and the only path required when executing a --check, --info or --prune operation. When performing copy operations (--backup or --replica) this argument must point to an existing directory containing some files. Those files will be backed up or replicated to the target directory. You may back up plain directories, which can be useful in case of VMs that are not registered in the ESXi inventory, or VMs. To back up a VM stored in a directory (or a series of them), you must point the source argument to the root directory where the .vmx file is contained. To select Virtual Machines, as in the above examples, just enclose the whole source argument between double quotes and use the VMs keyword (it's case sensitive) followed by a list of Virtual Machines separated by commas, or: the ALL keyword to back up all VMs, the RUNNING keyword to back up VMs which are in an ON state.

./xsibackup --backup /home/me/my-data /mnt/NFS/backup/repo01

Target

Target is the third argument in the command line.
It represents a directory where files will be backed up into an existing deduplicated repository, or replicated to it. If the directory does not exist it will be created, and eventually a new repo will be initialized by XSIBackup-DC. The target can be a local or remote directory in the form user@host:port:/path/to/backup/dir, as we have seen before.

Options

--block-size[=1M(default)|10M|20M|50M]: (optional) this is the block size that will be used to deduplicate data when using the --backup action. In the case of replicas a fixed block size of 1M will be used. You can choose between 1, 10, 20, or 50 megabyte block sizes when performing a --backup action. 1M is the default --block-size.

--block-size=1M : set block size to one megabyte

--quiesce: (optional) this option has no arguments; use it when backing up VMs to quiesce the guest OS prior to taking the backup snapshot. If you don't pass this option, no quiescing will take place.