Last updated on Tuesday 16th of August 2022 06:34:57 PM

©VMWare ©ESXi narrowband off-site backup over high latency link

Overcome poor effective bandwidth due to latency using parallel Rsync streams

We recently had to devise an off-site backup system for a client hosting 1.6 TB of data on a Caribbean island, with an asymmetric fiber-optic connection that would yield barely 1.5 to 2.0 MB/s upstream for an SSH data stream.

The target system was a dedicated server from a well-known ISP in Central Europe, with latencies ranging from 160 to 200 ms, which is not bad at all for the distance, but too much to achieve fast TCP transfer speeds using standard OpenSSH.

Narrowband VM backup
The method explained in this post has been applied to synchronizing ©XSIBackup repositories; nonetheless, the concepts it deploys can easily be applied to any similar situation in which a number of files must be synchronized over a relatively low-bandwidth link with high latency.

To make things even worse, it is not infrequent for these sites to suffer blackouts or hurricane alerts, which results in the servers being switched off for some time to prevent physical damage.

We installed ©XSIBackup locally and deployed some ready-to-use replicas and a backup repository with 20 days of backward restore points. So far so good; the local part of the job was easy to accomplish: a Rocky Linux 8 VM hosted on a new 4 TB local disk.

Things started to become less easy when we tried to upload the whole Rocky Linux VM to the off-site server in Europe. The relatively high latency resulted in very poor effective upload speeds due to the nature of the TCP protocol, which requires acknowledgement from the remote end.
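The ceiling that latency imposes on a single acknowledged stream can be estimated with the bandwidth-delay relation: maximum throughput is roughly the in-flight window divided by the round-trip time. The window size and RTT below are illustrative assumptions, not measured values:

```shell
# Max throughput of one TCP/SSH stream ~= window / RTT.
# Assuming an effective window of ~350 KB and a 180 ms round trip:
awk 'BEGIN { printf "%.1f MB/s\n", (350 / 1024) / 0.18 }'
# prints: 1.9 MB/s
```

That figure is in the same range as the 1.5 to 2.0 MB/s we were observing, and it also explains the fix: running N independent streams in parallel multiplies that ceiling by N.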

We recently wrote a post on this subject: ©VMWare ©ESXi throughput limitations.

- There exist ways to work around that limitation inherent to the TCP protocol, but in this case the tools to use were fixed, and they use synchronous TCP and OpenSSH -

We first started to upload the full backup server Rocky Linux VM to the off-site server, but the speed was very low, and the frequent network outages made this method unfeasible.

Luckily, ©XSIBackup repositories break the VMs into small chunks that are easy to transfer individually, which allows easy accounting of the number of blocks already transferred to the backend.

So we started to upload around 1.1 million blocks, of 680 KB average size, to the target server using Rsync over SSH. The thing was slow, but after some days we had uploaded around 500K blocks and we felt it could be done. Nonetheless, we needed to ensure that the number of new blocks generated daily could be uploaded in time to the off-site backup server.

We then knew we had to give the project an additional twist. We couldn't settle for the speed we were achieving, as any unexpected situation could end up in a clogged queue and an excruciating delay in the off-site backups.

So, as we knew that the low effective bandwidth was due to the synchronous nature of the TCP protocol plus a reduced OpenSSH internal buffer, we came to the conclusion that we needed to parallelize the transfers to improve the saturation of the TCP/IP stream and squeeze out its full potential.

We have been using Rsync for years and we knew we could trust its rock-solid stability for the task. We read some posts by other people applying the same kind of strategy, and their comments were totally favourable, so we started with some preliminary tests. The results were even better than we expected, so we refined the method.

You will need an additional component in your source server, typically the one where you have your primary local backup repository. This component is screen. It's a terminal session manager that allows your scripts to detach from the TTY so that you can run multiple jobs as different processes and monitor them as they run.

There are other methods to detach a terminal session from a TTY, but they are difficult to accomplish and reconnecting is not always possible. The screen binary makes it easy to run multiple child processes attached to a virtual TTY and reattach to them from the main TTY window as needed.

The aim of this post is not to train the reader in the use of screen; still, we will provide some basic hints on its functionality.

- Prepending screen -dm to your Rsync commands will create a separate virtual TTY identified by a subprocess Id that you can use to reattach to the terminal session.

- Running screen -list will print a list of the running screen subprocesses along with their Ids.

- Running screen -r <screen Id> will reattach your current terminal to the output of that subprocess.

- [Ctrl+a] + [Ctrl+d] will detach from the subprocess view without affecting it.

The Bash script we finally prepared to run multiple Rsync processes is the one below. Please note how we use Rsync's --size-only argument to tell it to compare files by their size only.
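The original listing has not survived in this copy of the post, so here is a minimal sketch reconstructed from the dissection in the next section; the repository path, the destination and any Rsync options other than --size-only are assumptions you should adapt to your environment:

```shell
#!/usr/bin/env bash
# Parallel Rsync upload sketch; REPO and DEST are assumed placeholders.
REPO="/backup/repo"                          # local ©XSIBackup repository (assumed)
DEST="user@offsite.example.com:/backup/repo" # off-site server (assumed)

# 1/ one detached Rsync per YYYYMMDDhhmmss restore point folder (.map files)
MAPF=$(ls -d "$REPO"/2*/ 2>/dev/null)
for f in $MAPF; do
    screen -dm rsync -a --size-only "$f" "$DEST/$(basename "$f")/"
done

# 2/ sixteen detached Rsync processes, one per first-level hex folder
#    of the block store: data/0 ... data/f
for i in {0..15}; do
    hexval="$(printf '%x\n' "$i")"
    screen -dm rsync -a --size-only "$REPO/data/$hexval" "$DEST/data/"
done

echo "transfer jobs launched; monitor them with: screen -list"
```

Each screen -dm invocation returns immediately, so the whole script completes in a moment while the Rsync children keep running in the background.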

Removing that flag would result in the Rsync processes calculating the full checksum for every file, which would make extensive use of the local and remote CPUs; you would risk clogging the backup servers, and even assuming you had enough CPU cycles available, it would take much longer indeed, as you would be reading the full repository (1.6 TB in our case study) and calculating the hash checksum for every block.

By using the --size-only argument we are making some assumptions: if the TCP protocol and the SSH tunnel do not return any errors, and the name and size of the local and remote files are the same, we can assume with a fair degree of certainty that the files are identical.

That degree of certainty is lower than a full checksum on each file; still, the possibility that the TCP checksums pass, the SSH integrity checks pass, the resulting file has the same size, and the destination data is nevertheless corrupt is something we can consider practically impossible.

Dissecting the script

There are two main loops in the script where the parallelization is happening:

1/ for f in $MAPF

Here we are copying the YYYYMMDDhhmmss timestamped folders one by one inside the loop. As the screen command triggers the subprocess and returns immediately, the loop finishes almost instantly. This loop will generate as many child processes as there are restore points in the repository, but it copies just the .map files; we will also need the blocks to be able to rebuild the virtual disks or to access them.

2/ for i in {0..15}

This loop takes care of copying the data blocks; this is the bulk of the data, and these are the child processes that will take the most time to complete. Unlike the .map files loop, this one will always produce 16 subprocesses, which correspond to the 16 possible hexadecimal digits of the first character in the first level of the block structure.

The expression hexval="$(printf '%x\n' $i)" converts each number in the sequence 0...15 to hexadecimal 0...f:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,  a,  b,  c,  d,  e,  f
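You can check the whole mapping from a shell in one line:

```shell
# print 0..15 in hexadecimal, space separated
for i in {0..15}; do printf '%x ' "$i"; done; echo
# prints: 0 1 2 3 4 5 6 7 8 9 a b c d e f
```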

Running the above script will launch 10 screen subprocesses to sync the 10 restore points with .map files, from July the 1st to July the 10th, plus 16 additional screen child processes, each synchronizing one of the 16 main folders of the block structure.

If you now list the running screen subprocesses (screen -list), you will see one entry per running job.

The first 10 processes correspond to the restore point synchronization: the YYYYMMDDhhmmss folders containing the .map files.

The last 16 entries correspond to each one of the 16 main folders in the hexadecimal structure of the blocks inside the data folder.

If you run screen -list again after some minutes, you will see that there are fewer processes; they disappear from the list as each individual task is completed and its screen subprocess ends.

You can connect to any of those processes with screen -r <procId> to inspect its output to STDOUT; this will give you an idea of how long the process must still run before it ends. For instance, running screen -r 714071 would attach you to the output of that particular Rsync subprocess.

Once all child processes finish and the command screen -list returns no child processes, the synchronization will have finished.
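If you prefer not to check for that condition by hand, a small polling loop can watch for it; this is just a sketch, and the 60-second interval is an arbitrary choice:

```shell
# Block until screen reports no remaining attached or detached sessions
while screen -list 2>/dev/null | grep -Eq '\((Attached|Detached)\)'; do
    sleep 60
done
echo "all Rsync child processes have finished"
```

The grep pattern matches the "(Detached)" / "(Attached)" state markers that screen -list prints next to each session.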

The first sync, which will copy hundreds of thousands or even millions of blocks, should be run manually and inspected frequently to make sure that the seeding process completes without errors. You may need to do that over a weekend or at night, depending on your infrastructure load.

In our case study, which is real, we were able to increase the saturation of the upload stream by a factor of more than 10.