I am trying to repair my repository (about 20TB) but it fails:
[root@Server1:/vmfs/volumes/cacffd7b-11f7dacc/xsi_dir] ./xsibackup --repair /vmfs/volumes/xsibackup01/DC-old/
|---------------------------------------------------------------------------------|
||-------------------------------------------------------------------------------||
||| (c)XSIBackup-DC 1.5.1.12: Backup & Replication Software |||
||| (c)33HOPS, Sistemas de Informacion y Redes, S.L. | All Rights Reserved |||
||-------------------------------------------------------------------------------||
|---------------------------------------------------------------------------------|
(c)Daniel J. Garcia Fidalgo | info@33hops.com
|---------------------------------------------------------------------------------|
System Information: ESXi, Kernel 7 Major 0 Minor 3 Patch 0
-------------------------------------------------------------------------------------------------------------
PID: 11381218, Running job as: root
-------------------------------------------------------------------------------------------------------------
Sorting blocks...
-------------------------------------------------------------------------------------------------------------
Sorting progress 22%sort: out of memory
-------------------------------------------------------------------------------------------------------------
SIGTERM (13) condition was trapped: check logs for more details
-------------------------------------------------------------------------------------------------------------
Cleaning up...
-------------------------------------------------------------------------------------------------------------
Removed host <tmp> dir OK
-------------------------------------------------------------------------------------------------------------
Removed prog <tmp> dir OK
-------------------------------------------------------------------------------------------------------------
[root@Server1:/vmfs/volumes/cacffd7b-11f7dacc/xsi_dir]
- Hardware: HP DL360 Gen10, 64 GB memory
- Currently only running one VM
- Free memory: 44 GB
- The sort process allocates about 600 MB and then crashes
Is that 20 TB of data on disk, or nominal data? If it is data on disk, you should keep your repos smaller to avoid resource usage growing out of control when performing actions such as --repair or --prune. It is OK to just store backups, as that uses an extremely limited amount of memory, on the order of some tens of megabytes.
What we mean is that the --backup and --replica actions use a very limited amount of memory, while --repair and, in particular, --prune use a massive amount of memory.
You should be using a Linux host to accumulate that amount of data. (c)ESXi is not an adequate OS to host deduplicated data for many reasons, the most important being that VMFS is not the right file system for deduplicated repositories. We always take the chance to explain this in all related posts.
You need a fast, fully featured file system to store millions of chunks of data with maximum efficiency. We recommend XFS, or ext4 should XFS not be available. The operating systems we recommend for storing big repositories are CentOS 6/7 and Rocky Linux 8.
Even if you use an NFS mount backed by an XFS or ext4 file system on the remote volume, trying to --repair from the (c)ESXi host is highly inefficient. Although NFS might look like a local file system to the user, it is not: every system call placed on the remote storage suffers the corresponding network latency. This is fine when working with a limited set of files, as (c)ESXi does; nonetheless, it becomes extremely inefficient, even unfeasible, when working with big deduplicated repos.
On top of that, the utilities that ship with (c)ESXi are a limited Busybox set, in some cases modified and stripped of some of their functions. Thus the sort binary that is throwing your "out of memory" error is not a full version of the well-known Linux [b]sort[/b] utility.
(c)ESXi is not a full OS; it is a hypervisor. Its use of memory is not what you would expect from a full-fledged OS: it will not use all available memory to run local binaries like [b]sort[/b]. This is not a problem when making your backups, but it is when performing operations that require massive amounts of CPU and memory. Thus, use the backup server to perform such actions instead of (c)ESXi.
In summary:
- Host your deduplicated repositories on Linux. You may attach your Linux box over NFS and perform local backups, or send your backups over IP; it does not matter.
- Keep your backup repositories below a manageable size, proportional to what your hardware can handle.
- Run your --repair and --prune operations from within the Linux backup server.
Apart from that, there will still be a limit to what you can handle from within Linux, although it will obviously be way higher than what you could achieve from within (c)ESXi over an NFS link. As (c)XSIBackup evolves it will become more powerful and refined, mostly by getting rid of external binary dependencies and carrying out some actions with more purpose-built algorithms.
UPDATE:
Nonetheless, for the case at hand, --repair is a rather straightforward action. It collects the blocks listed in the .map files of each restore point, loads them into a memory buffer, removes duplicates, and checks for the physical blocks while rewriting their entries to the /data/.blocklog file, the main block manifest.
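Judging from that description (and the execve trace posted later in this thread), the block-collection stage boils down to a sort-and-deduplicate pipeline. A conceptual sketch only, with hypothetical paths; the actual internals may differ:

```shell
# Conceptual sketch inferred from this thread, not from official docs.
# Gather block hashes from every restore point's .map files, sort them on
# disk (-T selects the temp dir, which must not be empty), drop duplicates,
# and write the result as the new block manifest.
REPO=/vmfs/volumes/xsibackup01/DC-old   # hypothetical repository path
cat "$REPO"/*/*.map \
  | sort -T /tmp \
  | uniq > "$REPO/data/.blocklog.tmp"
```

The -T flag matters because an external merge sort spills to temporary files when the data does not fit in memory; pointing it at an empty path (as in the trace below) breaks the pipeline.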
In your particular case --repair is failing in the first stage, while collecting blocks: it just can't grow the buffer any further, as the local shell is limited in resources compared with what you want to achieve, as previously explained. The solution is to run the very same command from the Linux backup host.
If you are hosting 20 TB of data on disk and assuming a compression ratio of just 90% (way below what (c)XSIBackup can achieve), you are managing 200 TB of nominal data, which can imply loading on the order of 2.14 * 10^11 block hashes into the memory buffer. You should devise a backup system, and in particular a backup host, that can really hold and work with that amount of data. That would require a dedicated M.2 temp disk as cache in the (c)ESXi server (only if your VMs are really big), a big amount of RAM in the backup host, and a latest-generation CPU. The faster the disks, the better, obviously; using M.2 storage as the target of your backups would also be a great help.
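The hash count scales inversely with the average block size, which is an assumption here (the thread only mentions a 1 MB maximum block size). A quick back-of-the-envelope check:

```shell
# Back-of-the-envelope: how many block hashes must the sort stage buffer?
# ASSUMPTIONS: 200 TB of nominal data (from the post above); the average
# block sizes are illustrative, the thread only states a 1 MB maximum.
nominal=$((200 * 1000 * 1000 * 1000 * 1000))   # 200 TB in bytes
echo $((nominal / 1024))            # ~1 KiB avg blocks: about 2 x 10^11 hashes
echo $((nominal / (1024 * 1024)))   # ~1 MiB avg blocks: about 2 x 10^8 hashes
```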
Thank you for your detailed explanation. Sorry that I didn't describe our environment in detail in the original post.
- Yes, we are backing up to a dedicated Linux-based backup host (Synology NAS 1817+ with 8 x 8 TB disks) mounted over NFS
- The backup volume is 20 TB; the occupied space is currently around 12 TB in 16,616,908 files
- I have tried to run --repair from the Linux host:
root@NasBackup01:/volume1/xsibackup01/xsi_dir# ./xsibackup --repair ../DC-old/
|---------------------------------------------------------------------------------|
||-------------------------------------------------------------------------------||
||| (c)XSIBackup-Free 1.5.1.12: Backup & Replication Software |||
||| (c)33HOPS, Sistemas de Informacion y Redes, S.L. | All Rights Reserved |||
||-------------------------------------------------------------------------------||
|---------------------------------------------------------------------------------|
(c)Daniel J. Garcia Fidalgo | info@33hops.com
|---------------------------------------------------------------------------------|
System Information: Linux, Kernel 3 Major 10 Minor 105 Patch 0
-------------------------------------------------------------------------------------------------------------
PID: 5604, Running job as: root
-------------------------------------------------------------------------------------------------------------
Sorting blocks...
-------------------------------------------------------------------------------------------------------------
sort: option requires an argument -- 'T'
Try 'sort --help' for more information.
execve("/bin/sh", ["sh", "-c", "sort -T'' | uniq > ../DC-old/data/.blocklog.tmp"], [/* 20 vars */] <unfinished ...>
Some more findings:
I mounted the NFS volume on an Ubuntu 18.04 LTS server and got the same error about the missing temp dir for sort:
sort: option requires an argument -- 'T'
Then I installed a brand-new Rocky Linux system, mounted the NFS volume and started xsibackup --repair.
xsibackup runs and traverses the data dirs, but the first step, creating .blocklog.tmp, was skipped: .blocklog.tmp is empty.
Looking at strace I've found that you are calling xsilib from the bin dir, but:
root@rocky:/mnt/xsibackup01/xsi_dir/bin# ./xsilib
bash: ./xsilib: Datei oder Verzeichnis nicht gefunden
root@rocky:/mnt/xsibackup01/xsi_dir/bin# file ./xsilib
./xsilib: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.9, not stripped
xsilib is a 32-bit binary and does not run on 64-bit Linux :-( (The German error above translates to "No such file or directory"; bash reports this when the 32-bit loader /lib/ld-linux.so.2 named as the interpreter is not installed.)
After installing the following packages on Rocky Linux:
[root@rocky klaus]# yum install glibc-devel
[root@rocky klaus]# yum install glibc-devel.i686
I can execute xsilib:
[root@rocky xsi_dir]# bin/xsilib --get-files
Speicherzugriffsfehler (Speicherabzug geschrieben)
but it crashes (the German output above translates to "Segmentation fault (core dumped)")
Appliances such as (c)Synology and (c)QNAP are great options when you need to provide quick storage in the form of NFS or iSCSI; nonetheless, they are closed-down proprietary environments. For what you want to do, and the amount of data you want to handle, a full Linux OS with absolute control is a must.
The sort binary in Synology lacks the T option. You have a statically compiled sort binary in the /bin dir of the installation package that you can copy to the /usr/bin dir in the Synology appliance. Just keep a copy of the old one.
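A sketch of that swap, assuming the static binary ships as bin/sort inside the xsi_dir installation folder (paths are taken from this thread; adapt them to your layout):

```shell
# Replace Synology's limited sort with the static one from the package.
# Paths are assumptions based on this thread; adjust to your installation.
XSI_DIR=/volume1/xsibackup01/xsi_dir       # install dir seen in this thread
cp /usr/bin/sort /usr/bin/sort.synology    # keep a copy of the old one
cp "$XSI_DIR/bin/sort" /usr/bin/sort
chmod 755 /usr/bin/sort
sort --version                             # sanity check the replacement runs
```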
You don't need the xsilib binary in Linux; it is a helper only used when (c)XSIBackup is run in (c)ESXi. We don't know why it is being called in Rocky Linux; it could be a bug, we'll check that. In any case, you can consider that strace output a spurious error. [b]Xsilib[/b] is used to traverse directory entries; it is not related to creating the .blocklog.tmp file.
If your --repair command is running and not returning any errors, you can consider it to be working fine. As said, repair just re-accounts the blocks in the restore points and rebuilds the .blocklog file. We have designed all those files in text form so that you can actually see what is inside and how things work.
You could even use simple bash tools to partially rebuild the .blocklog file for a single restore point.
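A minimal sketch, assuming each restore point directory holds .map files listing its block hashes; the repository path and restore point name are hypothetical:

```shell
# Rebuild a partial block manifest from a single restore point's .map files.
# ASSUMPTION: file layout inferred from this thread, not from official docs.
REPO=/backups/DC-old              # hypothetical repository path
RP=20200101-000000                # hypothetical restore point directory
sort -T /tmp "$REPO/$RP"/*.map | uniq > "$REPO/data/.blocklog.partial"
```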
We have checked the [b]xsilib[/b] issue and believe it to be virtually impossible for xsilib to be invoked when executing xsibackup in a Linux environment: the environment is checked before [b]xsilib[/b] is actually called, and this auxiliary binary is used only if ESXi is detected. The chances of a false positive for ESXi detection in Linux are zero.
Maybe there's some important piece of information that we are missing, or maybe you are still running the command from within (c)ESXi. Please run your --repair command from within the Linux shell.
So comparing strace output between the two systems led me down the wrong path regarding xsilib :-(
[quote]The sort binary in Synology lacks the T option.[/quote]
The sort binary in Synology does have the T option, but the temp path argument is empty:
execve("/bin/sh", ["sh", "-c", "sort -T'' | uniq > ../DC-old/data/.blocklog.tmp"], [/* 20 vars */] <unfinished ...>
I removed a lot of .map files and am now down to a volume on which --repair runs under VMware.
It is now creating a blocklog file with 3,853,161 entries.
Given a max block size of 1 MB, that would be not even 4 TB!!
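A quick sanity check on that figure, assuming every one of the 3,853,161 entries addresses a block of the full 1 MiB maximum size:

```shell
# Upper bound on the data addressed by the rebuilt blocklog: entries x 1 MiB.
blocks=3853161
bytes=$((blocks * 1024 * 1024))
echo $((bytes / (1024 * 1024 * 1024 * 1024)))   # whole TiB, i.e. under 4 TB
```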
We had posted a really long answer to this, but the problem has too many variables and too many things that we still don't know.
(c)ESXi's sort is not going to cut it for such a big amount of data. Run the --repair command from the server side instead.
If you need further help, contact support directly for more personalized assistance.