#26 2018-12-18 09:31:35

admin
Administrator
Registered: 2017-04-21
Posts: 813

Re: Space usage with xsitools

If you only have room for one backup, then what you need is more room.
You can try to work around that in any way you like, but you can't escape the real issue.

We will study this as a feature for the next main branch.

#27 2018-12-18 09:45:01

lievenmoors
Member
Registered: 2018-11-21
Posts: 8

Re: Space usage with xsitools

I made a couple of changes to the script above.
The main changes are:

- Use -name '*-flat.vmdk' when looking for vmdk files containing hashes.
  The XSIBackup website states that only these vmdk files are deduplicated.

- Use sort -u instead of sort + awk. I don't think awk is needed, because sort -u already removes the duplicates.

- Use find with -regex when looking for blocks, instead of ls + grep,
  and be stricter on the filename (don't use word boundaries).

- Use `basename $0` instead of $0 for the script name.
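The sort -u change can be checked with a minimal sketch: `sort -u` gives the same result as `sort` followed by an awk de-duplication filter. The awk idiom used here (`!seen[$0]++`) is an assumed stand-in, since the earlier awk command isn't quoted in this post, and the hash values are shortened, made-up examples.

```sh
#!/bin/sh
# Shortened, made-up "hashes" with one duplicate.
sample() {
  printf '%s\n' bbb aaa bbb
}

# Old style: sort, then drop repeated lines with a typical awk idiom.
old=$(sample | sort | awk '!seen[$0]++')
# New style: let sort remove the duplicates itself.
new=$(sample | sort -u)

[ "$old" = "$new" ] && echo "identical"
```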

Note: make sure the directory "var/logs/$name" exists (the path is relative to the directory you run the script from), or adapt it to your taste.

#!/bin/sh

# Check for inconsistencies in xsitools-repository
# Find and delete unused files
# Prune old backups

name=`basename $0`

{
echo "Begin: `date`"

# usage
if [ -e "$1/.xsitools" ]
  then
  echo "$1 seems to be an xsitools-Repository, using it."
  else
  echo "$1 doesn't seem to be an xsitools-Repository."
  echo "Use \"$name [xsitools-repo-directory]\""
  exit 1
fi

if [ "$2" != "--delete" ]
  then
  echo "Use \"$name [xsitools-repo-directory] [--delete]\" to remove unused files (be careful)."
  else
  echo "\"--delete\" is set, will remove unused files."
fi

if echo $3 | egrep -q '^[0-9]+$';
  then
  echo "Searching for backup-folders older than $3 days."
  bkpfolders=`find $1 -type d -maxdepth 1 -regex ".*/[0-9\-]\{14\}" -mtime +$3`
  if [ ! -z "$bkpfolders" ]
    then
    echo "$bkpfolders found, deleting"
    rm -rf $bkpfolders
    else
    echo "No backup-folders found."
  fi
  else
  echo "3rd option can be a number: Delete backup-folders older than ... days."
  echo "You can use this to prune older backups (be careful)."
fi

# Temporary files and variables
temp_dir=`mktemp -d -t`
hashes="$temp_dir/hashes"
hashes_sorted="$temp_dir/hashes_sorted"
files="$temp_dir/files"
files_sorted="$temp_dir/files_sorted"
delete_candidates="$temp_dir/delete_candidates"
missing_files="$temp_dir/missing_files"
diff_output="$temp_dir/diff_output"
hashes_count=0
files_count=0

echo "Collecting hashes of all .vmdk files."
# my old version to exclude delta files:
# find $1/ -path data -prune -o -name *.vmdk -maxdepth 3 | grep -v '\delta.vmdk$' | grep -v '\sesparse.vmdk$' | while read line; do cat "$line" ; done | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes
# while-loop inserted for handling filenames with spaces, exclude delta files (snapshots), faster search (thanks to wowbagger)
# find $1/ -path $1/data -prune -o -name *.vmdk | grep -v '\delta.vmdk$' | grep -v '\sesparse.vmdk$' | grep -v $1/data | while read LINE; do cat "$LINE" ; done | grep -o '^\b[0-9a-f]\{40\}\+\b' > $hashes
find $1/ -path $1/data -prune -o -name '*-flat.vmdk' -exec cat {} \; > $hashes
echo "Sorting hashes and removing duplicates."
sort -u $hashes > $hashes_sorted
hashes_count=`cat $hashes_sorted | wc -l`
echo "Hashes in vmdks: $hashes_count"

echo "Generating list of files in ./data."
# find $1/data -type f -exec basename {} \; > $files
# ls -1R $1/data | grep -o '\b[0-9a-f]\{40\}\+\b' > $files

find $1/data -type f -regex '.*/[0-9a-f]\{40\}\+$' -exec basename {} \; > $files
echo "Sorting list of files."
sort $files > $files_sorted
files_count=`cat $files_sorted | wc -l`

echo "Files: $files_count"

# some checks if everything is valid
echo "Using diff for comparing .vmdk-hashes with filenames in ./data."
diff -U 0 $hashes_sorted $files_sorted > $diff_output

if [ $? -eq 0 ];
  then
  echo "No unused files found. Every hash in the .vmdk files"
  echo "has a proper file in data-directory. Good."
  echo "Removing temporary files."
  rm -rf "$temp_dir"
  echo "End: `date`"
  exit 0
  else
  echo "Checking if hashes in .vmdk files have a file in the data-directory."
  grep "^-[a-f0-9]" $diff_output | sed 's/^.//' > $missing_files
  if [ `cat $missing_files | wc -l` -eq 0 ];
    then
    echo "Every hash contained in the .vmdk files has a proper file in data-directory. Good."
    grep "^+[a-f0-9]" $diff_output | sed 's/^.//' > $delete_candidates
    unused_count=`cat $delete_candidates | wc -l`
    echo "There are $unused_count unused files in ./data:"
    if [ "$2" != "--delete" ];
      then
      cat $delete_candidates
    fi
    else
    echo "The following `cat $missing_files | wc -l` data files are missing:"
    cat $missing_files
    echo "Repository is damaged. Leaving everything untouched. Exiting."
    echo "Removing temporary files."
    rm -rf "$temp_dir"
    echo "End: `date`"
    exit 1
  fi
fi

if [ "$2" = "--delete" ]
  then
  echo "Counting space used of $1/data."
  echo "Repo-size before pruning: `du $1/data/ -h -s | awk '{print $1;}'`"
  cat $delete_candidates | while read file
    do
    # Blocks are assumed to live in data/<1st char>/<2nd char>/<3rd char>/<hash>.
    rmpath="$1/data/`echo $file | cut -c1`/`echo $file | cut -c2`/`echo $file | cut -c3`/$file"
    echo "Deleting $rmpath"
    rm -f "$rmpath"
    done;
  echo "Counting space used of $1/data."
  echo "Repo-size after pruning: `du $1/data/ -h -s | awk '{print $1;}'`"
  echo "Removing empty directories."
  # Busybox find doesn't know -empty.
  find $1/data -type d -depth -exec rmdir -p --ignore-fail-on-non-empty {} \;
  echo "Counting files in data-directory again."
  # no sort needed here
  # find $1/data -type f -exec basename {} \; > $files
  # ls -1R $1/data | grep -o '\b[0-9a-f]\{40\}\+\b' > $files
  find $1/data -type f -regex '.*/[0-9a-f]\{40\}\+$' -exec basename {} \; > $files
  files_count=`cat $files | wc -l` 
  if [ "$files_count" -eq "$hashes_count" ]
    then
    echo "Number of files and hashes ($files_count) are same, everything went right."
    else
    echo "Number of files ($files_count) and hashes ($hashes_count) are different."
    echo "Perhaps not every file could be deleted. Check it using the logfile."
    echo "End: `date`"
    echo "Removing temporary files."
    rm -rf "$temp_dir"
    exit 1
  fi
  echo "Updating Bcnt in .xsitools-file:"
  bcnt=`grep Bcnt $1/.xsitools | awk -F ': ' '{print $2}'`
  echo "Old value of Bcnt: $bcnt."
  echo "Setting actual number of files ($files_count) as new value of Bcnt."
  sed -i -e "s/Bcnt: $bcnt/Bcnt: $files_count/" $1/.xsitools
fi

echo "Removing temporary files."
rm -rf "$temp_dir"
echo "End: `date`"
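# Note: the { ... } block is piped into tee, so it runs in a subshell:
# its "exit 1" calls end the block early, but the script itself still exits 0.
} 2>&1 | tee -a var/logs/$name/$name-`date +"%d"`.log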

exit 0
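The comparison step in the script (diff between the sorted hash list and the sorted file list) can also be sketched with `comm`, which splits two sorted lists into lines unique to either side. This swaps `comm` in for `diff` and is not part of the script above; `compare_lists` is a hypothetical name, `comm` may not be present in every ESXi BusyBox build, and both inputs must be sorted under the same locale (hence `LC_ALL=C`).

```sh
#!/bin/sh
# Hypothetical helper: compare the sorted hash list taken from the
# *-flat.vmdk files ($1) with the sorted list of block files in ./data ($2).
compare_lists() {
  # comm -23: lines only in the first file  -> referenced but missing blocks
  missing=$(LC_ALL=C comm -23 "$1" "$2")
  # comm -13: lines only in the second file -> unreferenced (unused) blocks
  unused=$(LC_ALL=C comm -13 "$1" "$2")
  if [ -n "$missing" ]; then
    echo "MISSING:"
    printf '%s\n' "$missing"
    return 1            # repository damaged: do not delete anything
  fi
  if [ -n "$unused" ]; then
    echo "UNUSED:"
    printf '%s\n' "$unused"
  fi
  return 0
}

# Usage with small made-up lists:
tmp=$(mktemp -d)
printf '%s\n' aaa bbb | LC_ALL=C sort -u > "$tmp/hashes"
printf '%s\n' aaa bbb ccc | LC_ALL=C sort > "$tmp/files"
compare_lists "$tmp/hashes" "$tmp/files"   # prints UNUSED: and ccc
result=$?
rm -rf "$tmp"
```

As in the script, nothing should be deleted when a referenced block is missing; the function signals that case with a non-zero return value.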

#28 2018-12-18 10:02:57

lievenmoors
Member
Registered: 2018-11-21
Posts: 8

Re: Space usage with xsitools

admin wrote:

If you only have room for one backup, then what you need is more room.
You can try to overcome it in any possible way, but facing the real issue is something that you can't escape.

Do you mean that there is a high enough risk that this backup isn't sane?
Does this have to do with possible hash collisions?

#29 2018-12-18 16:38:15

admin
Administrator
Registered: 2017-04-21
Posts: 813

Re: Space usage with xsitools

No, it means that you need a bigger storage device.
To translate probabilities into a "real life" joke: the chance of hitting a hash collision is about the same as that of a meteor ridden by a chubby Santa landing in your toilet in the next 30 minutes. It's not zero, but pretty close to it.
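To put a rough number on that, the standard birthday (union) bound limits the chance of any accidental collision among n distinct blocks hashed to b bits. This assumes the 40-hex-character block names are 160-bit hashes (SHA-1 would fit, but the hash function isn't stated in this thread) and takes a hypothetical n = 10^9 blocks:

```latex
p \;\le\; \binom{n}{2}\,2^{-b} \;=\; \frac{n(n-1)}{2}\,2^{-b} \;\approx\; \frac{n^2}{2^{b+1}}
\;=\; \frac{10^{18}}{2^{161}} \;\approx\; \frac{10^{18}}{2.92\times 10^{48}} \;\approx\; 3.4\times 10^{-31}
```

This bounds accidental collisions only; it says nothing about deliberately constructed ones.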

#30 2018-12-21 18:48:55

admin
Administrator
Registered: 2017-04-21
Posts: 813

Re: Space usage with xsitools

We will be sending XSIBACKUP-PRO 11.2.0 to registered users from Dec 27th. It includes (c)XSITools repository pruning, both on demand via the --prune-xsitoolsrepo argument and automatically via the --backup-room argument, which limits the size to which an (c)XSITools repository can grow.

#31 2018-12-26 18:40:51

admin
Administrator
Registered: 2017-04-21
Posts: 813

Re: Space usage with xsitools

The next XSIBACKUP-PRO version will incorporate a pruning mechanism that takes the --backup-room argument into account, so you will be able to rotate backups while staying within the amount passed with this argument.

It is worth noting that pruning is performed after the current VM has been backed up, so you need a maneuvering margin of at least the size of the VM currently being backed up.
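The oldest-first rotation idea can be sketched as a dry run. This is a hypothetical helper, not XSIBackup's actual implementation: it reads "folder size" lines (sizes in MB, an assumed unit), relies on the 14-digit timestamp names sorting chronologically, and prints which folders would have to go so the total fits the limit. It only models plain, non-deduplicated backup folders, where deleting a folder frees its whole size; in an (c)XSITools repository a deleted folder frees only blocks that no other backup still references, which is why a separate prune pass is needed. Because pruning runs after the new set is written, the list should already include that new set.

```sh
#!/bin/sh
# Hypothetical dry-run helper, NOT XSIBackup's actual implementation.
# stdin: "<folder> <size_in_MB>" lines, folder names like 20190102002335.
# $1:    size limit in MB.
# Prints the folders to delete, oldest first, until the total fits.
plan_pruning() {
  sort -k1,1 | awk -v limit="$1" '
    BEGIN { lim = limit + 0 }
    { name[NR] = $1; size[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR && total > lim; i++) {
        print name[i]
        total -= size[i]
      }
    }'
}

# Example: three 800 MB backup sets against a 2000 MB limit.
printf '%s\n' '20190103000000 800' '20190101000000 800' '20190102000000 800' |
  plan_pruning 2000   # prints 20190101000000
```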

Future versions will include an (c)XSITools backup rotation mechanism by using the --del-dirs argument.

Due to the nature of SSDs, you must keep at least 10% of the disk free at all times; otherwise they will reach their wear-out limit much sooner. This is due to the physical limits of an SSD cell, which can only be overwritten a limited number of times.

https://blog.westerndigital.com/ssd-end … eds-needs/
https://www.cnet.com/how-to/how-ssds-so … -lifespan/
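Combining the two margins above gives a simple rule of thumb for picking a --backup-room value. A minimal sketch, assuming --backup-room is expressed in GB (as the 2000 → 2 TB exchange further down suggests); `suggest_room` is a hypothetical helper, not an XSIBackup feature.

```sh
#!/bin/sh
# Hypothetical helper: suggest a --backup-room value (GB) that
#  - leaves at least 10% of the backup disk free (SSD wear advice), and
#  - keeps room for one more VM, since pruning runs after the backup.
suggest_room() {
  disk_gb=$1
  largest_vm_gb=$2
  usable=$(( disk_gb * 90 / 100 ))     # keep 10% of the disk free
  room=$(( usable - largest_vm_gb ))   # margin for the VM being backed up
  if [ "$room" -le 0 ]; then
    echo "disk too small for this VM" >&2
    return 1
  fi
  echo "$room"
}

# Example: a 2000 GB disk whose largest VM is 300 GB.
suggest_room 2000 300   # prints 1500
```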

#32 2019-01-07 09:57:42

lievenmoors
Member
Registered: 2018-11-21
Posts: 8

Re: Space usage with xsitools

Could you explain whether --backup-room will be able to make room within a single repository, so that the repository doesn't grow beyond that size? If I understood correctly, --backup-room used to delete older repositories when you had more than one xsitools repository. In other words, do I still need more than one repository in order to make use of this feature?

#33 2019-01-07 11:08:10

admin
Administrator
Registered: 2017-04-21
Posts: 813

Re: Space usage with xsitools

You are too limited.
Download the latest version, 11.2.2, which allows pruning (c)XSITools repositories with the --backup-room argument. It accepts up to 2048 GB as the XSITools repo limit; the next version removes that restriction.

#34 2019-01-07 11:31:58

lievenmoors
Member
Registered: 2018-11-21
Posts: 8

Re: Space usage with xsitools

Just to be sure I understand how the new version works...

So if I run this xsitools job repeatedly:

"/vmfs/volumes/datastore1/xsi-dir/xsibackup" --backup-prog=xsitools:z --backup-point=/vmfs/volumes/Backup/xsit-repo --backup-type=Custom --backup-vms="..."  --backup-room=2000 --mail-to=... --use-smtp=1 --backup-how=Hot --backup-id=01 --description="..." --exec=yes >> "/vmfs/volumes/datastore1/xsi-dir/var/logs/xsibackup.log"

it will make room as needed within the --backup-point folder by deleting the oldest folders (named like 20190102002335, as determined by the mask) and by pruning the repository afterwards.

Do I understand this right?

Offline

#35 2019-01-07 12:17:32

admin
Administrator
Registered: 2017-04-21
Posts: 813

Re: Space usage with xsitools

It will prune the (c)XSITools repository to make sure it stays within the 2 TB boundary. Don't take this literally down to the MB: there are maneuvering margins, and the repo is pruned only after the new backup set has been fitted into it. Just set --backup-room a bit below your desired size so that it fits your available space.

Please be aware that --backup-room affects not only the size of the (c)XSITools repo but also the accumulated size of the folders with an XSITools mask, i.e. 20190107102345. It may therefore also prune other types of backups if the accumulated size of those masked folders reaches the configured value, so it's better to keep your XSITools repositories in a different root folder.

You may also run out of space on your disk while the repo is still below the limit; in that case you would get out-of-space errors from the file system. It's easy to prevent all of these situations by keeping backup contents well organized.
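To see how much the timestamp-masked folders under a backup point currently add up to, a minimal sketch follows. It is a hypothetical helper that only approximates what --backup-room counts (XSIBackup's own accounting may differ): it sums `du -sk` over top-level folders whose names are exactly 14 digits.

```sh
#!/bin/sh
# Hypothetical helper: total size in KB (as reported by du -k) of the
# top-level folders under $1 whose names are exactly 14 digits,
# i.e. the timestamp mask (e.g. 20190107102345).
masked_size_kb() {
  d14='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
  total=0
  for dir in "$1"/$d14; do
    [ -d "$dir" ] || continue          # no match: the pattern stays literal
    kb=$(du -sk "$dir" | cut -f1)
    total=$(( total + kb ))
  done
  echo "$total"
}

# Usage, with the backup point's parent from the command above:
# masked_size_kb /vmfs/volumes/Backup
```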
