Posts by rs

rs · © XSITools

To speed up the script you can comment out these lines:

echo "Repo-size before pruning: `du $1/data/ -h -s | awk '{print $1;}'`"

and

echo "Repo-size after pruning: `du $1/data/ -h -s | awk '{print $1;}'`"

These take long to execute and are only informational.

rs · © XSITools

Hi,

thank you for your suggestions. I inserted them in the actual version of the script which I'm using. To be double-sure I exclude delta.vmdk and sesparse.vmdk files. There can still be files beginning with a 40-character hex-pattern in a snapshot.

#!/bin/sh

# Check for inconsistencies in xsitools-repository
# Find and delete unused files
# Prune old backups

{
echo "Begin: `date`"
# usage
if [ -e $1/.xsitools ]
  then
  echo "$1 seems to be an xsitools-Repository, using it."
  else
  echo "$1 doesn't seem to be an xsitools-Repository."
  echo "Use \"$0 [xsitools-repo-directory]\""
  exit 1
fi

if [ "$2" != "--delete" ]
  then
  echo "Use \"$0 [xsitools-repo-directory] [--delete]\" to remove unused files (be careful)."
  else
  echo "\"--delete\" is set, will remove unused files."
fi

if echo $3 | egrep -q '^[0-9]+$';
  then
  echo "Searching for backup-folders older than $3 days."
  bkpfolders=`find $1 -type d -maxdepth 1 -regex ".*/[0-9\-]\{14\}" -mtime +$3`
  if [ ! -z "$bkpfolders" ]
    then
    echo "$bkpfolders found, deleting"
    rm -rf $bkpfolders
    else
    echo "No backup-folders found."
  fi
  else
  echo "3rd option can be a number: Delete backup-folders older than ... days."
  echo "You can use this to prune older backups (be careful)."
fi

# Temporary files and variables
temp_dir=`mktemp -d -t`
hashes="$temp_dir/hashes"
hashes_sorted="$temp_dir/hashes_sorted"
files="$temp_dir/files"
files_sorted="$temp_dir/files_sorted"
delete_candidates="$temp_dir/delete_candidates"
missing_files="$temp_dir/missing_files"
diff_output="$temp_dir/diff_output"
hashes_count=0
files_count=0

echo "Collecting hashes of all .vmdk files."
# my old version to exclude delta files:
# find $1/ -path data -prune -o -name *.vmdk -maxdepth 3 | grep -v '\delta.vmdk$' | grep -v '\sesparse.vmdk$' | while read line; do cat "$line" ; done | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes
# wile-loop inserted for handling filenames with spaces, exclude delta files (snapshots), faster search (thanks to wowbagger)
find $1/ -path $1/data -prune -o -name *.vmdk | grep -v '\delta.vmdk$' | grep -v '\sesparse.vmdk$' | grep -v $1/data | while read LINE; do cat "$LINE" ; done | grep -o '^\b[0-9a-f]\{40\}\+\b' > $hashes
echo "Sorting hashes and removing duplicates."
sort $hashes | awk '!a[$0]++' > $hashes_sorted
hashes_count=`cat $hashes_sorted | wc -l`
echo "Hashes in vmdks: $hashes_count"

echo "Generating list of files in ./data."
# find $1/data -type f -exec basename {} \; > $files
ls -1R $1/data | grep -o '\b[0-9a-f]\{40\}\+\b' > $files

echo "Sorting list of files."
sort $files > $files_sorted
files_count=`cat $files_sorted | wc -l`

echo "Files: $files_count"

# some checks if everything is valid
echo "Using diff for comparing .vmdk-hashes with filenames in ./data."
diff $hashes_sorted $files_sorted -U 0 > $diff_output

if [ $? -eq 0 ];
  then
  echo "No unused files found. Every hash in the .vmdk files"
  echo "has a proper file in data-directory. Good."
  echo "Removing temporary files."
  rm -rf "$temp_dir"
  echo "End: `date`"
  exit 0
  else
  echo "Checking if hashes in .vmdk files have a file in the data-directory."
  grep "^-[a-f0-9]" $diff_output | sed 's/^.//' > $missing_files
  if [ `cat $missing_files | wc -l` -eq 0 ];
    then
    echo "Every hash contained in the .vmdk files has a proper file in data-directory. Good."
    grep "^+[a-f0-9]" $diff_output | sed 's/^.//' > $delete_candidates
    unused_count=`cat $delete_candidates | wc -l`
    echo "There are $unused_count unused files in ./data:"
    if [ "$2" != "--delete" ];
      then
      cat $delete_candidates
    fi
    else
    echo "The following `cat $missing_files | wc -l` data files are missing:"
    cat $missing_files
    echo "Repository is damaged. Leaving everything untouched. Exiting."
    echo "Removing temporary files."
    rm -rf "$temp_dir"
    echo "End: `date`"
    exit 1
  fi
fi

if [ "$2" == "--delete" ]
  then
  echo "Counting space used of $1/data."
  echo "Repo-size before pruning: `du $1/data/ -h -s | awk '{print $1;}'`"
  cat $delete_candidates | while read file
    do
    rmpath="$1/data/`echo $file | cut -c1-1`/`echo $file | cut -c2-1`/`echo $file |cut -c3-1`/$file"
    echo "Deleting $rmpath"
    rm -rf $rmpath
    done;
  echo "Counting space used of $1/data."
  echo "Repo-size after pruning: `du $1/data/ -h -s | awk '{print $1;}'`"
  echo "Removing empty directories."
  # Busybox find doesnt know -empty.
  find $1/data -type d -depth -exec rmdir -p --ignore-fail-on-non-empty {} \;
  echo "Counting files in data-directory again."
  # no sort needed here
  # find $1/data -type f -exec basename {} \; > $files
  ls -1R $1/data | grep -o '\b[0-9a-f]\{40\}\+\b' > $files
  files_count=`cat $files | wc -l` 
  if [ $files_count == $hashes_count ]
    then
    echo "Number of files and hashes ($files_count) are same, everything went right."
    else
    echo "Number of files ($files_count) and hashes ($hashes_count) are different."
    echo "Perhaps not every file could be deleted. Check it using the logfile."
    echo "End: `date`"
    echo "Removing temporary files."
    rm -rf "$temp_dir"
    exit 1
  fi
  echo "Updating Bcnt in .xsitools-file:"
  bcnt=`grep Bcnt $1/.xsitools | awk -F ': ' '{print $2}'`
  echo "Old value of Bcnt: $bcnt."
  echo "Setting actual number of files ($files_count) as new value of Bcnt."
  sed -i -e "s/$bcnt/$files_count/g" $1/.xsitools
fi

echo "Removing temporary files."
rm -rf "$temp_dir"
echo "End: `date`"
} 2>&1 | tee -a $0-`date +"%Y-%m-%d"`.log

rs · © XSITools

The script is a temporary solution. I think everyone would prefer a similar function in xsibackup-pro. See it as a collection of ideas which posiblemente could be used and/or modified for integrating it into xsibackup.

Changing the line

cat `find $1/ -path data -prune -o -name *.vmdk` | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes

to

find $1/ -path data -prune -o -name *.vmdk | while read LINE; do cat "$LINE" ; done | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes

handles also filenames with spaces.

This is not very efficent, because cat is called once per .vmdk-file. But ist works. Better ideas are welcome.

I'm testing an updated version of the script, which I can post when it's working as I expect.

rs · © XSITools

Hello,

thank you for your suggestions. I made a script, which does this:

- It collects the hashes of the vmdk-files, sorts and removes duplicates.
- It generates a list of the files in data-directory and sorts it.
- With this information it feeds "diff" to find orphaned and (hopefully not) missing files.
- If no orphaned or missing files are found: log, exit.
- If there are missing files: log, exit.
- If there are orphaned files: log, exit.

I think, the script still needs some testing. Deleting the wrong files copuld be desastrous.
I would thank you if you have any suggestions.

Anyway, if option "--delete" is set:

- Delete the orphaned files
- Remove eventually empty directories in data-folder
- Update Bcnt-value in .xsitools-file

rs · © XSITools

#!/bin/sh

# Check for inconsistencies in xsitools-repository
# Find and delete unused files

{
echo "Begin: `date`"
# Temporary files and variables
hashes="/tmp/hashes"
hashes_sorted="/tmp/hashes_sorted"
files="/tmp/files"
files_sorted="/tmp/files_sorted"
delete_candidates="/tmp/delete_candidates"
missing_files="/tmp/missing_files"
diff_output="/tmp/diff_output"
files_count=0
hashes_count=0

echo "Deleting old temporary files"
rm -f $hashes $hashes_sorted $files $files_sorted $delete_candidates $missing_files $diff_output

# usage
if [ -e $1/.xsitools ]
  then
  echo "$1 seems to be an xsitools-Repository, using it."
  else
  echo "$1 doesn't seem to be an xsitools-Repository."
  echo "Use \"$0 [xsitools-repo-directory]\""
  exit 1
fi

if [ "$2" != "--delete" ]
  then
  echo "Use \"$0 [xsitools-repo-directory] --delete\" to remove unused blocks (be careful)"
  else
  echo "\"--delete\" is set, will remove unused blocks."
fi

echo "Collecting hashes of all .vmdk files."
cat `find $1/ -path data -prune -o -name *.vmdk` | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes

echo "Sorting and removing duplicate hashes." 
sort $hashes | awk '!a[$0]++' > $hashes_sorted
hashes_count=`cat $hashes_sorted | wc -l`

echo "Generating list of files in ./data."
find $1/data -type f -exec basename {} \; > $files

echo "Sorting list of files."
sort $files > $files_sorted
files_count=`cat $files_sorted | wc -l`

echo "hashes in vmdks: $hashes_count"
echo "files: $files_count"

# some checks if everything is valid
echo "Using diff for comparing .vmdk-hashes with filenames in ./data."
diff $hashes_sorted $files_sorted -U 0 > $diff_output

if [ $? -eq 0 ];
  then
  echo "No unused files found. Every hash in the .vmdk files"
  echo "has a proper file in data-directory. Good."
  else
  echo "Checking if hashes in .vmdk files have a file in the data-directory."
  grep "^-[a-f0-9]" $diff_output | sed 's/^.//' > $missing_files
  if [ `cat $missing_files | wc -l` -eq 0 ];
    then
    echo "Every hash contained in the .vmdk files has a proper file in data-directory. Good."
    grep "^+[a-f0-9]" $diff_output | sed 's/^.//' > $delete_candidates
    unused_count=`cat $delete_candidates | wc -l`
    echo "There are $unused_count unused files in ./data:"
    cat $delete_candidates
    else
    echo "The following `cat $missing_files | wc -l` data files are missing:"
    cat $missing_files
    echo "Repository is damaged. Leaving everything untouched. Exiting."
    echo "End: `date`"
    exit 1
  fi
fi

if [ "$2" == "--delete" ]
  then
  cat $delete_candidates | while read file
    do
    rmpath="$1/data/`echo $file | cut -c1-1`/`echo $file | cut -c2-1`/`echo $file |cut -c3-1`/$file"
    echo "Deleting $rmpath"
    rm -rf $rmpath
    done;
  echo "Removing empty directories."
  # Busybox find doesnt know -empty.
  find $1/data -type d -depth -exec rmdir -p --ignore-fail-on-non-empty {} \;
  echo "Updating Bcnt in .xsitools-file:"
  bcnt=`grep Bcnt $1/.xsitools | awk -F ': ' '{print $2}'`
  echo "Old value of Bcnt: $bcnt."
  echo "Setting actual number of files ($hashes_count) as new value of Bcnt."
  sed -i -e "s/$bcnt/$hashes_count/g" $1/.xsitools
fi

echo "End: `date`"
} 2>&1 | tee -a $0-`date +"%Y-%m-%d"`.log

Forum ©XSIBackup: ©VMWare ©ESXi Backup Software

#1 Re: © XSITools » Space usage with xsitools » 2018-06-07 13:51:07

#2 Re: © XSITools » Space usage with xsitools » 2018-06-07 12:48:19

#3 Re: © XSITools » Space usage with xsitools » 2018-02-06 21:48:03

#4 Re: © XSITools » Space usage with xsitools » 2018-02-01 11:46:47

#5 Re: © XSITools » Space usage with xsitools » 2018-02-01 11:45:47

Board footer