#1 2017-04-24 15:48:57

marcoi
Member
Registered: 2017-04-24
Posts: 8

Space usage with xsitools

I've been testing xsitools and like what I see so far. I have a question about managing space on the local VMFS volume that holds the repo, since each backup gets its own timestamp. Eventually I would like to remove some of the older timestamped content, or keep only a limited set, say a week's worth of backups. Are there options to do this? What happens if I let xsitools keep going until the VMFS runs out of space: will the tool know to stop, or clean up the repo? I'd like to know what is automated and what I need to worry about.

Offline

#2 2017-04-24 16:04:57

Daniel
Moderator
Registered: 2017-04-22
Posts: 60

Re: Space usage with xsitools

As XSITools is a deduplication engine, the timestamped folders do not contain data (except for the auxiliary files: .vmx, .vmxf, .nvram...). What they contain are hash tables for the -flat.vmdk files, which in the end are the real holders of data in VMware ESXi.

Deleting the blocks that correspond to those hash tables, on the other hand, is not possible, because those blocks may be, and in fact will be, shared with other backups. We could detect all blocks that are used exclusively by that backup (which would be very CPU intensive), but that would not free much space.

So, what can be done to limit the space used by XSITools repos?
Well, the approach is different: you cannot limit backups by space, because you do not know in advance how much file system space a backup will end up consuming, but you can limit backups by time. Nevertheless, if XSITools is short of space it will still delete the oldest ordinary backups to make room.

The way to limit backups in time is to back up to a dynamic path built with the date command, like this:

Example1, the repo folder will be initialized every month:

--backup-point="/vmfs/volumes/backup3/$(date +%Y%m)"

Example2, the repo folder will be initialized every quarter:

--backup-point="/vmfs/volumes/backup3/$( date +"%Y %m" | awk '{Q=int(($2-1)/3)+1; printf("%sQ%s\n", $1, Q);}' )"

Example3, set up two backup devices, one for even days and the other one for odd days (the 1 prefixed to the day number keeps 08 and 09 from being rejected as invalid octal numbers and does not change the parity):

--backup-point="/vmfs/volumes/backup$( [ $((1$(date +'%d')%2)) -eq 0 ] && echo EvenDays || echo OddDays )"

Example4, back up to the even and odd days devices, creating one new backup folder per quarter:

--backup-point="/vmfs/volumes/backup$( [ $((1$(date +'%d')%2)) -eq 0 ] && echo EvenDays || echo OddDays )/$( date +"%Y %m" | awk '{Q=int(($2-1)/3)+1; printf("%sQ%s\n", $1, Q);}' )"
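
A quick way to sanity-check a quarter formula before putting it into a backup path is to run it over all twelve months. This is only a sketch: quarter_of_month is a hypothetical helper name, and it uses int((month-1)/3)+1, which maps months 1-3 to Q1, 4-6 to Q2, 7-9 to Q3 and 10-12 to Q4.

```shell
#!/bin/sh
# Map a month number (1-12, a leading zero is allowed) to its calendar quarter.
# quarter_of_month is a hypothetical helper, not part of XSITools.
quarter_of_month() {
  echo "$1" | awk '{ printf("%d\n", int(($1 - 1) / 3) + 1) }'
}

# Print the mapping for every month so it can be checked by eye.
for m in 01 02 03 04 05 06 07 08 09 10 11 12; do
  echo "month $m -> Q$(quarter_of_month "$m")"
done
```

Months with a leading zero are passed through awk on purpose: awk reads input fields as decimal numbers, so 08 and 09 cause no trouble there.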

Last edited by Daniel (2017-04-24 16:19:35)

Offline

#3 2017-10-09 08:54:35

sistemi
Member
Registered: 2017-08-29
Posts: 53

Re: Space usage with xsitools

If I understood the xsitools logic correctly, deleting a directory could leave orphaned blocks. Here is an "ugly version" of a script that checks whether there are orphaned blocks.

It must be copied into the main folder of the xsitools repo. Forgive the "2017*" filter used with grep; I'm not a skilled bash developer, so suggestions are welcome!

#!/bin/sh
#
# Check orphaned blocks in xsitools repository
#
# ugly first version
#
# gets total existing blocks
brecrd=$( cat .xsitools | grep Bcnt | awk -F ': ' '{print $2}' )
echo $(date) Total blocks: $brecrd

bproc="0"
#for each block search if referenced in any vmdk flat file ...
find ./data -type f -exec basename {} \; | while read  file
do
#ugly filter to exclude .data folder   
  ret=$(grep -lrx -m1 $file 2017* | grep ".vmdk" | wc -l)
# progress counter      
  bproc=$(( $bproc + 1 ))
  n=$(($bproc%100))

  if [ "$n" == "0" ]
  then
    echo $(date) Processed: $bproc of $brecrd  
  fi 

# if found print it, for now, todo: delete it
  if [ "$ret" == "0" ]
  then
    echo Found unused block: $file  
  fi 
done;

Offline

#4 2017-10-09 18:24:16

admin
Administrator
Registered: 2017-04-21
Posts: 563

Re: Space usage with xsitools

We'll take this as a feature request; it's something we already had on our to-do list. All possible solutions will need to be taken into account and performance tests will need to be carried out in order to find the most efficient one.

Offline

#5 2018-02-01 11:45:47

rs
Member
Registered: 2018-01-31
Posts: 5

Re: Space usage with xsitools

#!/bin/sh

# Check for inconsistencies in xsitools-repository
# Find and delete unused files

{
echo "Begin: `date`"
# Temporary files and variables
hashes="/tmp/hashes"
hashes_sorted="/tmp/hashes_sorted"
files="/tmp/files"
files_sorted="/tmp/files_sorted"
delete_candidates="/tmp/delete_candidates"
missing_files="/tmp/missing_files"
diff_output="/tmp/diff_output"
files_count=0
hashes_count=0

echo "Deleting old temporary files"
rm -f $hashes $hashes_sorted $files $files_sorted $delete_candidates $missing_files $diff_output

# usage
if [ -e $1/.xsitools ]
  then
  echo "$1 seems to be an xsitools-Repository, using it."
  else
  echo "$1 doesn't seem to be an xsitools-Repository."
  echo "Use \"$0 [xsitools-repo-directory]\""
  exit 1
fi

if [ "$2" != "--delete" ]
  then
  echo "Use \"$0 [xsitools-repo-directory] --delete\" to remove unused blocks (be careful)"
  else
  echo "\"--delete\" is set, will remove unused blocks."
fi

echo "Collecting hashes of all .vmdk files."
cat `find $1/ -path data -prune -o -name *.vmdk` | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes

echo "Sorting and removing duplicate hashes." 
sort $hashes | awk '!a[$0]++' > $hashes_sorted
hashes_count=`cat $hashes_sorted | wc -l`

echo "Generating list of files in ./data."
find $1/data -type f -exec basename {} \; > $files

echo "Sorting list of files."
sort $files > $files_sorted
files_count=`cat $files_sorted | wc -l`

echo "hashes in vmdks: $hashes_count"
echo "files: $files_count"

# some checks if everything is valid
echo "Using diff for comparing .vmdk-hashes with filenames in ./data."
diff $hashes_sorted $files_sorted -U 0 > $diff_output

if [ $? -eq 0 ];
  then
  echo "No unused files found. Every hash in the .vmdk files"
  echo "has a proper file in data-directory. Good."
  else
  echo "Checking if hashes in .vmdk files have a file in the data-directory."
  grep "^-[a-f0-9]" $diff_output | sed 's/^.//' > $missing_files
  if [ `cat $missing_files | wc -l` -eq 0 ];
    then
    echo "Every hash contained in the .vmdk files has a proper file in data-directory. Good."
    grep "^+[a-f0-9]" $diff_output | sed 's/^.//' > $delete_candidates
    unused_count=`cat $delete_candidates | wc -l`
    echo "There are $unused_count unused files in ./data:"
    cat $delete_candidates
    else
    echo "The following `cat $missing_files | wc -l` data files are missing:"
    cat $missing_files
    echo "Repository is damaged. Leaving everything untouched. Exiting."
    echo "End: `date`"
    exit 1
  fi
fi

if [ "$2" == "--delete" ]
  then
  cat $delete_candidates | while read file
    do
    # the first three characters of the hash name the nested directories
    rmpath="$1/data/`echo $file | cut -c1`/`echo $file | cut -c2`/`echo $file | cut -c3`/$file"
    echo "Deleting $rmpath"
    rm -rf $rmpath
    done;
  echo "Removing empty directories."
  # Busybox find doesnt know -empty.
  find $1/data -type d -depth -exec rmdir -p --ignore-fail-on-non-empty {} \;
  echo "Updating Bcnt in .xsitools-file:"
  bcnt=`grep Bcnt $1/.xsitools | awk -F ': ' '{print $2}'`
  echo "Old value of Bcnt: $bcnt."
  echo "Setting actual number of files ($hashes_count) as new value of Bcnt."
  # only touch the Bcnt line, so the same digits elsewhere in the file stay intact
  sed -i -e "/Bcnt/s/$bcnt/$hashes_count/" $1/.xsitools
fi

echo "End: `date`"
} 2>&1 | tee -a $0-`date +"%Y-%m-%d"`.log

Offline

#6 2018-02-01 11:46:47

rs
Member
Registered: 2018-01-31
Posts: 5

Re: Space usage with xsitools

Hello,

thank you for your suggestions. I wrote a script that does the following:

- It collects the hashes of the vmdk-files, sorts and removes duplicates.
- It generates a list of the files in data-directory and sorts it.
- With this information it feeds "diff" to find orphaned and (hopefully not) missing files.
  - If no orphaned or missing files are found: log, exit.
  - If there are missing files: log, exit.
  - If there are orphaned files: log, exit.

I think the script still needs some testing. Deleting the wrong files could be disastrous.
I would be grateful for any suggestions.

Anyway, if option "--delete" is set:

- Delete the orphaned files
- Remove any directories left empty in the data folder
- Update Bcnt-value in .xsitools-file
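
The comparison step described above can also be sketched with comm instead of diff. This is a minimal sketch under assumptions: classify_blocks is a hypothetical helper name, comm may not be available in the ESXi shell, and the demo uses short made-up names instead of real 40-character hashes.

```shell
#!/bin/sh
# classify_blocks HASHES FILES
#   HASHES: file with one referenced hash per line (from the .vmdk hash tables)
#   FILES:  file with one block file name per line (from the data directory)
# Prints "orphaned <name>" and "missing <name>" lines.
classify_blocks() {
  tmp=$(mktemp -d) || return 1
  sort -u "$1" > "$tmp/referenced"
  sort -u "$2" > "$tmp/present"
  # Lines only in the second list: block files that nothing references.
  comm -13 "$tmp/referenced" "$tmp/present" | sed 's/^/orphaned /'
  # Lines only in the first list: referenced blocks with no file on disk.
  comm -23 "$tmp/referenced" "$tmp/present" | sed 's/^/missing /'
  rm -rf "$tmp"
}

# Tiny demonstration with made-up names.
demo=$(mktemp -d)
printf '%s\n' aaa bbb ccc bbb > "$demo/hashes"
printf '%s\n' bbb ccc ddd > "$demo/files"
classify_blocks "$demo/hashes" "$demo/files"
rm -rf "$demo"
```

Any "missing" line means the repository references a block that is not on disk, which is the case where nothing should be deleted.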

Offline

#7 2018-02-01 14:51:35

marcoi
Member
Registered: 2017-04-24
Posts: 8

Re: Space usage with xsitools

As an alternative, you can use a weekly qualifier in the storage path and then remove the weeks you no longer need to keep as backups.
So you would have:
WEEKLY
-->01
----->data
----->01012018
--------->vmBackup1
----->01022018
--------->vmBackup1

-->02
----->data
----->01102018
--------->vmBackup1
----->01112018
--------->vmBackup1

ETC.
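
A weekly path like the layout above could be generated as follows. This is only a sketch: weekly_backup_point is a hypothetical helper name, the base path is an example, and it assumes the date command on the host understands %V (the ISO 8601 week number).

```shell
#!/bin/sh
# Build a per-week repository path under a base directory.
# %V is the ISO 8601 week number (01-53); support depends on the platform's date.
# weekly_backup_point is a hypothetical helper, not part of XSIBackup.
weekly_backup_point() {
  echo "$1/WEEKLY/$(date +%V)"
}

weekly_backup_point "/vmfs/volumes/backup3"
```

The same expression can be used inline, for example --backup-point="/vmfs/volumes/backup3/WEEKLY/$(date +%V)". Week numbers repeat every year, so a week folder has to be removed before it comes around again, or the year has to be added to the path.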

Offline

#8 2018-02-01 18:12:08

admin
Administrator
Registered: 2017-04-21
Posts: 563

Re: Space usage with xsitools

Dear Sirs:

You can post add-ons, just as long as they don't imply modifying and redistributing XSIBACKUP-PRO's code. Nevertheless, we don't support any kind of third party tool and will not give support for it.

Thank you for your understanding.

Offline

#9 2018-02-06 09:52:32

sistemi
Member
Registered: 2017-08-29
Posts: 53

Re: Space usage with xsitools

Hi admin, may we use this thread to follow the development of that promising script? I just want to report a bug in it: .vmdk file names containing spaces cause the script to crash. Thanks.

Offline

#10 2018-02-06 10:23:40

admin
Administrator
Registered: 2017-04-21
Posts: 563

Re: Space usage with xsitools

You can do it, but we'll add support for that in some future version. When that moment comes, this script will be removed, and in the meantime you will have to support its users yourself.

So please, this goes for anybody using this or other third party scripts: do not contact support to ask questions about this or any other third party script, whether you are a registered user or not.
WE DO NOT OFFER SUPPORT FOR THIS SCRIPT AND OTHER THIRD PARTY TOOLS.

Regards

Offline

#11 2018-02-06 10:54:56

sistemi
Member
Registered: 2017-08-29
Posts: 53

Re: Space usage with xsitools

That's clear, thank you!

Offline

#12 2018-02-06 21:48:03

rs
Member
Registered: 2018-01-31
Posts: 5

Re: Space usage with xsitools

The script is a temporary solution. I think everyone would prefer a similar function in xsibackup-pro. See it as a collection of ideas that could possibly be used and/or modified for integration into xsibackup.

Changing the line

cat `find $1/ -path data -prune -o -name *.vmdk` | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes

to

find $1/ -path data -prune -o -name *.vmdk | while read LINE; do cat "$LINE" ; done | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes

also handles filenames with spaces.

This is not very efficient, because cat is called once per .vmdk file. But it works. Better ideas are welcome.
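
One possible way to avoid calling cat once per file is to let find batch the file names itself. This is a sketch, assuming the find on the host supports "-exec ... {} +" (older BusyBox builds may not); cat_hash_tables is a hypothetical helper name and the directory tree in the demo is made up.

```shell
#!/bin/sh
# Print the contents of every .vmdk file below a repo, skipping its data folder.
# "-exec cat {} +" passes many file names to each cat call, and find hands
# them over as separate arguments, so spaces in names are harmless.
# cat_hash_tables is a hypothetical helper, not part of XSITools.
cat_hash_tables() {
  root=${1%/}   # drop a trailing slash so the -path test below can match
  find "$root" -path "$root/data" -prune -o -type f -name '*.vmdk' -exec cat {} +
}

# Demonstration on a throwaway directory tree (all names are made up).
demo_repo=$(mktemp -d)
mkdir -p "$demo_repo/20180206/my vm" "$demo_repo/data/a"
echo "hash-table-line" > "$demo_repo/20180206/my vm/disk one.vmdk"
echo "block-content"   > "$demo_repo/data/a/ignored.vmdk"
cat_hash_tables "$demo_repo"
rm -rf "$demo_repo"
```

The output would still be piped into the same grep that extracts the hashes.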

I'm testing an updated version of the script, which I can post when it's working as I expect.

Offline

#13 2018-02-07 10:40:30

admin
Administrator
Registered: 2017-04-21
Posts: 563

Re: Space usage with xsitools

I think you should not get too obsessed with this feature. If we have not added it so far, it is for a number of reasons:

1 - Purging orphaned blocks would be time consuming
2 - Deleting the wrong ones would be disastrous
3 - The space you are going to save by purging a well populated repo is insignificant, as most blocks will be shared

So, in the end, the CPU time consumed compared with the expected results makes it a very poor feature in terms of cost effectiveness.

Offline

#14 2018-04-28 14:04:52

wowbagger
Member
Registered: 2017-05-11
Posts: 14

Re: Space usage with xsitools

I've got about 40-60 GB of daily data added to the data directory, so this script comes in handy. It's been running for a while, thinning out the backup tail, but recently it started reporting errors, while xsibackup did not report any errors on full repository checks.

Whenever XSIBackup/XSITools backs up a server that has a snapshot it stores the actual binary vmdk file in the VM folder and not in data. This snapshot .vmdk file does not have the usual xsitools vmdk file block ID list layout so there is a possibility the grep regexp will match an ID in this file. Best case is the specific id does not exist in the data folder and it throws an integrity error, worst case it would delete a possible valid block from data. I came across this situation on a snapshotted pfSense that includes pfBlockerNG ip & dns spam/blacklists.
In the snapshot vmdk file there are spam url lists in cleartext exactly like this:

http://i-removed-the-hostname/169c13b4f0ce92d5e5740c354187bbaf/790d8

The 169c13b4f0ce92d5e5740c354187bbaf part is matched by the regexp, resulting in the reported error: the script thinks it's a valid block from a vmdk, but it's not. Adding a caret makes the regexp match only at the beginning of a line. After this the reported errors were gone.

From:

grep -o '\b[0-9a-f]\{40\}\+\b' 

To:

grep -o '^\b[0-9a-f]\{40\}\+\b' 

These pfBlockerNG dns/ip spam/blacklists are generated/refreshed hourly in raw tmp slices hence why they show up in cleartext in the vmdk file I think.
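
The effect of the caret can be reproduced on a small sample. This is a sketch with made-up data and a simplified form of the pattern (without the trailing \+); it assumes a grep that supports -o and \b, such as GNU grep.

```shell
#!/bin/sh
# Two made-up lines: a hash at the start of a line (the hash-table layout
# assumed by the script) and a URL that merely contains a 40-character hex string.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0123456789abcdef0123456789abcdef01234567
http://example.invalid/89abcdef0123456789abcdef0123456789abcdef/790d8
EOF

# Unanchored: both hex strings are extracted.
unanchored=$(( $(grep -o '\b[0-9a-f]\{40\}\b' "$sample" | wc -l) ))
# Anchored with a caret: only the string that starts a line is extracted.
anchored=$(( $(grep -o '^[0-9a-f]\{40\}\b' "$sample" | wc -l) ))

echo "unanchored matches: $unanchored"
echo "anchored matches:   $anchored"
rm -f "$sample"
```

The caret does not protect against binary data that happens to start a line with 40 hex characters, so excluding snapshot files altogether is still the safer option.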

Offline

#15 2018-05-01 15:21:33

wowbagger
Member
Registered: 2017-05-11
Posts: 14

Re: Space usage with xsitools

This

find $1/ -path $1/data -prune -o -name *.vmdk | grep -v $1/data | while read LINE; do cat "$LINE" ; done | grep -o '^\b[0-9a-f]\{40\}\+\b' > $hashes

generates the hashes a lot faster. The previous line still descended into data for some reason.
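
A likely reason, shown as a sketch on a made-up directory tree: find compares -path against the whole path it is visiting, so a bare "data" never matches, while the full "$repo/data" does.

```shell
#!/bin/sh
# Throwaway tree: one hash table in a backup folder, one file inside data.
repo=$(mktemp -d)
mkdir -p "$repo/20180501/vm" "$repo/data/a"
: > "$repo/20180501/vm/disk.vmdk"
: > "$repo/data/a/block.vmdk"

# "-path data" is compared with the whole path (for example /tmp/x/data),
# so it never matches and nothing is pruned.
not_pruned=$(( $(find "$repo" -path data -prune -o -name '*.vmdk' -print | wc -l) ))

# With the full path the data directory is pruned and never descended into.
pruned=$(( $(find "$repo" -path "$repo/data" -prune -o -name '*.vmdk' -print | wc -l) ))

echo "matches with -path data:       $not_pruned"
echo "matches with -path \$repo/data: $pruned"
rm -rf "$repo"
```

Without an explicit -print, find also prints the pruned data directory itself, which is why the extra grep -v is still useful in the one-liner above.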

Offline

#16 2018-06-07 12:48:19

rs
Member
Registered: 2018-01-31
Posts: 5

Re: Space usage with xsitools

Hi,

thank you for your suggestions. I incorporated them into the current version of the script that I'm using. To be doubly sure, I exclude delta.vmdk and sesparse.vmdk files, since a snapshot can still contain lines that begin with a 40-character hex pattern.

#!/bin/sh

# Check for inconsistencies in xsitools-repository
# Find and delete unused files
# Prune old backups

{
echo "Begin: `date`"
# usage
if [ -e $1/.xsitools ]
  then
  echo "$1 seems to be an xsitools-Repository, using it."
  else
  echo "$1 doesn't seem to be an xsitools-Repository."
  echo "Use \"$0 [xsitools-repo-directory]\""
  exit 1
fi

if [ "$2" != "--delete" ]
  then
  echo "Use \"$0 [xsitools-repo-directory] [--delete]\" to remove unused files (be careful)."
  else
  echo "\"--delete\" is set, will remove unused files."
fi

if echo $3 | egrep -q '^[0-9]+$';
  then
  echo "Searching for backup-folders older than $3 days."
  bkpfolders=`find $1 -type d -maxdepth 1 -regex ".*/[0-9\-]\{14\}" -mtime +$3`
  if [ ! -z "$bkpfolders" ]
    then
    echo "$bkpfolders found, deleting"
    rm -rf $bkpfolders
    else
    echo "No backup-folders found."
  fi
  else
  echo "3rd option can be a number: Delete backup-folders older than ... days."
  echo "You can use this to prune older backups (be careful)."
fi

# Temporary files and variables
temp_dir=`mktemp -d -t`
hashes="$temp_dir/hashes"
hashes_sorted="$temp_dir/hashes_sorted"
files="$temp_dir/files"
files_sorted="$temp_dir/files_sorted"
delete_candidates="$temp_dir/delete_candidates"
missing_files="$temp_dir/missing_files"
diff_output="$temp_dir/diff_output"
hashes_count=0
files_count=0

echo "Collecting hashes of all .vmdk files."
# my old version to exclude delta files:
# find $1/ -path data -prune -o -name *.vmdk -maxdepth 3 | grep -v '\delta.vmdk$' | grep -v '\sesparse.vmdk$' | while read line; do cat "$line" ; done | grep -o '\b[0-9a-f]\{40\}\+\b' > $hashes
# while-loop inserted for handling filenames with spaces, exclude delta files (snapshots), faster search (thanks to wowbagger)
# no backslash before the letters in the patterns: "\s" would mean whitespace in GNU grep
find $1/ -path $1/data -prune -o -name '*.vmdk' | grep -v 'delta\.vmdk$' | grep -v 'sesparse\.vmdk$' | grep -v $1/data | while read LINE; do cat "$LINE" ; done | grep -o '^\b[0-9a-f]\{40\}\+\b' > $hashes
echo "Sorting hashes and removing duplicates."
sort $hashes | awk '!a[$0]++' > $hashes_sorted
hashes_count=`cat $hashes_sorted | wc -l`
echo "Hashes in vmdks: $hashes_count"

echo "Generating list of files in ./data."
# find $1/data -type f -exec basename {} \; > $files
ls -1R $1/data | grep -o '\b[0-9a-f]\{40\}\+\b' > $files

echo "Sorting list of files."
sort $files > $files_sorted
files_count=`cat $files_sorted | wc -l`

echo "Files: $files_count"

# some checks if everything is valid
echo "Using diff for comparing .vmdk-hashes with filenames in ./data."
diff $hashes_sorted $files_sorted -U 0 > $diff_output

if [ $? -eq 0 ];
  then
  echo "No unused files found. Every hash in the .vmdk files"
  echo "has a proper file in data-directory. Good."
  echo "Removing temporary files."
  rm -rf "$temp_dir"
  echo "End: `date`"
  exit 0
  else
  echo "Checking if hashes in .vmdk files have a file in the data-directory."
  grep "^-[a-f0-9]" $diff_output | sed 's/^.//' > $missing_files
  if [ `cat $missing_files | wc -l` -eq 0 ];
    then
    echo "Every hash contained in the .vmdk files has a proper file in data-directory. Good."
    grep "^+[a-f0-9]" $diff_output | sed 's/^.//' > $delete_candidates
    unused_count=`cat $delete_candidates | wc -l`
    echo "There are $unused_count unused files in ./data:"
    if [ "$2" != "--delete" ];
      then
      cat $delete_candidates
    fi
    else
    echo "The following `cat $missing_files | wc -l` data files are missing:"
    cat $missing_files
    echo "Repository is damaged. Leaving everything untouched. Exiting."
    echo "Removing temporary files."
    rm -rf "$temp_dir"
    echo "End: `date`"
    exit 1
  fi
fi

if [ "$2" == "--delete" ]
  then
  echo "Counting space used of $1/data."
  echo "Repo-size before pruning: `du $1/data/ -h -s | awk '{print $1;}'`"
  cat $delete_candidates | while read file
    do
    # the first three characters of the hash name the nested directories
    rmpath="$1/data/`echo $file | cut -c1`/`echo $file | cut -c2`/`echo $file | cut -c3`/$file"
    echo "Deleting $rmpath"
    rm -rf $rmpath
    done;
  echo "Counting space used of $1/data."
  echo "Repo-size after pruning: `du $1/data/ -h -s | awk '{print $1;}'`"
  echo "Removing empty directories."
  # Busybox find doesnt know -empty.
  find $1/data -type d -depth -exec rmdir -p --ignore-fail-on-non-empty {} \;
  echo "Counting files in data-directory again."
  # no sort needed here
  # find $1/data -type f -exec basename {} \; > $files
  ls -1R $1/data | grep -o '\b[0-9a-f]\{40\}\+\b' > $files
  files_count=`cat $files | wc -l` 
  if [ $files_count == $hashes_count ]
    then
    echo "Number of files and hashes ($files_count) are same, everything went right."
    else
    echo "Number of files ($files_count) and hashes ($hashes_count) are different."
    echo "Perhaps not every file could be deleted. Check it using the logfile."
    echo "End: `date`"
    echo "Removing temporary files."
    rm -rf "$temp_dir"
    exit 1
  fi
  echo "Updating Bcnt in .xsitools-file:"
  bcnt=`grep Bcnt $1/.xsitools | awk -F ': ' '{print $2}'`
  echo "Old value of Bcnt: $bcnt."
  echo "Setting actual number of files ($files_count) as new value of Bcnt."
  # only touch the Bcnt line, so the same digits elsewhere in the file stay intact
  sed -i -e "/Bcnt/s/$bcnt/$files_count/" $1/.xsitools
fi

echo "Removing temporary files."
rm -rf "$temp_dir"
echo "End: `date`"
} 2>&1 | tee -a $0-`date +"%Y-%m-%d"`.log
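
The exclusion of snapshot disks can be tried in isolation. This is a sketch with made-up file names; drop_snapshot_disks is a hypothetical helper name. The patterns escape the dot and put no backslash before the letters, because in GNU grep "\s" stands for whitespace.

```shell
#!/bin/sh
# Drop snapshot disks from a list of .vmdk paths read on standard input.
# drop_snapshot_disks is a hypothetical helper, not part of the script above.
drop_snapshot_disks() {
  grep -v -e 'delta\.vmdk$' -e 'sesparse\.vmdk$'
}

# Made-up file names for illustration.
printf '%s\n' \
  "20180607/vm/disk-flat.vmdk" \
  "20180607/vm/disk-000001-delta.vmdk" \
  "20180607/vm/disk-000001-sesparse.vmdk" | drop_snapshot_disks
```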

Last edited by rs (2018-06-07 13:01:31)

Offline

#17 2018-06-07 13:51:07

rs
Member
Registered: 2018-01-31
Posts: 5

Re: Space usage with xsitools

To speed up the script you can comment out these lines:

echo "Repo-size before pruning: `du $1/data/ -h -s | awk '{print $1;}'`"

and

echo "Repo-size after pruning: `du $1/data/ -h -s | awk '{print $1;}'`"

These take long to execute and are only informational.

Offline

#18 2018-11-28 10:17:03

lievenmoors
Member
Registered: 2018-11-21
Posts: 3

Re: Space usage with xsitools

I was testing the script to clean up after a backup and
the script wrongly assumed certain blocks were available.

The reason seemed to be that I had files like $hash.tmp and $hash.rm
in the repository. I guess those files were leftovers from a failed backup.
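
One way to keep such leftovers out of the file list is to accept only names that consist of exactly 40 hex characters. This is a sketch on a made-up directory; list_block_files is a hypothetical helper name, and whether the .tmp and .rm files can safely be removed is not something it decides.

```shell
#!/bin/sh
# List block files in a data directory whose name is exactly 40 hex characters.
# "grep -x" requires the whole line to match, so names such as
# <hash>.tmp or <hash>.rm are ignored.
# list_block_files is a hypothetical helper, not part of the script above.
list_block_files() {
  find "$1" -type f | sed 's|.*/||' | grep -x '[0-9a-f]\{40\}'
}

# Demonstration with one real-looking block and two leftovers (made-up names).
data=$(mktemp -d)
h=0123456789abcdef0123456789abcdef01234567
: > "$data/$h"
: > "$data/$h.tmp"
: > "$data/$h.rm"
list_block_files "$data"
rm -rf "$data"
```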

For the rest it seems to work really well. Thanks!

----

I would also like to mention to the xsi developers that the functionality of
this script is not a luxury when you have limited backup space.

If you want to back up a VM of 1.5T on a disk of 2.9T, chances are that
you will be able to store quite a number of "snapshots" with xsitools, but you
wouldn't be able to initialize a new repository without deleting the old,
and be without any snapshots for the duration of the backup.

In this case it would be a lot better to manage the space within one repository,
and check your repository from time to time.

And of course the best would be if xsibackup would be able to predict the space
that will be needed before doing the backup. But maybe that would double the backup time.

lieven

Offline

#19 2018-11-29 16:25:30

admin
Administrator
Registered: 2017-04-21
Posts: 563

Re: Space usage with xsitools

First of all, just in case you didn't see our post above:

WE DO NOT OFFER SUPPORT FOR THIS SCRIPT AND OTHER THIRD PARTY TOOLS

XSIBackup users can exchange ideas and code of their own in this forum, up to a point that we decide.
Please do not take those capital letters as shouting, but as an official statement on the part of 33hops.com: you use third party code at your own risk.

Please read the rest of the post to at least get our opinion regarding purging XSITools repositories.

The "luxury" XSITools offers is measured in compression ratio. If you have a 2.9T disk and intend to back up a 1.5T disk to it and keep two repositories, then you have an arithmetic problem. Knowing a disk's content in advance necessarily implies reading the disk, so you answered your own question. It would not take twice as long, but at least some 50% longer.

Offline

#20 2018-12-04 09:20:18

lievenmoors
Member
Registered: 2018-11-21
Posts: 3

Re: Space usage with xsitools

In my case, I cannot afford two repositories because I don't have that much space on my backup disks.
The reason is that I chose a combination of data backups and VM backups.

But that doesn't mean that I don't want to keep more than one snapshot. Because of data deduplication, I can keep
e.g. 5 daily snapshots without needing 5 x the space. I can be relatively sure that it will fit, because I can monitor the
system, take an average of how much the system grows in one day, and add plenty of headroom to that.
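
That estimate can be written down as a small calculation. This is a sketch with illustrative numbers only (1400 GB free, 50 GB of new blocks per day, 20% of the free space kept in reserve); days_of_headroom is a hypothetical helper name.

```shell
#!/bin/sh
# days_of_headroom FREE_GB DAILY_GROWTH_GB SAFETY_PERCENT
# Integer estimate of how many daily backups still fit when only
# (100 - SAFETY_PERCENT)% of the free space is considered usable.
# days_of_headroom is a hypothetical helper; all inputs are whole numbers.
days_of_headroom() {
  echo $(( $1 * (100 - $3) / 100 / $2 ))
}

# Illustrative numbers only, not measurements from a real repository.
days_of_headroom 1400 50 20
```

With these inputs the usable space is 1120 GB, which at 50 GB per day gives 22 whole days.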

So I would like to plead that there is a use case for this functionality. And I wouldn't mind if the backup would
take 50% more time, if I would be sure it would fit.

I have read your arguments above. But I'm not sure I agree with all of them.

> 1 - Purging orphaned blocks would be time consuming

When I take a backup of my VM with XSItools, it takes roughly 8 hours,
even if most of the data is already there. Purging the orphaned blocks with this script
takes maybe half an hour. Of course with smaller block sizes, it might take longer.

> 2 - Deleting the wrong ones would be disastrous

Is it hard to make sure you don't?

> 3 - The space you are going to save by purging a well populated repo is insignificant, as most blocks will be shared

The space used only depends on how long you keep using the same repository.

Right now, the only way to make space is deleting every data block in the repository,
and start completely from scratch. Wouldn't it be at least possible to move the blocks
from one repo to the other? If you are worried about bad blocks, you can always check
the repository.

Offline

#21 2018-12-04 10:28:29

lievenmoors
Member
Registered: 2018-11-21
Posts: 3

Re: Space usage with xsitools

By the way,

have you guys heard of casync: https://github.com/systemd/casync
Wouldn't that be a perfect fit for xsitools? The software is LGPL2 licensed.

Or you might be using it already...

greetings

Offline

#22 2018-12-04 17:26:42

admin
Administrator
Registered: 2017-04-21
Posts: 563

Re: Space usage with xsitools

As XSIBackup and XSITools evolve, rotation will be achieved by pruning instead of by creating a new repository. Nevertheless, understanding what deduplication is necessarily involves being aware of the risks you assume and covering those risks appropriately, so some sort of redundancy is desirable, whether by keeping a historic archive or by cloning the repository itself.

Keeping some snapshots in your production VM is a good technique to achieve versioning while still using regular backup methods. The only drawback I can find is that 5 snapshots will limit performance, maybe not too much; that will mainly depend on your particular circumstances: hardware, load, concurrent users, etc.

On the other hand, snapshots are unique pieces of data; they can't be deduplicated, at least with a big block size. Even a small block size would not yield better results, due to the nature of what a snapshot is. Obviously that will depend on the type of data you store in them. If we are talking about a server in a notary office where snapshots are made up of thousands of Word documents, then maybe you would be able to achieve some sort of positive result with a small block size. Still, that's not very common, and snapshots tend to be unique in the vast majority of cases.

Thank you for the feedback, we'll take a look at it, although we are constrained by ESXi, so any piece of software that is not pure C has little chance of running on it.

Offline

#23 2018-12-05 08:29:57

sistemi
Member
Registered: 2017-08-29
Posts: 53

Re: Space usage with xsitools

In case it could enhance the overall process, I found an alternative to OpenSSL hashing that is very fast. I know that the bottleneck is the storage, but in some scenarios it could speed up the block checks. I tested it on ESXi 5 and 6 and it works; it has to be compiled statically on Linux.

project: http://cyan4973.github.io/xxHash/
https://github.com/Cyan4973/xxHash

to compile I followed this guide: http://www.kioptrix.com/blog/a-few-nice … -binaries/

Offline

#24 2018-12-05 12:47:50

admin
Administrator
Registered: 2017-04-21
Posts: 563

Re: Space usage with xsitools

We already use OpenSSL for hashing, thank you for the feedback.

Offline
