33HOPS, IT Consultants (33HOPS, Sistemas de Informacion y Redes, S.L.)


Part III out of IV - < Goto part II >

Tutorial: building a lightweight Linux deduplication appliance

As we have seen in the previous chapters of this set of posts dedicated to deduplication, there are a number of Linux and Windows systems that we can use to take advantage of block level deduplication. In this post we'll see how to build a deduplication appliance that can serve as a datastore for our daily backups with XSIBackup. With such an appliance we could store tens or even hundreds of backups in the same space where we can now only fit a few backup folders. Such a device would let us move back in time through our set of backups to recover, for instance, a file lost three months ago.
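To get a feeling for why block level deduplication saves so much space, here is a rough sketch. It is only an illustration and not how Lessfs works internally: the function name dedup_stats, the fixed block size and the use of sha1sum are our own assumptions for the example. It splits every file into fixed-size blocks, hashes each block and counts how many distinct blocks there really are.

```shell
# Illustration only: count total vs. distinct fixed-size blocks in a set of files.
# Usage: dedup_stats BLOCK_SIZE FILE...
dedup_stats() {
    block_size=$1
    shift
    tmp=$(mktemp -d) || return 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # cut each file into blocks named <n>.aaaaaa, <n>.aaaaab, ...
        split -a 6 -b "$block_size" "$f" "$tmp/$n."
    done
    # every block file counts towards the total
    total=$(ls "$tmp" | wc -l | tr -d ' ')
    # identical blocks share the same hash, so distinct hashes = distinct blocks
    unique=$(sha1sum "$tmp"/* | awk '{print $1}' | sort -u | wc -l | tr -d ' ')
    rm -rf "$tmp"
    echo "total=$total unique=$unique"
}
```

Two 12 byte "backups" that differ only in their last 4 byte block give total=6 unique=4 when called as dedup_stats 4 day1 day2: a deduplicated store would only need to keep four blocks instead of six. Daily backups of the same VM usually share far more than that.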

If you read the first post, you know we mentioned some comments from Linus Torvalds in which he declared his skepticism about filesystems built on top of FUSE, mainly because they run in userspace memory. In our case we only need a storage FS where we can fit our daily backups, so that later on we can browse through them and pick whatever we want to restore. So, we don't need much, just as long as we can read and write to our system at decent rates and restore data effectively. We do not care if we can't connect 100 users to read and write at the same time; all we need is an efficient deduplicated storage device that works with modest resources.

The set of tools that we have chosen (after thorough testing) is:

- CentOS 6.7 as the base O.S.
- Lessfs as the deduplicated filesystem.
- NFS as the transport protocol.

In this chapter of the tutorial we'll start with a clean CentOS 6.7 installation from a minimal installation CD. We must make sure that we only install what we need for our appliance; all unnecessary devices should also be removed from the VM, such as floppy, USB, audio, etc. For this post we have used a 1 TB hard disk with the default partition layout and an ext4 FS.

[Figure: minimum hardware required]

We'll use PuTTY to connect to our newly installed CentOS 6.7 O.S. Once connected, we'll remove all unnecessary software and services, starting with SELinux:

# vi /etc/selinux/config => set it to => SELINUX=disabled
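If you prefer not to edit the file by hand, the same change can be sketched as a small helper. The function name set_selinux_disabled and the .bak backup copy are our own additions, not part of the original procedure; it relies on GNU sed, which is what CentOS ships.

```shell
# Set SELINUX=disabled in an SELinux config file.
# The path defaults to /etc/selinux/config; the helper name is hypothetical.
set_selinux_disabled() {
    cfg=${1:-/etc/selinux/config}
    # keep a backup copy before touching the file
    cp -p "$cfg" "$cfg.bak" || return 1
    # rewrite only the active SELINUX= line; SELINUXTYPE= and comments are left alone
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Example (run as root):
#   set_selinux_disabled
```

The new setting is read at boot, so it only takes effect after restarting the server.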

Next, we'll remove all unneeded services:

- crond
- iptables
- ip6tables
- iscsi
- iscsid
- postfix
- rdisc
- restorecond
- rsyslog
- saslauthd

[Figure: CentOS services to be removed]

It's up to you to decide whether you want to keep some of these services, like the iptables firewall, the iscsi daemons, etc. In any case, we just want a lightweight storage appliance for our example, so we'll remove anything that doesn't directly serve our purposes. You can use the code below to disable the services listed above. Note that some are removed and others are just disabled, depending on whether they could be useful later on.
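As a minimal sketch, assuming the standard CentOS 6 service and chkconfig commands, the list above can be processed in a loop. This version only stops and disables every service; it does not uninstall any package, so decide for yourself which ones you would rather remove with yum. The names SERVICES, DRY_RUN and disable_services are our own.

```shell
# Services listed in this tutorial as not needed on the appliance.
SERVICES="crond iptables ip6tables iscsi iscsid postfix rdisc restorecond rsyslog saslauthd"

# Stop each service now and keep it from starting at boot.
# With DRY_RUN=1 the commands are only printed, nothing is changed.
disable_services() {
    for svc in $SERVICES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc stop"
            echo "chkconfig $svc off"
        else
            service "$svc" stop
            chkconfig "$svc" off
        fi
    done
}

# Example (run as root):
#   DRY_RUN=1; disable_services    # preview
#   DRY_RUN=0; disable_services    # apply
```

Running it first with DRY_RUN=1 lets you review the exact commands before anything is stopped.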

After copying and pasting the above code into your PuTTY window you'll end up with a minimal set of services running on your CentOS 6.7 install.

[Figure: top command showing memory consumption below 100 MB]

Great! Now we have the base for our system, but we still need to install some software:

- First we will install open-vm-tools, a package that provides the same functionality as VMware Tools but is open source, and also well tested, so don't worry.

- We will also need to install NFS-Ganesha, an NFS server that runs in userspace; this will ensure compatibility with our userspace filesystem Lessfs.

- And also mhash, a library that will provide Lessfs with different hashing algorithms.

- And wget, a neat small tool for downloading files.

To install open-vm-tools and NFS-Ganesha we first need to install the EPEL repo in our CentOS server, so:

# yum install epel-release && \
yum install open-vm-tools nfs-ganesha mhash wget

On top of that we must install the FUSE libraries and also TokyoCabinet. Although we will not be using it as our database system, Lessfs needs it as a dependency. So:

# yum install tokyocabinet fuse-libs

Now we can start with the specific software and services that will provide us with the deduplication functionality. We will use Lessfs 1.7.0 available here for download:

Lessfs is a FUSE FS. It is probably not the most award-winning deduplication FS out there; maybe, as Linus Torvalds commented, it is a "toy" in comparison to well known "built from the base" FSs like ext(2|3|4), ntfs, reiserfs, etc. But it does what we need, it does it reasonably well, and it does it with a limited set of resources. So, we'll use it because it satisfies our needs as the base FS for a deduplicated backup device. We don't care if it does not support many concurrent users or if we need to tweak the startup script ourselves.

Apart from the Lessfs binaries, we will need a key/value database to store all deduplicated blocks and their hashes. Lessfs can use various databases: Tokyo Cabinet, HamsterDB or BerkeleyDB. We will use the last one; it's not the fastest, but it is the safest. Tokyo Cabinet is really fast, but it's not very reliable in case of power outages. You can read more details about these facts here:

To be tidy, we would need two CentOS 6.7 installations: one with all the development tools installed, to compile all the needed software, and the other to be used as our production OS. For the sake of simplicity I'll provide the compiled Lessfs binaries and startup script for a CentOS 6.7 OS. You can use the command below to download the needed binaries.

The last four commands will install the Lessfs binaries and the service startup script in their final location, so don't worry, they will already be where they need to be. Now we need the BerkeleyDB binaries. We'll be using version 4.8. The following command will download the BerkeleyDB binaries compiled for CentOS 6.7.

We must make sure that FUSE is loaded at boot, so we add this line to /etc/rc.d/rc.local:

echo "modprobe fuse > /dev/null 2>&1" >> /etc/rc.d/rc.local
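Running the echo above more than once would add the same line to rc.local several times. A small sketch that avoids this, assuming you are happy to use a helper of our own called append_once, could look as follows.

```shell
# Append LINE to FILE only if FILE does not already contain that exact line.
# The helper name append_once is hypothetical, not part of any package.
append_once() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats the text literally (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root):
#   append_once 'modprobe fuse > /dev/null 2>&1' /etc/rc.d/rc.local
```

After the next reboot, lsmod | grep fuse should show whether the module was loaded.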

And that's it! We have all the packages installed and we are ready to learn how to use our datastore.


This page was last modified on 2021-01-12
