Breaking through the backup barrier
Smooth Operator
BackupPC handles backups over the network for a range of platforms. Find out more about this user-friendly, configurable, high-performance open source backup system.
Network backup platforms are often unwieldy, partly because of the complexities of scheduling logic and media management. User friendliness can be hard to find in an enterprise-ready backup system. The BackupPC project [1] fills the backup niche elegantly, handling backups over the network for a plethora of platforms and transports.
BackupPC follows the Unix tradition of small programs that perform a single task very well. Like other classic Unix utilities, BackupPC leverages the power of other applications instead of trying to reinvent the wheel. BackupPC supports several protocols for both Windows and Unix-like clients – from rsync and SMB/CIFS, to tar and rsync tunneling over SSH. The focus is on efficient scheduling and a user-friendly restore process.
BackupPC has an active user community with mailing lists and a user-generated wiki, and the project is still led by the original primary author, Craig Barratt. Although the tool has been around since 2001 and is relatively mature, the latest version – BackupPC 3.1.0 – seems to be reaching new users.
Benefits
One of the defining features of BackupPC is data de-duplication. In a traditional backup system, having multiple backups of files that haven't changed in more than one full backup interval requires storing the same information more than once. The problem is only compounded when you back up multiple computers – particularly if they are end-user machines that might be on the same circulation list for memos, spreadsheets, and other common documents. BackupPC addresses this problem with a two-tier check. The first check locates files with the same names and hashes the files to see whether they are identical. If it determines that the files are the same, it moves a single copy of the file to the "pool" and creates hard links to each instance of the file in the backup set. The results are surprising: In the first test I ran on eight machines (performing uncompressed backups and retaining two full backups and six incrementals), my total data store was ~1TB, but BackupPC's data de-duplication brought the actual size on disk to ~675GB.
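The pooling idea can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not BackupPC's actual code: the function name pool_file and the flat pool layout are hypothetical, and sha256sum (from GNU coreutils) stands in for whatever digest BackupPC really uses.

```shell
# Illustrative sketch only: pool_file and the pool layout are hypothetical,
# and sha256sum is an assumed stand-in for BackupPC's own digest.

# pool_file POOL_DIR FILE
# Keep one copy of FILE's content in POOL_DIR (named after its hash) and
# make FILE a hard link to that copy.
pool_file() {
    local pool="$1" file="$2" digest pooled
    [ -f "$file" ] || return 1
    digest=$(sha256sum < "$file" | cut -d' ' -f1)
    [ -n "$digest" ] || return 1
    mkdir -p "$pool" || return 1
    pooled="$pool/$digest"
    if [ ! -e "$pooled" ]; then
        # First time this content is seen: link it into the pool.
        ln "$file" "$pooled"
    elif [ "$file" -ef "$pooled" ]; then
        # Already a hard link to the pooled copy: nothing to do.
        return 0
    elif cmp -s "$file" "$pooled"; then
        # Identical content is already pooled: replace the duplicate
        # with a hard link so the data is stored only once.
        ln -f "$pooled" "$file"
    else
        # Same hash but different bytes: leave the file untouched.
        return 2
    fi
}
```

Calling pool_file once for every file in a backup tree leaves each distinct piece of content on disk a single time, however many hosts or backup runs contain it.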
BackupPC also offers several nice scheduling features, such as the ability to prioritize backups. By default, BackupPC wakes hourly and identifies any computer that hasn't completed a backup within the specified interval. Also, it checks to see which machines are on the network, and after combining these two lists, BackupPC prioritizes the list of available hosts on the basis of time since the last backup. Other factors can also influence this priority list. For instance, a machine that is on the network 24 hours a day is generally preempted by a machine with a more sporadic network presence record.
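The basic ordering rule described above can be sketched as a small shell filter. This is a simplified illustration, not BackupPC's scheduler: the function name due_hosts and the three-column input format are assumptions made for the example, and the extra weighting for hosts with a sporadic network presence is left out.

```shell
# Illustrative sketch only: due_hosts and its input format are hypothetical.

# due_hosts NOW INTERVAL_SECONDS
# Reads lines of "host last_backup_epoch reachable(0|1)" on stdin and prints
# the reachable hosts whose last backup is at least INTERVAL_SECONDS old,
# ordered so that the host with the oldest backup comes first.
due_hosts() {
    local now="$1" interval="$2"
    awk -v now="$now" -v interval="$interval" \
        '$3 == 1 && now - $2 >= interval { print now - $2, $1 }' |
        sort -k1,1nr |
        awk '{ print $2 }'
}
```

For instance, feeding it the output of a status query once an hour would yield the queue of hosts to back up in that wakeup cycle.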
My favorite feature of BackupPC is that end users can initiate and perform their own restores without the interaction of the backup operator or system administrator. If you have been involved in backups on any scale, you know that handling restores of a lost or mangled file is time consuming. If the user needs to find a specific version of the file, the restore process can grow into a multi-hour effort. BackupPC offers a friendly web interface that provides a directory and file tree for each backup. Users can select a single file or multiple files in the tree, and BackupPC will restore these files without the need for a system administrator. BackupPC even checks whether the user has the necessary access permissions to view the file before beginning the restore.
Users also have some control over when to start a backup (full or incremental) or whether to remove their machines from the backup list for a number of hours.
Installation
Installation of BackupPC is relatively painless because it's included in most mainstream distribution package repositories. However, the packaged version sometimes lags behind the latest available code or has special installation requirements, so I'll cover installation from source. If you used your distribution's package for BackupPC, skip ahead to the Configuration section.
Before working on the installation, you must consider disk space and how it is set up. Because BackupPC handles de-duplication by creating hard links from the file location in the directory structure of the backups to the pool where duplicated files are actually stored, the backup store must be on a single filesystem. This doesn't mean that you can't use LVM, software RAID, or hardware RAID to combine multiple disks, but you can only use a single filesystem to hold the store.
BackupPC tests to ensure that it can create these hard links at each startup. You'll need to know the mount point of this filesystem during installation.
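Before installing, you can run a similar check by hand. The following is a minimal sketch of such a test, not the check BackupPC itself performs; the function name can_hardlink and the temporary file names are hypothetical.

```shell
# Illustrative sketch only: can_hardlink is a hypothetical helper.

# can_hardlink DIR_A DIR_B
# Succeeds only if a file created in DIR_A can be hard-linked into DIR_B,
# which requires both directories to be on the same filesystem.
can_hardlink() {
    local src="$1/.linktest.$$" dst="$2/.linktest.$$.link"
    touch "$src" 2>/dev/null || return 1
    if ln "$src" "$dst" 2>/dev/null; then
        rm -f "$src" "$dst"
        return 0
    fi
    rm -f "$src"
    return 1
}
```

Run it as the backuppc user against two directories under the planned data directory; a non-zero exit status means hard links between them are not possible, so the pool would not work there.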
The next two steps are really one-liners in the console and consist of creating a user for BackupPC to run as and installing the software prerequisites. Of course, this assumes that you have httpd already installed and configured for your server:
# adduser backuppc
# yum install perl-Compress-Zlib perl-Archive-Zip perl-XML-RSS perl-File-RsyncP
After the prerequisites are out of the way, you can grab the source [1] and uncompress it:
$ tar -zxvf BackupPC-3.1.0.tar.gz
$ cd BackupPC-3.1.0
$ su -c "perl ./configure.pl"
This launches the installer, which performs the basic configuration and installation of BackupPC. The default answers are fine, with a few exceptions. The data directory should be the mount point of the filesystem for the backup pool (e.g., /data/BackupPC or a subdirectory therein). Also, you might need to enter the correct path for the CGI bin directory (e.g., /var/www/cgi-bin/).
So that BackupPC will start automatically, you must add init scripts to your system. In the init.d subdirectory, you will find init scripts for a variety of distributions.
Copy the script for your distribution to /etc/init.d, set it to start on boot, and then start the daemon:
$ su -c "cp linux-backuppc /etc/init.d/backuppc"
$ su -c "chkconfig --add backuppc"
$ su -c "chkconfig --level 345 backuppc on"
$ su -c "chkconfig --list backuppc"
$ su -c "service backuppc start"
Configuration
Although the installation process handles the basic configuration elements, other options are available via the web interface or command line.
BackupPC configuration is contained in two files under /etc/BackupPC: hosts details the identity of the hosts to be backed up, and config.pl controls the server configuration.
The hosts file lists the hostnames to be backed up and the authorized users for that machine:
host        dhcp    user    moreUsers     # <--- do not edit this line
nalleyt61   0       david                 # <--- example static IP host entry
host2       1       bill    jeff,fred     # <--- example DHCP host entry
When I cover authenticating to the web interface, I'll explain authorized users more, but the vital points are the hostname and the DHCP setting. If your machine gets its address via DHCP, you still want to use 0 for the DHCP setting, which tells BackupPC to use DNS to find the host. Setting this value to 1 tells BackupPC only to use nmblookup to query for the host address via NetBIOS.
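To see at a glance how each host will be looked up, the hosts file can be summarized with a short awk filter. This is a sketch under assumptions: the function name list_hosts is hypothetical, the labels "dns" and "netbios" simply mirror the description above, and the parsing is simplified (comments start at #, fields are separated by whitespace).

```shell
# Illustrative sketch only: list_hosts is a hypothetical helper, and the
# "dns"/"netbios" labels are shorthand for the two lookup methods.

# list_hosts HOSTS_FILE
# Prints "host lookup-method" for each entry, skipping comments, blank lines,
# and the "host dhcp user moreUsers" header line.
list_hosts() {
    awk '
        { sub(/#.*/, "") }                      # strip comments
        NF < 2 { next }                         # blank or incomplete line
        $1 == "host" && $2 == "dhcp" { next }   # header line
        { print $1, ($2 == 1 ? "netbios" : "dns") }
    ' "$1"
}
```

Running list_hosts /etc/BackupPC/hosts against the example file above would report one line per host with its lookup method.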
The default config.pl is configured to wake up every hour to look for hosts to back up, do a full backup approximately every 7 days, and do an incremental backup every day (Figure 1).
You can adjust these – and other – settings. The manual and the config file provide detail about each option, but I will only look at the minimum options that must be configured to start backups on either Windows or Linux. Also, it's important to remember that you can make modifications on a per-machine basis, too.
One thing to set up is the admin user and how backups will be transferred in your environment (see Listing 1). However, I don't advocate the use of root as the backup user; instead, I suggest that you use a low-privileged account and set up sudo so that rsync is accessible. As the backuppc user, you'll also need to log in to the client machine via SSH so that it becomes a known host.
Listing 1
config.pl Options
With the use of visudo, set up the following line in /etc/sudoers on the client machine,
backuppc ALL=NOPASSWD: /usr/bin/rsync
then modify the command arguments so that it uses sudo to call rsync:
$Conf{RsyncClientCmd} = '$sshPath -l backup $host nice -n 19 sudo /path/to/rsyncSend $argList+';
Although you shouldn't limit yourself to just these configuration options, setting these items at a minimum will take care of backing up Windows machines or Linux machines with Samba shares exposed.
Although you can configure a number of other things, such as file/directory exclusions and compression levels, the last required item is configuring the web interface. The installation automatically installed the web interface, but you need to set up authentication for it, and you need a way to authenticate the users in the hosts file and the admin users. Because you are using Apache to provide authentication, you have a variety of ways to authenticate. For instance, you could use LDAP, Active Directory, basic authentication, or anything else Apache supports.
Although Barratt's documentation delves into setting up LDAP authentication, among others, I'll focus on basic digest authentication, which requires you to add the section shown in Listing 2 to httpd.conf. Then, you'll want to run:
htpasswd -c /etc/httpd/conf/passwd ke4qqq
htpasswd /etc/httpd/conf/passwd bill
The first command prompts you for ke4qqq's password and stores it in the file. Note that the -c switch is only used when initially creating the password file; omit it for each subsequent user.
After reloading httpd and starting BackupPC, you should be able to launch a browser, point it to http://backuppcserver/cgi-bin/BackupPC_Admin, authenticate as a user you created, and gain access to the web interface. If you aren't an admin user, you'll only have access to machines on which you are listed as a user in the hosts file.
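To check which machines a given login should see, you can query the hosts file directly. The sketch below is a simplified illustration rather than the web interface's own logic: the function name hosts_for_user is hypothetical, and it assumes that a name in either the user column or the comma-separated moreUsers column grants access.

```shell
# Illustrative sketch only: hosts_for_user is a hypothetical helper that
# assumes both the user and moreUsers columns grant access.

# hosts_for_user HOSTS_FILE USER
# Prints every host whose user or moreUsers column names USER.
hosts_for_user() {
    awk -v who="$2" '
        { sub(/#.*/, "") }                                # strip comments
        NF < 3 || ($1 == "host" && $2 == "dhcp") { next } # skip header, short lines
        {
            n = split($3 "," $4, users, ",")
            for (i = 1; i <= n; i++)
                if (users[i] == who) { print $1; break }
        }
    ' "$1"
}
```

With the example hosts file shown earlier, asking for bill, jeff, or fred would list host2, and asking for david would list nalleyt61.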
Listing 2
Modifying httpd.conf