Automate data backup at the command line
Automatic Backup
Backing up data is an unpopular task that many users – and even some administrators – consider a chore. That is reason enough to take a look at some command-line programs that automate backups.
Linux users have access to numerous backup tools. Administrators who like working over SSH appreciate that servers of any size and design can be backed up with command-line programs. However, the programs differ considerably in their features (see Table 1 for an overview), and not every program suits every application scenario. In this article, I investigate which tools work in which environments.
Table 1
Command-Line Backup Tools
| Feature | Attic | bup | Duplicity | rdiff-backup | rsnapshot |
|---|---|---|---|---|---|
| Local backup | Yes | Yes | Yes | Yes | Yes |
| Backup via SSH | Yes | Yes | Yes | Yes | Yes |
| Verification | Yes | Yes | Yes | Yes | Yes (logfile) |
| Encryption | Yes | Yes | Yes | No | No |
| Cloud services | No | No | Yes (Amazon, Rackspace) | No | No |
| Include/exclude directory | Yes | Yes | Yes | Yes | Yes |
| Time-controlled | Yes* | Yes* | Yes* | Yes* | Yes* |
| Front ends available | No | Yes | Yes | No | Yes |
| Incremental backups | Yes | Yes | Yes | Yes | No |
| Differential backups | No | No | No | No | No |
| Manual full backup | Yes | Yes | Yes | Yes | Yes |
| FUSE mount possible | Yes | Yes | No | Yes | No |
*Backups scheduled with the cron daemon.
Server vs. Desktop
Home users often store large volumes of data on their computers, comparable to the volumes found on servers in small businesses. High-definition video collections, audio files with lossless compression, and photo folders consume huge amounts of storage space. New data is added often, but once stored, it hardly ever changes.
Server systems, on the other hand, often hold many small files, such as correspondence, spreadsheets, presentations, and databases. These data collections change constantly as users create new records or add documents. Accordingly, a backup strategy must take the nature of the existing data into account so that it can be restored quickly if data is lost.
Differential vs. Incremental
Administrators distinguish three backup strategies: full backup, differential backup, and incremental backup. The full backup, a copy of the existing data, is always the first backup in any plan – subsequent backups follow as differential or incremental backups. Whereas differential backups always save changes since the last full backup, incremental backups only save modifications relative to the last backup of any kind.
The differential procedure needs more space for each individual backup, but in an emergency you only need the full backup and the most recent differential backup to restore all your data. Although the incremental method uses less disk space, a restore has to start from the full backup and apply every incremental backup in the correct order. If a single incremental backup is missing or damaged, the data can no longer be fully reconstructed.
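To make this trade-off concrete, here is a minimal Python sketch that works out which backup sets a restore needs under each strategy. The function name and the day labels are hypothetical; the sketch assumes a history that starts with one full backup followed only by partial backups of a single kind.

```python
from typing import List


def restore_chain(history: List[str], strategy: str) -> List[str]:
    """Return the backup sets needed to restore the newest state.

    history  -- backup labels in chronological order; history[0] must be
                the full backup, all later entries are partial backups
    strategy -- "differential" or "incremental"
    """
    if not history:
        raise ValueError("no backups available")
    full, partials = history[0], history[1:]
    if strategy == "differential":
        # Each differential backup holds all changes since the full backup,
        # so the full backup plus the newest differential one are enough.
        return [full] + partials[-1:]
    if strategy == "incremental":
        # Each incremental backup holds only the changes since the previous
        # backup, so all of them must be applied in order after the full one.
        return [full] + partials
    raise ValueError(f"unknown strategy: {strategy!r}")


week = ["full-sun", "mon", "tue", "wed", "thu"]
print(restore_chain(week, "differential"))  # ['full-sun', 'thu']
print(restore_chain(week, "incremental"))   # ['full-sun', 'mon', 'tue', 'wed', 'thu']
```

The longer incremental chain is exactly where the risk lies: every label in it must be present and readable for the restore to succeed.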
Before choosing command-line backup software, I recommend carefully analyzing your data and how fast it grows, so you do not accidentally select a program that is unsuitable for your IT environment.
Differential backup strategies tend to suit data collections with relatively few large files that change only moderately, whereas incremental backups are better suited to typical office environments. Regardless of which you choose, you should always run at least one full backup per week.
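The weekly full-backup rule is easy to check automatically. The following minimal sketch assumes a hypothetical backup log of `(date, kind)` pairs, which none of the tools discussed here produces in this form, and reports whether the logged backups ever go more than seven days without a full backup.

```python
from datetime import date, timedelta
from typing import List, Tuple


def weekly_full_ok(log: List[Tuple[date, str]]) -> bool:
    """Check that logged backups never run more than 7 days past a full backup.

    log -- hypothetical (date, kind) entries, kind being "full",
           "differential" or "incremental"
    """
    if not log:
        return False
    entries = sorted(log)
    # Every plan has to start with a full backup.
    if entries[0][1] != "full":
        return False
    last_full = entries[0][0]
    for day, kind in entries[1:]:
        # A backup more than a week after the last full backup means the
        # weekly full backup was skipped.
        if day - last_full > timedelta(days=7):
            return False
        if kind == "full":
            last_full = day
    return True


start = date(2024, 1, 7)
log = [(start + timedelta(days=i), "incremental") for i in range(1, 7)]
log += [(start, "full"), (start + timedelta(days=7), "full")]
print(weekly_full_ok(log))  # True
```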
Desktop users who want to back up their own data without root privileges have little choice among command-line backup programs, and using any of them requires some knowledge of the command syntax. Users need a backup to run as smoothly and reliably as possible: they will only use the software, and actually perform backups, if it is quick and easy to use.
Ideally, the same software can be used in mixed environments that include both a backup server and desktop backups run by users. Using a single program saves you from having to learn the syntax of two programs and avoids the errors that come with switching between them.
Attic
The Attic backup program, which is written in Python, is available in the repositories of several Linux distributions, such as Mageia, openSUSE, ROSA, and Slackware Linux, and can be installed conveniently with the respective package manager. The project page also offers the source code for download, along with detailed documentation [1].
Attic requires Python 3.2 or newer and OpenSSL 1.0.0 or newer. Because the software also lets you mount a backup set in user space, you need to install the llfuse package from the Python Package Index (PyPI) to use this function.
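Before installing, you can get a rough idea of whether a system meets these requirements with a few lines of Python. This is only a sketch: the function name is hypothetical, and the OpenSSL check reports the library this Python interpreter is linked against, which is not necessarily the one a packaged Attic uses.

```python
import importlib.util
import ssl
import sys


def check_attic_prereqs() -> dict:
    """Report which of Attic's prerequisites this Python environment meets."""
    return {
        # Attic itself needs Python 3.2 or newer.
        "python >= 3.2": sys.version_info >= (3, 2),
        # OpenSSL linked into *this* Python; a packaged Attic may use another
        # copy, so treat this only as a hint.
        "openssl >= 1.0.0": ssl.OPENSSL_VERSION_INFO[:3] >= (1, 0, 0),
        # llfuse is only needed for mounting backups with FUSE.
        "llfuse": importlib.util.find_spec("llfuse") is not None,
    }


for name, ok in check_attic_prereqs().items():
    print(f"{name:18} {'ok' if ok else 'missing'}")
```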
After a successful installation, you first have to initialize a backup repository with the command:
attic init /<Repository-Path>/<Repository-Name>.attic
Attic does not enable encryption by default. If the data must be encrypted, add the --encryption=passphrase or --encryption=keyfile parameter to this attic init command; the encryption mode is chosen when the repository is created.
You can then back up several directories into an archive within this repository. The archive is created by the backup command itself, and you can choose its name freely. The following command backs up the directories:
attic create /<Repository-Path>/<Repository-Name>.attic::<Archive-Name> /<Source-Directory-1> [...] /<Source-Directory-n>
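If you want to script the repository setup, a minimal Python sketch could wrap `attic init` with the standard `subprocess` module. The helper names and the example path `/backup/home.attic` are hypothetical; the sketch only uses the `--encryption` values named above.

```python
import subprocess
from typing import List, Optional

VALID_MODES = ("passphrase", "keyfile")


def attic_init_cmd(repo: str, encryption: Optional[str] = None) -> List[str]:
    """Build the argument list for `attic init`.

    encryption -- None keeps Attic's default (no encryption);
                  otherwise "passphrase" or "keyfile".
    """
    cmd = ["attic", "init"]
    if encryption is not None:
        if encryption not in VALID_MODES:
            raise ValueError(f"encryption must be one of {VALID_MODES}")
        cmd.append(f"--encryption={encryption}")
    cmd.append(repo)
    return cmd


def init_repository(repo: str, encryption: Optional[str] = None) -> None:
    # Runs Attic; with encryption enabled it may prompt for a passphrase.
    subprocess.run(attic_init_cmd(repo, encryption), check=True)


print(" ".join(attic_init_cmd("/backup/home.attic", "keyfile")))
# attic init --encryption=keyfile /backup/home.attic
```

Building the argument list separately from running it makes the command easy to inspect or log before Attic touches the disk.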
I recommend using the weekday as the archive name for regular backups of the same directories, which quickly gives you the correct sequence of backups during a restore. The first backup in a repository can take a long time to complete for large data volumes, but subsequent backups will be far quicker because Attic saves them incrementally (i.e., only modified or newly added data is included in the backup).
To monitor the backup run, add the --stats parameter, which displays the key figures for the archive you just created. Attic lists not only the directories and the time the backup took, but also the number of files backed up and the data volume. It shows both the original and the compressed data volume, so you can keep track of how efficiently the data is compressed (Figure 1).
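A nightly job could combine the weekday naming recommended above with the --stats option. In this minimal sketch, the helper names, paths, and naming scheme are hypothetical. Reusing a bare weekday name every week would collide with the previous week's archive of the same name, so the sketch prepends the ISO date, which keeps names unique and still sorts them chronologically.

```python
import subprocess
from datetime import date
from typing import List, Optional, Sequence

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def archive_name(day: date) -> str:
    # Hypothetical scheme: ISO date plus weekday, e.g. "2024-01-01-Monday".
    return f"{day.isoformat()}-{WEEKDAYS[day.weekday()]}"


def attic_create_cmd(repo: str, sources: Sequence[str],
                     day: Optional[date] = None, stats: bool = True) -> List[str]:
    """Build the argument list for `attic create`."""
    day = day or date.today()
    cmd = ["attic", "create"]
    if stats:
        # Ask Attic to print statistics (file count, sizes) when it is done.
        cmd.append("--stats")
    cmd.append(f"{repo}::{archive_name(day)}")
    cmd.extend(sources)
    return cmd


def run_backup(repo: str, sources: Sequence[str]) -> None:
    subprocess.run(attic_create_cmd(repo, sources), check=True)


print(" ".join(attic_create_cmd("/backup/home.attic", ["/home/alice", "/etc"],
                                day=date(2024, 1, 1))))
# attic create --stats /backup/home.attic::2024-01-01-Monday /home/alice /etc
```

Scheduled through cron, `run_backup` would produce one dated archive per day.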
Unlike many other backup tools, Attic makes it easy to list the contents of an archive. To do so, enter the following command at the prompt:
attic list -v /<Repository-Path>/<Repository-Name>.attic::<Archive-Name>
Attic then lists the entire content, including file size, owner, and file permissions. Subdirectories are included automatically, and the output shows the absolute paths.
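When an archive holds thousands of files, it helps to filter the listing before a restore. This minimal sketch captures the output of the `attic list -v` command shown above and searches it by substring. The helper names are hypothetical, the search deliberately makes no assumptions about Attic's column layout, and the sample lines are made up for illustration.

```python
import subprocess
from typing import List


def attic_list_cmd(repo: str, archive: str) -> List[str]:
    """Build the `attic list -v` command shown in the text."""
    return ["attic", "list", "-v", f"{repo}::{archive}"]


def list_archive(repo: str, archive: str) -> List[str]:
    # Capture Attic's listing as text, one entry per line (Python 3.7+).
    result = subprocess.run(attic_list_cmd(repo, archive),
                            check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


def find_entries(listing: List[str], needle: str) -> List[str]:
    # Plain substring search over whole lines.
    return [line for line in listing if needle in line]


# Made-up sample lines; real output also shows permissions, owner, and size.
sample = ["docs/report.odt", "photos/img_0001.jpg"]
print(find_entries(sample, "report"))  # ['docs/report.odt']
```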