Automate data backup at the command line

Automatic Backup

Lead Image © Tezz Stock, 123RF.com

Article from Issue 206/2018
Author(s):

Backing up data is an unpopular task that many users – and even some administrators – consider a chore, prompting us to take a look at some command-line automatic backup programs.

Linux users have access to numerous backup tools. Administrators who like working with SSH appreciate that servers of any size and design can be backed up with command-line programs. However, the differences in terms of features are quite considerable (see Table 1 for an overview). Not every program is suitable for every application scenario. In this article, I investigate which tools work for which environments.

Table 1

Command-Line Backup Tools

Feature                    Attic  bup    Duplicity                rdiff-backup  rsnapshot
---------------------------------------------------------------------------------------------
Local backup               Yes    Yes    Yes                      Yes           Yes
Backup via SSH             Yes    Yes    Yes                      Yes           Yes
Verification               Yes    Yes    Yes                      Yes           Yes (logfile)
Encryption                 Yes    Yes    Yes                      No            No
Cloud services             No     No     Yes (Amazon, Rackspace)  No            No
Include/exclude directory  Yes    Yes    Yes                      Yes           Yes
Time-controlled            Yes*   Yes*   Yes*                     Yes*          Yes*
Front ends available       No     Yes    Yes                      No            Yes
Incremental backups        Yes    Yes    Yes                      Yes           No
Differential backups       No     No     No                       No            No
Manual full backup         Yes    Yes    Yes                      Yes           Yes
FUSE-mount possible        Yes    Yes    No                       Yes           No

*Backups scheduled with the cron daemon.

Server vs. Desktop

Home users often store large volumes of data on their computers, comparable to the volumes found on servers in small businesses. High-definition video collections, losslessly compressed audio files, and photo folders take up a great deal of disk space. New data is added often, but once stored, it hardly ever changes.

Server systems, on the other hand, often hold many small files (such as correspondence, spreadsheets, presentations, and databases). These data collections change constantly as records are created and documents are added. Accordingly, a backup strategy must take the existing data into account to guarantee rapid reconstruction in the event of data loss.

Differential vs. Incremental

Administrators distinguish three backup strategies: full backup, differential backup, and incremental backup. The full backup, a copy of the existing data, is always the first backup in any plan – subsequent backups follow as differential or incremental backups. Whereas differential backups always save changes since the last full backup, incremental backups only save modifications relative to the last backup of any kind.

The differential backup procedure requires more space for the individual backups, but in an emergency, you need only the full backup and the most recent differential backup to recover the entire data set. Although the incremental method uses less disk space, a restore has to start from the full backup and then apply every incremental backup in the correct sequence. If a single incremental backup is missing, the data can no longer be fully reconstructed.
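The difference between the two restore paths can be sketched in a few lines of shell. The function name restore_chain and the set names (full, diff-N, inc-N) are invented for illustration and do not belong to any of the tools discussed here; the sketch only prints which backup sets a restore would need.

```shell
#!/bin/sh
# restore_chain STRATEGY N
# Prints the backup sets needed to restore the state captured by follow-up
# backup number N, where the full backup is set 0. Names are illustrative.
restore_chain() {
  rc_chain="full"
  case "$1" in
    differential)
      # Only the newest differential is needed on top of the full backup.
      if [ "$2" -gt 0 ]; then
        rc_chain="$rc_chain diff-$2"
      fi
      ;;
    incremental)
      # Every incremental since the full backup is needed, in order.
      rc_i=1
      while [ "$rc_i" -le "$2" ]; do
        rc_chain="$rc_chain inc-$rc_i"
        rc_i=$((rc_i + 1))
      done
      ;;
    *)
      echo "unknown strategy: $1" >&2
      return 1
      ;;
  esac
  echo "$rc_chain"
}

restore_chain differential 4   # prints: full diff-4
restore_chain incremental 4    # prints: full inc-1 inc-2 inc-3 inc-4
```

With four follow-up backups, the differential restore touches two sets, whereas the incremental restore touches five, and losing any one of them breaks the chain from that point on.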

Before selecting a command-line backup program, I recommend carefully analyzing your data collection and its growth, so you do not accidentally select a program that is unsuitable for your specific IT environment.

Differential backup strategies are more likely to be used for data sets that consist of relatively few large files with moderate, regular modifications, whereas incremental backups are better suited to typical office environments. Regardless of which you choose, you should always run at least one full backup per week.

Desktop users who want to back up their own data without root privileges do not have a huge choice of command-line backup software, and whatever they choose requires some knowledge of the command syntax. For these users, a backup has to run as smoothly and reliably as possible: End users will only adopt the backup software, and actually perform the backup, if it is quick and easy to use.

Ideally, the same software can be used for mixed environments with both a backup server and additional desktop backups by users. Using the same software saves you the hassle of having to know the syntax of two programs – and thus avoids any associated errors.

Attic

The Attic backup program, which is written in Python, can be found in the repositories of some Linux distributions, such as Mageia, openSUSE, ROSA, and Slackware Linux, where it can be installed conveniently with the respective package manager. The project page provides the source code for download, along with detailed documentation [1].

Attic requires Python 3.2 or later and OpenSSL 1.0.0 or later. Because the software also lets you mount a backup set in user space, the llfuse package from the Python Package Index (PyPI) has to be installed to provide this function.
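Before installing from source, the prerequisites named above can be checked with a short script. The following is only a sketch: the function names version_at_least and check_attic_prereqs are invented for this example, only the major and minor version numbers are compared, and the script assumes that the interpreter is available as python3 and the OpenSSL command-line tool as openssl.

```shell
#!/bin/sh
# Sketch: check the prerequisites named above (Python >= 3.2, OpenSSL >= 1.0.0,
# and the llfuse module for FUSE mounts). Only major.minor is compared.

# version_at_least VERSION MAJOR MINOR
# Succeeds if VERSION (e.g. "3.11.2") is at least MAJOR.MINOR.
version_at_least() {
  va_major=${1%%.*}
  va_rest=${1#*.}
  va_minor=${va_rest%%.*}
  {
    [ "$va_major" -gt "$2" ] ||
      { [ "$va_major" -eq "$2" ] && [ "$va_minor" -ge "$3" ]; }
  } 2>/dev/null
}

check_attic_prereqs() {
  # Ask the interpreter for its own version; empty if python3 is missing.
  py=$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)
  if [ -n "$py" ] && version_at_least "$py" 3 2; then
    echo "Python $py: ok"
  else
    echo "Python 3.2 or later: missing"
  fi

  # "openssl version" prints the version number as its second field.
  ssl=$(openssl version 2>/dev/null | awk '{print $2}' || true)
  if [ -n "$ssl" ] && version_at_least "$ssl" 1 0; then
    echo "OpenSSL $ssl: ok"
  else
    echo "OpenSSL 1.0.0 or later: missing"
  fi

  # llfuse is only needed for mounting archives in user space.
  if python3 -c 'import llfuse' 2>/dev/null; then
    echo "llfuse: ok"
  else
    echo "llfuse: missing (needed only for mounting archives)"
  fi
}

check_attic_prereqs
```

The script reports what it finds on the machine where it runs, so its output differs from system to system.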

After a successful installation, you first have to initialize a backup repository. This is achieved with the command:

attic init /<Repository-Path>/<Repository-Name>.attic

You can then back up several directories into an archive in this repository; archive names can be chosen freely. The following command creates the archive and backs up the directories:

attic create /<Repository-Path>/<Repository-Name>.attic::<Archive-Name> /<Source-Directory-1>/ <[...]> <Source-Directory-n>

Attic does not enable encryption by default. If the data must be encrypted, add the --encryption=passphrase|keyfile parameter to the attic init command when you initialize the repository.

I recommend using the weekday as the archive name for regular backups of the same directories, which quickly gives you the correct sequence of backups during a restore. The first backup in a repository can take a long time to complete for large data volumes, but subsequent backups will be far quicker because Attic saves them incrementally (i.e., only modified or newly added data is included in the backup).
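The weekday naming scheme is easy to script. The following sketch builds the attic create call with the current weekday as the archive name; the repository path, the source directories, and the DRY_RUN switch are assumptions made for this example. With DRY_RUN=1, which is the default here, the script only prints the command, so it can be tried on a machine without Attic.

```shell
#!/bin/sh
# Sketch: Attic backup with the weekday as the archive name.
# REPO and SOURCES are placeholder values; DRY_RUN=1 only prints the command.
REPO=${REPO:-/backup/home.attic}
SOURCES=${SOURCES:-/home /etc}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

backup_today() {
  # LC_ALL=C keeps the weekday name in English regardless of the locale.
  archive=$(LC_ALL=C date +%A)
  # $SOURCES is left unquoted on purpose so that it splits into several paths.
  run attic create --stats "${REPO}::${archive}" $SOURCES
}

backup_today
```

Because archive names repeat after seven days, the archive left over from the previous week presumably has to be removed or renamed before the new one can be created; the sketch leaves that step out. For time-controlled runs, the script could be called from a crontab entry with DRY_RUN=0 set.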

If you want to monitor the backup run, you can display the most important data for the backed-up archive using the --stats parameter. Attic lists not only the directories and the time required for the backup run, but also the number of files backed up and the volume of data. It shows both the original and the compressed data volumes, so you can keep track of compression efficiency (Figure 1).

Figure 1: Attic provides clear-cut data for the latest backup.

In contrast to many other backup tools, Attic provides a convenient approach to listing archive content. For this purpose, enter the following command at the prompt:

attic list -v /<Repository-Path>/<Repository-Name>.attic::<Archive-Name>

The software then lists all the content, including file size, owner, and file permissions. Subdirectories are included automatically, and absolute paths are shown.
