Data backup in the cloud with Duplicati
The free backup tool Duplicati simplifies the process of backing up data with cloud providers while at the same time protecting backups with strong cryptography.
The topic of online backups took on a whole new aspect after the Snowden revelations of recent weeks and months. Even before the NSA scandal, however, people were wondering how safe their data would be with a cloud provider (see the "Persistent Criticism" box). Although manual encryption of the data up front can help mitigate concern, the process also involves a huge amount of effort and makes the entire workflow complex.
Despite the facilities they offer, online services like Dropbox were already under fire before people became aware of intelligence service monitoring practices. This is because the tools either don't encrypt the transferred data they store, or they use crypto techniques that are managed by the provider.
Neither of these factors has anything to do with the fact that the connection between the PC and the favored provider is usually encrypted. To index the data entrusted to them, some service providers either entirely forgo encrypted storage or use their own process – users have no say in this.
Indexing allows the provider to determine whether a file has a known checksum. In this case, the provider does not need to save the data again but simply store a reference, which helps save storage space. The procedure is beneficial to the provider, but it also gives the provider full control over the data.
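The checksum-based indexing described here can be sketched in a few lines of Python. This is only an illustration of the principle, not any provider's actual implementation: the DedupStore class and its method names are hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is stored only once."""

    def __init__(self):
        self.blobs = {}   # checksum -> file content
        self.files = {}   # (user, filename) -> checksum (the "reference")

    def upload(self, user, filename, data):
        checksum = hashlib.sha256(data).hexdigest()
        is_new = checksum not in self.blobs
        if is_new:
            # Unknown checksum: the content really has to be stored.
            self.blobs[checksum] = data
        # Known or not, the user's file is only a reference to the blob.
        self.files[(user, filename)] = checksum
        return is_new


store = DedupStore()
first = store.upload("alice", "report.pdf", b"same bytes")
second = store.upload("bob", "copy.pdf", b"same bytes")
```

In this sketch, the second upload stores nothing new. The checksum is computed over whatever the provider can see, so the saving only materializes if the provider can read the data or encrypts it with keys it manages itself.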
A remedy has appeared in the form of a small open source tool named Duplicati, which kills two birds with one stone: It includes back ends for many important cloud services, so you can forget about using the native clients, and it encrypts all your data before transferring.
In fact, the program offers a third distinct advantage over manually backing up data via the provider's native cloud client: As true backup software, Duplicati not only flexibly helps you select the data to be backed up, it also backs up data on the fly that you can't easily access as a user. Additionally, clients are available for Linux, Windows, and Mac OS X, so the program can be deployed seamlessly on multiple platforms.
Duplicati for Linux
The open source program Duplicati was originally penned by Danish developer Kenneth Skovhede, who is still one of the main developers. Since 2011, Germany's René Stach has also coordinated public relations and managed communications with external developers.
The core of the software is under the GPL. Other licenses apply for some of the SDKs, tools, and libraries, such as AWSSDK, GPG, SQLite, and PuTTY. Version 1.3.4, which was published in February 2013, can be downloaded from the project website with a choice of DEB and RPM packages for Linux distros, binaries for Windows and Mac OS X, and packed binary packages for Linux and Windows.
The program quickly demonstrates its roots in Windows: The interface in the Linux and Mac versions is based on the free .NET implementation Mono and the WinForms libraries. The tool for the command line, however, is similar for all three versions. In the upcoming version 2.0, the developers are planning some radical streamlining (see the "Duplicati 2.0" box).
The Duplicati developers are currently working hard on version 2.0, which, among other things, uses a more modern and more efficient storage engine. Additionally, the developers want to simplify the interface for the command line radically, so that the software theoretically just needs three commands, one of which is restore. This change would improve the possibilities for scripting.
The new storage engine uses a completely block-based approach but continues to support filesystem-based back ends, such as FTP, SSH, and WebDAV. The new features, besides deduplication, include "NTFS reparse points" and "junctions," the NTFS version of symbolic links, which will be of more interest to Windows users. The new storage format also supports Amazon Glacier.
According to the developers, the new version also features a revamped SSH back end, an email notification function, and support for LZMA/7z compression. You can try out the CLI version of the new block-based engine by downloading it from the project website. It serves as the basis for the upcoming (and long overdue) revamp of the interface, which is planned as a web interface in the future.
The advantage of Duplicati is that it includes an impressive range of back ends for the major cloud providers, including 1&1 SmartDrive, Amazon S3, Google Drive, ownCloud, Windows SkyDrive, Strato HiDrive, and T-Online Media Center. For more information on the supported cloud providers, see the Beginner's Guide. Before transmission, the software uses AES-256 or GPG to encrypt all the data, and it supports incremental backups.
Installing the packages for the distributions proves to be the easiest approach, because you do not have to worry about resolving dependencies yourself. Alternatively, you can simply download the tarball and unpack it in any directory. Before doing so, install the mono-runtime, libmono2.0-cil, and libmono-winforms2.0-cil packages. For more information about installation, see Google Code.
Whereas Windows uses Microsoft's Volume Shadow Copy service to back up open or locked files, Linux supports snapshots via the Logical Volume Manager. Using the snapshot method on Linux means having tools for LVM in place and additional administrative privileges. A copy-on-write procedure is used.
A redirect-on-write permanently reroutes all changes to the snapshot, but copy-on-write keeps back the changes in the metadata until the original data has been fully transferred to the snapshot. When reading a snapshot, the operating system can first determine whether the part to be read already exists and then use it. Thus, administrative privileges are essential for scenarios involving snapshots.
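The copy-on-write behavior described above can be illustrated with a toy block device in Python. This is a simplified sketch of the general technique, not how LVM implements it; the Volume class and its methods are hypothetical names.

```python
class Volume:
    """Toy block device with a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = None  # block index -> content at snapshot time

    def create_snapshot(self):
        # Creating the snapshot copies nothing; it only starts tracking changes.
        self.snapshot = {}

    def write(self, index, data):
        if self.snapshot is not None and index not in self.snapshot:
            # First write since the snapshot: save the old block first.
            self.snapshot[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Use the preserved copy if one exists; otherwise the block is unchanged.
        if index in self.snapshot:
            return self.snapshot[index]
        return self.blocks[index]


vol = Volume(["a", "b", "c"])
vol.create_snapshot()
vol.write(1, "B")
vol.write(1, "BB")
snapshot_view = [vol.read_snapshot(i) for i in range(3)]
```

The snapshot view still shows the original three blocks, although only one block was ever copied. A backup that reads through the snapshot therefore sees a consistent state while the live volume keeps changing.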
You can control the way snapshots are handled with the snapshot policy option and --open-file-policy; the former is disabled by default, because the function needs the above-mentioned, far-reaching rights and preconditions. Additionally, the on, auto, and required parameters are available for this option. The on parameter tries to use an existing snapshot. In the absence of such, the program outputs an extensive warning for any files that it cannot back up. The auto option does the same thing but without a warning.
If a snapshot exists, Duplicati ignores the
--open-file-policy, if set. It is only used if no snapshot exists and the snapshot policy is not set to
The --open-file-policy option lets you specify how the program handles open files: ignore excludes locked files from the backup. The default setting, snapshot, takes a copy of the file as-is and writes a warning to the log if the file changes during the backup. The copy setting tells Duplicati to try to make a copy of the open file before backing up if the lock mechanisms allow it. Because this action only works with a full copy, problems occur with very large files. If a snapshot exists, Duplicati ignores --open-file-policy and uses the snapshot.
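The interplay of the two policies, as far as the text describes it, can be summarized as a small decision function. This is a reading of the description above and not Duplicati's code: the function name and the returned action labels are hypothetical, "off" is an assumed name for the disabled default, and the required setting is left out because its behavior is not described here.

```python
def plan_for_open_file(snapshot_exists, snapshot_policy, open_file_policy):
    """Return (action, warn) for one open or locked file.

    snapshot_policy: "off" (assumed name for the default), "on", or "auto"
    open_file_policy: "ignore", "snapshot" (default), or "copy"
    """
    if snapshot_exists:
        # A snapshot wins; the open-file policy is not consulted.
        return ("read-from-snapshot", False)

    # No snapshot to use: "on" warns about it, "auto" stays silent.
    warn = snapshot_policy == "on"

    if open_file_policy == "ignore":
        return ("skip-file", warn)
    if open_file_policy == "copy":
        return ("copy-then-backup", warn)
    if open_file_policy == "snapshot":
        # Reads the file as-is; a file that changes is logged separately.
        return ("read-as-is", warn)
    raise ValueError(f"unknown open-file policy: {open_file_policy!r}")
```

For example, plan_for_open_file(False, "on", "copy") falls back to copying the file and flags a warning, whereas any call with snapshot_exists=True reads from the snapshot regardless of the open-file policy.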
The Linux version of Duplicati also backs up various metadata typical of Unix-style operating systems. You can configure three different types of behavior for handling symlinks. For example, if you choose store, the tool only saves the symlink. Using follow, on the other hand, adds the linked files to the backup, whereas ignore does exactly that to the symbolic link. Duplicati version 1.3.4 saves only the timestamp of metadata such as pipes or FIFOs.
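The difference between the three symlink settings can be illustrated with a short Python sketch that walks a directory tree. It mimics the described behavior with the standard library and is not taken from Duplicati; the collect function and the entry labels are hypothetical.

```python
import os
import tempfile


def collect(root, symlink_policy):
    """List backup entries under root as (kind, relative path) tuples."""
    entries = []
    # With followlinks=True, os.walk also descends into symlinked directories
    # (a link pointing back to a parent directory would loop forever).
    follow = symlink_policy == "follow"
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow):
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                if symlink_policy == "ignore":
                    continue                      # drop the link entirely
                if symlink_policy == "store":
                    entries.append(("link", rel))  # save only the link itself
                    continue
                # "follow": fall through and treat the link like its target
            if os.path.isfile(path):
                entries.append(("file", rel))
    return entries


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "data.txt"), "w") as fh:
        fh.write("payload")
    os.symlink(os.path.join(root, "data.txt"), os.path.join(root, "link.txt"))
    results = {p: collect(root, p) for p in ("store", "follow", "ignore")}
```

With one regular file and one symlink pointing to it, store records the link as a link, follow records it as a second file, and ignore leaves it out.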
To invoke the wizard, type Duplicati at the command line, or, after installing via the package sources, you can start the program via the KDE menu or the Gnome/Unity dash, depending on your distribution. If you want to benefit from the advanced options for backing up open or locked files via LVM snapshots, you will need to launch Duplicati with administrative privileges.
The first time you use the program, the wizard asks whether you want to Setup a new backup, Restore files from a backup, or Restore settings from a previous Duplicati installation (Figure 1). You then enter a name for the backup and optionally assign the backup to a group that you can create, if needed, by pressing the Add a Group button.
When you select the data to be backed up, the wizard suggests a couple of typical locations on the filesystem: My Documents, Files on the desktop, and My Pictures. You can customize this choice by selecting Custom folder list.
If you choose to Include application settings, you'll find only the $HOME/.config/ directory. The Custom folder list contains a button labeled … that lets you add more directories. After adding a directory, another line appears in the wizard, which you can use to select other directories in the same way. You can use the trash can icon to the right to delete them again.
This approach, however, does not give you access to hidden files. Moreover, you cannot exclude subdirectories or files in this step. That is only possible later after enabling Advanced settings.
The next step is all about encryption: If you uncheck the box for Protect the backups with this password, which is checked by default, Duplicati does not encrypt the backed-up data. Otherwise, under Encryption method, you can choose between two entries: "AES-256 encryption, built-in" and "GNU Privacy Guard, external" (Figure 2).
After confirming the password, you can move on to select the desired back end in the next step. Besides the services mentioned previously, you can choose some pretty straightforward functions here, such as FTP, SSH, or file-based. The parameters needed in each case depend on the method you choose.
For the Google back end, you just need the account information and the name of a new or existing Google Docs Collection; you can click Create Collection to set up a new one if needed. The fact that the service is now called Google Drive has gone unnoticed by this interface, but this does not detract from the function.
The back end for Amazon S3, on the other hand, operates on the basis of buckets. Buckets simplify the process of distributing and optimizing the stored data for Amazon. A bucket is a kind of folder that is given a unique URL. For Amazon S3, you need an AWS Access ID, along with the associated key and region code (Figure 3). You can create a bucket in your AWS console if you have an account. Access IDs and passwords for individual user accounts are set in AWS Identity and Access Management. Note that the dialogs designed for Amazon can also be used to accommodate credentials for other bucket-oriented cloud services, such as Host Europe or Dunkel.
In the next step, Advanced Settings, for each option you enable, the wizard treats you to an additional dialog. If you enable all six options (Figure 4), you then need to define three things: when the software should create a backup, the interval for removing old backups, and whether to take limits, such as the volume or bandwidth, into consideration. If these options are not sufficient, you can set others manually, such as with the parameters described for
After you select your options, the wizard schedules your backups and implements your choice of strategy. By default, Duplicati backs up every day at 1:00pm, always creates a full backup the first time out and incremental backups thereafter, and then creates another full backup every month. Additionally, you have the options Always perform an incremental backup, never full and Always perform a full backup, never incremental. However, not even the program itself recommends these variants.
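The default strategy described above (a full backup first, incrementals afterward, and a fresh full backup every month) boils down to a simple decision per run. The following Python sketch illustrates it; the function name is hypothetical, the 30-day interval is an assumed approximation of "every month", and the start date is arbitrary.

```python
from datetime import datetime, timedelta


def backup_type(now, last_full, full_interval=timedelta(days=30)):
    """Decide between a full and an incremental run.

    last_full is the time of the most recent full backup, or None if
    no backup exists yet.
    """
    if last_full is None:
        return "full"          # the first run is always a full backup
    if now - last_full >= full_interval:
        return "full"          # the last full backup is a month old
    return "incremental"


start = datetime(2013, 9, 1, 13, 0)   # daily run at 1:00pm, as in the default
plan = []
last_full = None
for day in range(32):
    now = start + timedelta(days=day)
    kind = backup_type(now, last_full)
    if kind == "full":
        last_full = now
    plan.append(kind)
```

Over 32 daily runs, the plan contains two full backups (the first run and the run 30 days later) with incremental backups in between.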
The options for deleting in the next step help save space. You can define, among other things, how many backups the program keeps (default=4). Alternatively, you can specify a date at which the software removes backups. Another option lets you tell Duplicati to ignore timestamps for incremental backups.
If so desired, the wizard will help you comply with limits for uploading and downloading to and from the chosen cloud service. For example, if you want to avoid exceeding the quotas imposed in the free versions of Amazon S3 (5GB) or Google Drive (15GB), you can set a volume size limit.