Cloud backup with MCrypt and S3cmd
Pay for What You Use
Because you have already handed over your sacred AWS access credentials, you can also control other AWS services with S3cmd, and that includes the superb "pay for what you use" service called CloudFront [5]. If you're unfamiliar with Content Delivery Networks (CDNs), CloudFront is undeniably both powerful and affordable. You can use S3cmd to query, create, and modify several of CloudFront's functions.
I'll take an extremely brief look at AWS's widely used CDN integration with S3cmd. Each instance of a CDN configuration assigned to your account is referred to as a "distribution" (probably because it efficiently distributes your data around the globe). You can display a list of your configured distributions with the command:
# s3cmd cflist
If you want a bit more detail about the configured parameters for all your distributions, use:
# s3cmd cfinfo
You can also query a specific distribution and its parameters by referencing its distributionID
as follows (adding your own ID accordingly):
# s3cmd cfinfo cf://<distributionID>
You can also use cfcreate, cfdelete, and cfmodify to create, delete, and change a distribution without messing about with web interfaces.
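As a rough sketch (the bucket name here is purely a placeholder of my own, and the exact options vary between S3cmd versions), creating a distribution for an existing bucket and later removing it again by its ID might look something like this:
# s3cmd cfcreate s3://my-backup-bucket
# s3cmd cfdelete cf://<distributionID>
Note that CloudFront generally insists that a distribution be disabled before it will let you delete it.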
Please, Sir, I Want Some More
The list of S3cmd features is really comprehensive for such a diminutive utility. It includes a --force overwrite option, which should be used with great care, and a very useful --dry-run flag, which displays the files that would be uploaded or downloaded without actually transferring anything. If you're ever worried about breaking things horribly by getting a regex wrong, you will appreciate the --dry-run feature.
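As a hypothetical example (the local path and bucket name are my own placeholders), you could rehearse a recursive upload before committing to it:
# s3cmd put --dry-run --recursive /home/chris/backups/ s3://my-backup-bucket/
Nothing is transferred; S3cmd simply lists the operations it would have performed.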
The useful --continue option only works with downloads, but it should, in theory at least, resume a partially downloaded file so you don't have to start a big download again from scratch. It's fair to say that HTTP resume has been around for a while, so this capability is hardly new or earth-shattering, but it is still a nice and much-needed touch.
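For example (again with an invented bucket and filename), resuming an interrupted download of a large archive might look like this:
# s3cmd get --continue s3://my-backup-bucket/big-archive.tar.gz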
Stop me if I have already mentioned the -r parameter (otherwise known as --recursive), which works on uploads, downloads, and deletions if you want to affect subdirectories. Use this with caution if you want to avoid incurring massive data transfer fees or accidental deletions.
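To illustrate (the bucket and path are placeholders), a cautious recursive deletion could be rehearsed with --dry-run first and only then run for real:
# s3cmd del --dry-run --recursive s3://my-backup-bucket/old-backups/
# s3cmd del --recursive s3://my-backup-bucket/old-backups/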
Reduced Cost
If you are storing large amounts of data, you might be concerned about minimizing expenses. Apparently (and please be warned that specifications, configurations, and procedures change frequently with emerging technologies, so don't take this as gospel), the standard repository for an uploaded Amazon S3 file spans three data centers. In other words, your file is copied three times across three geographically disparate buildings.
If you can live with your data being held in only two data centers, you can opt for "Reduced Redundancy," and AWS will lower its storage fees accordingly. The theory is that these files won't be as critical to you; therefore, you might tolerate losing one or two. Perhaps you will have local backups available, or maybe the files have a limited shelf life.
The --rr switch, or --reduced-redundancy in longhand, allows you to instruct the clever S3cmd that you need to watch your costs and only use two data centers for storage.
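As a quick hypothetical example (the filename and bucket are placeholders), a non-critical weekly snapshot could be uploaded with reduced redundancy like so:
# s3cmd put --rr weekly-snapshot.tar.gz s3://my-backup-bucket/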