cURLing up to file transfers

Gone Curling

© Lead Image © Oksana Kunka, 123RF.com


Article from Issue 193/2016
Author(s):

cURL is a powerful file transfer tool that is simple to learn but can get you in trouble if you don't keep track of your options.

First released in 1998, cURL (pronounced "see URL" or "curl") [1] is a relatively new command-line tool. However, it quickly became a standard tool as the need to transfer data between servers or from Internet sites grew. Moreover, downloading a script directly from a URL has become a common way to install software written in interpreted programming languages like Python. Today, cURL is included in the default installation of most Linux distributions. Functionally, it resembles wget [2], although the two commands differ in syntax.

The basic structure of curl is simple, but it can become complex as you add options for the many transfer protocols it supports. Even the command's man page notes that "the number of features will make your head spin!" With support for proxies, user authentication, FTP upload, HTTP POST, SSL connections, cookies, and resuming interrupted transfers, the complexity of its options soon becomes apparent.

Fortunately, many of cURL's options are designed for specific protocols in specific circumstances and are often unnecessary. Moreover, some of the supported transfer protocols, like GOPHER and TELNET, are frankly obsolete. Most users are likely to use the default HTTP, HTTPS, or the SSH-based SCP and SFTP protocols, with occasional forays into FTP and one or two others.

Downloading and Uploading

Like any command that involves copying, cURL requires a data source and a target: standard output if no target is specified, or a specifically named file. As the command's name implies, the data source is a URL, a string that starts with the name of the protocol (such as http://), followed by an address.

If you do not specify a protocol, cURL will try to guess a suitable one: it defaults to HTTP, but it assumes FTP for host names that begin with ftp. Consequently, when you use a public Internet site, which almost certainly uses HTTP or HTTPS, you can usually enter an abbreviated form of the URL (Figure 1); for example:

curl designingwithlibreoffice.com
Figure 1: On the Internet, you can usually enter an abbreviated command and leave cURL to figure out the transfer protocol.

However, in some cases, you need to specify a transfer protocol. Because cURL falls back to plain HTTP, this is especially likely if you want the secure HTTPS version of a site, which you can request with either of these commands:

curl https://designingwithlibreoffice.com
curl --request GET 'https://designingwithlibreoffice.com'

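The quotation marks are not part of the --request option; they protect the URL from the shell, which matters when a URL contains characters such as & or ?. Also, GET is cURL's default method for HTTP, so --request GET (-X GET) is redundant here.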

Both of these commands print the page's HTML source in the terminal below the command, which is convenient when you want to check information without saving it, although you might want to pipe the output through less for easier viewing. To save the download instead, specify a target file with --output (-o):

curl -o /home/bb/home.html http://designingwithlibreoffice.com

This command saves the web page to the file /home/bb/home.html. If the target path includes directories that do not exist yet, add --create-dirs, and cURL will create them. Alternatively, --remote-name (-O) saves the download under the file name it has on the server, in the current directory.
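The saving options above can be tried without a network connection. This minimal sketch assumes that a local file served through a file:// URL can stand in for the article's web page; the scratch directory created with mktemp and every path under it are hypothetical.

```shell
# Hypothetical scratch area; a local file:// URL stands in for a web page.
work=$(mktemp -d)
mkdir -p "$work/site"
printf '<html>home</html>\n' > "$work/site/home.html"

# -o names the target file; --create-dirs creates saved/pages/ first.
curl -s --create-dirs -o "$work/saved/pages/home.html" \
  "file://$work/site/home.html"

# -O keeps the file name from the URL (home.html) and writes it to the
# current directory, so run it from inside the download folder.
mkdir -p "$work/downloads"
(cd "$work/downloads" && curl -s -O "file://$work/site/home.html")
```

With a real site, you would swap the file:// URL for an http:// or https:// one; the -o, --create-dirs, and -O options behave the same way.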

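Uploading uses a similar structure, except that --upload-file FILE (-T FILE) replaces -o FILE. With an HTTP or HTTPS URL, -T sends the file with a PUT request automatically, so you do not need --request PUT. To send data with an HTTP POST instead, use --data DATA (-d), which implies POST by itself; prefix a file name with @ (as in -d @notes.txt) to send a file's contents. Because -d strips carriage returns and newlines from a file it reads, use --data-binary for binary uploads.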
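Here is a rough sketch of -T in action. To keep it runnable without a server, it assumes uploading to a file:// URL is an acceptable stand-in for an HTTP upload. The scratch paths are hypothetical, and so are the commented example.com commands, which are not executed.

```shell
work=$(mktemp -d)
mkdir -p "$work/server"
printf 'notes for upload\n' > "$work/notes.txt"

# -T (--upload-file) sends a local file to the URL. Against an http(s)://
# URL this becomes a PUT request; file:// is used here only so the
# sketch runs without a server.
curl -s -T "$work/notes.txt" "file://$work/server/notes.txt"

# Hypothetical HTTP equivalents (example.com is a placeholder, not run here):
#   curl -T notes.txt https://example.com/upload/notes.txt
#   curl --data-binary @photo.png https://example.com/upload
```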

In both downloading and uploading, cURL displays a progress meter until the operation is complete, as long as the output is going to a file rather than to the terminal. Alternatively, -# (--progress-bar) displays a simpler bar made of hash marks (Figure 2). To show neither, add -s (--silent), but beware that it also mutes cURL's error messages. To keep errors visible while staying otherwise quiet, add -S (--show-error) as well, as in -sS.
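This short sketch contrasts -s and -sS. It assumes a missing local file behind a file:// URL is a fair way to provoke an error offline; all paths are hypothetical. The failed transfer still sets a non-zero exit status, which scripts can test even when output is silenced.

```shell
work=$(mktemp -d)
printf 'data\n' > "$work/src.txt"

# -s silences the progress meter; -S restores error messages.
curl -sS -o "$work/copy.txt" "file://$work/src.txt"

# A deliberately missing source: with -sS the error text still reaches
# stderr (captured in err.log), and the exit status is non-zero.
status=0
curl -sS -o "$work/missing.txt" "file://$work/does-not-exist.txt" \
  2> "$work/err.log" || status=$?
echo "curl exit status: $status"
```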

Figure 2: cURL displays either a detailed progress meter or a simpler bar while operating.

cURL can transfer multiple files in one command if you list the URLs separated by spaces. However, to save each one to a file, you must repeat the -o option (or -T when uploading), giving a unique file name for each target. cURL pairs the first -o with the first URL, the second with the second, and so on. Any URL left without its own -o is written to standard output. For example:

curl -o /home/bb/home.html http://designingwithlibreoffice.com/home.html -o /home/bb/contact.html http://designingwithlibreoffice.com/contact.html
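The pairing of -o options with URLs can be checked offline. This sketch assumes local file:// URLs as stand-ins for the two pages; the file names are hypothetical. The second command leaves one URL without an -o to show where that output goes.

```shell
work=$(mktemp -d)
printf 'home\n' > "$work/home.html"
printf 'contact\n' > "$work/contact.html"

# Each -o pairs with one URL, in order: the first -o with the first URL,
# the second -o with the second URL.
curl -s -o "$work/saved-home.html" "file://$work/home.html" \
        -o "$work/saved-contact.html" "file://$work/contact.html"

# Only one -o for two URLs: the second URL goes to standard output,
# which is redirected into stdout.txt here.
curl -s -o "$work/only-first.html" "file://$work/home.html" \
        "file://$work/contact.html" > "$work/stdout.txt"
```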

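cURL also supports a simple form of URL globbing (not full regular expressions), which expands one URL into several. For example, braces hold a comma-separated list of alternatives:

curl 'http://designingwithlibreoffice.com/{home,contact,review}.html'

Quoting the URL keeps the shell from expanding the braces itself.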
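Brace globbing works together with -o. In the output name, #1 stands for whatever text the first glob matched, so one command can save each page under its own name. In this sketch, local file:// URLs and the scratch paths are hypothetical stand-ins for the site.

```shell
work=$(mktemp -d)
for page in home contact review; do
  printf '%s\n' "$page" > "$work/$page.html"
done
mkdir -p "$work/saved"

# The URL is quoted so that curl, not the shell, expands the braces.
# "#1" in the -o name is replaced by the alternative that matched.
curl -s -o "$work/saved/#1.html" "file://$work/{home,contact,review}.html"
```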

Alternatively, if targets are named in sequence, you can put a range of numbers, or of upper- or lowercase letters, in square brackets:

curl 'http://designingwithlibreoffice.com/images[1-100].png'
curl 'http://designingwithlibreoffice.com/images[a-z].png'
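A bracket range can also be combined with #1 so that each numbered file is saved separately. This is a small offline sketch; it assumes local file:// URLs stand in for the images, and the file names are hypothetical.

```shell
work=$(mktemp -d)
for n in 1 2 3; do printf 'image %s\n' "$n" > "$work/images$n.png"; done
mkdir -p "$work/saved"

# [1-3] expands to images1.png, images2.png, and images3.png; "#1" in the
# -o name reuses whichever number matched. cURL also accepts zero-padded
# ranges such as [001-100] and steps such as [1-100:10].
curl -s -o "$work/saved/images#1.png" "file://$work/images[1-3].png"
```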

If a source or target is password protected, cURL cannot complete the transfer unless you supply credentials. If you give only --user USER (-u), cURL prompts for the password, which also keeps the password out of your shell history. Alternatively, --user USER:PASSWORD supplies both at once. Because cURL splits the value at the first colon, the username itself cannot contain a colon (:), although the password can. If a private SSH or SSL key is protected by a passphrase, --pass PHRASE supplies it. Similarly, if you need to log in to a proxy, add the option --proxy-user USER:PASSWORD (-U).

Other options control how cURL carries out a command. For instance, you can set a maximum file size to download with --max-filesize BYTES and a maximum time for the whole operation with --max-time SECONDS. Separately, --connect-timeout SECONDS limits only how long cURL waits while establishing the connection. If a transfer is interrupted, --continue-at OFFSET (-C) tells cURL to resume from the given byte offset, and -C - lets cURL work out the offset itself from the size of the partially downloaded file. Afterward, check that the resumed file is complete, for example by comparing its size or checksum with the original.
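The resume behavior can be simulated offline. This sketch assumes cURL's file:// support honors -C - the way an HTTP or FTP download would. The truncated partial.bin represents an interrupted download, and all names are hypothetical. The timeout values are arbitrary examples; --connect-timeout has little effect on a local file.

```shell
work=$(mktemp -d)
printf '0123456789' > "$work/big.bin"
# Simulate an interrupted download: only the first 5 bytes arrived.
printf '01234' > "$work/partial.bin"

# "-C -" makes curl read the existing output file's size (5 bytes) and
# fetch only the rest, appending it. --connect-timeout limits connection
# setup; --max-time caps the whole transfer.
curl -s -C - --connect-timeout 10 --max-time 60 \
  -o "$work/partial.bin" "file://$work/big.bin"
```

Comparing the result with the source, as the test below does with cmp, is one way to carry out the completeness check mentioned above.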

When using FTP or SFTP, you also have the --quote COMMAND (-Q) option, with which you can send a command to the server once you have made a connection. With FTP, cURL passes raw FTP commands, such as DELE or MKD, straight to the server. With SFTP, cURL interprets about a dozen commands itself, including chgrp, chmod, chown, mkdir, pwd, rename, rm, and rmdir. All of these can help you manipulate files on the server and continue an operation.

Transfer Protocol Options

Many of cURL's options select a particular version of a protocol, such as --http1.1, --http2, or --tlsv1.2. These options are mostly useful if normal use of cURL fails, such as when a server does not support the version cURL tries by default or cURL fails to identify the protocol. Such failures are most likely when a command has multiple entries and you need to switch from one protocol to another.

On the Internet, you may never need most of the available options, because cURL defaults to HTTP, the most widely used transfer protocol. The main exception may be --location (-L), which tells HTTP- or HTTPS-based operations to follow a redirection when a URL has moved. cURL also has a --location-trusted option, which sends your user name and password to whatever host the redirection points to. Use it only if you trust every site the redirection could lead to.

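For SSH-based transfers (SCP and SFTP), the most useful option is probably --pubkey KEY, which gives the path to your public key file (--key gives the matching private key). For FTP, IMAP, POP3, SMTP, and similar protocols, --ssl asks cURL to upgrade the connection to SSL/TLS, but it carries on unencrypted if the server does not support encryption. Use --ssl-reqd instead if the transfer must not continue without encryption.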

However, the transfer protocol to watch most closely is FTP, and not just because it is aging and not as secure as SFTP or HTTPS. If you are using FTP, my advice is to refer to the man page constantly. Often, FTP will be an exception to an option or behave somewhat differently than other protocols. If you must use FTP, note that passive mode (--ftp-pasv) is already cURL's default and mainly helps with firewalls rather than security. To encrypt the session, add --ssl-reqd so that cURL either uses FTPS or refuses to continue. Even then, remember that security is relative.

A Final Caution

You can learn cURL's basics in 20 minutes. However, as the command structure balloons, take extra time to check the syntax, especially when multiple files or operations are included in a single command.

The trouble is that, when an option appears more than once, cURL uses the last occurrence. In addition, most options apply to every URL in the command rather than only to the one that follows them. This behavior, while logical, can lead to unexpected results. For example, when multiple files are listed, a file can pick up options you meant only for another. A reused or missing -o can send a download to the wrong place or overwrite a file saved earlier. Similarly, if you are not careful, the same behavior can result in cURL suddenly using a transfer protocol or behavior that you did not intend, possibly creating security and functional problems. When different URLs need different options, you can separate them with --next (-:).
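One way to keep per-URL options apart is --next, which starts a fresh set of options for the URLs that follow it. This is a minimal offline sketch; the file:// URLs and file names are hypothetical stand-ins for real downloads.

```shell
work=$(mktemp -d)
printf 'one\n' > "$work/one.txt"
printf 'two\n' > "$work/two.txt"

# Options before --next apply only to the first URL; after --next, the
# second URL gets its own options (here, its own -s and -o).
curl -s -o "$work/first.txt" "file://$work/one.txt" \
     --next \
     -s -o "$work/second.txt" "file://$work/two.txt"
```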

The best solutions are either to keep your uses of cURL simple or to be as specific as possible, giving paths in full and checking the syntax before pressing the Enter key. Without one of these tactics, cURL can easily seize control and switch from a useful servant to a capricious master.
