A download manager for the shell
Convenient Downloads
A few lines of shell code and the Gawk scripting language make downloading files off the web a breeze.
Almost everyone downloads data from the Internet. In most cases, you use the web browser as the interface. Usually, websites offer files for download that you can click on individually in the browser. By default, the files usually end up in your Downloads/ folder. If you need to download a large number of files, manually selecting the storage location can quickly test your patience. Wouldn't it be easier if you had a tool to handle this job?
In this article, I will show you how to program a convenient download manager using Bash and a scripting language like Gawk. As an added benefit, you can add features that aren't available in similar programs, such as the ability to monitor the disk capacity or download files to the right folders based on their extensions.
Planning
Programming a download manager definitely requires good planning and a sensible strategy. Obviously, the download manager should download files, but it should also warn you when the hard disk threatens to overflow due to the download volume. Because most files have an extension (e.g., .jpg, .iso, or .epub), the download manager also can sort your files into appropriately named folders based on the extensions.
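As a rough illustration of the sorting idea, the following sketch derives a target folder from a file name's extension. The function name target_dir and the fallback folder misc are assumptions made for this example; they are not part of the script developed in this article.

```shell
#!/bin/bash
# Hypothetical helper: print the folder a file would be sorted into.
# $1 = base directory, $2 = file name or URL
target_dir() {
  base=$1
  name=${2##*/}              # strip any leading path
  case $name in
    *.*) ext=${name##*.} ;;  # text after the last dot
    *)   ext=misc ;;         # assumed fallback for names without a dot
  esac
  # Lowercase the extension so that JPG and jpg share one folder
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  printf '%s/%s\n' "$base" "$ext"
}

target_dir ~/Downloads holiday.JPG
```

Note that this simple version only looks at the text after the last dot, so a name ending in .tar.gz would land in a gz folder.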
To build your own download manager, you will need two command-line programs that do not come preinstalled on modern Linux distributions by default: xclip and lynx. The xclip command-line tool monitors the clipboard, while lynx functions as a web browser for the terminal. Lynx also comes with several command-line options, making it usable as a link spider that offers many link display options.
During planning, the basic thing you need to consider is how to pass the download page's URL and links into the terminal. xclip and lynx can help you do this. This project also includes functions that handle specific subtasks, such as capturing and selecting the links. Even if you don't yet know what these functions will contain, I recommend creating a script that includes empty functions as placeholders for the time being (see Listing 1).
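How the two tools could hand the URL and the links to the terminal might look like the following sketch. The function names read_url, list_links, and is_url are hypothetical, and the sketch assumes that you copied the URL with Ctrl+C in an X session, so that it sits in the CLIPBOARD selection.

```shell
#!/bin/bash
# Print the current clipboard content (assumed to be the page URL)
read_url() { xclip -o -selection clipboard; }

# Print all links found on page $1, one per line, without numbering
list_links() { lynx -dump -listonly -nonumbers "$1"; }

# Succeed only if $1 looks like an http or https URL
is_url() {
  case $1 in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage:
#   url=$(read_url)
#   is_url "$url" && list_links "$url"
```

Checking the clipboard content before passing it to lynx avoids starting a download run with whatever text happened to be copied last.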
Listing 1
Script Framework Without Functions
#!/bin/bash
function capture () { :; }
function splitter () { :; }
function rename () { :; }
function download () { :; }
function menu () { :; }
The framework in Listing 1 serves as a basis for building the individual functions one by one. Focus on a single function first rather than on the big picture. For each function, you need to consider, separately, what tasks the function handles, how many and what kind of parameters the function needs, and whether there are any return values.
For convenience, you can outsource parts of the script and then include the source code using dot or source notation such as
. outsourcedFunction
By making these functions as abstract as possible and keeping them independent of the script, I was able to include a function for renaming files in other scripts without needing to modify them.
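A minimal sketch of the idea: The example writes a small library file, includes it with dot notation, and calls the function it defines. The function sanitize_name and its behavior are invented for this illustration and are not the rename function from the script.

```shell
#!/bin/bash
# Write a hypothetical library file containing one reusable function
lib=$(mktemp)
cat > "$lib" <<'EOF'
# $1 = file name; prints the name with spaces replaced by underscores
sanitize_name() { printf '%s\n' "$1" | tr ' ' '_'; }
EOF

# Include the file with dot notation ("source" is the Bash synonym)
. "$lib"
sanitize_name "my holiday photo.jpg"   # prints my_holiday_photo.jpg
rm -f "$lib"
```

Because the function takes its input as a parameter and writes its result to standard output, any other script can source the same file without changes.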
In abstract terms, a function takes no parameters, one parameter, or multiple parameters and eventually returns something, regardless of whether you use the function in this script or in a completely different context.
Fundamentals
Listing 2 shows a couple of basic things you need to do for the download manager script: declare basic variables that will determine the program flow later on and store file types in an array.
Listing 2
Basic Variables
VERBOSE=true
LYNX="$(which lynx)"
XCLIP="$(which xclip)"
download_directory=~/Downloads7
filetypes=(jpg jpeg png tiff gif bmp swf svg)
filetypes+=(mp4 mp3 mpg mpeg vob m2p ts mov avi wmf asf mkv webm 3gp flv)
filetypes+=(gzip zip tar gz tar.gz 7zip)
filetypes+=(pdf doc xlsx odt ods epub txt mobi azw azw3)
filetypes+=(iso dmg exe deb rpm)
filetypes+=(java kt py sh zsh)
filetypes+=($(echo ${filetypes[@]} | sed -r 's/.+/\U&/'))
free=$(df /home | gawk 'NR == 2{print $4}')
Because you can download a variety of different file types off the web, it is up to you whether or not you add any file types to the array and what structure you choose. However, I recommend keeping some kind of order, for example, by putting the graphic files in one line in your script and video or other file types in another line. The last line records the available storage space in a variable.
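How the recorded free space could later be used for a warning is sketched below. The function name fits_on_disk and the reserve of 100MiB are assumptions for this example; the sketch also assumes that the free space is given in KiB, which is what df -k reports.

```shell
#!/bin/bash
# $1 = file size in bytes, $2 = free space in KiB
# Succeeds if the file fits while leaving an assumed reserve of 100 MiB.
fits_on_disk() {
  size_kib=$(( ($1 + 1023) / 1024 ))   # round up to whole KiB
  reserve_kib=$(( 100 * 1024 ))
  [ $(( size_kib + reserve_kib )) -le "$2" ]
}

# -P forces the one-line-per-filesystem POSIX format, -k reports KiB
free_kib=$(df -Pk / | awk 'NR == 2 {print $4}')

if fits_on_disk 1048576 "$free_kib"; then
  echo "A 1 MiB download fits."
else
  echo "WW: not enough free space."
fi
```

The -P option is used here so that long device names cannot wrap the df output onto a second line and shift the column that awk reads.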
Listing 3 contains two functions that your download manager will need to output messages if the VERBOSE variable is set to true. Also, notice the two if statements that check for the presence of lynx and xclip. If either tool is missing, the script outputs an error message and terminates with exit 1. If the download directory does not exist, the script creates it in the last line. If overly verbose warnings or error messages bother you, set VERBOSE to false.
Listing 3
Outputting Error Messages
# Print warnings and errors only if VERBOSE is true
function warn () { if ${VERBOSE}; then echo "WW: ${1}"; fi; }
function err () { if ${VERBOSE}; then echo "EE: ${1}"; fi; }

# Abort if one of the required tools is missing
if [ -z "${LYNX}" ]; then
  err "Lynx not available on the system."
  err "Cancel."
  exit 1
fi
if [ -z "${XCLIP}" ]; then
  err "Xclip not available on the system."
  err "Cancel."
  exit 1
fi

# Create the download directory if it does not exist yet
[ ! -e "${download_directory}" ] && mkdir -p "${download_directory}"
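If you prefer a single reusable check over two nearly identical if blocks, a variant along the following lines is conceivable. The function name require_tool is hypothetical, and unlike the err function, this sketch always writes its message to standard error, regardless of VERBOSE. It relies on the shell built-in command -v instead of which.

```shell
#!/bin/bash
# $1 = program name; prints its path, or fails with a message on stderr
require_tool() {
  command -v "$1" || {
    echo "EE: $1 not available on the system." >&2
    return 1
  }
}

# Possible usage in the script:
#   LYNX=$(require_tool lynx) || exit 1
#   XCLIP=$(require_tool xclip) || exit 1
```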
Functions
It makes sense to follow the same approach for functions as you did for the variables. You should design functions so that they can coexist independently in this and any other scripts.
If it is not immediately apparent from a function how many parameters it takes and what it returns, you need to add a comment describing that. The capture function takes care of capturing the links from the web page and storing them. Listing 4 first creates three arrays that the functions fill with values. The first array stores and processes the URLs.
Listing 4
Functions
01 declare -a download_links
02 declare -a indexed_downloads
03 declare -a indexed_indexes
04
05 function capture () {
06   lynx_options="-dump -listonly -nonumbers" # further potential options
07   lynx_command="lynx $lynx_options $url" # -hiddenlinks=[option], -image_links
08   grep_searchstring="http.+($(sed 's/ /|/g' <<<${filetypes[@]}))$"
09   grep_command="grep -Eoi $grep_searchstring"
10   download_links=(`$lynx_command | $grep_command`)
11   for x in ${download_links[@]}; do
12     file_size=$(wget --spider $x 2>&1 | gawk -F " " '/Length/{print $2}')
13     while true; do
14       [ -z ${indexed_downloads[$file_size]} ] &&
15       indexed_downloads[$file_size]=$x &&
16       break || (( file_size++ ))
17     done
18   done
19   indexed_indexes=(${!indexed_downloads[@]})
20 }
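The search pattern from line 08 of Listing 4 can be tried out in isolation. The following sketch uses a shortened file type array and a few made-up links in place of the lynx output. As a deliberate deviation from the listing, it joins the array via IFS instead of sed and requires a literal dot in front of the extension.

```shell
#!/bin/bash
filetypes=(jpg png iso epub)

# Join the array with "|" to form an alternation: jpg|png|iso|epub
pattern="http.+\.($(IFS='|'; echo "${filetypes[*]}"))\$"

# Hypothetical links, standing in for the output of lynx
links='https://example.com/index.html
https://example.com/img/cat.JPG
https://example.com/linux.iso'

# -E extended regex, -o print only the match, -i ignore case
grep -Eoi "$pattern" <<<"$links"
```

Of the three sample links, only the JPG and the ISO file pass the filter, because grep ignores case and the HTML page does not end in one of the listed extensions.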
Because I want the script to arrange the downloads by size, wget uses the --spider option (in line 12) to discover the size. Then the indexed_downloads array captures the downloadable file, using the file size as the index and the name of the download itself as the value (line 15). This avoids the typical indexes (0, 1, 2, 3, and so on) for the array, and instead the file size gives you, say, 233, 1004, 780, and so on, which Bash prints in ascending order of size when listing all indexes. This also happens in line 19, where the indexed_indexes array stores the file sizes.
Later, you will see that the potential downloads appear in ascending order of size. Occasionally, two files are exactly the same size. However, the while loop in lines 13 to 17 handles this problem: If an index is already taken, the script increases the displayed size by one (virtual) byte until it finds a free index.
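The behavior described here can be reproduced with a few lines of Bash. The sketch below uses made-up sizes and URLs instead of real wget results, and the names by_size and add_download are chosen for this example only.

```shell
#!/bin/bash
declare -a by_size

# $1 = size in bytes, $2 = URL
# If the index is already taken, bump it by one until a free one is found.
add_download() {
  local size=$1
  while [ -n "${by_size[$size]}" ]; do
    (( size++ ))
  done
  by_size[$size]=$2
}

# Hypothetical sizes and URLs, added in arbitrary order
add_download 1004 https://example.com/b.iso
add_download 233  https://example.com/a.jpg
add_download 233  https://example.com/c.jpg
add_download 780  https://example.com/d.pdf

# Bash lists the indexes of an indexed array in ascending order
for size in "${!by_size[@]}"; do
  echo "$size ${by_size[$size]}"
done
```

The loop prints the entries in the order 233, 234, 780, 1004. The second file of 233 bytes ends up under index 234, which is the one virtual byte mentioned above.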