Math, Shell Style

Prepare calculations and chart results with Bash.

Database Backup Example

MySQL and MariaDB relational databases are very common on Linux servers and desktops. Their content can be dumped to plain text files, which can then be used for any calculation necessary. Of course, the first question that comes to mind is "why not perform those calculations inside the database? Aren't database engines optimized to do just that?"

The answer is yes, relational databases are much faster than Bash, but they can only process their own content! If you need to do anything more than that with your data (from comparing the numbers inside the database with those inside some other file to using the same numbers to decide what to do next), you must do it outside the database, and Bash is an ideal environment for such tasks. Working on plain text backups instead of the tables in the live database has two other advantages. One is that you can compare, at any moment, backups made at different times. The other is that, starting from the same database backup file, you can generate many partial dumps and combine their content in ways that no direct database query could. One application of this approach in a small business would be finding all customers with unpaid balances over some threshold, in order to automatically send each of them a reminder by email with the corresponding invoice attached (see the sketch below). In the same way, a teacher could generate custom exercises for each student from research or financial databases, depending on previous results.
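Here is a minimal sketch of that reminder idea, just to show the shape of such a script. Everything in it is hypothetical: the dump file name, the table layout (email address in the second column, balance in the fourth), and the send-reminder.sh helper that would do the actual mailing. It also assumes no field contains embedded commas.

#!/bin/bash
# Sketch: extract customers whose unpaid balance exceeds a threshold
# from a plain text database dump, then call a (hypothetical) mailer.
THRESHOLD=500

grep 'INSERT INTO `balances`' customers-backup.sql | \
cut -d '(' -f2 | cut -d ')' -f1 | \
awk -F, -v limit="$THRESHOLD" '
  $4 + 0 > limit {
    gsub("\047", "", $2)    # "\047" is the single quote character
    print $2, $4            # email address and amount due
  }' | \
while read -r email amount; do
  ./send-reminder.sh "$email" "$amount"   # hypothetical mailer script
done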

To produce output that is immediately usable for this kind of work, use:

mysqldump -u DBUSER --password=PASSWORD --skip-extended-insert DATABASE > database-backup.sql

This is the simplest approach, because the --skip-extended-insert option dumps each database record on a separate line. (Replace DBUSER, PASSWORD, and DATABASE with your own credentials and database name.) If the database contained only one table called purchaseorders, composed of four columns corresponding to the unique identifier, date, description, and amount of each purchase, the file database-backup.sql would contain lots of lines in this format:

INSERT INTO `purchaseorders` VALUES (44,'2004-10-20','stationery',1855.0000);

Knowing that the database has the above format, you can save in a separate file all the records about stationery purchased in 2004 or 2005 with this pipeline:

grep "'stationery'" database-backup.sql | egrep "'2004-|'2005-" > stationery-expenses-2004-2005.sql

Then extract only the dates and amounts, as follows:

cut -d, -f2,4 stationery-expenses-2004-2005.sql > expense-amounts.txt

Here, I have used the cut command, setting the column delimiter to the comma (-d,) and selecting only the second and fourth columns (-f2,4) to save only dates and amounts inside the file expense-amounts.txt. Now, for example, if I wanted to calculate the average amount of those stationery expenses, I would feed the second column of expense-amounts.txt to the awk program:

awk -F, '{sum+=$2} END { print "Average = ", sum/NR }' expense-amounts.txt

The -F switch tells awk to use commas as column separators instead of whitespace, which is its default. The commands inside braces mean "read the file one line at a time, adding the value in the second column to the sum variable; when the file is finished, print the string Average =, followed by the value of sum divided by the number of records (NR)."
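To see the whole mechanism at work, assume expense-amounts.txt contains only these three records (the dates and amounts are made up for illustration):

'2004-10-20',100.0000);
'2004-11-05',200.0000);
'2005-02-17',300.0000);

When awk converts the second comma-separated field of each line to a number, it silently ignores the trailing ); characters, so the command above prints:

Average =  200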

While the above mix of grep, cut, and awk is not the most efficient, I wanted to show you all three tools working together to give you a good idea of their potential for massive number crunching in shell scripts. Combining these programs, you can extract, reformat, and preprocess, with very little effort on your part, lots of numeric data of any nature. The awk trick above works the same way, for example, when you want to calculate the average size of all the files in the current directory, including all its subfolders. Try it for yourself:

find . -type f -exec ls -l {} \; | awk '{sum+=$5} END { print "Average = ",sum/NR}'
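As a side note, if your find is the GNU version (the default on Linux), you can compute the same average without launching one ls process per file, because GNU find can print each file's size in bytes directly with its -printf action. This variant is GNU-specific, so check your find man page before relying on it:

find . -type f -printf '%s\n' | awk '{sum+=$1} END { print "Average = ", sum/NR }'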

Charts

While I am not sure that a picture is always worth a thousand words, a chart is surely worth much more than a huge, unreadable table full of numbers. The tool I recommend for handling such tasks from shell scripts today is the same one I used in the 1990s: the venerable Gnuplot [14]. The graphs it creates are not the fanciest, but Gnuplot is very well documented and available on almost any operating system, including Linux distributions optimized for tiny servers. Besides, using Gnuplot is really simple (see Figure 1):

  1. Save the numbers to plot in a plain text data file.
  2. Save the plotting commands for Gnuplot in an instruction file.
  3. Tell Gnuplot to execute the instruction file on the data file.

Figure 1: One plain text file of data and one plain text file of commands, both created on the fly with Bash code: Gnuplot could not be simpler to use or more flexible.

Creating the data file on the fly consists of running awk commands similar to the one used in the database example or applying the techniques I presented in this series' previous installments. Concerning plotting commands, unless you have really uncommon needs, 90 percent of the time a quick online search will return commands ready to copy and paste into your script with little or no changes. The result will be spartan, but it will deliver clear charts like Figure 2 (to see how I generated this, see [15]) that your script can then embed in web pages, send by email, or use in any other way desired.

Figure 2: Gnuplot graphs are still adequate for most situations and easy to generate.
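As an illustration of those three steps, here is a minimal sketch. The file names, the monthly totals in the data file, and the PNG size are arbitrary choices of mine; the Gnuplot commands themselves are the basic ones documented on the project site [14]:

#!/bin/bash
# Step 1: create the data file on the fly (the values are made up).
cat > expenses.dat << 'EOF'
2004-10 1855
2004-11 920
2004-12 1430
EOF

# Step 2: create the Gnuplot instruction file.
cat > expenses.gnuplot << 'EOF'
set terminal png size 800,600
set output 'expenses.png'
set xdata time
set timefmt '%Y-%m'
set format x '%Y-%m'
plot 'expenses.dat' using 1:2 with linespoints title 'Monthly expenses'
EOF

# Step 3: tell Gnuplot to execute the instruction file on the data file.
gnuplot expenses.gnuplot

Running the script leaves a chart in expenses.png, ready for your script to embed wherever it is needed.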

The Big Picture

If all you needed to know about "math" in Bash were its arithmetic operators, a tiny cheat sheet would suffice. Since this is not the case, this installment is different from the previous ones: Presenting a single, relatively complex script that solves one specific real-world problem would have been too reductive. Instead, I wanted to introduce, with the minimum amount of working code, approaches and tools suitable for any kind of number-related processing and show how they fit together (see Figure 3).

Figure 3: Bash makes it easy to collect raw data from any conceivable source and pass the data to the right tools to make your computer do all your required numeric processing.

There are two takeaway lessons from this installment. First, it is easy to insert single formulas in Bash scripts, using bc if necessary (see the sketch below). Second, very often a Bash script is the most efficient way to run or prepare calculations on large quantities of numbers of any nature, from any source (including the Internet). The real power of Gnuplot, awk, bc, and so on lies in how you combine them in the same shell script. Give it a try!
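To close with a concrete taste of that first lesson, here is a minimal sketch (with arbitrary numbers) of when plain Bash arithmetic is enough and when bc [7] becomes necessary:

#!/bin/bash
# Bash arithmetic expansion handles integers natively...
sum=600
count=3
echo $(( sum / count ))          # prints 200

# ...but fractional results need bc; scale sets the number of decimals.
average=$(echo "scale=2; 1855.45 / 3" | bc)
echo "Average = $average"        # prints Average = 618.48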

Infos

  1. "Tutorials – Custom Shell Scripts" by Marco Fioretti, Linux Magazine, issue 219, February 2019, pp. 84-88
  2. "Tutorials – Complex Containers" by Marco Fioretti, Linux Magazine, issue 220, March 2019, pp. 84-89
  3. "Tutorials – Shell Flow Control" by Marco Fioretti, Linux Magazine, issue 221, April 2019, pp. 86-91
  4. "Tutorials – Shell Test Conditions and Exit Codes" by Marco Fioretti, Linux Magazine, issue 222, May 2019, pp. 84-88
  5. Genius: http://www.jirka.org/genius.html
  6. Graph: http://www.padowan.dk
  7. bc man page: https://linux.die.net/man/1/bc
  8. Handling time differences: https://stackoverflow.com/questions/4946785/how-to-find-the-difference-in-days-between-two-dates
  9. "Bash Scripting – Arithmetic, Logical, Relational and Bitwise Operators": http://www.yourownlinux.com/2016/12/bash-arithmetic-logical-relational-bitwise-operators.html
  10. "Geographical Distance Between Long and Lat in Bash": http://www.unix.com/shell-programming-and-scripting/250356-geographical-distance-between-long-lat-bash.html
  11. "Calculate Distance and Azimuth": http://www.unix.com/shell-programming-and-scripting/129989-calculate-distance-azimuth.html
  12. "Work the Shell – Calculating the Distance between Two Latitude/Longitude Points" by Dave Taylor, Linux Journal, December 1, 2009: http://www.linuxjournal.com/magazine/work-shell-calculating-distance-between-two-latitudelongitude-points
  13. R: http://www.r-project.org/
  14. Gnuplot: http://www.gnuplot.info
  15. "How to Create Stacked Area Graphs with Gnuplot": http://freesoftware.zona-m.net/how-to-create-stacked-area-graphs-with-gnuplot/

The Author

Marco Fioretti (http://mfioretti.com) is a freelance author, trainer, and researcher based in Rome, Italy. He has been working with free/open source software since 1995 and on open digital standards since 2005. Marco is also a Board Member of the Free Knowledge Institute (http://freeknowledge.eu).
