Easy steps for optimizing shell scripts
Computation
You have several options for totaling numbers stored in a text file. In the example in Listing 14, only the lines containing the string fiftieth are of interest. The script evaluates a file of one million lines, and every fiftieth line contains the string.
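To try the listings yourself, you need a comparable input file. The article does not describe the exact layout of largefile, so the following is only a sketch under assumptions: the function name make_testfile, the seven space-separated fields, the filler words, and the value placed in field 6 are all hypothetical. Only the marker word, its position on every fiftieth line, and the file name come from the text.

```shell
# Hypothetical generator for a test file; the line layout is an assumption.
# Usage: make_testfile [number_of_lines] [file_name]
make_testfile() {
  local lines=${1:-1000000}
  local file=${2:-largefile}
  awk -v n="$lines" 'BEGIN {
    for (i = 1; i <= n; i++) {
      # Every fiftieth line carries the marker word, all others a filler word
      word = (i % 50 == 0) ? "fiftieth" : "other"
      # Field 6 holds the number to be totaled (here simply i modulo 100)
      print "line", i, word, "a", "b", i % 100, "c"
    }
  }' > "$file"
}
```

Calling make_testfile without arguments writes one million lines to largefile in the current directory; a smaller line count is handy for a quick check before the long run.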
Listing 14
Looking for a String
#!/bin/bash
typeset -i sum=0
while read -r line; do
  # Split the line into positional parameters ($1, $2, ...)
  set -- $line
  # Totaling field 6
  sum=$((sum+$6))
# Parse output from grep
done < <(grep " fiftieth " largefile)
echo "Sum total: $sum"
Here, too, you can use Awk as a tool for quick summation (Listing 15). However, Awk is an interpreted language: Its programs are not executed as native machine code, whereas grep is a compiled program that specializes in searching. For this reason, it makes sense to hand the search for the character string to the grep command and then total the matching lines with Awk (Listing 16).
Listing 15
Faster with Awk
# time awk '/ fiftieth / {sum += $6} END {print "Sum total:", sum}' largefile
Listing 16
Fastest: Awk with grep
# time awk '{sum += $6} END {print "Sum total:", sum}' < <(grep " fiftieth " largefile)
In this example, too, optimization achieves significant speed gains. Measured by real time, the variant from Listing 15 reduces the runtime to about one third, and the variant from Listing 16 runs roughly twenty times faster than the first alternative (see Table 3).
Table 3
Awk Timer
Category | Listing 14 | Listing 15 | Listing 16 |
---|---|---|---|
real | 0m4.471s | 0m1.408s | 0m0.231s |
user | 0m2.374s | 0m1.348s | 0m0.050s |
sys | 0m1.956s | 0m0.013s | 0m0.010s |
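Before comparing runtimes such as those in Table 3, it is worth confirming that the variants actually return the same total. The following is a minimal sketch, not part of the original measurements: The function names sum_awk_only, sum_grep_awk, and compare_variants are hypothetical, and the grep variant uses an ordinary pipe instead of the process substitution from Listing 16.

```shell
# Awk alone: search and summation in a single process
sum_awk_only() {
  awk '/ fiftieth / {sum += $6} END {print sum + 0}' "$1"
}

# grep filters the lines, Awk only adds up field 6
sum_grep_awk() {
  grep " fiftieth " "$1" | awk '{sum += $6} END {print sum + 0}'
}

# Report whether both variants agree on the total for the given file
compare_variants() {
  local a b
  a=$(sum_awk_only "$1")
  b=$(sum_grep_awk "$1")
  if [ "$a" = "$b" ]; then
    echo "Both variants agree: $a"
  else
    echo "Mismatch: awk=$a grep+awk=$b" >&2
    return 1
  fi
}
```

In Bash, time is a shell keyword, so a call such as time sum_grep_awk largefile measures the whole function, including the pipe. The absolute figures depend on the machine and on whether the file is already in the page cache.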
Conclusions
The examples in this article show that you can drastically increase the speed of your scripts by skillfully using multifunctional tools such as Awk, Python, or Perl and by avoiding complex constructs built from Tr, Sed, or Grep. In this way, you consistently avoid many context switches.
However, not every anonymous pipe is detrimental to throughput, as the last example shows. Instead, it is more important to use the strengths of the various tools and keep the number of subprocesses as low as possible.
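The point about subprocesses can be illustrated with a small sketch. Both functions below total field 6 of a file; the names sum_with_cut and sum_with_read are hypothetical, and the code assumes that every line has at least six fields separated by single spaces and that field 6 is a plain integer without leading zeros.

```shell
# Variant A: starts a subshell and a cut process for every line
sum_with_cut() {
  local sum=0 line value
  while read -r line; do
    value=$(echo "$line" | cut -d' ' -f6)
    sum=$((sum + value))
  done < "$1"
  echo "$sum"
}

# Variant B: the read built-in splits the fields, no subprocess per line
sum_with_read() {
  local sum=0 f1 f2 f3 f4 f5 f6 rest
  while read -r f1 f2 f3 f4 f5 f6 rest; do
    sum=$((sum + f6))
  done < "$1"
  echo "$sum"
}
```

Both variants return the same total, but variant A creates additional processes for each input line, whereas variant B stays inside the shell. On large files, a single Awk call as in Listing 15 is likely to beat both.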
It is also important to remember that you can improve the readability and thus the maintainability of the scripts by doing without complex chains of commands.