Process structured text files with Miller

Ordering and Sorting

You can use the Miller commands group-by, group-like, head, join, rename, reorder, sample, sort, tail, and uniq to sort and group your data.

The example from Figure 7 demonstrates the difference between the commands group by and join (Listing 3, lines 1 and 2). group by groups identical lines; join returns only matching lines. With join, you need to specify the -u option in order to handle arbitrarily sorted files.

Figure 7: The group by and join commands let you summarize lines with the same content.

Listing 3

group by and join

$ mlr --csv --rs lf group-by Name csv3.txt csv4.txt
$ mlr --csv --rs lf join -u -j Name -f csv3.txt csv4.txt
$ mlr --csv --rs lf head -n 1 csv3.txt
$ mlr --csv --rs lf tail -n 1 csv3.txt

Keep in mind that both methods require you to specify the columns on which the output is based. Miller always outputs the headers. You need to know that the headers are present for statements that reference specific line numbers, such as the head and tail commands in the last two lines of Listing 3, the result of which appears in Figure 8. If the headers interfere with downstream processing, you might be better off with using the classic Unix head and tail commands instead of Miller.

Figure 8: Miller always outputs the headers.

If you using rename to change the name of the column, you will need to enter the identifier in the following form: oldname,newname. Renaming works for several identifiers at the same time, but sometimes the command looks confusing, because you need to specify the names without breaks, separated only by commas.

Make sure the number of old and new column identifiers gives you an even number – if so, you have not forgotten anything. The command does not change the field names in the file header; the new names only appear in the output. The first line of Listing 4 shows an example of renaming, where the --opprint option gives the output a cleaner appearance (Figure 9).

Listing 4

Renaming

$ mlr --csv --rs lf --opprint rename name,last_name,amount,annual_charge csv3.txt
$ mlr --csv --rs lf reorder -f amount,name,first_name csv3.txt
$ mlr --csv --rs lf sort -f name,first_name csv3.txt csv4.txt
$ mlr --csv --rs lf --opprint uniq -c -g name,first_name,amount csv3.txt csv4.txt

Figure 9: If necessary, you can change the names of the columns and their order. In this case, the --opprint option provides a clearer overview.

To change the column position, use the reorder command. If you specify all existing column names (Listing 4, line 2), the output occurs in the desired order. Miller otherwise puts the changes at the end or beginning.

Using the sort and uniq commands, you can influence the sequence of records or lines. Whereas sort and group-by output equivalent lines in succession, uniq summarizes them (Listing 4, lines 3 and 4).

For the example with uniq, I used the -c option, which gives you a frequency count of the identical rows (Figure 10).

Figure 10: If you need the records in a particular order, without duplicates, use the Miller commands sort and uniq.

Miller Computes

For calculations and statistical analyses, Miller offers the commands stats1, step, and top. You can use stats1 to perform statistical evaluations. The commands in Listing 5 determine the total, the number, the average, the minimum, and the maximum of the amount column; Figure 11 shows the lines of code with the results.

Listing 5

Statistics

$ mlr --csv --rs lf stats1 -a sum -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf stats1 -a count -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf stats1 -a mean -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf stats1 -a min -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf stats1 -a max -f amount csv3.txt csv4.txt

Figure 11: Miller can count and perform simple statistical calculations.

Using step gives you evaluations for each line. The accrued sum per line is given by rsum. You can output the difference between two consecutive lines with delta and the ratio with ratio.

counter gives you a running count of the lines. from-first outputs the difference between the first and the nth line. For examples of such commands, see Listing 6 and Figure 12.

Listing 6

More Stat Tricks

$ mlr --csv --rs lf --opprint step -a rsum -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf --opprint step -a delta -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf --opprint step -a ratio -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf --opprint step -a counter -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf --opprint step -a from-first -f amount csv3.txt csv4.txt

Figure 12: Using Miller operators for linewise calculations.

top lets you determine the highest numeric value in a column. To output the full line with the result, use the -a option. --min gives you the minimum instead. If you want to output several lines by value, use -n lines (Listing 7); Figure 13 shows the process flow.

Listing 7

High and Low

$ mlr --csv --rs lf top -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf top -a -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf top --min -f amount csv3.txt csv4.txt
$ mlr --csv --rs lf --opprint top -n 3 -a -f amount csv3.txt csv4.txt@LI:§§nonumber

Figure 13: Output the highest or lowest value in a column.

Descriptive Data

Use the bar command to visualize numeric values. The options --connect (start display), --hi (maximum value), and -w (number of positions in display) make sure the results fit on the screen or paper (Listing 8). Figure 14 shows examples of simple charts.

Listing 8

Graphing with bar

$ mlr --csv --rs lf --opprint bar --lo 0 --hi 100 -f amount csv3.txt
$ mlr --csv --rs lf --opprint bar -c "*" --lo 20 --hi 50 -w 50 -f amount csv3.txt csv4.txt

Figure 14: The bar command creates simple bar charts from the data.

Depending on the application, you can output the column either as a graphic with bar or with its numeric value, but not both. To get around this problem, use the data twice with different names for the columns.

Listing 9 shows an example that determines the usage of the folders in the home directory and outputs the output file usage.txt twice, once in the usage(MB) column, and additionally in the value column. A different separator is also used for the fields. Then, the script uses Miller to evaluate the resulting file. An example of the output file is shown in Listing 10.

Listing 10

Output of Disk Usage

VZ-Name:usage(MB):value
jefe:108:108
tmp:1:1

Listing 9

Disk Usage

#! /bin/sh
# Determine disk usage...
echo "VZ-Name:usage(MB):value" > /home/tmp/belegung.txt "
cd /home
for i in $(ls -1 /home); do
  # ... and save as CSV
  k=$(du -shm $i | cut -f1)
  echo $i:$k:$k >> /home/tmp/usage.txt
done
# Format output
mlr --csv --ifs : --rs lf --opprint bar -c "*" --lo 0 --hi 500 -w 50 -f value /home/tmp/usage.txt

« Previous 1 2 3 Next »

Buy this article as PDF

Express-Checkout as PDF

Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES

Print Issues

Digital Issues

SUBSCRIPTIONS

Print Subs

Digisubs

TABLET & SMARTPHONE APPS

US / Canada

UK / Australia

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News

TUXEDO Computers Unveils Linux Laptop Featuring AMD Ryzen CPU

Games , Hardware , laptop , Linux

This latest release is the first laptop to include the new CPU from Ryzen and Linux preinstalled.
XZ Gets the All-Clear

Arch Linux , Fedora , Linux , open source , Security , Ubuntu

The back door xz vulnerability has been officially reverted for Fedora 40 and versions 38 and 39 were never affected.
Canonical Collaborates with Qualcomm on New Venture

Artificial Inte... , Linux , open source , Security , Ubuntu

This new joint effort is geared toward bringing Ubuntu and Ubuntu Core to Qualcomm-powered devices.
Kodi 21.0 Open-Source Entertainment Hub Released

audio , Multimedia , Music , open source , streaming video , Video

After a year of development, the award-winning Kodi cross-platform, media center software is now available with many new additions and improvements.
Linux Usage Increases in Two Key Areas

Games , Linux , open source , Steam

If market share is your thing, you'll be happy to know that Linux is on the rise in two areas that, if they keep climbing, could have serious meaning for Linux's future.
Vulnerability Discovered in xz Libraries

Fedora , Linux , malware , Security

An urgent alert for Fedora 40 has been posted and users should pay attention.
Canonical Bumps LTS Support to 12 years

Linux , open source , Operating Systems , Ubuntu

If you're worried that your Ubuntu LTS release won't be supported long enough to last, Canonical has a surprise for you in the form of 12 years of security coverage.
Fedora 40 Beta Released Soon

Fedora , Gnome , open source , Plasma , Wayland

With the official release of Fedora 40 coming in April, it's almost time to download the beta and see what's new.
New Pentesting Distribution to Compete with Kali Linux

Linux , open source , Tools , Ubuntu

SnoopGod is now available for your testing needs
Juno Computers Launches Another Linux Laptop

Hardware , laptop , Linux , Ubuntu

If you're looking for a powerhouse laptop that runs Ubuntu, the Juno Computers Neptune 17 v6 should be on your radar.

Process structured text files with Miller

Ordering and Sorting

Miller Computes

Descriptive Data

Buy this article as PDF

Buy Linux Magazine

Related content

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

News

TUXEDO Computers Unveils Linux Laptop Featuring AMD Ryzen CPU

XZ Gets the All-Clear

Canonical Collaborates with Qualcomm on New Venture

Kodi 21.0 Open-Source Entertainment Hub Released

Linux Usage Increases in Two Key Areas

Vulnerability Discovered in xz Libraries

Canonical Bumps LTS Support to 12 years

Fedora 40 Beta Released Soon

New Pentesting Distribution to Compete with Kali Linux

Juno Computers Launches Another Linux Laptop

Process structured text files with Miller

Ordering and Sorting

Miller Computes

Descriptive Data

Buy this article as PDF

Buy Linux Magazine

Related content

Subscribe to our Linux Newsletters Find Linux and Open Source Jobs Subscribe to our ADMIN Newsletters

Support Our Work

News

Tag Cloud

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters