Useful tricks for processing CSV files
csvquote
All the tricks shown so far rely on the seldom mentioned but vital assumption that all the CSV files you need to process are clean, with no quoted fields that contain commas, newlines, or any other separators. Luckily, as it often happens in the open source world, there is a little-known tool that's made to order to solve this problem, called csvquote [4]. It is not available as a Linux package, but its installation is deliciously simple: Download the source code, unpack it, and then type these two commands at the prompt, just like the Linux users of yore did in the 1990s:
> make $> sudo make install
This will install the csvquote program as well as csvheader, which is a convenient script that prints out the field numbers next to the first row of data.
Using csvquote is as simple as it can be: Just put it before and after all the other tools that will do the actual processing, like this:
> csvquote input.csv | script1 | script2|... | csvquote -u > output.csv
The first call hides any troublesome characters hiding in the input file by "temporarily replacing the special characters inside quoted fields with harmless nonprinting characters that can be processed as data by regular text tools" [4].
The run at the end of the processing pipeline, with the -u option, restores all those nonprinting characters to their previous values. By default, csvquote assumes that fields are enclosed in double quotes and are separated by commas, and records are separated by the standard new line character (\n). However, it is possible to process files with different separators with the -d, -q, and -r switches.
While csvquote cannot handle multi-character delimiters, I have yet to encounter CSV files using these characters (fingers crossed). Also, by its very nature, csvquote makes it impossible for the other tools to which it is connected to replace characters as commas and newlines inside quoted fields. The workaround is to search and replace the special character csvquote puts in their place (see [4] for these special characters).
Conclusion
CSV files will never be usable as real, heavy-load databases, but they are so common and easy to process that they are impossible to ignore. The tricks I've collected here are just a small selection of the many more you can find, as I did, on online forums. (See also the "Missing: CSVfix" box). Try these tricks, and you'll be happy!
Missing: CSVfix
The main problem with CSV files is that it takes only one missing or extra separator or quote to make all the tricks mentioned in these pages fail. At that point, you are forced to visually inspect the file and fix the problem manually.
This wasn't always the case. More than 10 years ago, you could repair many of these problems (as well as lots of other stuff) with CSVfix [5]. Unfortunately, CSVfix was orphaned about a decade ago, leaving only the documentation and a bundle of source code, which, as far as I understand, would need significant work to run again on modern distributions.
This is a shame, because CSVfix could, among many other things:
- Reorder, remove, split, and merge fields
- Convert case
- Trim leading and trailing spaces
- Search for specific content using regular expressions
- Filter out duplicate data or data on exclusion lists
- Split large CSV files into smaller files based on field contents
- Perform arithmetic calculations on individual fields
It would be wonderful if this tutorial encouraged a developer to take up the maintenance of this project.
Infos
- "Tracking Your Finances with Plain Text Accounting" by Marco Fioretti, Linux Magazine, issue 285, August 2024, p. 38, https://www.linux-magazine.com/Issues/2024/285/hledger
- Formulas in CSV files: https://mzonline.com/blog/2020-08/formulas-csv-files
- csvtool: https://github.com/maroofi/csvtool
- csvquote: https://github.com/dbro/csvquote
- CSVfix: https://wlbr.de/csvfix/
« Previous 1 2
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
Zorin OS 18 Hits over a Million Downloads
If you doubt Linux isn't gaining popularity, you only have to look at Zorin OS's download numbers.
-
TUXEDO Computers Scraps Snapdragon X1E-Based Laptop
Due to issues with a Snapdragon CPU, TUXEDO Computers has cancelled its plans to release a laptop based on this elite hardware.
-
Debian Unleashes Debian Libre Live
Debian Libre Live keeps your machine free of proprietary software.
-
Valve Announces Pending Release of Steam Machine
Shout it to the heavens: Steam Machine, powered by Linux, is set to arrive in 2026.
-
Happy Birthday, ADMIN Magazine!
ADMIN is celebrating its 15th anniversary with issue #90.
-
Another Linux Malware Discovered
Russian hackers use Hyper-V to hide malware within Linux virtual machines.
-
TUXEDO Computers Announces a New InfinityBook
TUXEDO Computers is at it again with a new InfinityBook that will meet your professional and gaming needs.
-
SUSE Dives into the Agentic AI Pool
SUSE becomes the first open source company to adopt agentic AI with SUSE Enterprise Linux 16.
-
Linux Now Runs Most Windows Games
The latest data shows that nearly 90 percent of Windows games can be played on Linux.
-
Fedora 43 Has Finally Landed
The Fedora Linux developers have announced their latest release, Fedora 43.

