Search more efficiently with ugrep
Filters
Ugrep tries to determine the type of each file it examines based on the data it contains, its file name extension, and its signature (the "magic bytes"). This allows the search to be prepared specifically for certain file types, that is, filtered.
A filter extracts the text components from the data stream. It executes a command, a script, or a specific function, with pipes if necessary. Filters are prepended to the search process with the --filter=<Filter> or --filter-magic-label=<Label>:<MagicByte> option.
In the form --filter=<filter>, <filter> is an expression of the form <Ext>:<command line>. <Ext> is a comma-separated list of the file name extensions to which the filter should apply, written without leading dots, such as doc,docx,xls. The * character is a special case: it applies the filter to all files, in particular to those for which no other filter is defined.
The <command line> must read its input from standard input and write its results to standard output. Typical commands include cat (pass everything through) and head (pass only the first lines of text), but tools like exiftool (extract and pass on metadata) or pdftotext (extract text from PDFs) can also be integrated this way. Some commands, like pdftotext, need options to work correctly, in this case pdftotext % -: ugrep replaces the % with the path name of the file being searched, and the - tells pdftotext to write the extracted text to standard output. Because the command line then contains spaces, you need to quote the filter definition:
--filter='pdf:pdftotext % -'
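Any program that follows this standard input/standard output contract can serve as a filter. As a hypothetical example, the shell function below strips HTML tags with sed; the name striphtml is our own and not part of ugrep.

```shell
# Hypothetical filter: read HTML on standard input, delete everything
# between "<" and ">", and write the remaining text to standard output.
# Simplification: tags that span several lines are not removed.
striphtml() {
  sed -e 's/<[^>]*>//g'
}

printf '<p>Hello <b>world</b></p>\n' | striphtml   # prints: Hello world
```

Since ugrep starts an external program rather than a shell function, you would save the sed line as an executable script (for example ~/bin/striphtml with #!/bin/sh as its first line) and, assuming that directory is in your PATH, call something like ug -r --filter='html,htm:striphtml' world.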
The --filter-magic-label=<Label>:<Magic> option extends the filtering mechanism to data streams that ugrep classifies by their magic bytes rather than by their file name extension. The <Label> is then used like an extension in a --filter definition. Details can be found in the man page.
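To show what "magic bytes" means in practice, here is a plain-shell sketch: PDF files begin with the signature %PDF-, whatever their name. The helper has_pdf_magic and the demo files report and notes are hypothetical and only illustrate the idea; ugrep performs this check itself.

```shell
# Hypothetical helper (not part of ugrep): succeed if a file begins with
# the PDF signature "%PDF-", regardless of its file name.
has_pdf_magic() {
  [ "$(head -c 5 -- "$1")" = '%PDF-' ]
}

# Demo files: a PDF-like file without an extension and a plain text file.
printf '%%PDF-1.7\n' > report
printf 'just text\n' > notes

has_pdf_magic report && echo 'report: PDF signature'    # printed
has_pdf_magic notes || echo 'notes: no PDF signature'   # printed
```

In ugrep, the same signature would go into the <Magic> regex and be paired with a filter for the label, for example ug -r --filter-magic-label='pdf:\x25PDF-' --filter='pdf:pdftotext % -' invoice, where \x25 is the hexadecimal escape for %. According to the man page, only files without an extension are labeled this way unless the label is prefixed with +.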
Multiple filters can be specified as a comma-separated list. A combined definition for PDF and Office documents might look like the one in Listing 3.
Listing 3
Combined Filter Definition
--filter="pdf:pdftotext % -,odt,doc,docx,rtf,xls,xlsx,ppt,pptx:soffice --headless --cat %"
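Because the comma separates both extensions and filters, Listing 3 can be hard to read. The helper explain_filter below is our own illustration, not part of ugrep: following our reading of the syntax, it splits a definition at the commas, collects extensions until it reaches an entry of the form ext:command, and assigns that command to all collected extensions. As a simplification, it assumes that no command itself contains a comma.

```shell
# Hypothetical helper (not part of ugrep): print which command each
# extension in a --filter definition is mapped to.
# Simplification: commands must not contain commas.
explain_filter() {
  printf '%s\n' "$1" | tr ',' '\n' | awk '
    # An entry without ":" is an extension still waiting for its command.
    index($0, ":") == 0 { pending = pending " " $0; next }
    # An "ext:command" entry closes the group: the command applies to
    # the collected extensions plus the one before the colon.
    {
      i = index($0, ":")
      n = split(pending " " substr($0, 1, i - 1), exts, " ")
      for (k = 1; k <= n; k++) print exts[k] " -> " substr($0, i + 1)
      pending = ""
    }'
}

explain_filter 'pdf:pdftotext % -,odt,doc,docx,rtf,xls,xlsx,ppt,pptx:soffice --headless --cat %'
```

For Listing 3, this prints nine lines: pdf -> pdftotext % - first, followed by one line for each of odt, doc, docx, rtf, xls, xlsx, ppt, and pptx, all mapped to soffice --headless --cat %.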
Conclusions
Ugrep belongs on every computer. It is an excellent replacement for, and complement to, the standard commands, and anyone who has to deal with text searches should familiarize themselves with it. The incremental search alone is so useful that it more than justifies the short time it takes to learn.