Search more efficiently with ugrep
Filters
Ugrep tries to determine the type of each file it examines based on the data it contains, the file name extension, and the signature (the "magic bytes"). This lets you preprocess, or filter, certain file types before the actual search takes place.
Here the filter extracts the text components from the data streams. These filters execute a command, a script, or a specific function, with pipes if necessary. They are prepended to the search process via the --filter=<filter> or --filter-magic-label=<Label>:<Magic> option.
In the form --filter=<filter>, the <filter> consists of an expression of the form <Ext>:<command line>. <Ext> is a comma-separated list of file name extensions to which you want the filter to apply, such as doc,docx,xls. The * character is a special case that matches all files, in particular those that no other filter covers.
The <command line> must read its input from standard input and write its results to standard output. Typical commands include cat (pass everything through) and head (pass only the first lines of text), but tools like exiftool (extract and pass metadata) or pdftotext (extract text from PDFs) can also be included this way. Some commands, like pdftotext, require arguments to work correctly, in this case pdftotext % -, where % stands for the path of the file being searched and the trailing - tells pdftotext to write to standard output. Because the command line then contains spaces, you need to quote it to protect it from the shell:
--filter='pdf:pdftotext % -'
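To illustrate the stdin-to-stdout contract, here is a minimal sketch of a home-made filter. The script name strip-tags.sh, the scratch directory, and the sample file are assumptions made for this example and are not part of ugrep; the sketch also assumes that ugrep accepts a path to an executable as the filter command.

```shell
#!/bin/sh
# Create a hypothetical filter script in a scratch directory.
filter_dir=$(mktemp -d)
cat > "$filter_dir/strip-tags.sh" <<'EOF'
#!/bin/sh
# Read HTML on standard input, write it without tags to standard output.
sed -e 's/<[^>]*>//g'
EOF
chmod +x "$filter_dir/strip-tags.sh"

# Test the filter on its own first, without ugrep.
filtered=$(printf '<p>Hello <b>world</b></p>\n' | sh "$filter_dir/strip-tags.sh")
echo "$filtered"   # prints: Hello world

# If ugrep is installed, attach the filter to .html files and search a sample file.
printf '<p>Hello <b>world</b></p>\n' > "$filter_dir/sample.html"
if command -v ugrep >/dev/null 2>&1; then
    ugrep --filter="html:$filter_dir/strip-tags.sh" 'Hello world' \
        "$filter_dir/sample.html" || true
fi
```

Checking a filter by hand like this before passing it to ugrep makes it easier to tell whether a missing match is caused by the pattern or by the filter.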
The --filter-magic-label=<Label>:<Magic> option lets you extend the filtering mechanism to data streams that ugrep then classifies by their magic bytes. Details can be found in the man page.
Multiple filters can be specified as comma-separated lists. A combined definition for PDF and Office documents might look like the one shown in Listing 3.
Listing 3
Combined Filter Definition
--filter="pdf:pdftotext % -,odt,doc,docx,rtf,xls,xlsx,ppt,pptx:soffice --headless --cat %"
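A long combined definition like this can be easier to maintain when it is assembled from its parts. The following is only a sketch: the variable names and the search_docs helper are hypothetical, and the filter strings are copied from Listing 3.

```shell
#!/bin/sh
# Per-type filters, taken from Listing 3.
pdf_filter='pdf:pdftotext % -'
office_filter='odt,doc,docx,rtf,xls,xlsx,ppt,pptx:soffice --headless --cat %'

# Join them with a comma into the single value that --filter expects.
combined="$pdf_filter,$office_filter"
printf '%s\n' "--filter=$combined"

# Hypothetical helper: search a directory tree recursively with both filters.
# $1 is the search pattern, $2 the directory to search.
search_docs() {
    ugrep -r --filter="$combined" "$1" "$2"
}
```

With ugrep, pdftotext, and soffice installed, a call such as search_docs 'invoice' "$HOME/Documents" would then search PDF and Office documents below that directory; both the pattern and the path are placeholders.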
Conclusions
Ugrep belongs on every computer. It both replaces and complements the standard commands very well, and anyone who regularly searches text should get to know it. The incremental search alone is so useful that it more than justifies the short time needed to learn the tool.