DocFetcher
Bloodhound
DocFetcher is a practical local search tool that is easy to configure and use – even for large data collections.
Modern operating systems take up several Gigabytes of space just for the many application programs, and they sometimes contain up to several hundred thousand individual files. If you add your extensive music or photo collection, you can quickly lose track.
Modern desktop environments offer indexing and search applications for existing data, and the Linux environment includes several special search programs. However, many of these programs are not very intuitive, and some even expect you to install a database as a back end. In addition, many of the tools do not support full-text searches. If you are looking for a lean, practical, and powerful search tool for your workstation, DocFetcher is a very interesting alternative.
You can download the Java application from the project page, where you will also find installation instructions [1]. As a prerequisite, you need a reasonably up-to-date Java runtime environment; DocFetcher works well with current OpenJDK versions, which you can usually install directly from your distribution's software repositories.
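Before unpacking the archive, it may be worth confirming that a Java runtime is actually on the PATH. The following is a minimal sketch and not part of the DocFetcher documentation; the helper name have_command is made up for illustration.

```shell
#!/bin/sh
# have_command NAME: succeed if NAME can be found on the PATH.
# (Hypothetical helper, introduced only for this example.)
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command java; then
    # java writes its version banner to stderr, so merge it into stdout.
    java -version 2>&1 | head -n 1
else
    echo "No Java runtime found; install OpenJDK from your distribution's repositories."
fi
```

If the script reports that no runtime was found, install an OpenJDK package with your distribution's package manager and run the check again.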
Unpack the downloaded ZIP archive with the DocFetcher files using a tool like Ark, File Roller, or Xarchiver. You can then move the subdirectory you created to a directory of your choice. To start the program from a desktop menu, however, you need to manually create a menu entry (see the box entitled "Installation").
Installation
Many Linux distributions do not include DocFetcher in their package sources. Ubuntu, for example, does not yet include a package for DocFetcher. It is thus often necessary to install DocFetcher manually.
Listing 1 shows how to unpack the ZIP archive downloaded from the project page into the /usr/local/bin/ directory. In Listing 2, you will find the content for /usr/share/applications/docfetcher.desktop to help you create a matching entry in the Start menu of the desktop environment.
Adjust the version number in the commands if necessary. If you prefer a location other than /usr/local/bin/docfetcher/, remember to change the paths appropriately. If you are still using a system without GTK3 libraries, you also need to swap DocFetcher-GTK3.sh for DocFetcher-GTK2.sh in the Exec line.
Listing 1
Unzipping DocFetcher
$ unzip docfetcher-1.1.19-portable.zip
$ sudo mv DocFetcher-1.1.19/ /usr/local/bin/docfetcher
Listing 2
Creating a Menu Entry
[Desktop Entry]
Version=1.0
Name=DocFetcher
GenericName=Document Index and Search
X-GNOME-FullName=DocFetcher Document Index and Search
Comment=Index and Search your computer
Type=Application
Categories=System;Utility;FileTools;Java;
Exec=/usr/local/bin/docfetcher/DocFetcher-GTK3.sh
Terminal=false
StartupNotify=true
Icon=/usr/local/bin/docfetcher/img/docfetcher128.png
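A desktop entry file is expected to begin with a [Desktop Entry] group header, and a launcher like the one in Listing 2 relies on the Type, Name, and Exec keys. The sketch below is a rough sanity check you could run on the file before copying it into place; the function name check_desktop_entry is hypothetical, and the check is far less thorough than a real validator.

```shell
#!/bin/sh
# check_desktop_entry FILE: succeed only if FILE starts with the
# [Desktop Entry] group header and defines the Type, Name, and Exec keys.
# (Hypothetical helper; a simplified check, not a full validator.)
check_desktop_entry() {
    file=$1
    # The first line that is neither blank nor a comment must be the header.
    first=$(grep -v -e '^[[:space:]]*$' -e '^#' "$file" | head -n 1)
    if [ "$first" != "[Desktop Entry]" ]; then
        echo "missing [Desktop Entry] header"
        return 1
    fi
    for key in Type Name Exec; do
        if ! grep -q "^${key}=" "$file"; then
            echo "missing key: $key"
            return 1
        fi
    done
    echo "ok"
}

# Check the file given as the first argument, if there is one.
if [ "$#" -gt 0 ]; then
    check_desktop_entry "$1"
fi
```

Saved as a script, it could be called with the path to your draft docfetcher.desktop file as its only argument.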
Start Your Engines
When you first launch DocFetcher, some systems start with a dialog where you can change the keyboard shortcut from the default ([Ctrl]+[F8]). If the shortcut is already mapped, a message asks you to confirm by pressing OK. The program window, which is divided into five panes, then appears. In the top-left corner, you will find an input field for the minimum and maximum file size that DocFetcher should consider for the search.
Select the file types you want DocFetcher to find from a dropdown list; the program enables all supported formats by default. Below is the search area, and top-right is an input line for the search terms. Below this area, the software lists the results with information on match relevance and file size; an area in the bottom right displays the contents of the selected file (Figure 1).
DocFetcher needs to index the contents of the mounted storage media in order to search reliably and quickly even in large data sets. You can trigger this indexing from the Create Index From dialog, which you can access by right-clicking in the search area in the bottom-left of the main window. Then select either a folder or an archive file. In Microsoft environments, DocFetcher supports indexing of PST files containing messages, contacts, tasks, or appointments.
To limit the size of files that the program should consider, enter the minimum and maximum values in the boxes in the upper-left corner. The process of indexing the data collection, which relies on Apache Lucene [2], takes some time during the first run, but this step will significantly speed up searching in these folders (Figure 2).
After indexing is complete, you will find the indexed directories and archives in the Search Scope pane. Enter the desired search terms in the search box. After you press the Search button, DocFetcher searches through the indexed data and lists the locations. Files containing the search term appear together with information such as the file size. Below you will find the text passages where the search term appears. DocFetcher highlights the term in yellow (refer to Figure 1).
Multiple Terms
In addition to the simple keyword search, DocFetcher also lets you search for several keywords at once. You can also search for word sequences or specify terms to exclude from the search. If you want to search for two terms, join them with the AND operator. DocFetcher then searches for files in which both terms occur, although they can occur at any location in the text. If you want the application to find an exact word order, you need to put the words in quotes.
You can exclude a term from the search by prefixing it with a minus sign. For a wildcard search, use a question mark or asterisk. The question mark replaces exactly one character in a search term; the asterisk replaces several characters. Especially when searching for compound nouns and technical terms, the asterisk is most helpful.
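The query forms described above can be summarized with a few made-up search terms, shown as comments in the sketch below. The function that follows is only an illustration of what the two wildcards mean: it uses the shell's own pattern matching, which happens to share the ? and * convention, and it is not how DocFetcher evaluates queries. The name wildcard_match is hypothetical.

```shell
#!/bin/sh
# Example queries as typed into DocFetcher's search field (terms are made up):
#   linux AND kernel     both terms occur somewhere in the same file
#   "linux kernel"       the words occur in exactly this order
#   linux -windows       exclude files that contain "windows"
#   te?t                 ? stands for exactly one character
#   data*                * stands for several characters

# wildcard_match WORD PATTERN: succeed if WORD matches PATTERN under the
# shell's pattern rules. Note that the shell's * also matches zero characters.
wildcard_match() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Try each sample word against each sample pattern.
for word in test text tet database; do
    for pattern in 'te?t' 'data*'; do
        if wildcard_match "$word" "$pattern"; then
            echo "$pattern matches $word"
        fi
    done
done
```

The asterisk pattern illustrates why it helps with compound terms: a single stem such as data* covers every longer word that begins with it.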
The search sometimes reveals results that are not needed at all. With the option to exclude unneeded formats, you can quickly thin out the list of hits. Uncheck the boxes to the left of the individual file formats in the Document Types window segment. Alternatively, use the Search Scope pane to limit the search to the relevant directory trees.
In the results display, you can scroll through the terms found page by page by clicking on the arrows to the left or right above the search display. The matches are shown with a yellow background. The up/down arrow buttons are used to navigate from match to match; DocFetcher highlights the search key in green.
Updates
As soon as you store new data in the directory hierarchies integrated by DocFetcher, you have to update the index to include all files in later searches. To update the index, right-click on the index in the search area and select the Update… option from the context menu. DocFetcher now integrates the new files and directories into the index in a process that is far faster than the initial indexing.
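Because the update has to be triggered by hand, it can be useful to know whether anything has changed since the last one. The following sketch assumes a hypothetical workflow in which you touch a marker file each time you update the index; it then lists files modified after that marker. It works entirely outside DocFetcher and says nothing about how the program itself detects changes.

```shell
#!/bin/sh
# changed_since DIR STAMP: list regular files below DIR whose modification
# time is newer than that of the marker file STAMP.
# (Hypothetical helper; the marker file is an assumption of this example.)
changed_since() {
    find "$1" -type f -newer "$2"
}

# Usage: script DIR STAMP
if [ "$#" -eq 2 ]; then
    changed_since "$1" "$2"
fi
```

If the command prints any file names, the index is out of date for that folder; after selecting the update option in DocFetcher, touch the marker file again to reset the comparison point.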
You can use the same context menu to list the documents in a folder without searching through them. To do so, select the List Documents option. The software then displays the individual files in the results area at the top right of the program window. You can only apply this function to a single directory, not to higher-level directories that only contain subdirectories themselves.
To remove individual files from the folder, right-click the file and select Open Parent Folder from the context menu. The file manager opens, listing the files in the parent folder. Alternatively, you can display the folder contents by right-clicking on the directory in the lower-left corner of the search area and selecting Open Folder from the context menu.