DocFetcher

Bloodhound

Article from Issue 214/2018
Author(s):

DocFetcher is a practical local search tool that is easy to configure and use – even for large data collections.

Modern operating systems take up several gigabytes of disk space for their application programs alone, and they can contain up to several hundred thousand individual files. Add an extensive music or photo collection, and you can quickly lose track.

Modern desktop environments offer indexing and search applications for existing data, and Linux also has several dedicated search programs. However, many of these programs are not very intuitive, and some even require you to install a database as a backend. In addition, many of these tools do not support full-text search. If you are looking for a lean, practical, and powerful search tool for your workstation, DocFetcher is a very interesting alternative.

You can download the Java application from the project page, which also provides installation instructions [1]. DocFetcher requires a reasonably up-to-date Java runtime environment and works well with current OpenJDK versions, which you can usually install directly from your distribution's software repositories.
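To find out whether a Java runtime is installed and which version it is, you can query `java -version`, which prints its banner to standard error. The following sketch is a small POSIX shell helper, `java_major_version` (a hypothetical name, not part of DocFetcher), that extracts the major version from that banner. It handles both the legacy numbering (`1.8.0_191` means Java 8) and the newer style (`11.0.2`). For the version DocFetcher currently requires, check the project page [1].

```shell
#!/bin/sh
# Print the major Java version contained in a "java -version" banner line, e.g.
#   'openjdk version "11.0.2" 2019-01-15'  -> 11
#   'java version "1.8.0_191"'             -> 8
# Returns 1 if no quoted version string is found.
java_major_version() {
    # Take the text between the double quotes that follow "version".
    jv_full=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    [ -n "$jv_full" ] || return 1
    jv_first=${jv_full%%.*}          # part before the first dot
    if [ "$jv_first" = "1" ]; then
        # Legacy scheme: 1.8.x means Java 8
        jv_rest=${jv_full#1.}
        jv_first=${jv_rest%%.*}
    fi
    printf '%s\n' "$jv_first"
}

# Usage: inspect the installed runtime, if any.
if command -v java >/dev/null 2>&1; then
    banner=$(java -version 2>&1 | head -n 1)
    echo "Java major version: $(java_major_version "$banner")"
else
    echo "No java binary found; install OpenJDK from your distribution." >&2
fi
```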

Unpack the downloaded ZIP archive using a tool such as Ark, File Roller, or Xarchiver, and then move the resulting subdirectory to a location of your choice. To launch the program from a desktop menu, however, you need to create a menu entry manually (see the box entitled "Installation").

Installation

Many Linux distributions do not include DocFetcher in their package sources. Ubuntu, for example, does not yet include a package for DocFetcher. It is thus often necessary to install DocFetcher manually.

Listing 1 shows how to unpack the ZIP archive downloaded from the project page and move the result to /usr/local/bin/docfetcher/. Listing 2 contains the content for /usr/share/applications/docfetcher.desktop, which creates a matching entry in the Start menu of your desktop environment.

Adjust the version number in the commands as needed. If you prefer a location other than /usr/local/bin/docfetcher/, change the paths accordingly. If your system lacks the GTK3 libraries, replace DocFetcher-GTK3.sh with DocFetcher-GTK2.sh in the Exec line.

Listing 1

Unzipping DocFetcher

$ unzip docfetcher-1.1.19-portable.zip
$ sudo mv DocFetcher-1.1.19/ /usr/local/bin/docfetcher

Listing 2

Creating a Menu Entry

[Desktop Entry]
Version=1.0
Name=DocFetcher
GenericName=Document Index and Search
X-GNOME-FullName=DocFetcher Document Index and Search
Comment=Index and Search your computer
Type=Application
Categories=System;Utility;FileTools;Java;
Exec=/usr/local/bin/docfetcher/DocFetcher-GTK3.sh
Terminal=false
StartupNotify=true
Icon=/usr/local/bin/docfetcher/img/docfetcher128.png
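If you install DocFetcher somewhere other than /usr/local/bin/docfetcher/ or need the GTK2 launcher, you can generate the menu entry instead of editing it by hand. The sketch below is a POSIX shell function, `docfetcher_desktop_entry` (a hypothetical helper, not part of DocFetcher), that prints an entry based on Listing 2 for a given installation directory and GTK variant. It assumes that the launcher scripts and the icon sit at the same relative paths as in Listing 2.

```shell
#!/bin/sh
# Print a .desktop menu entry for DocFetcher.
#   $1: installation directory (default: /usr/local/bin/docfetcher)
#   $2: GTK variant, GTK3 or GTK2 (default: GTK3)
docfetcher_desktop_entry() {
    de_dir=${1:-/usr/local/bin/docfetcher}
    de_gtk=${2:-GTK3}
    # Only the two launcher scripts mentioned in the article are accepted.
    case $de_gtk in
        GTK2|GTK3) ;;
        *) echo "unknown GTK variant: $de_gtk" >&2; return 1 ;;
    esac
    # Unquoted heredoc, so $de_dir and $de_gtk are expanded.
    cat <<EOF
[Desktop Entry]
Version=1.0
Name=DocFetcher
GenericName=Document Index and Search
X-GNOME-FullName=DocFetcher Document Index and Search
Comment=Index and Search your computer
Type=Application
Categories=System;Utility;FileTools;Java;
Exec=$de_dir/DocFetcher-$de_gtk.sh
Terminal=false
StartupNotify=true
Icon=$de_dir/img/docfetcher128.png
EOF
}
```

For example, `docfetcher_desktop_entry /opt/docfetcher GTK2 | sudo tee /usr/share/applications/docfetcher.desktop` writes the file (the /opt path is only an example); `desktop-file-validate` from the desktop-file-utils package can then check the result.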

Start Your Engines

When you first launch DocFetcher, some systems show a dialog in which you can change the keyboard shortcut from its default ([Ctrl]+[F8]). If the shortcut is already assigned, a message asks you to confirm by pressing OK. The program window then appears, divided into five panes. In the top-left corner, you will find input fields for the minimum and maximum file size that DocFetcher should consider for the search.

Below that, you select the file types you want DocFetcher to find in the Document Types list; by default, the program enables all supported formats. Further down is the search area, and at the top right is an input line for the search terms. Underneath it, the software lists the results with information on match relevance and file size; a pane at the bottom right displays the contents of the selected file (Figure 1).

Figure 1: The DocFetcher interface shows all the important information.

DocFetcher needs to index the contents of the mounted storage media so that it can search reliably and quickly, even in large data sets. To start indexing, right-click in the search area at the bottom left of the main window and choose Create Index From in the context menu. Then select either a folder or an archive file. In Microsoft environments, DocFetcher can also index Outlook PST files containing messages, contacts, tasks, or appointments.

To limit the size of the files that the program considers, enter the minimum and maximum values in the boxes at the top left. Indexing, which relies on Apache Lucene [2], takes some time on the first run, but it significantly speeds up later searches in these folders (Figure 2).

Figure 2: DocFetcher keeps you in the know during indexing.
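DocFetcher applies the size limits internally. If you want a rough idea of how many files a given minimum and maximum would leave in a folder, `find` can apply a similar filter from the command line. The sketch below is such an approximation, not DocFetcher's code; `files_in_size_range` is a hypothetical name. Sizes are given in bytes with the `c` suffix, which avoids the rounding that `find` applies to larger units such as `k`.

```shell
#!/bin/sh
# List regular files under a directory whose size lies within [min, max] bytes.
#   $1: directory   $2: minimum size in bytes   $3: maximum size in bytes
# Rough command-line analogue of a size filter (not DocFetcher's implementation).
files_in_size_range() {
    fs_dir=$1 fs_min=$2 fs_max=$3
    if [ "$fs_min" -gt 0 ]; then
        # -size +N means "more than N", -size -N means "less than N";
        # the c suffix makes N an exact byte count.
        find "$fs_dir" -type f -size +"$((fs_min - 1))"c -size -"$((fs_max + 1))"c
    else
        # No lower bound needed when the minimum is 0.
        find "$fs_dir" -type f -size -"$((fs_max + 1))"c
    fi
}
```

For example, `files_in_size_range ~/Documents 1024 10485760 | wc -l` counts the files between 1KB and 10MB.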

Once indexing is complete, the indexed directories and archives appear in the Search Scope pane. Enter your search terms in the search box and press the Search button; DocFetcher searches the indexed data and lists the matching files together with information such as the file size. The pane below shows the text passages in which the search term appears, with the term highlighted in yellow (refer to Figure 1).

Multiple Terms

Besides the simple keyword search, DocFetcher can also search for several keywords at once, search for word sequences, and exclude terms from the search. To search for two terms, join them with the AND operator (for example, linux AND kernel): DocFetcher then finds files in which both terms occur, regardless of where they appear in the text. To find an exact word sequence, enclose the words in quotation marks.
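The difference between AND and a quoted phrase is easiest to see at the level of a single file. The shell sketch below mimics both behaviors with `grep`: `has_all_terms` (hypothetical) succeeds only if a file contains every term somewhere, as with AND, while `has_phrase` (hypothetical) requires the exact word sequence. This is only an analogy for the semantics; DocFetcher evaluates queries against its Lucene index, not with `grep`, and its tokenization differs (for instance, `grep` works line by line and will not match a phrase split across two lines).

```shell
#!/bin/sh
# AND semantics: succeed only if FILE contains every TERM as a whole word
# (case-insensitive), in any order and at any position.
#   $1: file   $2...: terms
has_all_terms() {
    hat_file=$1; shift
    for hat_term in "$@"; do
        grep -qiw -- "$hat_term" "$hat_file" || return 1
    done
    return 0
}

# Phrase semantics: succeed only if the words occur in exactly this order
# on one line (fixed string, whole-word boundaries, case-insensitive).
#   $1: file   $2: phrase
has_phrase() {
    grep -qiwF -- "$2" "$1"
}
```

Both functions report through their exit status, so `has_all_terms notes.txt linux kernel && echo match` prints match only if both words occur in notes.txt.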

To exclude a term from the search, prefix it with a minus sign. For a wildcard search, use a question mark or an asterisk: the question mark stands for exactly one character, the asterisk for any number of characters. The asterisk is especially helpful when searching for compound nouns and technical terms.
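Within a single word, the two wildcards behave like the patterns of the POSIX shell's `case` statement: `?` matches exactly one character, and `*` matches any run of characters, including none. The helper below, `wildcard_match` (a hypothetical name), lets you try out patterns on individual words. Shell patterns are case-sensitive, so the sketch lowercases both sides first to approximate a case-insensitive search; it is an illustration of the wildcard semantics, not DocFetcher's matching code.

```shell
#!/bin/sh
# Succeed if WORD matches the wildcard PATTERN:
#   ?  matches exactly one character
#   *  matches any number of characters, including none
# Both sides are lowercased to approximate a case-insensitive search.
#   $1: word   $2: pattern
wildcard_match() {
    wm_word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    wm_pat=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    # $wm_pat is left unquoted on purpose so case treats it as a pattern.
    case $wm_word in
        $wm_pat) return 0 ;;
    esac
    return 1
}
```

Quote the pattern when calling the function, as in `wildcard_match Filesystem '*system' && echo hit`, so that the shell does not expand the asterisk against file names first.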

Searches sometimes return hits you do not need at all. By excluding unneeded formats, you can quickly thin out the list: uncheck the boxes to the left of the individual file formats in the Document Types pane. Alternatively, use the Search Scope pane to limit the search to the relevant directory trees.
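Before unchecking document types, it can help to know which formats actually dominate a directory tree. The sketch below, `count_by_extension` (a hypothetical helper), counts regular files per file-name extension using `find`, `awk`, and `sort`. It looks only at file names, not at file contents, so treat the counts as a rough guide to what the Document Types list will affect.

```shell
#!/bin/sh
# Count regular files under a directory by lowercase extension,
# most frequent first. Names without an extension (including dotfiles
# such as .bashrc) are counted as "(none)".
#   $1: directory
count_by_extension() {
    find "$1" -type f | awk -F/ '
        {
            name = $NF                        # last path component
            n = split(name, parts, "[.]")     # split on literal dots
            ext = (n > 1 && parts[1] != "") ? tolower(parts[n]) : "(none)"
            count[ext]++
        }
        END { for (e in count) print count[e], e }
    ' | sort -rn
}
```

Running `count_by_extension ~/Documents | head` shows the ten most common extensions.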

In the results display, you can page through the hits with the left and right arrows above it. Matches are shown with a yellow background; the up and down arrow buttons take you from one match to the next, and DocFetcher highlights the current match in green.

Updates

Whenever you store new data in directory hierarchies that DocFetcher has indexed, you need to update the index so that later searches include all files. To do so, right-click the index in the search area and select Update… from the context menu. DocFetcher then adds the new files and directories to the index, a process that is far faster than the initial indexing.
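To check from the command line whether new or modified files have appeared in an indexed folder since your last update, you can compare modification times against a timestamp file with `find -newer`. This is an illustration under the assumption that you maintain such a stamp file yourself; the helper names `mark_indexed` and `changed_since_stamp` are hypothetical, and this is not how DocFetcher tracks changes internally.

```shell
#!/bin/sh
# Record the current time in a stamp file, e.g. right after updating the index.
#   $1: stamp file
mark_indexed() {
    touch -- "$1"
}

# List regular files under DIR whose modification time is newer than the stamp.
# Deleted files cannot show up here, only added or modified ones.
#   $1: indexed directory   $2: stamp file
changed_since_stamp() {
    find "$1" -type f -newer "$2"
}
```

For example, run `mark_indexed ~/.docfetcher-stamp` after each update (the stamp path is only an example) and later `changed_since_stamp ~/Documents ~/.docfetcher-stamp` to see which files the next update would have to pick up.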

You can use the same context menu to list the documents in a folder without searching them: select List Documents, and the software displays the individual files in the results pane at the top right of the program window. This function only works on a single directory, not on higher-level directories that contain nothing but subdirectories.
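A rough shell counterpart to List Documents prints only the regular files directly inside one directory, without recursing. For a folder that holds nothing but subdirectories it prints nothing, which loosely mirrors the restriction described above. The function name `list_documents` is hypothetical, and the `-maxdepth` option it uses is supported by GNU and BSD `find` but is not part of POSIX.

```shell
#!/bin/sh
# List the regular files directly inside a directory (no recursion),
# sorted by name. A folder containing only subdirectories yields no output.
#   $1: directory
list_documents() {
    find "$1" -maxdepth 1 -type f | sort
}
```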

To remove individual files from a folder, right-click the file in the results list and select Open Parent Folder from the context menu. Your file manager opens and lists the files in the parent folder, where you can delete the file. Alternatively, you can display the folder contents by right-clicking the directory in the search area at the bottom left and selecting Open Folder from the context menu.
