Tutorials – Recoll

Article from Issue 212/2018
Even in the age of cloud computing, personal computers often hold thousands of files: text files, spreadsheets, word processing docs, configuration files, and HTML files, as well as email and other message formats. If it takes too long to find the file you need, chase it down with the Recoll local search engine.

Recoll [1] is free software for Linux and Windows systems that adds a local search engine to your desktop or local network. And if you think that desktop search engines don't make sense in this age of cloud computing, I beg to differ!

Look inside any school, NGO, small/medium enterprise, or individual computer used for more than a few years: Almost always, you will find big archives of mostly textual content that will never be uploaded to the cloud or otherwise exposed to an online search engine. Sometimes the reason is a mere lack of time, bandwidth, or money. Sometimes it is privacy. Sometimes it is easier compliance with regulations like the European General Data Protection Regulation (GDPR) [2]. In all cases, deploying local search capability could make thousands of files much more useful to their owners.

Recoll is an excellent answer to the need for a local search engine. The Recoll search tool offers flexible interfaces, good documentation, and easy installation. Thanks to a relatively simple search language, Recoll can analyze and index text inside all the most common document formats, even when those documents are "hidden" inside other files (for example, an OpenDocument file zipped and attached to an email message). In most cases, you can preview or open the files found with your search by just clicking on them inside the Recoll window.

The first part of this tutorial explains how Recoll works and how to install it and configure its most critical functions. The second part describes the Recoll search syntax and offers a few tips to help you make the best use of Recoll.

Architecture and User Interfaces

Strictly speaking, Recoll is just a wrapper, albeit a great one, for the open source information retrieval library called Xapian [3]. It is Xapian, not Recoll, that performs all the high-level indexing and classification of your documents. Xapian is also directly usable via scripts in Perl, Java, and other languages. But it is Recoll that makes the desktop search really usable, by doing all the rest of the work, from overall configuration to obscure, low-level tasks like stemming. Stemming is the process of reducing similar words to their common root. It is thanks to stemming that you can search for a word like "hacker" and receive results for "hackers" or "hacking" in addition to the original search term.
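To make the idea of stemming concrete, here is a deliberately naive sketch in shell. It is not the algorithm Recoll or Xapian actually uses; the function name toy_stem and its short suffix list are invented for this illustration only.

```shell
#!/bin/sh
# Toy suffix-stripping stemmer. This is an illustration only,
# NOT the stemming algorithm used by Recoll or Xapian.
toy_stem() {
    # Lowercase the word, then strip one common English suffix.
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/(ers|er|ing|s)$//'
}

# All three words reduce to the same root, so a search for one
# of them could match documents containing the others.
for word in hacker hackers hacking; do
    printf '%s -> %s\n' "$word" "$(toy_stem "$word")"
done
```

A real stemmer handles far more cases (irregular plurals, doubled consonants, other languages), which is why this job is best left to the search engine.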

The other tasks that Recoll handles directly are extracting text from your files, decoding your queries and, of course, presenting their results in a format that makes it easy to browse and open them from a graphical interface.

With the right libraries and plugins, you can perform Recoll searches directly from Python and other languages, or from desktop environments like Unity or KDE. This article will focus on the native Recoll GUI, its web-based equivalent, and, of course, the evergreen command-line option.

Installation

Recent Recoll versions are available as binary packages for Windows and the most popular Linux distributions. On Ubuntu, for example, type the following commands at the prompt to add a Personal Package Archive (PPA) repository for Recoll and install both the graphical and command-line interfaces:

#> sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
#> sudo apt-get update
#> sudo apt-get install recoll -y

(Don't be fooled by the 1.15 in the repository name: The command will install the current version of Recoll, whatever it is.) After those commands, typing recoll in the desktop search bar will show you the icon that opens the Recoll native GUI. To search at the command prompt or in a shell script, use the command recollq. Use the recollindex command to generate an index.
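As a minimal sketch of how the two command-line tools fit together in a script, the function below refreshes the index and then runs a query. The function name recoll_search is hypothetical, and the sketch assumes the default configuration in your home directory.

```shell
#!/bin/sh
# Update the index, then run a query from the shell.
# Returns 127 without doing anything if the Recoll tools are missing.
recoll_search() {
    if ! command -v recollindex >/dev/null 2>&1 ||
       ! command -v recollq >/dev/null 2>&1; then
        echo "recollindex/recollq not found; install the recoll package first" >&2
        return 127
    fi
    recollindex || return 1   # build or refresh the default index
    recollq "$@"              # print the documents matching the query
}
```

After loading the function, a call such as recoll_search invoice (the search term is just an example) would index first and then list the matches. On a large home directory the first indexing run can take a while.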

You must install the Recoll web interface separately. Go to the GitHub page for the web interface [4], download the master.zip archive for your version of Recoll, and unzip it to expand a folder called recoll-webui-master. The file inside the folder called webui-standalone.py is a mini web server, which you can reach with your browser at the address http://localhost:8080. The mini web server is a bit slow, but it works right away for all the users of the local network, with one (well documented) caveat: You cannot directly open local files from the links in its listings unless you explicitly authorize Firefox to do so (see the box entitled "Authorizing Firefox").
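The following sketch shows one way to launch the mini web server once the archive is unpacked. The function name start_recoll_webui, the default folder location, and the choice of python3 as interpreter are assumptions; older copies of the web UI may need a different Python version.

```shell
#!/bin/sh
# Start the Recoll web UI from an unpacked recoll-webui-master folder.
# The default location below is an assumption: pass the folder where
# you actually unzipped master.zip as the first argument.
start_recoll_webui() {
    webui_dir="${1:-$HOME/recoll-webui-master}"
    if [ ! -f "$webui_dir/webui-standalone.py" ]; then
        echo "webui-standalone.py not found in $webui_dir" >&2
        return 1
    fi
    # Assumption: python3 is the right interpreter for your copy.
    (cd "$webui_dir" && python3 webui-standalone.py)
}
```

The server runs in the foreground until you stop it with Ctrl+C; while it is running, open http://localhost:8080 in your browser.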

Authorizing Firefox

To authorize Firefox to let you open local files, add the contents of the file examples/firefox-user.js into ~/.mozilla/firefox/<profile>/user.js and restart Firefox.
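A small helper can do the copying described above. This is only a sketch: the function name authorize_firefox is made up, and you must supply both paths yourself, because the profile directory name differs on every system.

```shell
#!/bin/sh
# Append the web UI's Firefox settings to a profile's user.js.
# Both arguments are paths you supply; nothing here guesses your profile.
authorize_firefox() {
    snippet="$1"       # path to examples/firefox-user.js
    profile_dir="$2"   # path to your Firefox profile directory
    [ -f "$snippet" ] || { echo "missing file: $snippet" >&2; return 1; }
    [ -d "$profile_dir" ] || { echo "missing directory: $profile_dir" >&2; return 1; }
    # Append rather than overwrite, so existing settings are preserved.
    cat "$snippet" >> "$profile_dir/user.js"
}
```

Appending keeps any settings already present in user.js. Restart Firefox afterwards so the new settings take effect.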

If you plan to use Recoll on a regular basis, you might wish to configure your Linux system to start it automatically when the system boots. See your Linux distro's documentation for more on configuring an application to launch at system startup.

Indexing Configuration

No search engine is better than its index. Telling Recoll how to create and maintain the index is the most critical part of the configuration (Figure 1).

Figure 1: A detail of the most crucial, but not difficult, Recoll configuration phase: The detailed definition of which files should be indexed, and how.

Recoll has a system-wide configuration file (/usr/share/recoll/examples/recoll.conf on Ubuntu), but each user also gets a personal configuration, which takes priority. The personal configuration file is stored in $HOME/.recoll/recoll.conf. The first time you start it, the Recoll GUI will ask you how to configure the index and will save your choices in your personal file. Among other things, you may define which file types should be indexed and the default language.
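If you prefer to prepare a personal configuration by hand rather than through the GUI, a sketch like the following writes a minimal recoll.conf. The helper name write_sample_conf and the values are illustrative assumptions; topdirs and indexstemminglanguages are, to my knowledge, standard Recoll parameters, but check the documentation for your version before relying on them.

```shell
#!/bin/sh
# Write a minimal personal recoll.conf into the given directory.
# Refuses to overwrite an existing file.
write_sample_conf() {
    conf_dir="$1"
    [ -n "$conf_dir" ] || { echo "usage: write_sample_conf DIR" >&2; return 1; }
    mkdir -p "$conf_dir" || return 1
    if [ -e "$conf_dir/recoll.conf" ]; then
        echo "refusing to overwrite $conf_dir/recoll.conf" >&2
        return 1
    fi
    cat > "$conf_dir/recoll.conf" <<'EOF'
# Directories to index (illustrative values)
topdirs = ~/Documents ~/Mail
# Language(s) used for stemming (illustrative value)
indexstemminglanguages = english
EOF
}
```

Anything not set in the personal file falls back to the system-wide defaults, so a short file like this is enough to get started.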

By default, Recoll will only have one index for your whole home directory, but it may handle many, totally independent indexes. The only requirement is that each index has a dedicated configuration directory. The simplest way to make Recoll create a separate configuration and index seems to be to create an empty directory and then start the software from the command line with the -c option pointing to it:

#> mkdir $HOME/.recoll-customconfig
#> recoll -c $HOME/.recoll-customconfig

You can search several indexes simultaneously by adding them in the Preferences | External Index Dialog of the GUI. Don't forget that, when you search multiple indexes, Recoll will use all their data, but only the configuration of one index: the default index, or the index explicitly set with the RECOLL_CONFDIR environment variable or the -c option.
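The same configuration selection works from the command line. The sketch below wraps recollq so that it queries a specific configuration directory through RECOLL_CONFDIR; the function name query_custom_index is hypothetical.

```shell
#!/bin/sh
# Query the index that belongs to a specific configuration directory.
# Usage: query_custom_index CONFIG_DIR QUERY...
query_custom_index() {
    conf_dir="$1"
    if [ ! -d "$conf_dir" ]; then
        echo "no such configuration directory: $conf_dir" >&2
        return 1
    fi
    shift
    # The variable is set only for this one recollq invocation.
    RECOLL_CONFDIR="$conf_dir" recollq "$@"
}
```

For example, query_custom_index "$HOME/.recoll-customconfig" invoice would search the separate index created earlier, once that index has been built. Passing -c with the same directory to recollq should have the same effect.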
