A new semantic search engine for the KDE desktop
Loaded for Bear

Baloo replaces Nepomuk as the semantic search engine on the KDE desktop, but it gets off to a bumpy start.
The Nepomuk semantic search has been highly controversial since its introduction in KDE – both for users and for application developers. The release of KDE Applications 4.13 brings a new solution named Baloo to the KDE desktop – but not without some growing pains.
Desktop environments are pretty dumb, as users discover when they look for a file whose name they simply cannot remember. Semantic desktops were designed to address this problem. A semantic desktop has access to information about data, such as where the data came from, when it was created, and how individual pieces of data relate to each other. Using this information, a user can, for example, search for a file on the hard disk that John Doe emailed in March.
The KDE developers wanted to give their desktop environment an appropriate semantic search engine. They turned to the results of the Nepomuk (Networked Environment for Personalized, Ontology-Based Management of Unified Knowledge; Figure 1) research project [1], which the European Union funded from 2006 to 2008 to the tune of several million euros. The project was intended to provide everything necessary to promote and simplify the development of a semantic desktop.

RDF and Other Calamities
Nepomuk required RDF (Resource Description Framework) to describe and store relations [2]. The implementation of Nepomuk in KDE collects information about stored data from all KDE applications, links and processes this information, and serves it up to the search function.
Since the introduction of Nepomuk in KDE 4, many users have repeatedly complained about its poor performance and lack of stability. Application developers, in turn, found the API too complicated and called for extra features. Although the KDE developers have made improvements over the years, many users continue to disable Nepomuk. Nepomuk generated excessive load especially in interactions with Akonadi, the underpinnings of the PIM programs.
Virtuoso, Vishuoso, and a Restart
The KDE developers identified the Virtuoso database, which ran in the background and which Nepomuk used to store its RDF data, as the major obstacle to better performance. Although Virtuoso itself worked pretty fast, it hogged extremely large amounts of memory. The KDE developers therefore started to implement their own RDF store named Vishuoso [3]. However, they continually stumbled over the requirements of the EU research project, which were partly vague, incomplete, and sometimes even redundant [4].
For these reasons, the KDE developers decided to take a radical step: They ditched RDF and the old Nepomuk and developed a successor named Baloo [5]. In contrast to Nepomuk, Baloo, which is named after a character in The Jungle Book (think "Bear Necessities"), is designed to be more frugal with resources, be more reliable, and deliver superior results faster. Internally, Baloo is modular, so in future, it will be easier to add new features and improve existing ones. Additionally, the design was overhauled with the intention of preventing failures.
The KDE developers did not write Baloo entirely from scratch; rather, they recycled parts of the existing code. For this reason, they refer to Baloo as the "next generation of Nepomuk search." The changes to the programming interface are therefore manageable, and application developers should find it relatively easy to adapt their software.
Relations and Stores
Baloo manages relations between two "uniquely identifiable identifiers." A file has the unique identifier file:<x>, whereas Akonadi creates a unique identifier of the form akonadi:?item=<x> for most PIM data.
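To make the two identifier forms concrete, the following Python sketch splits them into a kind and a value. It is only an illustration: the function name parse_identifier is hypothetical, it is not part of Baloo's programming interface, and it assumes that <x> can be treated as an opaque string.

```python
def parse_identifier(identifier):
    """Split an identifier into a (kind, value) pair.

    Hypothetical helper for illustration; <x> is treated as an opaque string.
    """
    scheme, sep, rest = identifier.partition(":")
    if not sep or not rest:
        raise ValueError("not a recognised identifier: %r" % identifier)
    if scheme == "file":
        # Form described in the article: file:<x>
        return ("file", rest)
    prefix = "?item="
    if scheme == "akonadi" and rest.startswith(prefix) and len(rest) > len(prefix):
        # Form described in the article: akonadi:?item=<x>
        return ("akonadi", rest[len(prefix):])
    raise ValueError("not a recognised identifier: %r" % identifier)
```

Anything that does not match one of the two forms is rejected with a ValueError instead of being guessed at.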
Each relation ends up in a separate and appropriate data store. In the simplest case, this is a table with two columns in a SQLite database. Baloo deliberately does not require a specific storage format or data store API. The advantage of this setup is that the data store can be tailored to match the information to be stored, thus letting Baloo store the data in the best possible way.
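The simplest case, a two-column table in a SQLite database, could look roughly like the following Python sketch. The table name, the column names, and the helper functions are assumptions chosen for illustration; the article does not describe the schema that Baloo actually uses.

```python
import sqlite3


def open_relation_store(path=":memory:"):
    """Open (or create) a store that holds one kind of relation."""
    conn = sqlite3.connect(path)
    # Two columns, one per identifier; the composite key rejects duplicate pairs.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relation ("
        " source_id TEXT NOT NULL,"
        " target_id TEXT NOT NULL,"
        " PRIMARY KEY (source_id, target_id))"
    )
    return conn


def add_relation(conn, source_id, target_id):
    """Record that source_id is related to target_id (no-op if already stored)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO relation (source_id, target_id) VALUES (?, ?)",
            (source_id, target_id),
        )


def related_to(conn, source_id):
    """Return all identifiers related to source_id, sorted for stable output."""
    rows = conn.execute(
        "SELECT target_id FROM relation WHERE source_id = ? ORDER BY target_id",
        (source_id,),
    )
    return [row[0] for row in rows]
```

Because each relation has its own store, a table like this can stay small and can be indexed for exactly the lookups that relation needs.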
The search itself is performed by the search stores. Each search store only takes care of certain data. For example, one exclusively searches for files (File Search), another searches in email (Email Search), and a third searches in contacts (Contact Search). Each of these search stores provides a specific API through which the search can be triggered; however, they can also provide additional APIs.
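The division of labor between search stores can be sketched in Python as follows. The class names, the substring matching, and the in-memory data are all assumptions made for illustration; Baloo's real search stores are part of its own code base and are not shown in this article.

```python
class InMemorySearchStore:
    """Hypothetical search store that is responsible for one kind of data."""

    def __init__(self, data_type, items):
        self.data_type = data_type   # for example "File" or "Email"
        self._items = dict(items)    # identifier -> searchable text

    def search(self, term):
        """Return the identifiers whose text contains term, ignoring case."""
        needle = term.lower()
        return sorted(
            identifier
            for identifier, text in self._items.items()
            if needle in text.lower()
        )


class SearchDispatcher:
    """Routes a query to the one store that handles the requested data type."""

    def __init__(self):
        self._stores = {}

    def register(self, store):
        self._stores[store.data_type] = store

    def search(self, data_type, term):
        store = self._stores.get(data_type)
        if store is None:
            raise KeyError("no search store registered for %r" % data_type)
        return store.search(term)
```

A query for files never touches the email store, which is the point of the design: each store only has to understand, and index, its own kind of data.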
Currently, Baloo only manages files and comes with a matching search store and data store. The collected data is stored in a SQLite database. The search is additionally supported by the Xapian software, which indexes the collected dataset. Akonadi already stores all its PIM data itself; the search in this dataset is handled by search stores for contacts and email [6]. Thanks to this new architecture with individual data and search stores, Baloo is unlikely to require too much memory and should deliver matching search results extremely quickly.