A new semantic search engine for the KDE desktop
Loaded for Bear
Baloo replaces Nepomuk as the semantic search engine on the KDE desktop, but it gets off to a bumpy start.
The Nepomuk semantic search has been highly controversial since its introduction in KDE – both for users and for application developers. The release of KDE Applications 4.13 brings a new solution named Baloo to the KDE desktop – but not without some growing pains.
Desktop environments are pretty dumb, as users discover when they look for a file whose name they simply cannot remember. Semantic desktops were designed to address this problem. A semantic desktop has access to information about data, such as what data manifested, when data was created, and how data relate to each other. Using this information, a user can, for example, search for a file on the hard disk that John Doe emailed in March.
The KDE developers wanted to give their desktop environment an appropriate semantic search engine. They turned to the results of the Nepomuk (Networked Environment for Personalized, Ontology-Based Management of Unified Knowledge; Figure 1) research project [1] funded by the European Union from 2006 to 2008 to the tune of several million euros . It was intended to provide everything necessary to promote and simplify the development of a semantic desktop.
RDF and Other Calamities
Nepomuk required the RDF (Resource Description Framework) to describe and store relations [2]. The implementation of Nepomuk in KDE collects information about stored data from all KDE applications, links and processes this information, and serves it up to the search function.
Since the introduction of Nepomuk in KDE 4, many users repeatedly complained about its poor performance and lack of stability. Application developers, in turn, found the API too complicated and called for extra features. Although the KDE developers made improvements over the years, many users continue to disable Nepomuk. Especially in interactions with Akonadi, the underpinnings of the PIM programs, Nepomuk generated excessive load.
Virtuoso, Vishuoso, and a Restart
The KDE developers identified the Virtuoso database, which ran in the background and which Nepomuk used to store its RDF data, as the major obstacle to better performance. Although Virtuoso itself worked pretty fast, it hogged extremely large amounts of memory. The KDE developers therefore started to implement their own RDF store named Vishuoso [3]. However, they continually stumbled over the requirements of the EU research project. They were partly vague, incomplete, and sometimes even redundant [4].
For these reasons, the KDE developers decided to take a radical step: They ditched RDF and the old Nepomuk and developed a successor named Baloo [5]. In contrast to Nepomuk, Baloo, which is named after a character in The Jungle Book (think "Bear Necessities"), is designed to be more frugal with resources, be more reliable, and deliver superior results faster. Internally, Baloo is modular, so in future, it will be easier to add new features and improve existing ones. Additionally, the design was overhauled with the intention of preventing failures.
The KDE developers have not completely rewritten Baloo; rather, they have recycled parts of the code. Thus, they refer to Baloo as the "next generation of Nepomuk search." The changes to the programming interface are thus manageable; application developers should find it relatively easy to adapt their software.
Relations and Stores
Baloo manages relations between two "uniquely identifiable identifiers." A file has the unique identifier file:<x>
, whereas Akonadi creates a unique identifier of the form akonadi:?item=<x>
for most PIM data.
Each relation ends up in a separate and appropriate data store. In the simplest case, this is a table with two columns in a SQLite database. Baloo deliberately does not require a specific storage format or data store API. The advantage of this setup is that the data store can be tailored to match the information to be stored, thus letting Baloo store the data in the best possible way.
The search itself is performed by the search stores. Each search store only takes care of certain data. For example, one exclusively searches for files (File Search), another searches in email (Email Search), and third searches in contacts (Contact Search). Each of these search stores provide a specific API through which the search can be triggered; however, they can also provide additional APIs.
Currently, Baloo only manages files and comes with a matching search store and data store. The collected data is stored in a SQLite database. The search is additionally supported by the Xapian software, which indexes the collected dataset. Akonadi already stores all its PIM data itself; the search in this dataset is handled by search stores for contacts and email [6]. Thanks to this new architecture with individual data and search stores, Baloo is unlikely to require too much memory and should deliver matching search results extremely quickly.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
Juno Tab 3 Launches with Ubuntu 24.04
Anyone looking for a full-blown Linux tablet need look no further. Juno has released the Tab 3.
-
New KDE Slimbook Plasma Available for Preorder
Powered by an AMD Ryzen CPU, the latest KDE Slimbook laptop is powerful enough for local AI tasks.
-
Rhino Linux Announces Latest "Quick Update"
If you prefer your Linux distribution to be of the rolling type, Rhino Linux delivers a beautiful and reliable experience.
-
Plasma Desktop Will Soon Ask for Donations
The next iteration of Plasma has reached the soft feature freeze for the 6.2 version and includes a feature that could be divisive.
-
Linux Market Share Hits New High
For the first time, the Linux market share has reached a new high for desktops, and the trend looks like it will continue.
-
LibreOffice 24.8 Delivers New Features
LibreOffice is often considered the de facto standard office suite for the Linux operating system.
-
Deepin 23 Offers Wayland Support and New AI Tool
Deepin has been considered one of the most beautiful desktop operating systems for a long time and the arrival of version 23 has bolstered that reputation.
-
CachyOS Adds Support for System76's COSMIC Desktop
The August 2024 release of CachyOS includes support for the COSMIC desktop as well as some important bits for video.
-
Linux Foundation Adopts OMI to Foster Ethical LLMs
The Open Model Initiative hopes to create community LLMs that rival proprietary models but avoid restrictive licensing that limits usage.
-
Ubuntu 24.10 to Include the Latest Linux Kernel
Ubuntu users have grown accustomed to their favorite distribution shipping with a kernel that's not quite as up-to-date as other distros but that changes with 24.10.