Putting free digital assistants to the test
Friend and Helper
Researchers from the University of Michigan have built an intelligent personal assistant akin to Siri and Cortana from free components. Although the Sirius Project focuses on the server load created by digital assistant software, we are interested in the usability of Sirius and its successor Lucida.
What does the chief engineer of a spaceship in the 23rd century do to operate a computer from the 20th century? He picks up the mouse and says, "Hello, computer" (Star Trek IV: The Voyage Home, Paramount Pictures, 1986). During his journey through time, Montgomery "Scotty" Scott nonetheless had to hit the keys eventually.
Owners of modern smartphones, on the other hand, can go a long way with "OK, Google," "Hey, Siri," or "Hey, Cortana"; the speech assistants understand many questions or instructions formulated in everyday language. You can only guess how many algorithms are behind the proprietary marvels.
Things are quite different with the open source intelligent personal assistant Sirius [1], which was developed in 2015 by the research group Clarity Lab at the University of Michigan [2]. The software, published under the BSD license, bundles together the free speech recognition systems CMU Sphinx [3] (PocketSphinx and Sphinx4), Kaldi [4], image recognition based on OpenCV [5], the question-answering system OpenEphyra [6], and UC Berkeley's deep learning framework Caffe [7]. A Wikipedia dump forms the basis for OpenEphyra's data corpus. With the help of all these components, Sirius is in a position to answer typed or spoken questions and to recognize objects in images (Figure 1).
The developers at Clarity Lab formulated the aim of the software in an abstract [8] for the Sirius tutorial that took place during the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-20). They proceed from the assumption that the demand for intelligent personal assistants (IPAs) will increase in the future and ask what server architectures will have to look like to handle the workload of these programs. Because of a lack of open source IPAs to calculate the load, they developed Sirius so they could represent the resource requirements realistically.
How does Sirius fare in practical use? Is the program a suitable helper on the Linux desktop? Those running the test considered these questions, and carefully examined Sirius and its successor, Lucida [9]. They installed the software on Ubuntu 14.04 and Ubuntu 16.04, used the Sirius speech recognition, tested its question-answering system, and scrutinized its image recognition abilities. Lucida is not yet as far along. So far, only a simple question and answer game has operated in its demo version, which the testing team briefly exercised.
Ready-to-Assemble Kit
The Clarity Lab website offers a download that includes the Sirius application, Sirius Suite, and the web front-end server; the Sirius Suite alone with a Caffe snapshot; and the Wikipedia dump for the question-answering system [10].
After unpacking the Sirius archive, switch to the sirius-1.0.1/sirius-application directory. A few scripts here pull in the software Sirius expects, load components from the Internet, and compile and install them. The scripts are written for Ubuntu 14.04; if you use this somewhat older LTS version (which is nevertheless supported until 2019), enter the following four commands:
sudo ./get-dependencies.sh
sudo ./get-opencv.sh
./get-kaldi.sh
./compile-sirius-servers.sh
If you use the current Ubuntu 16.04, open the get-dependencies.sh script in a text editor beforehand and comment out the entry that adds the external FFmpeg repository (ppa:kirillshkrogalev/ffmpeg-next). The external package source is no longer necessary because FFmpeg is in the official Xenial repositories.
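The same edit can be scripted. The following is a minimal sketch, assuming GNU sed (as shipped with Ubuntu) and assuming the PPA name appears on a single line of get-dependencies.sh; the function name comment_out_ffmpeg_ppa is made up for illustration and is not part of Sirius.

```shell
# Comment out every not-yet-commented line that mentions the FFmpeg PPA.
# Assumption: the PPA name appears on one line of the script.
comment_out_ffmpeg_ppa() {
    script="$1"
    # -i.bak edits the file in place and keeps a backup copy (GNU sed).
    # The address skips lines that are already comments, so a second
    # run does not add another "# " prefix.
    sed -i.bak '\|^[[:space:]]*#|!s|^\(.*ppa:kirillshkrogalev/ffmpeg-next.*\)$|# \1|' "$script"
}

# Usage, from the sirius-application directory:
#   comment_out_ffmpeg_ppa get-dependencies.sh
```

The backup file get-dependencies.sh.bak makes it easy to compare the result with the original before running the script.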
Next, execute the first three commands. Before you call ./compile-sirius-servers.sh, however, create a symbolic link named /usr/bin/libtool that points to /usr/bin/libtoolize, because the Kaldi makefile searches for the libtool binary.
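Expressed as a command, the link could be created as in the sketch below. It reads the instruction as "make /usr/bin/libtool point to /usr/bin/libtoolize," which is an interpretation of the text; the helper name link_libtool and its optional directory argument are hypothetical and exist only so the step can be tried out in a scratch directory first.

```shell
# Create <bindir>/libtool as a symlink to <bindir>/libtoolize, unless
# something named libtool already exists there.
# bindir defaults to /usr/bin, where root privileges are required.
link_libtool() {
    bindir="${1:-/usr/bin}"
    if [ -e "$bindir/libtool" ] || [ -L "$bindir/libtool" ]; then
        echo "libtool already present in $bindir, nothing to do"
        return 0
    fi
    ln -s "$bindir/libtoolize" "$bindir/libtool"
}

# Equivalent one-liner on the real system:
#   sudo ln -s /usr/bin/libtoolize /usr/bin/libtool
```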
A fast Internet connection is an advantage, because the scripts download a whole host of software. With the OpenCV download, around 3GB of data are copied onto the disk; Kaldi takes up 2GB. The Sirius archive itself is 470MB in size, and the Wikipedia dump encompasses some 11GB. When completely installed, Sirius and its components occupy around 25GB of disk space.
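Given these sizes, it can be worth checking the free disk space before starting the scripts. This is a small sketch, not part of Sirius: the function name has_free_space is hypothetical, and the 25GB threshold in the usage line is simply the total quoted above.

```shell
# Succeeds if the filesystem that holds <dir> has at least <gb> GB available.
has_free_space() {
    dir="$1"
    need_gb="$2"
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Usage:
#   has_free_space "$HOME" 25 || echo "Less than 25GB free, Sirius may not fit"
```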
The scripts that bring the speech recognition, image recognition, and question-answering system into the arena are in the sirius-application/run-scripts directory and have file names that begin with "start". All three components are implemented as server services. The scripts you use to direct your requests to the servers are also found here, with "test" in their file names.
Good Listener
In their first attempt, the test team fed a few of the WAV files stored in the sirius-application/inputs/questions directory to the Sirius automatic speech recognition (ASR). To do so, they started the ASR server in a terminal with each of the three available back ends (Kaldi, PocketSphinx, and Sphinx4) in succession:
./start-asr-server.sh kaldi
./start-asr-server.sh PocketSphinx
./start-asr-server.sh sphinx4
The team then called the sirius-asr-test.sh script in a second terminal together with a question (Provided) and saw the result from Sirius (Figure 2). Sometimes it worked well, sometimes only after waiting a while, and sometimes not at all; the communication with Sphinx4 on Ubuntu 16.04 failed completely. For comparison, the test team recorded the sentences themselves (Recorded) with a microphone and sent them to all three back ends. Using five example sentences, Table 1 shows what Kaldi, PocketSphinx, and Sphinx4 understood.
Table 1
Sirius ASR Back Ends
| Recording | Source | Kaldi | PocketSphinx | Sphinx4 |
|---|---|---|---|---|
| Who invented the telegraph? | Provided | who invented the telegraph | who invented the telegraph | who invented the telegraph |
| | Recorded | we went at the telegraph | we're going to the telegraph | with only scowled |
| Where is the Louvre Museum located? | Provided | where is the liberal museum love the change yeah | where is the liver uneasy and located | where's the louvre museum located |
| | Recorded | where was the little free museums okay tent | where is the u. over a museum located | london back while passengers are |
| Where did John Lennon die? | Provided | where do you john lennon dot | where did john lennon got | where did john lennon died |
| | Recorded | when it it's john lennon die | where did john lennon die | only after all how often run |
| What is the population of France? | Provided | what is the population of france | what is the population of forms | what is the population of france |
| | Recorded | uh what is the population of france | what is the population of trunks | in a half and unload newark crown |
| What is the speed of light? | Provided | which is the speed of light | what is the speed of light | what is the speed of light |
| | Recorded | well just the speed of flights | what does the speed of light | the injury to half moon last |
The quality of the speech recognition is very patchy: With the WAV files provided, only the Sphinx4 back end worked almost flawlessly. With the testers' own recordings, on the other hand, correctly recognized sentences remained a rare exception. The developers may have trained their speech recognition libraries primarily with the files they enclosed, which are spoken with an American accent throughout. Sphinx4 in particular was unable to cope with the test team's own recordings (in British English with a German accent); the other engines at least recognized individual words.
The quality of the audio should not explain the lack of understanding, because a decent microphone was used. As a spot check, the testers also recorded their sentences with a headset that has a different frequency response, and those recordings still delivered inferior results. By comparison, the Google and Apple speech recognition engines on the test team's smartphones recognized almost all the questions.
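One way to put a number on results like those in Table 1 is to count how many words have to be substituted, inserted, or deleted to turn the recognized text into the sentence that was actually spoken. The sketch below does this with a word-level edit distance in awk. It is an illustration added here, not something the testers or the Sirius developers used; the function name word_errors is hypothetical, and punctuation must be stripped from the reference sentence beforehand because the comparison is by whole words.

```shell
# Print the word-level edit distance between a reference sentence and a
# recognizer's output (substitutions, insertions, and deletions cost 1 each).
word_errors() {
    awk -v ref="$1" -v hyp="$2" 'BEGIN {
        # Splitting on " " breaks on runs of whitespace and ignores
        # leading and trailing blanks.
        n = split(tolower(ref), r, " ")
        m = split(tolower(hyp), h, " ")
        for (i = 0; i <= n; i++) d[i, 0] = i
        for (j = 0; j <= m; j++) d[0, j] = j
        for (i = 1; i <= n; i++) {
            for (j = 1; j <= m; j++) {
                cost = (r[i] == h[j]) ? 0 : 1
                best = d[i-1, j-1] + cost                        # match or substitution
                if (d[i-1, j] + 1 < best) best = d[i-1, j] + 1   # deletion
                if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1   # insertion
                d[i, j] = best
            }
        }
        print d[n, m]
    }'
}
```

For example, word_errors "who invented the telegraph" "we went at the telegraph" prints 3, whereas an exact transcription prints 0.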
Answer Me
If the digital assistant understands a question, it would be great if it could answer it as well. The Sirius developers employ the question-answering system OpenEphyra [6] for this step.
A Wikipedia dump without semantic distinctions serves as the data corpus. The developers created this with Indri [11], a search engine specialized for large text corpora. You can download the Wikipedia knowledge database from the Sirius download page and extract it into the sirius-application/question-answer directory.
Now start the QA server with the start-qa-server.sh script from the sirius-application/run-scripts directory. On the Ubuntu 16.04 test machine, this did not work right away; a call to ant (which uses the XML build files for OpenEphyra and documentation files) in the sirius-application/question-answer directory was necessary before the server started working. If you receive an "insufficient threads configured" warning, you can fix it with a simple hack: Comment out this line in the sirius-application/question-answer/src/info/ephyra/OpenEphyraServer.java file:
con1.setThreadPool(new QueuedThreadPool(NTHREADS));
After taking care of this problem, you must call the compile-sirius-servers.sh script once more and restart the QA server.
Now you can ask questions in a second terminal; for example:
./sirius-qa-test.sh "what is the speed of light"
After a confirmation that the question has come through, a message appears stating that the question has gone to the server. After a short wait, the answer pops up in the terminal (Figure 3).
Because spoken and typed questions are both possible, it would be great if you could combine these. That is no problem with Sirius; you simply start the ASR service along with the QA server and use the following script for communication:
./sirius-asr-qa-test.sh ../inputs/real/who.is.the.current.president.of.the.united.states.wav
The analysis then runs through whichever ASR back end is active. Even after this part has successfully transcribed the question, the QA service still requires some time to find the answer, so patience is needed.
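To push a whole directory of recordings through the test scripts instead of one file at a time, a small loop is enough. The following sketch is not part of Sirius; the helper name for_each_wav is hypothetical, and it assumes that the test script takes the WAV file as its only argument, as in the call above.

```shell
# Run <command> once for every .wav file in <dir>,
# printing the file name before each run.
for_each_wav() {
    dir="$1"
    shift
    for wav in "$dir"/*.wav; do
        [ -e "$wav" ] || continue   # no match: the glob pattern stays literal
        echo "== $wav"
        "$@" "$wav"
    done
}

# Usage from sirius-application/run-scripts, with the ASR and QA servers running:
#   for_each_wav ../inputs/questions ./sirius-asr-qa-test.sh
```

Because each question waits for the QA service, a full run over many files can take a while.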