Document management for the small office
Defying Chaos
Even in a small office, countless letters, email messages, and PDFs arrive daily. Document management systems help you avoid drowning in the flood of documents.
It's been more than a decade since the paperless office was proclaimed, with dedicated document management systems (DMSs) touted as the tool for managing all kinds of documents without miles of shelving. DMSs typically run as client-server applications with a database back end.
Most of these DMS applications are at home in medium to large enterprises and are hopelessly oversized for small home offices. Using a DMS successfully becomes even more difficult when Linux support is a requirement. Nevertheless, I searched for DMSs for Linux workstations that ease the burden on small offices without demanding time-consuming training and constant maintenance. I looked at Krystal DMS, LogicalDOC, Paperwork, and Referencer (see also the "Not Tested" box).
Not Tested
OpenKM [1] was intended to be the fifth candidate in this test. Although it is available for Linux (as a community release and as commercial and cloud packages), the software proved extremely recalcitrant in our lab: It offers no usable installation routine for small offices or less experienced admins, and its documentation is not current. Instead, you are expected to install each required package manually and separately (including a Tomcat application server, a MySQL database, and applications such as ImageMagick and Ghostscript) and then edit complex configuration files, again by hand.
The vendor does provide documentation, but it is hopelessly out of date, and installation attempts that followed it failed on current Linux distributions. Some recent Linux versions also no longer offer the required packages. For Fedora and Red Hat, for example, the documentation refers to OpenOffice 3.1.1, which was released on August 31, 2009, and has been followed by countless newer releases since then.
The Debian and Ubuntu documentation is also out of date: It describes the configuration for the long-since-replaced SysVinit system but says nothing about the service units of its successor, the systemd init system. The Apache web server configuration no longer works as described, either. For all of these reasons, I did not test OpenKM for this article.
Requirements
Ideally, the DMS should map a document's entire lifecycle, from creation to final deletion. It should handle not only printed documents, but also files that exist electronically in various formats (e.g., email).
A DMS is more than an archive that quickly retrieves stored documents by keyword, date stamp, or other attributes. It also needs to optimize the flow of information in an organization through mechanisms for distributing documents to authorized recipients, linking documents, and monitoring access.
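To make these requirements more concrete, here is a minimal Python sketch of the core data model a small DMS might use. Everything in it (the `Stage` values, the `Document` fields, and the `search()` helper) is hypothetical and not taken from any of the tested products; it only illustrates lookup by keyword and date, distribution to authorized recipients, access logging, and a lifecycle that ends in deletion.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Stage(Enum):
    """Hypothetical lifecycle stages, from creation to final deletion."""
    CREATED = "created"
    DISTRIBUTED = "distributed"
    ARCHIVED = "archived"
    DELETED = "deleted"


@dataclass
class Document:
    title: str
    received: date
    keywords: set[str] = field(default_factory=set)
    recipients: set[str] = field(default_factory=set)
    stage: Stage = Stage.CREATED
    access_log: list[tuple[str, bool]] = field(default_factory=list)

    def distribute(self, *users: str) -> None:
        # Distribution: only users listed here may open the document.
        self.recipients.update(users)
        self.stage = Stage.DISTRIBUTED

    def open(self, user: str) -> bool:
        # Access monitoring: every attempt is logged, allowed or not.
        allowed = user in self.recipients
        self.access_log.append((user, allowed))
        return allowed


def search(docs, keyword=None, since=None):
    """Return non-deleted documents matching a keyword and/or a minimum date."""
    return [
        d for d in docs
        if d.stage is not Stage.DELETED
        and (keyword is None or keyword in d.keywords)
        and (since is None or d.received >= since)
    ]


docs = [
    Document("Invoice March", date(2024, 3, 5), {"invoice", "tax"}),
    Document("Lease", date(2023, 1, 10), {"contract"}),
    Document("Old invoice", date(2015, 6, 1), {"invoice"}),
]
docs[0].distribute("alice")
docs[2].stage = Stage.DELETED  # end of this document's lifecycle

print([d.title for d in search(docs, keyword="invoice")])       # ['Invoice March']
print([d.title for d in search(docs, since=date(2023, 1, 1))])  # ['Invoice March', 'Lease']
print(docs[0].open("alice"), docs[0].open("bob"))               # True False
```

Deleted documents simply drop out of every search here; a real DMS would additionally have to honor retention rules before allowing deletion at all.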
A modular design should also ensure trouble-free processing of documents in third-party applications, such as popular office suites and Enterprise Content Management (ECM) systems.
Cross-platform support, so that the client also runs on mobile devices such as tablets, is becoming increasingly important. Today, this also includes cloud connections that provide access to documents in the DMS independently of stationary IT. Last but not least, the DMS must meet the regulatory archiving requirements that apply wherever you are in the world.
In the Small Office
Small offices do not typically need large DMSs, which are usually difficult to install and configure and also require regular maintenance. However, alternatives for small offices still need to handle various input sources, such as printed documents, files in different formats, and stored email. Ideally, they should also include a scanning engine that captures printed originals and runs text recognition on them. Keywording and other storage functions also fall within the DMS's domain, as do interfaces to the major office suites (see Table 1).
Table 1
Overview DMS Functions
| | Krystal DMS | LogicalDOC | Paperwork | Referencer |
|---|---|---|---|---|
| Modular design | Yes | Yes | Yes | No |
| Localization | Yes* | Yes | Yes | Yes |
| Client-server architecture | Yes | Yes | No | No |
| Web-based interface | Yes | Yes | No | No |
| Scanning module | Yes* | Yes* | Yes | No |
| Multiple sheet scanning | Yes* | Yes* | Yes | No |
| OCR module | Yes (external) | Yes (external) | Yes | No |
| Import function | Yes | Yes | Yes | Yes |
| Export function | Yes* | Yes | Yes | Yes (external) |
| Viewer | Yes | Yes | Yes | No |
| Indexing and searching | Yes | Yes | Yes | Yes |
| Version history | Yes | Yes | No | No |
| Comments | No | Yes | No | Yes |
| Cloud connection | No | Yes | No | Yes |
| Mobile apps | Yes | Yes | No | No |
| Link to CMS systems | No | Yes | No | No |

*Available only in the commercial versions.
Less relevant in small DMS solutions, however, are sophisticated mechanisms for granting rights and modules for interacting with large-scale ERP and ECM solutions. The ability to access the DMS from a mobile device, such as a tablet or smartphone, through an app is also less important in this working environment. What does matter for small offices and individual workstations, however, is that the software is easy to install and configure.
The Trouble with OCR
Reliable text recognition of scanned originals remains a problem on Linux. If a DMS application does not have its own OCR module, users are often forced to rely on third-party solutions. In the Linux Magazine lab, we tested the combination of Tesseract and gImageReader. The pairing proved technically mature and ready for practical use (see the "Tesseract and gImageReader" box).
Tesseract and gImageReader
Hewlett-Packard (HP) developed the Tesseract [2] text recognition engine between 1985 and 1995. Development then lay dormant for 10 years because HP had abandoned this market segment. In 2005, HP released the code to the developer community as free software under the Apache license, and Google sponsored its further development from 2006 onward. Tesseract subsequently spread throughout the Linux universe. Thanks to its modular design, Tesseract is multilingual: It even recognizes German blackletter typefaces if the matching language data is installed. Languages with many nonstandard characters pose no insurmountable problems for the software either.
Because OCR engines are typically command-line-only applications, third parties have developed various graphical interfaces over the years to make them easier to use. Each of these front ends supports one or more specific engines.
gImageReader [3] is a relatively little-known but well-established front end for Tesseract OCR. In addition to ease of use, it promises a particularly lean design and therefore comes without unnecessary bells and whistles. Both packages are available in the software repositories of the popular Linux distributions, so you can install them on your flavor of Linux at the push of a button. Then simply launch the graphical front end, which calls the OCR engine in the background, and you can scan originals and start the recognition process (Figure 1).
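gImageReader takes care of calling Tesseract for you, but because Tesseract itself is a command-line tool, a DMS or a small script can also drive the engine directly. The following Python sketch assumes the `tesseract` binary is on the PATH and that the German (`deu`) and English (`eng`) language packs are installed; the helper names `tesseract_command()` and `ocr()` are hypothetical, and the sketch does not reproduce anything gImageReader does internally.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, languages=("eng",)) -> list[str]:
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file; "-l" takes codes joined with "+".
    return ["tesseract", str(image), "stdout", "-l", "+".join(languages)]


def ocr(image: Path, languages=("eng",)) -> str:
    """Run Tesseract on one scanned image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    # Every language listed needs its traineddata pack installed.
    result = subprocess.run(
        tesseract_command(image, languages),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(tesseract_command(Path("scan.png"), ("deu", "eng")))
    # ['tesseract', 'scan.png', 'stdout', '-l', 'deu+eng']
```

A call such as `ocr(Path("scan.png"), ("deu", "eng"))` returns the recognized text as a string, which a DMS could then feed into its full-text index.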