Workflow-based data analysis with KNIME
Analyze This!

© Lead Image Photo by mari lezhava on Unsplash
They say data is "the new oil," but all that data you collect is only valuable if it leads to new insights. An open source analysis tool called KNIME lets you analyze data through graphical workflows – without the need for programming or complex spreadsheet manipulation.
Data analysts like to use flexible scripting languages such as R or Python that come with large ecosystems of libraries and extensions. But many users don't want to have to write and debug their own custom programs just to analyze data.
Visual workflows offer a different approach. You can use visual workflows to break down the analysis processes into modular, sequential steps. Each step is symbolized by a graphic element called a node. Each node performs an action, which might be a calculation, a formatting function, or another step related to data analysis and manipulation. By linking the nodes on the screen, users can create workflows for complex investigations of the data – without producing any code.
Visual workflows are the central element of the KNIME Analytics Platform. In the KNIME environment, a workflow is a graph with nodes showing a series of sequential steps for processing and analyzing the data. The user defines a pathway for the data by connecting the output of one node to the input of another node. A type system ensures that you can only connect compatible output and input. Real programming code is only necessary if you want to integrate KNIME with R or Python – or if you want to develop your own modules.
KNIME, which is pronounced "nime," was originally known as the Konstanz Information Miner; it began in 2006 at the Department of Bioinformatics and Data Mining at the University of Konstanz under the direction of Prof. Michael Berthold. The basic idea was to make data analysis easily accessible and affordable for users from different disciplines. The developers, therefore, turned their creation into an open source project, paying particular attention to making the tool extensible and user-friendly.
KNIME Analytics Platform is written in Java and is based on Eclipse and Open Services Gateway Initiative (OSGI) technology. The latest version (version 3.5.1 at the time this article was written) is available at the project website [1], where you will also find other introductory materials, including blogs, videos, and sample workflows.
The KNIME Analytics Platform is open source; anyone can download and use it free of charge. Start the installer or unpack the archive, and KNIME is ready to go. However, you might wish to add some KNIME extensions, which are the source for many additional nodes. To add an extension, go to File | Install KNIME Extensions, check the list for the desired extension, and follow the instructions.
Looking Around
Figure 1 shows the KNIME user interface. The KNIME Explorer in the top-left corner of the workspace provides an overview of the workflows. Click the View menu and select Workflow Coach to access the Workflow Coach, which offers suggestions on building a workflow. The Node Repository (bottom left) lists the nodes of all installed extensions.

The nodes are the building blocks of any KNIME workflow, and the vast library of available nodes gives KNIME its versatility and power. KNIME nodes perform tasks such as:
- data access
- data manipulation
- visualization
- analytics
- reporting
- flow control
- scripting
- big data
The node description in the window on the right of the user interface gives the user the necessary documentation on the function and use of a node. In the middle is the workflow editor, where users string the nodes together to develop the actual workflow. During the course of developing a workflow, the user connects, configures, and executes the KNIME nodes individually and collectively (Figure 2).

Once a node has been executed, which is indicated by a green traffic light symbol below the node, you can display the resulting data as a table, bar chart, or other format. If the traffic light symbol is red, the node is not yet configured. Yellow means the node is ready for execution.
Nodes can have between zero and an arbitrary number of inputs and outputs (ports). The ports' shape and color indicate what kind of data the node needs or outputs at a particular input or output (a black triangle indicates a table).
Sample Scenario
The best way to get to know KNIME is to work through an example workflow (Figure 3). The editors of a fictitious online magazine would like to know more about their readers' preferences. They have gathered some data from a rating system that gives readers the opportunity to rate magazine articles with up to five stars. Each article is also classified in at least one of the five categories: Hardware, Software, Development, Security, and Internet.

The editors hope to draw conclusions from the ratings on reader preferences and to identify groups of readers with common interests. They then want to suggest further articles to each reader that match their interests.
The starting points for the analysis are a SQLite database with the article ratings (Table 1) and a CSV file with the classification of the categories for the articles. Both data sources are therefore available in a table format.
Table 1
Evaluation from a Sqlite Database
Reader ID | Article ID | Evaluation |
---|---|---|
Reader 1 |
Article 11 |
1 |
Reader 93 |
Article 31 |
3 |
Reader 45 |
Article 3 |
4 |
… |
Loading the Data
The first step of any data analysis is loading the required data (the red nodes in Figure 3). For KNIME, the data can come from a variety of sources, such as text files, documents, databases, or web services.
Once loaded in KNIME, it doesn't matter where the data came from, because the software always converts the data into an internal format. In this case, I need to load data from an SQLite database and a CSV file. To help you access databases, KNIME has several nodes to assist with data input, including the Database Reader
node, which delivers the results of an SQL query as a KNIME table.
The node works with different database types and receives the connection information via its inbox. You can prefix an SQLite connector
, which is configured with the connection information and forwards it to other nodes.
The easiest way for KNIME to load text files is to use the File Reader
. This node automatically attempts to guess the file format, including column separators and row length, and displays a preview of the table in its configuration dialog.
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.
News
-
Linux Kernel Reducing Long-Term Support
LTS support for the Linux kernel is about to undergo some serious changes that will have a considerable impact on the future.
-
Fedora 39 Beta is Now Available for Testing
For fans and users of Fedora Linux, the first beta of release 39 is now available, which is a minor upgrade but does include GNOME 45.
-
Fedora Linux 40 to Drop X11 for KDE Plasma
When Fedora 40 arrives in 2024, there will be a few big changes coming, especially for the KDE Plasma option.
-
Real-Time Ubuntu Available in AWS Marketplace
Anyone looking for a Linux distribution for real-time processing could do a whole lot worse than Real-Time Ubuntu.
-
KSMBD Finally Reaches a Stable State
For those who've been looking forward to the first release of KSMBD, after two years it's no longer considered experimental.
-
Nitrux 3.0.0 Has Been Released
The latest version of Nitrux brings plenty of innovation and fresh apps to the table.
-
Linux From Scratch 12.0 Now Available
If you're looking to roll your own Linux distribution, the latest version of Linux From Scratch is now available with plenty of updates.
-
Linux Kernel 6.5 Has Been Released
The newest Linux kernel, version 6.5, now includes initial support for two very exciting features.
-
UbuntuDDE 23.04 Now Available
A new version of the UbuntuDDE remix has finally arrived with all the updates from the Deepin desktop and everything that comes with the Ubuntu 23.04 base.
-
Star Labs Reveals a New Surface-Like Linux Tablet
If you've ever wanted a tablet that rivals the MS Surface, you're in luck as Star Labs has created such a device.