Python graphics libraries for data visualization
Pinpoint Accuracy
Python's powerful Matplotlib, Bokeh, PyQtGraph, and Pandas libraries lend programmers a helping hand when visualizing complex data and their relationships.
Daily life is inundated by data collected, processed, and made available again in edited form. Examples include weather, temperature, precipitation, humidity, air pressure, wind, sales, and load measurements, as well as vehicle and services data.
A first step is to evaluate data in the simplest form as a table that provides an overview of individual values. Beyond this, a suitable graphical visualization helps you to understand the relationships at a glance. Python has excellent libraries to implement this visualization step, including Matplotlib [1], PyQtGraph [2], Bokeh [3], and Pandas [4]. Appropriate libraries for other languages are summarized in the "Visualization in Other Languages" box.
Visualization in Other Languages
If you are not a fan of the Python programming language, you might be able to find a matching library in another language. Perl has PLplot [5] and the chart modules Chart::Clicker [6] and GD::Graph [7], and PHP comes with the GD library [8] but can also be combined with JpGraph [9]. JavaScript-based applications can be fitted out with the InfoVis Toolkit [10], D3 [11], or Crossfilter [12]. Suitable candidates for combining JavaScript and jQuery include jqPlot [13], Visualize [14], and Flot [15]. Alternatively, the R statistical language [16] includes functions for plotting data out of the box. An overview of other, similar frameworks and libraries is available, for example, from the website at Datavisualization.ch [17], and Brian Suda's blog has a great overview of libraries [18].
In addition to showing data as two- or three-dimensional images, the respective API libraries contain interaction (PyQtGraph, Bokeh) and data analysis (Pandas) methods. With a short sample program, I can show you in each case how to take the initial hurdles in stride. (See the "Installation and Variants" box.)
Installation and Variants
Matplotlib, PyQtGraph, and Pandas can all be set up conveniently from the repositories in the major distributions using their respective package managers. Under Debian and Ubuntu, for example, the corresponding packages go by the names python-matplotlib, python-pyqtgraph, and python-pandas. However, Bokeh is not available as a package in many distributions. The best idea is to retrieve the package from the project website and install it via Anaconda or Pip, the package management utility for Python packages from the Python Package Index (PyPI).
The listings in this article are mostly based on Python 2.x, but with few or no changes they will run in Python 3.x, which encodes all strings as a sequence of Unicode characters (e.g., to simplify the use of special characters). A handy migration guide provides insight into the differences between the two versions [19].
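As a small illustration of the string difference mentioned above, the following sketch uses the degree sign that appears in the temperature labels later on. The variable names are mine, not from the article's listings, and the two `__future__` imports are one possible way to get the same behavior under Python 2.7 and 3.x.

```python
# -*- coding: utf-8 -*-
# Sketch only: makes string literals Unicode on Python 2.7 as well,
# so the same lines behave identically on Python 3.x.
from __future__ import print_function, unicode_literals

label = "°C"                      # two Unicode characters
encoded = label.encode("utf-8")   # byte representation for files or sockets

print(len(label))     # number of characters
print(len(encoded))   # number of bytes; the degree sign needs two in UTF-8
```

The character count is 2, whereas the UTF-8 byte count is 3, which is why code that mixes text and bytes needs attention when moving between the two Python versions.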
Matplotlib
The sophisticated Matplotlib library focuses exclusively on presenting graphs, histograms, power spectra, and bar, pie, and error charts in the Cartesian coordinate system in two- and three-dimensional space. Supplementary toolkits like Basemap [20] and Cartopy [21] offer additional projection types and can combine your data with map data.
The website for the project has more suggestions. When browsing around the chart image gallery, you will find the associated, sometimes quite extensive, program code required to produce each chart.
Matplotlib is used in Python scripts, the Python shell, and the IPython Notebook [22] (for Python 2.x) or Jupyter Notebook [23] (for Python 2.7 and 3.3 or greater). The library blends well with web-based application servers and toolkits for creating graphical user interfaces.
Table 1
Daily Temperature
| | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| Daytime (°C) | 22 | 23 | 19 | 24 | 15 |
| Nighttime (°C) | 12 | 14 | 10 | 14 | 9 |
As an initial example, I look at visualizing daytime and nighttime temperatures measured over five days. Table 1 lists the collected data; in Figure 1 you can see the Python code in the IPython Notebook, and Figure 2 shows the graph it generated.
The chart is defined by importing the two libraries numpy and matplotlib.pyplot under the program-specific identifiers np and plt (Figure 1, lines 1 and 2).
Line 4 sets the number of bars to match the number of days of measurements. Lines 5 and 6 define the temperature values for day and night as lists. In line 7 the arange function defines a range based on the number of days with measurements, and line 8 sets the width of a single bar to 35 percent of the total width.
To prepare the graph with the bars in the background, the command in line 9 generates two objects, fig and ax. Lines 10 and 11 define the appearance of the bars, that is, their position on the x axis, the output of the temperatures as the y value, and the width and color of the bar. The daytime values appear in red and the nighttime values in blue.
Lines 13 to 16 supplement the chart with a y-axis label (°C), title, scale, and x-axis label (days of the week). In line 18, the legend for the two bars is set at the top right in a separate box. Finally, plt.show() renders the chart in the output window (Figure 2).
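Because the code in Figure 1 is only available as a screenshot, the following is a minimal sketch of the steps just described, not a transcription of the original. It uses the values from Table 1. The function name, the variable names, the chart title, and the legend labels are my own assumptions, and the line numbers quoted in the text do not apply to it. The sketch assumes Python 3 and a reasonably recent Matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt


def build_temperature_chart():
    """Build a grouped bar chart from the Table 1 values (sketch)."""
    days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
    n_days = len(days)                     # one group of bars per day
    day_temps = [22, 23, 19, 24, 15]       # daytime values in °C
    night_temps = [12, 14, 10, 14, 9]      # nighttime values in °C

    positions = np.arange(n_days)          # x positions 0 to 4
    width = 0.35                           # bar width: 35 percent of a slot

    fig, ax = plt.subplots()
    # Daytime bars in red, nighttime bars in blue, shifted by one bar width.
    ax.bar(positions, day_temps, width, color="red", label="Day")
    ax.bar(positions + width, night_temps, width, color="blue", label="Night")

    ax.set_ylabel("°C")
    ax.set_title("Daily temperature")      # assumed title
    ax.set_xticks(positions + width / 2)   # center each tick under its pair
    ax.set_xticklabels(days)
    ax.legend(loc="upper right")           # legend box at the top right
    return fig, ax
```

Calling build_temperature_chart() and then plt.show() opens the chart in the output window. Returning fig and ax instead of showing the chart directly also lets you save it with fig.savefig() on a machine without a display.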
A second example, the code provided in Listing 1, visualizes the partitioning of a memory device as a pie chart. Line 1 first imports the matplotlib.pyplot graphics library under the local identifier plt. Then lines 3 to 6 define, for each slice of the pie, the name (chartLabels), the size in percent, the color, and the distance between the slice and the center of the graph. The greater the numeric value, the farther from the center the respective slice is shown.
Listing 1
Matplotlib Pie Chart
The call in line 8 renders the pie chart along with a shadow, and the method in line 9 ensures that the pie is drawn as a uniform circle rather than a somewhat oval shape. Finally, line 10 calls plt.show() to output the prepared pie chart (Figure 3).
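The following sketch shows how such a pie chart can be put together. It is not the original Listing 1: The partition names, sizes, and colors are hypothetical placeholders, the function and variable names are mine, and the line numbers quoted above do not apply to it.

```python
import matplotlib.pyplot as plt


def build_partition_chart():
    """Build a pie chart of a disk layout (hypothetical example data)."""
    chart_labels = ["/boot", "/", "/home", "swap"]    # hypothetical partitions
    sizes = [5, 30, 55, 10]                           # hypothetical sizes in percent
    colors = ["gold", "yellowgreen", "lightskyblue", "lightcoral"]
    explode = [0.1, 0, 0, 0]    # distance of each slice from the center

    fig, ax = plt.subplots()
    # shadow=True draws a drop shadow below the slices;
    # autopct prints the percentage inside each slice.
    ax.pie(sizes, labels=chart_labels, colors=colors,
           explode=explode, autopct="%1.0f%%", shadow=True)
    ax.axis("equal")            # equal axis scaling keeps the pie circular
    return fig, ax
```

As before, build_partition_chart() followed by plt.show() displays the result. Only the first value in explode is non-zero here, so only the first slice is pulled away from the center.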
PyQtGraph
PyQtGraph, a package written entirely in Python, can be used across operating systems. In addition to graphical representations (visualizations), it offers an interface for application and GUI development by docking onto Qt with two frameworks: PyQt [24] and PySide [25].
PyQtGraph targets visualization for scientific purposes and the correspondingly high degree of accuracy this demands. It supplements the functionality of Matplotlib with interaction: You can select a section of the graphical output with the mouse, then scale the section, rotate it, and move it along the coordinate axes. PyQtGraph is suitable for creating applications in which the representation of the data changes at program run time or in which the user can intervene (e.g., load curves, animations, video, and movie excerpts).
Figure 4 shows a curve in two plots side by side. The left curve provides an overview (value range 0 to 1,000) with two markers at 300 and 700 on the x axis; the right curve contains only the excerpt between these marks.
To move the section, first left-click in the highlighted area of the left diagram. Then, hold down the mouse button and drag the pointer left or right. To change the boundaries of the excerpt, left-click on the respective selection, hold down the left mouse button, and drag the boundary of the excerpt. In both cases, the representation in the right chart changes synchronously. You can use the right mouse button to access a contextual menu in which you can modify the presentation (e.g., for the coordinate grid system). You can configure each diagram separately (Figure 5).
Listing 2 shows the underlying program code. After defining the required modules (lines 1 to 3), the Qt application is initialized in line 5. Lines 7 to 9 define a graphic window, 700x200 pixels here, and a corresponding window title.
Listing 2
Graph and Excerpt
Starting in line 11 the two graphs are specified. First, anti-aliasing of the curves shown is enabled, and the scope and data points are defined. The variable p1 represents the left diagram; the counterpart p2 is on the right. Lines 15 and 21 define corresponding titles. Line 16 specifies the color for plotting the data points as an RGBA value. The curve appears in white with an alpha value of 200.
You define the initial section for the right curve in line 17 and the corresponding functions for updating in lines 24 to 28. Lines 30 and 31 create the link between the two charts by referencing the previously defined functions. The final lines 34 and 38 ensure that the Qt application is called correctly.
Bokeh
Bokeh helps you create interactive graphics and pictures to be included in a web page. A web server such as Apache or Nginx then serves up this page with the figure. The library combines the figure with HTML and JavaScript, allowing the viewer to change the appearance of the graphic in their web browser (e.g., selecting a view, a data range, or the color of the data points).
Bokeh is designed to handle large amounts of data. It features a Matplotlib compatibility layer and can be combined with IPython Notebook. The package also includes extensive sample data you can immediately use for test purposes [26], such as the data in Figure 6, which shows an interactive periodic table.
Clicking on an element square opens a small window with further details, as shown for the element calcium (Ca) here. The additional information includes the full name of the element, atomic number, element type, atomic mass, color code according to the CPK model [27], which is a convention for the color representation of atoms in molecule models (named after the chemists Robert Corey, Linus Pauling, and Walter Koltun), as well as the electronic configuration.
The program code for Bokeh examples is quite extensive and complex, so I have not printed it here. Basically, it is equivalent to Matplotlib and PyQtGraph code.