Python graphics libraries for data visualization

Pinpoint Accuracy

© Lead Image © Sergii Gnatiuk, 123RF.com

Article from Issue 197/2017
Author(s):

Python's powerful Matplotlib, Bokeh, PyQtGraph, and Pandas libraries lend programmers a helping hand when visualizing complex data and their relationships.

Daily life is inundated by data collected, processed, and made available again in edited form. Examples include weather, temperature, precipitation, humidity, air pressure, wind, sales, and load measurements, as well as vehicle and services data.

A first step is to evaluate data in the simplest form as a table that provides an overview of individual values. Beyond this, a suitable graphical visualization helps you to understand the relationships at a glance. Python has excellent libraries to implement this visualization step, including Matplotlib [1], PyQtGraph [2], Bokeh [3], and Pandas [4]. Appropriate libraries for other languages are summarized in the "Visualization in Other Languages" box.

Visualization in Other Languages

If you are not a fan of the Python programming language, you might be able to find a matching library in another language. Perl has PLplot [5] and the chart modules Chart::Clicker [6] and GD::Graph [7], and PHP comes with the GD library [8] but can also be combined with JpGraph [9]. JavaScript-based applications can be fitted out with the InfoVis Toolkit [10], D3 [11], or Crossfilter [12]. Suitable candidates for combining JavaScript and jQuery include jqPlot [13], Visualize [14], and Flot [15]. Alternatively, the R statistical language [16] includes functions for plotting data out of the box. An overview of other, similar frameworks and libraries is available, for example, from the website at Datavisualization.ch [17], and Brian Suda's blog has a great overview of libraries [18].

In addition to showing data as two- or three-dimensional images, the respective API libraries contain interaction (PyQtGraph, Bokeh) and data analysis (Pandas) methods. With a short sample program, I can show you in each case how to take the initial hurdles in stride. (See the "Installation and Variants" box.)

Installation and Variants

Matplotlib, PyQtGraph, and Pandas can all be set up conveniently from the repositories in the major distributions using their respective package managers. Under Debian and Ubuntu, for example, the corresponding packages go by the names python-matplotlib, python-pyqtgraph, and python-pandas. However, Bokeh is not available as a package in many distributions. The best idea is to retrieve the package from the project website and install it via Anaconda or Pip, the package management utility for Python packages from the Python Package Index (PyPI).
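Before working through the examples, it can help to check which of the four libraries Python can actually find on your system. The following is a minimal sketch, assuming Python 3.4 or later (for importlib.util.find_spec); the helper name find_installed is made up for this illustration and is not part of any of the libraries.

```python
import importlib.util

# Import names of the four libraries discussed in this article
LIBRARIES = ["matplotlib", "pyqtgraph", "bokeh", "pandas"]


def find_installed(names):
    """Map each module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = find_installed(LIBRARIES)
for name, available in status.items():
    print("{:<12} {}".format(name, "installed" if available else "missing"))
```

Any library reported as missing can then be installed through the distribution's package manager or through Pip, as described above.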

The listings in this article are mostly based on Python 2.x, but with few or no changes they will run in Python 3.x, which encodes all strings as a sequence of Unicode characters (e.g., to simplify the use of special characters). A handy migration guide provides insight into the differences between the two versions [19].
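The Unicode handling mentioned above matters as soon as a label contains a character such as the degree sign. A short sketch for Python 3 (the variable names are chosen for this illustration only) shows the difference between the character count of a string and the byte count of its UTF-8 encoding:

```python
# In Python 3, str holds Unicode characters; bytes holds encoded data.
label = "Temperature in °C"

encoded = label.encode("utf-8")    # the degree sign needs two bytes in UTF-8
decoded = encoded.decode("utf-8")  # decoding restores the original string

print(type(label).__name__, len(label))      # str, counted in characters
print(type(encoded).__name__, len(encoded))  # bytes, counted in bytes
print(decoded == label)
```

In Python 2.x, by contrast, a plain string literal is a byte string, which is one of the differences the migration guide covers.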

Matplotlib

The sophisticated Matplotlib library focuses exclusively on presenting graphs, histograms, power spectra, and bar, pie, and error charts in the Cartesian coordinate system in two- and three-dimensional space. Supplementary toolkits like Basemap [20] and Cartopy [21] offer additional projection types and can combine your data with map data.

The website for the project has more suggestions. When browsing around the chart image gallery, you will find the associated, sometimes quite extensive, program code required to produce each chart.

Matplotlib is used in Python scripts, the Python shell, and the IPython Notebook [22] (for Python 2.x) or Jupyter Notebook [23] (for Python 2.7 and 3.3 or greater). The library blends well with web-based application servers and toolkits for creating graphical user interfaces.

Table 1

Daily Temperature

                  Monday   Tuesday   Wednesday   Thursday   Friday
Daytime (°C)      22       23        19          24         15
Nighttime (°C)    12       14        10          14         9

As an initial example, I look at visualizing daytime and nighttime temperatures measured over five days. Table 1 lists the collected data; in Figure 1 you can see the Python code in the IPython Notebook, and Figure 2 shows the graph it generated.

Figure 1: The code in IPython Notebook for generating the chart.
Figure 2: The weather chart generated with Matplotlib.

The chart is defined by importing the two libraries numpy and matplotlib.pyplot under the program-specific identifiers np and plt (Figure 1, lines 1 and 2).

Line 4 sets the appropriate number of bars for the number of days of measurements. Lines 5 and 6 contain the definition of the temperature values for day and night as a list. In line 7 the arange method is used to define a range based on the number of days with measurements, and line 8 sets the appearance of a single bar to 35 percent of the total width.

To prepare the graph with the bars in the background, the command in line 9 generates two objects fig and ax. Lines 10 and 11 define the appearance of the bars, that is, their position on the x axis, the output of the temperatures as the y value, and the width and color of the bar. The daytime values appear in red and the nighttime values in blue.

Lines 13 to 16 supplement the chart with a y axis label (°C), title, scale, and x-axis label (days of the week). In line 18, the legend for the two bars is set at the top right in a separate box. Finally, plt.show() renders the chart in the output window (Figure 2).
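Because Figure 1 shows the code only as a screenshot, the following sketch reconstructs the steps described above from the data in Table 1. It is an approximation, not the original notebook code: the variable names are assumptions, the line numbers do not match Figure 1, and the Agg backend with savefig() is used so that the sketch also runs without a display. For an interactive window, drop the matplotlib.use() line and call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering; remove for plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Data from Table 1
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
day_temps = [22, 23, 19, 24, 15]
night_temps = [12, 14, 10, 14, 9]

index = np.arange(len(days))  # one x position per day
bar_width = 0.35              # width of a single bar

fig, ax = plt.subplots()

# Daytime bars in red, nighttime bars in blue, placed side by side
day_bars = ax.bar(index, day_temps, bar_width, color="red", label="Daytime")
night_bars = ax.bar(index + bar_width, night_temps, bar_width,
                    color="blue", label="Nighttime")

# Axis label, title, tick positions, and tick labels
ax.set_ylabel("°C")
ax.set_title("Daily Temperature")
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(days)

# Legend in a separate box at the top right
ax.legend(loc="upper right")

fig.savefig("temperatures.png")
```

Shifting the second set of bars by bar_width places each nighttime bar directly beside its daytime counterpart, and the ticks are centered between the two.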

A second example – the code provided in Listing 1 – visualizes the partitioning of a memory device as a pie chart. Line 1 first includes the matplotlib.pyplot graphic library under the local identifier plt. Then lines 3 to 6 define for each piece of pie the name (chartLabels), the size in percent, the color, and the distance between the pie slice and the center of the graph. The greater the numeric value, the further away from the center the respective piece of pie is shown.

Listing 1

Matplotlib Pie Chart

 

The call in line 8 renders the pie chart along with a shadow, and the method in line 9 fixes the aspect ratio so that the pie appears as a true circle rather than a somewhat oval shape. Finally, line 10 calls plt.show() to output the prepared pie chart (Figure 3).
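The listing itself is not reproduced here, so the following is only a sketch of the steps just described. The name chartLabels comes from the text; all other names, the partition labels, and the percentages are hypothetical values chosen for illustration, and the line numbers do not match Listing 1. As an assumption for running without a display, the sketch writes the chart to a file instead of calling plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering; remove for plt.show()
import matplotlib.pyplot as plt

# Hypothetical partitioning scheme; one entry per piece of pie
chartLabels = ["/", "/home", "/var", "swap"]
chartSizes = [20, 55, 15, 10]                 # size in percent
chartColors = ["gold", "yellowgreen", "lightcoral", "lightskyblue"]
chartExplode = [0, 0.1, 0, 0]                 # distance from the center

fig, ax = plt.subplots()

# Draw the pie with a shadow and a percentage printed on each slice
ax.pie(chartSizes, explode=chartExplode, labels=chartLabels,
       colors=chartColors, autopct="%1.1f%%", shadow=True)

# Equal scaling on both axes keeps the pie circular
ax.axis("equal")

fig.savefig("partitions.png")
```

Only the /home slice has a nonzero explode value here, so it is the one piece drawn slightly away from the center.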

Figure 3: The partitioning scheme of a hard disk as a pie chart with Matplotlib.

PyQtGraph

PyQtGraph, a package written entirely in Python, can be used across operating systems. In addition to graphical representations (visualizations), it offers an interface for application and GUI development by docking onto Qt with two frameworks: PyQt [24] and PySide [25].

In terms of content, it targets visualization for scientific purposes with a correspondingly high degree of accuracy. It supplements the functionality of Matplotlib with the ability to select a section of the graphical output interactively with the mouse and then to scale, rotate, and move that section along the coordinate axes. PyQtGraph is suitable for creating applications in which the representation of the data in the output changes at program run time, or in which the user can intervene (e.g., load curves, animations, video, and movie excerpts).

The example in Figure 4 shows a curve in two images side by side. The left curve provides an overview (value range 0 to 1,000) with two markers at 300 and 700 on the x axis; the right curve contains only the section between these markers.

Figure 4: A graph with an interactively displaceable section shown separately on the right.

To move the section, first left-click in the highlighted area of the left diagram. Then, hold down the mouse button and drag the pointer left or right. To change the boundaries of the excerpt, left-click on the respective selection, hold down the left mouse button, and drag the boundary of the excerpt. In both cases, the representation in the right chart changes synchronously. You can use the right mouse button to access a contextual menu in which you can modify the presentation (e.g., for the coordinate grid system). You can configure each diagram separately (Figure 5).

Figure 5: Using a context menu to modify the view shown in Figure 4.

Listing 2 shows the underlying program code. After defining the required modules (lines 1 to 3), the Qt application is initialized in line 5. Lines 7 to 9 define a graphic window, 700x200 pixels here, and a corresponding window title.

Listing 2

Graph and Excerpt

 

Starting in line 11 the two graphs are specified. First, anti-aliasing of the curves shown is enabled, and the scope and data points are defined. The variable p1 represents the left diagram; the counterpart p2 is on the right. Lines 15 and 21 define corresponding titles. Line 16 specifies the color for plotting the data points as an RGBA value. The curve appears in white with an alpha value of 200.

You define the initial section for the right curve in line 17 and the corresponding functions for updating in lines 24 to 28. Lines 30 and 31 create the link between the two charts by referencing the previously defined functions. The final lines 34 and 38 ensure that the Qt application is called correctly.

Bokeh

Bokeh helps you create interactive graphics and pictures to be included in a web page. A web server such as Apache or Nginx then serves up this page with the figure. The library combines the figure with HTML and JavaScript, allowing the viewer to change the appearance of the graphic in their web browser (e.g., selecting a view, a data range, or the color of the data points).

Bokeh is designed to handle large amounts of data. It features a Matplotlib compatibility layer and can be combined with IPython Notebook. The package also includes extensive sample data you can immediately use for test purposes [26], such as the data in Figure 6, which shows an interactive periodic table.

Figure 6: Interactive periodic table of the elements.

Clicking on an element square opens a small window with further details, as shown here for the element calcium (Ca). The additional information includes the full name of the element, atomic number, element type, atomic mass, electronic configuration, and color code according to the CPK model [27]. CPK is a convention for the color representation of atoms in molecule models, named after the chemists Robert Corey, Linus Pauling, and Walter Koltun.

The program code for the Bokeh examples is quite extensive and complex, so I have not printed it here. In principle, it follows the same pattern as the Matplotlib and PyQtGraph code: import the library, define the data, and configure and output the figure.
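For a far smaller case than the periodic table, the following sketch shows the general shape of a Bokeh program. It is not taken from the Bokeh sample code; it simply reuses the daytime temperatures from Table 1 and assumes a reasonably recent Bokeh release. The output file name and the variable names are chosen for illustration only.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Daytime temperatures from Table 1
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
day_temps = [22, 23, 19, 24, 15]

# A figure with a categorical x axis and a few interactive tools
plot = figure(title="Daily Temperature", x_range=days,
              tools="pan,wheel_zoom,reset,hover")
plot.vbar(x=days, top=day_temps, width=0.6, color="red")
plot.yaxis.axis_label = "°C"

# Combine the figure with HTML and JavaScript in a standalone page;
# the BokehJS files are loaded from the public CDN
html = file_html(plot, CDN, "Daily Temperature")

with open("temperatures.html", "w", encoding="utf-8") as handle:
    handle.write(html)
```

The resulting temperatures.html file can be opened directly in a browser or placed in the document root of a web server; the tools listed in the figure() call then appear in a toolbar next to the chart.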
