Working with the JSON data format

Data Dog

© Lead Image © Fabian Schmidt, Fotolia.com

© Lead Image © Fabian Schmidt, Fotolia.com

Article from Issue 247/2021
Author(s): , Author(s):

JSON data format is a standard feature of today's Internet – and a common option for mobile and desktop apps – but many users still regard it as something of a mystery. We'll take a close look at JSON format and some of the free tools you can use for reading and manipulating JSON data.

Our world of web applications and fast, interactive mobile devices calls for the free exchange of data in easily accessible forms. Standard formats promote interoperability and minimize development time. Open formats also make it easy to import data into other applications. Over the years, several popular alternatives have emerged. CSV, XML, and YAML are well known and easy to adapt to different applications (see the box entitled "Comparing Formats" and Listings 1-3 for examples). One format that is used extensively for web applications, mobile applications, and even some conventional desktop tools is JavaScript Object Notation (JSON) [1].

Comparing Formats

Using comma-separated values (CSV) ensures the data remains manageable for the most part, but the format is neither standardized nor particularly flexible. In a file like the one in Listing 1, neither the character set nor the separator between columns is fixed. In practical applications, spaces, tabs, hyphens, or semicolons are sometimes used instead of commas. Corresponding key data either has to be agreed upon or inferred from the file itself. Furthermore, the format does not allow nested records, arrays, or binary data.

The Extensible Markup Language (XML) is far more structured and flexible than CSV. A notable feature of XML is the data field enclosed in the field name with the form <fieldname>value</fieldname> (Listing 2). In practice, it makes sense to choose field names that let you infer the contents. The order of the fields is usually variable in a layer, and fields can be missing. One disadvantage of XML is that an XML file is significantly larger due to the need to continually repeat the field labels for each entry.

YAML is a recursive acronym for YAML Ain't Markup Language. The YAML specification describes a very compact way to serialize data. Hyphens and indentations using spaces serve as the basis for denoting fields. YAML borrows from XML, as well as from the format in which three programming languages (Python, Perl, and C) describe their data structures. Listing 3 shows the book inventory data as a YAML structure.

JSON [1] is based on JavaScript. The format is also very compact and flexible. In contrast to YAML, JSON explicitly identifies objects and their attributes, whereas in YAML, the assignment is derived from the context of the indentation depth.

Listing 1

CSV File

Stephen Fry; The Hippopotamus; 1994
Ian Rankin; Set In Darkness; 2009
Ken Follett; The Pillars of the Earth; 1989

Listing 2

XML File

<inventory>
  <book>
    <author>Stephen Fry</author>
    <title>The Hippopotamus</title>
    <publication>1994</publication>
  </book>
  <book>
    <author>Ian Rankin</author>
    <title>Set In Darkness</title>
    <publication>2009</publication>
  </book>
  <book>
    <author>Ken Follett</author>
    <title>The Pillars of the Earth</title>
    <publication>1989</publication>
  </book>
</inventory>

Listing 3

YAML File

---
book:
- author: Stephen Fry
  title: The Hippopotamus
  publication: '1994'
- author: Ian Rankin
  title: Set In Darkness
  publication: '2009'
- author: Ken Follett
  title: The Pillars of the Earth
  publication: '1989'

JSON is wildly popular as a tool for passing information between web apps – for instance, it is currently the de facto standard for REST services – yet for many users, the details of JSON format are shrouded in mystery. This article takes a close look at JSON and some of the tools available for reading, manipulating, and importing JSON data.

Understanding JSON

The notation of JSON is analogous to objects, records, or dictionaries – depending on what that structure is currently called in your favorite programming language. Even though JSON format is based on JavaScript, parsers exist in almost all programming languages. In addition to Awk and C/C++, you can integrate JSON with Fortran, Go, Lisp, Lua, Python, and Visual Basic.

In everyday life, you will find the format in the data-sharing Jupyter Notebook app [2], in geographical specifications like GeoJSON [3] (Listing 4), and even in databases like MongoDB.

Listing 4

GeoJSON File

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}

Taking a closer look at the JSON data structure, you will see that it is in an easy-to-read, text-based format. Parentheses, colons, and commas separate the individual elements; the data can be nested as desired. This means, for example, that you can map lists, arrays, or objects. Table 1 summarizes the elementary data types that JSON supports.

Table 1

JSON Data Types

Data Type

Description

Strings

All Unicode characters except ", \, and control characters

Numbers

Numeric values including hexadecimal and exponential values, for example 0x42 and .2e-3

Boolean values

Logic values true and false

Arrays

Comma-separated, unordered lists of properties, although objects without properties are also allowed

Objects with properties

Notation as key-value pairs

Null values

null, NIL, or ()

Mark the individual structure levels with brackets and indentation for better readability (Listing 5). Pairs of curly braces ({ and }) each form a unit. Square brackets ([ and ]) are used to indicate fields (also known as arrays). Individual field elements follow the form of an enumeration and are separated by commas. Each field element consists of a key-value pair separated by a colon (:).

Listing 5

JSON File

{"book": [
  {
    "author": "Stephen Fry",
    "title": "The Hippopotamus",
    "publication": "1994"
  },
  {
    "author": "Ian Rankin",
    "title": "Set In Darkness",
    "publication": "2009"
  },
  {
    "author": "Ken Follett",
    "title": "The Pillars of the Earth",
    "publication": "1989"
  }
]}

JSON was originally created in the early 2000s to exchange data between web applications. JSON worked quite well in the web context (even though you cannot always parse it unambiguously). To structure data, JSON falls back on conventions familiar to anyone who has programmed in a C-based language (C, C++, C#, Java, JavaScript, Perl, Python, and others).

JSON is specified according to RFC 8259 [4] and ECMA-404; common extensions are JSONP (JSON with padding), JSONPP (JSON with padding and parameters), and JSONML, which combines XML and JSON together. The character set for all JSON formats is Unicode (UTF-8), which eliminates the character-set guessing game that you will be familiar with from CSV.

You can use JSON to exchange smaller volumes of data between applications in an agile way. However, if the transferred data volume increases (e.g., if you have millions of measurements from a sensor), JavaScript-based Python libraries like Ipywidgets, Bokeh, and Plotly often fail. In the face of large data volumes, binary transport mechanisms are a better option for handling the load.

Tool Overview

Several command-line tools are available for parsing, processing, and outputting JSON data. Table 2 summarizes some of the available tools. All of these tools are available as packages for Debian GNU/Linux, Ubuntu, Devuan, Linux Mint, and macOS.

Table 2

JSON Tools

Tool

Language

Application

aeson-pretty [5]

Haskell

Output JSON in a readable way

jc [6]

Python

Convert output to JSON

jid [7]

Go

Interactively filter JSON

jo [8]

C

JSON output in the shell

jq [9]

C

Output and filter JSON in a readable way

Jshon [10]

C

Read and generate JSON

JSONLint [11]

PHP

Validate JSON data

Not all of the tools in Table 2 are intuitive, and some of them only develop their full impact in a specific context. You will find more information on these tools in the various documents and cheat sheets available online [12].

Easily Readable JSON Output

When it comes to pretty printing, aeson-pretty, jc, jo, jq, and Jshon all have something to say. Some of the tools have a command-line parameter for printing, for example, -p for jo.

In Listing 6, cat and aeson-pretty work together for readable output via a pipe. Jq delivers the same results with the next call, but the output is in color (Figure 1):

$ jq . book_inventory.json

Listing 6

Printing with aeson-pretty

$ cat book-inventory.json | aeson-pretty
{
  "book": [
    {
      "publication": "1994",
      "author": "Stephen Fry",
      "title": "The Hippopotamus"
    },
    {
      "publication": "2009",
      "author": "Ian Rankin",
      "title": "Set In Darkness"
    },
    {
      "publication": "1989",
      "author": "Ken Follett",
      "title": "The Pillars of the Earth"
    }
  ]
}
$
Figure 1: The Jq tool puts in your JSON output and keeps the output readable.

The dot in the call to jq is not immediately understandable. It stands for the expression to be processed; in this case, it denotes all objects specified as parameters in the JSON file. You can define colorizing of the output using two options, -C (--colour-output) and -M (--monochrome-output).

Some users prefer compact output with as few (space) characters as possible. In Listing 7, see aeson-pretty with the -c (short for --compact) option. This option reduces the number of characters in the output by 45 percent, from 428 to 236 bytes. Compared to Listing 5 and Listing 6, the results still convey the same information, but with only half the amount of data.

Listing 7

Compact Output with aeson-pretty

$ cat book-inventory.json | aeson-pretty -c
{"book":[{"publication":"1994","author":"Stephen Fry","title":"The Hippopotamus"},
{"publication":"2009","author":"Ian Rankin","title":"Set In Darkness"},
{"publication":"1989","author":"Ken Follett","title":"The Pillars of the Earth"}]}

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • File Inspector

    Spotify, the Internet music service, collects data about its users and their taste in music. Mike Schilli requested a copy of his files to investigate them with Go.

  • Xidel

    Xidel lets you easily extract and process data from XML, HTML, and JSON documents.

  • Migrating Music

    Use a Python API to migrate a music library from SQL to a NoSQL document database.

  • Perl: YouTube Statistics

    Hobby YouTuber Mike Schilli is interested in whether his videos go viral. What better way to check skyrocketing viewer numbers than letting a Perl script analyze the daily trends and watch for unexpected upswings?

  • Gimme Output

    Armed with just json.tool and jq, Charly preps the JSON data delivered by his Philips Hue bridge so that even humans can read it – an essential step towards improving the usability of his home automation system.

comments powered by Disqus