Working with the JSON data format
Data Dog
JSON data format is a standard feature of today's Internet – and a common option for mobile and desktop apps – but many users still regard it as something of a mystery. We'll take a close look at JSON format and some of the free tools you can use for reading and manipulating JSON data.
Our world of web applications and fast, interactive mobile devices calls for the free exchange of data in easily accessible forms. Standard formats promote interoperability and minimize development time. Open formats also make it easy to import data into other applications. Over the years, several popular alternatives have emerged. CSV, XML, and YAML are well known and easy to adapt to different applications (see the box entitled "Comparing Formats" and Listings 1-3 for examples). One format that is used extensively for web applications, mobile applications, and even some conventional desktop tools is JavaScript Object Notation (JSON) [1].
Comparing Formats
Using comma-separated values (CSV) ensures the data remains manageable for the most part, but the format is neither standardized nor particularly flexible. In a file like the one in Listing 1, neither the character set nor the separator between columns is fixed. In practical applications, spaces, tabs, hyphens, or semicolons are sometimes used instead of commas. Corresponding key data either has to be agreed upon or inferred from the file itself. Furthermore, the format does not allow nested records, arrays, or binary data.
The Extensible Markup Language (XML) is far more structured and flexible than CSV. A notable feature of XML is the data field enclosed in the field name with the form <fieldname>value</fieldname>
(Listing 2). In practice, it makes sense to choose field names that let you infer the contents. The order of the fields is usually variable in a layer, and fields can be missing. One disadvantage of XML is that an XML file is significantly larger due to the need to continually repeat the field labels for each entry.
YAML is a recursive acronym for YAML Ain't Markup Language. The YAML specification describes a very compact way to serialize data. Hyphens and indentations using spaces serve as the basis for denoting fields. YAML borrows from XML, as well as from the format in which three programming languages (Python, Perl, and C) describe their data structures. Listing 3 shows the book inventory data as a YAML structure.
JSON [1] is based on JavaScript. The format is also very compact and flexible. In contrast to YAML, JSON explicitly identifies objects and their attributes, whereas in YAML, the assignment is derived from the context of the indentation depth.
Listing 1
CSV File
Stephen Fry; The Hippopotamus; 1994 Ian Rankin; Set In Darkness; 2009 Ken Follett; The Pillars of the Earth; 1989
Listing 2
XML File
<inventory> <book> <author>Stephen Fry</author> <title>The Hippopotamus</title> <publication>1994</publication> </book> <book> <author>Ian Rankin</author> <title>Set In Darkness</title> <publication>2009</publication> </book> <book> <author>Ken Follett</author> <title>The Pillars of the Earth</title> <publication>1989</publication> </book> </inventory>
Listing 3
YAML File
--- book: - author: Stephen Fry title: The Hippopotamus publication: '1994' - author: Ian Rankin title: Set In Darkness publication: '2009' - author: Ken Follett title: The Pillars of the Earth publication: '1989'
JSON is wildly popular as a tool for passing information between web apps – for instance, it is currently the de facto standard for REST services – yet for many users, the details of JSON format are shrouded in mystery. This article takes a close look at JSON and some of the tools available for reading, manipulating, and importing JSON data.
Understanding JSON
The notation of JSON is analogous to objects, records, or dictionaries – depending on what that structure is currently called in your favorite programming language. Even though JSON format is based on JavaScript, parsers exist in almost all programming languages. In addition to Awk and C/C++, you can integrate JSON with Fortran, Go, Lisp, Lua, Python, and Visual Basic.
In everyday life, you will find the format in the data-sharing Jupyter Notebook app [2], in geographical specifications like GeoJSON [3] (Listing 4), and even in databases like MongoDB.
Listing 4
GeoJSON File
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } }
Taking a closer look at the JSON data structure, you will see that it is in an easy-to-read, text-based format. Parentheses, colons, and commas separate the individual elements; the data can be nested as desired. This means, for example, that you can map lists, arrays, or objects. Table 1 summarizes the elementary data types that JSON supports.
Table 1
JSON Data Types
Data Type | Description |
---|---|
Strings |
All Unicode characters except ", \, and control characters |
Numbers |
Numeric values including hexadecimal and exponential values, for example 0x42 and .2e-3 |
Boolean values |
Logic values true and false |
Arrays |
Comma-separated, unordered lists of properties, although objects without properties are also allowed |
Objects with properties |
Notation as key-value pairs |
Null values |
null, NIL, or () |
Mark the individual structure levels with brackets and indentation for better readability (Listing 5). Pairs of curly braces ({
and }
) each form a unit. Square brackets ([
and ]
) are used to indicate fields (also known as arrays). Individual field elements follow the form of an enumeration and are separated by commas. Each field element consists of a key-value pair separated by a colon (:)
.
Listing 5
JSON File
{"book": [ { "author": "Stephen Fry", "title": "The Hippopotamus", "publication": "1994" }, { "author": "Ian Rankin", "title": "Set In Darkness", "publication": "2009" }, { "author": "Ken Follett", "title": "The Pillars of the Earth", "publication": "1989" } ]}
JSON was originally created in the early 2000s to exchange data between web applications. JSON worked quite well in the web context (even though you cannot always parse it unambiguously). To structure data, JSON falls back on conventions familiar to anyone who has programmed in a C-based language (C, C++, C#, Java, JavaScript, Perl, Python, and others).
JSON is specified according to RFC 8259 [4] and ECMA-404; common extensions are JSONP (JSON with padding), JSONPP (JSON with padding and parameters), and JSONML, which combines XML and JSON together. The character set for all JSON formats is Unicode (UTF-8), which eliminates the character-set guessing game that you will be familiar with from CSV.
You can use JSON to exchange smaller volumes of data between applications in an agile way. However, if the transferred data volume increases (e.g., if you have millions of measurements from a sensor), JavaScript-based Python libraries like Ipywidgets, Bokeh, and Plotly often fail. In the face of large data volumes, binary transport mechanisms are a better option for handling the load.
Tool Overview
Several command-line tools are available for parsing, processing, and outputting JSON data. Table 2 summarizes some of the available tools. All of these tools are available as packages for Debian GNU/Linux, Ubuntu, Devuan, Linux Mint, and macOS.
Table 2
JSON Tools
Tool | Language | Application |
---|---|---|
aeson-pretty [5] |
Haskell |
Output JSON in a readable way |
jc [6] |
Python |
Convert output to JSON |
jid [7] |
Go |
Interactively filter JSON |
jo [8] |
C |
JSON output in the shell |
jq [9] |
C |
Output and filter JSON in a readable way |
Jshon [10] |
C |
Read and generate JSON |
JSONLint [11] |
PHP |
Validate JSON data |
Not all of the tools in Table 2 are intuitive, and some of them only develop their full impact in a specific context. You will find more information on these tools in the various documents and cheat sheets available online [12].
Easily Readable JSON Output
When it comes to pretty printing, aeson-pretty, jc, jo, jq, and Jshon all have something to say. Some of the tools have a command-line parameter for printing, for example, -p
for jo.
In Listing 6, cat
and aeson-pretty
work together for readable output via a pipe. Jq delivers the same results with the next call, but the output is in color (Figure 1):
$ jq . book_inventory.json
Listing 6
Printing with aeson-pretty
$ cat book-inventory.json | aeson-pretty { "book": [ { "publication": "1994", "author": "Stephen Fry", "title": "The Hippopotamus" }, { "publication": "2009", "author": "Ian Rankin", "title": "Set In Darkness" }, { "publication": "1989", "author": "Ken Follett", "title": "The Pillars of the Earth" } ] } $
The dot in the call to jq is not immediately understandable. It stands for the expression to be processed; in this case, it denotes all objects specified as parameters in the JSON file. You can define colorizing of the output using two options, -C
(--colour-output
) and -M
(--monochrome-output
).
Some users prefer compact output with as few (space) characters as possible. In Listing 7, see aeson-pretty with the -c
(short for --compact
) option. This option reduces the number of characters in the output by 45 percent, from 428 to 236 bytes. Compared to Listing 5 and Listing 6, the results still convey the same information, but with only half the amount of data.
Listing 7
Compact Output with aeson-pretty
$ cat book-inventory.json | aeson-pretty -c {"book":[{"publication":"1994","author":"Stephen Fry","title":"The Hippopotamus"}, {"publication":"2009","author":"Ian Rankin","title":"Set In Darkness"}, {"publication":"1989","author":"Ken Follett","title":"The Pillars of the Earth"}]}
Buy this article as PDF
(incl. VAT)