Article from Issue 193/2016

Big data is like The Matrix – Better without the sequel

q Right, before I understand what NoSQL is, can you give me a quick rundown on SQL?

a That's a great place to start! SQL (Structured Query Language, often pronounced "sequel") is the standard language for getting data into and out of relational databases. Its popularity is shown by the fact that it's in almost every major database name: MySQL, PostgreSQL, SQL Server, Oracle….

q Hang on, there's no SQL in the name Oracle.

a OK, the name thing doesn't work with every database, but the Oracle database is still based around SQL, as is MariaDB.

The language is emblematic of the relational style of database that has dominated the data-storage industry pretty much since computers have had enough storage to hold anything worth calling a database. The idea behind relational databases is that information can be mapped to tables, and these tables can be linked to create complex data models.

For example, if you ran a shop, you could have a database with a table of stock. Each line in the table would be about one item that you stocked and include things like the number of items you currently had. You could also have a table for suppliers that included their address, payment information, etc. The stock table could include a supplier reference number that you could look up in the supplier table. When you needed information from the database, your software could combine these two tables (known as joining them) so that you got a view with each item and the details about the supplier of that item.

One of the great features of the relational model is that each piece of information is stored only once. This is great for data integrity because it means that when a piece of data changes, you only have to update it in one place. If, in our previous example, a supplier changed their address, you'd only have to update the supplier table, and then every query would pull out the right result.
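
The shop example can be sketched in a few lines of Python using the sqlite3 module from the standard library. This is only an illustration: the table names, column names, and sample rows are invented here, and a real shop database would hold far more detail.

```python
import sqlite3

# In-memory database; all table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    );
    CREATE TABLE stock (
        item        TEXT PRIMARY KEY,
        quantity    INTEGER NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(id)
    );
    INSERT INTO supplier VALUES (1, 'Acme Tools', '1 Old Road');
    INSERT INTO stock VALUES ('hammer', 12, 1), ('wrench', 5, 1);
""")

def stock_with_suppliers(conn):
    """Join each stock item to the details of its supplier."""
    return conn.execute("""
        SELECT stock.item, stock.quantity, supplier.name, supplier.address
        FROM stock
        JOIN supplier ON supplier.id = stock.supplier_id
        ORDER BY stock.item
    """).fetchall()

before = stock_with_suppliers(conn)

# The supplier moves: one UPDATE in one table...
conn.execute("UPDATE supplier SET address = '2 New Street' WHERE id = 1")

# ...and every joined row picks up the new address.
after = stock_with_suppliers(conn)
```

The address lives only in the supplier table, so a single update is enough for both stock items to show the new address the next time the tables are joined.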

q SQL's sounding pretty good so far. What has anyone got against it?

a Not many people are really against SQL, but there are some classes of information where it's not necessarily the best fit.

SQL databases were designed as ultra-reliable stores for important data, and they still fit this role well. However, the model comes with some overheads, and when you start to deal with really large volumes of data, these overheads can become significant. There is a wide range of NoSQL databases, so I don't want to generalize, but typically, they work best when you've got a large amount of data to store and want to spread it across a lot of machines for processing.

q Can you give me any examples of when NoSQL works well?

a One that springs to mind is the ELK (Elasticsearch, Logstash, Kibana) stack for monitoring machines. This is typically set up when you have a large number of machines that you want to keep an eye on. They send all their logs to a central Logstash server that processes them and puts the log information into an Elasticsearch database. You might also gather other information, such as CPU usage or free memory statistics, and push it all into the data store as well. Kibana is a web front end that can provide real-time visualizations of what the data looks like.

You can push lots of data in even if it's in different formats – you could have Apache logs in the same store as system logs and Elasticsearch wouldn't care. It leaves the task of decoding the information to the user rather than trying to encapsulate it in the structure of the tables.
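
As a rough sketch of that idea, here is a toy document store in Python. It is not the Elasticsearch API; the index and search functions and the sample log records are hypothetical and exist only to show differently shaped records living side by side.

```python
import json

# A toy "document store": a list of JSON documents with no shared schema.
store = []

def index(document):
    """Store a document as-is; no table structure is checked."""
    store.append(json.dumps(document))

index({"type": "apache", "status": 404, "path": "/missing.html"})
index({"type": "syslog", "host": "web01", "message": "disk almost full"})
index({"type": "metric", "host": "web01", "cpu_percent": 87.5})

def search(field, value):
    """Return documents whose field equals value; decoding happens at read time."""
    docs = (json.loads(raw) for raw in store)
    return [doc for doc in docs if doc.get(field) == value]

web01_docs = search("host", "web01")
```

The Apache record has no host field and the metric has no message, yet all three sit in the same store; it is the reader of the data that has to know what each field means.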

In this setup, you can quickly end up with very large amounts of data, yet you still want to be able to process it fast. Once the logs are written, they never change. So the data-consistency guarantees that make SQL databases great for their style of data store matter less here; what's really important in an ELK stack is speed. Any delays in writing data to the database mean more processor resources are needed. Likewise, complex visualizations need to be generated in almost real time.

q OK, so NoSQL allows you to store a wide variety of differently organized data in the same place and get it back quickly?

a Well, yes and no. In the previous example, we looked at Elasticsearch, which is a document-orientated database, meaning that it's a place where you can keep pushing in records (the documents) in any format. This is one of the most common types of NoSQL database, and many of the most famous new databases work in this way, such as MongoDB and Couchbase. However, others work in very different ways.

For example, key-value stores simply allow you to store data that can be retrieved using a key (unlike other types of database, the records can only be accessed by the key and not by the other data in the record). These key-value stores (e.g., Riak, MemcacheDB, and Apache Cassandra) are highly scalable and perform well even under heavy load. There are also graph databases for storing highly connected data. There isn't a set group of database types that classify as NoSQL; it's just anything that's not relational.
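
The key-only access that defines a key-value store can be sketched as follows. The KeyValueStore class is a hypothetical stand-in built on a Python dictionary, not the interface of Riak or any other real product.

```python
class KeyValueStore:
    """Toy key-value store: values are opaque and reachable only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

kv = KeyValueStore()
kv.put("session:42", b'{"user": "alice", "cart": ["hammer"]}')

cart_blob = kv.get("session:42")   # found: we know the exact key
missing = kv.get("session:99")     # None: nothing stored under this key
```

There is deliberately no way to ask for "all sessions belonging to alice": the store never looks inside the value, which is part of what makes this design easy to spread across many machines.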

One big difference between SQL and NoSQL databases is that NoSQL databases can sometimes have slight inconsistencies in their data.

q Inconsistencies in data? That sounds pretty bad! What's the point in a database if the data isn't correct?

a Relational SQL databases are usually designed around the ACID principle. That is, each action is Atomic (either the entire action happens or none of it does), Consistent (after every action, the entire database is correct), Isolated (actions running at the same time don't interfere with each other, as if only one happened at a time), and Durable (completed actions are permanent even if there are failures). This means that the database never exposes a half-finished change. For example, you can't end up reading the wrong data because a write operation is half-way through, and a broken write operation can't get half the data in but not the other half.
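
Atomicity is the easiest of these to see in action. The following minimal sketch uses Python's sqlite3 module; the account table, the transfer function, and the balances are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the second transfer, the credit to bob has already been applied when the debit from alice violates the CHECK constraint. The rollback undoes the credit as well, so the half-finished transfer never becomes visible.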

NoSQL databases are often designed around the BASE principle. That is, Basic Availability (the database keeps running), Soft State (the data can change at any time because actions aren't atomic) and Eventual Consistency (at some point in the future the database will be right, but it may serve slightly stale data at some points).

BASE guarantees are quite weak, but they're fine for occasions where it's OK for the result of a query to sometimes be a second or two old (such as in a messaging application), and these weaker guarantees make it much easier to scale the databases to huge numbers of users. However, if you need to always get the correct and most up-to-date information with no exceptions, you need a database with ACID guarantees.
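
Eventual consistency can be modelled with a toy two-copy store. The sketch below is an assumption-heavy simplification: the EventuallyConsistentStore class and its explicit replicate step are hypothetical, and real databases replicate in the background across many machines.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy model: writes hit a primary at once and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, key):
        # Always answers straight away, but the answer may be stale.
        return self.replica.get(key)

    def replicate(self):
        """Apply queued writes in order; afterwards the replica has caught up."""
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value

store = EventuallyConsistentStore()
store.write("status", "online")
store.replicate()

store.write("status", "away")
stale = store.read_from_replica("status")   # replica has not caught up yet
store.replicate()
fresh = store.read_from_replica("status")   # now matches the primary
```

Between the second write and the next replication, a reader sees the old status. For a messaging application that is a tolerable glitch; for a bank balance it would not be.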

q So if you don't use SQL, what language do you use to query these databases?

a There's no standard, and each database has developed its own language. Some have created new query languages from scratch, and others have built on top of existing programming languages (e.g., JavaScript in MongoDB). Several NoSQL databases can even be queried in SQL.

q Wait, hang on a minute. I thought NoSQL meant that there was, well, no SQL.

a Well, it once did – and for many people it still does. However, some projects have put SQL query layers on top of non-relational databases to make them more approachable for people who are familiar with this language. NoSQL is now held by some to mean "not only SQL." In truth, the name NoSQL has always been a misnomer because the key factor for this type of database isn't the query language it uses but the fact that it isn't relational.

q This all sounds really interesting. How can I try out NoSQL?

a As NoSQL is a diverse class of databases, it's hard to give a single place to start. A good place to dip your toe in the water is the Redis online demonstration [1], which allows you to interact with this key-value store via your web browser (Figure 1). Just don't forget that this is only one example of one class of NoSQL database and that they all work in different ways.

Figure 1: NoSQL with NoInstalling – try the Redis key-value store database online.


[1] Try Redis: https://try.redis.io/
