Disaster tolerance with Apache Cassandra
Highly Available
The size and scope of today's Internet companies require more than your average SQL. Apache Cassandra is one of the NoSQL systems filling the need for high availability at scale.
Apache Cassandra is an open source NoSQL distributed database that stores and manages large volumes of data on standard servers. Cloud providers use Cassandra for configurations with many data centers spread across global networks.
The story of Apache Cassandra began in 2007 when Facebook engineers Prashant Malik and Avinash Lakshman developed a very early version for Facebook's inbox search. The challenge was to store the data for huge datasets residing on hundreds of servers. A year later, Facebook released Cassandra on Google Code, making it an open source project. In 2009, it joined the Apache incubator, paving the way to it becoming a top-level Apache Foundation project. Since then, many well-known companies have implemented Cassandra or a commercial version (DataStax Enterprise), including Apple, Netflix, Twitter, Sony, eBay, Walmart, and FedEx. Cassandra and other NoSQL alternatives are part of a new generation of data tools designed to fulfill the massive storage needs of the Internet era. A conventional relational database, such as an SQL database, is difficult to cluster, subdivide, or scale horizontally. Companies can either keep their data at a single location and let their customers contend with long wait times to access it remotely, or they can operate two instances of the database. Neither of these scenarios is viable for a modern international company that needs both global data availability and the ability to grow without incurring additional costs. NoSQL systems are built to be extremely scalable. To increase performance, you can simply add additional nodes to the cluster on the fly. To double the performance of the database, you just need to add the same number of nodes as the cluster already has. Apache Cassandra is based on Java and has symmetrical nodes organized in clusters, rather than the master and named nodes used with SQL implementations. Cassandra is useful for real-time data storage for online applications with multiple transactions. You can also use Cassandra as a read-intensive database for business intelligence systems. If you're accustomed to SQL, you'll find that the Cassandra Query Language (CQL) is strongly reminiscent of SQL in terms of syntax and keywords. Cassandra is designed for a distributed environment. To fully implement Cassandra's disaster tolerance capabilities on a massive scale, companies need to distribute the data across different regions or even different cloud providers. If one instance fails, some latency may occur, but the data remains available.
CAP Theorem
The CAP theorem is a principle of computer science that helps to explain why NoSQL systems like Cassandra differ from conventional data tools. The CAP theorem (or Brewer's theorem), which describes the relationship between consistency (C), availability (A), and partition tolerance (P), was first articulated by Eric Allen Brewer, Professor Emeritus of Computer Science at University of California, Berkeley and Vice President of Infrastructure at Google. CAP forms the basis for planning a distributed architecture. The basic parts of the CAP decision framework are:
- Consistency: "Each read operation accesses the last write operation or an error." A consistent system returns the same value from each node that is requested.
- Availability: "Each request receives an error-free response." Whatever happens within the cluster does not affect the clients. A highly available system always sends an answer, even if half of the cluster is already dead.
- Partition tolerance: "The system continues to work despite network problems between nodes." A partition-tolerant system continues to run even if there are serious communication problems within the cluster.
The CAP theorem states that no distributed system can fully achieve all three of these objectives. Because a distributed database must continue to operate if the network stops or part of the system is down, the third objective (partition tolerance) is required. That means a distributed database can either be consistent and partition-tolerant (CP) but less available; or it can be highly available and partition-tolerant (AP) and less consistent (Figure 1). These two mutually exclusive options are best understood if you consider the basic trade-offs between consistency and availability. If you wish to maximize availability, the system must continue to receive data when the distributed nodes are not able to communicate with each other (e.g., it can't just stop working and send an error message). But a scenario that calls for a node to provide data to a user when it is unable to verify that the data is up to date does not fulfill the ideal for consistency. On the other hand, if you wish to maximize consistency, the system will need to ensure that the data provided to the user is the latest version or else return an error, which means that, if the nodes cannot communicate, the system would not be fully available.
Cassandra is known as an AP system, because it maximizes availability and partition tolerance at the expense of consistency. Cassandra's developers are willing to tolerate some inconsistency in order to ensure that the database remains available when operating in a partitioned state. This emphasis on availability over consistency is one reason why Cassandra is so highly scalable compared to many conventional database options. The steps necessary to maximize consistency do not scale for multiple nodes and large datasets. However, although Cassandra emphasizes availability in the partitioned state, it does include synchronization features that provide consistency among the distributed nodes in normal operating conditions.
No SPoF
Cassandra's AP design, with its emphasis on availability, requires that the system eliminate all Single Points of Failure (SPoF). (In a relational database, by contrast, each master node is a potential SPoF.) In order to eliminate SPoF, either all components must be designed redundantly or the design must reflect a masterless architecture, in which the nodes are peers. In the case of Cassandra, every node can process a request, no matter if it needs to read or write. If one of these nodes fails, its data must be available at another location – waiting to restore a backup is not an option for a system designed to achieve zero downtime. Instead, the data is provisionally replicated before anyone needs it. Cassandra lets you define a replication factor. If you set the replication to 3
or 5
, each data element is replicated in the corresponding number of nodes. Redundant replication causes additional costs; however, the cost of storage is small compared to the loss of reputation and long-term economic damage associated with lost data. It is also important to remember that replicas should not reside next to each other on the servers. Servers in the same rack tend to fail together. Any event can paralyze not only the rack, but the entire data center. It is therefore advisable to opt for georedundant replication.
Automatic Recovery
More servers means the higher the probability of a failure. A cluster must be capable of restoring itself independently. There are two mechanisms for this restoration:
- Announced redirects: When one node fails, other nodes start keeping updates for the failed node. If the node is regenerated within a reasonable amount of time (typically one to four hours, depending on the configuration), these handover packages are reinstalled on the node, restoring it completely and autonomously.
- Repair/NodeSync: If network delays or similar issues cause problems, a cluster performs a health check and recovery operation. In Apache Cassandra, this is known as a "Repair."
The nodes constantly communicate with one another in order to implement workaround options when needed. In Cassandra, nodes immediately report when a new node enters the cluster or an old node fails. When a client application connects to the database using the specified IP address, it loads the database metadata and prepares to send a request to each subsequent node if the actual target node is not reached. Depending on the setting, an application may repeatedly request other nodes. Each node in the cluster can receive a request for data. A node receiving the request acts as a "coordinator node" and sends the request to the nodes responsible for data. This system requires knowledge of the cluster metadata, which the nodes constantly exchange. Each node knows the cluster schema and the position of all usable nodes.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Support Our Work
Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.
News
-
AlmaLinux Unveils New Hardware Certification Process
The AlmaLinux Hardware Certification Program run by the Certification Special Interest Group (SIG) aims to ensure seamless compatibility between AlmaLinux and a wide range of hardware configurations.
-
Wind River Introduces eLxr Pro Linux Solution
eLxr Pro offers an end-to-end Linux solution backed by expert commercial support.
-
Juno Tab 3 Launches with Ubuntu 24.04
Anyone looking for a full-blown Linux tablet need look no further. Juno has released the Tab 3.
-
New KDE Slimbook Plasma Available for Preorder
Powered by an AMD Ryzen CPU, the latest KDE Slimbook laptop is powerful enough for local AI tasks.
-
Rhino Linux Announces Latest "Quick Update"
If you prefer your Linux distribution to be of the rolling type, Rhino Linux delivers a beautiful and reliable experience.
-
Plasma Desktop Will Soon Ask for Donations
The next iteration of Plasma has reached the soft feature freeze for the 6.2 version and includes a feature that could be divisive.
-
Linux Market Share Hits New High
For the first time, the Linux market share has reached a new high for desktops, and the trend looks like it will continue.
-
LibreOffice 24.8 Delivers New Features
LibreOffice is often considered the de facto standard office suite for the Linux operating system.
-
Deepin 23 Offers Wayland Support and New AI Tool
Deepin has been considered one of the most beautiful desktop operating systems for a long time and the arrival of version 23 has bolstered that reputation.
-
CachyOS Adds Support for System76's COSMIC Desktop
The August 2024 release of CachyOS includes support for the COSMIC desktop as well as some important bits for video.