An introduction to PostgreSQL
But How Do You Run It?
Infrastructure engineers are often more interested in how software is run than in how it is being used. In terms of performance, running PostgreSQL is quite simple: A single huge bare-metal instance will almost always offer superior performance for a lower price than an equivalent instance from a popular cloud provider. To give you a simple example, you can host a PostgreSQL instance on an AMD EPYC 7401P machine with 24 cores/48 threads, 512GB RAM, and 2x 960GB NVMe storage for under EUR300 per month. An equivalent AWS RDS instance provider would cost at least an order of magnitude more to just run the instance.
If a single bare-metal instance offers such good performance, then why isn't everybody running PostgreSQL this way? Today's databases are expected to run 24/7 with little or no downtime. A single bare-metal machine means that failures and even upgrades require downtime. Therefore, apart from a specific scenario when turning the database off for half an hour is not a problem, you will rarely see such a PostgreSQL setup.
A far more popular option involves at least two machines where one is known as the primary (formerly, "master") and the rest are replicas. The data from clients is written to the primary server only, and the replicas only get the copies of the data from the primary server. However, read operations also can be performed on replicas, reducing the load on the primary server. In most scenarios, a setup with a single primary and one or more replicas works reasonably well.
A more advanced setup involves multi-master (i.e., multi-primary) replication where more than one server can accept writes from clients, and these writes are then replicated between all servers. While multi-master replication offers availability and distributed access, it suffers in terms of consistency, performance, and integrity. Currently, this is where a lot of work is happening in PostgreSQL because this kind of set up is the most complex.
Modern Data Types
While PostgreSQL is praised for its features, they can also be a double-edged sword. Be aware that once you start using features that are not present in other databases, migrating to a different database will be more difficult. Moreover, intensive use of features like triggers and stored procedures is discouraged as they consume precious resources – and usually you want to reduce the load on your database machine, not increase it. That's why it's common to put a cache like Redis in front of the database to handle the most common queries to delay having to "scale vertically" (a euphemism for upgrading your hardware).
That said, as long as you are aware of the trade-offs, using all of PostgreSQL's possible features is not a problem. One especially appreciated class of features in PostgreSQL is its advanced data types. If you learned SQL, you are familiar with basic types such as INT
, CHAR
, DATE
, and so on. Modern databases go beyond these types and provide users with more advanced and complex data types that suit common use cases. PostgreSQL is no exception, and it's worth knowing these data types whether you decide to use them or not.
hstore
The hstore
data type is a key-value store within a single PostgreSQL value. It's useful for storing sets of key-value pairs within a single column, offering flexibility for semi-structured data. As an example, you can create a simple table:
CREATE TABLE products ( id serial PRIMARY KEY, attributes hstore );
and then insert the attributes as follows:
INSERT INTO products (attributes) VALUES ('color => "blue",size => "medium"');
You can use hstore
where you might have used MongoDB or another NoSQL database in the past: in scenarios where you have data whose structure is not clearly or fully defined, requiring it to be stored without a fixed table structure. In this scenario, you can modify attributes without modifying database schema. This is also useful if you are not sure which attributes will appear in the future and so on. In general, since the initial excitement over NoSQL waned, this kind of approach has been discouraged because it often leads to problems in the long run. However, for data that is primarily key-value in nature, such as configuration settings, profile attributes, or any scenario where the data structure is inherently a map or dictionary, hstore
is an appropriate choice.
Buy this article as PDF
(incl. VAT)