Considering FOSS Databases

Doghouse – Databases

Article from Issue 272/2023

There are many FOSS databases available inexpensively today, and they might serve new projects well.

Recently a friend started taking a postgraduate course in software engineering, and part of that course was the topic of databases. My friend had learned a little about them in his undergraduate degree in computer engineering, but not that much. Perhaps he would have learned more if he had studied "Computer Science."

When I started at my first full-time job after university, databases as we know them were just getting started. System R at IBM and Ingres at the University of California, Berkeley, were research projects, trying to determine the best way to store and access data. Because these projects were often funded by government research grants, tapes of the code were available for a nominal charge.

If you did not use a database to store your data, you had to deal with a long list of issues yourself: backing up your data in a way that gave a consistent view of all of it; different byte and word sizes; different endianness (little endian vs. big endian); loss of data after power failures and system crashes because there was no journaling; pulling large amounts of data over the very slow networks of the day; transaction processing (which is often used in business); and a variety of others. Eventually databases could also store and run complicated functions inside the engine itself, locating the processing right next to the data.
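As one illustration of the transaction processing mentioned above, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The accounts table, the names in it, and the transfer() helper are assumptions made up for this example; the point is only that the engine either applies both halves of a transfer or neither.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from one (hypothetical) account to another atomically."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises an exception.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        row = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if row is None or row[0] < 0:
            # Raising here undoes the debit made above.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


def demo():
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()

    transfer(conn, "alice", "bob", 30)       # succeeds and is committed
    try:
        transfer(conn, "alice", "bob", 500)  # fails and is rolled back
    except ValueError:
        pass
    return balances(conn)


if __name__ == "__main__":
    print(demo())
```

The second transfer debits the account, detects the overdraft, and raises; the rollback leaves the balances exactly as the first transfer left them. Without a database, the program itself would have to guarantee that a crash between the debit and the credit could not lose money.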

There was also a formalism to data when you used a database. Companies created data dictionaries and had data administrators who thought about the data itself, and not just the programs that were creating data or using data. Programmers and data administrators concentrated more effort on keeping bad data out of the database, which makes programming better overall.

Languages for accessing the data were created, and the one that most people know today is Structured Query Language (SQL), which has been extended over time to have new abilities.

Unfortunately for databases, the prevailing model in 1986 was to sell them as very expensive closed-source products. Commercial databases might cost as much as $100,000 per server, and that was when $100,000 was "a lot of money." Consequently, in 1986 only about four percent of Unix systems had a database engine on the system, and that market was split among the market leaders (Oracle, Informix, Sybase, Ingres, and more).

"Commercial" operating systems such as Digital Equipment Corporation's (DEC's) VMS or IBM's MVS systems were considered to be the premier platforms for databases, and the database engines would write their files into the filesystems.

Not on Unix. On Unix the database companies insisted on writing directly to partitions of the disk, bypassing the filesystems and the filesystem buffers altogether, which made data logs and backups difficult. In addition, the databases performed worse on Unix systems than on a proprietary "commercial" OS on the same hardware platform.

As a product manager of Unix systems at DEC, I worked with an Ingres salesman to embed a relational database engine into every copy of Ultrix (our trademarked Unix-like system) that we sold, at a price so low that the customer didn't even notice they were paying for it. Instead of only one percent of our Ultrix systems having a database engine for use by programs, now 100 percent of the systems could use one.

After we had completed this contract, I asked our Ingres salesman why Ingres worked so well on VMS (where his developers did their programming) and not so well on the more than 100 different Unix systems they supported.

He was a little embarrassed at first, but with prompting he gave me a white paper entitled, essentially, "The Eleven Reasons Why Unix Systems S*#k as a Database Platform," written by some of his engineers.

According to the paper, Unix needed multi-threaded I/O, good threading in general (many early Unix systems had single-threaded libraries incapable of multi-threading), synchronous filesystems, mmap (the ability to map files into virtual memory), the ability to lock virtual memory into RAM, and a few other components. All of these were available through software called the POSIX real-time extensions, which Ultrix already had but which were not installed by default. The Ingres engineers were using compatibility libraries to replace the functionality they used on VMS, but did not know they could have that functionality natively on Ultrix.
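For readers who have not met mmap, the following is a small sketch of the idea using Python's standard mmap module on a present-day system, not anything taken from Ingres or Ultrix. The file contents and the patch_record() helper are assumptions for illustration. Locking pages into RAM, the other facility named in the paper, is not shown here.

```python
import mmap
import os
import tempfile


def patch_record(path, offset, payload):
    """Overwrite bytes of a file in place through a memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as mm:
            # The mapping behaves like a mutable byte array backed by the file.
            mm[offset:offset + len(payload)] = payload
            mm.flush()  # push the modified pages back to the file


def demo():
    # Build a small scratch file standing in for a database page.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"0123456789ABCDEF")
        patch_record(path, 4, b"data")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The program changes four bytes in the middle of the file with an ordinary slice assignment and no explicit read or write calls, which is the kind of direct access to file data a database engine can exploit.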

I informed the Ingres engineers that all they had to do was ask the customers to install that software package (which they'd already paid for) and they could have Ingres run as efficiently on Ultrix as it did on VMS. The next release of Ingres took advantage of the POSIX real-time extensions.

Today we have many FOSS databases to choose from, most of them with low-cost or no-cost licenses. It could be worthwhile to use them in new projects.

The Author

Jon "maddog" Hall is an author, educator, computer scientist, and free software pioneer who has been a passionate advocate for Linux since 1994 when he first met Linus Torvalds and facilitated the port of Linux to a 64-bit system. He serves as president of Linux International®.
