5 Great Open-Source Database Solutions

J Simpson
July 2, 2020

The database landscape has been evolving quickly in recent years. This shift is changing the way developers interact with databases, which in turn influences the databases themselves. Open-source databases are becoming more and more prevalent in light of these shifting needs.

According to the 2019 Open Source Data Management Software Survey, conducted by database experts Percona, 89% of developers are using at least one open-source database. This is in part because developers and enterprises are adopting more than one database solution: 92% of developers are using more than one database. Matt Yonkovit of Percona explains the reason for this multi-database trend: “It’s hard for one database to do everything well, so the trend is definitely to use the best database for the job, rather than try and fit into a single technology.”

Gartner predicts that by 2022, 70% of applications will feature some sort of open-source database. Incorporating an open-source database into your development workflow is becoming increasingly important. Soon, it will no longer be optional. It’s best to familiarize yourself with the top open-source database options on the market to get ahead of the curve.

To help you get started, we’re assessing five of the most widely used open-source database solutions and offering insights on where to focus your efforts.

5 Great Open Source Databases

Neo4j

Documentation | Download

Neo4j is a popular, robust graph database. It’s open-source, secure enough for enterprise use, and offers fast performance and reliability. Neo4j also features its own query language, Cypher, which can easily be implemented in Apache Spark and Gremlin-based products, thanks to open-source toolkits.
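To give a feel for what a Cypher query expresses, here is a minimal sketch in plain Python rather than a real Neo4j session. The Cypher text in the comment uses ordinary pattern-matching syntax, but the `Person` label, the `KNOWS` relationship, the sample names, and the `friends_of_friends` helper are all invented for illustration, and the in-memory dictionary merely stands in for a graph store.

```python
# Hypothetical data: who KNOWS whom, stored as an adjacency list.
KNOWS = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Dev"],
    "Cara": ["Dev", "Eli"],
    "Dev": [],
    "Eli": [],
}

# Roughly equivalent Cypher (label and property names are assumptions):
#   MATCH (a:Person {name: $name})-[:KNOWS]->(:Person)-[:KNOWS]->(fof:Person)
#   RETURN DISTINCT fof.name
def friends_of_friends(graph, name):
    """Return the names reachable in exactly two KNOWS hops, sorted."""
    found = set()
    for friend in graph.get(name, []):
        for fof in graph.get(friend, []):
            found.add(fof)
    return sorted(found)


if __name__ == "__main__":
    print(friends_of_friends(KNOWS, "Ann"))
```

The nested loops spell out by hand the two-hop traversal that the single `MATCH` pattern declares; in a graph database the engine performs that walk for you.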
Neo4j features a comprehensive graph platform, including Neo4j Bloom for visual exploration; Neo4j ETL and Kettle for data integration; APOC procedures and graph analytics algorithm libraries; and powerful desktop development tools like Neo4j Browser.

Graph databases make it easy to understand the interactions between data. They’re also optimized to deal with data interactions. In a test run by Neo4j, developers compared Neo4j against Apache Spark GraphX using the Union-Find and PageRank algorithms. Neo4j outperformed GraphX by a factor of nearly two-to-one for Union-Find and four-to-one for PageRank.

Neo4j is one of the most scalable databases available today. It’s also easy to implement once the relationships are established. Matching related items in disjointed modes can be taxing on resources, similar to employing foreign key constraints in a relational database. This is mostly an issue during data imports.

If you’re using Neo4j in a high-volume environment, one potential problem you might run into is its master-slave approach to queries. This means that each node has to store the entire database rather than using a distributed model. This can cause bottlenecks if you have a large volume of simultaneous read-write queries.

The other main drawback is a lack of range indexes. Instead, Neo4j only features hash indexes. This can make sorting data rather taxing on your system’s resources.

Advantages of Neo4j:
Cypher query language
Open-source
Fast performance
Widely supported APOC library

Disadvantages of Neo4j:
Master-slave distribution
Lack of range indexes

PostgreSQL

Documentation | Download

PostgreSQL is one of the longest-running open-source relational databases out there. It’s been going strong for over 25 years, meaning there’s a huge range of resources available for developers and database architects. It’s one of DB-Engines’ top five rated databases, ranked even above many commercial databases. PostgreSQL was initially created by Dr.
Michael Stonebraker, winner of the Turing Award, following his previous venture into databases, Ingres. PostgreSQL may be open source, but it’s been used to power a number of successful commercial products, from EnterpriseDB to Amazon Redshift.

For a while, PostgreSQL (which is also often referred to simply as Postgres) wasn’t garnering many headlines, due to its longevity. PostgreSQL simply works. That reliability has earned Postgres renewed attention in recent years, as developers are learning to appreciate tried-and-true dependability.

PostgreSQL is highly customizable, allowing you to define your own data types and index types. You can also create your own custom plug-ins to further customize how you use the database.

It also handles transactions well, going beyond the basic CRUD format. Database operations frequently require making multiple changes at the same time. That’s easy to implement with Postgres’ transactions.

Thorough documentation is another one of PostgreSQL’s strong suits. It features comments for every command, detailing what that command does and doesn’t do. This makes understanding code simple, even for non-programmers.

Finally, PostgreSQL is secure. You can control access at every level of your database, from the OS to the network, allowing you to set different access privileges. Users can range from read-only to read/write or any other configuration you care to use. You can also use these features to secure applications that use Postgres.

Many of Postgres’ shortcomings have been corrected in recent years, thanks to its resurgence. Relational databases can be slow and unwieldy, especially when dealing with large datasets. Citus makes PostgreSQL more scalable, making it possible to run on multiple nodes and increasing parallel computing power.

Some view its open-source status as a drawback, as that means it doesn’t come with a warranty or liability protection. Considering how widespread and robust it is, it doesn’t really need one.
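To make the earlier point about transactions concrete, here is a minimal sketch of an all-or-nothing update. It deliberately uses Python’s built-in `sqlite3` module as a stand-in so that it runs without a database server; it is not PostgreSQL-specific code. The `accounts` table, the sample balances, and the `transfer` and `balances` helpers are hypothetical names chosen for this example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        # Both UPDATEs were undone by the rollback above.
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Assumed sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` leaves both balances untouched, because the failed check rolls back both updates together. Against Postgres the same idea is expressed by wrapping the statements in `BEGIN` and `COMMIT` (or `ROLLBACK`).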
PostgreSQL Advantages:
Stable
Secure
Transactions
Customizable schema
Documentation

PostgreSQL Disadvantages:
Can be slow
Lack of warranty

MariaDB

Documentation | Download

MariaDB, supported by the MariaDB Foundation, is a powerful database solution that is entirely free and open source. It was created to offer powerful, versatile database software at no cost.

MariaDB started out as an offshoot of MySQL. That means it performs many of the same functions as the popular open-source database software, but with numerous improvements.

MariaDB is fast, for one thing. It reads and writes 24% faster than MySQL. It also supports more connections than MySQL, with the ability to handle more than 200,000 connections simultaneously. Replication is also 2x faster with MariaDB than with MySQL.

MariaDB can be used in nearly any environment you can think of. It can be queried from most popular programming languages. This includes PHP, making MariaDB a good choice for plug-and-play web apps and projects.

Finally, MariaDB supports a number of storage engines, including PBXT, XtraDB, Maria, and FederatedX. This means MariaDB can be used for nearly any data-related function you can think of.

MariaDB is taking important steps toward becoming a fully functional, completely open-source database solution. Last year, they rolled out a managed cloud service, making the database suitable for powering cloud-based solutions like predictive analytics.

One of MariaDB’s main downsides is a lack of free customer support, which is a given with open-source software. If you want full-blown tech support, you might do better to look elsewhere.
MariaDB Advantages:
Faster than MySQL
Supports more than 200,000 connections
Can be used in nearly any environment
Supports numerous storage engines
Increasingly supports cloud-based solutions

MariaDB Disadvantages:
No OS X version
Lack of support
Documentation is incomplete

CockroachDB

Documentation | Download

CockroachDB is a distributed SQL database designed to tackle many of the issues facing traditional databases. CockroachDB is scalable across multiple cloud platforms and utterly reliable, with always-on data availability that can be segmented by location. CockroachDB’s cloud capabilities are further enhanced by CockroachCloud, a cloud-based service that makes CockroachDB painless to implement. This helps to solve some of the issues the open-source database faced at its initial release.

That’s not to say CockroachDB doesn’t have its problems, however. CockroachDB struggles with PostgreSQL compatibility. You’ll need a workaround if you want to use Postgres features with CockroachDB. Some issues, especially those dealing with location or indexing, might not have a solution at all. If you’re dealing with geospatial features (PostGIS) or require full indexing for your database, you might do well to look elsewhere.

CockroachDB Advantages:
Distributed
Scalable
Consistent OLTP

CockroachDB Disadvantages:
Difficulty integrating with PostgreSQL
Limited indexing
Lacks stored procedures, triggers, events, and UDFs

RethinkDB

Documentation | Download

If you’re looking for an open-source alternative to MongoDB, RethinkDB is a good pick. It’s an excellent way to serve JSON documents to a real-time app. It also features a robust query language, which makes it easy to join tables or sort data.

RethinkDB easily scales to multiple machines. This reduces the likelihood of outages of the kind you might see with a central server. It can also be run in the cloud via a Docker file, which can run the database on Amazon Web Services (AWS) or Google Cloud.
RethinkDB is a bit barebones, however. You’re not able to run queries from the command line interface, for one thing. There are no user accounts, either, so you’ll have to set up your own users and authentication using a third-party resource like Auth0.

RethinkDB Advantages:
Easy to install
Changefeeds
Powerful query language
Automatic master promotion
Scalable

RethinkDB Disadvantages:
No user accounts
No CLI query

Honorable Mentions

Redis
CouchDB
Timescale

Did we leave out your favorite database? Let us know in the comments below, and we’ll add it here.

Final Thoughts

Data becomes more valuable with each passing year. Data has become an industry in its own right, as more and more businesses figure out ways to leverage their assets in a quickly shifting business world.

With increasing amounts of data being produced each day, and more and more products and services relying on that data, databases are becoming more prevalent, and more important, every day. Choose wisely and save yourself the trouble of having to migrate in a year.