A side-by-side comparison of MongoDB vs Cassandra databases
How do you choose one NoSQL database over another? Expert Christopher Tozzi explains the difference between MongoDB and Cassandra.
MongoDB and Cassandra both fall into the NoSQL family of databases. They also both happen to be open source. The similarities, though, mostly end there.
Similarities between MongoDB and Cassandra
Before delving into how MongoDB and Cassandra are different, let's document what they have in common.
They're both databases, obviously. More importantly, they are both examples of NoSQL databases. NoSQL is a type of database architecture in which data is stored in a relatively unstructured fashion. Compared to more traditional SQL-style databases, NoSQL can be a more efficient way of storing the large quantities of unstructured data that organizations commonly use for big data operations.
MongoDB and Cassandra are also both open source -- although commercial implementations are available, too. But even in that respect, they are not identical, because they use different licenses. MongoDB is governed by GNU Affero General Public License 3.0, whereas Cassandra is subject to Apache License 2.0.
Both databases have been around for about a decade. Cassandra debuted in 2008 and MongoDB in 2009. They're equally new in that respect when compared to databases like MySQL, which originated in the mid-1990s.
Finally, both databases support the big three OSes: Windows, Linux and macOS. Cassandra, though, also runs on Berkeley Software Distribution-based OSes.
MongoDB vs. Cassandra: Key differences
In most other important respects, though, MongoDB and Cassandra are different beasts.
High availability strategy
MongoDB's and Cassandra's respective data availability strategies are perhaps the biggest factors that set them apart.
In a Cassandra deployment, you can set up multiple master nodes. If one or several master nodes fail, your database will remain available as long as at least one master is still standing. This highly distributed and redundant model makes it easy to achieve high availability (HA) in Cassandra -- provided, of course, that you have the spare infrastructure available to set up multiple master nodes.
MongoDB doesn't ignore the issue of HA, but its strategy is based on the idea of automatic failover. You can only set up one master node in a MongoDB cluster. If the master fails, a slave node will automatically be converted to become the new master. That ensures continuity, but the process doesn't happen instantly. It typically takes nearly a minute.
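To make the failover idea concrete, here is a minimal sketch in Python. It is a toy model, not MongoDB's actual election protocol: the `Node` and `ToyReplicaSet` classes and their methods are hypothetical names invented for this illustration. What this article calls the master, the sketch calls the primary.

```python
class Node:
    """One server in the toy cluster (hypothetical class, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class ToyReplicaSet:
    """Single-primary cluster. This is NOT how MongoDB really elects a primary."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.primary = nodes[0]  # exactly one node accepts writes

    def write(self, doc):
        # Writes go only to the primary, so a dead primary blocks all writes.
        if not self.primary.healthy:
            raise RuntimeError("no primary available; writes are rejected")
        return f"{self.primary.name} stored {doc}"

    def fail_over(self):
        # Promote the first healthy node; a real election takes time to complete.
        for node in self.nodes:
            if node.healthy:
                self.primary = node
                return node
        raise RuntimeError("no healthy node left to promote")


cluster = ToyReplicaSet([Node("node-a"), Node("node-b"), Node("node-c")])
cluster.nodes[0].healthy = False   # the primary crashes
cluster.fail_over()                # a surviving node takes over
print(cluster.write({"id": 1}))
```

Between the crash and the call to `fail_over()`, every call to `write()` raises an error. That window is the disruption described above.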
Whether a minute of data storage disruption is acceptable or not depends on exactly what you are trying to do. But, however you slice it, Cassandra is more resilient and more highly available than MongoDB.
MongoDB's limitation to a single master node also has important implications for the speed at which data can be written to a database. Data writes must be recorded on the master, and since a MongoDB cluster has only one master, its ability to write new information to the database is strictly limited by the capacity of that single master node.
With Cassandra, each master node can accept different writes in parallel. Therefore, the more master nodes you have, the more data you can write at once. If you need to write a lot of data, your allegiance should probably lean towards Cassandra.
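The difference in write capacity can be sketched with a small Python simulation. This is a simplified model under stated assumptions: the `owner` and `writes_per_node` functions are hypothetical, and the hash-modulo placement rule is a stand-in for Cassandra's real partitioner and token ring.

```python
import hashlib


def owner(key, node_count):
    """Pick a node index from a stable hash of the key (toy placement rule)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def writes_per_node(keys, node_count, single_master=False):
    """Count how many writes each node would have to absorb."""
    counts = {node: 0 for node in range(node_count)}
    for key in keys:
        # With a single master, every write lands on node 0.
        target = 0 if single_master else owner(key, node_count)
        counts[target] += 1
    return counts


keys = [f"user-{i}" for i in range(1000)]
print("single master:", writes_per_node(keys, 4, single_master=True))
print("multi-master: ", writes_per_node(keys, 4))
```

In the single-master run, one node absorbs all 1,000 writes. In the multi-master run, the same writes are spread across four nodes, so adding nodes adds write capacity.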
Although both databases fall within the NoSQL family, NoSQL is a broad category. When you get down to the details, NoSQL database architectures can vary significantly, and that is true of MongoDB and Cassandra.
Compared to Cassandra, MongoDB offers a more unstructured architecture. MongoDB enables you to define objects, which can have basically any properties you want. In addition, you can organize objects within a hierarchy in basically any way you want.
In contrast, Cassandra offers a table-like storage model that more closely resembles traditional SQL-style databases -- although Cassandra is not as rigid as a traditional database.
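The contrast between the two data models can be illustrated in plain Python, without either database. This is only an analogy: the `documents` list stands in for a MongoDB-style collection, and the hypothetical `UserRow` class stands in for a Cassandra-style table whose columns are declared up front.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Document style: records in the same collection can have different shapes,
# and values can nest inside one another.
documents = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "address": {"city": "Arlington", "state": "VA"},
     "tags": ["navy", "cobol"]},
]


# Table style: the columns are fixed in advance. A row may leave a column
# empty, but it cannot add a column that was never declared.
@dataclass
class UserRow:
    user_id: int
    name: str
    email: Optional[str] = None


rows = [UserRow(1, "Ada", "ada@example.com"), UserRow(2, "Grace")]


def try_add_undeclared_column():
    """Return True if a row with an undeclared 'nickname' column is accepted."""
    try:
        UserRow(3, "Linus", nickname="lt")
    except TypeError:
        return False
    return True


print(asdict(rows[1]))
print(try_add_undeclared_column())
```

The second document carries a nested address and a list of tags that the first one lacks, and nothing objects. The table-style row rejects a field that is not part of its declared columns.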
In addition to being more similar to SQL-style databases from a data structure perspective, Cassandra's query language, which is called CQL, also closely resembles the query languages for traditional databases.
CQL and SQL are not identical, but in general, simple queries written for SQL will look much the same in CQL, even though CQL leaves out some SQL features, such as joins. This comes in handy if your data analysts already know SQL well.
MongoDB has its own query interface. It's basically JSON formatting, and you can learn it easily enough. But it's not something you're going to know already.
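To show how the two query styles differ, the sketch below expresses roughly the same question both ways. The table name, field names and sample data are hypothetical. Whether Cassandra would accept the CQL statement as written depends on how the table's keys and indexes are defined. The `matches` function is a toy evaluator written for this example, not part of any MongoDB driver, and it handles only equality and the `$gt` operator.

```python
# SQL-like style: a statement in text form.
cql_query = "SELECT name FROM users WHERE city = 'Boston' AND age > 30"

# JSON-like style: the filter is itself a structured document.
mongo_filter = {"city": "Boston", "age": {"$gt": 30}}


def matches(doc, flt):
    """Toy evaluator for equality and $gt only; MongoDB supports far more."""
    for field_name, condition in flt.items():
        value = doc.get(field_name)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op != "$gt":
                    raise ValueError(f"unsupported operator: {op}")
                if value is None or not value > operand:
                    return False
        elif value != condition:
            return False
    return True


people = [
    {"name": "Ann", "city": "Boston", "age": 41},
    {"name": "Bob", "city": "Boston", "age": 25},
    {"name": "Cy", "city": "Denver", "age": 52},
]
print([p["name"] for p in people if matches(p, mongo_filter)])
```

An analyst who knows SQL can read the first form at a glance. The second form takes some getting used to, but it is ordinary structured data that a program can build and modify without string handling.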
MongoDB vs. Cassandra: Use cases
So, when is MongoDB a better solution than Cassandra or vice versa?
Generally speaking, MongoDB is best for workloads with lots of highly unstructured data. If you don't know, or can barely anticipate, the scale and types of data that you'll be working with, MongoDB's flexible data structures will suit you better than Cassandra's.
That said, to use MongoDB effectively, you'll have to be able to cope with the possibility of some downtime if your master node fails, as well as with limited write speeds. And don't forget, you'll also have to learn a new query language.
Cassandra is the best choice for use cases working with SQL-style data types. Cassandra also works well if you require very fast write speeds. And if the learning curve of a new query language intimidates you, you'll benefit from the similarity between CQL and SQL.
In short, if you want a database that's similar to MySQL and the like but offers somewhat more flexibility and scalability, choose Cassandra. If you need a higher degree of flexibility and are willing to learn some new tricks, MongoDB's your answer.