A side-by-side comparison of MongoDB vs Cassandra databases

How do you choose one NoSQL database over another? Expert Christopher Tozzi explains the difference between MongoDB and Cassandra.

MongoDB and Cassandra both fall into the NoSQL family of databases. They also both happen to be open source. The similarities, though, mostly end there.

Similarities between MongoDB and Cassandra

Before delving into how MongoDB and Cassandra are different, let's document what they have in common.

They're both databases, obviously. More importantly, they are both examples of NoSQL databases. NoSQL is a type of database architecture in which data is stored in a relatively unstructured fashion. Compared to more traditional SQL-style databases, NoSQL can be a more efficient way of storing the large quantities of unstructured data that organizations commonly use for big data operations.

MongoDB and Cassandra are also both open source -- although commercial implementations are available, too. But even in that respect, their licensing is not identical. MongoDB is licensed under the GNU Affero General Public License 3.0, whereas Cassandra is subject to the Apache License 2.0.

Both databases have been around for about a decade. Cassandra debuted in 2008 and MongoDB in 2009. They're equally new in that respect when compared to databases like MySQL, which originated in the mid-1990s.

Finally, both databases support the big three OSes: Windows, Linux and macOS. Cassandra, though, also runs on Berkeley Software Distribution-based OSes.

MongoDB vs. Cassandra: Key differences

In most other important respects, though, MongoDB and Cassandra are different beasts.

High availability strategy

MongoDB's and Cassandra's respective data availability strategies are perhaps the biggest factors that set them apart.

In a Cassandra deployment, every node fulfills the role of a master, so a database hosted across multiple nodes has multiple master nodes. If one or several master nodes fail, your database will remain available as long as at least one master is still standing. This highly distributed and redundant model makes it easy to achieve high availability (HA) in Cassandra -- provided, of course, that you have the spare infrastructure available to run multiple nodes.

MongoDB doesn't ignore the issue of HA, but its strategy is based on the idea of automatic failover. You can only set up one master node in a MongoDB cluster. If the master fails, a slave node will automatically be promoted to become the new master. That ensures continuity, but the process doesn't happen instantly. It typically takes nearly a minute.

Whether a minute of data storage disruption is acceptable or not depends on exactly what you are trying to do. But, however you slice it, Cassandra is more resilient and highly available than MongoDB.
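To make the failover window concrete, here is a minimal sketch in Python. It is a toy model, not MongoDB's actual election protocol: the `ToyReplicaSet` class, its method names and the three-tick election delay are all assumptions made up for illustration. It only shows that writes are rejected between the moment the master fails and the moment a replacement is promoted.

```python
class ToyReplicaSet:
    """Toy single-master cluster; NOT MongoDB's real election logic."""

    def __init__(self, nodes, election_ticks=3):
        self.nodes = list(nodes)
        self.election_ticks = election_ticks  # hypothetical failover delay
        self.primary = self.nodes[0]
        self._ticks_left = 0

    def fail_primary(self):
        """Drop the current master and start the failover countdown."""
        if self.primary is None:
            return
        self.nodes.remove(self.primary)
        self.primary = None
        self._ticks_left = self.election_ticks

    def tick(self):
        """Advance time by one step; promote a survivor when the countdown ends."""
        if self.primary is None and self.nodes:
            self._ticks_left -= 1
            if self._ticks_left <= 0:
                self.primary = self.nodes[0]

    def can_write(self):
        """Writes are only possible while a master exists."""
        return self.primary is not None


cluster = ToyReplicaSet(["a", "b", "c"], election_ticks=3)
cluster.fail_primary()

accepted = []
for _ in range(4):
    accepted.append(cluster.can_write())
    cluster.tick()

print(accepted)          # writes fail until a new master is promoted
print(cluster.primary)   # the surviving node that took over
```

How long the real window lasts depends on the deployment and its settings, so treat the tick count here as a placeholder rather than a measurement.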

Write speed

MongoDB's limitation to a single master node also has important implications for the speed at which data can be written to a database. Data writes must be recorded on the master, and since a MongoDB cluster has only one master, its ability to write new information to the database is strictly limited by the capacity of that single master node.

With Cassandra, each master node can accept different writes in parallel. Therefore, the more master nodes you have, the more data you can write at once. If you need to write a lot of data, your allegiance should probably lean towards Cassandra.
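The effect of spreading writes across nodes can be sketched with a few lines of Python. This is a simplified illustration under stated assumptions: the `owner` and `writes_per_node` helpers are hypothetical, the SHA-256-based placement stands in for whatever partitioner a real cluster uses, and replication (which sends each write to more than one node) is ignored.

```python
import hashlib
from collections import Counter


def owner(key: str, node_count: int) -> int:
    """Map a key to a node index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def writes_per_node(keys, node_count):
    """Count how many writes each node would have to absorb."""
    return Counter(owner(k, node_count) for k in keys)


keys = [f"user-{i}" for i in range(10_000)]

single = writes_per_node(keys, 1)   # one write-accepting node
spread = writes_per_node(keys, 4)   # four write-accepting nodes

print("one node  :", dict(single))
print("four nodes:", dict(sorted(spread.items())))
```

With one node, every write lands in the same place. With four, each node sees roughly a quarter of the load, which is the intuition behind the claim that more write-accepting nodes allow more data to be written at once.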

Data structures

Although both databases fall within the NoSQL family, NoSQL is a broad category. When you get down to the details, NoSQL database architectures can vary significantly, and that is true of MongoDB and Cassandra.

Compared to Cassandra, MongoDB offers a more unstructured architecture. MongoDB enables you to define objects, which can have basically any properties you want. In addition, you can organize objects within a hierarchy in basically any way you want.

In contrast, Cassandra offers a table-like storage model that more closely resembles traditional SQL-style databases -- although Cassandra is not as rigid as a traditional database.
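The difference between the two models can be illustrated with plain Python dictionaries. This is only a sketch: the `USER_COLUMNS` schema, the sample records and the `fits_table` and `flatten` helpers are hypothetical, and neither database is actually involved.

```python
# MongoDB-style documents: fields can nest, and two documents in the
# same collection do not have to share the same keys.
doc_a = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}
doc_b = {"name": "Lin", "favorite_color": "green"}

# Cassandra-style table: columns are declared up front (hypothetical schema).
USER_COLUMNS = {"name": str, "city": str, "zip": str}


def fits_table(row: dict, columns: dict) -> bool:
    """True if every field in the row is a declared column of the right type."""
    return all(k in columns and isinstance(v, columns[k]) for k, v in row.items())


def flatten(doc: dict) -> dict:
    """Lift one level of nested fields up, e.g. address.city -> city."""
    flat = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            flat.update(value)
        else:
            flat[key] = value
    return flat


print(fits_table(doc_a, USER_COLUMNS))           # nested field is not a column
print(fits_table(flatten(doc_a), USER_COLUMNS))  # flattened row matches the schema
print(fits_table(doc_b, USER_COLUMNS))           # undeclared field is rejected
```

The check allows a row to leave declared columns out, which loosely mirrors the point that Cassandra is less rigid than a traditional database even though its columns are defined in advance.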

Query language

In addition to being more similar to SQL-style databases from a data structure perspective, Cassandra's query language, which is called CQL, also closely resembles the query languages for traditional databases.

CQL and SQL are not identical -- CQL leaves out features such as joins, for example -- but many simple SQL queries carry over to CQL with little or no change. This comes in handy if your data analysts already know SQL well.

MongoDB has its own query interface. It's basically JSON formatting, and you can learn it easily enough. But it's not something you're going to know already.
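To show how the two styles differ, here is the same question written both ways, in a short Python sketch. The `users` table, its fields and the sample data are hypothetical, and the CQL statement assumes `city` is the partition key and `age` a clustering column. The `matches` function is a toy evaluator that understands only equality and `$gt`; it is not MongoDB's matcher, and it is included only so the example runs without a database.

```python
# CQL reads much like SQL (assumes a hypothetical `users` table with
# partition key `city` and clustering column `age`).
CQL_QUERY = "SELECT name FROM users WHERE city = 'Boston' AND age > 30;"

# MongoDB expresses the same conditions as a JSON-like filter document.
MONGO_FILTER = {"city": "Boston", "age": {"$gt": 30}}


def matches(doc: dict, query: dict) -> bool:
    """Toy evaluator for equality and $gt only; not MongoDB's real matcher."""
    for field, condition in query.items():
        if isinstance(condition, dict):
            if "$gt" in condition:
                if field not in doc or not doc[field] > condition["$gt"]:
                    return False
        elif doc.get(field) != condition:
            return False
    return True


users = [
    {"name": "Ann", "city": "Boston", "age": 41},
    {"name": "Bob", "city": "Boston", "age": 25},
    {"name": "Cy", "city": "Austin", "age": 52},
]

names = [u["name"] for u in users if matches(u, MONGO_FILTER)]
print(CQL_QUERY)
print(names)
```

In a real application, a filter document like `MONGO_FILTER` would typically be passed to a driver call such as PyMongo's `collection.find()`, while the CQL string would be sent through a Cassandra driver session.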

MongoDB vs. Cassandra: Use cases

So, when is MongoDB a better solution than Cassandra or vice versa?

Generally speaking, MongoDB is best for workloads with lots of highly unstructured data. If you don't know, or have a minimal ability to anticipate, the scale and types of data that you'll be working with, MongoDB's flexible data structures will suit you better than Cassandra.

That said, to use MongoDB effectively, you'll have to be able to cope with the possibility of some downtime if your master node fails, as well as with limited write speeds. And don't forget, you'll also have to learn a new query language.

Cassandra is the best choice for use cases working with SQL-style data types. Cassandra also works well if you require very fast write speeds. And if the learning curve of a new query language intimidates you, you'll benefit from the similarity between CQL and SQL.

In short, if you want a database that's similar to MySQL and the like but offers somewhat more flexibility and scalability, choose Cassandra. If you need a higher degree of flexibility and are willing to learn some new tricks, MongoDB's your answer.

This was last published in March 2018

Join the conversation

Which NoSQL database -- MongoDB or Cassandra -- would work better in your organization?
All nodes in Cassandra are equal - there are no master nodes (unless you count them all as masters). There is an important quality for Cassandra use - you will need to know your queries/access patterns in order to model the data properly and scale. It isn't made for random, ad-hoc queries that are essentially table scans.
Thanks for your comment. The article could have been clearer on this point. A better way to put it is to say that as long as your Cassandra database is hosted across multiple nodes, you have multiple "masters," because each node fulfills the functionality of a master node.
Hi Chris Tozzi,

I've been using MongoDB since 2012, and I would like to point out that it is possible to have multiple masters (in MongoDB we call them primaries) in a MongoDB cluster by using shards.

MongoDB has two deployment options: (a) Replica Sets and (b) Sharded Cluster. 

A Replica Set is a group of three or more mongod processes responsible for a data set. In a replica set, you will have one node with the role of primary and the other nodes with the role of secondary (another possible role is arbiter, but it's irrelevant for this discussion). The Replica Set provides data availability, so if the primary of the replica set crashes or becomes inaccessible, the remaining members will elect a new primary (this process is transparent to the application).

A Sharded Cluster is a group of Replica Sets (so, by design, it inherits all the data availability capabilities of the Replica Set deployment) where the data of your collection or collections will be spread across them. In that manner, you have multiple primaries that can read and write to the database in parallel. 
Hi Chris, 

MongoDB has evolved a lot since version 3.0. Nowadays, your comment about downtime isn't accurate. MongoDB has adopted Jepsen testing as part of its test pipeline. MongoDB 3.4 has passed all the Jepsen tests.

Write throughput and latency are much better in MongoDB 3.6. You are right about having to learn a new query language, but MongoDB also has a BI connector that allows anyone to write SQL-style queries to be processed by MongoDB.

Learning MongoDB's query language will also allow you to explore MongoDB's aggregation framework, a very powerful tool for extracting information from your data.

Hi Chris,

Thanks for the article. Let's just get a few things cleaned up and corrected. Contrary to what was stated, MongoDB does have multiple master nodes. This is achieved via sharded clusters. Sharded clusters themselves consist of replica sets. A replica set is a group of at least three nodes: one is the primary, and the others are secondaries. If the primary is no longer accessible, the remaining members will elect a new primary; this is completely black-boxed for the application. By putting your data on various shards, you get many primaries that can act in parallel.

Michael
I’ve used MongoDB in production since 2011. You can actually write to secondaries as well with MongoDB’s causal consistency sessions. There are also many other errors in the article. For me, *not* having to use SQL was one major benefit, since SQL is the one language I don’t know, and I find it unintuitive to use. MongoDB, on the other hand, makes it really easy to work with JSON all the way and leverage JavaScript across the full stack.
I got lost at "But, however you slice it, Cassandra is more resilient and highly available than MongoDB."

No. That is incorrect. With journaling enabled and a proper write concern, MongoDB data is both highly available and consistent. 

By contrast, with a multi-master model, you could have one application that can only reach master A and another that can only reach master B, and conflicts that cannot be resolved will arise.
MongoDB master node failover takes about 10 seconds, not one minute, on a busy production cluster. This latency is necessary to guarantee read consistency.
If you have multiple master nodes, read consistency is not guaranteed.

Hi, thanks for your article, but I think that MongoDB is a better way to store data, for the following reasons. It does not have a schema, which makes uploading data more comfortable, since the data does not have to be saved by priority, and that makes it more pleasant for the user. It is open source, so anyone can learn to use it. And its GridFS option allows you to store and recover binary files in a distributed way. Regards!
It is fair to say the two databases have different purposes to fulfill. Because MongoDB is capable of doing things similar to an RDBMS, analyzing data stored in MongoDB is easy compared to Cassandra. And the vibrant user community of MongoDB is missing in Cassandra. Cassandra is picking up, but slowly. It would be fair to say MongoDB is still far ahead of other NoSQL databases in terms of features, security and usability.
I'm not so sure we can trust this article at all. It was written by a "technology writer" who has never been a developer or ops engineer in his life. The facts stated in this article were merely obtained through Google searches. I wonder if the author has even tried to install Cassandra or MongoDB :)