Like many emerging technologies, NoSQL has gone through a hype cycle that saw widespread implementation followed by decidedly mixed results. The cargo-cult mentality that had everyone jumping on board the non-relational database train has led to some high-profile, embarrassing rollbacks. No single party is entirely to blame. Vendors presented their solution as the silver bullet to fix all problems, while developers with big dreams for the future talked up the need for ever-increasing database scalability to attract investor dollars.
Certainly, business owners can be forgiven for hoping that a painless solution to database woes was at hand at last. But now that the first generation has peaked and the technology is beginning to mature, the time has come to regroup and review the lessons learned over the past decade. Here are three common mistakes that organizations are still making when it comes to NoSQL, and how to avoid them.
#1: NoSQL is about more than just scalability
Eric Redmond, author of Seven Databases in Seven Weeks, says the most common mistake people make is equating NoSQL with web scale. Redmond says it's become almost an inside joke in the database community. Still, the assumption is understandable. After all, the progenitors of today's non-relational databases were companies like Google and Amazon that were focusing on how to address massive scalability issues in a web environment. Even the name Mongo comes from huMONGOus, a reference to the volume of data and traffic this popular document store was expected to handle.
But thinking of NoSQL only from the perspective of scalability can lead to poor decisions. Even a smaller organization may benefit from a NoSQL solution if it is dealing mostly with social media data that is best represented by graphs. Conversely, a larger organization with lots of data could still need to rely primarily on SQL for sophisticated queries. The choice depends more on the use case and the non-functional requirements than on the scale of the data alone.
Key Takeaway: Don't get sucked in by the next big thing. Put business needs and realities first. Copying the infrastructure of Google won't make a company into the next Google.
#2: Developers need to evolve
Ann Kelly and Dan McCreary, coauthors of Making Sense of NoSQL, point out another big mistake. In a recent high-profile web project, a poorly selected integration team created a huge problem. "The client brought in a really robust, powerful, mature NoSQL database, MarkLogic. Then, they hired an integrator that had only Java programmers and were only familiar with relational databases. They spent roughly 30-40 million dollars building code that didn't need to be written and that will all be thrown away." The website in question is, of course, the notorious healthcare.gov.
McCreary says that this isn't the first NoSQL project where this has happened, and it won't be the last. A team that has no experience writing code for non-relational databases is very likely to run into similar issues. When developers apply an old process to new technology, it's easy to over-develop. Relying on UML and Java classes to create elaborate code simply isn't necessary in the streamlined world of document stores.
Key Takeaway: When documents map directly onto objects, code can be lean and tightly focused. If code is proliferating, investigate the tooling, methodology, and mindset of the team to pinpoint the real issue.
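As a rough illustration of how little code a direct document-to-object mapping can need, the sketch below loads a JSON document into a Python object and writes it back out. The Customer and Address classes, the field names, and the sample document are all hypothetical, invented for this example; nothing here reflects MarkLogic's API or the project discussed above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Address:
    city: str
    state: str


@dataclass
class Customer:
    name: str
    address: Address
    tags: list


def customer_from_document(raw: str) -> Customer:
    """Build a Customer directly from a JSON document string."""
    doc = json.loads(raw)
    return Customer(
        name=doc["name"],
        address=Address(**doc["address"]),
        tags=list(doc.get("tags", [])),
    )


def customer_to_document(customer: Customer) -> str:
    """Serialize the object back into a JSON document."""
    return json.dumps(asdict(customer), sort_keys=True)


# Hypothetical document, as it might come back from a document store.
raw_doc = '{"name": "Ada", "address": {"city": "Austin", "state": "TX"}, "tags": ["new"]}'
customer = customer_from_document(raw_doc)
round_tripped = json.loads(customer_to_document(customer))
```

The nested address stays nested in both directions, so there is no join table or mapping layer to maintain. A relational design for the same data would typically need separate tables plus code to reassemble them.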
#3: Distribution is hard, even when it's made easier by NoSQL
Eric Redmond agrees that there is no substitute for knowledge and experience, whether in implementation or ongoing administration. "When you're talking about large scale, the database isn't the problem. The issue is in understanding how to operationally manage it. You can install Couch and run a bunch of queries against it on one machine. But once you try to distribute it across multiple machines, it becomes a distributed system and it's an entirely different scope. You have to be a talented systems administrator and talented operator. The fact that you can write a query that runs quickly on your local development machine often has no bearing on how well it will scale horizontally across hundreds of machines."
Fortunately, Redmond says some of the NoSQL databases are designed to help keep developers from shooting themselves in the foot. Couch is one example, since it automatically uses MapReduce with the assumption that the user is querying against many servers. Riak is another example. For those who say they find Riak difficult to use, Redmond has some bad news. "It's hard to use because writing queries in a distributed environment is hard."
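To make the MapReduce idea concrete, here is a minimal sketch in plain Python of a query written so that it can run across many servers. It is not CouchDB's or Riak's API: the shards list, the document fields, and the function names are assumptions made for illustration, with each inner list standing in for the documents held by one machine.

```python
from collections import Counter
from functools import reduce

# Hypothetical shards: each list stands in for the documents on one server.
shards = [
    [{"type": "post", "author": "ann"}, {"type": "post", "author": "dan"}],
    [{"type": "post", "author": "ann"}, {"type": "comment", "author": "eric"}],
    [{"type": "post", "author": "eric"}, {"type": "post", "author": "ann"}],
]


def map_doc(doc):
    """Map step: emit (key, value) pairs for a single document."""
    if doc["type"] == "post":
        yield doc["author"], 1


def map_shard(shard):
    """Run the map step locally on one shard and pre-aggregate the results."""
    counts = Counter()
    for doc in shard:
        for key, value in map_doc(doc):
            counts[key] += value
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results.

    Merging is associative and commutative, so partial results can be
    combined in any order as they arrive from different servers.
    """
    return left + right


partials = [map_shard(shard) for shard in shards]
posts_per_author = dict(reduce(reduce_counts, partials, Counter()))
```

The map step only ever sees one document, and the reduce step only ever sees partial results. That restriction is what makes the query awkward to write and also what lets it scale out without being rewritten.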
Key Takeaway: Choose a NoSQL database that enforces best practices. Also, remember that a key-value store is one of the simplest general patterns, and one that scales easily.
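One reason the key-value pattern scales so readily is that every operation touches exactly one key, so a key can be routed to a node by hashing it. The toy store below sketches that idea under stated assumptions: the ShardedKeyValueStore class is hypothetical, the "nodes" are in-memory dictionaries, and placement uses a simple hash modulo the node count.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys across in-memory 'nodes'."""

    def __init__(self, node_count: int):
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> dict:
        # A stable hash keeps a key on the same node across processes
        # (Python's built-in hash() is salted per process for strings).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str, default=None):
        return self._node_for(key).get(key, default)


store = ShardedKeyValueStore(node_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
```

Modulo placement is the simplest possible scheme, and it moves most keys whenever the node count changes. Production systems commonly use consistent hashing or a similar technique to limit that reshuffling, which is part of the operational knowledge Redmond describes.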
What other techniques can help developers get NoSQL right? Let us know.