Let’s start with a simple question: What is the real difference between NoSQL and SQL? In my view, the different access patterns provided by NoSQL and SQL result in very different scalability and performance.
NoSQL elements allow data access only in a narrow predefined access pattern. For example, DHT (Distributed Hash Table) is accessible via hashtable API; given the exact key, the value is returned. The access pattern for other NoSQL data services is similarly narrow and well-defined, and as a result scalability and performance structure are predictable and reliable.
In SQL, the access pattern is not known in advance, the tables are modeled, assumptions are made regarding the access patterns, and these assumptions are translated into predefined optimizations like index definitions. SQL is by definition a generic language that allows access to data in various ways. The programmer also has limited control over the execution of the SQL statements; mostly, the database engine is responsible for optimizing the execution of the statements. In other words, in SQL, the data model does not enforce a specific way to work with the data — it is built with an emphasis on data integrity, simplicity, data normalization and abstraction, which are all extremely important for large complex applications.
The NoSQL approach presents huge advantages over SQL databases because it allows one to scale an application to new levels. The new data services are based on truly scalable structures and architectures, built for the cloud, built for distribution, and are very attractive to the application developer. There’s no need for DBA, no need for complicated SQL queries and it is fast. Hooray, freedom for the people!
This is no small matter — a good programmer’s freedom to choose a data model, write a program or an application with familiar tools, reduce dependencies on other people, test and optimize the code without doing guesswork or counting on a black box (DB). Yes, it’s slow on the test system, but someone will take care of it later by tuning the DB…these are all major advantages of the NoSQL movement.
And Why Not…
There are some disadvantages to the NoSQL approach. Those are less visible at the developer level, but are highly visible at the system, architecture and operational levels.
- At the system level, data models are key. Not having a skilled authority to design a single, well-defined data model, regardless of the technology used, has its drawbacks. The data model may suffer from duplication of data objects (non-normalized model). This can happen due to the different object model used by different developers and their mapping to the persistency model. At the system level one must also understand the limitations of the chosen data service, whether it is size, ops per second, concurrency model, etc.
- At the architecture level, two major issues are interfaces and interoperability. Interfaces for the NoSQL data services are yet to be standardized. Even DHT, which is one of the simpler interfaces, still has no standard semantics, which includes transactions, none blocking API etc. Each DHT service used comes with its own set of interfaces. Another big issue is how different data structures, such as DHT and a binary tree, just as an example, share data objects. There are no intrinsic semantics for pointers in all those services. In fact, there’s usually not even strong typing in these services — it’s the developer’s responsibility to deal with that.
Interoperability is an important point, especially when data needs to be accessed by multiple services. A simple example: backoffice works in Java, web serving works in php, can the data be accessed easily from both domains? Clearly one can use web services in front of the data as a data access layer, but that complicates things even more, and reduces business agility, flexibility and performance while increasing development overhead.
- Moving to the operational realm, here, from my experience, lies the toughest resistance, and rightfully so…
The operational environment requires a set of tools that is not only scalable but also manageable and stable, be it on the cloud or on a fixed set of servers. When something goes wrong, it should not require going through the whole chain and up to the developer level to diagnose the problem. In fact, that is exactly what operation managers regard as an operational nightmare. Have you ever tried getting a developer to diagnose why a payment system is not functioning while he’s at a bar and a few beers in? I’m sure the developer’s date would be impressed by his dedication to his work, but that’s a pretty expensive way to impress someone :)
Operation needs to be systematic and self contained. With the current NoSQL services available in the market, this is not easy to achieve, even in managed environments such as Amazon.
So, how can we gain the major advantages of the NoSQL approach while keeping the advantages of the SQL approach?
SQL and NoSQL Joined:
A SQL database implementation that uses NoSQL infrastructure is a good solution. A SQL database that is scalable, manageable, cloud-ready, highly available and built entirely on NoSQL infrastructure, but still provides all the advantages of a SQL database, such as interoperability, well-defined semantics and more.
This hybrid would not be as fast as a NoSQL service, but it may be good enough for the 80% of the market that needs stronger scalability and organic cloud behavior.
Such a solution would also allow migrating existing applications easily into cloud environments, thus protecting huge investments made by organizations in those applications.
It is my opinion that a SQL database built on NoSQL foundations can provide the highest value to customers who wish to be both agile and efficient while they grow.