NoSQL - the SQL

Discussions

News: NoSQL - the SQL

  1. NoSQL - the SQL (5 messages)

    Let’s start with a simple question: What is the real difference between NoSQL and SQL?  In my view, the different access patterns provided by NoSQL and SQL result in very different scalability and performance.

    NoSQL elements allow data access only in a narrow predefined access pattern. For example, DHT (Distributed Hash Table) is accessible via hashtable API; given the exact key, the value is returned. The access pattern for other NoSQL data services is similarly narrow and well-defined, and as a result scalability and performance structure are predictable and reliable.

    In SQL, the access pattern is not known in advance, the tables are modeled, assumptions are made regarding the access patterns, and these assumptions are translated into predefined optimizations like index definitions. SQL is by definition a generic language that allows access to data in various ways. The programmer also has limited control over the execution of the SQL statements; mostly, the database engine is responsible for optimizing the execution of the statements. In other words, in SQL, the data model does not enforce a specific way to work with the data — it is built with an emphasis on data integrity, simplicity, data normalization and abstraction, which are all extremely important for large complex applications.

    Why NoSQL

    The NoSQL approach presents huge advantages over SQL databases because it allows one to scale an application to new levels. The new data services are based on truly scalable structures and architectures, built for the cloud, built for distribution, and are very attractive to the application developer. There’s no need for DBA, no need for complicated SQL queries and it is fast. Hooray, freedom for the people!

    This is no small matter — a good programmer’s freedom to choose a data model, write a program or an application with familiar tools, reduce dependencies on other people, test and optimize the code without doing guesswork or counting on a black box (DB). Yes, it’s slow on the test system, but someone will take care of it later by tuning the DB…these are all major advantages of the NoSQL movement.

    And Why Not…

    There are some disadvantages to the NoSQL approach. Those are less visible at the developer level, but are highly visible at the system, architecture and operational levels.

    1. At the system level, data models are key. Not having a skilled authority to design a single, well-defined data model, regardless of the technology used, has its drawbacks. The data model may suffer from duplication of data objects (non-normalized model). This can happen due to the different object model used by different developers and their mapping to the persistency model. At the system level one must also understand the limitations of the chosen data service, whether it is size, ops per second, concurrency model, etc.
    2. At the architecture level, two major issues are interfaces and interoperability. Interfaces for the NoSQL data services are yet to be standardized. Even DHT, which is one of the simpler interfaces, still has no standard semantics, which includes transactions, none blocking API etc. Each DHT service used comes with its own set of interfaces. Another big issue is how different data structures, such as DHT and a binary tree, just as an example, share data objects. There are no intrinsic semantics for pointers in all those services. In fact, there’s usually not even strong typing in these services — it’s the developer’s responsibility to deal with that. 

      Interoperability is an important point, especially when data needs to be accessed by multiple services. A simple example: backoffice works in Java, web serving works in php, can the data be accessed easily from both domains? Clearly one can use web services in front of the data as a data access layer, but that complicates things even more, and reduces business agility, flexibility and performance while increasing development overhead.

    3. Moving to the operational realm, here, from my experience, lies the toughest resistance, and rightfully so… 

      The operational environment requires a set of tools that is not only scalable but also manageable and stable, be it on the cloud or on a fixed set of servers. When something goes wrong, it should not require going through the whole chain and up to the developer level to diagnose the problem. In fact, that is exactly what operation managers regard as an operational nightmare. Have you ever tried getting a developer to diagnose why a payment system is not functioning while he’s at a bar and a few beers in? I’m sure the developer’s date would be impressed by his dedication to his work, but that’s a pretty expensive way to impress someone :)

      Operation needs to be systematic and self contained. With the current NoSQL services available in the market, this is not easy to achieve, even in managed environments such as Amazon.

    So, how can we gain the major advantages of the NoSQL approach while keeping the advantages of the SQL approach?

    SQL and NoSQL Joined:

    A SQL database implementation that uses NoSQL infrastructure is a good solution. A SQL database that is scalable, manageable, cloud-ready, highly available and built entirely on NoSQL infrastructure, but still provides all the advantages of a SQL database, such as interoperability, well-defined semantics and more.

    This hybrid would not be as fast as a NoSQL service, but it may be good enough for the 80% of the market that needs stronger scalability and organic cloud behavior.

    Such a solution would also allow migrating existing applications easily into cloud environments, thus protecting huge investments made by organizations in those applications.

    It is my opinion that a SQL database built on NoSQL foundations can provide the highest value to customers who wish to be both agile and efficient while they grow.

    ?

    Threaded Messages (5)

  2. Agreed[ Go to top ]

    I completely agree. I recently built an app on top of a NoSQL database. Reporting was a nightmare, as well as adding features beyond the original spec. Then I moved the app to a classic SQL database with a classic normalized data model. Result: The performance suffered horribly. Last month I denormalized the data model but stayed on the SQL database. Performance is as good as on the NoSQL database, I can use standard tools, I can run reporting. In short, for my app it turned out to be the best of all worlds.

  3. NoSQL - the SQL[ Go to top ]

    This is old stuff. Matisse (and others) have been doing this for decades.

  4. temporal databases[ Go to top ]

    One area where nosql databases are good is for temporal databases. When the data has to support bi-temporal versioning or additional dimensions, nosql databases "can" be a good solution when entity versioning is required. Doing the same thing with RDBMS ends up hitting query explosion and rapidly swamps the system.

  5. temporal databases[ Go to top ]

    One area where nosql databases are good is for temporal databases. When the data has to support bi-temporal versioning or additional dimensions, nosql databases "can" be a good solution when entity versioning is required. Doing the same thing with RDBMS ends up hitting query explosion and rapidly swamps the system.

     

    Temporal DB is a specific application, does any NoSQL product support it out of box (if there is such a thing as a standard NoSQL query language)?  We must learn to separate SQL the query language (with its known advantages and limitations) from implementation of RDBMS engines that support that query language.  All NoSQL products are storage engines, with good parser/query optimizer, it should be possible to implement SQL over so called NoSQL products.  Then its competition over best implementation of a "standard" query language.  Already several tradition RDBMS vendors provide such approaches (though in various levels of maturity/performance).

    With some convention in keeping the two columns for the two temporal dimension same across the tables, things like session triggers can be used to add two clauses in WHERE of SQL to achieve this without query explosion.  Another option is to hide this behind ORM tool (e.g. extend JPA).

    I suspect only lack of sufficient demand is preventing traditional RDBMS vendors from supporting bi-temporality in their SQL implementation. 

  6. temporal databases[ Go to top ]

    One area where nosql databases are good is for temporal databases. When the data has to support bi-temporal versioning or additional dimensions, nosql databases "can" be a good solution when entity versioning is required. Doing the same thing with RDBMS ends up hitting query explosion and rapidly swamps the system.

     

    Temporal DB is a specific application, does any NoSQL product support it out of box (if there is such a thing as a standard NoSQL query language)?  We must learn to separate SQL the query language (with its known advantages and limitations) from implementation of RDBMS engines that support that query language.  All NoSQL products are storage engines, with good parser/query optimizer, it should be possible to implement SQL over so called NoSQL products.  Then its competition over best implementation of a "standard" query language.  Already several tradition RDBMS vendors provide such approaches (though in various levels of maturity/performance).

    With some convention in keeping the two columns for the two temporal dimension same across the tables, things like session triggers can be used to add two clauses in WHERE of SQL to achieve this without query explosion.  Another option is to hide this behind ORM tool (e.g. extend JPA).

    I suspect only lack of sufficient demand is preventing traditional RDBMS vendors from supporting bi-temporality in their SQL implementation. 

    You're right that out of the box, nosql databases do not support temporal database features out of the box. I've built b-temporal database on top of RDBMS and it definitely runs into query explosion issues with entity versioning. To be clear, when I say entity versioning, I'm talking about vesioning an object graph. The graph may be arbitrarily deep and could have an arbitrary number of objects. In the use cases I've had to handle, querying the data is always by entity version, which means rebuilding the entire object graph. I've successfully built a temporal database using Cassandra, so from my own experience it solves the query explosion issue. Of course it creates other challenges.