Discussions

News: JDBC vs. JPA? JavaSpaces vs. MongoDB? DB4O vs. Java Content Repository?

  1. Joseph Ottinger just published an excellent article today called "Considering Data Stores," which looks at various storage mechanisms - JDBC, JPA, JavaSpaces, Java Content Repository, MongoDB, and DB4O, primarily - from the perspective of how good they are at CRUD operations and queries.

    The comparisons are mostly apples-to-apples, and include benchmark data, so you get both objective numbers and subjective analysis - for example, while a given technology might be able to run a query in a fraction of a millisecond, it still might not be appropriate for complex reporting.

    Some code snippets are included to show the complexity involved in using the technologies, too.

    He also points out that benchmarks are still benchmarks, so he almost undermines his own point while making it. But the article's benchmark was meant to build a comparison, not to make absolute claims like "this works well," so that's fair - and at least he acknowledges the benchmark isn't real-world.

    As could be expected, GigaSpaces looks really good in the article.



  2. Choosing a technology purely on some response times in a very small benchmark is, imho, very dangerous. Some comments:

    Size of the dataset:
    The dataset you are using probably is small enough to fit entirely in cache (or in a gigaspace), and an in-memory solution can easily outperform any solution that needs a lot of IO. But as soon as the data doesn't fit in memory anymore, the numbers are going to be very different.

    Durability guarantees:
    The tests are not using the same durability guarantees. GigaSpaces provides different levels (replication, write-through/write-behind) to deal with durability. But if you need to provide the same durability level as a database, you need to configure write-through, and that is going to cost a lot of performance, because every write has to wait for the IO operation to complete.
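The write-through/write-behind distinction can be sketched in plain Java - a stand-in, not the GigaSpaces API; the "backing store" here is just a map playing the role of slow, durable IO:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the two durability modes: write-through pays the backing-store
// cost on every write, write-behind queues writes and flushes them later.
public class DurabilityModes {
    // Stand-in for a slow, durable backing store (e.g. a database).
    static final Map<String, String> backingStore = new HashMap<>();

    // Write-through: the caller blocks until the backing store has the value.
    static void writeThrough(Map<String, String> cache, String k, String v) {
        cache.put(k, v);
        backingStore.put(k, v); // synchronous IO cost on every write
    }

    // Write-behind: the caller only touches the cache; a queue is drained later.
    static final Deque<String[]> pending = new ArrayDeque<>();

    static void writeBehind(Map<String, String> cache, String k, String v) {
        cache.put(k, v);
        pending.add(new String[] {k, v}); // durable only after a later flush
    }

    static void flushPending() {
        String[] e;
        while ((e = pending.poll()) != null) {
            backingStore.put(e[0], e[1]);
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        writeThrough(cache, "a", "1");
        System.out.println(backingStore.containsKey("a")); // durable immediately

        writeBehind(cache, "b", "2");
        System.out.println(backingStore.containsKey("b")); // not yet durable
        flushPending();
        System.out.println(backingStore.containsKey("b")); // durable after flush
    }
}
```

The write-behind path is fast precisely because a crash between the write and the flush loses data - which is why benchmarking the two against a synchronous database write isn't apples-to-apples.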

    Read consistency:
    Traditional databases provide certain levels of read consistency (depending on the concurrency control mechanism or settings used). Are the same settings used in all tests? Especially if you have more complex interaction with shared state (the database), some level of read consistency is desirable if you want to write bug-free code. So I would like to see how the different solutions behave when a more strict form of read consistency is needed (if it is possible at all).
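What a stricter form of read consistency buys can be sketched with a copy-on-write snapshot in plain Java (not the API of any of the products benchmarked): a reader takes one snapshot and sees a consistent view for its whole "transaction", no matter what writers commit in the meantime:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Snapshot-style read consistency: writers build a new state off to the side
// and publish it atomically, so readers never observe a half-updated view.
public class SnapshotReads {
    // The "current" state; replaced wholesale on every commit (copy-on-write).
    private volatile Map<String, Integer> current = new HashMap<>();

    // A reader takes one immutable snapshot for its whole "transaction".
    public Map<String, Integer> snapshot() {
        return Collections.unmodifiableMap(current);
    }

    // A writer copies, mutates the copy, then publishes with one volatile write.
    public synchronized void commit(String key, int value) {
        Map<String, Integer> next = new HashMap<>(current);
        next.put(key, value);
        current = next; // readers see the old map or the new one, never a mix
    }

    public static void main(String[] args) {
        SnapshotReads store = new SnapshotReads();
        store.commit("balance", 100);

        Map<String, Integer> view = store.snapshot(); // reader starts here
        store.commit("balance", 50);                  // concurrent writer commits

        System.out.println(view.get("balance"));             // reader still sees 100
        System.out.println(store.snapshot().get("balance")); // a new reader sees 50
    }
}
```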

    I also work with Gigaspaces on a daily basis, and as long as you keep it out of your architecture (same goes for Spring configured applications) it is a nice technology.

    Peter Veentjer
    Multiverse: Software Transactional Memory for Java
    http://multiverse.codehaus.org

  3. ***as long as you keep it out of your architecture ... it is a nice technology.

    How do you use it and keep it out of your architecture? Or did I just miss out on some intended sarcasm?
  4. Hi Cameron,

    Our focus is keeping GigaSpaces as much as possible out of the code. There are some specific parts, like repositories/event-processors, that rely on GigaSpaces-specific stuff, but the focus for most components is that they should be normal Java objects.

    Most of the GigaSpaces specifics are in the application contexts, and we even keep those clean, so we can reuse the same application context files in different ways (very handy for integration testing).

    There are some issues left to be solved, like dealing with objects that live in the space. Personally, I hate it when a perfectly immutable class needs to be made mutable, or primitive fields changed to wrappers, just for the sake of querying.


  5. Our focus is keeping GigaSpaces as much as possible out of the code.

    Hoi Peter,

    This reminds me a little of the article on how to avoid the JSF API while still using JSF: http://weblogs.java.net/blog/cayhorstmann/archive/2010/01/03/how-stay-away-jsf-api
  6. Hi Cameron,

    Our focus is keeping GigaSpaces as much as possible out of the code. There are some specific parts, like repositories/event-processors, that rely on GigaSpaces-specific stuff, but the focus for most components is that they should be normal Java objects.

    Most of the GigaSpaces specifics are in the application contexts, and we even keep those clean, so we can reuse the same application context files in different ways (very handy for integration testing).

    There are some issues left to be solved, like dealing with objects that live in the space. Personally, I hate it when a perfectly immutable class needs to be made mutable, or primitive fields changed to wrappers, just for the sake of querying.


    Hi, Peter, and thanks for the points about the nature of "apples to apples" -- you're partly right without being all the way right.

    For example, the point you bring up about durability is right... but only partly so. In the tests with the embedded space (which yielded the best numbers), those numbers are correct in the sense that if you're running in the XAP container, as soon as that write time has elapsed, other processes can actually see the data - and depending on your synchronization settings, that server can "die" without your data going away, either.

    It's not the same as an RDBMS disk write, but don't forget that RDBMSes don't always write to disk synchronously either; while I understand and appreciate your point, I'd respectfully suggest that it's not quite correct unless you're being really, really pedantic about definitions, where all definitions take place in an RDBMS-specific world. (In other words, "a write only counts as a write in a perfect RDBMS-only world, and an equivalent guarantee appropriate to the storage mechanism doesn't count.")

    As far as your architecture points go: it's a good approach not to tie yourself to one product. One thing I'd point out, though, about the article's code: I used the exact same test code for *every* data store. (The benchmark code was in the 'common' Maven module, and was not customized at all for any of the tests.)

    I was able to adapt the test for each data store merely by altering the component scanning for Spring (and could have centralized this, too) and by defining the repository per module.

    In other words, for two tests, X and Y, the Spring configurations differed only in scanning package com.enigmastation.dao.x versus com.enigmastation.dao.y, and in defining the repository used by X and Y (a database or MongoDB URL, for example).
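To picture that, the per-store wiring might look something like this (illustrative Spring XML; the repository class and URL are made up - only the com.enigmastation.dao.x / .y packages come from the description above):

```xml
<!-- context for test X: only the scanned package and the repository differ -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context.xsd">

    <!-- for test Y, this becomes com.enigmastation.dao.y -->
    <context:component-scan base-package="com.enigmastation.dao.x"/>

    <!-- store-specific repository definition (class and URL are hypothetical) -->
    <bean id="repository" class="com.enigmastation.dao.x.XRepository">
        <property name="url" value="jdbc:h2:mem:benchmark"/>
    </bean>
</beans>
```

The benchmark code itself never changes; which store it exercises is decided entirely by which of these context files Spring loads.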