News: A New Java Persistence API for Berkeley DB

  1. A New Java Persistence API for Berkeley DB (54 messages)

    Sleepycat is requesting feedback from its existing users and potential users on a new Java API for object persistence. This new API has similarities with, and significant differences from, other persistence approaches in Java such as EJB3 Java Persistence, Hibernate, and Java Data Objects (JDO).

    Traditionally, Berkeley DB provides the necessary capabilities for creating high performance database applications without imposing a schema or data model. Even its Java APIs for object binding and stored collections are unconstrained by a data model of any sort. This provides maximum flexibility, but does not provide built-in support for quickly defining large and complex models.

    The Persistence API adds a built-in persistent object model to the Berkeley DB transactional engine. The design center for this new API is support for complex object models without compromises in performance.

    Please take a look at the API starting with the overview of the com.sleepycat.persist package at the link below. This package plus its three subpackages (model, evolve and raw) are new.

    Start here: The Berkeley DB Persistence API

        * com.sleepycat.persist
          o com.sleepycat.persist.model
          o com.sleepycat.persist.evolve
          o com.sleepycat.persist.raw
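
    To give a quick feel for the API before you dive into the javadoc, here is a minimal sketch of defining and storing an entity. The Task class and method names below are illustrative only, and since the API is under review the exact signatures may change; the javadoc linked above is authoritative.

    import com.sleepycat.je.Environment;
    import com.sleepycat.persist.EntityStore;
    import com.sleepycat.persist.PrimaryIndex;
    import com.sleepycat.persist.StoreConfig;
    import com.sleepycat.persist.model.Entity;
    import com.sleepycat.persist.model.PrimaryKey;

    @Entity
    class Task {
        @PrimaryKey
        long id;
        String description;
    }

    class GettingStarted {
        // env is an already-open, transactional Environment.
        static void demo(Environment env) throws Exception {
            StoreConfig config = new StoreConfig();
            config.setAllowCreate(true);
            EntityStore store = new EntityStore(env, "taskStore", config);

            // The primary index maps the Long key to Task instances.
            PrimaryIndex<Long, Task> tasks =
                store.getPrimaryIndex(Long.class, Task.class);

            Task t = new Task();
            t.id = 1;
            t.description = "write feedback";
            tasks.put(t);                  // insert or update by key
            Task fetched = tasks.get(1L);  // fetch by primary key

            store.close();
        }
    }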

    We at Sleepycat are very interested in your reactions, comments, suggestions and other feedback, both positive and negative. In particular we are wondering:

       1. If you have one, what is your favorite persistence approach for Java and how would you rate its usability compared to the usability of the Persistence API? What aspects of the Persistence API are more or less usable?

       2. The Persistence API makes heavy use of Java 1.5 generics and annotations. Without using these new language features, we believe that usability would be lessened. Do you consider the use of these language features positive or negative, and why?

       3. The Persistence API, while it increases usability, does not add a high level query facility. Do you consider a high level query facility to be a requirement for a Java persistence solution?

       4. The Persistence API does not conform to an existing standard such as JDO. To do so, we believe that both usability and performance would be compromised. Do you consider conformance to a standard to be more important than such compromises?

    If you are not already familiar with the existing Berkeley DB product line, the following background information is important to keep in mind:

        * Berkeley DB is an embedded database library, not a database server. Because it provides a very fast Btree store with fine control over transactions and locking, applications built on Berkeley DB can outperform applications built using other approaches.

        * Berkeley DB does not include a high level query facility. Queries are performed by accessing indices and by using an equality join method. Hand-optimized queries using Berkeley DB can outperform a general purpose query language optimizer.

        * Berkeley DB traditionally provides a key-value API for accessing Btree databases. A "database" in Berkeley DB is the equivalent of an SQL table and is represented as a set of key-value pairs. In the Berkeley DB Base API, byte arrays, not objects, are used for keys and values. With the Bind and Collections APIs, keys and values may be mapped to Java objects using a variety of mechanisms. (A rough sketch of Base API access follows this list.)

        * Sleepycat has three product lines: The original Berkeley DB, Berkeley DB Java Edition, and Berkeley DB XML. The Persistence API is targeted initially for use with Berkeley DB Java Edition, but may be adapted for use with the original Berkeley DB also at a later date. It is not applicable to Berkeley DB XML, which uses XML and XML Schema as its data model.
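
    To illustrate the Base API point above, here is a rough sketch of raw key-value access. The database is assumed to be already open, and the key and value contents are made up:

    import com.sleepycat.je.Database;
    import com.sleepycat.je.DatabaseEntry;
    import com.sleepycat.je.LockMode;
    import com.sleepycat.je.OperationStatus;

    class BaseApiSketch {
        // db is an already-open Berkeley DB JE Database.
        static void demo(Database db) throws Exception {
            // In the Base API, keys and values are plain byte arrays.
            DatabaseEntry key = new DatabaseEntry("user:42".getBytes("UTF-8"));
            DatabaseEntry value = new DatabaseEntry("payload".getBytes("UTF-8"));

            db.put(null, key, value); // null = no explicit transaction

            DatabaseEntry result = new DatabaseEntry();
            OperationStatus status = db.get(null, key, result, LockMode.DEFAULT);
            if (status == OperationStatus.SUCCESS) {
                String payload = new String(result.getData(), "UTF-8");
            }
        }
    }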

    Thank you in advance for taking a look at this and for any feedback that you are willing to provide!

    The Sleepycat Java Edition team

    Threaded Messages (54)

  2. JSR220 EJB3.JPA

    Your annotations look very similar to EJB3.JPA

    Have you considered developing an EJB3.JPA implementation provider instead of writing your own API?

    The spec is at:
    http://jcp.org/en/jsr/detail?id=220
  3. Why not EJB3.JPA?

    I'll try to answer the question of why EJB3.JPA was not used, from Sleepycat's perspective. It's a very good question.

    EJB3 and Hibernate are excellent tools for accessing an SQL database. However, Berkeley DB is not an SQL database. Berkeley DB has a significant performance advantage because it is an embedded non-SQL database. What we're trying to do with the Persistence API is to increase ease of use without compromising performance in any way.

    Why would we need to compromise performance to implement the EJB3.JPA spec? It's the extra layer of software and processing between the user API and the database engine. The "persistence context" defined for EJB3.JPA implies the use of an object cache and tracking of object status (detached, dirty, etc).

    A persistence context and object cache make a huge amount of sense when connected to a database server. For objects accessed more than once it is much cheaper to access a local cache than to make a round trip to the server. Even more importantly, updates can be queued locally and flushed to the server at transaction commit. So for a typical RDBMS (or OODB) the persistence context improves performance.

    But the situation is reversed with Berkeley DB since it always functions as an embedded database. Its low level cache of raw data (byte arrays) can be accessed extremely quickly: we often see very high operation rates. And object bindings are fast enough -- especially when bytecode enhancement is used -- that retrieving a record from the embedded cache and instantiating an object is very fast.

    So a secondary object cache for Berkeley DB would only use more memory without having any significant performance benefit. And using more memory can cause more I/O if less of the working set fits in memory. Minimizing I/O is a primary goal when it comes to performance tuning.

    A telling fact on this issue is that Berkeley DB is itself often used as a front end cache for an RDBMS, because it is so much faster to access data in a local Berkeley DB database.

    So overall, we think that EJB3.JPA is a good API for what it was designed for, but it is not optimal for an embedded non-SQL database.

    Mark
  4. Why not EJB3.JPA?

    It is a reasonable argument but also note that there are other reasons for automagic dirty checking and persistence contexts other than plain performance. For example, dirty checking simplifies code by removing the need for explicit update operations. This can be a big deal in complex apps.

    OTOH, I agree that implementing JPA for a persistence mechanism that has no support for ad hoc queries would perhaps be a bit "strange".
  5. Why not EJB3.JPA?

    It is a reasonable argument but also note that there are other reasons for automagic dirty checking and persistence contexts other than plain performance. For example, dirty checking simplifies code by removing the need for explicit update operations. This can be a big deal in complex apps.

    OTOH, I agree that implementing JPA for a persistence mechanism that has no support for ad hoc queries would perhaps be a bit "strange".

    You're absolutely right that there is an ease of use aspect to the persistence context provided by the EJB3.JPA model.

    One way of looking at this issue is to say that objects are fetched and stored by *value* with Berkeley DB, not by *reference* as in the EJB3.JPA approach.

    Access by value isn't perfect as you point out: If you retrieve an object twice, you will have two separate instances. To know whether they are equal, you'll have to compare their primary keys. If you change an object's property, you have to remember to store that object explicitly.

    But access by value is very simple to understand and, in my opinion at least, easy to use. There is never a question about whether a given instance is "managed" by a persistence manager or not -- it never is.
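
    To make the by-value semantics concrete, here is a small sketch. The Person class and index are illustrative, in the style of the examples in the javadoc:

    import com.sleepycat.persist.PrimaryIndex;
    import com.sleepycat.persist.model.Entity;
    import com.sleepycat.persist.model.PrimaryKey;

    @Entity
    class Person {
        @PrimaryKey
        long id;
        String name;
    }

    class ByValueDemo {
        static void demo(PrimaryIndex<Long, Person> personById) throws Exception {
            Person a = personById.get(42L);
            Person b = personById.get(42L);

            boolean sameInstance = (a == b);      // false: two separate copies
            boolean sameEntity = (a.id == b.id);  // true: compare primary keys

            // Changes are not tracked; an explicit put is needed to persist them.
            a.name = "New Name";
            personById.put(a);
        }
    }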

    So I think the by-value and by-reference models both have pros and cons WRT ease of use. Because so many people happily use Hibernate, perhaps the by-reference model has become familiar. I'm very interested to know how important this issue is for users.

    Thanks for bringing this issue up.

    Mark
  6. Why not EJB3.JPA?

    It makes sense, but it is nice to have some stuff for integration (probably JDBC with some popular RDBMS emulation is the most useful thing).
  7. Why not EJB3.JPA?

    It makes sense, but it is nice to have some stuff for integration (probably JDBC with some popular RDBMS emulation is the most useful thing).

    Good point. For example, this would allow it to be used with standard reporting tools.

    We have not considered a JDBC emulation layer so far, but perhaps we should consider it for a future release. Thanks for bringing it up.

    You mention "popular RDBMS emulation" -- do you know of something like this for Java that could be adapted?

    Mark
  8. Why not EJB3.JPA?

    http://www.swissql.com/sqlone-api.html is a popular emulator.
  9. Why not EJB3.JPA?

    http://www.swissql.com/sqlone-api.html is a popular emulator.

    I'm sorry, I misunderstood. This kind of emulator translates between different SQL dialects. Berkeley DB does not support SQL, so an emulator like this wouldn't work. But thanks anyway for the pointer.

    Mark
  10. Why not EJB3.JPA?

    Yes, I am talking about a relational query engine implementation and a JDBC wrapper. Popular database emulation can help to migrate applications (but this problem is solved by migration tools). Tools can adapt the driver themselves; for example, Hibernate uses a "Dialect" implementation for vendor-specific features.
     JDBC is implemented for many backends, including object databases; this kind of stuff is popular in ETL: http://www.enhydra.org/tech/octopus/index.html.
  11. Why not EJB3.JPA?

    A JDBC driver is useful in many ways; ETL is probably a good example (extract data from BDB, transform, and load to a server for data warehousing). It can help to integrate BDB with popular ORM implementations. A JDBC wrapper is useful for integration; the optimized API is useful for maximum performance. JDO and EJB wrappers are probably not so useful.
  12. no wrappers please

    I think the strength of BerkeleyDB is that it minds its own business and does it in the optimal way, with an API optimized for this particular type of persistence.
    IMO implementing JDBC, JDO, etc. wrappers does not make sense; the next question people will ask after a JDBC wrapper is implemented is: why does it not support zzzz SQL construct? And then they will consider BDB a 'bad' SQL database....
  13. no wrappers please

    I think the strength of BerkeleyDB is that it minds its own business and does it in the optimal way, with an API optimized for this particular type of persistence.

    IMO implementing JDBC, JDO, etc. wrappers does not make sense; the next question people will ask after a JDBC wrapper is implemented is: why does it not support zzzz SQL construct? And then they will consider BDB a 'bad' SQL database....

    I appreciate your comment very much, and this is one of the reasons that we have not gone down the path of providing a JDO, EJB3, or SQL interface. Thanks for confirming this!

    Juozas Baliuka does have a point, however. Perhaps a minimal read-only JDBC interface would not have a high cost to develop and maintain, but would open up interoperability with reporting tools, etc. This is somewhat attractive because the Persistence API does define a schema, and that schema could be exposed via such a read-only JDBC interface.

    OTOH perhaps this would only cause requests for better SQL support, etc, etc, as you say. I'm very interested in your opinions about this.

    (Caveat: This is not something we have discussed at Sleepycat, so I'm just gathering input at this point.)

    Mark
  14. no wrappers please

    Ability to use existing reporting tools via a JDBC interface definitely looks attractive. But I think that returning schema information in DatabaseMetaData is one thing, and parsing SQL requests from those tools and returning JDBC-compatible data is another business.
    I think you can make a better judgment about whether you can support such an SQL interface.
    Maybe a bit of education/evangelizing could help break that mental link: persistence->sql->RDBMS :)
  15. no wrappers please

    I think the strength of BerkeleyDB is that it minds its own business and does it in the optimal way, with an API optimized for this particular type of persistence.

    IMO implementing JDBC, JDO, etc. wrappers does not make sense; the next question people will ask after a JDBC wrapper is implemented is: why does it not support zzzz SQL construct? And then they will consider BDB a 'bad' SQL database....

    I am not a marketing expert, but it is possible to release the wrapper as a separate product or brand to solve this "problem"; and if it is very useful then somebody else will do it anyway.
  16. Non-durable identity

    The "persistence context" defined for EJB3.JPA implies the use of an object cache and tracking of object status (detached, dirty, etc).A persistence context and object cache make a huge amount of sense when connected to a database server. For objects accessed more than once it is much cheaper to access a local cache than to make a round trip to the server. Even more importantly, updates can be queued locally and flushed to the server at transaction commit.

    Hi Mark,

    If you guys become interested in standards, you should take a look at JDO's non-durable identity. It was designed for more-or-less the use case you're talking about here.

    -Patrick

    --
    Patrick Linskey
    http://bea.com
  17. Why are they wasting their time and our time, why not just join the EJB3 spec :-) :-) Don't they know that the war is lost, EJB3 and Hibernate won. :-)

    Ilya
  18. Apparently it is still not known to some groups of people.

    Here are some resources for reading:

    Interview with Craig Russell
    http://www.jdocentral.com/JDO_Commentary_CraigRussell_3.html

    Persistence FAQ:
    http://java.sun.com/j2ee/persistence/faq.html
  19. Apparently it is still not known to some groups of people.

    Here are some resources for reading:

    Interview with Craig Russell
    http://www.jdocentral.com/JDO_Commentary_CraigRussell_3.html

    Persistence FAQ:
    http://java.sun.com/j2ee/persistence/faq.html

    I was actually being sarcastic, since over the last few weeks we have had people coming out of the woodwork screaming about why some open source software projects exist and wanting them all to merge into monopolies.

    Ilya
  20. Just implement a mapping engine and wrap it with JDO, EJB, or ODMG, or implement a JDBC driver and it will be wrapped automatically by JDBC-based ORM implementations.
  21. Relation by ID

    Hi,

    I saw in the example in the API documentation that all relations between objects are made by ID (for example, Person does not have a reference to the Employer object, just its ID).

    I think you stayed with this approach because you are trying to keep it simple, and to avoid the complexity of storing object trees and all the business of retrieving them level by level? That's OK for me.

    I would suggest, in my humble opinion, that you support the use of Collections, Sets, and Maps of primitive numbers. I know this is not standard in the Collections API. (Apache Commons, I think, has an API with this kind of collection.)

    I think the overhead of creating numeric objects to search, store, and retrieve generates too much garbage, and I think it would be a lot faster (which is what you're always looking for?) if you minimized the creation and garbage collection of objects by using just primitive numbers.

    Thanks.
  22. Relation by ID

    I think the overhead of creating numeric objects to search, store, and retrieve generates too much garbage, and I think it would be a lot faster (which is what you're always looking for?) if you minimized the creation and garbage collection of objects by using just primitive numbers. Thanks.

    The cost of a short lived Object is pretty small in 1.5. In 1.6 most of these won't even create garbage but will be allocated on the stack. It's unlikely to be worth the effort.
  23. Relation by ID

    The cost of a short lived Object is pretty small in 1.5. In 1.6 most of these won't even create garbage but will be allocated on the stack. It's unlikely to be worth the effort.

    GC is not the only concern. Object creation overhead is another. If you have to access millions of rows of data, creating millions of Objects is a performance hit.

    I can see BerkeleyDB being the backend for quick calculation engines similar to OLAP but w/o the data explosion.
  24. Relation by ID

    GC is not the only concern. Object creation overhead is another. If you have to access millions of rows of data, creating millions of Objects is a performance hit.

    In our experience profiling and optimizing Berkeley DB Java Edition, we have not found object creation itself to be a significant factor, especially for Java 1.5 and 1.6. Although this is non-intuitive, Sun has been saying this all along, and in this case they seem to be right.
    I can see BerkeleyDB being the backend for quick calculation engines similar to OLAP but w/o the data explosion.

    Yes, I think this is a good application for Berkeley DB.

    Mark
  25. Relation by ID

    GC is not the only concern. Object creation overhead is another. If you have to access millions of rows of data, creating millions of Objects is a performance hit.
    In our experience profiling and optimizing Berkeley DB Java Edition, we have not found object creation itself to be a significant factor, especially for Java 1.5 and 1.6. Although this is non-intuitive, Sun has been saying this all along, and in this case they seem to be right.

    From what I understand, Object allocation and deallocation in modern JVMs is much faster than in C so the cost of temporary Objects is low. Also, if you use autoboxing, there is a pool of low value integers. Probably not relevant in this context but good to know all the same.
  26. Relation by ID

    From what I understand, Object allocation and deallocation in modern JVMs is much faster than in C so the cost of temporary Objects is low.

    There's no such thing as C. While on one hand "modern JVMs" means something (the Sun VM, the BEA VM, the GNU Java runtime, etc.), C is nothing but a language. There's also another language, related to it, called C++, and it's debatable whether the two have more in common than things setting them apart. As I'm sure you're aware, there are compilers and runtimes for C and C++ as well, and they're all terribly different, even on the same architecture and OS; they're also different between releases of the same OS. You simply can't state something like "Java [or even Sun JVM] object allocation is faster than C's". Moreover, what sort of allocation is this referring to? Stack or heap? Because, you know, C and C++ support both (actually structs in C). So which one is it? And compared to what C runtime?
    But then again, if Brian Goetz wrote an article on it then it must be true, right, and it's so much easier to just blindly eat up everything you're served as long as it fits your view of the world, as long as it feels like a friendly pat on the back.
  27. Relation by ID

    I would suggest, in my humble opinion, that you support the use of Collections, Sets, and Maps of primitive numbers. I know this is not standard in the Collections API. (Apache Commons, I think, has an API with this kind of collection.)

    I think the overhead of creating numeric objects to search, store, and retrieve generates too much garbage, and I think it would be a lot faster (which is what you're always looking for?) if you minimized the creation and garbage collection of objects by using just primitive numbers. Thanks.

    I was just looking at the Jakarta Commons Collections API and I can't find collections that store primitives as such -- can you point me to where you've seen these?

    In any case, the only requirement for one-to-many or many-to-many key collections is that they implement the java.util.Collection interface and that they are @Persistent. So if you have an efficient collection you'd like to use, as long as it implements Collection you can use it.
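
    For example, a to-many secondary key can be declared with a standard collection like this (the Account class is illustrative):

    import java.util.HashSet;
    import java.util.Set;
    import com.sleepycat.persist.model.Entity;
    import com.sleepycat.persist.model.PrimaryKey;
    import com.sleepycat.persist.model.SecondaryKey;
    import static com.sleepycat.persist.model.Relationship.ONE_TO_MANY;

    @Entity
    class Account {
        @PrimaryKey
        long id;

        // A to-many secondary key: any java.util.Collection implementation
        // may be used here, including an efficient custom one.
        @SecondaryKey(relate = ONE_TO_MANY)
        Set<String> emailAddresses = new HashSet<String>();
    }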

    If you want to use a collection class in a 3rd party library, then of course the collection class won't be annotated with @Persistent. To solve this, you can use a PersistentProxy as described here:

    http://dev.sleepycat.com/je-persist-review/java/com/sleepycat/persist/model/PersistentProxy.html
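
    As a rough sketch of what such a proxy looks like (IntList here is a made-up stand-in for a class from a third-party library that you cannot annotate; see the PersistentProxy javadoc for the authoritative contract):

    import com.sleepycat.persist.model.Persistent;
    import com.sleepycat.persist.model.PersistentProxy;

    // Hypothetical third-party class that cannot be annotated directly.
    class IntList {
        int[] values;
        IntList(int[] values) { this.values = values; }
        int[] toArray() { return values; }
    }

    // The proxy is annotated instead, naming the proxied class.
    @Persistent(proxyFor = IntList.class)
    class IntListProxy implements PersistentProxy<IntList> {

        int[] values; // the persistent representation

        // Called before writing: copy state out of the proxied object.
        public void initializeProxy(IntList list) {
            values = list.toArray();
        }

        // Called after reading: reconstruct the proxied object.
        public IntList convertProxy() {
            return new IntList(values);
        }
    }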

    Mark
  28. Primitive collections

    The only primitive collections library I know of is the one implemented by Sebastiano Vigna at http://fastutil.dsi.unimi.it/.

    It does not support generics, but other than that it is extremely complete. It has several implementations of Sets, Maps, and Lists and associated iterators, all for any primitive/Object combination. It makes the jar huge (8MB).
    The website reports that the library is optimized for huge collections. I have successfully used it for moderately sized maps (200,000 items).
  29. Primary and Secondary

    I'm curious why there is a PrimaryIndex class and a SecondaryIndex class. Wouldn't it be more elegant, less verbose, and more flexible to just have an Index class that has a getSubIndex method? Is there a special reason that the API only allows two levels of indexes or am I just missing something?
  30. Primary and Secondary

    I'm curious why there is a PrimaryIndex class and a SecondaryIndex class. Wouldn't it be more elegant, less verbose, and more flexible to just have an Index class that has a getSubIndex method? Is there a special reason that the API only allows two levels of indexes or am I just missing something?

    Good question. I'll try to explain the reasoning behind the class hierarchy, and please tell me if it makes sense.

    For example, take this class:

    @Entity
    class Person {

      @PrimaryKey
      long id;      // unique primary key: one Person per id

      @SecondaryKey(relate=MANY_TO_ONE)
      String name;  // non-unique secondary key: names may repeat
    }

    There would be a PersonByID primary index ordered by id and a PersonByName secondary index ordered by name.

    There are several rules about primary and secondary indices:

      1. A primary index must have unique keys (each person has a unique id in the example). A secondary index may have non-unique keys (there could be more than one person with the same name in the example).

      2. Records may be inserted into a primary index, but not into a secondary index. Secondary index records are maintained automatically by the engine as primary records are inserted, updated and deleted.

      3. Because of the two rules above, you cannot have a secondary index that is associated with another secondary index. A secondary must be associated with a primary.

    Therefore, the PrimaryIndex and SecondaryIndex classes have differences and similarities.

    In the class hierarchy, their similarities are captured in the EntityIndex interface, which is implemented by both classes. EntityIndex allows all kinds of index traversal and queries by key. It does not allow record insertion or update.

    PrimaryIndex implements EntityIndex and adds methods to allow insertion and update.

    SecondaryIndex implements EntityIndex and adds methods to support two special access methods that only make sense for secondary indices:

    + The keysIndex method is for traversing keys only, without retrieving the primary record at all, which improves performance. This doesn't apply to a primary index.

    + The subIndex method is for accessing the subset of entities having a given secondary key (duplicates). This does not apply to a primary index because primaries must have unique keys.
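
    Putting this together, here is a sketch of obtaining and using the two index types for the Person class above (store is an open EntityStore; as always in this review, the exact signatures are in the javadoc):

    import com.sleepycat.persist.EntityIndex;
    import com.sleepycat.persist.EntityStore;
    import com.sleepycat.persist.PrimaryIndex;
    import com.sleepycat.persist.SecondaryIndex;

    class IndexDemo {
        static void demo(EntityStore store) throws Exception {
            // Primary index: unique id -> Person; insertion goes through it.
            PrimaryIndex<Long, Person> byId =
                store.getPrimaryIndex(Long.class, Person.class);

            // Secondary index on 'name', maintained automatically.
            SecondaryIndex<String, Long, Person> byName =
                store.getSecondaryIndex(byId, String.class, "name");

            // keysIndex: traverse names without fetching Person records.
            EntityIndex<String, Long> names = byName.keysIndex();

            // subIndex: all the persons sharing one (non-unique) name.
            EntityIndex<Long, Person> smiths = byName.subIndex("Smith");
        }
    }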

    Does this make sense?
    Mark
  31. Primary and Secondary

    The subIndex method is for accessing the subset of entities having a given secondary key (duplicates). This does not apply to a primary index because primaries must have unique keys. Does this make sense? Mark

    Yeah, I figured there was a reason, I just didn't see what it was during my cursory look at the API. I just get an icky feeling when I see classes named Something1 and Something2 or in that general form. Maybe I would have come up with something different if I had designed it, but I might have come to the same conclusion.

    One thing that I find difficult about this API is that some of the terms, like 'evolve', are not clear to me. While (from what I see) this seems very interesting, I feel like it would take a lot of work to understand the DB before I could even start using this API. Perhaps it is because I am not familiar with this kind of DB.

    In 10 words or less, why should I use your DB?
  32. Primary and Secondary

    One thing that I find difficult about this API is that some of the terms, like 'evolve', are not clear to me. While (from what I see) this seems very interesting, I feel like it would take a lot of work to understand the DB before I could even start using this API. Perhaps it is because I am not familiar with this kind of DB.

    Please don't let the class evolution features detract from the usability of the API. We put these features into a separate package because they are optional and they can certainly be ignored initially. We will emphasize this in the documentation.

    In general, class evolution addresses the need to change your class definitions after you have deployed your application. If the existing stored data is not compatible with the new class definitions, converting the existing data is necessary. Using this feature is important if you cannot easily recreate the data from another source.

    The evolve package makes this conversion easier, and more efficient. By using the mutation classes, conversion of existing data can be performed lazily and transparently. This avoids downtime while converting a large database.
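
    For example, here is a sketch of registering mutations for a hypothetical change, where version 0 of Person renamed the field 'name' to 'fullName' and dropped an obsolete field 'fax'. The class and field names are illustrative (class names are normally fully qualified):

    import com.sleepycat.persist.StoreConfig;
    import com.sleepycat.persist.evolve.Deleter;
    import com.sleepycat.persist.evolve.Mutations;
    import com.sleepycat.persist.evolve.Renamer;

    class EvolveDemo {
        static StoreConfig configWithMutations() {
            Mutations mutations = new Mutations();
            mutations.addRenamer(new Renamer("Person", 0, "name", "fullName"));
            mutations.addDeleter(new Deleter("Person", 0, "fax"));

            // Registering the mutations lets existing records be converted
            // lazily as they are read, rather than all at once.
            StoreConfig config = new StoreConfig();
            config.setMutations(mutations);
            return config;
        }
    }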

    Although it would be nice to avoid this problem entirely by not changing classes incompatibly, for many applications these types of changes are a fact of life. What we've tried to do is to explicitly address this, rather than leaving it as a problem for the user to deal with.
    In 10 words or less, why should I use your DB?

    Hm, ok, only 10 words, I'll try: It outperforms other databases, is scalable, reliable, transactional and simple.

    Mark
  33. Primary and Secondary

    I don't speak for or know much about BDB, but I can sum up why you'd want to use it in two words:

    "huge maps"

    Too many people are thrown off by the letters "DB", which they associate with the acronym RDBMS, which they are intimidated by.

    But I think nearly all of these same developers have seen instances of applications where BDB could be used. How many people have run up against bottlenecks iterating through lists that got bigger in production than conceived in development? How many people have paid for expensive caching solutions (or maintained nightmare roll-your-own caches) for lists of data? How many file-based solutions are out there because an app needs to store data but has to be smaller than even an embedded DB like HSQL?

    These are areas where BDB shines. Don't you wish you didn't have to read a big file on startup and parse the lines? Don't you wish you had a reliable caching mechanism that was as easy to use as a HashMap? Don't you wish you could scale a solution that has outgrown its data handling capabilities without rewriting the whole thing from scratch?
  34. Primary and Secondary

    "huge maps"
    ...
    Don't you wish you had a reliable caching mechanism that was as easy to use as a HashMap? Don't you wish you could scale a solution that has outgrown its data handling capabilities without rewriting the whole thing from scratch?

    Then the next question is: How does BDB compare to Coherence (Distributed Cache), found @ http://www.tangosol.com/coherence-overview.jsp
  35. Primary and Secondary

    "huge maps"...Don't you wish you had a reliable caching mechanism that was as easy to use as a HashMap? Don't you wish you could scale a solution that has outgrown it's data handling capabilities without rewriting the whole thing from scratch?

    Then the next question is: How does this BDB compares to Coherence (Distributed Cache) found @ http://www.tangosol.com/coherence-overview.jsp

    A couple quick points before I carefully side-step the question:

    1. SleepyCat (developer of BerkeleyDB) is a partner of ours.

    2. Some "very big" financial services firms are joint customers, and pushed Tangosol and SleepyCat to work together.

    3. Coherence 3.1 supports BerkeleyDB as a disk store. The BerkeleyDB implementation is fairly high performance, and is definitely faster than the built-in disk store that Coherence has.

    Now, to try to handle the question:

    1. Coherence is focused on in-memory caching, but it can do pure disk caching or mixed memory/disk caching ("overflow caching"). We do *not* focus on single-node usage, with our median deployment size being around 16 nodes and large deployments a "couple orders of magnitude" larger.

    2. BerkeleyDB is good at keeping data safe even when an app isn't running (i.e. on disk in a resilient format). Coherence is good at keeping data safe when the app *is* running, i.e. when the data is only in memory and servers die in the middle of a two-phase commit.

    3. Coherence doesn't tend to use (and certainly doesn't rely on) shared disk (SAN, NAS, etc.) .. Coherence is basically a RAID implementation for objects implemented in a grid environment.

    So it's really apples and oranges. On pure disk speed for a single node, use BerkeleyDB. For files shared from a shared disk, use BerkeleyDB. For persistent data, use BerkeleyDB.

    For clustering, for shared memory, for coherent caching, for data grids, for information fabrics .. use Coherence.

    If you need both, I guarantee that we work well with BerkeleyDB and the joint solution rocks ;-)

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  36. Consider JDO API

    I realize you may find it passé, but the JDO 1.0 spec would be interesting, because you wouldn't have the same objections as with the EJB3 or Hibernate APIs (i.e. second-level caching). You would have makePersistent and such, but avoid JDOQL altogether (or perhaps implement it using Janino or something similar).

    Just a thought.

    BTW, I believe dirty checking of persistent entities is extremely important. I don't think you can avoid this.
  37. I think that implementing EJB 3.0, in particular EJB QL, could not hurt performance. At the moment, if we want to execute

    SELECT FROM Orders where customerName = 'John'

    what should we do? Iterate over all orders and check the customerName field? Definitely a bad idea for me.

    How should that be done in Berkeley DB Java Edition? Anybody from the Berkeley team?
  38. It is a trivial example; see the "SecondaryIndex" stuff.
  39. It must be more "interesting" to implement manually "SELECT FROM Orders where customerName = 'John' and customerEmail = 'john at yahoo dot com' or ...": finding the "best" index, or using no index at all (it depends on index selectivity).
    It must be more "interesting" to implement manually "SELECT FROM Orders where customerName = 'John' and customerEmail = 'john at yahoo dot com' or ...": finding the "best" index, or using no index at all (it depends on index selectivity).

    Yeah. So, in fact, for complex queries you become the "query optimizer". I don't want to be one.
  41. Queries

    There were several posts about queries and secondary indices.

    It is true that Berkeley DB does not have a query language, and therefore does not support ad-hoc queries. And if your query uses a secondary key, it is up to you to use the secondary index.
    SELECT FROM Orders where customerName = 'John'

    For example, you can compare the code to execute the above SQL query and get the results object using EJB3.JPA, to the following code using the proposed Berkeley DB API:

    EntityCursor<Order> orders =
       ordersByCustomerName.subIndex("John").entities();
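
    For completeness, here is a sketch of consuming that cursor (EntityCursor is Iterable, and cursors must always be closed):

    EntityCursor<Order> orders =
       ordersByCustomerName.subIndex("John").entities();
    try {
        for (Order order : orders) {
            // process each matching order
        }
    } finally {
        orders.close(); // cursors must always be closed
    }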

    Or, if you are performing an equality join then you need to use the EntityJoin object:

    http://dev.sleepycat.com/je-persist-review/java/com/sleepycat/persist/EntityJoin.html
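
    Here is a rough sketch of such a join, assuming hypothetical ordersById, ordersByCustomerName and ordersByCustomerEmail indexes (EntityJoin and ForwardCursor are in com.sleepycat.persist; see the javadoc above for the authoritative signatures):

    // Find orders where customerName = "John" AND
    // customerEmail = "john@yahoo.com" (an equality join).
    EntityJoin<Long, Order> join = new EntityJoin<Long, Order>(ordersById);
    join.addCondition(ordersByCustomerName, "John");
    join.addCondition(ordersByCustomerEmail, "john@yahoo.com");

    ForwardCursor<Order> results = join.entities();
    try {
        for (Order order : results) {
            // each order matches all of the join conditions
        }
    } finally {
        results.close();
    }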

    For complex queries with lots of conditions, instead of SQL you will need to write procedural code that iterates through results and performs comparisons.

    If you are accustomed to using SQL, this probably seems strange. However, if you are accustomed to using the Java Collections framework and similar APIs, if you try the Berkeley DB API you may find it simple and straightforward.

    In terms of performance, when you are writing a query using Berkeley DB you can think of it as if you were writing a stored procedure in an RDBMS. Because Berkeley DB is an embedded database and there is no intermediate query language, the ordersByCustomerName object provides direct access to the Btree for that secondary index.

    The performance advantage of this approach is quite significant. But of course, you should determine that for yourself.

    Berkeley DB is not intended to be the tool for all jobs. It is not intended to be used where ad-hoc SQL queries are required, or where an RDBMS is required for other reasons.

    It is intended to be used where you need better performance than can be obtained using an SQL database, or where an RDBMS is undesirable for other reasons. Some users also prefer it for simplicity.

    Of course, many database applications do need ad-hoc queries and many developers will prefer to use SQL. But when you need better performance, or a simpler approach, Berkeley DB will be there to meet that need.

    What we're trying to do with the Persistence API is to make it easy to define and access complex object models, without sacrificing any of the performance advantages that Berkeley DB already gives you.

    Mark
  42. Queries

    If you need SQL queries, perhaps you could use ZQL (http://www.experlog.com/gibello/zql/) to build a SQL layer over the top of Berkeley DB. You would probably have to limit the complexity of queries, but it is workable. I have created a SQL interface for an XML file (yes, I know there are XPath and XQuery already) as a proof of concept.
  43. some comments

    Like most Java programmers, I don't know or care that much about database programming (well I do know a lot about it actually but when programming I don't want to bother much with database specifics). The goal of most persistence APIs is to keep it that way. Let the persistence layer deal with the impedance mismatch, don't bother the Java programmer with database optimizations. The java programmer works with in memory objects, the persistence layer does the difficult job of making sure the objects persist and finding them back. The good ones do this fast and without getting in the way of the Java programmer.

    Assuming this holds true for potential users of your products and APIs, it is safe to assume that the vast majority of your users does not wish to spend a lot of time mastering your API. In fact a lot of them are going to be turned off just by the fact your API is product specific.

    Those are the things you need to deal with. The typical user that will look at your product and API will be a Java developer in need of a persistence layer for his standalone non-J2EE application (embedded databases have no place in J2EE other than as a drop-in replacement for commercial SQL servers). In other words there are objects that the application uses that need to be persistent. The choice for Berkeley DB and your API is a performance optimization at the cost of interoperability with other databases.

    So you need to make very clear that A) these performance benefits are very real compared to the many SQL-based embedded databases that provide interoperability with standardized persistence layers, and B) it is very easy to bridge the conceptual gap between an object-oriented program and a Berkeley DB using the API.

    Good luck.
  44. some comments

    The typical user that will look at your product and API will be a Java developer in need of a persistence layer for his standalone non-J2EE application (embedded databases have no place in J2EE other than as a drop-in replacement for commercial SQL servers).

    I partially agree with this statement. I think that non-SQL databases really don't fit well with EJB, but the same is not true for J2EE as a whole. We have implemented JTA for Berkeley DB, so transactions are fully integrated. This makes implementation of a singleton J2EE service using Berkeley DB straightforward and useful in many cases. Berkeley DB is also useful to complement an RDBMS (as a cache, for example) in J2EE/EJB applications. But I do see your point.
    In other words there are objects that the application uses that need to be persistent. The choice for Berkeley DB and your API is a performance optimization at the cost of interoperability with other databases.

    So you need to make very clear that A) these performance benefits are very real compared to the many SQL-based embedded databases that provide interoperability with standardized persistence layers, and B) it is very easy to bridge the conceptual gap between an object-oriented program and a Berkeley DB using the API.

    Good luck.

    This makes a lot of sense -- thank you for these comments.

    Performance benchmarks are always problematic, of course, but the performance advantages of Berkeley DB are clear and can be demonstrated.

    I think what you're saying about bridging the conceptual gap is very important and something we need to address in our documentation. We need to present the model for primary and secondary indices more clearly, and show how these map to objects. Thanks for emphasizing this -- we will take your advice seriously.

    Mark
  45. Performance demonstrated?

    Performance benchmarks are always problematic, of course, but the performance advantages of Berkeley DB are clear and can be demonstrated.

    Where?
  46. Performance demonstrated?

    Performance benchmarks are always problematic, of course, but the performance advantages of Berkeley DB are clear and can be demonstrated.
    Where?

    We've heard that BDB is faster for some of our customers, but obviously your mileage will vary depending on your application.

    In my opinion you shouldn't believe Sleepycat on this, you should do your own comparisons or talk to users of Berkeley DB independently. Performance comparisons, especially with an embedded DB, are sensitive to the data access pattern and how much tuning has been done.

    If you would like to do a performance comparison, Sleepycat will support you in your evaluation and tuning process. Just send an email to support at sleepycat dot com and indicate that you're doing an evaluation.

    Mark
  47. some comments

    Like most Java programmers, I don't know or care that much about database programming
    Sad truth.
    don't bother the Java programmer with database optimizations.
    Rather unproductive and unhealthy alienation of Java programmers from “the rest” IMO.
    Assuming this holds true for potential users of your products and APIs, it is safe to assume that the vast majority of your users does not wish to spend a lot of time mastering your API. In fact a lot of them are going to be turned off just by the fact your API is product specific.

    This is rather odd, because mastering a clear and simple API is very easy with the help of a modern IDE that assures correct syntax and types.

    Try to find this level of support for JDOQL, HQL, SQL, etc., not to mention that every implementation of a standard has its own quirks.
  48. some comments

    Like most Java programmers, I don't know or care that much about database programming
    Sad truth.
    don't bother the Java programmer with database optimizations.
    Rather unproductive and unhealthy alienation of Java programmers from “the rest” IMO.

    I second that.
    The oooh-look-at-me-i'm-the-super-in-memory-JAVA-programmer-don't-care-about-no-database-yeah! is such a load of crap and a sign of mediocrity. "I want my API and I want it now and don't make me think about what's actually going on". That's just sad.
  49. Object mapping hardcoded in Java code?

    I've looked further at your API. I was wondering how you perform the mapping between your DB and Java objects.

    I have an impression (correct me if I'm wrong) that the developer has no support from your API for "auto-mapping" and needs to explicitly write a "mapping" class that implements the Converter interface.

    So, if I have 30 persistent entities, I will have to also write 30 corresponding converters - which is quite a heavy taxation on the developer.

    Then all my mapping information is "hardcoded" in the compiled code... meaning that in order to change the mapping, even for minor changes, I need to change code and recompile.

    Do you have any external mapping means (like in JDO)?
    Also, do you have any utility API to aid mapping (e.g. POJO mapping)?
  50. Object mapping hardcoded in Java code?

    I've looked further at your API. I was wondering how you perform the mapping between your DB and Java objects.

    I have an impression (correct me if I'm wrong) that the developer has no support from your API for "auto-mapping" and needs to explicitly write a "mapping" class that implements the Converter interface.

    So, if I have 30 persistent entities, I will have to also write 30 corresponding converters, which is quite a heavy taxation on the developer.

    Then all my mapping information is "hardcoded" in the compiled code... meaning that in order to change the mapping, even for minor changes, I need to change code and recompile.

    Do you have any external mapping means (like in JDO)? Also, do you have any utility API to aid mapping (e.g. POJO mapping)?

    I'm sorry if this wasn't clear. All mappings are automatic. You annotate your POJO class with @Entity or @Persistent, and the mapping is done transparently.

    You only need to implement the Converter interface for certain types of class evolution. This is needed when an incompatible class change has been made, and the existing deployed data needs to be converted.

    Mark
  51. I would like to suggest to Sleepycat that they consider using EVS4J, my Apache-licensed pure-Java implementation of the fastest-known reliable multicast protocol with total ordering properties, to add support for multi-master replication.

    Note: this is totally unrelated to Coherence or other caches. We are talking real concurrency control here, and its only application is to small clusters in data centers, not huge groups.

    Guglielmo

    Enjoy the Fastest Known Reliable Multicast Protocol with Total Ordering
  52. Sleepycat would like to thank everyone who participated in this discussion. Your feedback is invaluable to us and we want you to know that we take your input seriously. We will evaluate what has been discussed here in considering the Persistence API for our next major release of Berkeley DB Java Edition.

    If you have further feedback, questions, or you want to know the status of this project, please either use the bdbje mailing list (which you can find at http://dev.sleepycat.com/community/discussion.html) or drop a note to support at sleepycat dot com.


    Thanks again!

    Dave Seqleau
    VP of Engineering
    Sleepycat Software
  53. JDO for SleepyCat

    4. The Persistence API does not conform to an existing standard such as JDO. To do so, we believe that both usability and performance would be compromised. Do you consider conformance to a standard to be more important than such compromises?

    Obviously, JDBC and EJB3 are irrelevant to your specific database technology.
    But IMHO you should really consider JDO. Based on my experience of JDO for non-relational data sources (ODBMS, embedded databases, and XML), I can tell you that you won't compromise usability or performance.
    It would be a very bad idea to start with a new proprietary API.

    BTW: Good luck at Oracle!

    Regards, Eric,
    Xcalia.
  54. A query facility

    I have a remark concerning your question 3: Do you consider a high level query facility to be a requirement for a Java persistence solution?

    I wouldn't say that it is a requirement, but if the goal is ease-of-use, I think a query facility, even if it sacrifices some performance, would go a long way. After all, I can always optimize a query by rewriting it for the lower-level API. But if I have a lot of query-style access, writing loops over loops seems like quite a hassle.
  55. My Database is faster than yours :)

    I am pretty sure your Berkeley DB is faster than other databases, but could you provide some statistics compared to other implementations? Pay a 3rd-party company to do the comparison between leading DBs and Hibernate, JDO, etc.