Sleepycat is requesting feedback from its existing users and potential users on a new Java API for object persistence. This new API has similarities with, and significant differences from, other persistence approaches in Java such as EJB3 Java Persistence, Hibernate, and Java Data Objects (JDO).
Traditionally, Berkeley DB provides the necessary capabilities for creating high performance database applications without imposing a schema or data model. Even its Java APIs for object binding and stored collections are unconstrained by a data model of any sort. This provides maximum flexibility, but does not provide built-in support for quickly defining large and complex models.
The Persistence API adds a built-in persistent object model to the Berkeley DB transactional engine. The design center for this new API is support for complex object models without compromises in performance.
Please take a look at the API starting with the overview of the com.sleepycat.persist package at the link below. This package plus its three subpackages (model, evolve and raw) are new.
Start here: the Berkeley DB Persistence API
* com.sleepycat.persist
o com.sleepycat.persist.model
o com.sleepycat.persist.evolve
o com.sleepycat.persist.raw
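To give a feel for the API, here is a minimal sketch of how an entity might be declared, stored and fetched, based on the annotations and index classes described in the Javadoc above. The store name, environment and configuration variables are assumptions for illustration, and details may differ from the draft API.
    @Entity
    class Person {
        @PrimaryKey
        long id;
        @SecondaryKey(relate=MANY_TO_ONE)
        String name;
    }

    // Assumes an open Environment (env) and a StoreConfig (config).
    EntityStore store = new EntityStore(env, "personStore", config);
    PrimaryIndex<Long, Person> personById =
        store.getPrimaryIndex(Long.class, Person.class);

    Person p = new Person();
    p.id = 1;
    p.name = "Jane";
    personById.put(p);                 // insert or update by primary key
    Person found = personById.get(1L); // fetch by primary key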
We at Sleepycat are very interested in your reactions, comments, suggestions and other feedback, both positive and negative. In particular we are wondering:
1. If you have one, what is your favorite persistence approach for Java and how would you rate its usability compared to the usability of the Persistence API? What aspects of the Persistence API are more or less usable?
2. The Persistence API makes heavy use of Java 1.5 generics and annotations. Without using these new language features, we believe that usability would be lessened. Do you consider the use of these language features positive or negative, and why?
3. The Persistence API, while it increases usability, does not add a high level query facility. Do you consider a high level query facility to be a requirement for a Java persistence solution?
4. The Persistence API does not conform to an existing standard such as JDO. To do so, we believe that both usability and performance would be compromised. Do you consider conformance to a standard to be more important than such compromises?
If you are not already familiar with the existing Berkeley DB product line, the following background information is important to keep in mind:
* Berkeley DB is an embedded database library, not a database server. Because it provides a very fast Btree store with fine control over transactions and locking, applications built on Berkeley DB can outperform applications built using other approaches.
* Berkeley DB does not include a high level query facility. Queries are performed by accessing indices and by using an equality join method. Hand-optimized queries using Berkeley DB can outperform a general purpose query language optimizer.
* Berkeley DB traditionally provides a key-value API for accessing Btree databases. A "database" in Berkeley DB is the equivalent of an SQL table and is represented as a set of key-value pairs. In the Berkeley DB Base API, byte arrays, not objects, are used for keys and values. With the Bind and Collections APIs, keys and values may be mapped to Java objects using a variety of mechanisms (see the sketch after this list).
* Sleepycat has three product lines: The original Berkeley DB, Berkeley DB Java Edition, and Berkeley DB XML. The Persistence API is targeted initially for use with Berkeley DB Java Edition, but may be adapted for use with the original Berkeley DB also at a later date. It is not applicable to Berkeley DB XML, which uses XML and XML Schema as its data model.
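As a rough illustration of the Base API point above, a single put using raw byte arrays might look like the following; the database handle, transaction and record contents are assumptions for illustration.
    // Base API: keys and values are byte arrays wrapped in DatabaseEntry.
    // Assumes an open Database handle (db) and a Transaction (txn).
    DatabaseEntry key = new DatabaseEntry("person:1".getBytes());
    DatabaseEntry data = new DatabaseEntry("Jane".getBytes());
    db.put(txn, key, data);
    // The Bind and Collections APIs, and now the Persistence API, layer
    // object-to-byte-array mappings on top of this key-value model.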
Thank you in advance for taking a look at this and for any feedback that you are willing to provide!
The Sleepycat Java Edition team
A New Java Persistence API for Berkeley DB (54 messages)
- Posted by: Mark Hayes
- Posted on: February 08 2006 12:10 EST
Threaded Messages (54)
- JSR220 EJB3.JPA by Ruslan Zenin on February 08 2006 12:48 EST
- Why not EJB3.JPA? by Hayes Mark on February 08 2006 13:42 EST
- Why not EJB3.JPA? by Gavin King on February 08 2006 01:52 EST
- Why not EJB3.JPA? by Hayes Mark on February 08 2006 02:30 EST
- Why not EJB3.JPA? by Juozas Baliuka on February 08 2006 02:14 EST
- Why not EJB3.JPA? by Hayes Mark on February 08 2006 03:07 EST
- Why not EJB3.JPA? by Juozas Baliuka on February 08 2006 03:21 EST
- Why not EJB3.JPA? by Hayes Mark on February 08 2006 03:41 EST
- Why not EJB3.JPA? by Juozas Baliuka on February 08 2006 04:52 EST
- Why not EJB3.JPA? by Juozas Baliuka on February 08 2006 03:40 EST
- no wrappers please by Konstantin Ignatyev on February 08 2006 05:47 EST
- no wrappers please by Hayes Mark on February 08 2006 07:01 EST
- no wrappers please by Konstantin Ignatyev on February 08 2006 08:54 EST
- no wrappers please by Juozas Baliuka on February 09 2006 01:33 EST
- Non-durable identity by Patrick Linskey on February 10 2006 03:22 EST
- A New Java Persistence API for Berkeley DB by Ilya Sterin on February 08 2006 13:05 EST
- A New Java Persistence API for Berkeley DB by Ruslan Zenin on February 08 2006 13:25 EST
- A New Java Persistence API for Berkeley DB by Ilya Sterin on February 09 2006 07:10 EST
- A New Java Persistence API for Berkeley DB by Juozas Baliuka on February 08 2006 13:39 EST
- Relation by ID by Emiliano Marino on February 08 2006 17:02 EST
- Relation by ID by James Watson on February 08 2006 18:16 EST
- Relation by ID by Dustin Barlow on February 08 2006 08:13 EST
- Relation by ID by Hayes Mark on February 09 2006 09:08 EST
- Relation by ID by James Watson on February 09 2006 09:56 EST
- Relation by ID by Radu-Adrian Popescu on February 12 2006 04:52 EST
- Relation by ID by Hayes Mark on February 08 2006 21:09 EST
- Prmitive collections by Erik van Oosten on February 09 2006 04:07 EST
- Primary and Secondary by James Watson on February 08 2006 18:25 EST
- Primary and Secondary by Hayes Mark on February 08 2006 20:02 EST
- Primary and Secondary by James Watson on February 09 2006 09:52 EST
- Primary and Secondary by Hayes Mark on February 09 2006 12:17 EST
- Primary and Secondary by Aaron Evans on February 09 2006 01:04 EST
- Primary and Secondary by Ruslan Zenin on February 09 2006 01:45 EST
- Primary and Secondary by Cameron Purdy on February 09 2006 03:28 EST
- Consider JDO API by Kurt Westerfeld on February 08 2006 22:49 EST
- A New Java Persistence API for Berkeley DB by Dmitriy Kiriy on February 09 2006 02:41 EST
- A New Java Persistence API for Berkeley DB by Juozas Baliuka on February 09 2006 05:03 EST
- A New Java Persistence API for Berkeley DB by Juozas Baliuka on February 09 2006 05:17 EST
- A New Java Persistence API for Berkeley DB by Dmitriy Kiriy on February 09 2006 05:35 EST
- Queries by Hayes Mark on February 09 2006 10:30 EST
- Queries by Rob Griffin on February 15 2006 07:39 EST
- some comments by jilles van gurp on February 09 2006 04:53 EST
- some comments by Hayes Mark on February 09 2006 10:53 EST
- Performance demonstrated? by Michael Newcomb on February 09 2006 02:11 EST
- Performance demonstrated? by Hayes Mark on February 09 2006 02:45 EST
- some comments by Konstantin Ignatyev on February 09 2006 11:38 EST
- some comments by Radu-Adrian Popescu on February 12 2006 05:00 EST
- Object mapping hardcoded in Java code? by Ruslan Zenin on February 09 2006 10:40 EST
- Object mapping hardcoded in Java code? by Hayes Mark on February 09 2006 12:38 EST
- In-memory replication for BDB Java Edition by Guglielmo Lichtner on February 09 2006 18:39 EST
- A New Java Persistence API for Berkeley DB by David Segleau on February 13 2006 09:52 EST
- JDO for SleepyCat by Eric Samson on February 16 2006 03:05 EST
- A query facility by Felix Mayer on February 24 2006 15:06 EST
- My Database is faster than yours :) by Jeryl Cook on March 06 2006 11:26 EST
JSR220 EJB3.JPA
- Posted by: Ruslan Zenin
- Posted on: February 08 2006 12:48 EST
- in response to Mark Hayes
Your annotations look very similar to EJB3.JPA.
Have you considered developing an EJB3.JPA implementation provider instead of writing your own API?
The spec is on:
http://jcp.org/en/jsr/detail?id=220
Why not EJB3.JPA?
- Posted by: Hayes Mark
- Posted on: February 08 2006 13:42 EST
- in response to Ruslan Zenin
I'll try to answer the question of why EJB3.JPA was not used, from Sleepycat's perspective. It's a very good question.
EJB3 and Hibernate are excellent tools for accessing an SQL database. However, Berkeley DB is not an SQL database. Berkeley DB has a significant performance advantage because it is an embedded non-SQL database. What we're trying to do with the Persistence API is to increase ease of use without compromising performance in any way.
Why would we need to compromise performance to implement the EJB3.JPA spec? It's the extra layer of software and processing between the user API and the database engine. The "persistence context" defined for EJB3.JPA implies the use of an object cache and tracking of object status (detached, dirty, etc).
A persistence context and object cache make a huge amount of sense when connected to a database server. For objects accessed more than once it is much cheaper to access a local cache than to make a round trip to the server. Even more importantly, updates can be queued locally and flushed to the server at transaction commit. So for a typical RDBMS (or OODB) the persistence context improves performance.
But the situation is reversed with Berkeley DB since it always functions as an embedded database. Its low level cache of raw data (byte arrays) can be accessed extremely quickly: we often see very high operation rates per second. And object bindings are fast enough -- especially when bytecode enhancement is used -- that retrieving a record from the embedded cache and instantiating an object is very fast.
So a secondary object cache for Berkeley DB would only use more memory without having any significant performance benefit. And using more memory can cause more I/O if less of the working set fits in memory. Minimizing I/O is a primary goal when it comes to performance tuning.
A telling fact on this issue is that Berkeley DB is itself often used as a front end cache for an RDBMS, because it is so much faster to access data in a local Berkeley DB database.
So overall, we think that EJB3.JPA is a good API for what it was designed for, but it is not optimal for an embedded non-SQL database.
Mark
Why not EJB3.JPA?
- Posted by: Gavin King
- Posted on: February 08 2006 13:52 EST
- in response to Hayes Mark
It is a reasonable argument but also note that there are other reasons for automagic dirty checking and persistence contexts other than plain performance. For example, dirty checking simplifies code by removing the need for explicit update operations. This can be a big deal in complex apps.
OTOH, I agree that implementing JPA for a persistence mechanism that has no support for ad hoc queries would perhaps be a bit "strange".
Why not EJB3.JPA?
- Posted by: Hayes Mark
- Posted on: February 08 2006 14:30 EST
- in response to Gavin King
It is a reasonable argument but also note that there are other reasons for automagic dirty checking and persistence contexts other than plain performance. For example, dirty checking simplifies code by removing the need for explicit update operations. This can be a big deal in complex apps.OTOH, I agree that implementing JPA for a persistence mechanism that has no support for ad hoc queries would perhaps be a bit "strange".
You're absolutely right that there is an ease of use aspect to the persistence context provided by the EJB3.JPA model.
One way of looking at this issue is to say that objects are fetched and stored by *value* with Berkeley DB, not by *reference* as in the EJB3.JPA approach.
Access by value isn't perfect as you point out: If you retrieve an object twice, you will have two separate instances. To know whether they are equal, you'll have to compare their primary keys. If you change an object's property, you have to remember to store that object explicitly.
But access by value is very simple to understand and, in my opinion at least, easy to use. There is never a question about whether a given instance is "managed" by a persistence manager or not -- it never is.
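As a rough sketch of the by-value style, reusing the Person entity from the Javadoc examples and a PrimaryIndex<Long, Person> named personById (the variable name is an assumption for illustration):
    // Each get() returns a distinct instance; there is no identity map.
    Person p1 = personById.get(1L);
    Person p2 = personById.get(1L);
    boolean sameRecord = (p1.id == p2.id); // equality is judged by primary key

    // Changes are not tracked automatically; store the object explicitly.
    p1.name = "New Name";
    personById.put(p1);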
So I think the by-value and by-reference models both have pros and cons WRT ease of use. Because so many people happily use Hibernate, perhaps the by-reference model has become familiar. I'm very interested to know how important this issue is for users.
Thanks for bringing this issue up.
Mark
Why not EJB3.JPA?
- Posted by: Juozas Baliuka
- Posted on: February 08 2006 14:14 EST
- in response to Hayes Mark
It makes sense, but it is nice to have some stuff for integration (probably JDBC with some popular RDBMS emulation is the most useful thing).
Why not EJB3.JPA?
- Posted by: Hayes Mark
- Posted on: February 08 2006 15:07 EST
- in response to Juozas Baliuka
It makes sence, but it is nice to have some stuff for integration (probably JDBC with some popular RDBMS emulation is the most usefull thing).
Good point. For example, this would allow it to be used with standard reporting tools.
We have not considered a JDBC emulation layer so far, but perhaps we should consider it for a future release. Thanks for bringing it up.
You mention "popular RDBMS emulation" -- do you know of something like this for Java that could be adapted?
Mark
Why not EJB3.JPA?
- Posted by: Juozas Baliuka
- Posted on: February 08 2006 15:21 EST
- in response to Hayes Mark
http://www.swissql.com/sqlone-api.html is a popular emulator.
Why not EJB3.JPA?
- Posted by: Hayes Mark
- Posted on: February 08 2006 15:41 EST
- in response to Juozas Baliuka
http://www.swissql.com/sqlone-api.html this is popular emulator.
I'm sorry, I misunderstood. This kind of emulator translates between different SQL dialects. Berkeley DB does not support SQL, so an emulator like this wouldn't work. But thanks anyway for the pointer.
Mark
Why not EJB3.JPA?
- Posted by: Juozas Baliuka
- Posted on: February 08 2006 16:52 EST
- in response to Hayes Mark
Yes, I am talking about a relational query engine implementation and a JDBC wrapper. Popular database emulation can help to migrate applications (but this problem is solved by migration tools). Tools can adapt to the driver themselves; for example, Hibernate uses a "Dialect" implementation for vendor-specific features.
JDBC is implemented for many backends, including object databases; this kind of thing is popular in ETL: http://www.enhydra.org/tech/octopus/index.html
Why not EJB3.JPA?
- Posted by: Juozas Baliuka
- Posted on: February 08 2006 15:40 EST
- in response to Hayes Mark
A JDBC driver is useful in many ways; ETL is probably a good example (extract data from BDB, transform, and load to a server for data warehousing). It can help to integrate BDB with popular ORM implementations. A JDBC wrapper is useful for integration; an optimized API is useful for maximum performance. JDO and EJB wrappers are probably not so useful.
no wrappers please
- Posted by: Konstantin Ignatyev
- Posted on: February 08 2006 17:47 EST
- in response to Juozas Baliuka
I think the strength of BerkeleyDB is that it minds its own business and does it in an optimal way, with an API optimized for this particular type of persistence.
IMO implementing JDBC, JDO, etc. wrappers does not make sense; the next question people will ask after a JDBC wrapper is implemented is why it doesn't support zzzz SQL construct. And then they will consider BDB a 'bad' SQL database....
no wrappers please
- Posted by: Hayes Mark
- Posted on: February 08 2006 19:01 EST
- in response to Konstantin Ignatyev
I think strength of the BerkeleyDB is that it does its own business and does it in the optimal way with optimized API for this particular type of persistence.IMO implementing JDBC, JDO etc. wrappers does not make sense, the next question people will ask after JDBC wrapper implementation: why it does not support zzzz SQL construct? And then consider BDB as 'bad' SQL database....
I appreciate your comment very much, and this is one of the reasons that we have not gone down the path of providing a JDO, EJB3, or SQL interface. Thanks for confirming this!
Juozas Baliuka does have a point, however. Perhaps a minimal read-only JDBC interface would not have a high cost to develop and maintain, but would open up interoperability with reporting tools, etc. This is somewhat attractive because the Persistence API does define a schema, and that schema could be exposed via such a read-only JDBC interface.
OTOH perhaps this would only cause requests for better SQL support, etc, etc, as you say. I'm very interested in your opinions about this.
(Caveat: This is not something we have discussed at Sleepycat, so I'm just gathering input at this point.)
Mark
no wrappers please
- Posted by: Konstantin Ignatyev
- Posted on: February 08 2006 20:54 EST
- in response to Hayes Mark
The ability to use existing reporting tools via a JDBC interface definitely looks attractive. But I think that returning schema information in DatabaseMetaData is one thing, and parsing SQL requests from those tools and returning JDBC-compatible data is quite another.
I think you are in a better position to judge whether you can support such an SQL interface.
Maybe a bit of education/evangelizing could help break that mental link: persistence -> SQL -> RDBMS :)
no wrappers please
- Posted by: Juozas Baliuka
- Posted on: February 09 2006 01:33 EST
- in response to Konstantin Ignatyev
I think strength of the BerkeleyDB is that it does its own business and does it in the optimal way with optimized API for this particular type of persistence.IMO implementing JDBC, JDO etc. wrappers does not make sense, the next question people will ask after JDBC wrapper implementation: why it does not support zzzz SQL construct? And then consider BDB as 'bad' SQL database....
I am not a marketing expert, but it is possible to release the wrapper as a separate product or brand to solve this "problem"; and if it is very useful, then somebody else will do it anyway.
Non-durable identity
- Posted by: Patrick Linskey
- Posted on: February 10 2006 03:22 EST
- in response to Hayes Mark
The "persistence context" defined for EJB3.JPA implies the use of an object cache and tracking of object status (detached, dirty, etc).A persistence context and object cache make a huge amount of sense when connected to a database server. For objects accessed more than once it is much cheaper to access a local cache than to make a round trip to the server. Even more importantly, updates can be queued locally and flushed to the server at transaction commit.
Hi Mark,
If you guys become interested in standards, you should take a look at JDO's non-durable identity. It was designed for more-or-less the use case you're talking about here.
-Patrick
--
Patrick Linskey
http://bea.com
A New Java Persistence API for Berkeley DB
- Posted by: Ilya Sterin
- Posted on: February 08 2006 13:05 EST
- in response to Mark Hayes
Why are they wasting their time and our time? Why not just join the EJB3 spec? :-) :-) Don't they know that the war is lost? EJB3 and Hibernate won. :-)
Ilya
A New Java Persistence API for Berkeley DB
- Posted by: Ruslan Zenin
- Posted on: February 08 2006 13:25 EST
- in response to Ilya Sterin
Apparently it is still not known to some groups of people.
Here are some resources for reading:
Interview with Craig Russell
http://www.jdocentral.com/JDO_Commentary_CraigRussell_3.html
Persistence FAQ:
http://java.sun.com/j2ee/persistence/faq.html
A New Java Persistence API for Berkeley DB
- Posted by: Ilya Sterin
- Posted on: February 09 2006 07:10 EST
- in response to Ruslan Zenin
Apparently it is still not known for some groups of people. Here are some resources for reading: Interview with Craig Russell http://www.jdocentral.com/JDO_Commentary_CraigRussell_3.html Persistence FAQ: http://java.sun.com/j2ee/persistence/faq.html
I was actually being sarcastic, since for the last few weeks we have had people coming out of the woodwork screaming about why some open source software projects exist and wanting them all to merge into monopolies.
Ilya
A New Java Persistence API for Berkeley DB
- Posted by: Juozas Baliuka
- Posted on: February 08 2006 13:39 EST
- in response to Mark Hayes
Just implement the mapping engine and wrap it with JDO, EJB or ODMG, or implement a JDBC driver and it will be wrapped automatically by JDBC-based ORM implementations.
Relation by ID
- Posted by: Emiliano Marino
- Posted on: February 08 2006 17:02 EST
- in response to Mark Hayes
Hi,
I saw in the examples in the API documentation that all relations between objects are made by ID (for example, Person does not have a reference to an Employer object, just its ID).
I think you stayed with this approach because you are trying to keep it simple and not get into the complexity of storing object trees and all the business of retrieving them level by level? That's OK with me.
I would suggest, in my humble opinion, that you support the use of Collections, Sets and Maps of primitive numbers. I know this is not part of the standard Collections API. (Apache Commons, I think, has an API for this kind of collection.)
I think the overhead of creating numeric objects to search, store and retrieve generates too much garbage, and I think it would be a lot faster (which is what you're always looking for?) if you minimized the creation and garbage collection of objects by using just primitive numbers.
Thanks.
Relation by ID
- Posted by: James Watson
- Posted on: February 08 2006 18:16 EST
- in response to Emiliano Marino
I think the overuse of create numberic objects to search, store, retrieve, generates too many garbage and i think i would a lot faster (that's what you're looking always?) if you minimize the creation and garbage collection of objects using just primitives numbers.Thanks.
The cost of a short-lived Object is pretty small in 1.5. In 1.6 most of these won't even create garbage but will be allocated on the stack. It's unlikely to be worth the effort.
Relation by ID
- Posted by: Dustin Barlow
- Posted on: February 08 2006 20:13 EST
- in response to James Watson
The cost of a short lived Object is pretty small in 1.5. In 1.6 most of these won't even create garbage but will be allocated on the stack. It's unlikely to be worth the effort.
GC is not the only concern. Object creation overhead is another. If you have to access millions of rows of data, creating millions of Objects is a performance hit.
I can see Berkeley DB being the backend for quick calculation engines similar to OLAP but without the data explosion.
Relation by ID
- Posted by: Hayes Mark
- Posted on: February 09 2006 09:08 EST
- in response to Dustin Barlow
GC is not the only concern. Object creation overhead is another. If you have to access millions of rows of data, creating millions of Objects is a performance hit.
In our experience profiling and optimizing Berkeley DB Java Edition, we have not found object creation itself to be a significant factor, especially for Java 1.5 and 1.6. Although this is non-intuitive, Sun has been saying this all along, and in this case they seem to be right.
I can see Berkeley DB being the backend for quick calculation engines similar to OLAP but without the data explosion.
Yes, I think this is a good application for Berkeley DB.
Mark
Relation by ID
- Posted by: James Watson
- Posted on: February 09 2006 09:56 EST
- in response to Hayes Mark
GC is not the only concern. Object creation overhead is another. If you have to access millions of rows of data, creating millions of Objects is a performance hit.
In our experience profiling and optimizing Berkeley DB Java Edition, we have not found object creation itself to be a significant factor, especially for Java 1.5 and 1.6. Although this is non-intuitive, Sun has been saying this all along, and in this case they seem to be right.
From what I understand, Object allocation and deallocation in modern JVMs is much faster than in C, so the cost of temporary Objects is low. Also, if you use autoboxing, there is a pool of low-value integers. Probably not relevant in this context, but good to know all the same.
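For reference, the pool mentioned here is the small cache used by autoboxing; a quick illustration of the behavior guaranteed for values between -128 and 127:
    Integer a = 127, b = 127;   // autoboxing uses Integer.valueOf(), which caches -128..127
    Integer c = 1000, d = 1000; // outside the guaranteed cache range
    System.out.println(a == b); // true: same cached instance
    System.out.println(c == d); // usually false: distinct instances, so compare with equals()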
Relation by ID
- Posted by: Radu-Adrian Popescu
- Posted on: February 12 2006 04:52 EST
- in response to James Watson
From what I understand, Object allocation and deallocation in modern JVMs is much faster than in C so the cost of temporary Objects is low.
There's no such thing as C. While on one hand "modern JVMs" means something (the Sun VM, the BEA VM, the GNU Java runtime, etc.), C is nothing but a language. There's also another language, related to it, called C++, and it's debatable whether the two have more in common than the things setting them apart. As I'm sure you're aware, there are compilers and runtimes for C and C++ as well, and they're all terribly different, even on the same architecture and OS; they're also different between releases of the same OS. You simply can't state something like "Java [or even Sun JVM] object allocation is faster than C's". Moreover, what sort of allocation is this referring to? Stack or heap? Because, you know, C and C++ support both (structs in C, actually). So which one is it? And compared to what C runtime?
But then again, if Brian Goetz wrote an article on it then it must be true, right? It's so much easier to just blindly eat up everything you're served, as long as it fits your view of the world, as long as it feels like a friendly pat on the back.
Relation by ID
- Posted by: Hayes Mark
- Posted on: February 08 2006 21:09 EST
- in response to Emiliano Marino
I would suggest, in my humble opinion, that you allow to support the use of Collections, Sets, Maps, of primitive numbers. I know is not an standard of Collections API. (Apache commons i think has some API of this kind of collections)I think the overuse of create numberic objects to search, store, retrieve, generates too many garbage and i think i would a lot faster (that's what you're looking always?) if you minimize the creation and garbage collection of objects using just primitives numbers.Thanks.
I was just looking at the Jakarta Commons Collections API and I can't find collections that store primitives as such -- can you point me to where you've seen these?
In any case, the only requirement for one-to-many or many-to-many key collections is that they implement the java.util.Collection interface and that they are @Persistent. So if you have an efficient collection you'd like to use, as long as it implements Collection you can use it.
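As a rough sketch, a one-to-many key collection might be declared like this; the Contact class and its field are made up for illustration, and the relate values should be checked against the Javadoc:
    @Entity
    class Contact {
        @PrimaryKey
        long id;

        // One entity, many secondary keys; any java.util.Collection can hold them.
        @SecondaryKey(relate=ONE_TO_MANY)
        Set<String> emailAddresses = new HashSet<String>();
    }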
If you want to use a collection class in a 3rd party library, then of course the collection class won't be annotated with @Persistent. To solve this, you can use a PersistentProxy as described here:
http://dev.sleepycat.com/je-persist-review/java/com/sleepycat/persist/model/PersistentProxy.html
Mark
Prmitive collections
- Posted by: Erik van Oosten
- Posted on: February 09 2006 04:07 EST
- in response to Hayes Mark
The only primitive collections I know of are implemented by Sebastiano Vigna at http://fastutil.dsi.unimi.it/.
It does not support generics, but other than that it is extremely complete. It has several implementations of Sets, Maps and Lists and associated iterators, for any primitive/Object combination. It makes the jar huge (8 MB).
The website reports that the library is optimized for huge collections. I have successfully used it for moderately sized maps (200,000 items).
Primary and Secondary
- Posted by: James Watson
- Posted on: February 08 2006 18:25 EST
- in response to Mark Hayes
I'm curious why there is a PrimaryIndex class and a SecondaryIndex class. Wouldn't it be more elegant, less verbose, and more flexible to just have an Index class that has a getSubIndex method? Is there a special reason that the API only allows two levels of indexes or am I just missing something?
Primary and Secondary
- Posted by: Hayes Mark
- Posted on: February 08 2006 20:02 EST
- in response to James Watson
I'm curious why there is a PrimaryIndex class and a SecondaryIndex class. Wouldn't it be more elegant, less verbose, and more flexible to just have an Index class that has a getSubIndex method? Is there a special reason that the API only allows two levels of indexes or am I just missing something?
Good question. I'll try to explain the reasoning behind the class hierarchy, and please tell me if it makes sense.
For example, take this class:
@Entity
class Person {
    @PrimaryKey
    long id;
    @SecondaryKey(relate=MANY_TO_ONE)
    String name;
}
There would be a PersonByID primary index ordered by id and a PersonByName secondary index ordered by name.
There are several rules about primary and secondary indices:
1. A primary index must have unique keys (each person has a unique id in the example). A secondary index may have non-unique keys (there could be more than one person with the same name in the example).
2. Records may be inserted into a primary index, but not into a secondary index. Secondary index records are maintained automatically by the engine as primary records are inserted, updated and deleted.
3. Because of the two rules above, you cannot have a secondary index that is associated with another secondary index. A secondary must be associated with a primary.
Therefore, the PrimaryIndex and SecondaryIndex classes have differences and similarities.
In the class hierarchy, their similarities are captured in the EntityIndex interface, which is implemented by both classes. EntityIndex allows all kinds of index traversal and queries by key. It does not allow record insertion or update.
PrimaryIndex implements EntityIndex and adds methods to allow insertion and update.
SecondaryIndex implements EntityIndex and adds methods to support two special access methods that only make sense for secondary indices:
+ The keysIndex method is for traversing keys only, without retrieving the primary record at all to improve performance. This doesn't apply to a primary index.
+ The subIndex method is for accessing the subset of entities having a given secondary key (duplicates). This does not apply to a primary index because primaries must have unique keys.
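To make the distinction concrete, here is a rough sketch of how the two index classes might be obtained and used for the Person class above. It assumes an open Environment and StoreConfig and follows the accessor names in the linked Javadoc, though the draft API may differ.
    // Assumes: import com.sleepycat.persist.*; an open Environment (env) and StoreConfig (config).
    EntityStore store = new EntityStore(env, "personStore", config);

    PrimaryIndex<Long, Person> personById =
        store.getPrimaryIndex(Long.class, Person.class);
    SecondaryIndex<String, Long, Person> personByName =
        store.getSecondaryIndex(personById, String.class, "name");

    // Insertion goes through the primary index only; the secondary is maintained automatically.
    Person p = new Person();
    p.id = 1;
    p.name = "John";
    personById.put(p);

    // keysIndex: traverse secondary keys without retrieving primary records.
    EntityIndex<String, Long> nameKeys = personByName.keysIndex();

    // subIndex: all entities that share a given secondary key (duplicates).
    EntityIndex<Long, Person> johns = personByName.subIndex("John");
    EntityCursor<Person> cursor = johns.entities();
    try {
        for (Person person : cursor) {
            // process each Person named "John"
        }
    } finally {
        cursor.close();
    }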
Does this make sense?
Mark
Primary and Secondary
- Posted by: James Watson
- Posted on: February 09 2006 09:52 EST
- in response to Hayes Mark
The subIndex method is for accessing the subset of entities having a given secondary key (duplicates). This does not apply to a primary index because primaries must have unique keys.Does this make sense?Mark
Yeah, I figured there was a reason, I just didn't see what it was during my cursory look at the API. I just get an icky feeling when I see classes named Something1 and Something2, or names in that general form. Maybe I would have come up with something different if I had done it, but I might have come to the same conclusion.
One thing that I find difficult about this API is that the terms, like 'evolve', are not clear to me. While (from what I see) this seems very interesting, I feel like it would take a lot of work to understand the DB before I could even start using this API. Perhaps that is because I am not familiar with this kind of DB.
In 10 words or less, why should I use your DB?
Primary and Secondary
- Posted by: Hayes Mark
- Posted on: February 09 2006 12:17 EST
- in response to James Watson
One thing that I find difficult about this API is that the terms are not clear to me like 'evolve'. While (from what I see) this seems very interesting, I feel like it would take a lot of work to understand the DB before I could even start using this API. Perhaps is because I am not familiar with this kind of DB.
Please don't let the class evolution features detract from the usability of the API. We put these features into a separate package because they are optional and they can certainly be ignored initially. We will emphasize this in the documentation.
In general, class evolution addresses the need to change your class definitions after you have deployed your application. If the existing stored data is not compatible with the new class definitions, converting the existing data is necessary. Using this feature is important if you cannot easily recreate the data from another source.
The evolve package makes this conversion easier, and more efficient. By using the mutation classes, conversion of existing data can be performed lazily and transparently. This avoids downtime while converting a large database.
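As a rough sketch of what registering a mutation might look like, assuming the Mutations and Renamer classes in the evolve package behave as their names suggest; the class names and version numbers are hypothetical and the draft API may differ:
    // Suppose the entity class Person (version 0) was renamed to Employee.
    Mutations mutations = new Mutations();
    mutations.addRenamer(new Renamer("Person", 0, "Employee"));

    StoreConfig config = new StoreConfig();
    config.setMutations(mutations);
    // Existing records are then converted lazily as they are read, rather than all at once.
    EntityStore store = new EntityStore(env, "personStore", config);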
Although it would be nice to avoid this problem entirely by not changing classes incompatibly, for many applications these types of changes are a fact of life. What we've tried to do is address this explicitly, rather than leaving it as a problem for the user to deal with.
In 10 words or less, why should I use your DB?
Hm, ok, only 10 words, I'll try: It outperforms other databases, is scalable, reliable, transactional and simple.
Mark
Primary and Secondary
- Posted by: Aaron Evans
- Posted on: February 09 2006 13:04 EST
- in response to James Watson
I don't speak for or know much about BDB, but I can sum up why you'd want to use it in two words:
"huge maps"
Too many people are thrown off by the letters "DB", which they associate with the acronym RDBMS, which intimidates them.
But I think nearly all of these same developers have seen instances of applications where BDB could be used. How many people have run up against bottlenecks iterating through lists that got bigger in production than conceived in development? How many people have paid for expensive caching solutions (or maintained nightmare roll-your-own caches) for lists of data? How many file-based solutions are out there because an app needs to store data but has to be smaller than even an embedded DB like HSQL?
These are areas where BDB shines. Don't you wish you didn't have to read a big file on startup and parse the lines? Don't you wish you had a reliable caching mechanism that was as easy to use as a HashMap? Don't you wish you could scale a solution that has outgrown its data handling capabilities without rewriting the whole thing from scratch?
Primary and Secondary
- Posted by: Ruslan Zenin
- Posted on: February 09 2006 13:45 EST
- in response to Aaron Evans
"huge maps"
...
Don't you wish you had a reliable caching mechanism that was as easy to use as a HashMap? Don't you wish you could scale a solution that has outgrown it's data handling capabilities without rewriting the whole thing from scratch?
Then the next question is: how does BDB compare to Coherence (distributed cache)? See http://www.tangosol.com/coherence-overview.jsp
Primary and Secondary
- Posted by: Cameron Purdy
- Posted on: February 09 2006 15:28 EST
- in response to Ruslan Zenin
"huge maps"...Don't you wish you had a reliable caching mechanism that was as easy to use as a HashMap? Don't you wish you could scale a solution that has outgrown it's data handling capabilities without rewriting the whole thing from scratch?
Then the next question is: How does this BDB compares to Coherence (Distributed Cache) found @ http://www.tangosol.com/coherence-overview.jsp
A couple of quick points before I carefully side-step the question:
1. SleepyCat (developer of BerkeleyDB) is a partner of ours.
2. Some "very big" financial services firms are joint customers, and pushed Tangosol and SleepyCat to work together.
3. Coherence 3.1 supports BerkeleyDB as a disk store. The BerkeleyDB implementation is fairly high performance, and is definitely faster than the built-in disk store that Coherence has.
Now, to try to handle the question:
1. Coherence is focused on in-memory caching, but it can do pure disk caching or mixed memory/disk caching ("overflow caching"). We do *not* focus on single-node usage, with our median deployment size being around 16 nodes and large deployments a "couple orders of magnitude" larger.
2. BerkeleyDB is good at keeping data safe even when an app isn't running (i.e. on disk in a resilient format). Coherence is good at keeping data safe when the app *is* running, i.e. when the data is only in memory and servers die in the middle of a two-phase commit.
3. Coherence doesn't tend to use (and certainly doesn't rely on) shared disk (SAN, NAS, etc.) .. Coherence is basically a RAID implementation for objects implemented in a grid environment.
So it's really apples and oranges. On pure disk speed for a single node, use BerkeleyDB. For files shared from a shared disk, use BerkeleyDB. For persistent data, use BerkeleyDB.
For clustering, for shared memory, for coherent caching, for data grids, for information fabrics .. use Coherence.
If you need both, I guarantee that we work well with BerkeleyDB and the joint solution rocks ;-)
Peace,
Cameron Purdy
Tangosol Coherence: Clustered Shared Memory for Java
Consider JDO API
- Posted by: Kurt Westerfeld
- Posted on: February 08 2006 22:49 EST
- in response to Mark Hayes
I realize you may find it passe, but the JDO 1.0 spec would be interesting, because you wouldn't have the same objections as with the EJB3 or Hibernate APIs (i.e. second-level caching). You would have makePersistent and such, but could avoid JDOQL altogether (or perhaps implement it using Janino or something similar).
Just a thought.
BTW, I believe dirty checking of persistent entities is extremely important. I don't think you can avoid this.
A New Java Persistence API for Berkeley DB
- Posted by: Dmitriy Kiriy
- Posted on: February 09 2006 02:41 EST
- in response to Mark Hayes
I think that implementing EJB 3.0, in particular EJB QL, need not hurt performance. At the moment, if we want to execute
SELECT FROM Orders where customerName = 'Jonh'
what should we do? Iterate over all orders and check the customerName field? Definitely a bad idea for me.
How should that be done in Berkeley DB Java Edition? Anybody from the Berkeley DB team?
A New Java Persistence API for Berkeley DB
- Posted by: Juozas Baliuka
- Posted on: February 09 2006 05:03 EST
- in response to Dmitriy Kiriy
It is a trivial example; see the "SecondaryIndex" stuff.
A New Java Persistence API for Berkeley DB
- Posted by: Juozas Baliuka
- Posted on: February 09 2006 05:17 EST
- in response to Dmitriy Kiriy
It must be more "interesting" to implement manually "SELECT FROM Orders where customerName = 'Jonh' and customerEmail = 'jonh at yahoo dot com' or ...", finding the "best" index or using no index (it depends on index selectivity).
A New Java Persistence API for Berkeley DB
- Posted by: Dmitriy Kiriy
- Posted on: February 09 2006 05:35 EST
- in response to Juozas Baliuka
This must be more "interesting" to implement manualy "SELECT FROM Orders where customerName = 'Jonh' and customerEmail = 'jonh at yahoo dot com' or ..." to find the "best" index or to use no index (it depends on index selectivity).
Yeah. So, in fact, for complex queries you become the "query optimizer". I don't want to be that.
Queries
- Posted by: Hayes Mark
- Posted on: February 09 2006 10:30 EST
- in response to Dmitriy Kiriy
There were several posts about queries and secondary indices.
It is true that Berkeley DB does not have a query language, and therefore does not support ad-hoc queries. And if your query uses a secondary key, it is up to you to use the secondary index.
SELECT FROM Orders where customerName = 'Jonh'
For example, you can compare the code needed to execute the above SQL query and get the results using EJB3.JPA with the following code using the proposed Berkeley DB API:
EntityCursor<Order> orders =
ordersByCustomerName.subIndex("John").entities();
Or, if you are performing an equality join then you need to use the EntityJoin object:
http://dev.sleepycat.com/je-persist-review/java/com/sleepycat/persist/EntityJoin.html
For complex queries with lots of conditions, instead of SQL you will need to write procedural code that iterates through results and performs comparisons.
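As a rough sketch of that style, using the two-condition example from earlier in the thread; the index and field names (orderById, ordersByCustomerName, ordersByCustomerEmail, total) are assumptions for illustration, not part of the reviewed API:
    EntityJoin<Long, Order> join = new EntityJoin<Long, Order>(orderById);
    join.addCondition(ordersByCustomerName, "Jonh");
    join.addCondition(ordersByCustomerEmail, "jonh@yahoo.com");

    ForwardCursor<Order> results = join.entities();
    try {
        Order order;
        while ((order = results.next()) != null) {
            // Conditions that are not simple key equality are applied procedurally.
            if (order.total > 100) {
                // process the order
            }
        }
    } finally {
        results.close();
    }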
If you are accustomed to using SQL, this probably seems strange. However, if you are accustomed to using the Java Collections framework and similar APIs, if you try the Berkeley DB API you may find it simple and straightforward.
In terms of performance, when you are writing a query using Berkeley DB you can think of it as if you were writing a stored procedure in an RDBMS. Because Berkeley DB is an embedded database and there is no intermediate query language, the ordersByCustomerName object provides direct access to the Btree for that secondary index.
The performance advantage of this approach is quite significant. But of course, you should determine that for yourself.
Berkeley DB is not intended to be the tool for all jobs. It is not intended to be used where ad-hoc SQL queries are required, or where an RDBMS is required for other reasons.
It is intended to be used where you need better performance than can be obtained using an SQL database, or where an RDBMS is undesirable for other reasons. Some users also prefer it for simplicity.
Of course, many database applications do need ad-hoc queries and many developers will prefer to use SQL. But when you need better performance, or a simpler approach, Berkeley DB will be there to meet that need.
What we're trying to do with the Persistence API is to make it easy to define and access complex object models, without sacrificing any of the performance advantages that Berkeley DB already gives you.
Mark
Queries
- Posted by: Rob Griffin
- Posted on: February 15 2006 19:39 EST
- in response to Hayes Mark
If you need SQL queries, perhaps you could use ZQL (http://www.experlog.com/gibello/zql/) to build an SQL layer over the top of Berkeley DB. You would probably have to limit the complexity of queries, but it is workable. I have created an SQL interface for an XML file (yes, I know there are XPath and XQuery already) as a proof of concept.
some comments
- Posted by: jilles van gurp
- Posted on: February 09 2006 04:53 EST
- in response to Mark Hayes
Like most Java programmers, I don't know or care that much about database programming (well I do know a lot about it actually but when programming I don't want to bother much with database specifics). The goal of most persistence APIs is to keep it that way. Let the persistence layer deal with the impedance mismatch, don't bother the Java programmer with database optimizations. The java programmer works with in memory objects, the persistence layer does the difficult job of making sure the objects persist and finding them back. The good ones do this fast and without getting in the way of the Java programmer.
Assuming this holds true for potential users of your products and APIs, it is safe to assume that the vast majority of your users do not wish to spend a lot of time mastering your API. In fact, a lot of them are going to be turned off just by the fact that your API is product specific.
Those are the things you need to deal with. The typical user that will look at your product and API will be a Java developer in need of a persistence layer for his standalone non-J2EE application (embedded databases have no place in J2EE other than as a drop-in replacement for commercial SQL servers). In other words, there are objects that the application uses that need to be persistent. The choice for Berkeley DB and your API is a performance optimization at the cost of interoperability with other databases.
So you need to make very clear that A) these performance benefits are very real compared to the many SQL-based embedded databases that provide interoperability with standardized persistence layers, and B) it is very easy to bridge the conceptual gap between an object oriented program and a Berkeley DB using the API.
Good luck.
some comments
- Posted by: Hayes Mark
- Posted on: February 09 2006 10:53 EST
- in response to jilles van gurp
The typical user that will look at your product and API will be a Java developer in need of a persistence layer for his standalone non J2EE application (embedded databases have no place in J2EE other than as a drop in replacement for commercial SQL servers).
I partially agree with this statement. I think that non-SQL databases really don't fit well with EJB, but the same is not true for J2EE as a whole. We have implemented JTA for Berkeley DB, so transactions are fully integrated. This makes implementation of a singleton J2EE service using Berkeley DB straightforward and useful in many cases. Berkeley DB is also useful to complement an RDBMS (as a cache, for example) in J2EE/EJB applications. But I do see your point.
In other words, there are objects that the application uses that need to be persistent. The choice for Berkeley DB and your API is a performance optimization at the cost of interoperability with other databases. So you need to make very clear that A) these performance benefits are very real compared to the many SQL-based embedded databases that provide interoperability with standardized persistence layers, and B) it is very easy to bridge the conceptual gap between an object oriented program and a Berkeley DB using the API. Good luck.
This makes a lot of sense -- thank you for these comments.
Performance benchmarks are always problematic, of course, but the performance advantages of Berkeley DB are clear and can be demonstrated.
I think what you're saying about bridging the conceptual gap is very important and something we need to address in our documentation. We need to present the model for primary and secondary indices more clearly, and show how these map to objects. Thanks for emphasizing this -- we will take your advice seriously.
Mark
Performance demonstrated?
- Posted by: Michael Newcomb
- Posted on: February 09 2006 14:11 EST
- in response to Hayes Mark
Performance benchmarks are always problematic, of course, but the performance advantages of Berkeley DB are clear and can be demonstrated.
Where?
Performance demonstrated?
- Posted by: Hayes Mark
- Posted on: February 09 2006 14:45 EST
- in response to Michael Newcomb
Performance benchmarks are always problematic, of course, but the performance advantages of Berkeley DB are clear and can be demonstrated.
Where?
We've heard that BDB is faster for some of our customers, but obviously your mileage will vary depending on your application.
In my opinion you shouldn't believe Sleepycat on this, you should do your own comparisons or talk to users of Berkeley DB independently. Performance comparisons, especially with an embedded DB, are sensitive to the data access pattern and how much tuning has been done.
If you would like to do a performance comparison, Sleepycat will support you in your evaluation and tuning process. Just send an email to support at sleepycat dot com and indicate that you're doing an evaluation.
Mark
some comments
- Posted by: Konstantin Ignatyev
- Posted on: February 09 2006 11:38 EST
- in response to jilles van gurp
Like most Java programmers, I don't know or care that much about database programming
Sad truth.
don't bother the Java programmer with database optimizations.
Rather unproductive and unhealthy alienation of Java programmers from “the rest” IMO.
Assuming this holds true for potential users of your products and APIs, it is safe to assume that the vast majority of your users does not wish to spend a lot of time mastering your API. In fact a lot of them are going to be turned off just by the fact your API is product specific.
This is rather odd, because mastering a clear and simple API is very easy with the help of a modern IDE that ensures correct syntax and types.
Try to find this level of support for JDOQL, HQL, SQL, etc., not to mention that every implementation of a standard has its own quirks.
some comments
- Posted by: Radu-Adrian Popescu
- Posted on: February 12 2006 05:00 EST
- in response to Konstantin Ignatyev
Like most Java programmers, I don't know or care that much about database programming
Sad truth.don't bother the Java programmer with database optimizations.
Rather unproductive and unhealthy alienation of Java programmers from “the rest” IMO.
I second that.
The oooh-look-at-me-i'm-the-super-in-memory-JAVA-programmer-don't-care-about-no-database-yeah! is such a load of crap and a sign of mediocrity. "I want my API and I want it now and don't make me think about what's actually going on". That's just sad.
Object mapping hardcoded in Java code?
- Posted by: Ruslan Zenin
- Posted on: February 09 2006 10:40 EST
- in response to Mark Hayes
I've checked your API further. I was wondering how you perform mapping between your DB and Java objects.
I have the impression (correct me if I'm wrong) that the developer has no support from your API for "auto-mapping" and needs to explicitly write a "mapping" class that implements the Converter interface.
So, if I have 30 persistent entities, I will also have to write 30 corresponding converters - which is quite a heavy tax on the developer.
Then all my mapping information is "hardcoded" in the compiled code... meaning that in order to change the mapping even slightly I need to make a code change and recompile.
Do you have any external mapping means (like in JDO)?
Also, do you have any utility API to aid mapping (e.g. POJO mapping)?
Object mapping hardcoded in Java code?
- Posted by: Hayes Mark
- Posted on: February 09 2006 12:38 EST
- in response to Ruslan Zenin
I've checked further your API. I was wondering how do you perform mapping between your DB and Java objects.I have an impression (correct me if I'm wrong) that developer has no support from your API for "auto-mapping" and needs to explicitly write "mapping" class that implements Converter interface.So, if I have 30 persistent entities, I will have to also write 30 corresponding converters - which is quite a heavy taxation on the developer.Then all my mapping information is "hardcoded" in the compiled code...meaning that in order to configure mapping for minor changes I need to do code change and recompile.Do you have any external mapping means (like in JDO)?Also, do you have any utility API to aid mapping (e.g. POJOs mapping)
I'm sorry if this wasn't clear. All mappings are automatic. You annotate your POJO class with @Entity or @Persistent, and the mapping is done transparently.
You only need to implement the Converter interface for certain types of class evolution. This is needed when an incompatible class change has been made, and the existing deployed data needs to be converted.
Mark
In-memory replication for BDB Java Edition
- Posted by: Guglielmo Lichtner
- Posted on: February 09 2006 18:39 EST
- in response to Mark Hayes
I would like to suggest to Sleepycat that they consider using EVS4J, my Apache-licensed pure-Java implementation of the fastest known reliable multicast protocol with total ordering properties, to add support for multi-master replication.
Note: this is totally unrelated to Coherence or other caches. We are talking real concurrency control here, and it is only applicable to small clusters in data centers, not huge groups.
Guglielmo
Enjoy the Fastest Known Reliable Multicast Protocol with Total Ordering
A New Java Persistence API for Berkeley DB
- Posted by: David Segleau
- Posted on: February 13 2006 09:52 EST
- in response to Mark Hayes
Sleepycat would like to thank everyone who participated in this discussion. Your feedback is invaluable to us and we want you to know that we take your input seriously. We will evaluate what has been discussed here in considering the Persistence API for our next major release of Berkeley DB Java Edition.
If you have further feedback or questions, or you want to know the status of this project, please either use the bdbje mailing list (which you can find at http://dev.sleepycat.com/community/discussion.html) or drop a note to support at sleepycat dot com.
Thanks again!
Dave Segleau
VP of Engineering
Sleepycat Software
JDO for SleepyCat
- Posted by: Eric Samson
- Posted on: February 16 2006 03:05 EST
- in response to Mark Hayes
4. The Persistence API does not conform to an existing standard such as JDO. To do so, we believe that both usability and performance would be compromised. Do you consider conformance to a standard to be more important than such compromises?
Obviously, JDBC and EJB3 are irrelevant to your specific database technology.
But IMHO you should really consider JDO. Based on my experience with JDO for non-relational data sources (ODBMS, embedded databases and XML), I can tell you that you won't compromise usability or performance.
It would be a very bad idea to start with a new proprietary API.
BTW: Good luck at Oracle!
Regards, Eric,
Xcalia.
A query facility
- Posted by: Felix Mayer
- Posted on: February 24 2006 15:06 EST
- in response to Mark Hayes
I have a remark concerning your question 3: Do you consider a high level query facility to be a requirement for a Java persistence solution?
I wouldn't say that it is a requirement, but if the goal is ease of use, I think a query facility, even if it sacrifices some performance, would go a long way. After all, I can always optimize a query by rewriting it against the lower-level API. But if I have a lot of query-style access, writing loops over loops seems like quite a hassle.
My Database is faster than yours :)
- Posted by: Jeryl Cook
- Posted on: March 06 2006 11:26 EST
- in response to Mark Hayes
I am pretty sure your Berkeley DB is faster than other databases, but could you provide some statistics compared to other implementations? Pay a 3rd-party company to do the comparison between leading DBs, and Hibernate, JDO, etc.