News: A criticism of Java Data Objects (JDO)
JDO has received a lot of attention since the 1.0 release, with both good publicity and a lot of bad publicity from many quarters (particularly O/R tool vendors). Now Carl Rosenberger (head of the db4o object persistence project) has jumped on the bandwagon and delivered a scathing criticism of the JDO spec.
- Posted by: Michael Azzi
- Posted on: August 19 2001 21:35 EDT
He criticized the spec as overly complex, a rehash of old ideas that never worked, with no fresh thinking put into it, and called for a complete rewrite of the whole spec.
I am no expert on transparent persistence but the guy seems to raise some interesting issues, and I would like to know what other people think about them.
You can read the whole article here
- Here We Go Again JDO by Floyd Marinescu on August 20 2001 11:10 EDT
- Discussion continued in comp.databases.object by Carl Rosenberger on August 21 2001 07:01 EDT
- A criticism of Java Data Objects (JDO) by Bryan Headley on August 21 2001 14:18 EDT
- A criticism of Java Data Objects (JDO) by Peter Nelson on August 23 2001 20:42 EDT
- A criticism of Java Data Objects (JDO) by Mohamed Ramadan on July 22 2002 04:31 EDT
JDO Spec Lead Craig Russell responded to this post in another forum:
As the author (not the inventor) of the JDO specification, I'd like to respond.
> A direct interface to store objects needs to be simpler than SQL, not more
> complicated, otherwise it will never find its place in the database world.
> JDO is anything but simple and concise.
JDO concerns itself with two categories of class developers: those that develop persistence-capable classes and those that develop persistence-aware classes. It is well understood that developers who know they are developing for databases (persistence-aware) need to deal with some of the complexities of talking to databases: you have to know which database you are trying to connect to, which classes you are trying to query, etc. For these people, I won't claim that JDO is simple, but it is not more complex than the alternatives.
Most of the database specific stuff is abstracted to things familiar to JDBC users: ConnectionURL, ConnectionUserName, ConnectionPassword. But JDO is arguably simpler once you have access to a database. You can query for instances of persistence-capable classes with only the knowledge of the Java class and fields. Contrast this with JDBC where you have to know lots of details about the structure of the database itself.
For persistence-capable class developers, I'd claim JDO has a substantial advantage over any other persistence capability. You can often implement Java persistence-capable classes without any knowledge whatever of databases. No mapping, no type conversions, no special read methods, no special write methods.
With this level of transparency, JDO still allows for highly scalable solutions, while solving the problem of the natural closure of data store objects. In most databases, the closure of an object is often the entire database. An employee has a reference to a department, which has references to the company, and the company contains all of the departments, all employees, all projects, etc. JDO allows for defining the object model to navigate the application-specific closure without requiring application-specific data models.
> The failure of old-time object databases to win a noticeable market share
> proves the necessity to introduce clean and simple concepts quite clearly:
I think there are many reasons why old-time object databases failed to win market share. Picking one is interesting but the story is much more complex.
> Object database vendors tried to jump two paradigm steps at a time: storing
> objects and transparent persistency. What they seem to have forgotten:
> Persistency can never be truly transparent. Controlling locks efficiently is
> one of the greatest challenges in enterprise data management and almost
> always fine tuning has to be applied - by hand. Once you start doing this,
> you are aware of the database, so why not keep things simple, for the start,
> by explicitly storing objects? In doing this, the application can work with
> "dirty" objects, without worries about events routed through to the database
> or worse: objects getting locked.
Unfortunately, having the application keep track of dirty instances detracts from one of object programming's major advantages: separation of concerns. Once you require each persistence-capable class to track its own state with explicit code, you pollute its methods with database calls. And it doesn't matter if you delegate this to another application class. You still need to insert database calls into the class at odd places. It's really easy to miss out an update to a field and then the application fails without anyone noticing.
> A great part of the JDO specification was developed by the old guys from the
> object database scene, the same people that used to do ODMG. The "enhancer"
> concept, that many object databases use to modify application code can be
> found again. People had difficulties to understand this concept and to get
> it running in the past. Let's not make the same mistakes twice. Why don't we
> start the JDO specification process again with a much simpler design than
> current JDO?
I'll admit to being one of the old guys from ODMG days, and I think we learned a lot about what the market demands from a transparent persistence interface. The requirements now are for scalable implementations with binary compatibility that can take advantage of distributed computing a la J2EE (TM) without writing yet another bunch of single purpose code.
WRT the enhancer, have you looked at the strategy for CMP entity beans, where the class file provided by the bean developer is analyzed and a concrete implementation class constructed? Is there really a difference between generating classes and enhancing them?
Another point that is often missed with a casual read of the specification is that the JDO specification does not require a post-processor (enhancer). A JDO implementation is free to use other techniques, and several JDO vendors plan to do just that. Two alternatives to the enhancer are pre-processing and direct code (either .java or .class) generation from a business model or UML model or whatever you are comfortable with.
> 5.4. JDO Identity
> What are all these many different identity concepts for? Java objects should
> be usable as they come, without special concerns. If code needs to be
> reengineered by JDO experts first, where is the advantage to relational
> databases? Uniquing is the one and only correct concept. It allows '==' to
> be used to compare two instances of an object. Only one central cache is
> needed to allow queries to decide, whether to return a reference to an
> instantiated object or to instantiate a new object. Equality comparisons to
> remote or distributed objects are possible with special methods or equality
> queries, without exposing the internal ID.
This is one of the most difficult concepts in JDO, and I apologize if it is not yet crystal clear.
In a single cache managed by a single PersistenceManager, a user can compare two instances using == and get the expected results. If you query for an Employee, and navigate to its Department instance, then query for a Department, you can compare the two Department instances using == and they will compare == iff they represent the same datastore instance.
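As an illustration only, the uniquing behaviour described above can be sketched with a toy identity map. This is not the JDO API: `ToyManager`, `Department`, and `getById` are hypothetical names invented for the sketch, and the "load" step simply constructs an object instead of reading a datastore.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a persistent class; not part of the JDO API.
class Department {
    final String id;   // datastore identity (assumed here to be a simple string key)
    String name;

    Department(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Toy cache illustrating uniquing: one in-memory instance per datastore identity.
class ToyManager {
    private final Map<String, Department> cache = new HashMap<>();

    // Returns the cached instance if this identity was already loaded,
    // otherwise "loads" (here: constructs) a new one and remembers it.
    Department getById(String id) {
        return cache.computeIfAbsent(id, key -> new Department(key, "dept-" + key));
    }
}

class UniquingDemo {
    public static void main(String[] args) {
        ToyManager pm1 = new ToyManager();
        ToyManager pm2 = new ToyManager();

        Department viaNavigation = pm1.getById("D1"); // e.g. reached from an Employee
        Department viaQuery = pm1.getById("D1");      // e.g. returned by a query

        // Same manager: both references point at the one cached instance.
        System.out.println(viaNavigation == viaQuery);
        // Different manager: a separate cache holds a separate instance.
        System.out.println(viaNavigation == pm2.getById("D1"));
    }
}
```

Within one manager the two lookups yield the same reference, so `==` is meaningful; a second manager has its own cache and therefore its own instance for the same identity.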
The reason a special concept is used for comparing instances from multiple caches is that in JDO, caches can be transactional. In the same VM, you can have multiple caches in different transactions, and each cache can maintain a different transactional view of the data store instances. So in two different PersistenceManager caches, you are guaranteed to have two different instances of the same data store instance. The instances might compare equal (.equals(Object)) but that is up to the implementation of .equals. But both transactional caches might be making conflicting changes to the state of the instance. At commit time (or earlier if the application chooses) the conflicts will be resolved. Of course, if you want locking semantics to be applied, you can also choose for the data store to enforce locking at the time you retrieve the instance. But why require database locking to be enforced immediately? Many applications require more concurrency.
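A minimal sketch of how two caches can hold conflicting copies of one datastore instance and have the conflict surface at commit time. The version-counter scheme and the "first committer wins" outcome are assumptions chosen for the illustration; the reply above does not say how a JDO implementation resolves conflicts. `Row`, `ToyStore`, and `ToyCache` are hypothetical names, not JDO types.

```java
import java.util.HashMap;
import java.util.Map;

// Toy datastore row: a value plus a version counter (an assumed scheme).
class Row {
    int salary;
    int version;

    Row(int salary, int version) {
        this.salary = salary;
        this.version = version;
    }
}

// Toy shared datastore holding one row per key.
class ToyStore {
    final Map<String, Row> rows = new HashMap<>();
}

// Each cache keeps its own private copy of a row: its transactional view.
class ToyCache {
    private final ToyStore store;
    private final Map<String, Row> copies = new HashMap<>();

    ToyCache(ToyStore store) {
        this.store = store;
    }

    // Copies the stored row into this cache on first access (key must exist).
    Row load(String key) {
        return copies.computeIfAbsent(key, k -> {
            Row stored = store.rows.get(k);
            return new Row(stored.salary, stored.version);
        });
    }

    // Commit succeeds only if nobody else committed since the copy was loaded.
    boolean commit(String key) {
        Row mine = copies.get(key);
        Row current = store.rows.get(key);
        if (mine == null || current.version != mine.version) {
            return false; // conflict detected at commit time, no lock was held
        }
        store.rows.put(key, new Row(mine.salary, mine.version + 1));
        copies.remove(key);
        return true;
    }
}
```

Neither cache blocks the other while working; the second committer simply learns at commit that its view was stale, which is the concurrency trade-off the reply argues for.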
> The specification allows to change keys to "change object identity". What is
> the use case in practice? This approach will kill the reference cache.
This is to allow users to update the primary key of a data store instance, if there is a key field or key fields that map to key columns in the data store.
> 7. PersistenceCapable
> JDO requests to implement the "PersistenceCapable" interface for all classes
> that are to be made persistent. This requirement produces troubles and
> maintenance work, it bloats application classes and thereby costs resources
> and performance. How should the application programmer implement this
> interface? By manual implementation? Deriving from a common base class is
> not a solution, since it takes away the possibility to store objects derived
> from existing JDK classes.
True, and this is why most users will use tools to provide this functionality.
> Using a "ReferenceEnhancer" considerably slows
> down the development process and user classes get bloated with database
I don't agree. Tools will seamlessly integrate enhancement in the development process. All that is needed is to enhance the classes prior to testing. Check out the Transparent Persistence feature of Forte for Java to see how easy this really is.
> It is very well possible for a database engine to keep references to all
> managed objects, without the necessity to modify application classes and
> without wasting resources. There are existing object database engines that
> prove this.
Many of these modify the VM in order to achieve these goals. The objective of JDO is specifically to not require a modified VM to make the use of persistence transparent.
> 7.2 PersistenceCapable#jdoMakeDirty(String fieldName)
> The JDO-Implementers do not seem to be aware of the fact that the use of
> - takes away type-safety during development. Errors will not be detected by
> the compiler.
> - cost lots of resources
> - disables efficient refactoring
> Strings have to be avoided at all costs.
> We want to work with objects, don't we?
> ...and not with Strings that point to objects.
This particular hook allows dynamic tracking of dirty instances. The only reason for this method is for Array typed fields that cannot automatically be tracked. The alternative (disallowing Array type fields entirely, as CMP Entity Beans have done) was not acceptable.
> Why do objects need to be made "dirty"?
> The database engine can perform comparisons.
The performance issue that automatic dynamic tracking solves is with large caches of mostly clean instances with a few dirty ones. The JDO implementation only has to keep track of the dirty instances, instead of having to compare field by field a large number of instances. JDO provides for a commit operation simply iterating the list of dirty instances, and constructing a data store update for only those fields that actually changed.
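To make the argument concrete, here is a small sketch of field-level dirty tracking. It is written by hand for illustration; in JDO the equivalent notifications would be produced by a tool rather than typed into the class. `Employee`, `DirtyTracker`, `markDirty`, and `drain` are hypothetical names, not JDO API.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Records which fields of which instances changed since the last commit.
class DirtyTracker {
    private final Map<Object, Set<String>> dirty = new LinkedHashMap<>();

    void markDirty(Object instance, String field) {
        dirty.computeIfAbsent(instance, k -> new LinkedHashSet<>()).add(field);
    }

    // At commit only the dirty instances are visited; clean ones are never inspected.
    Map<Object, Set<String>> drain() {
        Map<Object, Set<String>> result = new LinkedHashMap<>(dirty);
        dirty.clear();
        return result;
    }
}

// Hypothetical persistent class whose setters report changes to the tracker.
class Employee {
    private final DirtyTracker tracker;
    private String name;
    private double salary;

    Employee(DirtyTracker tracker, String name, double salary) {
        this.tracker = tracker;
        this.name = name;
        this.salary = salary;
    }

    void setName(String name) {
        this.name = name;
        tracker.markDirty(this, "name");
    }

    void setSalary(double salary) {
        this.salary = salary;
        tracker.markDirty(this, "salary");
    }
}
```

With thousands of cached `Employee` instances and one call to `setSalary`, `drain()` returns a single entry naming the single changed field, so the cost of commit scales with the number of changes rather than with the size of the cache.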
> 7.3 jdoGetObjectId()
> Internal IDs should not be exposed. We want to work with objects, not with
> pointers. If an application programmer can not live without IDs, he can
> easily build his own flavour into his objects. He is free, to use his own
> system (social security number, license plate).
This feature allows you to write the JDO identity of a persistent instance to a file, transport it around the network, and use it later, separated by space and time. It is modeled on the EJB Entity Identity concept, and has been extremely popular when dealing with distributed objects.
> exactly the same specification text as for jdoGetObjectId()
The difference is only if the application is changing some key fields in the transaction (for example, performing some maintenance work on an instance to correct the key).
> 7.6 to 7.12
> All these methods look very cryptic. Fields are replaced, get int values,
> get "provided" get copied, in the end there are even simple-type-specific
> fetch methods (i.e. fetchDoubleField). Most methods return void? Is this a
> new data type in JDO?
> What's all this for?
These methods are specifically for JDO implementations to use to manage the values of the persistent fields in persistent instances.
> If this very weird API should serve to change class versions, it's terribly
This is not an API that any application developer will use. For the APIs that are intended for public consumption, please refer to 8.1 through 8.4. Again, most users will never use these methods either.
> Classes can use reflection to analyse themselves. Storing a superset of all
> class versions removes the need to use int values to define Fields of
> 8. JDOHelper
> Many PersistenceCapable methods are listed here again, in static form. Why?
The JDO implementation needs access to the persistence capable instance to implement the get/set fields behavior. The reason these methods are in the static form here is to allow access to a subset of useful information without requiring the application to be aware of the existence of the PersistenceCapable interface.
> A good API provides one single path, how a problem is to be implemented. The
> - Code by different implementers uses the same methods. It remains
> interchangeable and further requests for a change of the API remain similar.
> - The API can be modified more easily.
I'd advise against any application developer ever using the PersistenceCapable interface directly. Perhaps this should be more explicit in the text.
> 8.2 PersistenceManager#makeDirty(Object, String fieldname)
> If "makeDirty" is necessary, it should be makeDirty(Object).
The proposed change isn't sufficient. JDO tracks changes to fields, not just persistent instances.
And here is the rest of Craig Russell's response:
> 14. Query
> The Query interface is a complete disaster.
> - It requires the use of String filters. Why not use API methods?
We feel that both a String form of query and an API are useful and arguably necessary. There are strong arguments for both forms. For practical purposes, only one form is implemented for JDO 1.0. We expect an API form to be implemented by the JDO RI, but will have to wait for JDO 1.0++ for standardization.
> 16. EJB are a different specification. They have nothing to do with JDO. If
> an EJB implementation wishes to use JDO, it may do so.
We added this chapter after comments of the nature: what is the relationship between JDO and EJB? Can I implement EJB with JDO, or JDO with EJB? How do I use JDO in an application server?
> 18. JDO has nothing to do with XML.
> Schemas are defined by existing classes. If the classes are not present to
> instantiate objects, what is the use of providing metadata?
XML is a language for specifying information that cannot be discovered by introspecting the persistence capable classes. For example, in Department we have a Collection typed field named employees. What is the type of the elements of this field? Most relational databases would require that you store Employees in the elements of Collection. Until Java implements such metadata itself, we are forced to annotate the Java classes.
By the way, most of the information in the xml-format metadata is nicely defaulted, so you don't need it in many cases.
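The Collection-typed field example can be checked with plain reflection. The sketch below uses hypothetical classes (`DeptWithStaff`, `StaffMember`, `ElementTypeProbe`) and a raw `Collection` field, as code of that period would have declared it; it only shows what introspection reports and says nothing about the actual metadata format.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical element class.
class StaffMember {
    String name;
}

// Hypothetical persistent class with a raw Collection field:
// nothing in the declaration says the elements are StaffMember objects.
class DeptWithStaff {
    Collection employees = new ArrayList();
}

class ElementTypeProbe {
    // Returns the declared type of a field as seen by reflection.
    static String declaredType(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            return field.getGenericType().getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        // Prints the container type only; the element type is not recoverable.
        System.out.println(declaredType(DeptWithStaff.class, "employees"));
    }
}
```

Reflection reports `java.util.Collection` and nothing more, so the element type has to come from somewhere outside the class, which is the role the reply assigns to the metadata.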
> Why is JDO exclusively focused on Java? There are other worthwhile
> programming languages (Ruby). A universal Object-Persistence-API would also
> be helpful in other programming languages. The "What about C# and the
> CLR?" - question will come and people will stick to SQL, since it's also
> available on the Microsoft platform.
We decided to focus on Java as the platform for JDO. If we wanted to have a universal platform we would have a much bigger task.
> Who is supposed to implement all this? Who is supposed to understand it? The
> ODMG specification failed because no vendor ever provided a complete
I agree that this was a strong negative for ODMG. The specification was incrementally implemented but no vendor ever did the whole thing. One of the nice features of the JCP process under which JDO is being developed is that prior to publication of the JDO specification, a reference implementation and compatibility test suite must be developed.
The JDO Reference Implementation is in pre-release form, available for free download to anyone in the world. Prospective vendors of JDO can use the Reference Implementation to get a head start on their own implementations.
> When is the first complete implementation of JDO expected?
This calendar year.
> The process has been extremely slow. In spite of all the big names in the
> JCP, very little has developed forward.
Well, there are several commercial implementations currently available (currently subsets due to the nature of a not-final specification).
> This version 1.0 is not really it?
Well, I'm personally focused on 1.0. Without a successful 1.0, there won't be a 1.0++.
> No one will ever use this, except for consultants that make a living of never
> finishing projects.
I beg to differ.
> Sorry for these rude and harsh words.
Why make them if you have to apologize for them? Perhaps you could just have exercised your del key...
Further to the response that Floyd has already posted here, the discussion has continued in the comp.databases.object newsgroup and there is another very interesting posting by Craig Russell. If you don't have a news account, you can also read it at Google Groups.
The issue that I have with the Forte JDO is that it modifies class files you've compiled and inserts the bytecode necessary for it to do persistence.
From an enterprise standpoint, this is a disaster: my source code in version control does not emit the class files that I've deployed to production. That's a fundamental concept of version control; that you always can identify and reproduce any production release from what's in the repository.
It's also rather hard to debug things when you don't *really* know what your source code is. What? You say that you should only debug your part of the code; the JDO stuff's already been debugged? Oh, man! Let me tell you how many bugs I find in third party products just by stepping through their stuff. Put another way: if I can't find the bug, I'm damn sure going to check the JDO, just as a sanity check, if nothing else.
So, you could say I should disassemble what Forte made out of the class files, and assume that there's an intermediate source code step... Okay, but JAD has problems with what Forte puts out, often going into "Java Assembler" mode to try to describe what the bytecode is doing. Real useful.
Maybe if the JDO went through an intermediate source code stage, like RMI Stub generators or IDE compilers, there would be something I can work & be comfortable with. E.g., I can put a directive in my build.xml to do the extra source generation/compilation stage.
As it stands now, you're better off working with existing object persistence products, like TOPLink, CocoBase, Castor, or [your candidate here] and declaring that your Java persistence standard. IMHO.
Bryan: The issue that I have with the Forte JDO is that it modifies class files you've compiled and inserts the bytecode necessary for it to do persistence.
Craig: Just so everyone is clear on this: Forte for Java has a preview release of the JDO standard. I'll call it FFJ in this response. FFJ does use an early version of an enhancer, but that does not mean that all JDO products must use this technique. Other tools might perform pre-processing or generate JDO classes directly from a business model.
Bryan: From an enterprise standpoint, this is a disaster: my source code in version control does not emit the class files that I've deployed to production. That's a fundamental concept of version control; that you always can identify and reproduce any production release from what's in the repository.
Craig: Customers have been able to use FFJ in source control environments, by defining the enhancement process external to the IDE tool. That is, the enhancer has a perfectly functional command line interface so the code line is always able to be created from source. The enhancement is merely another step (similar to javac or jar) in the process of constructing an executable jar file.
Bryan: It's also rather hard to debug things when you don't *really* know what your source code is. What? You say that you should only debug your part of the code; the JDO stuff's already been debugged? Oh, man! Let me tell you how many bugs I find in third party products just by stepping through their stuff. Put another way: if I can't find the bug, I'm damn sure going to check the JDO, just as a sanity check, if nothing else.
Craig: I agree that you need to be able to debug JDO classes, both before and after enhancement. The JDO specification requires that the debugging information (line number to byte code mapping) be left intact in enhanced .class files, and you should be able to set breakpoints, step line by line, examine variables, etc. just like usual.
Bryan: So, you could say I should disassemble what Forte made out of the class files, and assume that there's an intermediate source code step... Okay, but JAD has problems with what Forte puts out, often going into "Java Assembler" mode to try to describe what the bytecode is doing. Real useful.
Craig: If there is a problem disassembling the output of the enhancer, there is either a problem with the enhancer or a problem with the disassembler. During the development of JDO, we found instances of both!
"The issue that I have with the Forte JDO is that it
modifies class files you've compiled and inserts the
bytecode necessary for it to do persistence."
Yeah, Sun calls this Bytecode Manipulation, but we like to
refer to it as Bytecode Mutilation. It results in poorly
debuggable apps, more maintenance issues, and believe it
or not it actually hampers enterprise performance. The
Forte JDO product is really (Javablend Done Over), and in
my opinion the 'free' pricetag is quite high :)
For Enterprise O/R mapping, customers need a stable and
mature product, not a research product from Sun... It's
much cheaper to get a product that saves time, than one
that consumes it... A bargain is rarely a bargain...
That's just my $.02
It appears that the big names like IBM and BEA are not in the list of vendors that plan to provide JDO. Other than Sun Forte, all other vendors appear to be small players.
I didn't find the referenced article's points particularly interesting. However, I have followed JDO for a while now and do have some points of my own. I figured these may be relevant to this topic.
My first problem is that I don't see any reason for deciding that fields are accessed directly (with reference enhancement) instead of the "standard" Java approach of using (for instance) an interface to define the persistent model or abstract accessors to define the persistent model. By standard I mean something that doesn't require redefining the interpretation of the Java language. All this decision means to a Java programmer is that direct access syntax is replaced by accessors. Some rationales I have seen for this are:
1. More comfortable syntax. This may be the case for some people. However, this kind of usage has also been dictated in the JavaBeans model (and hence in Swing components). In these cases, you should always access fields with accessors. If the fields are in the super class, you have no other choice. If they are in your class, it is simply bad practice because there may be events / bound checking involved.
If this is really a big syntax burden, I think the JLS should be the one to fix it by something like "properties". Adding local patches in specific specs is in my opinion a recipe for disaster, because it makes local redefinitions of how a single Java source file should be interpreted (semantically). Just imagine what would happen if two such specs redefine the same Java source file.
2. Performance. I can't see any way how this approach can increase performance. The same thing can be achieved, and will be achieved by any reasonable optimizer, by a simple inline. As a matter of fact, I can see how this approach can hurt optimization efforts. When dealing with a single method, the optimizer can realize it is a "hot spot" and apply aggressive optimizations on it. Doing the same for many identical pieces of code may be very time consuming - and the optimizer won't even know they are "hot spots".
Further, I see how this approach can cause other damage. For one thing, it will probably make the resulting spec semantically "incompatible" with reflection. I.e., fetching a field directly will yield different results than fetching it with reflection. It can also limit implementation strategies such as not holding some parts of the object in-memory. While it is possible to erase a field from a class, this imposes highly visible changes (i.e., visible to the debugger and reflection for instance) and will probably be ruled out from the start.
The second issue I have with JDO is less "concrete" than the one above. The problem is with JDO's view of relationships. It seems to me like there is a fundamental difference between JDO and EJB CMP which leads to my "problem". JDO wants to take objects, and persist them to datastores. EJB wants to take a datastore, and give it an object view. Hence, JDO has made little effort to standardize standard "relational" relationships between objects, while in EJB (2.0) this was a primary goal. This is fine as far as the specs go (i.e., each spec achieves its goal). However, I think experience shows that representing the business data model in a "relational" model is more effective (some may argue). Anyway, this is the way things are done in most of the industry. For this, you need a clear and simple way to represent concepts such as relationship "direction" and "multiplicity" along with the semantics involved (e.g., changing one end of a bi-directional relationship changes both ends). Although I've seen some obscure notes about "adding this info in the XML" (specifically, in a response in a forum on this site), I see no way for this to be done without completely changing the Java object model. Such a change seems to me outside the goal that JDO has defined for itself (define a persistence model for *Java objects*), and also seems too big to happen with no syntactic difference. Changing the semantics of assignment, for instance, with no apparent syntactic difference is very error prone, IMHO.
If JDO chooses to keep Java object semantics (because of the reasons above or others), I think it will be obscure for use in projects that use relational modelling, which is probably at least 80%.
Overall I think JDO is a step forward in taking Java persistence out of a specific platform (EJB) and standardizing it in a way that can work for all scales. I would hate to see users drift away from it because of inconsistent data models.
Just my humble opinion.
JBoss 4 implements JDO functionality... maybe not as big as BEA WebLogic and IBM WebSphere, but still a respected J2EE App Server. They have to see something promising in it.
> It appears that the big names like IBM and BEA are not in the list of vendors that plan to provide JDO. Other than Sun Forte, all other vendors appear to be small players.
I did not find this article. How can I find it?