September 23, 2004
Between playing with JDO 2 fetch groups and ZODB, thinking about TranQL (for two
months now), playing with Prevayler, and looking at TORPEDO (need to run OJB
against it when I have a chance), something clicked for me which I think clicked
for some other people a long time ago -- but somehow got lost in the hullabaloo.
We may all be doing O/R mapping wrong. Seriously, we probably are.
The current popular approach is a thin wrapper around JDBC. It is what OJB,
Hibernate, and JPOX all do. I cannot comment on Kodo and TopLink as I cannot
go browse around their sources, but I suspect it is the same. This is how we
are used to thinking about it -- the objects you get are basically a stream
(or collection) of database results.
This isn't really what they are, though. They are really closer to a swapped-in
page of the entire object graph. The query mechanism for the object graph and
the query mechanism for the backend get confused (in the con-fuse sense). The
JDO spec has the right idea in separating object queries from persistence store
queries (I do tend to agree with Gavin King that the JDOQL query language itself
is somewhat less than elegant). The editing context can contain more or less
than has been queried for, as long as what is accessed is available when it
is needed.
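
The idea of a context that holds more or less than was queried for can be sketched as a small cache that faults objects in on first access. This is only an illustration: the class name `EditingContextSketch`, the `prefetch` method, and the `backend` loader function are all hypothetical and do not come from OJB, Hibernate, or JDO.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an editing context treated as a "page" of the object graph.
class EditingContextSketch<K, V> {
    // Objects currently swapped in, keyed by identity.
    private final Map<K, V> page = new HashMap<>();
    // Stand-in for whatever actually talks to the persistent store (an assumption).
    private final Function<K, V> backend;
    private int backendHits = 0;

    EditingContextSketch(Function<K, V> backend) {
        this.backend = backend;
    }

    // Load an object before anyone asks for it, so the page holds more than was queried.
    void prefetch(K key) {
        get(key);
    }

    // Return the object, going to the backend only if the page does not hold it yet.
    V get(K key) {
        if (!page.containsKey(key)) {
            backendHits++;
            page.put(key, backend.apply(key));
        }
        return page.get(key);
    }

    // Number of times the backend was actually consulted.
    int backendHits() {
        return backendHits;
    }

    public static void main(String[] args) {
        EditingContextSketch<Integer, String> ctx =
                new EditingContextSketch<>(id -> "person-" + id);
        ctx.prefetch(1);
        System.out.println(ctx.get(1));
        System.out.println(ctx.backendHits());
    }
}
```

The caller only ever sees `get`; whether the object arrived through a query, a prefetch, or a fault at access time is the context's business.
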
When you need to obtain a handle on an instance, a query language is bloody
useful. OGNL defines a better object query language than OQL, JDOQL, or HQL
if you are talking purely objects. HQL evolved as it did to avoid the loss
inherent in this abstraction, though, and works nicely. You are querying into
the editing context, however, and the context can determine, separately from
the exact query, what it does not already have loaded (thank you Jeremy and
Dain). This is a lot of work probably best done in a Haskell-style language
optimized for doing fun math rather than pushing bits.
Once you are maintaining graph pages instead of flat contexts, and issuing
queries against the page rather than the backend, you can do nice things like
absurdly optimize your queries into the backend (query the backend only for
the conjunction of the current query's predicate with the negation of the union
of all predicates known to be in the current page (thank you, again, Jeremy and
Dain)). The paging system certainly knows about the database, and needs to be
able to write extremely optimized code (SQL) to pull data out of it, but the
client of the paging system really is better off being able to describe queries
in terms of object behaviors.
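
One reading of the backend optimization described above is that the page remembers every predicate it has already loaded and asks the backend only for rows matching the current predicate that are not covered by any earlier one. The sketch below shows that reading under loud assumptions: predicates are plain SQL fragments, the class `ResidualQuerySketch` and its method are hypothetical, and SQL NULL semantics are ignored (a row where an earlier predicate evaluates to unknown would be skipped by `NOT`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a page that tracks which predicates it has already loaded.
class ResidualQuerySketch {
    // Predicates whose matching rows are already in the page, as SQL fragments.
    private final List<String> loaded = new ArrayList<>();

    // Returns the WHERE clause to send to the backend for this predicate,
    // or null when the exact same predicate has already been loaded.
    String residualWhere(String predicate) {
        if (loaded.contains(predicate)) {
            return null; // nothing new to fetch
        }
        StringBuilder where = new StringBuilder("(" + predicate + ")");
        if (!loaded.isEmpty()) {
            // Exclude everything an earlier query already pulled into the page.
            where.append(" AND NOT (");
            for (int i = 0; i < loaded.size(); i++) {
                if (i > 0) {
                    where.append(" OR ");
                }
                where.append("(").append(loaded.get(i)).append(")");
            }
            where.append(")");
        }
        loaded.add(predicate);
        return where.toString();
    }

    public static void main(String[] args) {
        ResidualQuerySketch page = new ResidualQuerySketch();
        System.out.println(page.residualWhere("age > 30"));
        System.out.println(page.residualWhere("city = 'Boston'"));
        System.out.println(page.residualWhere("age > 30"));
    }
}
```

A real implementation would need to reason about predicates symbolically (for example, noticing that `age > 50` is already covered by `age > 30`) instead of comparing strings, which is where the fun math comes in.
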
Providing hints about what objects are going to be needed, rather than how
to pull them from the RDBMS, becomes a lot more useful, as you can express the
same intention in a way that lets the system know what you want rather than
flat out telling it. (Hinting is what you are really doing when you ask Postgres
to use a join, unless you do a lot of configuration to make it not so; Postgres
is the only RDBMS whose internals I have poked at much, Oracle's not being
available to me.) A perfect example of an optimization that would be tough to do by hand here
is to stream elements in a collection down the join chain from the primary queried
entities rather than pulling them in the initial join. In HQL you would join
them as you *know* you will need them, but what you really know is that the
JSP needs them for rendering a while in the future, and on a different JVM.
A mechanism to supply hints that these things will be needed, and will be needed
as a one-pass stream (this may be too low level) when they get serialized out
allows for much better actual throughput. The best way to provide this type
of hinting would be hard to work out, but fun as heck to do -- and worth it!
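
To make the hinting idea a little more concrete, here is a minimal sketch in which the client declares what it will need and how it will consume it, and a planner decides whether to join eagerly or to issue a separate query whose results can be streamed later. Everything here is an assumption made for illustration: the `FetchHintSketch` class, the `Access` enum, the `willNeed` method, the table names, and the `<root>_id` foreign key convention are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: hints describe intent, the planner picks the fetch strategy.
class FetchHintSketch {
    // How the client expects to consume the related objects.
    enum Access { RANDOM, ONE_PASS_STREAM }

    private final String rootTable;
    // Insertion-ordered so the generated statements are deterministic.
    private final Map<String, Access> hints = new LinkedHashMap<>();

    FetchHintSketch(String rootTable) {
        this.rootTable = rootTable;
    }

    // Declare that rows from childTable will be needed, and in what manner.
    FetchHintSketch willNeed(String childTable, Access access) {
        hints.put(childTable, access);
        return this;
    }

    // Build the statements a planner might issue: randomly accessed children are
    // joined into the root query, streamed children get their own ordered query.
    List<String> plan() {
        String fk = rootTable + "_id"; // assumed foreign key naming convention
        StringBuilder root = new StringBuilder("SELECT * FROM " + rootTable);
        List<String> deferred = new ArrayList<>();
        for (Map.Entry<String, Access> hint : hints.entrySet()) {
            String child = hint.getKey();
            if (hint.getValue() == Access.RANDOM) {
                root.append(" LEFT JOIN ").append(child)
                    .append(" ON ").append(child).append(".").append(fk)
                    .append(" = ").append(rootTable).append(".id");
            } else {
                deferred.add("SELECT * FROM " + child + " ORDER BY " + fk);
            }
        }
        List<String> statements = new ArrayList<>();
        statements.add(root.toString());
        statements.addAll(deferred);
        return statements;
    }

    public static void main(String[] args) {
        List<String> statements = new FetchHintSketch("orders")
                .willNeed("customer_note", Access.RANDOM)
                .willNeed("line_item", Access.ONE_PASS_STREAM)
                .plan();
        for (String statement : statements) {
            System.out.println(statement);
        }
    }
}
```

The point of the sketch is only the division of labor: the caller never says "join", it says what it will touch and how, and the same hints could produce a different plan against a different backend.
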
This type of throughput-oriented hinting is hard to do through any existing
O/R mapper I know of. It is not difficult to describe, however. It really begs
for a flexible object query language. You can get the equivalent type of behavior
in JDBC right now, but not in a way that is useful to OJB or Hibernate, at least.
This is just one example; you can use your imagination for others =)
The big problem here is that what I am talking about is most of a dbms. You
need to handle snapshotting for transactions, dirtying predicates, etc. It just
uses a relational database for its actual backend. In theory EJBs were designed
to be able to do this, but I don't think any of them actually do. Is the problem
just too hard? I have a lot of trouble believing that -- if you can formulate
the questions correctly, you can pretty much build the solution. It just ain't
easy to do -- and easy is seductive. Yay, hard problems!
This is also a big abstraction -- and one that bets it can provide the correct
knobs to allow the programmer to dive through it when needed. There is risk
in a big abstraction, but then again, there are reasons we use Ruby, er, I mean
Java, instead of assembly ;-)
This is a big hunk of code, and dives into math instead of simple bit-pushing,
making it fun code! Definitely outside the scope of my (one person) spare time
programming, unfortunately =(
About the author
Brian McCallister
Blog: http://kasparov.skife.org/blog/
Brian McCallister doesn't particularly like writing bios or writing about himself in the third person. He does love programming and systems work though, and tends to find himself doing a lot of both. Brian has also quite enjoyed giving presentations and seminars in the past, which isn't too hard as he loves teaching and exploring new ideas.