Java Development News:

When to use an Embedded ODBMS

By Rick Grehan

01 May 2008 | TheServerSide.com

Though relational database systems comprise the majority of back-end databases employed by applications, object databases continue to capture database developers’ interest. In fact, object database technology has seen a recent resurgence in popularity (since their beginnings in the 1970s), particularly now that object-oriented languages offer well-matured platforms for creating such database systems. Numerous commercial and open-source ODBMSes exist, including Objectivity, ObjectStore, Versant, Matisse (all commercial) and db4o, Prevayler, Ozone, Perst, and JDOInstruments (all open-source). A list of object databases can be found at: http://www.service-architecture.com/products/object-oriented_databases.html.

A particularly interesting variant of the ODBMS is referred to as an "embedded" object database. An embedded object database system is one whose database code executes within the application. That is, the code for the database runs in the same address space as the application. An application that uses an embedded database is monolithic; it is not built along client/server architectural lines.

Implied in that last sentence is an important question: when should an application use an embedded object database? Or, more precisely, when should a developer consider an embedded object database as an alternative to the more popular (and, probably, more familiar) relational database?

This question is really two questions combined. The first is: when is it appropriate to use an ODBMS? The second is: assuming that you have determined to use an ODBMS, when is it appropriate to use an embedded one? (Many of the ODBMS systems mentioned above are available in both client/server and embedded forms.) In this article, I will deal with the individual, component questions separately. The combined answers will be the reply to the original question.

In addition, I'm going to assume throughout the article that the only alternatives to an ODBMS are either a purely relational database (accessed via, for example, JDBC) or an object/relational database (along the lines of Hibernate). This is certainly not always the case. Sometimes, a flat-file pseudo-database, or even a well-crafted XML file, is all that an application really needs. I do not want you leaving the article with the idea that an ODBMS is the only answer to all persistence needs.

The OO Language Consideration

One extremely good reason for considering an ODBMS - perhaps the best, in fact - is one so obvious as to be possibly overlooked. The reason can be put simply: you might want to use an ODBMS if you're writing your application in an object-oriented language.

The reasons why are fairly fundamental in nature, and are summed up in a phrase that has been orbiting the database world for well over a decade: "the object-relational impedance mismatch." Borrowing terminology from the electronics world, this phrase highlights the significant differences between systems built along object-oriented lines, and those built along relational lines. It is not too far off the mark to say that the two paradigms - object-oriented and relational - are two very different world-views.

In an object-oriented system, data lives inside objects. In a relational database, data lives inside rows within tables. If we could personify a relational database, we might well say that it understands concepts such as tables, rows, relations, tuples, and so on; it has no concept of a thing called an object (which is what the object oriented language "understands"). So, a database application written in an object-oriented language, but using a relational database, must shuttle its data back and forth between two dissimilar worlds.

Practically, this means that in order to store an object into a relational database, an application must extract that object's data, assemble it piece-wise to build an appropriate SQL statement, and execute that SQL. The application cannot just put the object in the database, it must literally pull the object's insides out, and put those fragments into the database as separate entities.

Suppose you have built an application that catalogs the courses offered at a school. You define (in Java) a Course class as follows:

public class Course {
  public string name;       // Name of Course
  public int courseID;      // Course number
  public int deptID;        // ID of the Course's department
  public int creditHours;
  ...  rest of class definition ...
}

To store a course object using JDBC, you would have to do something akin to this:

PreparedStatement insertStatement = connection.prepareStatement(
  "INSERT INTO Courses (name, courseID, deptID, creditHourse) " +
  "VALUES (?, ?, ?, ?)");
insertStatement.setString(1, course.name);
insertStatement.setInt(2, course.courseID);
insertStatement.setInt(3, course.deprtID);
insertStatement.setInt(4, course.creditHours);
insertStatement.executeUpdate();

In the above code, I have made the Course members public, so that the action of extracting each data item is apparent. In deployed code, the members would likely be private, and I would use accessor methods to retrieve each value. Even so, the gist of the process is the same, regardless of how I define access to data members.

This wordiness in the code exists in reverse as well. To fetch an object from a relational database, its constituent parts must be read from database tables, an empty object must be instantiated, and the object's 'data pieces' must be inserted into the object's fields one by one. The code would look like this:

Statement selectStatement = connection.createStatement();
ResultSet rset = selectStatement.executeQuery(
  "SELECT name, courseID, deptID, creditHours FROM Courses");
while rset.next()
{
   Course course = new Course()
   course.name = rset.getString("name");
   course.courseID = rset.getInt("courseID");
   course.deptID = rset.getInt("deptID");
   course.creditHours = rset.getInt("creditHours");
  ... <do something with the course object> ...
}

Both bits of code make one aspect of the aforementioned impedance mismatch apparent: object data must pass through a translation process to get to and from the relational database. To put an object into the database, you have to take it apart; to bring it out of the database, you have to re-assemble it. The amount of translation code depends on the size and complexity of the objects.

One could argue that the impedance mismatch problem can be eliminated with object/relational mapping tools. However, this is only partly true, and only true from the perspective of the developer. While it is correct than an object/relational mapping tool removes the burden of having to write SQL statements, JDBC calls, and all the associated code that moves object in and out of tables - that code must nevertheless be executed (which means that someone - or something - has to write it). So, while an automated tool (or library) is writing the object/relational bridge code on the programmer's behalf, the code is still there.

In addition, employing an O/R mapping solution conjures significant non-obvious challenges with which the programmer must wrestle. The challenges revolve around the decisions that must be made concerning how class structure is represented in the database. I will have more to say on this shortly.

Evolving Class Structure

Many database applications are narrowly targeted at a specific task. Such applications are built for a single purpose; there is little likelihood that they will be extended into other territories. In such cases, the class structure of persistent objects will be static, and therefore - assuming the data is stored in a relational database - the structure of database tables will be unchanging.

But what about applications whose persistent objects are evolving? That is, the class structures of persistent objects are changing, and perhaps even the class types of persistent objects are changing. Consider the ramifications of such an application with a relational database back-end.

First, suppose that a new persistent class must be added to an existing application. If access to the relational database is "by hand" (the programmer is writing JDBC calls) then a whole new set of SQL commands and accompanying code must be written for the application. If an O/R mapping tool is being used, then a new mapping section will have to be added to the mapping file. And, in both cases, a new database table (possibly more than one) will have to be created. In short, adding a new persistent class goes beyond simply adding the class; an entourage of additional infrastructure must be assembled.

In a true ODBMS, none of the above work is necessary. For example, using the open-source object database, db4o, the programmer simply begins storing the new objects, and the database is automatically adjusted to accept them. The db4o engine invisibly discovers the new class structure (via reflection) and performs all the behind-the-scenes tasks to store the new object type. There is no administrative work, no new tables need to be added, and no mapping entries must be written.

Next, suppose that the structure of an existing persistent class has to be altered; perhaps the evolving responsibilities of the application require that objects of a given class must now include an additional data member. This situation is actually more perilous than the previous; adding a new class involves working with code that is - thanks to the encapsulating nature of a class - isolated to a degree from the rest of the application. Modifying an existing class, however, means touching - and perhaps breaking - deployed code.

If the application is using a relational database on the back-end, that database must be modified. In the case of an additional data member, a new column must be added to an existing table. Possibly, this new column's contents will have to be pre-loaded with appropriate data in the case of existing rows. If the application is deployed enterprise-wide, then the services of a database administrator may have to be requested in order to make the alterations. And then there’s the issue of locating all the SQL queries (in JDBC) and modifying the translation code. Again, the consequence of a changed class structure is more effort than one would otherwise expect.

Compare all the above with the corresponding effort in an application using an ODBMS. As an example, using the open-source db4o ODBMS, the programmer's work is limited to modifying the class, writing whatever business logic is needed to manipulate the new field (code that would have to be written anyway), and deploying the application. That's it.

db4o handles the addition of the new data member invisibly. Objects instantiated from the new class definition peacefully take their place beside "old" objects of the same class (though in its earlier definition). When an "old" object is fetched from the database, db4o will automatically fill the new element with a default "empty" value - null for strings, zero for numerics. So, when an old object is fetched from the database, it is invisibly cast to the new form. And, when that object is written back to the database, the new field is written with it. Consequently, db4o provides what amounts to an invisible migration from the old class structure to the new class structure; the application remains "backward compatible" with the previous class structure, even as persistent objects are migrated to the new structure.

Complex or Deep Object Relationships

I have already discussed the challenges facing a programmer whose application uses a relational database and who must cope with evolving class structures. There are, however, considerable challenges involved in simply designing the mapping of class structure to the relational database which must take place even if class structure is guaranteed to be static throughout the lifetime of the application. In other words, forget about dealing with evolving objects, what about just getting objects into the database in the first place?

Working out the mapping of objects to tables is not a straightforward process. Often, multiple mapping possibilities exist for a give class structure, particularly when inheritance is involved. Consider the simple arrangement shown in figure 1.

Text Box: Figure 1

Person is an abstract base class with a pair of concrete descendant classes, Student and Professor. Obviously, Student and Professor objects share a great deal of common data elements: most likely both will have fields for name, address, gender, and so on. There is, as well, much dissimilar data. For example, Student objects might have a data member indicating their grade level (freshman, sophomore, etc.), while Professor objects might include a data member that identifies the department to which they belong (business, engineering, mathematics, etc.).

How should this be modeled in a relational database? One approach is the one-class-per-table method. All common data is placed in a Person table, data specific to Student objects goes into a Student table, and data specific to Professor objects goes into a Professor table. While this methodology is intuitively comfortable, its implementation splits a given object across two tables. The object's common data resides in the Person table, while its specific data resides in the Student or Professor table. Fetching an object from the database requires two SQL SELECT statement executions. Similarly, adding an object requires two SQL INSERT statements, and updating an object requires two SQL UPDATE statements. (The arrangement is further complicated if the system is asked to model teaching assistants, in which case a combined Student-and- Professor object is required. Either a third table will have to be designed, or objects will have to carry some sort of class-type identification so the teaching assistants can be distinguished from students and professors.)

In addition, some sort of unique object identification field will have to be defined (a special column in each table) that "connects" the two halves of every object in the database. This identifier will probably be a Long data member added to the Person class. It exists for no reason other than to provide a means for the application's to track which objects in memory correspond to which objects in the database. This sort of "object identifier" field is a common requirement for persistent objects in virtually all O/R mapping systems. Any persistent object class must define an "ID" field (usually as a private member to limit its interaction with application code) that the O/R tool uses for the reason given above (to connect in-memory objects with in-database rows).

Another approach for storing objects in the database would be to simply place all Person objects - whether they be Student or Professor objects - into a single table; a "one-object-per-row" approach, if you will. Doing so would require the creation of a column used to identify what kind of object - Student or Professor - occupied each row (another aspect of impedance mismatch, albeit minor). On the one hand, objects are not spread across two tables, so database access is reduced to a single SQL statement. On the other hand, though, each row will contain unused space. Columns corresponding to data specific to Professor objects will be empty in Student-object rows, and columns corresponding to data specific to Student objects will be empty in Professor object rows. Also, this could cause problems if the database administrator attempts to apply null data constraints to such object-specific columns, and the approach completely melts down if new classes are defined that descend from the base Person class.

The above problems intensify when object relationships are very complex or very 'deep'; i.e., object A references object B, which references object D, and so on. Each object reference becomes a foreign key in a relational database table. Furthermore, the direction of the reference is (in almost all cases - and certainly in cases involving collections) the precise reverse in the relational database as the reference that actually exists in the application.

As an example, suppose our application tracks professors within a given Department. So, a Department object includes a variable-length array of Professor objects:

public class Department {
  private Professor[] professors;
  ...
}

Suppose you persist this to a relational database, and apply the one-class-per-table methodology. So, you create a Department table. Given this arrangement, the connection between Professor objects and Department objects will most probably not be in the Department table, it will be in the Professor table. The connection will be a foreign key that indicates which Department a given Professor belongs to. Put another way, when the objects are in memory, the relationship can be expressed as: "I am a Department object, and here are the Professors that I contain." However, when the objects are in the database, the relationship is expressed as: "I am a Professor, and here is the Department to which I belong."

The result is that - rather than a direct reference leading from Department to Professors - the database requires a foreign key column (indirect reference) in the Professor table leading back to the Department. And, for speed's sake, that foreign key column will probably require an index. So, finding those Professors that belong to a Department is not a matter of following a reference from the Department directly to the Professors in the array; it requires a SELECT statement that must search through the database to match Professor rows that include a specific Department ID, assemble the result set, and fill the array.

With a true ODBMS, most of the above work evaporates. The developer need not be involved in researching the best techniques for modeling inheritance. From the perspective of an ODBMS, the inheritance architecture is already 'explained' in the very classes that comprise the application - it does not have to be "re-explained" to the database. Professor objects, Student objects, and Department objects can be freely intermixed in the database, and the database engine will know which is which. Nor is the ODBMS developer forced to "pollute" persistent classes with ID fields that exist for no reason other than to accommodate the database back-end.

Finally, in a true ODBMS, collections and arrays, complex or deeply-nested object structures, all are dealt with quietly ... often invisibly. No foreign keys are required, nor associated indexes created. The ODBMS engine persists the object relationships in the same "direction" as is defined in the classes. In some ODBMSes, persistent objects can be navigated with an ease that approaches that of navigating in-memory objects. The database stores object relationships in a form closer to the in-memory representation than an O/R mapper ever could.

You Are Using Agile Development

Agile development techniques employ incremental modeling, building, modifying, and testing activities that demand continuous alterations in both the design and the code. For example, refactoring seeks to improve application quality without changing application behavior. Refactoring is far easier and less error-prone in an application using an ODBMS, than in that same application implemented with an RDBMS.

The reasons for this derive from the fact that a system written in an object-oriented language but using a relational data store must support two dissimilar paradigms. Consequently, the refactoring effort that affects persistent objects must flow into the relational database and - as I showed in the section on object evolution, refactoring a relation database is no simple task.

The argument for an ODBMS system's superiority (over an RDBMS) is ably explained by Scott Ambler in his whitepaper "Agile Techniques for Object Databases", available on the ODBMS.ORG portal at http://www.odbms.org/experts.html#article3. As Ambler points out, in a combined object/relational system, modeling and development must take place on two sides of an invisible divide. You must model both objects and (relational) data schemas. Meanwhile, "... with ODBMS technology, you merely determine and evolve your object schema over time." In short, using an ODBMS significantly reduces the amount of work you have to do.

Embedding The Database

Finally, the second half of the original question must have its answer. When should the ODBMS be embedded? As noted earlier, the real question is: when should the object database engine be embedded in the application? Given that the only real alternative to an embedded database engine is a database server running as a separate process, an equivalent question is: when should you not use client/server as the database application's architecture?

Obviously, you should rule out a client/server architecture of the application's very requirements demand a monolithic design. Such requirements typically accompany any application that must execute in a resource-limited, single-user environment. Running a database server in a separate process space typically consumes more memory and processor cycles per unit of work accomplished than running the database engine in the same process. A client/server architecture also requires an inter-process communication infrastructure to carry database requests and responses between the application and the database server. Meanwhile an embedded architecture handles the same with simple in-process method (or procedure) calls.

An embedded database is particularly suitable for any application that must use a zero-management database. That is, an application whose database must be initialized, tuned, brought online, and maintained with no user interaction. Instances of just this sort of application encompass a broad range of industrial, real-time, and "traditional" embedded systems. The database is expected to just "be there," and no accommodations can be made for a database administrator to appear and create tables, schemas, and such on the application's behalf.

There are, of course, circumstances in which a client/server architecture is preferred. If the database application will be executing is a heavily multiuser environment, a client/server arrangement is usually superior to an embedded arrangement. Moving the code that directly manipulates the database to a centralized server simplifies the concurrency issues, since access is more easily serialized (transactions can be more easily supported) if database operations emanate from a single process. In addition, a client/server architecture is apt to scale better than an embedded one. Many database systems - some ODBMSes included - allow for adding database servers to improve multiuser response times, and incorporate load-balancing among the distributed servers. Such load-balancing is not so easily done (might, in fact, be impossible) with an embedded implementation.

The ODBMS Alternative

To be clear, it is certainly not the case that there are no circumstances in which an RDBMS would be a suitable back-end for a database application. Countless instances of existing systems already successfully using RDBMSes testify to the capabilities of the relational database.

I am suggesting, however, that the next time you begin the design of an application that will require a database back-end, you consider seriously an ODBMS. If you application will be written in an object-oriented language - highly probable, in this day and age - you could very well completely side-step a host of design and development wrestling matches by using an object database.