I'm writing a paper on a new Relational yet Object-Oriented data model that extends the Entity-Relationship model, while guiding a DBMS implementation to support relational analysis and design in general-purpose OO languages. There is a Java implementation already, and hopefully the proposed approach will extend to UML, C#, and others.
I'd like to post a few ideas here and hear your opinions. I expect the discussion to influence the rest of my work on the dissertation, and to make the resulting paper clearer and more expressive to everyday database practitioners.
Current ideas here:
* Adding Persistence Ability to Application Object Model
Using exactly the same object model in both the database and its applications is a key advantage of Object-Oriented Databases over conventional Relational Databases.
But things are not much simplified when application classes still have to contain code to query and persist themselves, or when object-oriented applications still have to implement Data Access Objects. This is the common situation with current Object-Oriented Database Management Systems (OODBMSs), and it is partially responsible for their lower adoption compared to RDBMSs. Object-Relational Mapping (ORM) mechanisms are no better in this respect.
The idea is to treat persistence as an ability: persistent object classes merely declare that they require it, and the database management system supplies/implements it. This becomes quite feasible when the database and the application run in the same language and environment, with further help from automatic code generation technologies.
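A minimal Java sketch of the idea might look like the following. The names (`Persistent`, `ObjectStore`) are hypothetical illustrations, not the actual API of the proposed system, and the in-memory map stands in for the DBMS-supplied persistence machinery:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical marker interface: a class implements it to declare it requires persistence. */
interface Persistent { }

/** Application class: declares the ability, but contains no query/DAO code of its own. */
class Account implements Persistent {
    String owner;
    long balanceCents;

    Account(String owner, long balanceCents) {
        this.owner = owner;
        this.balanceCents = balanceCents;
    }
}

/** Stand-in for the DBMS-supplied persistence service (in-memory here for illustration). */
class ObjectStore {
    private final Map<Long, Persistent> heap = new HashMap<>();
    private long nextOid = 1;

    /** Persists an object and returns its database-assigned object identifier. */
    long persist(Persistent obj) {
        long oid = nextOid++;
        heap.put(oid, obj);
        return oid;
    }

    /** Fetches a previously persisted object by its identifier. */
    @SuppressWarnings("unchecked")
    <T extends Persistent> T fetch(long oid) {
        return (T) heap.get(oid);
    }
}
```

The point of the sketch is the division of labor: `Account` only declares the ability, while the store (in the real system, the DBMS plus generated code) owns the persistence mechanics.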
* Unique Object Graph
Loading object data from storage into memory is costly, so once an object has been loaded it should not be discarded until necessary. It is also normally desirable, from the application's point of view, that objects remain unique across sequential accesses. Both properties are usually implemented with a cache, and by leveraging modern garbage collection technologies, the cache can be not only simpler but more adequate.
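One way to realize such a GC-backed cache in Java is an identity map holding weak references: an object stays unique and alive as long as the application holds it, and becomes reclaimable once nobody does. This is a minimal sketch, assuming a numeric object identifier and a caller-supplied loader:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Identity map: at most one in-memory instance per oid, reclaimable once unreferenced. */
class IdentityCache {
    private final Map<Long, WeakReference<Object>> cache = new HashMap<>();

    /** Returns the cached instance if still alive, otherwise loads and caches a new one. */
    Object get(long oid, LongFunction<Object> loader) {
        WeakReference<Object> ref = cache.get(oid);
        Object obj = (ref == null) ? null : ref.get();
        if (obj == null) {                       // never loaded, or already collected
            obj = loader.apply(oid);
            cache.put(oid, new WeakReference<>(obj));
        }
        return obj;                              // unique while any caller holds it
    }
}
```

Because the references are weak, the garbage collector, not a hand-tuned eviction policy, decides when an unused object may be discarded, while an object still in use can never be.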
With object uniqueness, plus the guarantee that objects still in use cannot be discarded, we can safely combine persistent attributes and transient attributes in a single object. Take an object encapsulating an IRC channel: it is now safe for this single object to store the channel name, description, and access rules together with the live connections to channel subscribers.
This cannot be done correctly if the persistent channel object is accessed through a traditional cache: after a period without hits it may be discarded and later reconstructed on another fetch, even while there are active subscribers to the channel. Atop a traditional cache, the application has to design a separate runtime channel object, and maintain the consistency and relationships between the persistent and transient objects by hand.
The overall data model design becomes simpler and more adequate when a single unique object graph at runtime is shared by both the persistent and the transient aspects.
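The IRC channel example above can be sketched in Java as one class carrying both aspects. This is an illustration only; the field split is the point, not the specific names:

```java
import java.util.HashSet;
import java.util.Set;

/** One object carries both the stored state and the live runtime state. */
class Channel {
    // Persistent attributes (written to storage by the DBMS):
    String name;
    String description;

    // Transient attributes (live only while the process runs; never stored):
    transient Set<Object> subscriberConnections = new HashSet<>();

    Channel(String name, String description) {
        this.name = name;
        this.description = description;
    }

    void subscribe(Object connection) {
        subscriberConnections.add(connection);
    }
}
```

With a unique object graph, every part of the application that fetches this channel sees the same instance, so the subscriber set is never silently lost to cache eviction, and no parallel "runtime channel" class is needed.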
* Relational Data Traversal for Embedded DB Access
SQL itself can never simply turn Object-Oriented; with SQL as the primary access method, we cannot avoid the impedance mismatch problems [3, Ambler, ORIM].
SQL's nature is querying, and almost no Object Query Language ever defined has been as comfortable and graceful at this job as SQL is. A funny fact: when querying, if an OQL doesn't look like SQL, programmers will probably wrap it until it does.
SQL is not bad at all when used to find data. But accessing an interconnected object graph is not all about querying. In Object-Oriented thinking, the natural way to access structured data is to traverse from one node to its related nodes, rather than to query by criteria.
A consequence of the Relational Model is that one Tuple can be related to other Tuples by the existence of certain Tuples in a particular Relation serving this connective purpose.
SQL reflects these connectivities by means of JOINs within query criteria. This approach fits perfectly with the RAM-cache-atop-disk-storage architecture, where applications are never given pointers to the actual data, but instead specify the data to manipulate by logical criteria.
In the Object-Oriented view, the same consequence can be stated as: one Object can be related to other Objects by the existence of certain Relation Objects in a particular Relation Class serving this connective purpose.
But the conventional Object-Oriented data paradigm does not distinguish this sort of relationship from normal object references, with the result that conventional OODBMSs choose to reflect such connectivities by simply keeping references from one object to other objects. In this way, they actually fall into the Network Model [2, Chen, 1976] rather than the Relational Model in question.
In favor of the Relational Model as well as the Object-Oriented analysis and design paradigm, we have to introduce two things into the object space:
1. Relation Class
A new base class for all relation types, i.e. those intended to represent relationships between other objects.
2. Persistent Reference
A new reference type, distinguished from normal object reference types, used to reach such indirectly related objects.
With these two defined, we can maintain object relationships in the Relational way, that is, by manipulating objects of Relation Classes. With further help from code generation technologies, we can simply declare special Persistent References on the objects to be connected, referencing their related peers, and leave the correctness and integrity of these Persistent References to be maintained by the database management system.
Finally, traversing persistent object graphs represented this way feels as natural as traversing plain old in-memory data structures that matured years ago. Even better, you can directly encapsulate the relevant logic as object behavior as well.
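The two constructs can be sketched in Java as follows. Everything here is illustrative: the `Relation` base class and the manually maintained reference lists stand in for what the proposed DBMS plus code generation would supply and keep consistent automatically:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class for all relation types. */
abstract class Relation { }

class User {
    String nick;
    // Persistent References: in the proposed system these would be declared
    // and maintained by the DBMS from the Subscription relation objects.
    List<ChatChannel> channels = new ArrayList<>();

    User(String nick) { this.nick = nick; }
}

class ChatChannel {
    String name;
    List<User> subscribers = new ArrayList<>();

    ChatChannel(String name) { this.name = name; }
}

/** A relation object whose existence connects one User to one ChatChannel. */
class Subscription extends Relation {
    final User user;
    final ChatChannel channel;

    Subscription(User user, ChatChannel channel) {
        this.user = user;
        this.channel = channel;
        // In this sketch we update the Persistent References by hand;
        // the real system would generate and enforce this maintenance.
        user.channels.add(channel);
        channel.subscribers.add(user);
    }
}
```

Once the relation object exists, navigation such as `user.channels.get(0).subscribers` is plain in-memory traversal, with the connectivity still modeled relationally, by the existence of `Subscription` objects, rather than by ad hoc network-style pointers.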
* Hosting Based Interfacing for Distributed DB Access
So far, interfacing (communication) between software components has been based on invocations: the caller sends a request to a target component, then waits for that component to return a response computed according to its interfacing contract.
Most communications are synchronous, i.e., the caller blocks waiting for the response from the target component. There are also asynchronous variants, normally called callback mechanisms: the caller registers some callable routines with the target component before sending asynchronous requests. After an asynchronous request has been sent, the caller continues immediately with other jobs at hand. The called component may produce results at a later time, and then delivers the resulting information or notification via the previously registered callback routines.
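Both styles can be shown in a few lines of Java. This is a generic illustration of the familiar mechanisms, not part of the proposed system; the `EchoService` name and behavior are made up:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Invocation Based Interfacing: the target component defines the contract. */
class EchoService {
    /** Synchronous call: the caller blocks until the response returns. */
    String call(String request) {
        return "echo:" + request;
    }

    /** Asynchronous call: the caller registers a callback and continues immediately. */
    CompletableFuture<Void> callAsync(String request, Consumer<String> callback) {
        return CompletableFuture.runAsync(() -> callback.accept("echo:" + request));
    }
}
```

In both cases the shape of the request and of the response is fixed by the service, which is exactly the limitation the next paragraphs address.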
But all these approaches require the service logic to be proposed, announced, and interpreted by the target component: it must first expose those service functionalities as its callable interface/contract. The common thread of these approaches can be summarized as Invocation Based Interfacing, in contrast to the Hosting Based Interfacing idea proposed here.
Under Hosting Based Interfacing, the requests sent between software components are no longer pure data in formats fixed by an interface/contract. Instead, they have variable structure, and each forms a request command that maps to executable code supplied by the caller as well.
XML is quite capable of carrying request data in variable structures, and dynamic deployment technologies help the called component fetch the proper executable code to run on receipt of a command. Finally, object technologies help encapsulate these request commands into command objects, for a clearer design.
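A command object can be sketched in Java like this. The `Command` interface and `TransferCommand` are hypothetical, and a plain map stands in for the server-side data the command runs next to:

```java
import java.io.Serializable;
import java.util.Map;

/** A request command: caller-supplied data plus caller-supplied executable logic. */
interface Command<R> extends Serializable {
    R execute(Map<String, Long> db);   // runs inside the server, next to the data
}

/** Example: logic defined entirely by the caller; the server never announced it. */
class TransferCommand implements Command<Boolean> {
    final String from, to;
    final long amount;

    TransferCommand(String from, String to, long amount) {
        this.from = from;
        this.to = to;
        this.amount = amount;
    }

    public Boolean execute(Map<String, Long> db) {
        long src = db.getOrDefault(from, 0L);
        if (src < amount) return false;          // insufficient funds: reject
        db.put(from, src - amount);
        db.put(to, db.getOrDefault(to, 0L) + amount);
        return true;
    }
}
```

The server only needs to know how to host and run `Command` objects; the concrete logic travels with the request, which is the essential difference from a fixed invocation contract.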
The Stored Procedure mechanism in most contemporary relational database management systems partially meets this idea, and SPs show superior execution performance since they run right inside the database server. However, by tradition SPs are usually defined by DBAs rather than application developers, and are resistant to change during evolutionary application development. Worse, stored procedures are usually written in feature-limited, proprietary programming languages, and can hardly be dynamically deployed.
For a database system, if distributed applications access the server through executable command objects written in the same programming language as the DBMS, instead of through sequential combinations of SQL commands, then you can reasonably expect:
1. When designing command objects as part of database applications, the database can simply be viewed as an embedded instance, so all the benefits, including the unique object graph, become available.
2. Superior performance, possibly even better than stored procedures when commands are properly written (because they are in the DBMS's native programming language rather than SP-specific languages).
3. Power, flexibility, and robustness as offered by the general-purpose programming language in which the DBMS itself is coded.