Opinion: An index is not a book

  1. Opinion: An index is not a book (234 messages)

    Charles Armstrong points out that sometimes relational databases aren't exactly the best way to actually store data. Mr. Armstrong goes on to challenge readers to "overcome the mental block that has gotten us thinking storage and query are the same thing," primarily focusing on object databases.

    In his recent blog post "An index is not a book", Charles Armstrong writes:
    Obvious right? Well then why does everyone use relational databases for storage?

    Relational databases are great for query. They are analogous to the index of a book. Relational databases are a terrible storage mechanism. Anyone who has tried to map a complex object graph to a relational storage mechanism will know this.

    Object databases are terrible for queries. Anyone who has tried to do queries on an object database knows this.

    So why not use a relational database to index an object storage mechanism? Overcome the mental block that has gotten us thinking storage and query are the same thing. Separate them. Don't confuse the index with the book.

    Threaded Messages (234)

  2. sigh

    Another contender for our "silliest blog brainfart entry, 2005 edition"...
  3. Because...

    It turns out that if you pay attention to the overall domain, data is much more valuable when you can read it than if you can just write it.

    So, when it really comes down to it, the value of the data really is the value of the index.

    If a phone book were sorted by street name, it would be vastly less useful to the majority of the consumers of the data it represents, even though the data in a phone book sorted by street name is essentially identical to the data in a typical phone book sorted by name.

    Information dumped in a pile is about as useful as a huge stack of old newspapers, and about as readily referenced.

    While you certainly can't have indexes without data, indexes are what make the data valuable. The Web would be essentially worthless without search engines. HTTP may be the infrastructure of the Web as we know it, but search engines drive it and power it.

    RDBMSs are essentially the search engines for corporate data. They let us give the data a rough structure that we think matches the application domain, but they also give us the flexibility to mold and manipulate that structure very easily, giving us untold power over the data in ways that we may not even imagine when we start collecting it.

    Data is the key, and RDBMSs are designed to manipulate and manage the data. But SQL is the true enabler that makes them even more powerful.

    It turns out that with SQL, I can make really badly optimized queries that involve several full table joins and table scans in order to answer my question. But that's the true beauty of SQL.

    SQL makes the indexes and organization of the data irrelevant to the user. I can use SQL to make any view of the data that I want, and express it neatly, succinctly, and portably. This brings light to the data in all sorts of ways.

    Obviously, with lots of data, some SQL queries are worse (even far worse) than others in terms of performance, but we can tune that out using indexes and whatnot.

    It turns out that the flaming hoops we jump through to persist our data into RDBMSs end up, 9 times out of 10, well worth the trouble over a more "elegant" or "simpler" solution. Having malleable data and impromptu queries and manipulation is far more powerful than an enforced rigid structure, and most of the time it's worth paying the potential performance hit to get that flexibility.

    So, in the end, I think for most users, it IS the index that matters more than the book.
  4. the concept is right ...

    Could it be that folks are generally in a rut when it comes to persistence? I tend to agree with this. Isn't it true that businesses tend to want to use their proverbial "hammer" for everything, and that it's hard to get them to shift (skill sets, training, and so on factor into this position)? That issue aside, I could see adopting/promoting object storage and applying a search engine/indexer (like Lucene) to XML renderings of the objects for some applications.

    Persistence would be very straightforward. A query would go to the indexer first and then grab the appropriate objects, by key, from the object store. This could vastly simplify development of certain applications and allow for significant flexibility in the data model.
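    The index-first, fetch-by-key flow described above can be sketched roughly as follows. This is a minimal illustration, assuming an in-memory map stands in for the object store and a hand-rolled inverted index stands in for an indexer such as Lucene; the class IndexedObjectStore and its put and query methods are hypothetical names, not the API of any product mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: an inverted index placed in front of a key/value object store.
final class IndexedObjectStore {
    // The "book": whole objects, stored and fetched by key.
    private final Map<String, Object> objectsByKey = new HashMap<>();
    // The "index": lower-cased search term -> keys of the objects tagged with it.
    private final Map<String, Set<String>> keysByTerm = new HashMap<>();

    // Stores the object, then records each search term against its key.
    void put(String key, Object value, String... terms) {
        objectsByKey.put(key, value);
        for (String term : terms) {
            keysByTerm.computeIfAbsent(term.toLowerCase(Locale.ROOT), t -> new TreeSet<>()).add(key);
        }
    }

    // Step 1: ask the index for matching keys. Step 2: fetch those objects by key.
    List<Object> query(String term) {
        Set<String> keys = keysByTerm.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<String>emptySet());
        List<Object> hits = new ArrayList<>();
        for (String key : keys) {
            hits.add(objectsByKey.get(key));
        }
        return hits;
    }
}
```

    The store never interprets the objects it holds; only the index decides what can be found, which is the separation of "index" and "book" the original post argues for.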

    Though not for all applications, I think there are some significant benefits to this model if you're willing to think a little outside of "the box" :)
  5. Those who do not learn from history are condemned to repeat it.

    File-based storage systems have been tried in the past and found to be woefully inadequate. Therefore, in the 1970s people moved to relational databases because they guaranteed data integrity, reduced or eliminated duplication, and provided quite robust and easy-to-use querying abilities.

    XML and Object persistence are essentially hierarchical file storage mechanisms. File-based technologies with rigid hierarchies failed more than three decades ago. What makes you think that they will succeed this time? I am interested in finding out what we plan to do differently that will assure us of some success.

    While hierarchies and file-based persistence mechanisms encourage, nay mandate, rigid structure that may not remain true in a world of ever-changing business needs, a relational database gives no special weightage to any point of view.

    In other words, in a sample banking application, you can query the data from a customer point of view, an account point of view, or a transaction point of view. Not so easy in a hierarchy if a specific viewpoint (Customer or Account or Transaction or ...) is considered the root of the hierarchy.

    Let us say that your object model defines the Customer as the root of the hierarchy. How, then, do you find out how many transactions happened in a given period of time? Not so easy, is it? You'd have to traverse the complete list of Customers, possibly go through each account, then go through the transactions for each account and find out if it meets your criteria. Whereas, using a relational database you'd write a simple query such as select count(*) from transaction where tx_time between 'day1' and 'day2'
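    To make the traversal argument concrete, here is a rough Java sketch, assuming a Customer-rooted object graph with no index and exactly one owner per account. The classes RootedCustomer, RootedAccount, RootedTx and HierarchyWalk are hypothetical and exist only for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical Customer-rooted model: Customer -> Accounts -> Transactions.
final class RootedTx {
    final LocalDate txTime;
    RootedTx(LocalDate txTime) { this.txTime = txTime; }
}

final class RootedAccount {
    final List<RootedTx> transactions = new ArrayList<>();
}

final class RootedCustomer {
    final List<RootedAccount> accounts = new ArrayList<>();
}

final class HierarchyWalk {
    // Counts transactions with from <= txTime <= to by walking the whole graph,
    // because no index is assumed. A joint account would be visited once per
    // owner, so this sketch assumes each account has a single owner.
    static long countTransactions(List<RootedCustomer> customers, LocalDate from, LocalDate to) {
        long count = 0;
        for (RootedCustomer customer : customers) {
            for (RootedAccount account : customer.accounts) {
                for (RootedTx tx : account.transactions) {
                    if (!tx.txTime.isBefore(from) && !tx.txTime.isAfter(to)) {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

    Every customer and every account is visited even when only a handful of transactions fall inside the period, which is the cost being contrasted with the one-line SQL count.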

    Any other viewpoint is easily handled using a relational database. Not at all easy in an "object persistence" world.

    Given that OO persistence cannot guarantee data integrity or facilitate various types of queries, while a relational database does provide these guarantees, why would anyone want to use object persistence?

    The only reason I can think of is that they are unaware of the fact that these things have been tried and have failed, and are also unaware of the limitations of object technology for querying.

    Ravi
  6. XML and Object persistence are essentially hierarchical file storage mechanisms.

    This is not true.
    While hierarchies and file-based persistence mechanisms encourage, nay mandate, rigid structure

    Not at all!
    in a sample banking application, you can query the data from a customer point of view, an account point of view, or a transaction point of view. Not so easy in a hierarchy if a specific viewpoint (Customer or Account or Transaction or ...) is considered the root of the hierarchy.
    Let us say that your object model defines the Customer as the root of the hierarchy. How, then, do you find out how many transactions happened in a given period of time? Not so easy, is it? You'd have to traverse the complete list of Customers, possibly go through each account, then go through the transactions for each account and find out if it meets your criteria. Whereas, using a relational database you'd write a simple query such as select count(*) from transaction where tx_time between 'day1' and 'day2' Any other viewpoint is easily handled using a relational database. Not at all easy in an "object persistence" world.

    It is just as easy. There are query languages for object databases which are flexible and easy to write: JDOQL for example, or Smalltalk for Gemstone/S.

    Object databases need not be hierarchies in the way that relationships between objects in memory need not be hierarchies. There is no reason why object databases need to traverse all objects to locate matches any more than relational databases need to traverse all records to locate matches - they can have indexes too!
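    As a rough illustration of the "they can have indexes too" point, the sketch below keeps a sorted secondary index on transaction time so that a date-range count does not visit every object. TxTimeIndex is a hypothetical class written for this thread; it is not how Gemstone, Versant or any JDO implementation actually builds its indexes.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical secondary index: transaction date -> ids of transactions on that date.
final class TxTimeIndex {
    private final NavigableMap<LocalDate, List<String>> txIdsByDate = new TreeMap<>();

    // Called whenever a transaction object is stored, keeping the index in step.
    void add(String txId, LocalDate txTime) {
        txIdsByDate.computeIfAbsent(txTime, d -> new ArrayList<>()).add(txId);
    }

    // Counts transactions with from <= date <= to using only the sorted index;
    // entries outside the range are never touched.
    long countBetween(LocalDate from, LocalDate to) {
        long count = 0;
        for (List<String> ids : txIdsByDate.subMap(from, true, to, true).values()) {
            count += ids.size();
        }
        return count;
    }
}
```

    The sorted map plays the role a B-tree index on tx_time would play in a relational database.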
    Given that OO persistence can not guarantee data integrity or facilitate various types of queries, while a relational database does provide these guarantees, why would anyone want to use object persistence?

    Why do you say that OO persistence can't guarantee data etc.? Of course it can! There are well-established object databases such as Gemstone and Versant that have been providing this for a very long time.
    The only reason I can think of is that they are unaware of the fact that these things have been tried and have failed, and are also unaware of the limitations of object technology for querying. Ravi

    Have you tried a product such as Gemstone?

    Versant's object database is a quality product that has recently gained the capabilities of JDO (and soon JDO 2.0).

    I don't believe the limitations you are describing really exist in modern object databases.
  7. XML and Object persistence are essentially hierarchical file storage mechanisms.
    This is not true.
    Most OO databases are using the network model. That model was also abandoned 25 years ago.
    There are query languages for object databases which are flexible and easy to write: JDOQL for example, or Smalltalk for Gemstone/S. Object databases need not be hierarchies in the way that relationships between objects in memory need not be hierarchies. There is no reason why object databases need to traverse all objects to locate matches any more than relational databases need to traverse all records to locate matches - they can have indexes too!
    If an OO database has the same query capabilities as relational databases, if it has relations and if it has indexes, it is a relational database with extra OO features added (hybrid object-relational).
  8. Most OO databases are using the network model. That model was also abandoned 25 years ago.

    Funny!

    What I want to tell you is that 10 years ago (1995) I worked on warehouse software. We used DB_Vista (Raima Velocis). On a Pentium 233 with 64 MB of RAM we had a 10-gigabyte database that ran faster than you can imagine.

    Network-model databases work much quicker than relational databases. Period. OK, so why are they not popular now? Because it is a difficult technology. You must take care of many things. And when development became a commodity, stupid guys came into the development field (which was previously a scientific field) and started complaining that the technology was too difficult. From that time on, relational databases started to show their greatest advantage: simplification. End of story.
  9. Opinion: An index is not a book

    It seems that in many cases systems built on RDBMSs have a tendency to get polluted with things imposed by the RDBMS, such as id properties. In an ideal world this would be removed at higher levels of abstraction, but in practice it rarely is.
    OODBMSs solve this.
    The impedance mismatch between the relational and OO paradigms also often prevents seamless reuse of persistent data in RDBMSs. For example, the way one-to-many relationships are typically expressed in RDBMSs means that the "many" side cannot be reused without modification of the code or the mapping mechanism. OODBMSs do not have this problem.
    OODBMS are not the final answer to our problems with persistence however. What we need is an orthogonally persistent language/runtime-environment. Imagine everything being persistent unless declared otherwise.

    Regards
    Jan H. Hansen
  10. If an OO database has the same query capabilities as relational databases, if it has relations and if it has indexes, it is a relational database with extra OO features added (hybrid object-relational).

    I did not say it had the same query capabilities, or that it had relations: Just that it can have its own powerful querying and it can have indexes to speed searches.
  11. Steve Zara says:
    Why do you say that OO persistence can't guarantee data etc.? Of course it can!

    Could you please prove to me that it can? Or cite some reference where it proves that?

    Relational database theory is based on predicate theory and set logic. When used properly, they can guarantee the integrity of the data.

    Is there any analogue in the Object world?

    If you are storing everything in files, and somebody adds a meaningless record to the file, what then? For example, assume that you keep the Transactions in a separate file, Customer in a separate file, and Account in a separate file. What if somebody adds a transaction by manipulating the file? What if the transaction is completely meaningless? What happens to the integrity of your data then?

    Such a thing cannot happen in a properly designed database. Any attempt to add a transaction without associating it with an account or customer, as the rule may be, will simply be rejected by the DBMS (database management system) because it violates integrity constraints.
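    The kind of rejection described here can be mimicked in a few lines of Java. This is only a sketch of the idea of a referential integrity check, assuming a hypothetical in-memory LedgerWithConstraint class; it does not show how any real RDBMS or ODBMS enforces constraints.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical store that refuses transactions pointing at unknown accounts.
final class LedgerWithConstraint {
    private final Set<String> accountIds = new HashSet<>();
    private final List<String[]> transactions = new ArrayList<>();

    void addAccount(String accountId) {
        accountIds.add(accountId);
    }

    // Plays the role of a foreign-key constraint: the write is rejected
    // unless the referenced account already exists.
    void addTransaction(String txId, String accountId) {
        if (!accountIds.contains(accountId)) {
            throw new IllegalArgumentException(
                    "Transaction " + txId + " references unknown account " + accountId);
        }
        transactions.add(new String[] { txId, accountId });
    }

    int transactionCount() {
        return transactions.size();
    }
}
```

    The check only holds if every write goes through the store's own interface, which is the crux of the disagreement about someone "manipulating the file" directly.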

    Can your object databases assure me of this level of integrity? If they cannot guarantee this level of integrity, why should I replace an RDBMS (relational database management system) with whatever is being peddled as "better" than an RDBMS?

    If they can provide this level of integrity, then they are merely repeating what has been established in relational database theory. What is the benefit of replacing something with its (almost) clone? If you want me to replace my relational database with an object database that walks like an RDBMS and quacks like an RDBMS, then you must point out some significant benefits. If not, you are merely buying into the hype of the software vendors, not thinking for yourself.

    Learn about relational database theory, learn how ODBMSs are implemented, then do the comparison. And if you then decide that an OODBMS makes sense for you, I'll respect your decision.
  12. Steve Zara says: Why do you say that OO persistence can't guarantee data etc.? Of course it can! Could you please prove to me that it can? Or cite some reference where it proves that?

    I can't, as I don't know the theory. However, I know that Object Databases are being used in critical roles in industries such as banking and insurance where data integrity is vital.

    Here are the URLs of two vendors of object databases:

    www.gemstone.com
    www.versant.com

    I would suggest you e-mail them and tell them you don't believe their products can guarantee data integrity. I believe the resulting correspondence would be interesting!
    If you are storing everything in files, and somebody adds a meaningless record to the file, what then? For example, assume that you keep the Transactions in a separate file, Customer in a separate file, and Account in a separate file. What if somebody adds a transaction by manipulating the file?

    Why do you keep insisting that Object Databases store things in some sort of filesystem structure?
    Learn about relational database theory, learn how ODBMSs are implemented, then do the comparison. And if you then decide that an OODBMS makes sense for you, I'll respect your decision.

    I know enough about databases to understand that learning relational theory is irrelevant to whether or not a particular database engine guarantees integrity. Do you have detailed knowledge about how Oracle physically stores its data? Or MySQL? Or PostgreSQL?

    I'm personally not a user of object databases, and don't have that much interest in them for current projects, but I do know enough about them to know that saying that they are either generally slow or can't guarantee data integrity is nonsense.
  13. Steve, you are making my points for me when you say:
    Here are the URLs of two vendors of object databases:

    www.gemstone.com
    www.versant.com


    I told you that you must not buy into the hype provided by vendors, but start thinking for yourself. And all you do is provide me the URLs of two software vendors!
    Why do you keep insisting that Object Databases store things in some sort of filesystem structure?

    How else do they store the data?

    Ultimately, everything is stored in files, even data in a relational database.

    What a relational database provides to you in addition to data integrity and ease of manipulation is the idea of logical-physical independence. The concept of a "logical" table is one of the key ideas of relational theory. You, as a developer, never have to know where the data is actually stored, whether it is on one file, or many, on one server or many.

    A developer using Object databases must know of the files that are used to store the data. Can you provide me any reference to show that an Object DBMS developer can just start from scratch and decide to persist data without ever knowing what file it is stored in?
  14. Steve, you are making my points for me when you say:
    Here are the URLs of two vendors of object databases: www.gemstone.com www.versant.com
    I told you that you must not buy into the hype provided by vendors, but start thinking for yourself. And all you do is provide me the URLs of two software vendors!

    Well, those URLs contain documentation and free evaluation products. If you choose to look at that documentation and try the products you will see for yourself what I mean.
    Can you provide me any reference to show that an Object DBMS developer can just start from scratch and decide to persist data without ever knowing what file it is stored in?

    Yes. I already have - the documentation on those sites. Or you may wish to look at

    http://www.service-architecture.com/object-oriented-databases/articles/index.html

    There are plenty of articles and FAQs.
  15. You, as a developer, never have to know where the data is actually stored, whether it is on one file, or many, on one server or many. A developer using Object databases must know of the files that are used to store the data.

    I don't use OODBMS, nor do I like finding myself defending a technology that I don't espouse the use of, but it's obvious that (at a conceptual level) an OODBMS would require a file name no more often than an RDBMS would.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  16. I don't use OODBMS, nor do I like finding myself defending a technology that I don't espouse the use of, but it's obvious that (at a conceptual level) an OODBMS would require a file name no more often than an RDBMS would.
    What is an OODBMS? If it is persistent storage for a "complex object graph", then "java.io.ObjectOutputStream" is an OODBMS by definition, is it not?
    "stream.writeObject(complexObjectGraph)"
    An RDBMS has persistent storage at a low level too, but you do not need to care about persistence if you use an RDBMS; the database manages it (Transparent Persistence). An RDBMS user just manipulates or queries data in a declarative way, without any files or code. RDBMSs are popular for very simple reasons: they are easy to use, secure and performant.
  17. Opinion: An index is not a book

    An RDBMS has persistent storage at a low level too, but you do not need to care about persistence if you use an RDBMS; the database manages it (Transparent Persistence). An RDBMS user just manipulates or queries data in a declarative way, without any files or code.

    Just change RDBMS for ODBMS (Object database management system) and you get an equally correct statement:

    "An ODBMS has persistent storage at a low level too, but you do not need to care about persistence if you use an ODBMS; the database manages it (Transparent Persistence). An ODBMS user just manipulates or queries objects in a declarative way, without any files or code."
    RDBMSs are popular for very simple reasons: they are easy to use, secure and performant.

    True, but they are also popular because they are popular - it's a self-propagating system, maintaining the myth that they are the only secure, reliable and robust way to store data.
  18. Opinion: An index is not a book

    "Just change RDBMS for ODBMS (Object database management system) and you get an equally correct statement"
    Probably it means an ODBMS is the same thing. A relational database is an object database management system; it is object oriented and manages objects without problems too.
  19. Opinion: An index is not a book

    "Just change RDBMS for ODBMS (Object database management system) and you get an equally correct statement" Probably it means an ODBMS is the same thing. A relational database is an object database management system; it is object oriented and manages objects without problems too.

    I'm sorry but that makes absolutely NO sense!!
  20. Opinion: An index is not a book

    "Just change RDBMS for ODBMS (Object database management system) and you get an equally correct statement" Probably it means an ODBMS is the same thing. A relational database is an object database management system; it is object oriented and manages objects without problems too.
    I'm sorry but that makes absolutely NO sense!!

    Probably what he means is that you can use relational databases to store objects. Which, of course you can, because in principle you can use them to store anything.

    The statement that 'relational databases are object oriented' reminds me of those tiresome C fans who insist that C is object oriented because you can sort of do object-type things if you are really clever with structures and pointers to functions.

    I think that IT is exciting because there is a constant sense of innovation and there is always something new to learn, so I find it sad when I encounter people who become skilled in one paradigm (such as relational databases) and then can't see anything beyond that paradigm, or become defensive - anything that doesn't match their skillset can't possibly be of use, and anyone who doesn't use their approach must be either ignorant or misguided.

    Relational databases are a vital technology in IT, but to say that they are the only way that data should ever be stored, and any other mechanism is flawed is a sign of rigid thinking and lack of experience, in my opinion.
  21. Opinion: An index is not a book

    The statement that 'relational databases are object oriented' reminds me of those tiresome C fans who insist that C is object oriented because you can sort of do object-type things if you are really clever with structures and pointers to functions.
    Can you define "Object Oriented" without analogies and C pointers?
  22. Opinion: An index is not a book

    The statement that 'relational databases are object oriented' reminds me of those tiresome C fans who insist that C is object oriented because you can sort of do object-type things if you are really clever with structures and pointers to functions.
    Can you define "Object Oriented" without analogies and C pointers?

    Absolutely. I was a Smalltalk developer for a long time. I never needed analogies or 'C pointers' to describe how Smalltalk worked.
  23. Opinion: An index is not a book

    I see you are an OODBMS expert, so perhaps you can help me. I like to learn, but I cannot find any formal definition of the object model. Do you know of a reference without marketing BS? BS confuses me.
  24. Opinion: An index is not a book

    I see you are an OODBMS expert, so perhaps you can help me. I like to learn, but I cannot find any formal definition of the object model. Do you know of a reference without marketing BS? BS confuses me.

    I am not an expert by any standards, and I know of no formal models. I'll let you into a secret - almost no aspect of IT is based on formal models! When something is based on formal models, it is always incomplete in some way. You see, most of us are pragmatic - we use what works. For example, formal proof of correctness of code was all the rage in the early 1980s, but was soon found to be impractical in almost all cases. I have also been a Prolog developer in the 80s, and although it was nice and formal, we actually had to use lots of nasty 'cuts' for practical development. IT is like that.


    Your relational database may be based on a formal definition, but the actual implementations are always flawed in some way. MySQL is a relational database, but given a choice between that and a Versant or Gemstone object database to handle real commercial data, I would choose the latter without even a millisecond's hesitation.

    A good developer bases their decisions on the reputation and quality of a product, not on how, in some ideal mathematical formal system, the product should work.
  25. Opinion: An index is not a book

    MySQL is a relational database
    Yes, MySQL is a relational database, but old versions are not database management systems.
    Your favorite products probably have a formal model too, but I do not know it, and I cannot buy this stuff without information. I cannot accept BS and analogies as information.
  26. Opinion: An index is not a book

    MySQL is a relational database
    Yes, MySQL is a relational database, but old versions are not database management systems. Your favorite products probably have a formal model too, but I do not know it, and I cannot buy this stuff without information. I cannot accept BS and analogies as information.

    Why are you asking me? I am not selling you anything. If you want detailed information for a purchase I am sure that Object Database vendors will provide you with all the information you need, including technical details of how they work. www.versant.com, www.gemstone.com are a couple.
  27. Opinion: An index is not a book

    OK, forget it. I see you do not have information about OODBMSs either; probably it does not exist and there is nothing to talk about.
  28. Opinion: An index is not a book

    Wow, just one day not reading TSS and look how much I've missed!

    Ravi, can you clarify the data and relationships you envisage for the query below so that I can have a go with JDOQL?
    Select account_id, count(customer_id) from t_c_c group by account_id

    For example, is "t_c_c" a join table between the Account (primary key "account_id") and Customer (primary key "customer_id") tables? Do you expect one account to have many customers, or one customer to have many accounts, or either (many-many)? Can you give me a small example of a few rows in "t_c_c" and the result of your query when run against that set of data?

    That would be great. Thanks.
  29. Opinion: An index is not a book

    Wow, just one day not reading TSS and look how much I've missed!
    Please don't leave us to fend for ourselves again! :)
  30. Hi Robin,

    Sorry for not replying earlier.

    t_c_a [typo there, is short for customer_account] has the columns cust_id and acct_id, with the combination (cust_id, acct_id) being the primary key. (Other columns are irrelevant for our purpose.)

    I am assuming a many-to-many relationship between customers and accounts. That is, one account can belong to many customers (joint accounts) and a customer can have more than one account.

    Sample data (cust_id followed by acct_id):
    R1: C1, A1
    R2: C2, A1
    R3: C1, A3
    R4: C3, A2
    R5: C4, A2

    The result of the query would be:
    acct_id  count(cust_id)
    -------  --------------
    A1       2
    A2       2
    A3       1
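    For readers who want to check the expected result without a database, here is a small Java sketch of the same GROUP BY over the five sample rows. The class GroupByAccount and its method are hypothetical; the rows are exactly the ones listed above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory equivalent of:
//   select acct_id, count(cust_id) from t_c_a group by acct_id
final class GroupByAccount {
    // Each row is { cust_id, acct_id }. Because (cust_id, acct_id) is the
    // primary key, counting rows per acct_id counts customers per account.
    static Map<String, Integer> customersPerAccount(String[][] rows) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by acct_id
        for (String[] row : rows) {
            counts.merge(row[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] rows = {
            { "C1", "A1" }, { "C2", "A1" }, { "C1", "A3" }, { "C3", "A2" }, { "C4", "A2" }
        };
        System.out.println(customersPerAccount(rows)); // prints {A1=2, A2=2, A3=1}
    }
}
```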

    ---------------------------------------------


    Ravi
  31. Query cracked with JDOQL

    Hi Ravi

    I created a new project, built the Account and Customer classes (package com.ravi.domain), populated them as per your many-to-many join table, and wrote the query.

    Account has a field called "customers" which is a Collection of Customer instances. Likewise, Customer has a collection of Account instances called "accounts". Account instances live in the "Account" table. Customer instances live in the "Customer" table. The many-many relationship is resolved in the "Account_Customer" table which is analogous to your "t_c_a" table.

    Here's the JDOQL query:
    SELECT a, count(this) FROM com.ravi.domain.Customer WHERE ((com.ravi.domain.Account)a).customers.contains(this) GROUP BY a

    The typecast of "a" is the implicit definition of "a" as a variable of type "Account". I ran it against a SQL database, but the same query is naturally portable to any object database for which a JDO implementation exists.

    Here is the code for executing the query and iterating through the results:
    // Fragment: presumably "pm" is an open PersistenceManager and "t" its current Transaction.
    System.out.println("Running query");
    Query q = pm.newQuery("SELECT a, count(this) FROM com.ravi.domain.Customer WHERE ((com.ravi.domain.Account)a).customers.contains(this) GROUP BY a");
    t.begin();
    // Each result row is an Object[] holding the Account and its customer count.
    Collection results = (Collection) q.execute();
    Iterator iter = results.iterator();
    while (iter.hasNext()) {
        Object[] o = (Object[]) iter.next();
        Account a = (Account) o[0];
        int count = ((Long) o[1]).intValue();
        System.out.println(a + " , " + count);
    }
    t.commit();
    System.out.println("Query complete");

    The printed results were (My toString() on Accounts enumerates its Customers):
    Running query
    A1 [ C1 C2 ] , 2
    A2 [ C4 C3 ] , 2
    A3 [ C1 ] , 1
    Query complete

    Kind regards, Robin.
  32. Query cracked with JDOQL

    SELECT a, count(this) FROM com.ravi.domain.Customer WHERE ((com.ravi.domain.Account)a).customers.contains(this) GROUP BY a
    The typecast of "a" is the implicit definition of "a" as a variable of type "Account". I ran it against a SQL database, but the same query is naturally portable to any object database for which a JDO implementation exists.
    Are you sure all JDO implementations will print the same result? How do you define JDOQL semantics?
  33. Query cracked with JDOQL

    Hi Juozas
    Are you sure all JDO implementations will print the same result? How do you define JDOQL semantics?

    I believe the query is valid according to the JDOQL Grammar (see the JDO 2.0 Public Draft specification), but I'm open to comments from anyone who knows or suspects otherwise.

    I trust that these semantics will be the subject of a few TCK tests which Sun Microsystems is kindly authoring for us.

    I know that any JDO 2.0-compliant implementation must pass the TCK tests, regardless of the target database (Object/Relational/other).

    The JDO 2.0 "preview features" available *today* in many JDO implementations are not subject to a TCK, so I cannot promise that the query is portable *today*. I ran it with Kodo JDO version 3.2.4.

    Kind regards, Robin.
  34. Query cracked with JDOQL[ Go to top ]

    Hi Juozas
    Are you sure all JDO implementations will print the same result? How do you define JDOQL semantics?
    I believe the query is valid according to the JDOQL Grammar (see the JDO 2.0 Public Draft specification), but I'm open to comments from anyone who knows or suspects otherwise. I trust that these semantics will be the subject of a few TCK tests which Sun Microsystems is kindly authoring for us. I know that any JDO 2.0-compliant implementation must pass the TCK tests, regardless of the target database (Object/Relational/other). The JDO 2.0 "preview features" available *today* in many JDO implementations are not subject to a TCK, so I cannot promise that the query is portable *today*. I ran it with Kodo JDO version 3.2.4. Kind regards, Robin.

    I am sure this grammar is valid, but I want to understand the semantics. Are you going to explain it by analogy and empirical tests?
  35. Query cracked with JDOQL[ Go to top ]

    Hi Juozas
    I am sure this grammar is valid, but I want to understand the semantics. Are you going to explain it by analogy...

    The query uses "a" which is an "unbound query variable" of type Account.

    Whereas a "bound query variable" occurs within a contains constraint, e.g. customers.contains(c), in which "c" would logically iterate all Customer instances in the "customers" collection, the unbound variable "a" does not appear within contains(). Therefore it is not constrained to the contents of any one collection of Account instances. Therefore it logically iterates ALL Account instances in the database.

    (Note my use of logically iterates - this does not mean that iteration physically takes place at runtime.)

    So in my Query "a" is any single instance of all persistent Account objects. What is returned is the count of customers which are contained by a's "customers" collection, grouped by a, where a is each Account in turn.

    For further details please look at "unbound query variables" in any JDO specification, version 1.0 or above. Unbound query variables offer substantial power to the developer but are not as widely used as they could be, probably because many developers, even many JDO developers, are unaware of their power.
    ... and empirical tests?

    My query executed with Kodo JDO 3.2.4 is an empirical test. I'm happy to supply the small project zip file for anyone else wishing to reproduce it. What I do not offer is a mathematical proof of correctness.... ;-)
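
    The "logically iterates" reading of the unbound variable can be sketched in a few lines of plain Python. This is only an illustration of the semantics described above, not how Kodo or any other JDO implementation evaluates the query; the class shapes, the function name and the sample data are all assumptions made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: instances compare by identity, like persistent objects
class Customer:
    name: str


@dataclass(eq=False)
class Account:
    number: str
    customers: list = field(default_factory=list)


def count_customers_per_account(all_accounts, all_customers):
    """Logical reading of:
    SELECT a, count(this) FROM Customer
    WHERE a.customers.contains(this) GROUP BY a
    """
    counts = {}
    for a in all_accounts:              # unbound variable "a": every Account instance
        for this in all_customers:      # candidate class: every Customer instance
            if this in a.customers:     # the contains() filter
                counts[a.number] = counts.get(a.number, 0) + 1  # GROUP BY a
    return counts


# Hypothetical sample data.
alice, bob = Customer("alice"), Customer("bob")
joint = Account("A-1", [alice, bob])
solo = Account("A-2", [bob])
empty = Account("A-3")

counts = count_customers_per_account([joint, solo, empty], [alice, bob])
print(counts)  # {'A-1': 2, 'A-2': 1}
```

    Note that the account with no customers never satisfies the filter, so it produces no group at all, much as an inner join over an intersection table would behave.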
  36. Query cracked with JDOQL[ Go to top ]

    Thank you Robin for pointing out how to do this using JDO QL.

    You have demonstrated that it is so much more difficult to write even a simple query against an ODBMS. Just look at the code for the SQL and the JDOQL. The SQL is clear, anybody with a bit of understanding of SQL will know what is the intent of the query.

    For the JDOQL, you have to really go through the code to understand what the developer wants to do. You create so many objects (a Collection, and all account objects, for example). You actually have to do so much work after executing the query.

    Yes, you will get the same result. That is never in doubt. You can get the same result using C, Pascal, or any language you choose to. But the important thing is how much effort is required to get the result. In this case, I feel that SQL is better. And hence, I'd prefer SQL databases.

    Which is why I prefer the expressive power and conciseness of SQL, along with its ability to write ad-hoc queries, to JDO QL or any OO QL.

    Ravi
  37. Query cracked with JDOQL[ Go to top ]

    Thank you Robin for pointing out how to do this using JDO QL. You have demonstrated that it is so much more difficult to write...
    Wow. What you've missed/ignored is that with Robin's query, you already have Account objects. Your SQL - only a recordset of account_ids. You still need to get accounts. And if you want to work with Domain objects you need to populate them too.

    As for much more difficult to write - not really. Take away package/namespaces and it is very close to what you have and is very readable.

    Now take very complex query in SQL and compare it to the equivalent in an OO query language. Just did one with Hibernate and you wouldn't believe the work it saved me (and you probably wouldn't).
  38. Query cracked with JDOQL[ Go to top ]

    Hi Ravi
    Thank you Robin...

    My pleasure.
    You have demonstrated that it is so much more difficult to write even a simple query...

    Be careful here - I agree that this particular query is easier to write and to understand in SQL.
    Yes, you will get the same result. That is never in doubt.

    Interesting change of direction. The original premise seemed to be that this was not possible except with SQL. I have shown that it can be portably expressed in object model terms which are independent of the database paradigm.
    But the important thing is how much effort is required to get the result.

    At first glance it was clear that your particular SQL might be hard to express in JDOQL, and might have been proposed precisely because of that. Which is fine and fair - I'm not at all surprised that SQL is more elegant for that particular query. Remember my object model does not even have an object which represents the Account_Customer abstraction - I have only Account and Customer and the intersection table is merely a database artefact. The query might be easier in JDOQL if the intersection table was materialized as an AccountCustomer object in the model, but I don't usually do that in real-world models (unless the intersection carries state beyond the pair of foreign keys) and felt that doing so here might be construed as "cheating".
    Which is why I prefer the expressive power and conciseness of SQL, along with its ability to write ad-hoc queries, to JDO QL or any OO QL.

    And this is where your narrow focus on one particular query (or type of query) leads to a conclusion which is written as if it is universally applicable.

    Firstly JDOQL is ad-hoc; the String (sorry Carl) which is passed to the newQuery() method is not necessarily a constant.

    Secondly here is a counter example.
    SELECT FROM com.frontoffice.dealing.InterestRateSwapTrade WHERE tradeId = :tradeId

    The above JDOQL selects the trade object with the given tradeId parameter. What you get is an InterestRateSwapTrade instance. Regardless of the fact that InterestRateSwapTrade is actually a subclass of SwapTrade which is itself a subclass of AbstractCapitalMarketsTrade. These live in separate tables, so the one line of JDOQL represents a 3-way table join in SQL, as well as "inflation" of the Java object from the SQL result set.

    The above is a real-world example (package and class names changed to protect the innocent).

    Kind regards, Robin.
  39. Query cracked with JDOQL[ Go to top ]

    Hi Robin, You say:
    At first glance it was clear that your particular SQL might be hard to express in JDOQL, and might have been proposed precisely because of that.

    All the discussion about the customer account issue was mainly with regards to the data integrity issues of many to many relationships.

    Yes, and it is clear that there are many queries that are much easier to express in SQL.

    The other query that you cite can be done in SQL, too, using views or user-defined functions in a query, which is essentially what your Java code is doing behind the scenes.

    In that case, the SQL query would simply become:

      Select * from v_interest_rate_swap where trade_id = 'X';

    Just like an OO model would define the class and its hierarchy once, a database developer would define the query once and the developers never need know the implementation details. (Aside: Wow, hiding implementation detail in SQL, that sounds like a practice possible using only OO, doesn't it?)

    Ravi
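
    To make the two positions concrete, here is a minimal sketch of a table-per-class layout for the three-level trade hierarchy, with the three-way join hidden behind a view named as in the post above. It uses SQLite through Python's sqlite3 module purely as a stand-in for any RDBMS; the table names, the column names and the sample row are assumptions for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per class in the hierarchy (hypothetical columns).
CREATE TABLE capital_markets_trade (
    trade_id   TEXT PRIMARY KEY,
    trade_date TEXT NOT NULL
);
CREATE TABLE swap_trade (
    trade_id TEXT PRIMARY KEY REFERENCES capital_markets_trade(trade_id),
    notional REAL NOT NULL
);
CREATE TABLE interest_rate_swap_trade (
    trade_id   TEXT PRIMARY KEY REFERENCES swap_trade(trade_id),
    fixed_rate REAL NOT NULL
);

-- The 3-way join is written once, inside the view.
CREATE VIEW v_interest_rate_swap AS
SELECT t.trade_id, t.trade_date, s.notional, i.fixed_rate
FROM interest_rate_swap_trade i
JOIN swap_trade s            ON s.trade_id = i.trade_id
JOIN capital_markets_trade t ON t.trade_id = s.trade_id;

INSERT INTO capital_markets_trade VALUES ('X', '2005-01-31');
INSERT INTO swap_trade VALUES ('X', 1000000.0);
INSERT INTO interest_rate_swap_trade VALUES ('X', 0.045);
""")

# The caller's query is as short as the one-line JDOQL.
row = conn.execute(
    "SELECT * FROM v_interest_rate_swap WHERE trade_id = ?", ("X",)
).fetchone()
print(row)
```

    What the view does not do is "inflate" the row into an InterestRateSwapTrade instance; that step still belongs to the application or to a mapping layer.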
  40. Query cracked with JDOQL[ Go to top ]

    The above JDOQL selects the trade object with the given tradeId parameter. What you get is an InterestRateSwapTrade instance. Regardless of the fact that InterestRateSwapTrade is actually a subclass of SwapTrade which is itself a subclass of AbstractCapitalMarketsTrade. These live in separate tables, so the one line of JDOQL represents a 3-way table join in SQL, as well as "inflation" of the Java object from the SQL result set. The above is a real-world example (package and class names changed to protect the innocent). Kind regards, Robin.
    It is probably not the best way, but it is one of the right ways to define the mapping from JDOQL grammar to SQL grammar (it is possible to define this mapping using a formal grammar itself).
    BTW you do not need any joins to express entity generalization and implement it using the relational model (it is a database modeling mistake to map entities to tables directly).
  41. BTW you do not need any joins to express entity generalization and implement it using relational model (It is a database modeling mistake to map entities to tables directly)
    Can you show us how you would do it, and still maintain proper data normalization?
  42. BTW you do not need any joins to express entity generalization and implement it using relational model (It is a database modeling mistake to map entities to tables directly)
    Can you show us how you would do it, and still maintain proper data normalization?
    I think you know this mapping yourself. You use ORM, don't you? ER modeling has the same concepts as OR mapping. A good way to transform generalization is to use a "discriminator" or "classifier" field (the "discriminator" term is used for aggregation in ER modeling). This stuff is well known; it is defined in the EJB3 specification too. Generalization does not produce any relationships in the relational model.
  43. Generalization does not produce any relationships in relational model.
    Again, can you show us a simple example?
  44. Generalization does not produce any relationships in relational model.
    Again, can you show us a simple example?
    Why do you need this trivial example? You are an ORM expert yourself. The motivation for the "flat" mapping is the same as for mapping a "one-to-one" conceptual relationship to a single table in the relational model. The "anti-motivation" for this mapping is the "waste of disk space" for null values; for this reason you need to declare nullable fields last (so the database will be able to optimize disk usage).
  45. Query cracked with JDOQL[ Go to top ]

    BTW you do not need any joins to express entity generalization and implement it using relational model (It is a database modeling mistake to map entities to tables directly)

    I prefer to call it a choice, rather than a mistake. Juozas is right - one can map an inheritance hierarchy into a single table using what is known as a "flat" mapping with class discrimination.
  46. Query cracked with JDOQL[ Go to top ]

    BTW you do not need any joins to express entity generalization and implement it using relational model (It is a database modeling mistake to map entities to tables directly)
    I prefer to call it a choice, rather than a mistake. Juozas is right - one can map an inheritance hierarchy into a single table using what is known as a "flat" mapping with class discrimination.
    Yes, but it would be very denormalized in some situations. Keeping proper data integrity in this kind of mapping, especially when you have relationships between different entities mapped to this single table, is very cumbersome.

    For example: Class customer has e-mail field, and inherits from class person, which has name field. Map these 2 to the same table in RDBMS, PERSONS, and have the following integrity check applied to it: customer's email should not be null. It is doable, using triggers. Now add another type of person to the hierarchy, like DEPENDENT inherits from PERSON, with a mandatory relationship to yet another type of person (EMPLOYEE with not null DEPARTMENT field), but which cannot be to a CUSTOMER. If you map all this hierarchy to a single table, things will get pretty messy.

    PS: you wouldn't have to worry about how to do this mapping properly if you were using a OODB... ;)

    Juozas, is this how you map entity generalizations to RDBMS?
  47. Yes, but it would be very denormalized in some situations. Keeping proper data integrity in this kind of mapping, especially when you have relationships between different entities mapped to this single table, is very cumbersome. For example: Class customer has e-mail field, and inherits from class person, which has name field. Map these 2 to the same table in RDBMS, PERSONS, and have the following integrity check applied to it: customer's email should not be null. It is doable, using triggers. Now add another type of person to the hierarchy, like DEPENDENT inherits from PERSON, with a mandatory relationship to yet another type of person (EMPLOYEE with not null DEPARTMENT field), but which cannot be to a CUSTOMER. If you map all this hierarchy to a single table, things will get pretty messy. PS: you wouldn't have to worry about how to do this mapping properly if you were using a OODB... ;) Juozas, is this how you map entity generalizations to RDBMS?
    I do not need any cool conceptual model features in the relational model; I just need simple and performant queries or updates in this layer. A "nice" model is not a reason for "complex graph" traversal.
    BTW normalization is defined by functional dependency, not by ER modeling concepts. It is very normal to have different conceptual and logical models (a tool can do this transformation automatically, and it is possible to find functional dependencies in the conceptual model too).
    I develop MDA tools at this time, this stuff uses the same old ER modeling ideas too and graph rewriting or "model to model" transformation is the most important aspect in the "model driven architecture" (We use "PIM" and "PSM" terms for this stuff).
  48. Juozas, can you provide me a simple YES or NO to my question?

    And if it is a NO answer, could you enlighten me with a simple and direct example of how you would do it (map entity generalizations to RDBMS), regardless of my expertise on this matter? Am I asking too much?

    Thanks,
    Henrique Steckelberg
  49. http://www.hibernate.org/hib_docs/reference/en/html/mapping.html#mapping-declaration-discriminator
    Is it fine, or do you need some analogy?
  50. http://www.hibernate.org/hib_docs/reference/en/html/mapping.html#mapping-declaration-discriminator Is it fine, or do you need some analogy?
    Are you aware of all the integrity enforcement problems I have described previously if you adopt this single-table-per-class-hierarchy solution? Like, have one field be not null for one class but not for another, and so on?

    Since the discussion revolves around data integrity comparison between OODBs and RDBs, I think my question is relevant.

    Regards,
    Henrique Steckelberg
  51. Are you aware of all the integrity enforcement problems I have described previously if you adopt this single-table-per-class-hierarchy solution? Like, have one field be not null for one class but not for another, and so on? Since the discussion revolves around data integrity comparison between OODBs and RDBs, I think my question is relevant. Regards, Henrique Steckelberg
    Yes, you can do it in this hypothetical example without triggers using "CHECK( email IS NOT NULL OR role = 'P' )"
  52. Are you aware of all the integrity enforcement problems I have described previously if you adopt this single-table-per-class-hierarchy solution? Like, have one field be not null for one class but not for another, and so on? Since the discussion revolves around data integrity comparison between OODBs and RDBs, I think my question is relevant. Regards, Henrique Steckelberg
    Yes, you can do it in this hypothetical example without triggers using "CHECK( email IS NOT NULL OR role = 'P' )"
    Yes, this one is easy. Enforcing FK rules is what gets nasty in single table mapping.
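
    For anyone who wants to try the "easy" case, below is a small sketch of the flat (single-table) mapping with a discriminator column and the CHECK constraint quoted above. It runs against SQLite via Python's sqlite3 module as a stand-in for any RDBMS. The role codes ('P' for a plain person, 'C' for a customer), the column names and the helper function are assumptions for illustration; the foreign-key rules that get nasty are deliberately not attempted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE persons (
    id    INTEGER PRIMARY KEY,
    role  TEXT NOT NULL CHECK (role IN ('P', 'C')),  -- discriminator column
    name  TEXT NOT NULL,
    email TEXT,                                      -- nullable field declared last
    CHECK (email IS NOT NULL OR role = 'P')          -- customers must have an email
)
""")


def try_insert(role, name, email):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO persons (role, name, email) VALUES (?, ?, ?)",
                (role, name, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


plain_person_ok = try_insert("P", "Ann", None)               # no email needed
customer_ok = try_insert("C", "Bob", "bob@example.com")      # email supplied
customer_without_email_ok = try_insert("C", "Cid", None)     # violates the CHECK

print(plain_person_ok, customer_ok, customer_without_email_ok)  # True True False
```

    A rule such as "a DEPENDENT must reference an EMPLOYEE row but never a CUSTOMER row" cannot be stated this simply, because a plain foreign key into the shared table cannot see the discriminator of the row it points to.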
  53. Enforcing FK rules is what gets nasty in single table mapping.
    Yes, it is not easy and I make a lot of these kinds of mistakes myself, but it is possible to validate and fix this stuff. We need to learn this stuff to make it performant and safe. I do not believe in easy ways, and I think it is silly to believe in marketing BS or magic.
  54. marketing BS or magic[ Go to top ]

    Enforcing FK rules is what gets nasty in single table mapping.
    Yes, it is not easy and I make a lot of these kinds of mistakes myself, but it is possible to validate and fix this stuff. We need to learn this stuff to make it performant and safe. I do not believe in easy ways, and I think it is silly to believe in marketing BS or magic.

    What I still don't understand is why you consider anything that isn't relational to be silly, BS or magic, when there is clear evidence on this thread that alternative strategies for persisting data not only work, but work well for large volumes of data.

    I see a parallel here: it is between strictly typed and dynamic languages. I have used both, and I know that both strategies can work very effectively.

    Relational databases are like strictly typed languages - such as Java - all hard-coding of rules. Object databases are more like dynamic languages - like Smalltalk - with many of the rules in code in classes. (With some object databases, these rules are stored in the actual database, so the parallel is not perfect). Both strategies work: both can provide scalability, transactional integrity, data integrity, secure multi-user access etc.

    What is puzzling me is this: No competent Java (typed) or Smalltalk (dynamic) developers would, these days, call the other language 'silly', 'BS' or 'magic', as large-scale long-term enterprise projects have been developed in both languages. There may be some technical arguments either way, but the volume of use of both languages proves the effectiveness of both strategies. So why, given the plain fact that object databases not only work, but many companies and organisations are very happy with them, are some relational people hand-waving it away as 'just marketing', and trying to shoe-horn all possible data structures and storage requirements into relational stores? I just don't understand it.

    And now, some controversial views coming up!

    Finally, it is my belief that there is a shift in the way information is being handled in IT - increasingly databases are being used as little more than a robust multi-user way to store data, and not as the centre of actual processing. There are still huge numbers of PL/SQL coders, but even the largest database vendors are accepting that the way data will be increasingly defined in the future is in terms of objects and their relationships. Developers won't want to have to spend ages trying to model their data structures and relationships in a mathematically perfect relational form just to be able to store some object instances in a robust and shared store.

    Object databases are a natural, low-impedance match solution for this kind of use, and I think they will have a bright future. This is why I am so keen on store-agnostic technologies like JDO, in contrast to the pure-relational EJB 3.0, which is a backwards step - an attempt to hold back for a while the tide of non-relational methods of persistence.
  55. marketing BS or magic[ Go to top ]

    This is why I am so keen on store-agnostic technologies like JDO, in contrast to the pure-relational EJB 3.0, which is a backwards step - an attempt to hold back for a while the tide of non-relational methods of persistence.

    Probably we do not agree for this reason: I think network databases are a step back. As I understand from this discussion, "Object Oriented" is just BS to sell network databases. A relational database is "Object Oriented" in the same way. Everything is an object, isn't it?
  56. marketing BS or magic[ Go to top ]

    I understand from this discussion "Object Oriented" is just BS to sell network databases. A relational database is "Object Oriented" in the same way. Everything is an object, isn't it?

    No, everything is certainly NOT an object.

    Object Orientation is clearly defined: To be object oriented a system must be designed to support inheritance, encapsulation and polymorphism.

    Relational databases are not designed to support these. Relational databases are designed to support inter-related sets of regular data: Sets of Tables.

    Just because you can, with effort, implement these features does NOT mean that relational databases are object oriented. To be object oriented a system has to support these features by design and with minimal effort from the developer.

    I would suggest that it is pointless to continue to insist that relational databases are object oriented unless you can provide evidence that full-featured object orientation was specifically designed into relational theory. A quote from Codd or co-workers indicating this would be useful.

    I would also suggest that it is pointless to continue to state that object databases are 'BS' without providing real evidence that they are currently failing to provide useful persistence solutions. This thread contains posts that show where object databases are being used effectively.
  57. marketing BS or magic[ Go to top ]

    A quote from Codd or co-workers indicating this would be useful.
    If you define "Object Oriented" in the same way (math), then I will probably be able to prove this fact (or the opposite). If you are going to compare math and analogy, then it does not make any sense; it is possible to prove anything by analogy (BS).
  58. marketing BS or magic[ Go to top ]

    A quote from Codd or co-workers indicating this would be useful.
    If you define "Object Oriented" in the same way (math), then I will probably be able to prove this fact (or the opposite).

    No. I am not after some obscure mathematical transform. All such a transform would show would be that it is possible to represent object orientation in relational databases. We all know that. But that is not the same as relational databases actually being object oriented, as I clearly described. You can represent objects in all kinds of things - Prolog, assembler - but that does not make them object oriented languages, as you well know. What I am after is a clear proof that relational theory was designed with all the features of object orientation, and in a form that makes it easy for developers to make use of objects. I am not after some 'relational analogy' of objects, and not some 'marketing BS' about how relational databases can do objects.

    Again: a quote from Codd or Date that states that they had object orientation in mind when they came up with relational theory would help.

    If not, perhaps we could agree that relational databases are not object oriented?
  59. marketing BS or magic[ Go to top ]

    I was sure you understood me. "Object stream" and "Object Oriented RDBMS" were just sarcasm, sorry if it hurts. It is hard to be serious on this forum.
    I advocate OOP and MDA ideas myself, but I am sure they do not conflict with science; there is no problem defining objects in a formal way and transforming this model. The mathematical model can be very trivial, but the calculus (the model for queries) is not, and I advocate transformation for this reason. It is probably not an easy way, but it is scientific. Both models are very good and I like to have both of them.
    I am not sure about OODBMS, but it looks like this tool (framework) is designed to transform the object model (or conceptual model) to the network model automatically. I am afraid declarative network navigation is very complex to optimize, and I do not believe it is an easy way either.
  60. marketing BS or magic[ Go to top ]

    I was sure you understood me. "Object stream" and "Object Oriented RDBMS" were just sarcasm, sorry if it hurts.

    I realised that it was sarcastic - I just wanted to be sure that this was explicitly stated. It does not hurt me at all - robust debate is important and I really rather enjoy it.
    I advocate OOP and MDA ideas myself, but I am sure they do not conflict with science; there is no problem defining objects in a formal way and transforming this model. The mathematical model can be very trivial, but the calculus (the model for queries) is not, and I advocate transformation for this reason. It is probably not an easy way, but it is scientific. Both models are very good and I like to have both of them. I am not sure about OODBMS, but it looks like this tool (framework) is designed to transform the object model (or conceptual model) to the network model automatically. I am afraid declarative network navigation is very complex to optimize, and I do not believe it is an easy way either.

    I don't have your skills in formal models. What I think is important (just my personal view) is actual performance and real behaviour. I do not know if object databases transform to a network model. From my point of view that is for the database vendor to worry about. All I am concerned about as a developer and IT decision maker is whether or not the database I am using is robust and fast and simple to develop for - is it a good match for my requirements. I don't sit down myself and worry theoretically about optimization: what I do is to look at reviews of the product, and test out things on evaluation versions of the product. From what I have seen, object databases can be very fast to use in some situations. Perhaps one of the reasons is that these days we are in a position to allow software to automatically optimize things for us - things that look very complex to do manually can be done automatically.

    But anyway, thanks for the honesty, and the debate!
  61. marketing BS or magic[ Go to top ]

    Steve says,
    Relational databases are not designed to support these [Objects]. Relational databases are designed to support inter-related sets of regular data: Sets of Tables.
    and
    I would suggest that it is pointless to continue to insist that relational databases are object oriented unless you can provide evidence that full-featured object orientation was specifically designed into relational theory. A quote from Codd or co-workers indicating this would be useful.

      Tables are not necessarily the only way to implement relational theory. Actually, physical implementation is not a part of relational theory.

      It is true that relational databases were not designed to support objects. What Chris Date and Hugh Darwen have said in several articles and a book, I believe, is that types and sub-types are orthogonal to relational theory. That is, they do not conflict with relational database theory in any way. Any vendor who wishes to implement types and sub-types as in OO languages is free to do so without violating relational theory. They do have some questions and concerns about the exact meaning of the term inheritance, I think.
    I would also suggest that it is pointless to continue to state that object databases are 'BS' without providing real evidence that they are currently failing to provide useful persistence solutions.

    My questions on handling the many to many relationships in OODBMS yielded several responses that would have violated data integrity. Most respondents simply refused to see the problem at all. Only one person, Robin, provided a valid solution. That leads me to think that such problems are being overlooked in actual implementation, too.

    How likely is it that somebody who tried to use OODBMS and was not successful would stand up and shout from the rooftops about their failure? As we go on, we may see failures. Of course, it still does not imply that the OODBMS was faulty; maybe the implementers did not understand the features well enough.

    Consider the XP hype. (Disclaimer: I like some parts of XP, not all.) How many people know that the original Chrysler C3 project quoted in Kent Beck's book on XP failed to meet its target of handling the payroll of 100,000 employees, was scaled down to handle about 10K employees, and then terminated?

      To know of OODBMS failures, if any, we'll have to wait and see.

    Ravi
  62. marketing BS or magic[ Go to top ]

    Actually, PostgreSQL lets you inherit features from other tables. Thus, they implement their version of inheritance and do not violate relational theory in doing that.

    Ravi
  63. marketing BS or magic[ Go to top ]

    Steve said
    increasingly databases are being used as little more than a robust multi-user way to store data,

    Replace multi-user with multi-application and that becomes true and always has been true.

    With RDBMS, data was modelled from an enterprise viewpoint and multiple applications could access the same data.

    When building an application using OO tools, and then storing it in an OODBMS, you now have preference for the view used in the application.

      Another application that needs to use the same data will have to transform this data into the taxonomy that it desires. Often that will not be easy. In fact, it may turn out to be too hard. Generating reports that access millions of records and then filter through them in application code, for example, usually proves costly in terms of time and effort.

       As the discussion here has shown, in OODBMS you can enforce data integrity at great effort.

    What Robin showed was the code for inserting a customer-account relationship and keeping it synchronized. You'd need to write code to handle deletes and updates. Consider the fact that there can be many such relationships in an application. You end up with quite a bit of effort. Then, too, the fact remains that the bidirectional relationship introduces a redundancy of data. Duplication of data is bad. Inevitably, somebody will change the original class code and introduce data discrepancies.

    Using a relational database, enforcing these types of relationships is relatively easy, practically effortless. All this because there is no data duplication!

    The argument that fetching a coarse grained object like customer will be very difficult if the data is stored in a relational database is false.

    In OO, the hierarchy is implicit. In relational databases, if you need a coarse grained object, you'd write a view with underlying functions to fetch the related data to as great a hierarchy depth as you want. Since this view would be executed for a particular customer, for example, we'd know the customerId at this point. Therefore, the primary key index would be used. Traversing all the way down the hierarchy we'd end up using only the primary key indexes of the tables involved. Performance, then, would not be an issue at all.

    Hence, in a database with, say, a million customers having 5 million accounts and 100 million transactions, the response time would most likely be less than a second. Acceptable for most applications, I'd guess.

      Having heard the arguments in favour of OODBMS, and having a good knowledge of relational database theory and practice, I am not inclined to use OODBMS for my data storage needs. OODBMS gives me no advantage in terms of storage or retrieval, seems like more work to do the routine tasks, locks me into the application defined taxonomy (classification) and the query language is less expressive and way more verbose.

    In short, anything that an OODBMS can do, an RDBMS can do at least as well, if not better.

    Ravi
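
    As a rough illustration of the relational side of this argument, the sketch below stores the customer-account relationship exactly once, in an intersection table, and lets declared foreign keys handle inserts and deletes. It uses SQLite through Python's sqlite3 module as a stand-in for any RDBMS (SQLite only enforces foreign keys once the pragma is switched on). All table names, column names and rows are assumptions for illustration, not the schema discussed earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK enforcement is off by default
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE account  (account_id  INTEGER PRIMARY KEY);

-- The many-to-many relationship is stored once, here.
CREATE TABLE account_customer (
    account_id  INTEGER NOT NULL REFERENCES account(account_id)   ON DELETE CASCADE,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id) ON DELETE CASCADE,
    PRIMARY KEY (account_id, customer_id)
);

INSERT INTO customer VALUES (1, 'alice'), (2, 'bob');
INSERT INTO account  VALUES (10), (20);
INSERT INTO account_customer VALUES (10, 1), (10, 2), (20, 2);
""")

COUNT_SQL = """
SELECT account_id, COUNT(*) FROM account_customer
GROUP BY account_id ORDER BY account_id
"""
counts_before = conn.execute(COUNT_SQL).fetchall()

# A link to an account that does not exist is rejected by the database itself.
try:
    with conn:
        conn.execute("INSERT INTO account_customer VALUES (30, 1)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# Deleting a customer removes its links; no application code keeps two sides in sync.
with conn:
    conn.execute("DELETE FROM customer WHERE customer_id = 2")
counts_after = conn.execute(COUNT_SQL).fetchall()

print(counts_before, dangling_rejected, counts_after)
```

    Whether cascading deletes are the right business rule is a separate question; the sketch only shows that the rule is declared once in the schema rather than coded on both sides of a bidirectional object relationship.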
  64. marketing BS or magic[ Go to top ]

    Using a relational database, enforcing these types of relationships is relatively easy, practically effortless. All this because there is no data duplication!
    No data duplication? Depends on how you define data. If you store the key from one record in another table, you are duplicating data. If you don't consider non-business keys data, you still are duplicating something.
    Replace multi-user with multi-application and that becomes true and always has been true.
    The problem is that multi-application databases are very expensive to maintain and difficult to coordinate changes. Sometimes it is the lesser of two evils. But I don't think it is needed as much as it is done. There are excellent ways around the issues you've stated.

    There is much more to the equation than what you are focusing on. I'm not saying I like any of the current OODBMSs, but saying that the concept is wrong is, well, short-sighted.

    Duplication of code is just as bad as, if not worse than, duplication of data. Integrating at the database almost always requires duplicated code even if one uses stored procs.
  65. marketing BS or magic

    There is much more to the equation than what you are focusing on. I'm not saying I like any of the current OODBMSs, but saying that the concept is wrong is, well, short-sighted.

    Mark, I am not saying that the concept is wrong.
    I'm saying that, looking at OODBMSs, I do not see anything superior to relational databases. If an OODBMS meets your needs, please use one, for all I care. I feel that relational databases are better, so I'll choose to use them whenever I have the choice.
    If you store the key from one record in another table, you are duplicating data.

    Not quite true. If my business requirement says that an account must belong to at least one customer, then the customer is a part of the account record. Instead of storing everything about the customer in the same row, we store a reference to the customer in the account table.

      Even in an OO implementation, when you store a list of customers for an account in the Account class, you have an implicit ObjectId for each customer that is retained during the runtime of the application. When you want to save the data, you must either store the ObjectId (not a good choice, since runtime ObjectIds may be transient) or create a new identifier and store it.

    That is not data duplication but an essential element of the business rule.
    The problem is that multi-application databases are very expensive to maintain and difficult to coordinate changes.
      The data warehouse world has the concept of a corporate hub that is intended to store enterprise-level data accessible to the various data marts (applications) that can be spawned from this single repository.

    Ravi
  66. marketing BS or magic

    Not quite true. If my business requirement says that an account must belong to at least one customer, then the customer is a part of the account record.
    Whether it is required by business or not - there is duplication of data, no matter how small. I'm not aware of any RDBMSs that allow you to store a "reference" to another record in another table. You have to store the PK from that table as an FK in the other table. That is a technical requirement.
    The data warehouse world has the concept of a corporate hub that is intended to store enterprise-level data accessible to the various data marts (applications) that can be spawned from this single repository.
    And they are very expensive and very difficult to change.
  67. marketing BS or magic

    Mark said,
    I'm not aware of any RDBMSs that allow you to store a "reference" to another record in another table.

    That is perfectly true. If any database allowed pointers (or references) to rows, Codd would be upset. Not having pointers, but only using keys, allows the DBMS to optimize storage without having to physically change the pointer in every table that references a given table.

    The fact that customer information is carried over into another table is definitely duplication of a data element from the perspective you are looking at, but it is also a business requirement. The only reason we build applications is to meet users' business requirements!

    Even with OODBMS, internally you will end up either:

    1. using ObjectIds or GUIDs or whatever, an analog of foreign keys; or

    2. storing the relationship as a hierarchy, with all its attendant pitfalls.

    Ravi
  68. marketing BS or magic

    The only reason we build applications is to meet users' business requirements
    Depends on the business you are in, but for the greater percentage - true.

    Business requirements could also be that we don't spend tons of money on the application(s). And duplicating logic in PL/SQL and Java and C# and Crystal Reports and some OLAP tool and PHP and ... does not help meet that business requirement.
  69. marketing BS or magic

    In short, anything that an OODBMS can do, an RDBMS can do at least as well, if not better.

    No. It is well established that there are many things that an OODBMS can do far better than RDBMSes. RDBMSes are superb for general purpose use, but in some cases, OODBMSes can retrieve data far faster. This can be seen by the situations where object databases are used: often in situations where very high volume of data retrieval is required - stock exchanges or booking systems. One of the most interesting examples is in particle research, in labs such as CERN, where extremely high volumes of data (hundreds of terabytes) need to be stored and retrieved very rapidly (tens of megabits per second).

    So, your statement is clearly false.
  70. marketing BS or magic

    One of the most interesting examples is in particle research, in labs such as CERN, where extremely high volumes of data (hundreds of terabytes) need to be stored and retrieved very rapidly (tens of megabits per second).

    So, your statement is clearly false.
    Steve, here's a link to a site that lists users of large databases. http://www.wintercorp.com/vldb/2003_TopTen_Survey/TopTenWinners.asp

    Please note that there are several databases that are multi TB size.

    The fact that CERN, or some other organization, chose to use an OODBMS does not imply that an RDBMS could not have done the job.

    By the way, has any study been conducted that tests the performance of RDBMS vs OODBMS on similar architectures for a wide variety of database sizes? And this study should not have been sponsored by a DBMS vendor of any sort.

    Ravi
  71. jo

    The fact that CERN, or some other organization, chose to use an OODBMS does not imply that an RDBMS could not have done the job.

    It does - CERN have some of the best IT people in the world, and the criteria they used for the choice of an OODBMS are available on the web.

    The reason they chose an OODBMS is that they had a phenomenal volume of data (now in the hundreds of petabytes) that had to be kept live and was constantly changing in structure (new classes of data had to be added). They assessed the RDBMS option and came to the conclusion that it would not give the required flexibility and performance.
  72. marketing BS or magic

    RDBMSes are superb for general purpose use, but in some cases, OODBMSes can retrieve data far faster. This can be seen by the situations where object databases are used: often in situations where very high volume of data retrieval is required - stock exchanges or booking systems. One of the most interesting examples is in particle research, in labs such as CERN, where extremely high volumes of data (hundreds of terabytes) need to be stored and retrieved very rapidly (tens of megabits per second).

    Presumably you're alluding to CERN's ROOT OODBMS. If I understand correctly, like EJB-2, ROOT imposes a persistence base class on the model. And the Qt GUI library is an important part of the ROOT API; very strange!

    Could the reason that an OODBMS is faster than an RDBMS be that pointer swizzling is faster than ODBC un/marshalling?
  73. marketing BS or magic

    Relational databases are like strictly typed languages - such as Java - all hard-coding of rules. Object databases are more like dynamic languages

    I disagree, Steve.

      When I change a class's variables, I (or a tool) must write the getters and setters for the variables added, remove the methods for the variables removed, and update the ones for the variables changed.

      Any code that has method calls using old signatures will not be valid. That is static-type enforcement. Any and every Class that uses these methods will have to be checked and updated.

      Whereas, when I modify a table, I can start writing code like select new_column_name from table_x immediately.

      For queries, nothing that I know of is more flexible than relational database and SQL, imperfect as it is.
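
    A minimal sketch of the "modify a table, then query the new column" claim above, using Python's built-in sqlite3 module. The names table_x and new_column_name come from the post; old_column, the sample row and the default value are hypothetical. It covers only the additive case: a query that names its columns explicitly keeps working after a column is added, which says nothing about dropped or renamed columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, old_column TEXT)")
conn.execute("INSERT INTO table_x VALUES (1, 'a')")

# "Old" code: a query written before the schema change.
old_query = "SELECT id, old_column FROM table_x"
before = conn.execute(old_query).fetchall()

# The schema change: add a column with a default for existing rows.
conn.execute("ALTER TABLE table_x ADD COLUMN new_column_name TEXT DEFAULT 'n/a'")

# The old query is unaffected, and the new column is queryable at once.
after = conn.execute(old_query).fetchall()
new_values = conn.execute("SELECT new_column_name FROM table_x").fetchall()

print(before, after, new_values)
```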

    Ravi
  74. marketing BS or magic

    Relational databases are like strictly typed languages - such as Java - all hard-coding of rules. Object databases are more like dynamic languages
    I disagree, Steve. When I change a class's variables, I (or a tool) must write the getters and setters for the variables added, remove the methods for the variables removed, and update the ones for the variables changed. Any code that has method calls using old signatures will not be valid. That is static-type enforcement. Any and every Class that uses these methods will have to be checked and updated. Whereas, when I modify a table, I can start writing code like select new_column_name from table_x immediately. For queries, nothing that I know of is more flexible than relational database and SQL, imperfect as it is. Ravi

    You are working very hard to find all possible problems with object databases!

    Modifying existing structures in databases, be they classes or tables, has the same consequences.

    If you modify a table structure, all code that uses that table structure has to be changed, and verified. All code that uses the old table 'signature' will be potentially invalid.

    Sure, you can immediately write new code, but what about all the old code?

    With object databases, I can extend structures using subclassing, leaving existing classes and instances of those classes intact, so I don't impact existing code. That is one of the key benefits of the object model: Inheritance. And, of course, even old code can use my new classes because of another benefit of the object model: polymorphism.
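
    A minimal in-memory sketch of the inheritance and polymorphism argument above. It involves no object database, and the class names, fields and fee values are hypothetical; it only illustrates that code written against the base class keeps working when a subclass is added later.

```python
class Account:
    """Existing class; nothing below modifies it."""

    def __init__(self, balance):
        self.balance = balance

    def monthly_fee(self):
        return 5


class PremiumAccount(Account):
    """Added later by subclassing; existing Account instances are untouched."""

    def __init__(self, balance, tier):
        super().__init__(balance)
        self.tier = tier  # new attribute that exists only on the subclass

    def monthly_fee(self):
        return 0


def total_fees(accounts):
    # "Old" code, written before PremiumAccount existed.
    return sum(account.monthly_fee() for account in accounts)


fees = total_fees([Account(100), PremiumAccount(500, "gold")])
print(fees)
```

    The old total_fees function handles the new class without being edited, which is the polymorphism point; changing or removing a field on the existing Account class would still affect its callers.
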
    For queries, nothing that I know of is more flexible than relational database and SQL, imperfect as it is.

    Well, I consider that there may be things more flexible and powerful than the technologies I personally know about! I don't assume that my limited knowledge covers all aspects, present and future, of IT and computer science.
  75. marketing BS or magic

    Modifying existing structures in databases, be they classes or tables, has the same consequences.

    Yes, but the getters and setters are extra work that I do not have to do when using a relational database. Without getters and setters, in Java, C++, at least, the data is not accessible. That is the point I was trying to show.

    Ravi
  76. marketing BS or magic

    Modifying existing structures in databases, be they classes or tables, has the same consequences.
    Yes, but the getters and setters are extra work that I do not have to do when using a relational database. Without getters and setters, in Java, C++, at least, the data is not accessible. That is the point I was trying to show. Ravi

    A. Not true (at least not for Java and C#).
    B. In OO you shouldn't be accessing "the data" anyway.
    C. If you are going to use Java, you are going to have to write objects of some sort and if you do any sort of OO (with or without OODB) you will be creating fields in objects. (If not, why are you here? - (just wondering) )
  77. marketing BS or magic

    ... the getters and setters are extra work that I do not have to do when using a relational database. Without getters and setters, in Java, C++, at least, the data is not accessible. That is the point I was trying to show. Ravi
    A. Not true (at least not for Java and C#). B. In OO you shouldn't be accessing "the data" anyway. C. If you are going to use Java, you are going to have to write objects of some sort and if you do any sort of OO (with or without OODB) you will be creating fields in objects. (If not, why are you here? - (just wondering) )

    Hmmm. Interesting. Are you suggesting that the fields be made public? Then you definitely do not need getters and setters! I thought Java and C++ idiomatic usage was to create private fields only, then provide getters and setters. Maybe you do it differently.

       In OO, or otherwise, you have to access the data when you are dealing with what are essentially data structures. How else will a different application, not envisaged originally, have access to the various fields that it needs access to? The Visitor pattern?

       Mark, I did not realize that I was talking to a Java and OO newbie who wanted everything spelled out for him. Thanks for letting me know.

    Ravi
  78. marketing BS or magic

    Interesting. Are you suggesting that the fields be made public?
    Nope. If you talk to a pure OOist, you won't have any accessors.
    I thought Java and C++ idiomatic usage was to create private fields only, then provide getters and setters.
    True, that is the way it is usually done.
    Maybe you do it differently.
    Only when I have to - i.e. when I use an RDBMS to store non-transient objects.
    In OO, or otherwise, you have to access the data when you are dealing with what are essentially data structures.
    In OO, you aren't accessing "data".
    I did not realize that I was talking to a Java and OO newbie who wanted everything spelled out for him. Thanks for letting me know. Ravi

    Hmm. Odd how you just called Steve down for this.

    I am not an OO newbie. I am saying that you can access fields (private class variables) in Java without accessors. Right or wrong - you can. I didn't know I was talking to a Java newbie. :)
  79. marketing BS or magic

    Well, I consider that there may be things more flexible and powerful than the technologies I personally know about! I don't assume that my limited knowledge covers all aspects, present and future, of IT and computer science.

    Steve, re-read my original mail, and stop being so nauseatingly arrogant and supercilious.

    I had said, For queries, nothing that I know of is more flexible than relational database and SQL, imperfect as it is.

    Your response to that is childish. Grow up!


    Ravi
  80. marketing BS or magic

    Steve, re-read my original mail, and stop being so nauseatingly arrogant and supercilious.
    Sorry Ravi, but it seems most of your posts have been that way.
  81. marketing BS or magic

    Well, I consider that there may be things more flexible and powerful than the technologies I personally know about! I don't assume that my limited knowledge covers all aspects, present and future, of IT and computer science.
    Steve, re-read my original mail, and stop being so nauseatingly arrogant and supercilious. I had said, For queries, nothing that I know of is more flexible than relational database and SQL, imperfect as it is. Your response to that is childish. Grow up! Ravi

    I do apologise. I try and avoid causing offense. However, I also avoid stating that technologies that I know about are also the best possible solution in all cases.
  82. marketing BS or magic

    Enforcing FK rules is what gets nasty in single table mapping.
    Yes, it is not easy, and I make a lot of these mistakes myself, but it is possible to validate and fix this stuff. We need to learn any technology to make it performant and safe; I do not believe in easy ways, and I think it is silly to believe in marketing BS or magic.
    What I still don't understand is why you consider anything that isn't relational to be silly, BS or magic, when there is clear evidence on this thread that alternative strategies for persisting data not only work, but work well for large volumes of data.
    I see a parallel here: it is between strictly typed and dynamic languages. I have used both, and I know that both strategies can work very effectively.
    Relational databases are like strictly typed languages - such as Java - all hard-coding of rules. Object databases are more like dynamic languages - like Smalltalk - with many of the rules in code in classes. (With some object databases, these rules are stored in the actual database, so the parallel is not perfect). Both strategies work: both can provide scalability, transactional integrity, data integrity, secure multi-user access etc.
    What is puzzling me is this: No competent Java (typed) or Smalltalk (dynamic) developers would, these days, call the other language 'silly', 'BS' or 'magic', as large-scale long-term enterprise projects have been developed in both languages. There may be some technical arguments either way, but the volume of use of both languages proves the effectiveness of both strategies. So why, given the plain fact that object databases not only work, but many companies and organisations are very happy with them, are some relational people hand-waving it away as 'just marketing', and trying to shoe-horn all possible data structures and storage requirements into relational stores? I just don't understand it.
    And now, some controversial views coming up!
    Finally, it is my belief that there is a shift in the way information is being handled in IT - increasingly databases are being used as little more than a robust multi-user way to store data, and not as the centre of actual processing. There are still huge numbers of PL/SQL coders, but even the largest database vendors are accepting that the way data will be increasingly defined in the future is in terms of objects and their relationships. Developers won't want to have to spend ages trying to model their data structures and relationships in a mathematically perfect relational form just to be able to store some object instances in a robust and shared store.
    Object databases are a natural, low-impedance match solution for this kind of use, and I think they will have a bright future. This is why I am so keen on store-agnostic technologies like JDO, in contrast to the pure-relational EJB 3.0, which is a backwards step - an attempt to hold back for a while the tide of non-relational methods of persistence.

    Absolutely beautiful post!!
  83. Henrique says:
    Class customer has an e-mail field, and inherits from class person, which has a name field. Map these 2 to the same table in the RDBMS, PERSONS, and have the following integrity check applied to it: customer's email should not be null. It is doable, using triggers. Now add another type of person to the hierarchy, like DEPENDENT inherits from PERSON, with a mandatory relationship to yet another type of person (EMPLOYEE with not null DEPARTMENT field), but which cannot be to a CUSTOMER. If you map all this hierarchy to a single table, things will get pretty messy.

    That is interesting. Because we would realize that Employee, Customer, Dependent, etc., are roles that a "Person" plays. Hence we would not have a hierarchy at all. A Person can be a customer at one time and an employee at another point of time. A relationship (such as a marriage, or even a child-guardian dependency) can change over time. Hence, these are roles that people play, they do not fall into the same hierarchy as a Person.

    My object model would have a Person play one or more roles. No inheritance, hence no database/tables issues that you describe.

    If I wanted to create a separate table for customers, I could do so and define the email column to be not null. The relationship between person and customer can be embodied in a person_role table with attributes person_id, role_type, role_effective_date, role_end_date, etc. Note that the person_role table would provide the extensibility you need when new roles are added. Code changes would be limited to those required for the new role, and would be orthogonal to existing code. That is, existing code would be unaffected, requiring no regression tests.
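
    A minimal sketch of the role-based layout described above, using Python's built-in sqlite3 module. The person_role columns follow the post; the person and customer columns, the composite key, the sample data and the helper function roles_on are assumptions for illustration. Dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person_role (
        person_id           INTEGER NOT NULL REFERENCES person(person_id),
        role_type           TEXT NOT NULL,
        role_effective_date TEXT NOT NULL,
        role_end_date       TEXT,            -- NULL while the role is open
        PRIMARY KEY (person_id, role_type, role_effective_date)
    );
    -- Separate table for the customer role, so email can simply be NOT NULL.
    CREATE TABLE customer (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        email     TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO person VALUES (1, 'Ann')")
conn.executemany("INSERT INTO person_role VALUES (?, ?, ?, ?)", [
    (1, "EMPLOYEE", "2001-01-01", "2003-12-31"),
    (1, "CUSTOMER", "2004-06-01", None),
])
conn.execute("INSERT INTO customer VALUES (1, 'ann@example.com')")


def roles_on(person_id, day):
    """Return the roles a person holds on a given ISO date."""
    rows = conn.execute(
        "SELECT role_type FROM person_role "
        "WHERE person_id = ? AND role_effective_date <= ? "
        "AND (role_end_date IS NULL OR role_end_date >= ?) "
        "ORDER BY role_type",
        (person_id, day, day),
    ).fetchall()
    return [row[0] for row in rows]


# The NOT NULL column rejects a customer without an email; no trigger needed.
try:
    conn.execute("INSERT INTO customer VALUES (2, NULL)")
    null_email_rejected = False
except sqlite3.IntegrityError:
    null_email_rejected = True

print(roles_on(1, "2002-05-05"), roles_on(1, "2005-01-01"), null_email_rejected)
```

    Adding a new role here means inserting rows with a new role_type and, where the role has its own attributes, one new table; the existing tables are left alone.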

      And I believe competent data modellers (whether of the OO variety or relational variety) would agree that using roles is the way to go. Using a hierarchy in this case locks you into a rigid structure which is difficult to modify.

       Which is another interesting aspect of OO. We tend to view the whole world as a hierarchy of objects. Somehow, I feel that there may be an exception here or there.

    Ravi
  84. My object model would have a Person play one or more roles. No inheritance, hence no database/tables issues that you describe.
    Ok, my example was very stupid, of course it would be better to model it as roles. But I have a more complex case (a real one BTW) where I work, which is a telecom company. It happens that we need to have an equipment inventory. Each equipment type has its own set of attributes with many specializations and so on, in a complex hierarchical structure. Some of it can be modelled as roles the equipment plays on the network, but some are really intrinsic to its specialization structure, which BTW is rigid and stable (a radio antenna is a radio antenna, no matter what! ;).

    Anyway, regardless of the specific details, sometimes one just HAS to model hierarchies in RDBMS, and with it come all the problems I described previously, if one adopts the single table mapping option.

    Regards,
    Henrique Steckelberg
  85. I am not an expert in this domain, but there is probably nothing wrong with using network or hierarchical databases if you have a reason. We use Lotus Notes ourselves; it is hard to get customized reports out of it, but reporting was probably not a priority when that decision was made. It may be too expensive to model this in a relational way, and it is not easy, but it can save a lot of money later on ETL work.
  86. Henrique, you wrote
    It happens that we need to have an equipment inventory. Each equipment type has its own set of attributes with many specializations and so on, in a complex hierarchical structure. Some of it can be modelled as roles the equipment plays on the network, but some are really intrinsic to its specialization structure, which BTW is rigid and stable (a radio antenna is a radio antenna, no matter what! ;).

    Now I am curious. Could you give me some more details so that I can look into it? Is it more like a product-component parts structure?

    Most data modellers do not know how to handle this in a good way. So I would like to give it a try.

    Ravi
  87. I have a more complex case (a real one BTW) where I work, which is a telecom company. It happens that we need to have an equipment inventory. Each equipment type has its own set of attributes with many specializations and so on, in a complex hierarchical structure.

    Good example, Henrique.

    For mapping this in an RDBMS, I'd recommend using a single 'trunk' table to store equipment ID, type and maybe name/ description for the entire catalog. Specific classes would then map additional 'fact' tables joined from the trunk.

    The intent here is to identify the fact tables coarsely; to avoid deep joins or indeed join depth >1; and map similar facts into the same fact table.

    When you consider the use-cases, general queries should leave excessively-specific data unfetched. Thus for most use cases, you should be querying the trunk plus 1-3 further facts.

    eg
      TrunkTable( id, type, name, desc)
    plus 1-3 of
      AntennaFact
      AntennaType3Fact
      OtherEquipFact
      ...
      FixedLocationFact
      MobileEquipFact
      MaintenancePlanFact
    etc
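
    The trunk-plus-fact layout listed above can be sketched with Python's built-in sqlite3 module as follows. The table names mirror the list; the fact-table columns, the sample rows and the site codes are hypothetical, and the description column is called descr here because desc is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trunk: one row per piece of equipment, whatever its type.
    CREATE TABLE trunk (
        id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL, descr TEXT
    );
    -- Coarse fact tables, each joined to the trunk at depth 1.
    CREATE TABLE antenna_fact (
        id INTEGER PRIMARY KEY REFERENCES trunk(id),
        polarization TEXT, frequency_mhz INTEGER
    );
    CREATE TABLE fixed_location_fact (
        id INTEGER PRIMARY KEY REFERENCES trunk(id),
        site TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO trunk VALUES (?, ?, ?, ?)", [
    (1, "ANTENNA", "Sector antenna A", None),
    (2, "MODEM", "Modem B", None),
    (3, "ANTENNA", "Dish C", None),
])
conn.executemany("INSERT INTO antenna_fact VALUES (?, ?, ?)",
                 [(1, "vertical", 900), (3, "horizontal", 1800)])
conn.executemany("INSERT INTO fixed_location_fact VALUES (?, ?)",
                 [(1, "SITE-1"), (2, "SITE-1"), (3, "SITE-2")])

# General query: all equipment types at once, trunk plus one fact table.
per_site = conn.execute(
    "SELECT f.site, COUNT(*) FROM trunk t "
    "JOIN fixed_location_fact f ON f.id = t.id "
    "GROUP BY f.site ORDER BY f.site"
).fetchall()

# Specific query: one antenna, fetched by its primary key.
antenna = conn.execute(
    "SELECT t.name, a.polarization, a.frequency_mhz FROM trunk t "
    "JOIN antenna_fact a ON a.id = t.id WHERE t.id = ?", (3,)
).fetchone()

print(per_site, antenna)
```

    Both queries touch the trunk and a single fact table, so the join depth stays at 1 and the antenna-specific columns are never fetched by the general query.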

    As you see, some of this is segmented by physical type of the equipment, some of this is segmented by role. But query use cases will typically be either specific within a certain few segments (maintenance query) or truncated at a general level of detail (all equipment, fixed location equipment).
    Anyway, regardless of the specific details, sometimes one just HAS to model hierarchies in RDBMS, and with it come all the problems I described previously, if one adopts the single table mapping option.

    Definitely. But you definitely don't want to implement your class hierarchy as fully-exploded join tables, or unioned tables, either.

    This approach to mapping (Trunk & coarse Fact segments) is really an analog of the Composite design pattern. This is known as one of the most effective GoF patterns and here we're compositing our specific class mappings from a simplified structure of Fact tables.

    Thanks for the great example! Let me know what you think.


    Regards,
    Thomas Whitmore
    www.powermapjdo.com
  88. Thomas and Ravi,

    that is pretty much how we are dealing with it. We have created a table with all the common equipment attributes, like vendor, partnumber, ID, site, etc., and specific tables for each equipment type, containing their specific attributes. This way we are able to make totalization queries on the general table (like, how many equipments are there on this site?) and still be able to query for some specific equipment's information. We are keeping this as flat as possible (1 level only), to avoid many levels of inheritance, which would complicate matters a lot.
    The previous legacy version of this system had one table for each eqpt type, without this "general" table, so it was very hard to make queries which involved all eqpt types; we had to create huge outer join views, and performance was not so good.
    For whoever is interested in telecom process and data modeling, there is a good source of information on the TMForum site. Vendors have gathered there in an effort to create a standard for the telecom process and data models that recur throughout the industry. Java's OSS/J follows this effort too. Links:
    http://www.tmforum.org (look for SID information)
    http://www.ossj.org (APIs regarding telecom's operational support system)

    Regards,
    Henrique Steckelberg
  89. We have created a table with all the common equipment attributes, like vendor, partnumber, ID, site, etc.,

    Henrique, what you have given is a statement of your solution. Rather, I'd like to know more about your problem. In short, what is it you are trying to model? Some details, please, in layman's terms only. No objects or tables should be involved.

    Ravi
  90. Ravi,

    Telecom equipment inventory involves storing information about location, configuration, construction (componentization), dimension (width, height, weight), etc., for every equipment installed on the network.
    Most of this information is common to every type of equipment, but there are specific attributes which apply only to certain types (which depend on each installed equipment, like location) or model (which is the same for all equipments of the same model, like dimension).
    For example: antennas have polarization, elevation, frequency, position in tower; modems have baud rate, protocol; and so on. Some equipments have more than one level of specialization, like: omnidirectional antennas, point-to-point antennas and satellite antennas and so on, each with their own set of attributes.
    Componentization relates to the way equipments are mounted in bays, racks, slots, cards and ports (1 to many between each), although a few equipments don't fit into this structure, like antennas and desktop equipments.
    Besides, there is model information for each type, which can have specific attributes, similar to the equipments they describe. So some attributes belong to the equipment itself (ex.: location, ID, configuration) and some belong to the model information (ex.: dimension, vendor, partnumber) of this equipment.
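
    A minimal sketch of the split described above between attributes of an installed equipment and attributes of its model, using Python dataclasses. The class names, field names and sample values are hypothetical; the point is only that model-level data (vendor, part number, dimensions) is stored once and shared by every equipment of that model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EquipmentModel:
    """Model-level attributes: the same for every unit of this model."""
    vendor: str
    part_number: str
    width_mm: int
    height_mm: int


@dataclass
class Equipment:
    """Instance-level attributes: specific to one installed unit."""
    equipment_id: str
    location: str
    model: EquipmentModel


shared_model = EquipmentModel("ExampleVendor", "PN-100", 600, 2000)
rack_a = Equipment("EQ-1", "SITE-1", shared_model)
rack_b = Equipment("EQ-2", "SITE-2", shared_model)

# Two installed units, two locations, one shared model record.
print(rack_a.location, rack_b.location, rack_a.model is rack_b.model)
```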

    Given that there are hundreds of different types of equipments, thousands of vendors, each with hundreds of different models, and so on, it is easy to see how complex this can get.

    This is a quick description of what represents the domain of telecom inventory systems. I hope it is detailed enough for you, so have fun!

    Regards,
    Henrique Steckelberg
  91. radio equipment inventory

    And if the DOD is involved, each LRU has at least 2 names and 2 abbreviations. I was in SATCOM in the AF. It can get crazy. In fact, the first app I wrote was one to help us with our inventory. Didn't get it done before I got out. I also did aircraft maintenance in the AF, and keeping track of parts there was just as fun.
  92. Thomas and Ravi, that is pretty much how we are dealing with it. We have created a table with all the common equipment attributes, like vendor, partnumber, ID, site, etc., and specific tables for each equipment type, containing their specific attributes. This way we are able to make totalization queries on the general table (like, how many equipments are there on this site?) and still be able to query for some specific equipment's information. We are keeping this as flat as possible (1 level only), to avoid many levels of inheritance, which would complicate matters a lot. The previous legacy version of this system had one table for each eqpt type, without this "general" table, so it was very hard to make queries which involved all eqpt types; we had to create huge outer join views, and performance was not so good.

    I am not sure but it looks like both ways are not optimal ("general" table and table per type ). As I understand (but I can be wrong) you have "one-to-one" relationship and need a join to get "general" and "specific" data anyway.
     I prefer to eliminate *all* "one-to-one" relationships, so it will be a single "large" table. Probably it will be more tables after normalization ( a good practice is to normalize after "concept" to "relation" transformation ).
    "Generic" table becomes just a view ( projection ) after this transformation and "concept" is not lost in logical level and queries too (it is possible to redesign it without breaking "legacy" applications !). It can be more complex to transform constraints, but I think it is better to spend more time for this stuff than to move conceptual model to DB directly (I found it is a mistake).
  93. As I understand (but I can be wrong) you have "one-to-one" relationship and need a join to get "general" and "specific" data anyway. I prefer to eliminate *all* "one-to-one" relationships, so it will be a single "large" table. Probably it will be more tables after normalization (a good practice is to normalize after "concept" to "relation" transformation). "Generic" table becomes just a view (projection) after this transformation and "concept" is not lost in logical level and queries too (it is possible to redesign it without breaking "legacy" applications!). It can be more complex to transform constraints, but I think it is better to spend more time for this stuff than to move conceptual model to DB directly (I found it is a mistake).
    Having the "generic" table be a view had major impact in performance in a previous version of the system, as it involved tens of outer joins, which RDBs are not so fast at. Since we need 1-to-1 joins between general and specific tables only for 1 specific equipment ID, this is not costly.

    Regards,
    Henrique Steckelberg
  94. Having the "generic" table be a view had major impact in performance in a previous version of the system, as it involved tens of outer joins, which RDBs are not so fast at. Since we need 1-to-1 joins between general and specific tables only for 1 specific equipment ID, this is not costly. Regards, Henrique Steckelberg
    I see you do not understand it: you do not need any joins for this data structure, it is as simple as a single table. Are you sure this is the RDBMS's fault? You probably designed this stuff in a wrong way; it sounds like a bug if you need 1-to-1 joins.
  95. Having the "generic" table be a view had major impact in performance in a previous version of the system, as it involved tens of outer joins, which RDBs are not so fast at. Since we need 1-to-1 joins between general and specific tables only for 1 specific equipment ID, this is not costly. Regards, Henrique Steckelberg
    I see you do not understand it: you do not need any joins for this data structure, it is as simple as a single table. Are you sure this is the RDBMS's fault? You probably designed this stuff in a wrong way; it sounds like a bug if you need 1-to-1 joins.
    Reread my posts; I already explained why we did it that way (1 generic table plus 1 table per eqpt type), why having all eqpt types reside on one table only is bad, and why having a view representing the general eqpt data is bad, when using one table per eqpt type and no generic eqpt table.
  96. Hi guys,

      Thanks for a nice and interesting discussion. I enjoyed the give and take spirit of the discussions.

      This is my last post on this thread. While I may read this thread, I will not post any further replies.

      Bottom line, in the type of work I do, I am not convinced that using an OODBMS will give me any benefits compared to an RDBMS. Therefore, I will stick with RDBMS for now.

      By the way, in one small project, for various reasons, I decided not to use a database but XML files as data storage. I believe it is still in Production. (Use the right tool/technology for the job.)

    Ravi
  97. radio equipment inventory[ Go to top ]

    I enjoyed the give and take spirit of the discussions

    Me too.

    I want to apologise again - I try and sound detached and objective in my posts, but I realise that can seem aloof and arrogant in some cases - I will try and do better!
  98. This is probably a very special case, but there is no 1-to-1 relationship or generalization in the relational model.
    This is a very brief and "pragmatic" introduction; it can help to see the difference between the conceptual and relational models.
    http://www.utexas.edu/its/windows/database/datamodeling/index.html
    You will probably start to like relational databases once you understand this difference and the right way to map between the two models.
  99. This is probably a very special case, but there is no 1-to-1 relationship or generalization in the relational model. This is a very brief and "pragmatic" introduction; it can help to see the difference between the conceptual and relational models. http://www.utexas.edu/its/windows/database/datamodeling/index.html You will probably start to like relational databases once you understand this difference and the right way to map between the two models.
    Why are you assuming I don't know the relational model, the conceptual model or how to map between them? Seems like you are having a hard time understanding my English.
  100. This is probably a very special case, but there is no 1-to-1 relationship or generalization in the relational model. This is a very brief and "pragmatic" introduction; it can help to see the difference between the conceptual and relational models. http://www.utexas.edu/its/windows/database/datamodeling/index.html You will probably start to like relational databases once you understand this difference and the right way to map between the two models.
    Why are you assuming I don't know the relational model, the conceptual model or how to map between them? Seems like you are having a hard time understanding my English.
    Probably you know this stuff too well, if you see flaws in the logical model.
  101. This is probably a very special case, but there is no 1-to-1 relationship or generalization in the relational model. This is a very brief and "pragmatic" introduction; it can help to see the difference between the conceptual and relational models. http://www.utexas.edu/its/windows/database/datamodeling/index.html You will probably start to like relational databases once you understand this difference and the right way to map between the two models.
    Why are you assuming I don't know the relational model, the conceptual model or how to map between them? Seems like you are having a hard time understanding my English.
    Probably you know this stuff too well, if you see flaws in the logical model.
    Where have I said I see flaws in the logical model?
  102. Probably I misunderstood, and you know more about hierarchies in the logical model than I do. Why do you ask me? I do not know about OODBMS, and that is why I ask about this stuff. I do not think it is a shame to learn; links are helpful, since Google returns too much garbage (marketing).
  103. Ravi,

    I am not an expert, but here's an example (in a simplified syntax, code may not be correct or complete in some places):

    import java.util.Collection;

    // ValidationException is assumed to be an unchecked application exception.
    class Customer {
      private Collection<Account> accounts;
      private String email;

      public Customer (String email, Collection<Account> accounts) {
        setEmail(email); // enforce the invariant on construction too
        this.accounts = accounts;
      }

      public String getEmail () {
        return email;
      }

      public void setEmail (String email) {
        // Validate the new value before it replaces the current one.
        if (validateEmail(email))
          this.email = email;
        else
          throw new ValidationException();
      }

      public boolean validateEmail (String email) {
        return (email != null && email.length() > 0);
      }

      ...
      etc
      ...
    }

    class Account {
      private Collection<Customer> customers;
      private String accountNumber;

      public Account (String accountNumber, Collection<Customer> customers) {
        // Validate the arguments, not the still-unset fields.
        if (validateCustomers(customers))
          this.customers = customers;
        else
          throw new ValidationException();
        setAccountNumber(accountNumber);
      }

      public boolean validateCustomers (Collection<Customer> customers) {
        return (customers != null && customers.size() > 0);
      }

      public boolean validateAccountNumber (String accountNumber) {
        return (accountNumber != null && accountNumber.length() > 0);
      }

      public String getAccountNumber () {
        return accountNumber;
      }

      public void setAccountNumber (String accountNumber) {
        if (validateAccountNumber(accountNumber))
          this.accountNumber = accountNumber;
        else
          throw new ValidationException();
      }

      ...
      etc
      ...
    }

    These two classes represent your example, and they show how one can enforce integrity in an OO system. You won't be able to have an account without at least one customer, the email field must be "not null", and the same goes for the account number. That's how integrity is maintained in an OODBMS, AFAIK. You must code all integrity checks into your classes; these are the "class invariants" which enforce integrity.

    Yes, they are not declarative, they are code. But some RDBMS integrity checks are code too, in the form of procedural SQL, triggers and the like. I am not sure, but an OODBMS could implement declarative integrity checks too; as Steve has already pointed out, it's not rocket science at all. The OO advantage is that you can use the same code both to maintain DB integrity and to validate UI form data. So if a new check comes up, you only have to write it once, instead of in both your RDBMS and your UI code, where you risk having them out of sync by mistake (DB has one validation, UI has another).
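
    A minimal sketch of the "write the check once" idea above, assuming plain Java with no persistence layer. EmailRule, ValidatedCustomer and CustomerForm are hypothetical names, not part of any OODBMS or UI toolkit; the only point is that the domain object and the form share one rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule holder: the single place where the email invariant lives.
final class EmailRule {
    private EmailRule() {}

    static boolean isValid(String email) {
        return email != null && email.length() > 0;
    }
}

// Domain class: refuses any state change that would break the invariant.
final class ValidatedCustomer {
    private String email;

    ValidatedCustomer(String email) {
        setEmail(email);
    }

    String getEmail() {
        return email;
    }

    void setEmail(String email) {
        if (!EmailRule.isValid(email)) {
            throw new IllegalArgumentException("email must not be empty");
        }
        this.email = email;
    }
}

// Hypothetical UI-side check: reuses the same rule, but reports
// problems as messages instead of throwing.
final class CustomerForm {
    List<String> validate(String emailInput) {
        List<String> errors = new ArrayList<>();
        if (!EmailRule.isValid(emailInput)) {
            errors.add("Please enter an email address.");
        }
        return errors;
    }
}
```

    If the rule later tightens (say, a maximum length), only EmailRule.isValid changes, and both the domain object and the form pick it up.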

    Robin has already provided an answer to how one queries these objects, by using JDO for example.

    Hope I have helped clear things up a bit.

    Regards,
    Henrique Steckelberg
  104. Thanks for showing us the example.

    Just a question.

    Suppose I have created an account, a, and a customer, c. In account class I add the customer c to the account a.

    But I forget to add this account a to the customer object c.

    When I query the customer object and ask it to tell me how many accounts it has, I get the answer none.

    But if I ask the account objects to tell me what customers have accounts (we'll assume that the appropriate code can be easily written), I get back the answer that customer c has an account a.

    Now the same question asked of two different classes gives me different results. How do I know which answer is the correct one?

    That is what I mean by saying that it is not easy to maintain data integrity in an ODBMS.

    Note also that you now have the customer-account association in two places, once in the customer class and the second time in the accounts class. This duplication is the source of the data integrity error just pointed out.

    We can overcome this by coding in some checks like the following (pseudo code):

       Assert (Customer c).getAccounts().hasCustomer(this)

    (Each and every account that belongs to this customer must in turn have this customer in its customers collection.)

    and

       Assert (Account a).getCustomers().hasAccount(this)
    (Each and every customer that owns this account must in turn have this account in its accounts collection.)

    Again, too much work for my taste. And possibility of cyclic coding.
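
    The two assertions above could be sketched as a single check over both sides of the relationship. This is an illustration only: the classes are stripped down to bare fields, and RelationshipCheck is a made-up helper name.

```java
import java.util.HashSet;
import java.util.Set;

class Customer {
    final Set<Account> accounts = new HashSet<>();
}

class Account {
    final Set<Customer> customers = new HashSet<>();
}

final class RelationshipCheck {
    private RelationshipCheck() {}

    // Returns true only if every link is recorded on both sides.
    static boolean isConsistent(Set<Customer> customers, Set<Account> accounts) {
        // Every account a customer lists must list that customer back.
        for (Customer c : customers) {
            for (Account a : c.accounts) {
                if (!a.customers.contains(c)) {
                    return false;
                }
            }
        }
        // Every customer an account lists must list that account back.
        for (Account a : accounts) {
            for (Customer c : a.customers) {
                if (!c.accounts.contains(a)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

    Run from a unit test or just before commit, such a check would flag the "added c to a but forgot to add a to c" case described above. It detects the inconsistency after the fact; it does not prevent it.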

       Going with RDBMS and a properly designed database takes care of all these types of issues for me. I can sleep happier at night knowing that my data can not be corrupted.

    Ravi
  105. Ravi,

    class Bank {
      private Collection<Account> accounts;

      ...

      public void addAccount (Customer c, String accountNumber) {
        Account a = new Account(accountNumber);
        a.getCustomers().add(c);  // account -> customer
        c.getAccounts().add(a);   // customer -> account
        accounts.add(a);
      }

      ...
    }

    or something like the above, where the addAccount method would be the only way one could add an account to a Bank object. One is as likely to "forget" to add an account a to customer c as one is to "forget" to add the foreign key check to the ACCOUNT_CUSTOMER table. This is something you will usually code once in your object model, similar to the example above, in a properly implemented object model.

    The point is, it may be more difficult to assert some kinds of integrity checks in OO systems than in RDBMSs, but at least you have seen that it is possible and doable, which you doubted at first. OODBs are as capable of maintaining data integrity as RDBMSs, in some situations in an easier way, in others in a harder way.

    One example where OODBs may be better than RDBMSs is when one needs to store hierarchical data. This kind of information is "natural" to OO, while in an RDBMS you must go to great lengths to provide such a structure and maintain its integrity properly. This link provides a nice explanation of this problem: http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html.

    So it all boils down to good ol' "use the right tool for the job", as always.

    Regards,
    Henrique Steckelberg
  106. OODBs are as capable of maintaining data integrity as RDBMSs, in some situations in an easier way, in others in a harder way. One example where OODBs may be better than RDBMSs is when one needs to store hierarchical data. This kind of information is "natural" to OO, while in an RDBMS you must go to great lengths to provide such a structure and maintain its integrity properly. This link provides a nice explanation of this problem: http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html. So it all boils down to good ol' "use the right tool for the job", as always. Regards, Henrique Steckelberg

    Yes, you are right. Network databases have practical value (an OODBMS is just a network database, isn't it?). The SQL engine layer is useless if the application knows the access path in advance (which is not the case with interactive queries). Many "large companies" use B-trees directly, and that makes sense in some applications too (BDB is one popular implementation), but I do not think it makes any sense to hide "the right tool for the job" behind some "abstract" API and query language compromise.
  107. Henrique wrote:
    where addAccount method would be the only way one could add an account to a Bank object

    While that is true as far as it goes, what is to prevent a developer from adding code like this, three months after the Bank class was written?
    public class XYZ{
       public void addAccount(String acctNum){
          Account a = new Account("12345");
          a.addCustomer("C98765");
       }
    }

    Now the Customer object has no idea that it should have an account with account# 12345!

       This can not happen in a database because there is only one place where the customer_account relationship is kept! So, no data duplication, no chance of a problem.

       The chance of not defining the foreign key exists, but it is very unlikely. It would be caught very, very early in the design stage. It is almost like saying that somebody forgot to create the Account class in a banking application!

    The fact that the customer-account relationship must be stored in two places is the one that leads to the possibility of data duplication, and thence to data integrity problems.

    Only very strict project oversight and testing can ensure that the data integrity is maintained. Whereas, using a RDBMS, you define it once, and no application, none whatsoever, can violate it inadvertently.

    Why should I do more work and put project management processes in place to ensure data integrity when a simpler alternative is available? Anytime you require human intervention to guarantee data integrity, there is the possibility of an inadvertent lapse in enforcement due to the nature of human beings.

      Define your data integrity requirements once, and then worry not! (Unless the nature of the business changes. But that is a different story and affects any technology used.)


    Also, isn't duplication of a concept (or data) a violation of the software principle: "Once and Only Once"?

    Ravi
  108. Ravi poses the following as a potential breach of bi-directional relationship integrity:
    public class XYZ{
       public void addAccount(String acctNum){
          Account a = new Account("12345");
          a.addCustomer("C98765");
       }

    Firstly, you are working with Strings and not Objects - nowhere do you create the Customer object.

    I've not studied Henrique's code (did he create the Customer inside his addCustomer() method?) but I believe the code which I suggested would not be open to misuse as posed above, at least not by any class outside the package which contains Account and Customer. I presume developers working in the same package as Account and Customer would have a reasonable understanding of the class/relationship structures therein.

    Cheers, Robin.
  109. Opinion: An index is not a book[ Go to top ]

    public class XYZ{
       public void addAccount(String acctNum){
          Account a = new Account("12345");
          a.addCustomer("C98765");
       }

    Let's assume that addCustomer creates the customer (or that you actually meant a.addCustomer(new Customer("C98765")) ) ... the problem you cite does not in fact arise if addCustomer makes a reciprocal addAccount call on customer; thus customer does indeed know that it has a related account.

    Paul C.
  110. Yes, Paul. Robin has shown a nice and elegant way to do this. (I would probably make the protected method private, but that is a minor quibble.)

    Nevertheless, the fact remains that quite a bit of effort needs to be expended on this. And when you have several many to many relationships in a model, it can get a bit boring doing the same thing again and again. Not a big deal, though.

    Ravi
  111. Only very strict project oversight and testing can ensure that the data integrity is maintained. Whereas, using a RDBMS, you define it once, and no application, none whatsoever, can violate it inadvertently.

    So define the object model once and only once. Get it right. Put it into a separate project if you have to and deliver a model.jar for the rest of the application team to use.

    I tend to find that companies exploiting bi-directional relationships extensively are using "strict project oversight". Often this is realized as a code-generation approach to the domain object classes, based on a higher-level model. I also once built a bytecode enhancer which took simple metadata identifying pairs of collections or single-valued relationships and enhanced the object model classes to maintain integrity (an approach transparent to the developer), but never evolved it beyond the proof-of-concept stage (partly because of bad vibes regarding class file enhancement at the time).

    I actually think that Java should have a native language construct for defining relationships as bi-directional. Perhaps annotations will lead us there....
  112. Opinion: An index is not a book[ Go to top ]

    AOP would help with the oversight too.
  113. Opinion: An index is not a book[ Go to top ]

    Robin Roos wrote:

    So define the object model once and only once. Get it right. Put it into a separate project if you have to and deliver a model.jar for the rest of the application team to use. I tend to find that companies exploiting bi-directional relationships extensively are using "strict project oversight". Often this is realized as a code-generation approach to the domain object classes, based on a higher-level model. I also once built a bytecode enhancer which took simple metadata identifying pairs of collections or single-valued relationships and enhanced the object model classes to maintain integrity (an approach transparent to the developer), but never evolved it beyond the proof-of-concept stage (partly because of bad vibes regarding class file enhancement at the time). I actually think that Java should have a native language construct for defining relationships as bi-directional. Perhaps annotations will lead us there....

    Very cool Robin. The current vibrations are definitely getting better, so resurrect that project!
  114. Only very strict project oversight and testing can ensure that the data integrity is maintained. Whereas, using a RDBMS...
    Whereas using an RDBMS, only very strict database structure modelling and implementation can ensure that data integrity is maintained. The same amount of attention should be paid to your domain model, be it in relational, OO, or any other technology.

    Yes Ravi, I agree it is as hard to provide two-way relationships integrity in OO as it is to provide hierarchical data integrity in RDBMS! :)

    Regards,
    Henrique Steckelberg
  115. agree it is as hard to provide two-way relationships integrity in OO as it is to provide hierarchical data integrity in RDBMS! :)

    That statement is inaccurate. Hierarchies are easily modelled using relational modelling. As a matter of fact, a hierarchical data model is one of the many types of modelling that can be done using relational data models.

    There is no question of data integrity problems with a properly modelled relational database, whether it involves a hierarchy or a network model, or something else.

    Ravi
  116. I am not questioning the fact that hierarchical model can be modelled in RDBMS. Yes, it can be done with no integrity problems whatsoever.

    I am questioning how EASY it is to achieve a properly modelled hierarchy in a relational DB. The link I provided in a previous post proves that it is not as easy as in OO. There are many ways you can do it in an RDBMS, each with its pros and cons.
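
    To make the "pros and cons" concrete, here is a rough sketch of one common relational encoding, the adjacency list (a parent_id column on each row). It is only an assumption that this is among the approaches meant here. An in-memory map stands in for a table NODE(id, parent_id), and NodeTable and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a table NODE(id, parent_id); a null parent marks a root.
class NodeTable {
    private final Map<Integer, Integer> parentOf = new HashMap<>();

    void insert(int id, Integer parentId) {
        // Mimics a primary key: ids are unique.
        if (parentOf.containsKey(id)) {
            throw new IllegalArgumentException("duplicate id " + id);
        }
        // Mimics a foreign key: the parent row must already exist.
        if (parentId != null && !parentOf.containsKey(parentId)) {
            throw new IllegalArgumentException("unknown parent " + parentId);
        }
        parentOf.put(id, parentId);
    }

    // Collects all descendants level by level. Each level costs another
    // pass over the rows, much like one self-join per level in SQL
    // without recursive query support.
    List<Integer> descendantsOf(int rootId) {
        List<Integer> found = new ArrayList<>();
        List<Integer> frontier = new ArrayList<>();
        frontier.add(rootId);
        while (!frontier.isEmpty()) {
            List<Integer> next = new ArrayList<>();
            for (Map.Entry<Integer, Integer> row : parentOf.entrySet()) {
                Integer parent = row.getValue();
                if (parent != null && frontier.contains(parent)) {
                    next.add(row.getKey());
                }
            }
            found.addAll(next);
            frontier = next;
        }
        return found;
    }
}
```

    Integrity is cheap in this encoding (one key, one foreign key), while reading a whole subtree is the awkward part. Other encodings shift that trade-off the other way.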

    Regards,
    Henrique Steckelberg
  117. I am not questioning the fact that hierarchical model can be modelled in RDBMS. Yes, it can be done with no integrity problems whatsoever. I am questioning how EASY it is to achieve a properly modelled hierarchy in a relational DB. The link I provided in a previous post proves that it is not as easy as in OO. There are many ways you can do it in an RDBMS, each with its pros and cons. Regards, Henrique Steckelberg
    It is not easy, you need to learn it, but it becomes very easy to maintain valid models.
  118. And I believe competent data modellers (whether of the OO variety or relational variety) would agree that using roles is the way to go. Using a hierarchy in this case locks you into a rigid structure which is difficult to modify.

    Absolutely!

    I can see where Henrique is trying to go, but he chose his example badly. Once an object is instantiated it cannot change its type, so you cannot use inheritance to model a role which is being played by an object since these roles change over time.
    Yes, Paul. Robin has shown a nice and elegant way to do this. (I would probably make the protected method private, but that is a minor quibble.)

    This is just a mis-type on Ravi's part, but for the record that method cannot be private because it is invoked by the collaborating class in the relationship. The only options are package and protected visibility. It cannot be public because it opens the objects in the package to abuse from classes outside the package.

    Kind regards, Robin.
  119. Opinion: An index is not a book[ Go to top ]

    Business transactions which take place at a moment in time (e.g. ShareSale) or over an interval of time (e.g. CarRental) are good candidates for inheritance, as they never change their type. A ShareSale is always a ShareSale. A CarRental is always a CarRental.

    So hierarchies such as "CarRental/TruckRental extends VehicleRental" and "ShareSale/SharePurchase extends ShareTransaction" are viable, whilst "Customer/Employee extends Person" is generally not.
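
    A small sketch of the distinction, under the assumption that roles are modelled as a changeable set on Person while rentals use ordinary subclassing. The Role enum, the method names and the daily rates are invented for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Roles a person can take on and drop over time.
enum Role { CUSTOMER, EMPLOYEE }

class Person {
    private final String name;
    private final Set<Role> roles = EnumSet.noneOf(Role.class);

    Person(String name) {
        this.name = name;
    }

    String getName() { return name; }

    // Roles change without the object changing its class.
    void assume(Role role) { roles.add(role); }
    void drop(Role role) { roles.remove(role); }
    boolean plays(Role role) { return roles.contains(role); }
}

// A rental never changes its kind after creation, so subclassing is safe here.
abstract class VehicleRental {
    abstract int dailyRate();
}

class CarRental extends VehicleRental {
    @Override
    int dailyRate() { return 40; } // made-up figure
}

class TruckRental extends VehicleRental {
    @Override
    int dailyRate() { return 90; } // made-up figure
}
```

    A Person can be hired (assume EMPLOYEE) and later leave (drop EMPLOYEE) while staying the same object, which "Employee extends Person" could not express.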
  120. Suppose I have created an account, a, and a customer, c. In account class I add the customer c to the account a. But I forget to add this account a to the customer object c. When I query the customer object and ask it to tell me how many accounts it has, I get the answer none.
    In answering these questions we have to differentiate between an Object Database context and an Object-Relational Mapping context, as the observed behavior will differ between the two.

    Object Database Context

    The object database will faithfully store and retrieve your object graph (subject to it passing any constraints which may be in force). If you give the database an object graph in which Account a1 knows about Customer c1, but Customer c1 does not know about Account a1, then it will store it for you. If this was significant then you should (a) have coded the bi-directional relationship more carefully, (b) have asked the database to undertake bi-directional relationship management on your behalf (if it has such a feature), or (c) not used bi-directional relationships.

    Object/Relational Mapping Context

    Before you commit your transaction a1 knows about c1 but c1 does not know about a1. When you commit your transaction the intersection row is written to the Account_Customer table. There is only one place where this data is written, and the existence of this single row in Account_Customer implies that the account and the customer know each other. After commit, therefore, a1's collection of Customers includes c1, and c1's collection of Accounts contains a1. (This may not be evident if your transaction isolation level permits dirty data reads.)

    This is a very well-known side effect of mapping objects to relational structures. It can be eliminated if the two collections making up the bi-directional many-many relationship are mapped to independent join tables. But the net effect of that approach is to render the collections unrelated and therefore not truly bi-directional (it becomes legitimate for a1 to know about c1, and for c1 to know about a2, which is not bi-directional). So one tends to map the two collections to the same relational structure (same join table) and live with the fact that bi-directionality is automatically established on commit().
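
    The "only one place" property can be imitated in plain Java to see why the two directions cannot drift apart. A rough sketch, assuming an in-memory set stands in for the Account_Customer join table; Link and AccountCustomerTable are hypothetical names, and no particular O/R mapper is claimed to work this way.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// One row of the hypothetical Account_Customer join table.
final class Link {
    final String accountNumber;
    final String customerId;

    Link(String accountNumber, String customerId) {
        this.accountNumber = accountNumber;
        this.customerId = customerId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Link)) {
            return false;
        }
        Link other = (Link) o;
        return accountNumber.equals(other.accountNumber)
            && customerId.equals(other.customerId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(accountNumber, customerId);
    }
}

// In-memory stand-in for the join table: each link is stored exactly once.
class AccountCustomerTable {
    private final Set<Link> rows = new HashSet<>();

    void link(String accountNumber, String customerId) {
        rows.add(new Link(accountNumber, customerId));
    }

    // Both directions are computed from the same rows, so they cannot disagree.
    Set<String> customersOf(String accountNumber) {
        Set<String> result = new HashSet<>();
        for (Link r : rows) {
            if (r.accountNumber.equals(accountNumber)) {
                result.add(r.customerId);
            }
        }
        return result;
    }

    Set<String> accountsOf(String customerId) {
        Set<String> result = new HashSet<>();
        for (Link r : rows) {
            if (r.customerId.equals(customerId)) {
                result.add(r.accountNumber);
            }
        }
        return result;
    }
}
```

    The price is that neither side holds its collection directly; each read is a lookup, which is roughly the work a mapper does when it reloads both collections from the shared join table after commit.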

    Putting Bi-Directional Relationship Management into the Model

    Personally I prefer my domain object models to behave identically when transient (e.g. in JUnit tests which do not use any database) and persistent. Therefore I think carefully before putting bi-directional relationships into the model, and always code the object model to maintain the relationship itself. So adding c1 to a1 causes a1 to add itself to c1 (in a non-recursive way!). To help facilitate this coding I use java.util.Set based relationships instead of Collection or List wherever appropriate to the domain.

    So you end up with code akin to:
    // Account class

    public void addCustomer(Customer c) {
        _addCustomer(c);
        c._addAccount(this); // establish reciprocal
    }

    void _addCustomer(Customer c) { // package visibility
        customers.add(c);
    }

    // Customer class

    public void addAccount(Account a) {
        _addAccount(a);
        a._addCustomer(this); // establish reciprocal
    }

    void _addAccount(Account a) { // package visibility
        accounts.add(a);
    }

    It is extremely difficult to define correctly the semantics of java.util.List based bi-directionality in which duplicates are allowed.
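
    For reference, a self-contained version of the pattern above that compiles on its own, assuming Set-backed fields. The read-only getters are an addition for illustration. Because a Set ignores repeated adds, calling both addCustomer and addAccount for the same pair leaves exactly one link on each side.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Account {
    private final Set<Customer> customers = new HashSet<>();

    public void addCustomer(Customer c) {
        _addCustomer(c);
        c._addAccount(this); // establish reciprocal
    }

    void _addCustomer(Customer c) { // package visibility
        customers.add(c);
    }

    // Read-only view, so callers cannot bypass addCustomer.
    public Set<Customer> getCustomers() {
        return Collections.unmodifiableSet(customers);
    }
}

class Customer {
    private final Set<Account> accounts = new HashSet<>();

    public void addAccount(Account a) {
        _addAccount(a);
        a._addCustomer(this); // establish reciprocal
    }

    void _addAccount(Account a) { // package visibility
        accounts.add(a);
    }

    // Read-only view, so callers cannot bypass addAccount.
    public Set<Account> getAccounts() {
        return Collections.unmodifiableSet(accounts);
    }
}
```

    With a List in place of the Set, the second call would record the pair twice, which is the duplicate problem mentioned above.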

    Are these explanations helpful?
  121. Opinion: An index is not a book[ Go to top ]

    OK, forget it. I see you do not have information about OODBMS either; probably it does not exist and there is nothing to talk about.

    What a strange thing to say. I have already pointed you and others to plenty of information. Here we go again:

    http://www.versant.com
    http://www.gemstone.com

    are companies that provide ODBMSes. They are likely to have all the information you need. Or, you could use this wonderful tool called 'Google' to look up information.

    I'm flattered that you consider that I have such influence in the IT industry that unless I can personally provide you with detailed information about a technology, then that technology does not exist. I'm afraid to say that is not true.
  122. Steve,

      I think he is asking for the theory behind ODBMS. There is a theory behind RDBMS, basically Codd's relational algebra using Predicate Logic and Set Theory.

      I believe that all relational databases must follow (or attempt to follow) these rules. Absent these, they can not claim to be a relational database.

       Is there any such theory underpinning ODBMS? I believe that is the question.

       A vendor's implementation of object persistence does not qualify as a theory. Just like (for the sake of discussion) any major database vendor's claim that their product is what defines relational theory would be laughed out of existence, an ODBMS vendor can not claim that what they are selling defines the theory. There must be some sound scientific or mathematical basis for it.

    Ravi
  123. Steve,  I think he is asking for the theory behind ODBMS. There is a theory behind RDBMS, basically Codd's relational algebra using Predicate Logic and Set Theory
    Yes, I am trying to find something like this, but Google fails to help me too. Marketing is a science too, but I am too stupid to understand proof by analogy.
  124. Yes, I am trying to find something like this, but Google fails to help me too. Marketing is a science too, but I am too stupid to understand proof by analogy.

    OK, Juozas, just for you, as you seem a reasonable person.

    So, how is an object graph stored?

    Example:
        Customer
           Address1
              Email1
              Email2
              ...
              Emailn
           Address2
           ...
           AddressN
           ACCOUNT(SINGLE , many to many)
        Customer1
           Address11
              Email11
              Email12
              ...
              Email1n
           Address12
           ...
           Address1N
           ACCOUNT(SINGLE , many to many)

    In Oracle, we have a Customer table, an Address table with a foreign key to Customer, an Account table and an AccountToCustomer many-to-many table.

    In an ODBMS, we have a Customer table, an Address table and an Account table. Once again, "table" in an ODBMS is just an analogy.

    Two things are really DIFFERENT:
    1. Foreign keys.
    2. Many-to-many tables.

    1. Foreign keys - Customer, Address and Email.
    In an ODBMS the Customer, Address and Email tables all store ABSOLUTE POINTERS to each other. This can of course vary from vendor to vendor: the reference from Email to Customer may be direct, or take two hops via Address. An absolute pointer is something that the ODBMS engine can read and interpret very quickly, something like the first disk cluster of the record. Also, because Address and Email cannot be reused by another object type (you tell the ODBMS that via configuration), their storage can be optimized much further.

    2. Many-to-many - Customer and Account.
    In an ODBMS both the Customer and Account tables store a LIST OF ABSOLUTE POINTERS to each other. Because this is many-to-many, it is generally less efficient than case 1, since some optimizations cannot be applied.

    Because of these absolute pointers, an ODBMS is much quicker than an RDBMS at retrieving a data graph, almost as quick as Oracle retrieving records from a single table without any joins.

    The drawback is that queries with AD HOC joins are far less efficient than Oracle's, because Oracle is super-mega-optimized for ad hoc queries.

    What is AD HOC here? Imagine a field AAA (type Date) in the Email object and a field BBB (type Date) in the Account object.
    A query like:

    "select Customer c from Customer, Account acc where c.address.email.aaa = acc.bbb"

    would be inefficient.
    Yes, ODBMS vendors have optimizations for queries like the one above, so it is not hurt too badly, but Oracle has paid much more attention to these things, so it is quicker in Oracle. How much quicker? That is where the marketing FUD starts.
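
    The contrast can be imitated with ordinary Java objects, where a field reference plays the part of the stored pointer. This is only an analogy under that assumption, not how any ODBMS engine is implemented; GraphQueries and its methods are made-up names, and LocalDate stands in for the Date fields AAA and BBB.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Email {
    final LocalDate aaa; // the Date field AAA from the post
    Email(LocalDate aaa) { this.aaa = aaa; }
}

class Address {
    final List<Email> emails = new ArrayList<>(); // direct references
}

class Customer {
    final String name;
    final List<Address> addresses = new ArrayList<>(); // direct references
    Customer(String name) { this.name = name; }
}

class Account {
    final LocalDate bbb; // the Date field BBB from the post
    Account(LocalDate bbb) { this.bbb = bbb; }
}

final class GraphQueries {
    private GraphQueries() {}

    // Navigation: only stored references are followed, no key lookups.
    static List<Email> emailsOf(Customer c) {
        List<Email> result = new ArrayList<>();
        for (Address a : c.addresses) {
            result.addAll(a.emails);
        }
        return result;
    }

    // Ad hoc join on aaa = bbb: no reference connects Email to Account,
    // so a temporary index over all accounts is built and every email probed.
    static List<Customer> customersMatchingAccounts(List<Customer> customers,
                                                    List<Account> accounts) {
        Set<LocalDate> accountDates = new HashSet<>();
        for (Account acc : accounts) {
            accountDates.add(acc.bbb);
        }
        List<Customer> result = new ArrayList<>();
        for (Customer c : customers) {
            for (Email e : emailsOf(c)) {
                if (accountDates.contains(e.aaa)) {
                    result.add(c);
                    break; // one match is enough for this customer
                }
            }
        }
        return result;
    }
}
```

    emailsOf touches only the objects reachable from one customer, while the join has to visit every account and every email. That difference in work is the point being made about graph retrieval versus ad hoc queries.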

    I am leaving inheritance, constraints and other things out. My objective is to show the basics.

    1. As you can see, because of the mass of absolute pointers, ODBMS databases become much larger (easily 2x to 10x) than an RDBMS holding the same data.
    2. I hope it is now perfectly understandable how an SQL-like language can work for an ODBMS.
    3. Also, Ravi should now understand that data integrity is not violated here.

    Phew, you owe me a beer.
  125. Oracle optimizes this using "Index Cluster Tables", which can be faster than direct pointers in a file: it stores related rows in the same page and reduces I/O operations.
    It is probably possible to find some exceptional case that justifies pointer navigation, but there are no performance problems with relationships. SQL optimizers are very smart too (they can select the best access path using statistics). I do not think it is a good idea to navigate pointers manually; the optimizer has a better chance of finding a good plan automatically.
    Implementation details are interesting, but I am looking for a formal Object Database model. There is a lot of information about relational databases, but I fail to find any useful information about Object Database theory.
  126. BTW, this linked pointer model is known as "network" in theory. So probably you are talking about special kind of Network Databases.
  127. Even Larry Ellison himself has communicated publicly that relationship management in an OODB is better/faster than in an RDB. There are also announcements from IBM regarding their new U2 database line. Plus, Microsoft's Bill Gates is touting the rebirth of OODB in their next-generation WinFS platform. All RDB vendors tout object-relational capabilities, and the question is why, if the relational model is fine for everything. The reason is that it is not, as things become more complex they degrade in performance, software systems in this world are becoming increasingly complex.

    BTW - The Versant OODBMS uses clustering of related objects to optimize disk I/O too. If an object A has relationships with objects B, C and D, then when the first reference to object A is resolved (either by navigation or query) and I/O brings it into server memory, it brings in objects B, C and D as well. So when you then talk to object A and get B, C, D in lazy-fetch mode, there is no need for further I/O operations because they are already in server memory. Of course, it is also possible to pre-fetch related objects so that you do not have to go to the server at all. There is also partitioning for I/O optimization.

    An OK book: Object Databases in Practice (Mary Loomis, Akmal Chaudhri)

    Lots of misconceptions out there regarding object databases. Mostly because they have so many varied origins and drastically different implementations. It is hard to find a relational database that is orders of magnitude faster/slower than the next one. They generally have close to the same performance (15% here or there under different optimized situations).

    Object databases, on the other hand, have significantly different performance characteristics for different applications depending on their architecture. It is not uncommon to find one that is 100X faster than another for different types of applications, or ones that can scale to terabytes and thousands of concurrent users while others fall over at 20 users and 20 GB. That is why they are often billed as "application specific databases", and it is why you can also find them orders of magnitude faster than relational databases.

    Of course, there are the hash/btree/unique/multi-valued/multi-attribute indexes used by the server engine when you need to query instead of using an application specific model to process your business logic.


    You can find our Versant object database in Borland's and Oracle's product offerings for those very reasons.

    Robert Greene
    Versant Corporation
  128. Lots of misconceptions out there regarding object databases. Mostly because they have so many varied origins and drastically different implementations.
    So this discussion makes no sense, because none of us knows what we are talking about. Probably my object database definition using the object stream analogy is valid too, is it not?
  129. For the sake of argument, let us grant that the following is true (from Robert Greene):
    The reason is that it is not, as things become more complex they degrade in performance, software systems in this world are becoming increasingly complex.

    Well, if performance is a problem, and data integrity is not, you'd try to improve the performance, but would you do so at the expense of losing data integrity? All you'll end up with is data retrieved very fast but you'd have very little confidence in the data.

    When the users ask you why two different reports give two different values for the same query, please tell them, "But it ran really fast!", and see where that gets you.

    The fact that Oracle, IBM and Microsoft believe something does not make it true. In fact, they are in the business of selling newer things every few years. Witness Larry Ellison's claim a few years ago that the "Network Computer" is the thing. Witness IBM's refusal to admit that mainframes are not that important anymore. Witness MS continually producing shoddy products and initially ignoring the web a bit.

    They need to sell new things, and therefore will latch on to any new buzzword and try to market it. Management still believes in the philosophy "Nobody gets fired for buying IBM" or whatever today's buzzword is.

    The fact that IBM, Oracle's Larry Ellison and Bill Gates endorse ODBMS means that they feel that they can sell it to management, that is all.

    Incidentally, Bill Gates is the person who wrote in his first book, on page 295, I think, that a real technological breakthrough will be when somebody figures out how to factor large prime numbers! (I suggest that he first try to factor small prime numbers, and report the results to an eager mathematical world.) So, if Bill Gates endorses anything technical, I would take it with a pinch of salt, actually a barrelful of salt!

    Ravi
  130. Some links about OODB theory:

    http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter8/node1.html

    http://userfs.cec.wustl.edu/~cse530/2004/OODB%20Presentation.ppt

    http://portal.acm.org/citation.cfm?id=102675.102678

    Regards,
    Henrique Steckelberg
  131. Some links about OODB theory:
    http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter8/node1.html
    http://userfs.cec.wustl.edu/~cse530/2004/OODB%20Presentation.ppt
    http://portal.acm.org/citation.cfm?id=102675.102678
    Regards, Henrique Steckelberg
    I know this stuff, but it is not a theory. These are analogies.
  132. Some links about OODB theory:
    http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter8/node1.html
    http://userfs.cec.wustl.edu/~cse530/2004/OODB%20Presentation.ppt
    http://portal.acm.org/citation.cfm?id=102675.102678
    Regards, Henrique Steckelberg

    Great find Henrique. Thanks.
  133. Opinion: An index is not a book

    "Just change RDBMS for ODBMS (Object database management system) and you get an equally correct statement"
    Probably it means ODBMS is the same thing. Relational database is an object database management system, it is object oriented and manages objects without problems too.
    I'm sorry but that makes absolutely NO sense!!
    Probably what he means is that you can use relational databases to store objects. Which, of course you can, because in principle you can use them to store anything.

    The statement that 'relational databases are object oriented' reminds me of those tiresome C fans who insist that C is object oriented because you can sort of do object-type things if you are really clever with structures and pointers to functions.

    I think that IT is exciting because there is a constant sense of innovation and there is always something new to learn, so I find it sad when I encounter people who become skilled in one paradigm (such as relational databases) and then can't see anything beyond that paradigm, or become defensive - anything that doesn't match their skillset can't possibly be of use, and anyone who doesn't use their approach must be either ignorant or misguided.

    Relational databases are a vital technology in IT, but to say that they are the only way that data should ever be stored, and any other mechanism is flawed is a sign of rigid thinking and lack of experience, in my opinion.

    Well said Steve! I agree wholeheartedly!
  134. Steve says:
    when I encounter people who become skilled in one paradigm (such as relational databases) and then can't see anything beyond that paradigm

    Very interesting, that statement.

    Actually, it applies to the vast majority of younger programmers who know Java and believe that OO == Java, without realizing that OO was invented by Alan Kay, the creator of Smalltalk, in the late 60s. The idea for OO was obtained by realizing that cells transmit messages (using chemicals) to other cells. OO is about objects and messages.

    And also, Alan Kay has said words to the effect that "C++ and Java are not what I was thinking of when I developed OO and Smalltalk." He was not being complimentary of Java and C++.

    These newer entrants into the software field are the ones who know very very little.

    I have experience in relational theory and generally work in a System architect role using (gasp!) Java, JSPs, XML, XPath, XQuery, XSL, etc., designing web applications. I also know Python and a bit of Ruby. Have tried to learn Lisp, too.

    Yet I would never use an ODBMS for my persistence (storage). It just does not give me the flexibility and power I want.
    An analogy that may be appropriate to you, Steve, is how you felt when you moved away from Smalltalk to Java. It takes too much effort to do things in Java that would be done far more easily in a dynamically typed language, like Smalltalk, Python, etc.

    The original poster, Charles Armstrong, claimed that relational databases are terrible for storage. He does not mention any problems that he faced. Just a blanket statement not supported by any other information.

    The inanity of this comment should be obvious. Especially given the fact that relational theory deals with the logical aspects, not the physical aspects of database design. If a RDBMS vendor has implemented storage in a manner not to Charles's satisfaction, he should complain to the vendors. But to suggest that others do so, too, is absurd.

    Most subsequent discussion then focussed around why Charles's statement is, or is not, valid. And why ODBMS can not guarantee data integrity while an RDBMS can.

    I agree with you, Steve, that it would be nicer if people were aware of other technologies and paradigms. Tell that to the Java == OO gang. Maybe some of them will start looking at Smalltalk and other languages then.

    Ravi
  135. Actually, it applies to the vast majority of younger programmers who know Java and believe that OO == Java, without realizing that OO was invented by Alan Kay, the creator of Smalltalk, in the late 60s. The idea for OO was obtained by realizing that cells transmit messages (using chemicals) to other cells. OO is about objects and messages.

    And also, Alan Kay has said words to the effect that "C++ and Java are not what I was thinking of when I developed OO and Smalltalk." He was not being complimentary of Java and C++.

    These newer entrants into the software field are the ones who know very very little.

    Actually the inventors of Simula could also claim to have invented OO (I believe Simula predates Smalltalk).

    Making assumptions about how much other people know and what their experience is, is a dangerous game in any online debate.

    Paul C.
  136. Relational theory is no guarantee of data integrity even when you use it properly. A good example of this is the MySQL database.
  137. Relational theory is no guarantee of data integrity even when you use it properly. A good example of this is the MySQL database.

    Interesting comment, that. As a matter of fact, MySQL is not a relational database by any stretch of the imagination. Key concepts of the relational database concept include logical-physical independence, and the existence of integrity constraints, such as referential integrity (or foreign keys).

    MySQL allows you to specify the format in which the data must be stored. This clearly negates any pretensions to relational theory claims that MySQL may have. Secondly, it does not implement foreign keys. Therefore, it is not a relational database.

    To use your argument, the fact that MySQL lets you violate integrity only bolsters my argument that anything that is not a relational database can not guarantee data integrity! Thank you, Erik, for helping me!

    As of date, we know of no mechanism other than a relational database that guarantees data integrity when used properly.

    Ravi
  138. Interesting comment, that. As a matter of fact, MySQL is not a relational database by any stretch of the imagination. Key concepts of the relational database concept include logical-physical independence, and the existence of integrity constraints, such as referential integrity (or foreign keys).

    First of all, are there true relational databases? I doubt it. Considering that existing DBMSs are only close to being relational, and that MySQL claims to be a relational DBMS, we can't assume the theory is something we should take into account. That said, the final responsibility for data integrity lies with the developer.
  139. Anything that violates fundamental precepts of relational theory can not be considered as a relational database.

    While there is no truly relational database management system, the others do not violate these basic principles. Hence, in a well designed Oracle, Sybase, SQL Server, or PostgreSQL database, data integrity will not, can not, be violated.

    Since MySQL violates these fundamental principles, it is not even a candidate for a relational database. It is no wonder that data integrity issues arise with MySQL.

    As for data integrity being a developer's concern: true to some extent. But it is primarily the concern of the database designer who ensures that the developer can not violate data integrity by the simple expedient of declarative database constraints.

    Ravi
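    As a concrete illustration of the "declarative database constraints" mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The account and tx tables and their columns are invented for the example, and SQLite is only a convenient stand-in for the databases named in this post; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3


def insert_orphan_is_rejected() -> bool:
    """Return True if the database refuses a tx row that points at no account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE tx ("
        " id INTEGER PRIMARY KEY,"
        " account_id INTEGER NOT NULL REFERENCES account(id),"  # the declarative rule
        " amount INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account (id) VALUES (1)")
    conn.execute("INSERT INTO tx (account_id, amount) VALUES (1, 100)")  # parent exists
    try:
        conn.execute("INSERT INTO tx (account_id, amount) VALUES (999, 50)")  # no such account
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False
```

    The application code never checks the rule itself; the rejected insert is the database applying the constraint declared in the table definition.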
  140. Steve Zara says: Why do you say that OO persistence can't guarantee data etc.? Of course it can!
    Could you please prove to me that it can? Or cite some reference where it proves that?
    Relational database theory is based on predicate theory and set logic. When used properly, they can guarantee the integrity of the data. Is there any analogue in the Object world? If you are storing everything in files, and somebody adds a meaningless record to the file, what then?

    If you are directly persisting objects then your "integrity constraints" become class invariants. If the class API maintains the class invariants then by definition your object model retains its integrity. If an object model works in memory then it will work (and retain integrity) when persisted.
    If you are storing everything in files, and somebody adds a meaningless record to the file, what then?

    If you bypass the primary API then you are equally stuffed no matter whether you are bypassing the class API or the SQL interface.
    If someone tinkers directly with an RDBMS datafile (rather than updating the DB using SQL) you are just as screwed as if they did the same thing with an OODB.

    Paul C.
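    A rough sketch of the class-invariant argument above, in Python. The Account class and its rule (the balance never goes negative) are made up for illustration and are not taken from any particular object database.

```python
class Account:
    """Toy account whose invariant is: the balance never drops below zero."""

    def __init__(self, balance: int = 0) -> None:
        if balance < 0:
            raise ValueError("balance must be non-negative")
        self._balance = balance

    @property
    def balance(self) -> int:
        return self._balance

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount: int) -> None:
        # The invariant is checked before any state changes.
        if amount <= 0 or amount > self._balance:
            raise ValueError("withdrawal would break the invariant")
        self._balance -= amount


def try_overdraw() -> int:
    """Attempt an invalid withdrawal and return the (unchanged) balance."""
    acct = Account(100)
    try:
        acct.withdraw(150)
    except ValueError:
        pass
    return acct.balance
```

    As long as callers go through deposit and withdraw the invariant holds. Assigning to acct._balance directly is the in-memory equivalent of editing the datafile behind the API's back.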
  141. Opinion: An index is not a book

    If someone tinkers directly with an RDBMS datafile (rather than updating the DB using SQL) you are just as screwed as if they did the same thing with an OODB.
    Paul C.
    Oh yeah, I had forgotten about AS/400 DB2 being stored in files that can be accessed directly. Tables are called files.

    I'm pretty sure I could write SQL that says "drop trigger/index", "insert/update data" , "create trigger/index". :0
  142. Hi Ravi

    Your concerns are legitimate if there is only one root object from which all queries must begin. However, most object-based solutions allow a query to commence from any "extent". I don't have an academic definition of an extent, but in layman's terms an extent can be considered to be all persistent instances of a designated class. It is quite feasible for all major (non-embedded) classes in a model to have their extents made available for query.

    So in your Customer/Account/Transaction example, one query might "start from" the extent of Customer, another from Account, and another from Transaction, etc.
    Whereas, using a relational database you'd write a simple query such as select count(*) from transaction where tx_time between 'day1' and 'day2'

    And exactly the same concept is available from an object model perspective. I don't have a direct analog for your BETWEEN operator, but an equivalent query in, say, JDOQL, might begin with the words

    SELECT [COUNT(this)] FROM com.xyz.Transaction WHERE...

    The WHERE clause is expressed in terms of persistent fields of the Transaction class, references navigable from the Transaction class and, critically, collection fields of other classes which contain Transaction instances. This gives queries significant flexibility of purpose and application for any given domain model, usually without the need to add relationships to the model beyond those actually justified by the business domain.

    Adding to this the ability to nominate any extent as the starting point for the query, and your analogies with single-rooted hierarchy navigation break down.

    I need to be able efficiently to store Java objects and query for them again. I actually don't mind what the underlying paradigm of the datastore is, at least not as far as my Java code is concerned. It is conceivable that one datastore implementation might use an RDBMS as an index into a non-RDBMS repository. But ultimately I have specific characteristics of performance, scalability, accessibility (e.g. some DBs will be accessed by other clients which are SQL-based) and cost of ownership which are important factors. And yes, although most of my applications are underpinned by relational databases today, I value the architectural freedom which I have when my applications refer not to the database, but to the object model.

    Kind regards, Robin.
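    The "extent" idea described in this post can be sketched in a few lines of Python. This is only an in-memory stand-in, not JDO or any real datastore: persist, extent, and the two classes are hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical stand-in for a datastore: one "extent" (list of instances) per class.
_extents = defaultdict(list)


def persist(obj):
    """Register obj in the extent of its class and return it."""
    _extents[type(obj)].append(obj)
    return obj


def extent(cls):
    """All persistent instances of cls, regardless of which objects reference them."""
    return list(_extents[cls])


@dataclass
class Customer:
    name: str


@dataclass
class Transaction:
    customer: Customer
    tx_date: date
    amount: int


def count_between(start: date, end: date) -> int:
    """Count transactions in [start, end], starting from the Transaction extent."""
    return sum(1 for t in extent(Transaction) if start <= t.tx_date <= end)


alice = persist(Customer("Alice"))
persist(Transaction(alice, date(2005, 3, 1), 10))
persist(Transaction(alice, date(2005, 3, 15), 20))
persist(Transaction(alice, date(2005, 4, 2), 30))
```

    count_between never touches the Customer extent; the query starts at Transaction, which is the point being made about not needing a single root.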
  143. Opinion: An index is not a book

    If you can predict all possible queries ahead of time the performance can actually be quite good

    There is a conceptual leap which many people (not Don!) are slow to make when thinking about queries against an object model instead of against a table structure. When selecting objects through an object model what you (typically) get back is objects. (Of course you can do aggregation/projection when necessary.)

    If your query gives you Transaction objects matching the filter criteria, then your application can usually ask those specific Transaction objects to provide the underlying property values required. This means the query only needs to traverse those relationships (joins) necessary to get values for the fields which actually appear in the WHERE clause. There is no need additionally to provide the navigation expressions (joins) by which to populate each field of the "result set" since, unless you're using projection, there is no user-defined "result set" as such.

    Therefore many object model-based queries are substantially simpler than their SQL equivalents. Consider querying for all Transactions since last Thursday and then displaying the transaction value (from the Transaction class) and the customer name (via the Transaction's "customer" property). The query for this might look like:

    SELECT FROM com.xyz.Transaction WHERE transactionDate > :lastThursday

    In this case :lastThursday is an incoming parameter of type java.util.Date. So the query has no need to involve Customer fields which are not part of the WHERE clause, even though the code will subsequently invoke getCustomer().getName() for some or all of the matching Transaction instances.

    And by choosing the appropriate fetch groups the above query is executed such that all data required to support the getCustomer().getName() invocations are retrieved as part of the native database query (probably SQL) that the JDOQL was translated into. So you potentially have 1 SQL query, not N+1.

    Now I'm not going to state that every conceivable SQL query can be expressed as JDOQL - that would patently not be true. But if your object model is an elegant abstraction of the business many (if not all) queries pertinent to that business can be conveniently expressed in terms of the object model. It is not necessary to know in advance the complete set of queries that will be required.

    Of course the above discussion considers object model-based queries regardless of whether an ODBMS, RDBMS etc. is in use.

    Kind regards, Robin.
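    To make the "1 SQL query, not N+1" remark above concrete, here is a sketch with Python's sqlite3 module. It does not use JDO or fetch groups; it only contrasts the two SQL access patterns a persistence layer might generate. The customer and tx tables are invented for the example.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a small in-memory database with two customers and three transactions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE tx (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            tx_date TEXT NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO tx VALUES (1, 1, '2005-03-10'), (2, 2, '2005-03-11'), (3, 1, '2005-03-12');
        """
    )
    return conn


def names_lazy(conn, since):
    """One query for the transactions, then one more per transaction (N+1)."""
    queries = 1
    rows = conn.execute(
        "SELECT customer_id FROM tx WHERE tx_date > ? ORDER BY id", (since,)
    ).fetchall()
    names = []
    for (customer_id,) in rows:
        queries += 1
        row = conn.execute("SELECT name FROM customer WHERE id = ?", (customer_id,)).fetchone()
        names.append(row[0])
    return names, queries


def names_eager(conn, since):
    """A single joined query, roughly what an eager fetch could be translated into."""
    rows = conn.execute(
        "SELECT c.name FROM tx t JOIN customer c ON c.id = t.customer_id"
        " WHERE t.tx_date > ? ORDER BY t.id",
        (since,),
    ).fetchall()
    return [name for (name,) in rows], 1
```

    Both functions return the same names; only the number of round trips differs, which is what choosing the appropriate fetch group is meant to control.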
  144. Robin Roos says:

    So the query has no need to involve Customer fields which are not part of the WHERE clause, even though the code will subsequently invoke getCustomer().getName() for some or all of the matching Transaction instances.


    Elsewhere, he says:

    Why do you say that OO persistence can't guarantee data etc.? Of course it can!

    How do you propose to link the customer to the transaction and simultaneously be able to get all the transactions for a customer? Are you planning to traverse from the customer tree to find transactions? And then also store customer reference in transaction?

    This implies a redundancy since the customer and transaction relationship is stored twice (at least). This type of redundancy is what a RDBMS avoids. What if an account is owned by two customers? Then will the transaction be a part of both customers? How will the counting of transactions be affected then?

    Now, as to the statement that using the Transaction Object will get the exact number of transactions in a given period, there are cases when the path chosen will result in different answers. If we do as Robin says:

    The WHERE clause is expressed in terms of persistent fields of the Transaction class, references navigable from the Transaction class and, critically, collection fields of other classes which contain Transaction instances.

    Then transactions may be counted several times if they happen to belong to joint accounts.

    There are many such issues that are not clearly resolved using Object databases. The accuracy of the data presented is in doubt. Hence such databases have limited use.

    Ravi
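    The joint-account concern above can be reproduced with a tiny sqlite3 sketch. The account_owner and tx tables are invented for illustration. The double count comes from the path taken (through owners), so it appears in plain SQL as well whenever the count is taken over that join.

```python
import sqlite3


def counts():
    """Return (count via the owner join, DISTINCT count via the join, count straight from tx)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE account_owner (account_id INTEGER, customer_id INTEGER);
        CREATE TABLE tx (id INTEGER PRIMARY KEY, account_id INTEGER);
        -- Account 1 is jointly owned by customers 10 and 11; account 2 has one owner.
        INSERT INTO account_owner VALUES (1, 10), (1, 11), (2, 12);
        INSERT INTO tx VALUES (1, 1), (2, 1), (3, 2);
        """
    )
    join = "FROM account_owner o JOIN tx t ON t.account_id = o.account_id"
    via_owners = conn.execute(f"SELECT COUNT(*) {join}").fetchone()[0]
    distinct = conn.execute(f"SELECT COUNT(DISTINCT t.id) {join}").fetchone()[0]
    direct = conn.execute("SELECT COUNT(*) FROM tx").fetchone()[0]
    conn.close()
    return via_owners, distinct, direct
```

    Counting through the owners sees each joint-account transaction once per owner; counting from the transaction table (or extent) itself, or de-duplicating by transaction id, does not.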
  145. Elsewhere, he says: Why do you say that OO persistence can't guarantee data etc.? Of course it can!

    That was me, not Robin.
    How do you propose to link the customer to the transaction and simultaneously be able to get all the transactions for a customer? Are you planning to traverse from the customer tree to find transactions? And then also store customer reference in transaction? This implies a redundancy since the customer and transaction relationship is stored twice (at least). This type of redundancy is what a RDBMS avoids. What if an account is owned by two customers? Then will the transaction be a part of both customers? How will the counting of transactions be affected then?
    Now, as to the statement that using the Transaction Object will get the exact number of transactions in a given period, there are cases when the path chosen will result in different answers.
    The accuracy of the data presented is in doubt. Hence such databases have limited use.
    Ravi

    I don't know how object databases are implemented, as I don't write them. However, you are stating that they must have all kinds of problems which, in reality, they obviously don't. Object Databases are being used in large and critical applications.

    You are using something called 'argument by incredulity': just because you can't believe or understand how OO databases work and can be safe, you are stating that they can't work and can't be safe.

    I get the same feelings of disbelief every time I see a multi-hundred-ton Jumbo Jet lift off the runway!
  146. Steve, disabuse me of my misconceptions then by the simple act of rebutting my arguments rather than claiming that "However, you are stating that they must have all kinds of problems which, in reality, they obviously don't."

    You have fallen into the pit you accuse me of: "You are using something called 'argument by incredulity'". You seem to be using the argument by credulity when you use the statement referred to in the previous paragraph.
  147. Steve, disabuse me of my misconceptions then by the simple act of rebutting my arguments rather than claiming that "However, you are stating that they must have all kinds of problems which, in reality, they obviously don't."
    You have fallen into the pit you accuse me of: "You are using something called 'argument by incredulity'". You seem to be using the argument by credulity when you use the statement referred to in the previous paragraph.

    No, because what I say is based on evidence. You keep on and on saying what object databases can't do. I provide clear evidence that you are wrong. Here is more: The Chicago Stock Exchange uses Versant's object database, which handles high volumes of transactions. Do you think they would trust their data to a system that was unreliable or would corrupt data or give false results?
  148. I had written earlier:
    Such a thing can not happen in a properly designed database. Any attempt to add a transaction without associating it to an account or customer, as the rule may be, will simply be rejected by the DBMS (Database management system) because it violates integrity constraints.

    Can your object databases assure me of this level of integrity? If they can not guarantee this level of integrity why should I replace RDBMS (Relational Database Management System) with whatever is being peddled as "better" than a RDBMS?

    Can any object database claim to enforce this level of integrity? That is my main argument against ODBMS.

    Any developer can write code that violates data integrity. Not so in a relational database.

    The fact that others have used ODBMS in critical applications only convinces me of the skills of snake-oil salesmen.
  149. XML and Object persistence are essentially hierarchical file storage mechanisms.

    True, XML is by nature hierarchical, but there are various methods to search an XML document without having to go all the way from the root. Think of XPath expressions.

    In the end, I would say that there is enough room for all storage mechanisms, be they RDBMS, OODBMS or XML. It is just a matter of making good use of them depending on the problem domain.


    Markos Charatzas
    Network Applications Developer
    Research & Development
    FORTHnet S.A.
  150. Markos says:
    True, XML is by nature hierarchical, but there are various methods to search an XML document without having to go all the way from the root. Think of XPath expressions.

    XML, XPath, XQuery, etc., all use "record at a time" navigation. While you can express a condition using XPath, it still has to traverse the graph linearly, a record at a time. They actually do end up searching from the root. XPath is just a convenient way for you to express your desires. The XPath engine has to traverse the whole file.

    Granted that you can create indexes on data stored in XML format, what about the persistence of indexes? Do they have to be created every time a query is asked? Or every time the file changes? What happens when you shut down the server and restart it? Do you have to recreate the indexes? When you are dealing with a really large data set with millions of records, how long do you think indexing will take? And how much effort is needed when a record is added or deleted?
  151. XML, XPath, XQuery, etc., all use "record at a time" navigation. While you can express a condition using XPath, it still has to traverse the graph linearly, a record at a time. They actually do end up searching from the root. XPath is just a convenient way for you to express your desires.

    Really? What prevents an XPath implementation from indexing the document before serving XPath queries?
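    A minimal sketch of that idea using Python's standard xml.etree.ElementTree. It does not show how any real XPath engine works; build_index and the sample document are invented for the example, and the index here lives only in memory, so the persistence and maintenance questions raised in the previous post still apply.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """
<messages>
  <message id="1" status="active">first</message>
  <message id="2" status="done">second</message>
  <message id="3" status="active">third</message>
</messages>
"""


def build_index(root, tag, attr):
    """Walk the tree once and map each attribute value to the elements carrying it."""
    index = defaultdict(list)
    for elem in root.iter(tag):
        index[elem.get(attr)].append(elem)
    return index


root = ET.fromstring(DOC)
status_index = build_index(root, "message", "status")


def find_scan(status):
    """XPath-style lookup: ElementTree walks the tree on every call."""
    return [e.text for e in root.findall(f".//message[@status='{status}']")]


def find_indexed(status):
    """Dictionary lookup against the prebuilt index; no traversal at query time."""
    return [e.text for e in status_index.get(status, [])]
```

    Both lookups return the same elements; the indexed one pays the traversal cost once, up front, and has to be refreshed whenever the document changes.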
  152. How, then, do you find out how many transactions happened in a given period of time? Not so easy, is it? You'd have to traverse the complete list of Customers, possibly go through each account, then go through the transactions for each account and find out if it meets your criteria. Whereas, using a relational database you'd write a simple query such as select count(*) from transaction where tx_time between 'day1' and 'day2'
    Any other viewpoint is easily handled using a relational database.

    The problem with RDBMS is that all the data is expected to be tabular. Just to see what kind of problems this gets you in consider how would you implement a persistent queue of messages in an RDBMS. You need to add messages at the end of the queue, remove them from the front. Each message can be an arbitrarily long string of text.

    The enqueue and dequeue operations must be transactional, and a dequeue on an empty queue should suspend the caller until an element shows up.

    What's the SQL query to get the first element?

    ...richie
  153. Richie says:
    Just to see what kind of problems this gets you in consider how would you implement a persistent queue of messages in an RDBMS.

    Well, to save a message, just add it as a row of a table when you get it. If it is a structured message, with a From, To, subject, etc., then put these in the appropriate columns. Assuming that you have a way of identifying the current set of messages, and you have a system generated column called message_id, then the query is elementary.

    You should be able to figure it out given the hints. OK, here it is:

    select * from messages
        where status = 'active'
        having message_id = min(message_id)


    If, instead you are storing the message received time, then the query will be modified to:
    select * from messages
        where status = 'active'
        having received_timestamp = min(received_timestamp)




    The above is ANSI-SQL compatible SQL that works with all ANSI compliant databases.

    Storing the data is a completely different issue from manipulating it after retrieval. You are confusing the two issues when you talk of push and pop, and enqueue and dequeue.

    If you are talking about the push and pop operations, then you are talking about processing the data. Pushing and popping are elementary. Exactly the same logic that you use in your Java code would be used to do it in a pure-database application.

    Ravi
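    Not every engine accepts a HAVING clause without GROUP BY alongside select *, so here is a hedged alternative for "first element of the queue" written with a scalar subquery, run through Python's sqlite3 module. The messages table follows the column names used in this post; the body column is an assumption.

```python
import sqlite3


def head_of_queue(conn):
    """Return (message_id, body) for the oldest active message, or None if there is none."""
    return conn.execute(
        "SELECT message_id, body FROM messages"
        " WHERE status = 'active'"
        "   AND message_id = (SELECT MIN(message_id) FROM messages WHERE status = 'active')"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (message_id INTEGER PRIMARY KEY, status TEXT NOT NULL, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [(1, "done", "old"), (2, "active", "hello"), (3, "active", "world")],
)
```

    The same shape works with received_timestamp in place of message_id. On an empty queue the subquery yields NULL, the comparison matches nothing, and the function returns None.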
  154. Ravi wrote:
    You should be able to figure it out given the hints. OK, here it is:

    select * from messages
        where status = 'active'
        having message_id = min(message_id)

    If, instead you are storing the message received time, then the query will be modified to:

    select * from messages
        where status = 'active'
        having received_timestamp = min(received_timestamp)

    The above is ANSI-SQL compatible SQL that works with all ANSI compliant databases.

    Storing the data is a completely different issue from manipulating it after retrieval.

    You are assuming that "message_id" is increasing. If messages exist in a separate table, and each has its own ID, you can't assume that "message_id" in the queue is in any order.

    Using time is little better. But one of the requirements you may have is priority. Some messages have to be placed at the head of the queue, ahead of others. The pure timestamp does not work.

    Finally, there is an issue of concurrency. If two processes dequeue from the same queue I'd want them to get distinct messages.

     You are confusing the two issues when you talk of push and pop, and enqueue and dequeue. If you are talking about the push and pop operations, then you are talking about processing the data. Pushing and popping are elementary.

    Not sure what you mean here. Typically push and pop are used when talking about a LIFO structure (a stack), and enqueue and dequeue when we're talking about FIFO structure (a queue).

    I'm talking about queues. Queues have an implied order. A table has no order. So, implementing a queue in RDBMS is not straightforward.

    ...richie
  155. Richie says: " But one of the requirements you may have is priority. "
    Sure, priority would be a part of the message and would be stored in a separate column of the database as such. In the retrieval query you would use "Order By priority", ascending or descending as your requirement may be.

    Richie says: "Not sure what you mean here."

    Here I agree my original post was not quite clear. What I meant was that the issue of storing the message is separate from the act of pushing and popping and enqueueing and dequeueing. The actions of pushing, popping, etc., have no bearing on the persistence of the message. They are application specific tasks.

    Again quoting Richie: Queues have an implied order.

    Certainly true. If we assume that the order of arrival of messages implies some order, then it too can be stored as a column in the database and retrieved in whatever way one wants. My guess is that no matter what technology is used to handle these processes, the ordering, implicit or not, will be known and can then be made explicit by storing in the database.

    Databases are generally designed to handle/facilitate all types of concurrency issues. Handling simultaneous requests can be easily implemented using a database, or application code. Your choice. It has nothing to do with the persistence argument, which was the main topic of discussion.

    Ravi
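    A small sketch of making the queue order explicit, as described above: priority first, arrival order second. It uses Python's sqlite3 module; the table layout and the convention that a larger priority number means "more urgent" are assumptions for the example.

```python
import sqlite3


def dequeue_order(conn):
    """Message bodies in the order a consumer would see them."""
    rows = conn.execute(
        "SELECT body FROM messages WHERE status = 'active'"
        " ORDER BY priority DESC, message_id ASC"  # urgent first, then first-in first-out
    ).fetchall()
    return [body for (body,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " message_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # records arrival order
    " priority INTEGER NOT NULL DEFAULT 0,"
    " status TEXT NOT NULL DEFAULT 'active',"
    " body TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO messages (priority, body) VALUES (?, ?)",
    [(0, "routine-1"), (5, "urgent"), (0, "routine-2")],
)
```

    The table itself still has no order; the ordering lives entirely in the ORDER BY clause over columns that record it.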
  156. Ravi wrote:
    Databases are generally designed to handle/facilitate all types of concurrency issues. Handling simultaneous requests can be easily implemented using a database, or application code. Your choice. It has nothing to do with the persistence argument, which was the main topic of discussion.
    Ravi

    Right, I think we agree on this.

    My point is that some types of data storage are much easier to implement with OODBMS than with RDBMS. The reason I brought up a queue, is because that's something I built. At the time, we were using Sybase with no row level locks, and some of the requirements did not seem easy to implement (especially concurrency).

    For a queue, we wanted to get the second element, if someone had the first element locked. Again, this did not seem possible to do within plain RDBMS.

    Then we implemented the queueing system on top of Versant OODBMS, with minimal fuss.

    ...richie
  157. For a queue, we wanted to get the second element, if someone had the first element locked. Again, this did not seem possible to do within plain RDBMS.
    It is. You either have to learn to work with Sybase or if it's not up to the task, something which I doubt, use something else. For example in Oracle:

    select * from msgqueue where [conditions] for update nowait; -- if the lock is already taken this fails
    exception when [cannot lock exception]
      [move onto the next candidate]

    Sorry but I can't be bothered to produce valid pl/sql at the moment, on my way to work.
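    For readers who want something runnable, here is a different technique from the NOWAIT outline above: instead of skipping a locked row, each consumer claims the head of the queue inside a short write transaction by flipping a status column, so two consumers never receive the same message. It is a sketch using Python's sqlite3 module with two connections to a temporary file; the messages table, the owner column, and the worker names are all assumptions.

```python
import os
import sqlite3
import tempfile


def dequeue(conn, worker):
    """Atomically claim the oldest active message for worker; None if the queue is empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the head
    try:
        row = conn.execute(
            "SELECT message_id, body FROM messages"
            " WHERE status = 'active' ORDER BY message_id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE messages SET status = 'taken', owner = ? WHERE message_id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def demo():
    """Two connections take turns dequeuing from a two-message queue."""
    path = os.path.join(tempfile.mkdtemp(), "queue.db")
    # isolation_level=None: the sqlite3 module will not open transactions on its own.
    a = sqlite3.connect(path, isolation_level=None)
    b = sqlite3.connect(path, isolation_level=None)
    a.execute(
        "CREATE TABLE messages (message_id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'active', owner TEXT, body TEXT)"
    )
    a.executemany("INSERT INTO messages (body) VALUES (?)", [("m1",), ("m2",)])
    got = [dequeue(a, "worker-a"), dequeue(b, "worker-b"), dequeue(a, "worker-a")]
    a.close()
    b.close()
    return got
```

    The second consumer gets the second message because the first is already marked as taken, not because a row lock was skipped. Blocking a dequeue on an empty queue until a message arrives would still have to be handled in application code.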
  158. Just to see what kind of problems this gets you into, consider how you would implement a persistent queue of messages in an RDBMS.

    All I saw, and keep seeing, are benefits from doing that: my data is normalized, so I can access it any way I want rather than the way the OO persistence dictates, and the database applies the data integrity rules that I would otherwise have to implement in the application logic.

    So I am all for implementing the MQ in RDBMS...

    And another thing: just consider serialization/deserialization of your objects in OO persistence across different platforms, API versions, etc. All of that bites you in the ass when you deal with OO persistence.

    It is data for crying out loud, it should be independent from all the technical hoopla around it. It should be accessible by anything at any time and it should not be redundant.

    All that is reversed in OO persistence by by-spec data duplication, platform/version lock-ins, API availability lock-ins, proprietary QL, proprietary data types and structures, etc.

    OO persistence is fast, but the maintenance nightmare that it carries with it cancels out the scalability benefit.

    Cheers,

    Artem D. Yegorov
    http://www.activexml.org
  159. Not all data fits into tables

    It is data for crying out loud, it should be independent from all the technical hoopla around it.
    You mean like a fancy RDBMS? :)

    So we need to have transient and non-transient "data" that has integrity and relationships and validity and attributes and variable attributes and periods of existence and ... all without technology [hoopla]?

    One man's hoopla is another man's hooya.
  160. Not all data fits into tables

    I do not understand "fancy" as anything that describes a technology, so I do not know what you mean by a "fancy RDBMS". The point that I wanted to make is that data should be accessible from your persistence regardless of what you are using to access it, be it your custom application, a third-party query tool, a reporting engine, etc.
    So we need to have transient and non-transient "data" that has integrity and relationships and validity and attributes and variable attributes and periods of existence and ... all without technology [hoopla]?

    That's it. Without any custom technology that a developer puts around data, like DAOs, accessors, queries, reports and other application-specific stuff. One should model an application after data, since data is what will most likely survive longer than any application used to access it. Data persistence modeled after an application tends to fail, as it changes with every small revision of that application, and when a major change comes around, the migration becomes either a nightmare or something that is utterly impossible.

    Your mockery about being tied to a proprietary RDBMS implementation has no grounds, because there are plenty of middle-ground solutions that are cross-RDBMS compliant and still retain the integrity of their data.

    OO persistence implies locking into a specific platform or redundant specification for cross-platform compatibility.

    Cheers,

    Artem D. Yegorov
    http://www.activexml.org
  161. Not all data fits into tables

    I do not understand "fancy" as anything that describes a technology, so I do not know what you mean by a "fancy RDBMS".
    Sorry. Should have said something like "one of them there fancy reelational daterbases" and had you read it like Jeff Foxworthy (better yet Larry the Cable Guy). If you don't know who he is, I don't know what to tell you. Other than "Get er done!".
    The point that I wanted to make is that data should be accessible from your persistence regardless of what you are using to access it, be it your custom application, a third-party query tool, a reporting engine, etc.
    So I need to write different data access and business logic for each type of technology? And any time anything changes, make the corresponding number of changes? Very costly. I am facing that right now.
    ...That's it.
    I think you missed my point that RDBMSs are technology hoopla.

    One should model an application after data, since data is what will most likely survive longer than any application used to access it.
    Only because most people do it that way.

    BTW, an RDBMS is an application that wraps the "data".


    Your mockery about being tied to a proprietary RDBMS implementation has no grounds, because there are plenty of middle-ground solutions that are cross-RDBMS compliant and still retain the integrity of their data. OO persistence implies locking into a specific platform or redundant specification for cross-platform compatibility.
    If you fully implement all business rules in the DB, which you will need to do to allow multiple applications/tools/systems to access it, you will be just as locked in. And at a higher cost.
  162. Not all data fits into tables

    So I need to write different data access and business logic for each type of technology? And any time anything changes, make the corresponding number of changes? Very costly. I am facing that right now.
    Yes, it is very costly. Buzzwords change and you need to rewrite code. Probably the best solution is to use stable technology like JDBC. Buzzword driven technology like JDO or EJB3 can die at any time.
  163. Not all data fits into tables

    So I need to write different data access and business logic for each type of technology? And any time anything changes, make the corresponding number of changes? Very costly. I am facing that right now.
    Yes, it is very costly. Buzzwords change and you need to rewrite code. Probably the best solution is to use stable technology like JDBC. Buzzword driven technology like JDO or EJB3 can die at any time.
    It is not the technology change that is the issue, but the logic + data. Integrating at JDBC does not help. Some of the work is with files.
  164. Not all data fits into tables

    It is not the technology change that is the issue, but the logic + data. Integrating at JDBC does not help. Some of the work is with files.
    Agreed. I think it is better not to pollute data with code and programming language concepts for this reason. AOP and OOP are very good for writing code, but that is not a reason to store code with data (some OODBMS definitions are about storing OOP code in the data).
  165. Not all data fits into tables

    So I need to write different data access and business logic for each type of technology? And any time anything changes, make the corresponding number of changes? Very costly. I am facing that right now.
    Yes, it is very costly. Buzzwords change and you need to rewrite code. Probably the best solution is to use stable technology like JDBC. Buzzword driven technology like JDO or EJB3 can die at any time.

    JDO and EJB3 are buzzword driven?!?! What?!

    Juozas, what are you talking about? As far as I can tell from my experience with JDO, it is an excellent technology driven by some very intelligent software architects on the specification team and some very dedicated vendors.

    I'm sorry, but if buzzwords change, I do not need to rewrite any code. That's simply ridiculous.
  166. Not all data fits into tables

    JDO and EJB3 are buzzword driven?!?! What?! Juozas, what are you talking about? As far as I can tell from my experience with JDO, it is an excellent technology driven by some very intelligent software architects on the specification team and some very dedicated vendors. I'm sorry, but if buzzwords change, I do not need to rewrite any code. That's simply ridiculous.
    It is driven by "Transparency" and "Ease-of-Use" (marketing BS).
    BTW, is Java your real name?
  167. Not all data fits into tables

    Sorry. Should have said something like "one of them there fancy reelational daterbases" and had you read it like Jeff Foxworthy (better yet Larry the Cable Guy). If you don't know who he is, I don't know what to tell you. Other than "Get er done!".

    Strange that you do not see your own technique in trying to take the words in my post and turn them around to make your point, which really has nothing to do with what I've said.
    So I need to write different data access and business logic for each type of technology? And any time anything changes, make the corresponding number of changes? Very costly. I am facing that right now.

    Where exactly do you see me proposing any of that? I am opposing exactly that mindset. Why you would have to write access to the same data every time, I do not understand. But if you have a reporting engine like Cognos or Brio that needs to access your data for report generation, data aggregation and analysis (so that you do not have to write proprietary stuff), it gets kind of tough with OO persistence, as you end up writing data access code to integrate with each third-party solution.
    I think you missed my point that RDBMSs are technology hoopla.

    No, I did not, but it is a widely supported technology hoopla. Most applications do take Oracle, MySQL or MS SQL Server specifics into consideration, and JDBC lets you abstract away from most of them by letting the driver handle them. Granted, administration is still specific to each vendor, but that is the case for OO persistence as well. There are plenty of DAO/mapping frameworks that abstract such specifics.
    Only because most people do it that way.

    Quite the opposite, most applications are written before data is modeled, hence all the migration, data interoperability and access issues.
    BTW, an RDBMS is an application that wraps the "data".

    It manages the data and its integrity and provides a query based access to it. So why reinvent the wheel and rewrite all of that in your application? And BTW, Thanks for opening my eyes, really, I thought it's a kitchen appliance... ;)
    If you fully implement all business rules in the DB, which you will need to do to allow multiple applications/tools/systems to access it, you will be just as locked in. And at a higher cost.

    Business rules have nothing to do with having technology-agnostic data access. Business rules should be in your business layer, so that your application can process the available data as dictated by the application requirements. Other applications will have other sets of business rules/logic to interpret the data in the way they need to. (Not implying that you have to write a new application every time.) I do agree that any custom application you have control over should access the data through some common middleware to avoid redundancy and bring cost down, but there are applications that you do not have control over (see the examples above), and those could not care less about the business rules you have for your proprietary needs.

    Sincerely,

    Artem D. Yegorov
    http://www.activexml.org
  168. Hi Richie,

    Nice example! You've obviously found a non-trivial problem and the area here is profuse.

    What you've generally found is the area of 'sets' within a larger table/extent. Since these are not 1st-class identities like rows, sets are more amorphous under transactionality and less expressible by query language. For pretty much any kind of database.

    Anyway, the solution is to provide a 1st-class entity (row, instance, whatever) to manage the queue. This holds the queue pointers and can be locked for update. Use of select min() is probably not suitable for production code.

    What's the SQL query to get the first element?

    select M.ID, M.TEXT
    from QUEUE_HEAD as Q
    left outer join QUEUE_MSG as M on M.ID=Q.FK_FRONT

    Looks 1000% bulletproof to me.
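
    A minimal sketch of the head-row idea above, assuming SQLite through Python's sqlite3 module purely for illustration. QUEUE_HEAD, QUEUE_MSG and FK_FRONT come from the query in the post; FK_NEXT, FK_BACK and all the function names are hypothetical additions so that the example can actually push and pop. On a server DBMS the head row would typically be locked for update inside the transaction; here SQLite's single-writer behaviour stands in for that.

```python
import sqlite3

SCHEMA = """
CREATE TABLE QUEUE_MSG (
    ID INTEGER PRIMARY KEY,
    TEXT TEXT NOT NULL,
    FK_NEXT INTEGER REFERENCES QUEUE_MSG(ID)   -- assumed link to the next message
);
CREATE TABLE QUEUE_HEAD (
    ID INTEGER PRIMARY KEY CHECK (ID = 1),     -- exactly one head row
    FK_FRONT INTEGER REFERENCES QUEUE_MSG(ID),
    FK_BACK INTEGER REFERENCES QUEUE_MSG(ID)   -- assumed tail pointer
);
INSERT INTO QUEUE_HEAD (ID, FK_FRONT, FK_BACK) VALUES (1, NULL, NULL);
"""


def open_queue():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def push(conn, text):
    with conn:  # message insert and pointer updates commit together
        new_id = conn.execute(
            "INSERT INTO QUEUE_MSG (TEXT) VALUES (?)", (text,)
        ).lastrowid
        back = conn.execute(
            "SELECT FK_BACK FROM QUEUE_HEAD WHERE ID = 1"
        ).fetchone()[0]
        if back is None:  # empty queue: new message is both front and back
            conn.execute(
                "UPDATE QUEUE_HEAD SET FK_FRONT = ?, FK_BACK = ? WHERE ID = 1",
                (new_id, new_id),
            )
        else:
            conn.execute(
                "UPDATE QUEUE_MSG SET FK_NEXT = ? WHERE ID = ?", (new_id, back)
            )
            conn.execute(
                "UPDATE QUEUE_HEAD SET FK_BACK = ? WHERE ID = 1", (new_id,)
            )


def peek(conn):
    # Same shape as the query in the post; yields (None, None) when empty.
    return conn.execute(
        "SELECT M.ID, M.TEXT FROM QUEUE_HEAD AS Q"
        " LEFT OUTER JOIN QUEUE_MSG AS M ON M.ID = Q.FK_FRONT"
    ).fetchone()


def pop(conn):
    with conn:
        msg_id, text = peek(conn)
        if msg_id is None:
            return None
        nxt = conn.execute(
            "SELECT FK_NEXT FROM QUEUE_MSG WHERE ID = ?", (msg_id,)
        ).fetchone()[0]
        if nxt is None:  # removed the last message
            conn.execute(
                "UPDATE QUEUE_HEAD SET FK_FRONT = NULL, FK_BACK = NULL WHERE ID = 1"
            )
        else:
            conn.execute(
                "UPDATE QUEUE_HEAD SET FK_FRONT = ? WHERE ID = 1", (nxt,)
            )
        conn.execute("DELETE FROM QUEUE_MSG WHERE ID = ?", (msg_id,))
        return text
```

    The point of the design is that every queue operation goes through one first-class row, so there is a single, well-defined thing to lock instead of an amorphous "first row of a set".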

    ----------

    Plug: we're currently running a limited offer of our Professional Edition JDO mapping tools, for Java developers. Interested people may wish to check this out, it's a limited time offer.


    Cheers,
    Thomas Whitmore
    www.powermapjdo.com
  169. Back in the mid-90s, TopLink and GemStone, the only OODB that was ever briefly successful, partnered to achieve basically what this opinion is suggesting: a combination of OODB and RDB. Frankly though, there was just never any real interest in the combination and only a handful of people ever really used it. When doing due diligence, most people decided to pick one or the other, not both. If you could somehow facade one with the other in a truly transparent way, you might have a winner. I disagree that OODBs are bad at queries. Specifically, they are bad at ad hoc queries. If you can predict all possible queries ahead of time, the performance can actually be quite good.

     - Don
  170. Finally someone with common sense!
  171. I have had the same thought..

    It is always nice to know that there are people out there who think like me.

    In addition, it is killer that my feeling about how to use relational databases within an object oriented context is confirmed by an independent programmer.

    I attempted to prototype something, however, and ran into some stumbling blocks. In particular, the separation of objects from their unique identification is complex and non-cohesive.

    That said, I would be interested in further researching such a product if market research proves it a viable venture.

    Best,

    John C. Dale
    MS MIS, December 2005
    The Eller College of Management
    The University of Arizona
    Tucson, Arizona
  172. I have had the same thought..

    In particular, the separation of objects from their unique identification is complex and non-cohesive.

    Entity beans already conquered that with primary key classes and handles.
  173. OK, this is a stretch as it relates to this thread, but I couldn't resist. Funny anecdote:

    During my dot com days I worked with an Engineer who was, among other things, very arrogant and very opinionated. He would typically expound on any topic, whether he knew anything about it or not. Usually at inopportune times, like in the restroom, in a team meeting or an interview (those were the best because we could watch the captive candidate squirm and have great hopes they'd go nuts and cause harm to the 'omnipotent' Java Engineer).

    At the end of a nightly release cycle, tied together with many late nights and probably too many hours together in confined spaces, he proceeded to tell me and another co-worker why RDBMSs and OODBMSs were useless. In summary, his view was that the world would be a better place if we all used Hashtables to store our data. That's right, java.util.Hashtable. In memory, no less, not persisted, to improve performance. Since he had just finished his Java cert. and knew many superfluous things about Java and the JVM, this notion apparently appealed to him.

    Now, exercising great restraint and having flashbacks of what I used to call my life, I pondered my next move. Knowing that he liked to argue everything, I mean everything, even what color markers to use on the white board, and having an extreme desire to go home before the sun rose again, I had limited options. So, almost without thought, I quickly unplugged his PC and said, "How's your Hashtable?" While he sat speechless and in great shock, I left...

    That said, there is a place for both RDBMSs and OO/OR solutions. I think the author is on to something. His argument is similar to an Object view in Oracle. 'til we have a definitive answer though, I think I'll try my handy Hashtable and hope my UPS is fully charged... ;D
  174. Opinion: An index is not a book

    OK, this is a stretch ... In summary, his view was that the world would be a better place if we all used Hashtables to store our data. That's right, java.util.Hashtable. In memory, no less, not persisted, to improve performance. Since he had just finished his Java cert. and knew many superfluous things about Java and the JVM, this notion apparently appealed to him.

    I just published a book on this:

    http://www.jroller.com/page/cpurdy/20050305#my_new_book

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  175. Opinion: An index is not a book

    I just published a book on this
    Is it available as an ebook? Do you have downloadable chapters? Has Rolf found proof why this book will be the downfall of Java scientists?
  176. Opinion: An index is not a book

    Hi Cameron -
    I've read many of your TSS comments over the years and enjoyed your recent interview with TSS. I'm curious about your new book too. However, my comment about a Hashtable was a joke, as my co-worker didn't want to persist anything; that's why I unplugged his machine. Couldn't have happened to a nicer guy. :D

    Anyway, I'm sure that your topic of Enterprise HashMap is much more fail safe than my previous co-worker's... ;) In all seriousness, I am interested in the subject. I've dealt a lot with Collections/Maps over the years. I've implemented several of my own for various caching and generic data container (graph) needs. Are you planning on posting any chapters?

    Thanks -
    Jon Schuck
  177. Was that supposed to be funny? Or have you actually not understood that it's a joke ?!
  178. SQL AOP the way to go

    If you are stuck with RDBMS, then at least you can use SQL AOP now.

    Of course, OODBMS AOP has been around for a long time....

    Dion
  179. SQL AOP the way to go

    If you are stuck with RDBMS, then at least you can use SQL AOP now. Of course, OODBMS AOP has been around for a long time.... Dion
    Excellent! Anyone working on AOP for XSL(T)?
  180. SQL AOP the way to go

    If you are stuck with RDBMS, then at least you can use SQL AOP now. Of course, OODBMS AOP has been around for a long time.... Dion
    Excellent! Anyone working on AOP for XSL(T)?
    Yes, aspect oriented databases will be great, but service oriented must be better.
  181. SQL AOP the way to go

    I _do_ hope that this is all a joke. If it is, Dion, then it's a good one, in the dry English style. If not, oh dear.
  182. SQL AOP the way to go

    Anyone working on AOP for XSL(T)?

    There are a handful of aspect and/or annotation based XSLT bindings for Java. O'Reilly owns this one: http://www.aspectxml.org
  183. SQL AOP the way to go

    But are they AOP for XSLT?

    BTW, I was attempting humor in my previous post. :)
  184. SQL AOP the way to go

    If you are stuck with RDBMS, then at least you can use SQL AOP now.

    No offence, Dion, but it's ASQL, not SQL AOP.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  185. SQL AOP > ASQL

    We will see Cameron.

    I have a new feature for SQL AOP that is coming out in a couple of weeks. I think it will blow ASQL away.

    Dion

    ps. competition is good
  186. Opinion: An index is not a book

    [...]
    In summary, his view was that the world would be a better place if we all used Hashtables to store our data. That's right, java.util.Hashtable. In memory, no less, not persisted, to improve performance.
    [...]

    Let me guess: That guy's name was Klaus Wuestefeld and he later on founded that nutty project called Prevayler?

    SCNR,
    Lars
  187. RDF......

    Hmmm, this reminds me of RDF (Resource Description Framework). With the Jena (HP) implementation you store your data in an object-oriented way, but you can query it with RDQL (looks and feels like SQL).

    The Jena implementation supports persistence in a relational database.
  188. Wow, a typical blog - vapid, devoid of arguments, author reasons only by analogy. This will definitely appear on dbdebunk in a few weeks.

    Let's tear it apart, just for fun...
    Well then why does everyone use relational databases for storage?
    I usually use storage arrays or internal HDDs for storage. Though solid state disks look very nice... Databases (OO, relational, hierarchic) operate on a higher level: they manage data.
    Relational databases are great for query. They are analagous to the index of a book.
    How are they "analagous"? "Content" is in the book, not in the index, but content is definitely in the database. So database != index.
    Anyway, analogies are for public speeches, not for technical discussions.
    Relational databases are a terrible storage mechanism. Anyone who has tried to map a complex object graph to a relational storage mechanism will know this.
    Well, map a complex object graph to anything that is not a "complex graph" and you will get the same result. For instance, XML. But with databases, I don't want to stop at "mapping". What about querying, enforcing integrity ("business rules"), evolving the schema of the data, sharing data between users who have different ideas of how the "complex graph" should look...
    Object databases are terrible for queries.
    Is there any other way to get an object out of a database than by a query? Most likely not. So OO databases are fundamentally flawed, or that statement is baseless.
    Overcome the mental block that has gotten us thinking storage and query are the same thing.
    Look, if you store something, you need to get it back later. Unless you are using write only memory. So querying is kinda important, isn't it? And yes, storage and query are not the same. For starters, we have two different words for each concept...
  189. Opinion: An index is not a book

    Next, someone is going to come out and say that html/javascript really is a crappy way to develop application UIs. ;)
  190. Opinion: An index is not a book

    Is it not? ;D ...but we use it anyway because it is convenient... Hey, let's see some mindless discussion on this topic... :-)))
  191. Opinion: An index is not a book

    Next, someone is going to come out and say that html/javascript really is a crappy way to develop application UIs. ;)
    There is always a "someone". In your case: http://www.macromedia.com/software/flex/ :-)

    Regarding the topic, IMHO it's a PEBKAC case.

    I do not use an O-O database and will not try one until one of my clients asks for it; I don't really have that much spare time to fool around. So, just like Cameron, I should be the last person to defend O-O databases, but I do not understand the heat: the article was theoretical, so why should the reply be "oh, but existing O-O databases do not perform"? If the idea behind it is good, anything is possible. 30 years ago, modern CPU power was hard to even imagine. 200 years ago, people would stone you if you so much as dreamed about setting foot on the Moon. So what? I am unfamiliar with the kind of "logic" used; it does not seem too "logical" to my limited mind.

    Regarding what is "fancy" and "in fashion": really, I have seen people developing much better software using archaic technologies than others using the "cutting-edge" ones.

    So, give me a break...
  192. I think the main problem in this topic is that both parties "try to discuss the taste of kiwi, which nobody has really eaten".

    Ravi has never seen object databases and says they can't work. Ravi, the same fud church guys say to Galileo - "The Earth just can't be a sphere. We see - it not!".

    Juozas Baliuka, with his "persistent storage for "complex object graph" then "java.io.ObjectOutputStream" is an OODBMS by definition", just makes me mad. Juozas, your statement is the ravings of a madman.

    Gents, it is really hard to talk to somebody about Pushkin if he is just deaf.
    Object databases are a big IT area. It is an industry, not just papers. Read more, play with ODBs, try to do something yourself. But convincing you of anything now is just impossible, because you have no background.
  193. I just do not know the OODBMS definition; Google says it is almost the same stuff as an RDBMS, but the data is polluted by code (it is something like stored procedures).
    http://www-2.cs.cmu.edu/People/clamen/OODBMS/Manifesto/htManifesto/node2.html
  194. This is an interesting statement from the same source:
    "Whereas Codd's original paper [Codd 70] gave a clear specification of a relational database system (data model and query language), no such specification exists for object-oriented database systems [Maier 89]. We are not claiming here that no complete object-oriented data model exists, indeed many proposals can be found in the literature (see [Albano et al. 1986], [Lécluse and Richard 89], [Carey et al. 88] as examples), but rather that there is no consensus on a single one. Opinion is slowly converging on the gross characteristics of a family of object-oriented systems, but, at present, there is no clear consensus on what an object-oriented system is, let alone an object-oriented database system"

    It looks like I am free to define OODBMS myself too.
  195. the same fud church guys say to Galileo - "The Earth just can't be a sphere. We see - it not!"

    Just so you know, the above is a modern creation. By the time of Galileo, it was commonly accepted in academia that the world was round, and it had been for an EXTREMELY long time.

    (Google on "earth flat myth galileo" to find more.)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  196. By the time of Galileo, it was commonly accepted in academia that the world was round, and it had been for an EXTREMELY long time.

    Good. Anything about internal meaning of post, which is about kiwi and taste?
  197. not the "world is flat" nonsense again

    the same fud church guys say to Galileo - "The Earth just can't be a sphere. We see - it not!"
    Just so you know, the above is a modern creation. By the time of Galileo, it was commonly accepted in academia that the world was round, and it had been for an EXTREMELY long time. (Google on "earth flat myth galileo" to find more.) Peace, Cameron Purdy, Tangosol, Inc. Coherence: Shared Memories for J2EE Clusters
    Wow. Darned scientists.
  198. Answer my question about data integrity.

    Can object databases guarantee data integrity like a relational database can?

    If not, why should I use one?

    If yes, then what mechanism does it use to do that? If it uses referential integrity, then it is almost like Relational. An object model and data model ERD will be practically identical then. Does it have an easy to use language like SQL? If not, what benefit do I get from using OODBs?

      When you are competing with an established technology, the onus is on the newcomers to show why their technology is superior to the existing one. It is not the job of the user of the current technology to prove the merits of every new technology that claims to be better.

      What if I say to you that I have a language that is "n" times more productive than Java and J2EE? You'll say prove it.

      That is exactly what I'm asking the ODBMS proponents to prove to me. And so far I have not heard any meaningful replies to the data integrity question except inane comments that XYZ organization uses it, so it must be fine. That is stupidity, abdication of our responsibility to think.

       Just prove the data integrity issue, or shut up and admit that ODBMS are not better than RDBMS.

    Ravi
  199. Answer my question about data integrity. Can object databases guarantee data integrity like a relational database can?

    YES SIR, YES SIR, YES SIR, YES SIR, YES SIR, YES SIR. Man, how much time should I spend to convince you that the relational model itself does not guarantee any integrity? Only properly written code can do that. OODBMS vendors write this proper code.
    If not, why should I use one? If yes, then what mechanism does it use to do that? If it uses referential integrity, then it is almost like Relational. An object model and data model ERD will be practically identical then.

    Execution of below is skipped, because supposition is false.
    Does it have an easy to use language like SQL?

    Yes, they have OQL, which is like HQL (Hibernate Query Language). Basically, every ODBMS has its own variant of OQL (Versant has VQL), and these vary much as SQL varies between Oracle and MySQL.
    If not, what benefit do I get from using OODBs? When you are competing with an established technology, the onus is on the newcomers to show why their technology is superior to the existing one. It is not the job of the user of the current technology to prove the merits of every new technology that claims to be better. What if I say to you that I have a language that is "n" times more productive than Java and J2EE? You'll say prove it. That is exactly what I'm asking the ODBMS proponents to prove to me. And so far I have not heard any meaningful replies to the data integrity question except inane comments that XYZ organization uses it, so it must be fine. That is stupidity, abdication of our responsibility to think. Just prove the data integrity issue, or shut up and admit that ODBMS are not better than RDBMS. Ravi

    OK, man, nobody wants to prove or sell anything to you. We just say: you have no information and a strong imagination.
    Moreover, you don't know what mechanism guarantees "data integrity" in an RDBMS. What you are doing is just using an RDBMS and imagining you know everything about it.

    If I'm wrong, write here about the mechanism that guarantees data integrity in an RDBMS. (BTW, also write what you think data integrity IS.) Dixi.
  200. Nice try, Dmitri. You still have not shown that data integrity is guaranteed by ODBMS. But instead ask me to prove my claim.

    Relational Technology is well known. I do not have to prove anything about it. Data integrity is one of the basic ideas of relational technology.

      Based on sound mathematical principles underlying Predicate Logic and Set Theory (each at least a 100 years old, well established and irrefutable so far), we can guarantee the integrity of data when relational database technology is used.

       If ODBMS proponents claim to have invented something that does not use predicate logic and set theory, but still guarantees data integrity, the mathematical world will be very eager to find out what it is, I am sure.

      Using RDBMS, there is no need to write code to enforce data integrity. Just declarative statements suffice. Can an ODBMS do that?

       Go read about it. Again, you are showing a woeful lack of knowledge and understanding about relational database technology in your postings.

       As said before, today nobody has to prove that the earth is round. Or that the earth goes around the sun. Likewise, there is no need for anyone to prove that data integrity can be guaranteed when relational technology is used properly.

       I urge ODBMS proponents to show how they are doing it and what is it they are doing that is different from relational technology. If there is no difference, then why use it?

    Ravi
  201. Based on sound mathematical principles underlying Predicate Logic and Set Theory (each at least a 100 years old, well established and irrefutable so far), we can guarantee the integrity of data when relational database technology is used.

    The integrity which you are talking about is, in reality:
    0. Isolation levels. (READ_UNCOMMITTED, READ_COMMITTED and so on.)
    1. Locking (row level, page level, doesn't matter).
    2. Atomic commit.

    All the stuff above exists in ODBMS.
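
    To make point 2 concrete, here is a minimal sketch of an atomic commit, assuming SQLite through Python's sqlite3 module as a stand-in for any transactional store (relational or object). The account table, the CHECK constraint and the function names are illustrative assumptions. Either both updates of a transfer become visible, or neither does.

```python
import sqlite3


def open_accounts():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO account (id, balance) VALUES (?, ?)",
            [(1, 100), (2, 0)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return False and change nothing on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit would overdraw src, the CHECK constraint raises
            # and the credit above is rolled back with it.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

    Note that this only demonstrates transactional atomicity; it says nothing either way about the referential-integrity question raised elsewhere in the thread.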

    If ODBMS proponents claim to have invented something that does not use predicate logic and set theory, but still guarantees data integrity, the mathematical world will be very eager to find out what it is, I am sure.

    Again, it is not theory; it is the 3 points above.
    If using RDBMS, there is no need to write code to enforce data integrity. Just declarative statements suffice. Can an ODBMS do that?
    You would do the same 3 points above.
    I urge ...

    You can urge only for things that you pay for.
    ODBMS proponents to show how they are doing it and what is it they are doing that is different from relational technology. If there is no difference, then why use it? Ravi

    You want us to teach you here? Funny. Learning is a process you do yourself.

    But I love to hear that. My salary will keep increasing because of guys like this.
  202. Nope, the integrity that I am talking about is not:
    The integrity you are talking about is, in reality:
    0. Isolation levels (READ_UNCOMMITTED, READ_COMMITTED and so on).
    1. Locking (row level, page level, it doesn't matter).
    2. Atomic commit.

    That is just the isolation level of commonly accepted ACID transactions, and its implementation.

    I am talking about data integrity that ensures that, for example, every account has at least one customer associated with it, every transaction code has meaning, every banking transaction has one account attached to it, etc. Database technology uses declarative referential integrity constraints to do it.

    If ODBMS is using the same referential integrity constraints, then it is an RDBMS in disguise. And therefore, the Object Model and the Data Model will be very similar.

     The Object Model will be larger because it will include helper classes (such as EventHandler) required to manipulate data. Whereas, using relational database technology, you'd put the helper code (business logic) in the application layer, whether you use SQL (PL/SQL, T-SQL, PL/pgSQL) or C, C++, C#, Java, Smalltalk, or whatever.

    Using an ODBMS ties you into the language you use for the ODBMS. If you want to write a report, you must use a language supported by the ODBMS. If you want to do data warehousing and higher level reports, tough luck. You could still do it, but it is generally painful.

    Whereas, using an RDBMS does not tie you into any language other than ANSI compatible SQL. Then do whatever you want to do with your data.

    Ravi
  203. Yes, ODBMS have constraints.

    Please read at least the table of contents of the Versant documentation.
    http://www.versant.com/get_collateral?productcategory=vds&collateralid=docs/database_fund_man.pdf
  204. Dmitriy,

       Thank you for pointing me to a 540 page document.

       I browsed through the contents and the index.

       The index contained no reference to integrity! That in itself is very telling.

       Nothing in the contents, either, suggested that integrity was of any concern. Hoist on your own petard, methinks!

       I went to page 44 that has one line about referential integrity when it says: " An object that contains an aggregation of embedded objects provides referential integrity ..."

       I understand that thus: I can create two classes, say Customer and Account. Now, I can embed "Account" in Customer. When I persist Customer the accounts also persist. And there you get referential integrity.

       Two problems with this:
    • When an account belongs to multiple customers, then the same account object is embedded in two customer objects. Hence, it is possible for me to go to one of the customers, modify the account object belonging to that customer, and not have it seen by the other customers. There is loss of data integrity.

      The same account now means different things to different people.

    •    Another portion of my application can persist an account object without having a customer associated with it. What meaning does this account have?

    Now compare this with a relational database where you have a Customer table, an Account table and a Customer_Account table that links one or more customers to one or more accounts. Declare the appropriate foreign keys for these tables. Write database level code to ensure that an account must always be linked to at least one customer.

    No matter what application you use to access and manipulate the database, you can not have meaningless data about the customer-account relationship.

    That is what I mean by data integrity and the features a relational database provides.

    Hope that helps you understand what I am talking about and why ODBMS can not guarantee data integrity.

    Ravi
  205. The index contained no reference to integrity! That in itself is very telling.

    But the main text does... I shall leave you to search for it yourself!
    Two problems with this: When an account belongs to multiple customers, then the same account object is embedded in two customer objects. Hence, it is possible for me to go to one of the customers, modify the account object belonging to that customer, and not have it seen by the other customers. There is loss of data integrity. The same account now means different things to different people.

    No. This is a fundamental misunderstanding. The account need not be 'embedded' in the customer objects. The customer objects contain references to the account. (This is the way Java naturally works with objects). If you access the account from one customer, you will be dealing with the account that is seen by all the others. This is just as true for objects in ODBMSes as it is true for objects in memory with Java.
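
    The reference-versus-copy point can be sketched in plain Java with no database involved. The Account and Customer classes below are made up for illustration and are not taken from Versant or any other product; the sketch only shows that two customers holding a reference to one account see the same state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes, not taken from any ODBMS product.
class Account {
    private long balance;

    Account(long balance) { this.balance = balance; }

    long getBalance() { return balance; }

    void deposit(long amount) { balance += amount; }
}

class Customer {
    private final String name;
    // Holds references to Account objects, not copies of them.
    private final List<Account> accounts = new ArrayList<>();

    Customer(String name) { this.name = name; }

    String getName() { return name; }

    void addAccount(Account account) { accounts.add(account); }

    Account getAccount(int index) { return accounts.get(index); }
}

class SharedReferenceDemo {
    public static void main(String[] args) {
        Account joint = new Account(100);
        Customer alice = new Customer("Alice");
        Customer bob = new Customer("Bob");
        alice.addAccount(joint);
        bob.addAccount(joint);

        // Modify the account through one customer...
        alice.getAccount(0).deposit(50);

        // ...and the other customer sees the same object and the same state.
        System.out.println(bob.getAccount(0).getBalance());           // 150
        System.out.println(alice.getAccount(0) == bob.getAccount(0)); // true
    }
}
```

    Whether a given persistence product preserves this identity after a round trip to the store is a property of that product; the sketch says nothing about it.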
    Another portion of my application can persist an account object without having a customer associated with it. What meaning does this account have?
  206. Now compare this with a relational database where you have a Customer table, an Account table and a Customer_Account table that links one or more customers to one or more accounts. Declare the appropriate foreign keys for these tables. Write database level code to ensure that an account must always be linked to at least one customer. No matter what application you use to access and manipulate the database, you can not have meaningless data about the customer-account relationship. That is what I mean by data integrity and the features a relational database provides.
    There is nothing fundamental about relational database that helps you here. You have to write 'database level code' to ensure the link. You do exactly the same with an ODBMS. For example, with JDO as the API, you can implement the 'jdoPreStore()' method if you want to ensure that links are present, or whatever.
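
    As a rough, self-contained sketch of the "check the links before the object is stored" idea: the PreStoreCallback interface and InMemoryStore class below are hypothetical stand-ins written for this example, not the real JDO API, and the Account class is invented. The point is only that the check runs inside the store call, whichever piece of code calls it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a persistence callback; not the real JDO interface.
interface PreStoreCallback {
    void preStore();
}

class Account implements PreStoreCallback {
    private final String id;
    private final Set<String> customerIds = new HashSet<>();

    Account(String id) { this.id = id; }

    String getId() { return id; }

    void linkCustomer(String customerId) { customerIds.add(customerId); }

    // Called by the store just before the object is written.
    @Override
    public void preStore() {
        if (customerIds.isEmpty()) {
            throw new IllegalStateException("Account " + id + " has no customer");
        }
    }
}

// Toy store: every object passes through the same callback, whoever calls store().
class InMemoryStore {
    private final List<Object> stored = new ArrayList<>();

    void store(Object obj) {
        if (obj instanceof PreStoreCallback) {
            ((PreStoreCallback) obj).preStore();
        }
        stored.add(obj);
    }

    int size() { return stored.size(); }
}

class PreStoreDemo {
    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();

        Account linked = new Account("A-1");
        linked.linkCustomer("C-1");
        store.store(linked); // accepted

        try {
            store.store(new Account("A-2")); // no customer linked
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(store.size()); // 1
    }
}
```

    Note that the check here lives in the class being stored, so it only protects applications that use this class definition, which is the objection raised elsewhere in this thread.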

    The Versant ODBMS has built-in facilities and data types that allow guarantees of the integrity of links between objects.
    Hope that helps you understand what I am talking about and why ODBMS can not guarantee data integrity. Ravi

    Well, perhaps you should let major organisations who use ODBMSes know about this. Companies like British Airways, who, according to you, must be constantly suffering from data integrity problems.

    There are two possibilities:

    1. You are right, and all around the world there is serious ongoing data corruption in major companies and organisations (including the US Department of Defense) as a result of ODBMSes.

    2. You are wrong.

    I think I know which possibility I'm going for.
  • Opinion: An index is not a book

    Just got home and wanted to continue the discussion, but another guy seems to have caught up :))
    Companies like British Airways, who, according to you, must be constantly suffering from data integrity problems. There are two possibilities:

    1. You are right, and all around the world there is serious ongoing data corruption in major companies and organisations (including the US Department of Defense) as a result of ODBMSes.

    2. You are wrong. I think I know which possibility I'm going for.

    Steve, after a very long discussion, it seems I agree with you.
  • No Dmitriy, there is a third possibility.

    These organizations must be writing a lot of application level code to ensure that data integrity is maintained. And must be enforcing a bit of discipline on their developers to ensure that improper code is not written.

    Once again you resort to the belief that just because large organizations are doing something they must be right. If so, they would never change their technology.

    And again, there is no proof from you that the data is protected from loss of integrity at the enterprise level. Rather there is an emphasis on application level policies to ensure that.

    In the 1960s and earlier, large corporations used application level enforcement to try to protect data integrity, then switched to relational technology to overcome the problems they faced then. Why are they going back to concepts that have proved to be of limited use?
  • No Dmitriy, there is a third possibility. These organizations must be writing a lot of application level code to ensure that data integrity is maintained. And must be enforcing a bit of discipline on their developers to ensure that improper code is not written.

    Apart from the fact that the Versant system does include constraints, I would argue that if you are writing substantial business logic and you are relying on the final stage of persisting your data to guarantee the integrity of your data, you are way out of line. All serious code should have application-level data integrity checks anyway! If you are accepting data from a web form, do you only expect to have to deal with validating that data at the point of insert into a database?
    Once again you resort to the belief that just because large organizations are doing something they must be right.

    No. I'm not saying that they are right - just that they are successfully deploying and using technologies that you imply should not be successful.
    If so, they would never change their technology.

    Indeed. Some of them have recently moved to ODBMSes.
    Rather there is an emphasis on application level policies to ensure that.

    What is a trigger or a constraint but an application level policy? It is application code embedded in the storage mechanism. Same with ODBMSes.
    In the 1960s and earlier, large corporations used application level enforcement to try to protect data integrity, then switched to relational technology to overcome the problems they faced then. Why are they going back to concepts that have proved to be of limited use?

    I would disagree that they are moving backwards, but anyway the point is that relational databases are just one way to store data. Some people assume they are of universal applicability, but this is just revealing a lack of imagination and experience. Sure, you can shoehorn all kinds of structured and irregular data into relational stores, but that is not what they were designed for. If relational stores were so perfect, they would not need 'blob' types....
  • Steve wrote:
    All serious code should have application-level data integrity checks anyway! If you are accepting data from a web form, do you only expect to have to deal with validating that data at the point of insert into a database?
    You decide what types of checks must be applied at what layer. Some that are very important must be applied as close to the data persistence as possible. Others, such as standard codes (for example, 'M' for male and 'F' for female sex) can be applied closer to where the interaction takes place. Note that even here the values ('M' and 'F' in this case) may be defined and obtained from your data store in the first place, thus obviating the need for validation.
    What is a trigger or a constraint but an application level policy?
    Not quite. This is an enterprise level policy. Any application that uses it must conform to this rule. No application will be permitted to violate these rules.

    Ravi
  • Steve said: "If relational stores were so perfect, they would not need 'blob' types.... "

    This is a serious misconception about relational databases. Usage of blob types does not violate relational theory in any way. A Blob is a user defined type, just like a currency is, or any other type. And relational theory does permit arbitrary and arbitrarily complex types.

    If you choose to store your data as a blob, that is your choice only. If you want to parse the blob and store it any other way, you may do so, too. Just like an email message can be stored in its entirety as a blob, or in its constituent parts. Nothing here affects relational theory or is prohibited by it.

    Ravi
  • This is a serious misconception about relational databases. Usage of blob types does not violate relational theory in any way.

    I may not have made my point well. Blobs are often used to store information that can't be easily dealt with in a relational way - sometimes even trees of serialised objects! What I was illustrating was that relational databases are sometimes a bad fit for storing some types of data.
  • What is a trigger or a constraint but an application level policy?
    Not quite. This is an enterprise level policy. Any application that uses it must conform to this rule. No application will be permitted to violate these rules. Ravi

    This is a fair point, but as we have already pointed out, using Versant as an example, object databases do have such rules that can be applied globally.

    Perhaps a better example is Gemstone. With Gemstone/S you have concurrent access to a shared Smalltalk system, which acts as the store. The code you add to the Gemstone classes is exactly like stored procedure code in a relational store - it applies to everyone who uses the system. Obviously only DBAs would have authority to add such code! This code can perform any enterprise-level action, including validation.
  • Opinion: An index is not a book

    The code you add to the Gemstone classes is exactly like stored procedure code in a relational store - it applies to everyone who uses the system. Obviously only DBAs would have authority to add such code! This code can perform any enterprise-level action, including validation.
    It sounds better than an object stream, but I think a Database Management System must manage data integrity, access control, recovery and concurrency control without any custom code. A real DMS does not have and does not need any assumptions about the client application's programming language or preferred data view. Relational databases have all the DBMS features. I do not need to hide data in application code, because it is safe to share data managed by a DMS. I think this is the main RDMS advantage: I can use JDO, JDBC, EJB, .NET, perl, a report generator, WS, OOP, AOP, a script on crontab, ... to access the same data using industry standards and my preferred view without any custom data management code. Probably OODMS is a good thing too, but nobody knows what an OODMS is. I do not think analogies are a good way to make an informed decision.
  • Opinion: An index is not a book

    It sounds better than an object stream, but I think a Database Management System must manage data integrity, access control, recovery and concurrency control without any custom code.

    I was only talking about validation. Good ODBMSes have all of these features built-in, and with no requirement for custom code to handle them. There is no need, for example, to write custom code for access control or concurrency. Why did you think there would be?
    Real DMS do not have and do not need any assumptions about client application programming language

    Good ODBMSes also have language independence! Just look at Versant: you can use C++ and Java on the same store.
    and preferred data view.

    I don't understand this point. I know of no ODBMS that has a 'preferred data view': They just store objects, which you can retrieve by reachability or query.
    I do not need to hide data in application code

    ODBMSes don't 'hide data in application code'. I'm not even sure what this means.
    because it is safe to share data managed by DMS.

    ODBMSes handle sharing of data without problems.
    I think this is the main RDMS advantage, I can use JDO,JDBC,EJB, .NET, perl, report generator, WS, OOP,AOP, script on crontab, .... to access the same data using industry standards

    Yes, this is definitely an advantage of relational systems. However, with the JDO standard you can use the same code to access object databases, at least if you stick with Java!
  • Dmitriy, do you really expect me to read a 540 page document? Why don't you tell me the page number which has the information?

    If, as you say, the Customer object holds references to Account objects, what happens when you persist them? Do they persist the account reference, too?

    Doesn't seem like it because ObjectIds will change when the server is next started. Do they use another identifier, like a primary key? If so, then they are using the same mechanism as a relational database. And hence are no different in that aspect. So what other advantage does an ODBMS have that can balance the risks?

    But even then, another application can possibly take the same data from the object persistence store, manipulate it violating the constraints that you have placed on them, and return it to you with meaningless data. Let us say that the other application needs only your data, not the framework you are using to enforce the rules.

      With a relational database, the DBMS is the framework that has the rules! There is no separation of the rules from the data! [Now this sounds suspiciously like Objects with data and methods, does it not? Strangely enough, when dealing with persistence, OO technology seems to decouple the data from the rules, while relational technology does not!] Hence, no matter what application uses the data, it can not corrupt the data.

       Hope that helps in explaining my point of view.

    Ravi
  • Should have been Steve in the previous two posts.
  • And if you must use the same framework (Versant or whatever) for all your applications to ensure data integrity, you have vendor lock-in with possible problems when thinking of migrating to another vendor.

    How is this different from being locked in to a relational database?

    And all for what? Just to add on your resume that you used ODBMS?

    Ravi
  • I am talking about data integrity that ensures that, for example, every account has at least one customer associated with it, every transaction code has meaning, every banking transaction has one account attached to it, etc. Database technology uses declarative referential integrity constraints to do it.

    In an object model those same constraints would be enforced by class invariants. Since the objects are persisted directly there is no scope for an invariant breach merely by adding persistence, thus there is simply no need for any additional persistence related integrity checking.
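
    A minimal sketch of what "enforced by class invariants" can look like, assuming every caller goes through the class's own API. The Account class and its "at least one owner" rule are invented for this example.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical class; the invariant is "an account always has at least one owner".
class Account {
    private final Set<String> owners = new LinkedHashSet<>();

    // The constructor establishes the invariant: no owner, no account.
    Account(String firstOwner) {
        if (firstOwner == null || firstOwner.isEmpty()) {
            throw new IllegalArgumentException("an account needs an owner");
        }
        owners.add(firstOwner);
    }

    void addOwner(String owner) {
        if (owner == null || owner.isEmpty()) {
            throw new IllegalArgumentException("owner must not be empty");
        }
        owners.add(owner);
    }

    // Every mutator preserves the invariant: the last owner cannot be removed.
    void removeOwner(String owner) {
        if (owners.contains(owner) && owners.size() == 1) {
            throw new IllegalStateException("cannot remove the last owner");
        }
        owners.remove(owner);
    }

    // Read-only view, so callers cannot empty the set behind the class's back.
    Set<String> getOwners() {
        return Collections.unmodifiableSet(owners);
    }
}

class InvariantDemo {
    public static void main(String[] args) {
        Account account = new Account("alice");
        account.addOwner("bob");
        account.removeOwner("alice");
        try {
            account.removeOwner("bob");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(account.getOwners()); // [bob]
    }
}
```

    The guarantee holds only for code that uses this class; a second class definition mapped to the same stored data would need the same checks.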

    For the record I do realise that OODBs have severe flaws such as data migration issues and querying BUT integrity is simply not one of them.

    Paul C.
  • Paul, when you say
    In an object model those same constraints would be enforced by class invariants.
    that is correct as far as it goes.

     It does not preclude another developer from writing another class that violates the invariants declared in your class.

      In a database, once you declare a referential integrity constraint, then, as long as the constraint exists, no application, none, nobody, can violate that constraint.

    Malicious people with privileged access to the database can still corrupt the data. But there can not be an inadvertent corruption of the data.

    When an ODBMS can guarantee me this level of integrity I will look into them seriously.

    Ravi
  • In an object model those same constraints would be enforced by class invariants. Since the objects are persisted directly there is no scope for an invariant breach merely by adding persistence, thus there is simply no need for any additional persistence related integrity checking.
    Does it mean invariants are defined at the application level and are not managed by the database? How do you maintain class invariants in applications with different views of the same data? I hope it is possible to define constraints in an OODBMS and to have many views of the same data, isn't it?
  • In a pure OODB you can't have different views of the same data because the data has no meaning outside instantiated objects.

    With an OODB you do everything in terms of objects - which is both a benefit and a curse depending on your particular requirements.

    Paul C.
  • So Paul, if, as you say,
    In a pure OODB you can't have different views of the same data because the data has no meaning outside instantiated objects.
    then how can I write reports, for example?

    I must use the same object model with its predefined hierarchy? What if I want a different hierarchy? What if I am only interested in accounts and not customers? Or I want customers per account? How do I get that information since my hierarchy starts at Customer? Do you begin to see the problems with OO persistence?

    Ravi
  • I think you are misunderstanding how OO persistence works. What you seem to be assuming is that you can only retrieve objects if they are reachable within some kind of hierarchy. This is, of course, not true. All objects are retrievable - you simply need to code the query to find them.
    I must use the same object model with its predefined hierarchy?

    No.
    What if I want a different hierarchy?

    Just locate the object you want via a query.
    What if I am only interested in accounts and not customers?

    Just get a collection of accounts. A JDO example:

    PersistenceManager pm = ...;
    Collection accounts = (Collection) pm.newQuery(Account.class).execute();
    Or I want customers per account?

    Either each account will have a reference to a Collection object which will hold references to the customers:

    Collection customers = anAccount.getCustomers();

    or you can write a query, retrieving customer objects where the account referred to shares the same object identity as the account you want.

    Something like:

    Query query = pm.newQuery(Customer.class, "account == anAccount");
    query.declareParameters("Account anAccount");
    Collection customers = (Collection) query.execute(actualAccount);
    How do I get that information since my hierarchy starts at Customer?

    The hierarchy is irrelevant.
    Do you begin to see the problems with OO persistence? Ravi

    These problems simply do not exist.

    Interestingly, the way you assume is the only way that objects can be reached is the mechanism that provides 'transparent persistence and retrieval'. Once you have a Customer instance, just accessing the Account field will automatically retrieve it from the store:

    Account theAccount = aCustomer.getAccount();

    In JDO on a relational store, this will transparently generate SQL and retrieve the account object. On an ODBMS, what happens is different. Your JDO code is the same, which is the beauty of JDO.

    However, this kind of retrieval is obviously not the only way to get data - you don't have to rely on reachability in a hierarchy of objects. That would be hopelessly impractical.
  • If I want customers per account, using a relational database I can simply write:

    Select account_id, count(customer_id)
    from t_c_c
    group by account_id

    Simple.
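
    For comparison, the same count computed on the object side, starting from customers and aggregating per account, might look like the plain Java sketch below. The Customer class and its accountIds field are invented for the example, and it assumes a customer never lists the same account twice; it is not how any particular ODBMS query language works.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory model: each customer holds the ids of its accounts.
class Customer {
    final String id;
    final List<String> accountIds;

    Customer(String id, String... accountIds) {
        this.id = id;
        this.accountIds = Arrays.asList(accountIds);
    }
}

class CustomersPerAccount {
    // Rough equivalent of:
    //   select account_id, count(customer_id) from t_c_c group by account_id
    static Map<String, Long> count(List<Customer> customers) {
        return customers.stream()
                .flatMap(customer -> customer.accountIds.stream())
                .collect(Collectors.groupingBy(
                        accountId -> accountId, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
                new Customer("C-1", "A-1", "A-2"),
                new Customer("C-2", "A-1"));
        System.out.println(count(customers)); // {A-1=2, A-2=1}
    }
}
```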

    Your ODBMS query is not so simple. Three statements just to get the account once a customer is known! Presumably, I have to write code to get all the customers first. Then I have to aggregate them per account. Too much work.

    Well, the query does depend on the hierarchy as we've just seen. You must start with the Customer.

    Unless you are storing a list of customers in the Account class, and a list of Accounts in the Customer class. But then, you are introducing data redundancy and the possibility of corruption.

    For example, I can easily add a Customer to an Account's list of Customers without adding the Account to that Customer's list of Accounts.
    Now, to avoid hierarchy dependencies, I've introduced the possibility of some application level code introducing data discrepancies.

    And to prevent the possibility of data discrepancy, I must write code not once, but twice, once in the Customer class, so that every time an account is added I must add the Customer to the account; and also in the Account class, to ensure that adding a Customer must also force the addition of this customer to its accounts list.

    But wait, this introduces a cyclic dependency, because when I add an account to a customer, it will force the account to add this customer to its list, which in turn will trigger the Customer to add this account to its list, ...

    Now I must handle this problem, too. While not insurmountable, it is a bit of work.
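
    One common way to handle the mutual update without endless recursion, sketched in plain Java with invented Customer and Account classes: each side only calls the other when its own set actually changed, so the second call back finds the link already present and stops.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical classes; each side keeps a set of references to the other.
class Customer {
    private final String name;
    private final Set<Account> accounts = new HashSet<>();

    Customer(String name) { this.name = name; }

    String getName() { return name; }

    void addAccount(Account account) {
        // Set.add returns false when the link already exists, which ends the recursion.
        if (accounts.add(account)) {
            account.addCustomer(this);
        }
    }

    Set<Account> getAccounts() { return accounts; }
}

class Account {
    private final String id;
    private final Set<Customer> customers = new HashSet<>();

    Account(String id) { this.id = id; }

    String getId() { return id; }

    void addCustomer(Customer customer) {
        // Same guard on this side, so the link can be started from either end.
        if (customers.add(customer)) {
            customer.addAccount(this);
        }
    }

    Set<Customer> getCustomers() { return customers; }
}

class BidirectionalLinkDemo {
    public static void main(String[] args) {
        Customer alice = new Customer("Alice");
        Account savings = new Account("A-1");

        alice.addAccount(savings); // one call updates both sides and terminates

        System.out.println(alice.getAccounts().contains(savings)); // true
        System.out.println(savings.getCustomers().contains(alice)); // true
    }
}
```

    This still keeps the link in two places, so the redundancy concern raised above remains; the guard only removes the infinite loop.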

    Contrast this with the relational way, where you have a customer_account table linking one or more customers to one or more accounts. Add it once, and you are done. Simple, isn't it?

    Of course, your object model can also resolve this using a CustomerAccount object. But then you are reinventing relational technology and data modelling and normalization.
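
    A sketch of that CustomerAccount approach in plain Java, to make the comparison concrete. The LinkRegistry class and the use of string ids are assumptions made for this example; the link is stored once and both directions are derived from it.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical link object: one instance per (customer, account) pair.
final class CustomerAccount {
    final String customerId;
    final String accountId;

    CustomerAccount(String customerId, String accountId) {
        this.customerId = Objects.requireNonNull(customerId);
        this.accountId = Objects.requireNonNull(accountId);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof CustomerAccount)) return false;
        CustomerAccount that = (CustomerAccount) other;
        return customerId.equals(that.customerId) && accountId.equals(that.accountId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerId, accountId);
    }
}

// The link is stored once; both directions are derived from the same set.
class LinkRegistry {
    private final Set<CustomerAccount> links = new LinkedHashSet<>();

    // Returns false if the pair was already linked.
    boolean link(String customerId, String accountId) {
        return links.add(new CustomerAccount(customerId, accountId));
    }

    Set<String> accountsOf(String customerId) {
        Set<String> result = new TreeSet<>();
        for (CustomerAccount link : links) {
            if (link.customerId.equals(customerId)) result.add(link.accountId);
        }
        return result;
    }

    Set<String> customersOf(String accountId) {
        Set<String> result = new TreeSet<>();
        for (CustomerAccount link : links) {
            if (link.accountId.equals(accountId)) result.add(link.customerId);
        }
        return result;
    }
}

class LinkRegistryDemo {
    public static void main(String[] args) {
        LinkRegistry registry = new LinkRegistry();
        registry.link("C-1", "A-1");
        registry.link("C-2", "A-1");
        registry.link("C-1", "A-2");

        System.out.println(registry.customersOf("A-1")); // [C-1, C-2]
        System.out.println(registry.accountsOf("C-1"));  // [A-1, A-2]
        System.out.println(registry.link("C-1", "A-1")); // false, already linked
    }
}
```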
  • Rather than take you through every possible stage of using object persistence myself, why don't you download a demo and actually try it? Or, read up on it? Your posts are full of so many misconceptions about what you can or can't do, and you seem to be inventing so many imaginary difficulties with object databases that just don't exist!

    I apologise for stopping here, as I can't carry on with this posting rate :) I have other things to do!

    But seriously, why not actually try a free object persistence product, or actually read through that Versant manual? I think you will be surprised.
  • So Paul, if, as you say,
    In a pure OODB you can't have different views of the same data because the data has no meaning outside instantiated objects.
    then how can I write reports, for example? I must use the same object model with its predefined hierarchy? What if I want a different hierarchy? What if I am only interested in accounts and not customers? Or I want customers per account? How do I get that information since my hierarchy starts at Customer? Do you begin to see the problems with OO persistence? Ravi

    I'm not contesting the fact that OODBs have limitations which may be crippling for certain applications (although in fairness modern OODBs do attempt to address these limitations with varying degrees of success), I'm merely saying that the single issue of structural integrity isn't the killer problem it's being made out to be.

    Paul C.
  • In a pure OODB you can't have different views of the same data because the data has no meaning outside instantiated objects. With an OODB you do everything in terms of objects - which is both a benefit and a curse depending on your particular requirements. Paul C.
    It means this data becomes private application data (data structures managed by user code that cannot be shared). If the data life cycle is equal to the application life cycle, then it is useless stuff. I cannot believe it; I hope you just made a mistake in this statement.
  • Data integrity is one of the basic ideas of relational technology. Based on sound mathematical principles underlying Predicate Logic and Set Theory (each at least 100 years old, well established and irrefutable so far), we can guarantee the integrity of data when relational database technology is used. If ODBMS proponents claim to have invented something that does not use predicate logic and set theory, but still guarantees data integrity, the mathematical world will be very eager to find out what it is, I am sure.
    ODBMS proponents just use proof by analogy; it sounds like this: "Imagine you want to persist a complex object graph ..." It is great fun to read papers about this technology; it sounds like a joke.
  • Integrity WRT databases is essentially a problem of its own making - the fact that data is being manipulated in its own right rather than the manipulation being part and parcel of encapsulated behaviour.

    With any direct object persistence scheme you don't have a "problem" with integrity because you haven't created one: you haven't separated data and function, and therefore haven't introduced any scope for a breach of integrity to manifest itself (assuming a class's API enforces appropriate invariants).

    Paul C.
  • Paul, you exhibit a total lack of knowledge of relational technology. Therefore, do not argue about its merits or lack thereof.

     Learn about it first, and then we can have a meaningful discussion.
  • Paul,

       You are confusing the storage of data with its manipulation for various purposes. Data, after manipulation, need not be stored. Reports are an example of that.

    Data integrity problems will not arise in a properly designed database. That guarantee can not be given in an ODBMS unless it uses a relational technology underneath.

    Ravi
  • Opinion: An index is not a book

    Juozas Baliuka with his "persistent storage for "complex object graph" then "java.io.ObjectOutputStream" is an OODBMS by definition" just makes me mad. Juozas, your statement is the ravings of a madman.

    Couldn't have said it better myself! LOL. :-)
  • Charles Armstrong points out that sometimes relational databases aren't exactly the best way to actually store data. Charles writes: Obvious right? Well then why does everyone use relational databases for storage?
    Thousands of replies here are centered around Java buzzwords, XML, persistence, etc. Here is my response from a purely database perspective (not mixing in the Java buzzwords).
    Relational databases, though theoretically talked about prior to the 80s, really got their boost when IBM put its stamp of approval on them with the introduction of SQL/DS and later DB2. Obviously the beneficiary was not IBM, but Oracle and other mid-range vendors of that time. VAX was the platform. Later Unix implementations reinforced this advantage in the market.

    SQL was accepted because it was very easy to create a table (a table-like file structure) and call it a relational database. Many products claimed to be databases without really having a DBMS engine. (dBase, Symphony, even MS Access... none of them have any DBMS engine...)

    SQL gurus and relational proponents like the late Dr. Codd and Date amplified the hype around relational databases in their fight against the Network model (E-R model)... IBM's reluctance to support the E-R model helped the relational hype all the way.

    Later, in the new world of OO programming, one realizes that relational databases are not the best fit at all. OO objects could have been easily delivered by an E-R model DB... But the vendor interests prevailed....

    OO databases like Illustra didn't take off either...
    Today the industry is forced to live with the relational hype. Most of the new programmers of Java, XML, Unix know only the currently hyped relational databases... They can never have any other opinion because they don't know what the other options could have been.

    Today anyone can call anything a database. Don't be surprised to see an XML-based database in the near future!!
  • Relational databases, though theoretically talked about prior to the 80s, really got the boost when IBM put their stamp of approval with the introduction of SQL/DS and later DB2.

    Relational owes much/most of its prosperity to Microsoft inventing ODBC.
    Don't be surprised to see an XML-based database in the near future!!

    There are already some native XML DBs. Also vendors such as Oracle offer an XML view of the DB, à la XSQL, so that clients can use an RDBMS as an XML DB. XML DBs or facades have the advantage of HTTP tunneling and transformation pipelining. Since the relational folk in this thread have invoked data longevity and data sharing as core values, they should appreciate the ability to bolt arbitrary transforms onto the DB. This means that data can be more easily retargeted to arbitrary application models, even models defined outside the company.
  • Relational owes much/most of its prosperity to Microsoft inventing ODBC.

    If I remember right, ODBC was not invented by Microsoft - it came from the SQL Access Group in 1992, and was originally CLI (Call Level Interface).