TSS Asks: POJO as Lingua Franca for persistence?


  1. One of the latest trends really picking up steam is the use of POJOs for persistence. JPA still uses annotations to indicate which data is persisted, but db4o and GigaSpaces 6 use POJOs directly to save data, as does the Java Serialization API. There are some potential drawbacks to the idea, but it's definitely the "simplest" way to go. What are your thoughts? Dan Creswell, author of Blitz, pointed out over IM that persistence should be all about the data we actually wish to persist, but sometimes we have to add in artificial fields: ID fields, UUIDs for easy reference, etc.
    ...it's kind of splitting our persistent data across both code and configuration, which just feels not nice. So the question is, what is the right subset of stuff we should consider as persistent data, and can we make that universal, or will it be full of exceptions? (e.g. we have a field required for a JavaSpace, another for an RDBMS, and if it's serialization neither of those fields is populated/used.) So then you get to the thing about a Rome object [Editor's note: one of the things that started this discussion was the idea of persisting a Rome SyndFeed directly into a JavaSpace], which is that although it might be serializable for convenience, it's not really designed for persisting - rather, it's designed for you to quickly pass around a little, but ultimately you should render it into some other "neutral" format. Is that a POJO? I doubt it. So then the question is: what POJOs are suitable for persistence, when, and what is it about them that makes it okay? Seems like not everything that could be considered a POJO is suitable. Therefore, can a single POJO be a good vehicle for expressing, say, a collection of things to persist, and a task (to be executed by some worker via a JavaSpace), and a ROME object?
    Now, Dan's points cater to a Space-based architecture, where a master-worker pattern is quite feasible; they apply less to an OODBMS, though it would have the same issues. But the question is portable: are POJOs appropriate for storage across persistence engines? Is there a good pattern for this sort of thing? For example, consider a container object, which has a persistence-appropriate set of artificial IDs, and then has an object in it:

    public class FeedContainer {
        // assume delegate methods and mutators/accessors are coded, please
        int feedId;
        SyndFeed feed;
    }

    Would this be the appropriate storage medium? A container could be specialized for the persistence engine itself, and still provide the POJO being persisted directly. What do you think? What is the correct abstraction to use to store data? If a POJO is appropriate internal to an application, shouldn't that be a good indicator that it is appropriate for persistence as well? BTW, if you're not familiar with the term "lingua franca," Wikipedia explains it as "any language widely used beyond the population of its native speakers," meaning that the POJO would be used to refer to a data object for persistence, processing, transfer, etc.

    Threaded Messages (53)

  2. Can we get a link to the original article/blog?
  3. This is the original article. :)
  4. Can we get a link to the original article/blog?
    Unfortunately, there isn't one - this came out of an IM chat that Joe and I were having, and he thought others might find it interesting. The "article", then, is really a quick summarization of a flow of discussion, which ideally should be distilled through further conversation.
  5. I think that Scott Ambler has already written something related to Id. More, JDO already suggests the use of datastore identity, which lets you avoid the explicit declaration of an Id field (maybe Hibernate too). I remember that in the mid-90s I read something related to persistence and the id field in Smalltalk. It sounded like: What bloody object-oriented is if I have to reference objects with id !!! Guido.
  6. I think that Scott Ambler has already written something related to Id.
    More, JDO already suggests the use of datastore identity that lets you avoid the explicit declaration of Id field (may hibernate too).
    I remember in mid 90's that I read something related to persistence and id field in Smalltalk.
    It sounded like:
    What bloody object-oriented is if I have to reference objects with id !!!

    Guido.
    Well, those IDs are there because the space represented by the memory location used as object identifiers in the JVM for using "==" just isn't big enough over time and space to identify a piece of data. You'll very easily see conflicts with the same identifier if you try to use those memory location identifiers as long-term ids. This is why for all of my persistent entity objects I automatically assign them a UUID created in the constructor, and use these as the object ID for equals() and for persistence.
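    The approach described above can be sketched in plain Java (the class name and details are illustrative, not the poster's actual code): a UUID is assigned once at construction and becomes the identity used by equals() and hashCode(), and later by the persistence layer.

    ```java
    import java.util.UUID;

    // Sketch: assign a UUID at construction time and base identity on it,
    // so the object keeps the same id across JVMs and persistence engines.
    public abstract class AbstractEntity {
        private final String id = UUID.randomUUID().toString();

        public String getId() {
            return id;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof AbstractEntity)) return false;
            return id.equals(((AbstractEntity) other).id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }
    ```

    The id is immutable and independent of memory location, so "==" conflicts of the kind described cannot occur.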
  7. Well, those IDs are there because the space represented by the memory location used as object identifiers in the JVM for using "==" just isn't big enough over time and space to identify a piece of data. You'll very easily see conflicts with the same identifier if you try to use those memory location identifiers as long-term ids. This is why for all of my persistent entity objects I automatically assign them a UUID created in the constructor, and use these as the object ID for equals() and for persistence.
    I think that in a correct role separation scenario, it is the persistence engine that has to assign (and retrieve) object ids. There is no need to declare an Id member; it is the engine that knows the id in its "address space". Guido.
  8. I think that in a correct role separation scenario, it is the persistence engine that has to assign (and retrieve) object ids.
    There is no need to declare an Id member; it is the engine that knows the id in its "address space".

    Guido.
    And if you use those domain objects with more than one external adapter (i.e. Hibernate / JPA as well as XFire / Axis, etc) which identifier is the canonical identifier, or do you maintain multiple framework-assigned identifiers? I think an identifier is a basic enough concept across all external adapter frameworks that adding it to your domain class isn't a bad idea, especially since you can just add it to a base class and never see it in your individual domain model classes.
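    The "add it to a base class and never see it in your individual domain model classes" suggestion can be sketched in plain Java (names are illustrative; the JPA annotations that would normally mark the class and field are shown as comments so the sketch compiles without a JPA jar on the classpath):

    ```java
    // Sketch: the identifier lives in a shared base class, so individual
    // domain classes never declare an id field themselves.
    public abstract class BaseEntity {          // with JPA: @MappedSuperclass
        protected Long id;                      // with JPA: @Id @GeneratedValue

        public Long getId() {
            return id;                          // null until assigned by the engine
        }
    }

    // A domain class inherits the identifier without ever mentioning it.
    class Customer extends BaseEntity {         // with JPA: @Entity
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }
    ```

    The framework-assigned id stays a persistence detail, while the domain classes read as plain model code.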
  9. And if you use those domain objects with more than one external adapter (i.e. Hibernate / JPA as well as XFire / Axis, etc) which identifier is the canonical identifier, or do you maintain multiple framework-assigned identifiers? I think an identifier is a basic enough concept across all external adapter frameworks that adding it to your domain class isn't a bad idea, especially since you can just add it to a base class and never see it in your individual domain model classes.
    Let's have a common understanding of terms. If the id is the persistence system's identifier, it is **relative** to that particular persistence system, and so there is no (global) canonical identifier. Otherwise it is a (unique) business field like any other, and the particular domain object instance can be retrieved with a "query" for that member having a certain value. Guido
  10. Stupid id field argument

    The ID field argument is stupid, at least with respect to JPA. In JPA, you would typically declare persistent fields that are private or protected. These are, by definition, not part of a domain object's contract, which consists of the public methods the object exposes (or the messages the object reacts to). Now, do we want to externalize the ID mechanism from the implementation class to have total portability, whatever datasource we would ever slide under our persistence framework? Seems rather overengineered, I'd say. Also note that JPA is a framework for accessing relational databases.
  11. Re: Stupid id field argument

    Forgive me, but I don't understand how this
    The ID field argument is stupid, at least with respect to JPA. In JPA, you would typically declare persistent fields that are private or protected. These are, by definition, not part of a domain object's contract, which consists of the public methods the object exposes (or the messages the object reacts to). Now, do we want to externalize the ID mechanism from the implementation class to have total portability, whatever datasource we would ever slide under our persistence framework? Seems rather overengineered, I'd say. Also note that JPA is a framework for accessing relational databases.
    fits in my post.
    Let's have a common understanding of terms.
    If the id is the persistence system's identifier, it is **relative** to that particular persistence system, and so there is no (global) canonical identifier.
    Otherwise it is a (unique) business field like any other, and the particular domain object instance can be retrieved with a "query" for that member having a certain value.

    Guido
  12. I remember in mid 90's that I read something related to persistence and id field in Smalltalk. It sounded like: What bloody object-oriented is if I have to reference objects with id !!!
    IDs are not object-oriented, but they:
    - use little memory
    - don't change
    - are unique
    - are a simple concept

    "Make everything as simple as possible, but not simpler." I think some people try to avoid simple solutions... Thomas
  13. > I remember in mid 90's that I read something
    > related to persistence and id field in Smalltalk.
    > It sounded like:
    > What bloody object-oriented is if I have
    > to reference objects with id !!!

    IDs are not object-oriented, but they:
    - use little memory
    - don't change
    - are unique
    - are a simple concept

    "Make everything as simple as possible, but not simpler". I think, some people try to avoid simple solutions...

    Thomas
    Just as an example of what I mean:

    class EmployeeBad {
        int id;
        int deptId;
        public Department getDepartment() {
            return PersistenceEngine.get(deptId);
        }
    }

    class EmployeeRight {
        Department dept;
        public Department getDepartment() {
            return dept;
        }
    }

    Then I would say that **objects** are unique, not because of their Id member, but because of their nature. The Id should be considered an externalized reference to an object in a specific object-hosting environment. Guido
  14. Persistence for SDO

    I apologize if it is slightly off-topic, but I would really be interested in knowing if there is a persistence framework capable of handling SDOs... I am not talking about JDBC mediators but about JPA-like engines (perhaps using something other than annotations?).
  15. SDO

    Xcalia Intermediation Core 5.1 is able to deal with POJOs (JDOs) and SDOs. The SDO client API is available for both Java and .Net. The mediator (DAS) has full mapping features (JDO at the moment, JPA will be supported). Eric.
  16. Have a look at Prevayler. Of course, it suffers from poor Java serialization performance, the lack of a built-in SQL querying mechanism (only OGNL, JXPath, etc.) and problematic schema evolution (which is also related to Java serialization). At the same time, the idea of object prevalence attracts me very much. I think it's a more natural way to work with model objects compared to Hibernate and other ORMs. But let's avoid starting a flame war like "This is a stupid idea, to store everything in RAM"... Best Regards, Fyodor Kupolov
  17. Prevayler URL

    Have a look at Prevayler.
    I've forgotten to mention the Prevayler's web site: http://www.prevayler.org/wiki/
  18. JPA and POJO persistence a la db4o or GigaSpaces are really just slightly semantically different ways of declaring what parts of your model are persistent. Plus, last time I checked, JPA was POJO persistence ;-) Generally, the approach chosen tends to be as much emotionally as technically founded, and for that reason the above technologies (along with other similar ones) generally let you choose your pleasure... be it annotations, external meta files, language keywords, etc.

    Regardless, not every POJO (or part of a POJO, for that matter) is persistent in your models. So, there needs to be a mechanism for declaring what is and is not. So... each gives you lots of options. Even the issue of identity: it's there in every implementation whether visible to you or not (unless it's a tiny weeny special-purpose embedded persistence). The question is whether or not you are cluttered by the visibility of its existence. However, even if you are not, you may have a need to use it, even though you can't "see" it. So, most give you a way to access it, to use it directly, even when they say you don't need to declare it in your code.

    It is the nature of things. We want options, we want performance. We use what we like, we do what we must. I see little difference between the above... but for likely the install footprint and runtime characteristics of, say, full ORM with mapping activities -vs- direct native storage. ...But... that is a whole 'nother topic... then we need to start talking about application needs, query capabilities, caching semantics, external reporting... you get the picture. Right tool for the right job, and as Lenny Kravitz says, "let domain model rule" ...or was that Love. Cheers, -Robert
  19. POJOs are certainly one option. We allow application objects to be stored directly in an ObjectGrid. This works, but handling complex graphs of objects is cumbersome with this approach, especially with a simple Map API. ObjectGrid customers can also use an application-independent schema that the application POJOs are mapped to. This allows applications with different POJOs to still share the data in an ObjectGrid. We then don't store application objects in the grid servers; we just store the state as tuples in an object-independent form.

    Of course, while much easier for customers to program to, bijecting this state to/from POJOs costs path length and requires us to detect changes, etc. This slows down the application when compared with working with Map APIs directly, so we allow both to be mixed in a single application. This way, 80% of the app can use the 'nice' graph-based POJO model to interact with data stored in the grid, but the 20% that's performance critical can kick down to work with the data tuples directly for maximum performance.

    It's like CMP/Hibernate versus JDBC. Using JDBC directly uses less path than doing the same thing through an OR mapper. So, while we offer the easier programming model, applications can still 'kick down' to get the performance if that's what they need. Scale-out doesn't always help here, as some customers measure transactions/thread/sec as well as total throughput, and obviously path length is a factor in that type of test. So I think we'll have POJO and raw access methods for state moving forward, simply because of the performance angle. The important thing is being able to mix both, even in the same transaction.
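    A rough, framework-free illustration of the trade-off described (plain Java collections, not the actual ObjectGrid API; all names are illustrative): state kept as object-independent tuples in a map, with a POJO view materialized on demand for the convenient path.

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: tuples are the "raw" fast path; the POJO view costs an
    // extra conversion per access, mirroring the path-length argument above.
    public class TupleStore {
        // key -> tuple of (name, balance); an object-independent form
        private final Map<String, Object[]> tuples = new ConcurrentHashMap<>();

        public void putRaw(String key, String name, long balance) {
            tuples.put(key, new Object[] { name, balance });   // fast path
        }

        public Account asPojo(String key) {                    // convenient path
            Object[] t = tuples.get(key);
            return t == null ? null : new Account((String) t[0], (Long) t[1]);
        }

        public static class Account {                          // the POJO view
            private final String name;
            private final long balance;
            public Account(String name, long balance) {
                this.name = name;
                this.balance = balance;
            }
            public String name() { return name; }
            public long balance() { return balance; }
        }
    }
    ```

    Both access styles share one store, so the 80/20 mix described in the post is possible even within a single transaction's worth of work.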
  20. Hi Billy,
    ObjectGrid customers can also use an application-independent schema that the application POJOs are mapped to. This allows applications with different POJOs to still share the data in an ObjectGrid.
    Now you guys are on to something here, but perhaps I am missing a point. Isn't that what the relational model is doing? For example, I can use Hibernate to create 3 subtly different versions of a Customer, but have only one 'storage version' in the database. Or are you just saying you're moving this to your process space instead of the database process space, so you can do the same thing while transacting in memory? Isn't that really just a model-to-model mapping problem?

    Versant does the same using an OO model, something called loose schema mapping. For us, this means applications with older versions of a model can continue to operate against models that have been upgraded in the database. I think robust, runtime, model-to-model mapping will be an important driver of the full adoption of domain-driven design. I imagine DBAs actually using their computer science degrees to model domain-specific abstractions in OO for the database - abstractions that get richer and more robust with time, leaving POJO (l-language) applications to "hook up" with those, instead of DBA data modeling and subsequent developer SQL customization. Application models can vary in time, but still adhere to the core rules of the underlying model. Over time, this should result in more natural mappings and better object/model reuse in both the application and data layers. Plus, it facilitates a more agile development and deployment process.
    We then don't store application objects in the grid servers, we just store the state as tuples in an object independant form.

    Of course, while much easier for customers to program to, bijecting this state to/from POJOs costs path length and requires us to detect changes etc. This slows down the application when compared with working with Map APIs directly so we allow both to be mixed in a single application.
    Yeah, that's basically the object-relational mapping problem, albeit in a different form (layer)... it has definite runtime cost. Hey, that's why we are still in business ;-) We don't have that problem, since we store objects directly in the object database. However, model transformations, when they exist, do indeed have a cost.

    I find this return to Map APIs a step backwards. I do understand why people are doing it... we do what we must (read: I need performance and horizontal scalability). However, there is no reason why drivers cannot be built for applications that are smart enough to handle all this identity/distribution stuff transparently. We do it already with Versant, and I could see how a thin mapping layer could provide the same to an RDB storage manager. JPA with style... versus custom coding Map push/pull calls in all your getter/setter methods... yuck... who ever thought someone would buy that again... oh yeah, someone did... Oracle... hee hee. Well, clearly enough folks were/are in scalability hell to start justifying desperate approaches. AND... if nobody is creating the drivers for the RDB... then might as well make them obsolete by transacting in memory and writing behind. No wonder Oracle now owns the approach :-0 don't want that baby wandering too close to the pool.

    The only real rub I could think of with an application-level JPA-style-with-distribution driver is that ad hoc reporting tools need to be model-aware so they can also drive transparently through the distribution. That means a lot of changes to the tools as people know them today. -Robert
  21. Maybe I'm missing something but it seems to me that POJOs is an ill-defined term. It seems to me that a lot of people are thinking 'JavaBean' when they write or read 'POJO' which is something that doesn't seem all that 'plain' or elegant to me. Is a proxy Object a POJO? Is an Object with no state a POJO? Is a java.util.Map a POJO? What are we really talking about here?
  22. Maybe I'm missing something but it seems to me that POJOs is an ill-defined term. It seems to me that a lot of people are thinking 'JavaBean' when they write or read 'POJO' which is something that doesn't seem all that 'plain' or elegant to me. Is a proxy Object a POJO? Is an Object with no state a POJO? Is a java.util.Map a POJO? What are we really talking about here?
    I was thinking the same thing. POJO to me means "places no requirements on the interface of your objects," but what it really seems to mean is "roughly follows JavaBean conventions in a way suitable to the framework."
  23. I was thinking the same thing. POJO to me means "places no requirements on the interface of your objects," but what it really seems to mean is "roughly follows JavaBean conventions in a way suitable to the framework."
    From my point of view, a POJO is not a JavaBean, since a JavaBean is just a structure with getters and setters and no behavior. The POJO term was coined by Martin Fowler and others just to show the difference from JavaBeans (http://www.martinfowler.com/bliki/POJO.html). I consider the java.lang.String class a POJO because it contains all the behavior you can expect from a string instance. Most Java classes are POJOs, except some classes such as java.lang.Math and others, which are just utility classes full of static methods. I like to say that POJO = knowledge + behavior, and that's a way to increase reusability. Cheers, Juan
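    The distinction drawn above can be sketched in plain Java (names are illustrative): a JavaBean that is pure structure versus a POJO that carries the behavior belonging to its data.

    ```java
    // Pure structure: getters and setters only, no behavior.
    class MoneyBean {
        private long cents;
        public long getCents() { return cents; }
        public void setCents(long cents) { this.cents = cents; }
    }

    // POJO in the "knowledge + behavior" sense: the data plus the
    // operations that belong to it, which is what makes it reusable.
    class Money {
        private final long cents;
        Money(long cents) { this.cents = cents; }
        Money add(Money other) { return new Money(this.cents + other.cents); }
        long cents() { return cents; }
    }
    ```

    Callers of MoneyBean must implement arithmetic themselves everywhere it is needed; Money carries it once.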
  24. I agree. Actually, a few POJOs may work together to represent a piece of reality, and we can make them play by the business rules. We may need to persist them, to persist the current use case, or we may even want to recreate the use case from storage. But I don't see them as suitable for managing data: POJOs are business information with behavior; data is just data. William Martinez
  25. POJOs are the future

    As POJOs are just plain simple Java objects (stating the obvious for effect), they adhere to the KISS strategy with regard to persistence, as long as you have a truly transparent persistence framework via which you persist them. POJOs require truly transparent persistence, which is a wonderfully powerful thing: it's possible with JDO (e.g., JPOX), and if you can turn a blind eye to the requirement to add an id field and its setter/getter, then you can treat Hibernate as 'effectively' providing transparent persistence also. Some tools (e.g., Javelin - shameless plug!) implicitly generate the Hibernate id field and accessors for you, so at the class diagram level your domain models still look like they consist of POJOs - not an id field in sight.

    There are many things that can subtly draw you away from the POJO world, and you must be wary of them. The biggest one I see these days is annotations. By definition, a Java class file ceases to define a POJO when annotations are added. POJOs are so powerful because they are pure Java and contain no vendor-specific artifacts inside them. They conform to the Java class contract and that's all. You can persist that class using any number of transparent persistence technologies. Once you start adding vendor-specific annotations to your Java class files then, IMHO, you've left the world of POJOs and you're now locked into your vendor-specific world - which makes your persistence provider very happy but also hyperwarps you back about 15 years to the days when vendor lock-in ruled the software high seas. We study history at school in the hope that people don't repeat the mistakes of the past - be very wary of annotations!

    Tools like the one mentioned above automatically generate and update metadata in separate XML files as you make changes to your 'live' class diagrams, keeping your metadata synchronized with your design and code. So it's perfectly feasible and wonderfully freeing to create and maintain a POJO-based domain model, knowing that your persistence directives are all stored in separate, automatically generated and managed XML metadata files. In other words, your POJOs can remain POJOs.
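    For illustration, externalized mapping metadata of the kind described looks roughly like this Hibernate-style hbm.xml fragment (class, table, and column names are invented for the example); the mapped Feed class itself stays a plain POJO with no annotations or vendor imports:

    ```xml
    <!-- Sketch of persistence directives kept outside the Java class. -->
    <hibernate-mapping>
      <class name="com.example.Feed" table="FEED">
        <id name="id" column="FEED_ID">
          <generator class="native"/>
        </id>
        <property name="title"/>
        <property name="url" column="FEED_URL"/>
      </class>
    </hibernate-mapping>
    ```

    Swapping vendors then means regenerating this file, not editing the domain classes.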
  26. Java needs real properties

    I think this question would be addressed well by proper properties in Java. Up till now we have been doing it by convention with JavaBeans, but a property syntax with support for observability, easy property inventory/introspection (with something more efficient/simpler than reflection) and annotated transience/persistence would be a huge boost everywhere from GUIs to storage. I would argue that POJOs would be the easy answer to this quandary if properties were available in plain old objects. I am probably more concerned than the next guy about polluting the language with unnecessary features, but property support is such a basic functionality that it really belongs at the language level, with plenty of hooks for efficient usage by tool vendors and regular programmers alike.
  27. I think that this question would be addressed well by proper properties in java....
    Hear! Hear! IMO properties were nailed perfectly in Object Pascal:

    property name: string read functionOrFieldToRead [write functionOrFieldToWrite];

    so code like

    if (obj.name == null) { obj.name = "None"; }

    always works, even if the implementation changes the field "name" to a "name" property. See more: http://info.borland.com/techpubs/delphi/delphi5/oplg/classes.html
  28. We need a HDBMS

    I think there are two separate things here: one is the need to store hierarchical data (an HDBMS), and the other is object persistence; these of course overlap. One thing is clear: the relational RDBMS is not solving these problems. It requires a very tiresome O/R mapping layer to translate between the different representations, as the name suggests, object to relational and back. If we had an HDBMS we could create a much easier persistence mechanism for objects, but this would not be the same as an OODBMS; the latter holds objects (with behaviour), not just the data. An HDBMS would be able to persist complex XML naturally, without translation or loss; queries would use something like XPath / XQuery, and life would be a lot easier for people looking to persist messages. To store behaviour, a pure HDBMS wouldn't work; this is where the OODBMS comes in. Using an OODBMS, however, would need the extra wrapping of XML by an object to persist even the simplest XML and would therefore add overhead.

    IDs and UUIDs were mentioned. These aren't necessarily part of an RDBMS; yes, they are used, and in most cases essential to maintaining the relational model, but they are simply details that are unfortunately exposed to the user and often not needed in the business layer. Objects have references, but we don't think of them as IDs because the implementation is hidden; they're still there though. So I can't see anything wrong with adding UUIDs to an OODBMS or HDBMS. They're simply a part of the implementation, and ideally should be hidden from everyday use just as addresses and references are hidden from the Java programmer.

    I think what the world needs now is a Hierarchical Persistence API for storing complex hierarchical data (typically XML). It should be open and preferably language independent, making use of XPath/XQuery etc. for queries and updates. Since XML is typically immutable, I suggest the implementations of this HPA also be immutable, i.e. using CoW (copy-on-write) technology (a la Subversion or ZFS). What you then get is a (hopefully) simple API for storing complex XML, with no ORMs to slow everything down and add complexity every time something changes. Sorry Gavin, I love Hibernate, but its time has come! You can update the instance but go back to any previous version, like a VCS. The instance versions themselves can also change version (schema version), something that's impossible in a classic RDBMS. I'd also suggest you should be able to register for events on the instance and get updates through ATOM or JMX in the Java world.

    Who would do this? Well, JPA might just do it already; hopefully a conversation with Patrick next week in Barcelona will settle that. We might need a new JSR, but this API should work on top of Oracle, Sybase, DB2 and other "classic" RDBMSs that handle XML already, the advantage being that it's a common and open API. The REALLY interesting stuff will be the in-memory technologies like GigaSpaces et al that provide a memory-based implementation of this API. RDBMS RIP; long live the HPA and HDBMS. Just thinking out loud, -John-
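    The kind of hierarchical query the post has in mind can be tried today with the JDK's built-in XPath support; a minimal sketch against an in-memory document (element and attribute names are invented for the example):

    ```java
    import java.io.StringReader;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    // Sketch: querying hierarchical data directly with XPath, no
    // object-relational mapping layer in between.
    public class XPathDemo {
        public static String firstOrderItem(String xml) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Select the sku attribute of the first item in the order.
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate("/order/item[1]/@sku", doc, XPathConstants.STRING);
        }
    }
    ```

    A hypothetical HPA would presumably expose the same query language against a persistent, versioned store rather than an in-memory DOM.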
  29. Re: We need a HDBMS

    Weren't RDBMSes invented to overcome all the terrible weaknesses of hierarchical databases?
  30. Re: We need a HDBMS

    Weren't RDBMSes invented to overcome all the terrible weaknesses of hierarchical databases?
    Tell me more, I go back 30 years in technology but don't remember that bit. I dare say requirements have changed though but I'd be interested to hear anything you know on this subject. -John-
  31. Re: We need a HDBMS

    Weren't RDBMSes invented to overcome all the terrible weaknesses of hierarchical databases?
    Just thinking about it, there are a lot of things that we spent decades replacing from the past that are now coming back. Take Virtualisation for example, in the old days you'd have a mainframe that ran several virtual machines (IBMs for example), we then moved off such things into multiple servers. Moore's law has given us (indirectly) machines that are several orders of magnitude more powerful than we had 40 years ago (when Gordon Moore first made the statement). The servers are so powerful that we're back to virtualisation again using Xen and VMWare etc. Full circle. Just because it was obsolete decades ago doesn't mean it won't solve problems today or in the future, technology and needs change. -John-
  32. Re: We need a HDBMS

    I'm old enough to have lived through all this, so let me share a bit. So you are suggesting we return to IMS? That's a 40-year-old technology. Let's see: it was/is fast, but it required redundant storage and was pretty brittle. If you didn't design for the types of access you wanted, you paid a heavy price. How about System 2000? Next came network databases (the model some ODBMSs use), which were standardized (DML/DDL) under CODASYL (IDMS, DMS-1100, etc.). Redundant data was not required, and all sorts of complex relationships and structures could be implemented. It was very fast and efficient. Again, the problem was flexibility. Maintaining a CODASYL database to keep it going was a lot of work. Your interface to it was very physical, and of course it didn't support very well the lines of query you didn't think of at design time. Both of these required a lot of knowledge about the physical schema and organization, and you hand-coded all your navigation. If you changed the schema you had (and have) to physically unload and reload all the data. It is mapped to disk! Not true with RDBMS. That is why we use RDBMS today. SQL itself is not relational and is still too physical, but as bad as it is, it's more flexible and powerful than what it replaced. RDBMS vendors have come up with optimisations and extensions to support hierarchies and networks (Oracle's CONNECT BY); unfortunately there isn't a standard shared by all. Well, there never was a standard to accomplish the same things in the other models, so we are still better off. One thing the RDBMS vendors excel at is performing when you have to do or add things you never anticipated. No current or past ODBMS, HDBMS, or CODASYL DBMS does this well. I've been a DBA and DB designer/developer for each of these models on at least two products of each type, so I have some direct experience with this.
    There is something the move to RDBMS has in common with OO: both are successful because they significantly reduce the price you pay for not knowing or anticipating everything up front. Since we've found that we never do know everything (as the history of "waterfall" development has shown us), it's an easy choice. Don't tell me that advances in technology have changed all that. The laws of physics haven't changed. Differences in relative speeds of CPUs, memory, disk and networks have widened if anything, and we are demanding a lot more of our systems than ever before. The truth is, if we had 25 years to mature a really state-of-the-art network database we might be able to match what has happened in the 25 years (27 if you count Oracle 1.0) we've been using RDBMS, but it would probably look the same, since all of the DBMS models use the same data structures and algorithms at the physical level anyway, and the RDBMS vendors have been incorporating every improvement they have found over the years. (Yes, the B-tree is very common inside an RDBMS.) Now, could we do the same with a hierarchical model? Why? Hierarchical models cannot represent all structures and relationships without redundancy; relational and network models can. So why settle for a subset? The truth is, all we really need is a standardized way of navigating logical graphs in an RDBMS, backed by an efficient implementation. All that is missing is the "standardized". (BTW, I still like and use OODBMS, but only for certain things, and not for the system of record or reporting.)
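    [Editor's note: for readers who haven't met Oracle's CONNECT BY, the sketch below shows in plain Java, with made-up data, the depth-first walk such a query performs over a hierarchy stored as flat parent/child rows. The class, names and data are hypothetical, purely for illustration.]

```java
import java.util.*;

// Toy illustration of what a hierarchical query (e.g. Oracle's CONNECT BY)
// does: walk parent/child rows stored as a flat "adjacency list" table.
public class ConnectByDemo {
    // Maps a parent id to its child ids -- the relational encoding of a
    // hierarchy as (id, parent_id) rows.
    static Map<String, List<String>> children = new HashMap<>();

    // Depth-first walk, roughly what
    // SELECT ... START WITH id = root CONNECT BY PRIOR id = parent_id
    // has to do under the covers.
    static List<String> walk(String root) {
        List<String> out = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            out.add(node);
            List<String> kids = children.getOrDefault(node, Collections.emptyList());
            // Push in reverse so children come out in insertion order.
            for (int i = kids.size() - 1; i >= 0; i--) stack.push(kids.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        children.put("ceo", Arrays.asList("cto", "cfo"));
        children.put("cto", Arrays.asList("dev1", "dev2"));
        System.out.println(walk("ceo")); // [ceo, cto, dev1, dev2, cfo]
    }
}
```

    The point of "standardizing" such navigation would be that this traversal lives in the database engine, behind one agreed API, rather than being hand-coded in every application as above.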
  33. Re: We need a HDBMS[ Go to top ]

    Truth is all we really need is a standardized way of navigating logical graphs in an RDBMS backed by an efficient implementation. All that is missing is the "standardized".
    ...and declaratively expressing constraints on graph structure. ...and extensions to relational algebra and calculus that incorporate object orientation. ...and truly relational RDBMSes... But we can start with a standard way to navigate logical graphs.
  34. Re: We need a HDBMS[ Go to top ]

    RDBMS RIP long live the HPA and HDBMS.
    I've read this three times now and I still can't tell if he's being facetious. Someone throw me a lifeline here: should I be laughing or crying?
  35. Re: We need a HDBMS[ Go to top ]

    RDBMS RIP long live the HPA and HDBMS.


    I've read this three times now and I still can't tell if he's being facetious. Someone throw me a lifeline here: should I be laughing or crying?
    Hi Gavin, I was really thinking aloud. There are a lot of people (banks et al) implementing XML storage in CLOBs and we need a more generic solution; perhaps not "HPA and HDBMS", but ORM onto RDBMS isn't working. -John-
  36. Re: We need a HDBMS[ Go to top ]

    RDBMS RIP long live the HPA and HDBMS.


    I've read this three times now and I still can't tell if he's being facetious. Someone throw me a lifeline here: should I be laughing or crying?


    Hi Gavin, I was really thinking aloud. There are a lot of people (banks et al) implementing XML storage in CLOBs and we need a more generic solution; perhaps not "HPA and HDBMS", but ORM onto RDBMS isn't working.

    -John-
    Isn't working in what context? I think ORM is working for a lot of people in a lot of systems. There are a lot of interesting ideas in your post, but they're intermingled with words that remind me of age-old technology and way over-used things like XML. What problem are you trying to solve?
  37. Re: We need a HDBMS[ Go to top ]

    Isn't working in what context?
    ORM is great in many, many, probably most contexts. I use it myself extensively and used it [Hibernate] in our C24 product "Integration Objects" (now IONA's Artix Data Services). BUT imagine trying to store several hundred thousand (or million) large XML files based on a very complex schema (FpML) with over four thousand elements and over a dozen levels of depth. Add to this the need to work on several versions of the same schema simultaneously, as well as a few derivatives based on the same schema but uniquely different. Now, ORM can be made to work for any one version of the schema (Artix DS is very good at that; it uses Hibernate), but firstly the SQL to retrieve what was originally hierarchical data becomes incredibly complex, everything is very slow, it requires lots of locking when you make changes, and you're completely buggered (English term, sorry) when it comes to storing the same instance using a slightly different version of the schema (as happens). That's our problem, and it's shared by virtually every investment bank, most hedge funds and brokers, many telcos and probably many others. The XML is too complex, and hierarchical-into-relational just doesn't work. Some use proprietary XML implementations from classic database vendors like Oracle, Sybase et al, but many of these break with the complexity of FpML, provide crap tools or just run like a dog. Even if they did work (and I've yet to hear of a bank using any of these in a big way for XML), the solution is proprietary and therefore not attractive to a large bank. -John-
  38. Re: We need a HDBMS[ Go to top ]

    Whoa. Let's stop for a moment. Let me see if what I understand is correct:
    1. We have massive data that we separate into identifiable sets of attributes (entities) and then relate to other entities. We have Relational and RDBMS.
    2. We also have real-life representations with behavior, discrete and encapsulated, fully contained and not related. We have Objects and OODBMS.
    3. We have documents with structured or semi-structured content, grouped into shelves, each representing a complete unit. We have XML and XDBMS.
    4. We have hierarchically organized data that represents complex information units separated into even more detailed parts. We have HDBMS.
    So, if I'm correct, we are all using RDBMS to store all the other types of data. And SQL is not totally suited even to RDBMS, but still it is used to work with all the other types of data too! Interesting.
  39. Re: We need a HDBMS[ Go to top ]

    1. We have massive data that we separate into identifiable sets of attributes (entities) and then relate to other entities. We have Relational and RDBMS.
    2. We also have real-life representations with behavior, discrete and encapsulated, fully contained and not related. We have Objects and OODBMS.
    3. We have documents with structured or semi-structured content, grouped into shelves, each representing a complete unit. We have XML and XDBMS.
    4. We have hierarchically organized data that represents complex information units separated into even more detailed parts. We have HDBMS.
    A reasonably good summary, although I'd suggest 3 and 4 (above) are pretty much the same thing: 3 is just a specific implementation of 4. Having said that, XML does have some rather "special" features that might be difficult to squeeze into a pure HDBMS. But then again, you'd be pretty daft to design an HDBMS that didn't handle XML, so I'll stick with my first statement: they're the same. -John-
  40. Re: We need a HDBMS[ Go to top ]

    What you should probably be using is an OODB like Versant. That's what the Financial Times does with their 40+ terabytes (and growing) of content. Yet that data is pushed/pulled from 9,000 different news sources in near realtime in ... ta da ... XML. Things like JAXB do great runtime translation to your language objects. The issue is that you eventually need to move your XML into the structure provided by a language anyway, so that you can manipulate it the way most real applications (business logic) need to. Well, it seems these days everyone likes to use objects, so you are translating the XML into language objects ... so you hate to then translate to relational and deal with three layers of translation. So, just use an OODB. I can't count on all appendages the number of times I've heard this discussion. The problem is simply that you don't have rich programming constructs for XML, so you need to translate it to something that does. Unless there are some leaps and bounds for XML in the area of language integration, you will not benefit from actually storing the XML. If you just want to query over raw XML sources ... use something like Verity to index it and write some OO code around the use of those indexes. Cheers, -Robert
  41. Re: We need a HDBMS[ Go to top ]

    Hi Robert, I've seen Versant used in these cases, but there are two problems for the general investment banking community: firstly performance and secondly, probably the real sticking point, it's proprietary. I'd like to suggest we need a standard in this area, and I'd like to see someone, possibly even myself or the leaders in this area (Gavin K, Patrick L etc.), help with a standard way of handling hierarchical data (read: XML). That way vendors like Versant can implement it, and banks can then write generic code that allows them to move onto something else when the vendor disappears, changes plans, gets bought, doubles its price etc. Banks build with a view to it working for the next 10 years; that's why they like open source and large companies like IBM who support 10-year-old versions of their software. With all due respect to the FT, if their Versant system has a hiccup one day because three times the usual volume of people want to access the 40+ TB of data, they're not likely to make a $100m company disappear off the stock market or lose tens of millions of bucks/quid; banks have slightly higher SLAs. 9,000 different news sources in "near realtime" probably means seconds, perhaps it means milliseconds, but again, what's the impact of one of those taking 2 seconds? Probably very little. We have requirements where we'll process 10,000-50,000 a second and need a 10ms response time; I wrote something similar to this on TSS a few years ago here. So just use an OODB? Have you ever tried doing something fast on an OODB? I'd love to, but the only solutions for us are technologies like GigaSpaces where everything's in memory. Verity (Autonomy) is a pretty good tool, and I've seen it used, but it doesn't solve the specific business problems in the bank. A lot of banks use Google's search engines too; they use these to search through huge amounts of data and it works very well.
    Does anyone have a standard API for XML persistence that will allow me to use a variety of implementations (like JPA, JDBC, Hibernate, JMS etc.)??? Thanks for your input Robert; for the FT the choice of Versant and perhaps Autonomy was a good one, but it's not right for the problem in hand. -John-
  42. Re: We need a HDBMS[ Go to top ]

    Does anyone have a standard API for XML persistence that will allow me to use a variety of implementations (like JPA, JDBC, Hibernate, JMS etc.)???

    -John-
    Take a look at Service Data Objects - the SDO API is an XML-style API that is available in both Java (via the JCP) and C++ (via OASIS). The data is handled at the back end via a Data Access Service, so you can implement a DAS for XML, JPA, JDBC, Hibernate, etc. - in fact, for any data format you happen to need.
  43. Re: We need a HDBMS[ Go to top ]

    Take a look at Service Data Objects
    Thanks PJ, I've looked into SDO in the past; we thought about implementing it in C24. I'll take another look, but I think the level of integration might be overkill for some of the higher performance requirements. Still, any suggestions are better than none. I'll come back with comments at a later date. Thanks, -John-
  44. Re: We need a HDBMS[ Go to top ]

    Take a look at Service Data Objects


    Thanks PJ, I've looked into SDO in the past, we thought about implementing it in C24. I'll take another look but I think the level of integration might be overkill for some of the higher performance requirements. Still, any suggestions are better than none I'll come back with comments at a later date.

    Thanks,

    -John-
    You heard of Skytide? I think they do exactly what you are describing. Nat Wyatt (of Illustra/Cloudscape fame). Smart folks. I like your suggested model as well, John. Take a look at these guys though. And if you have, what's your opinion?
  45. Re: We need a HDBMS[ Go to top ]

    You heard of Skytide? I think they do exactly what you are describing. Nat Wyatt (of Illustra/Cloudscape fame). Smart folks. I like your suggested model as well, John. Take a look at these guys though. And if you have, what's your opinion?
    Skytide looks interesting (so far); I'll have a read through what they've got. I've not come across them before, but I'm impressed they mention FpML in their web product doc; that's a good start in my vertical. Thanks for the pointer, -John-
  46. Re: We need a HDBMS[ Go to top ]

    Hi John, I think standards are good, but I've seen that the financial industry especially generally looks past them when needed... which is why Tangosol, GigaSpaces and others are making great headway in that vertical at this moment. Regarding performance ... (sorry to sound like a sales guy here, but facts are important) ... did you try us back in the mid 90's when we were a multi-process model? Did you know that Versant IS the database running the American Stock Exchange options trading platform ... I think it's doing something like 70,000 tx/sec. Not to mention a host of other nearly-as-serious financial institutions. We are also running the availability processor for Sabre ... American Airlines, Travelocity, etc. Booking airline tickets across multiple hops across multiple carriers in realtime involves some of the most challenging transactions in the world. The last numbers I've heard were 150,000 tx/sec. Plus, they've had zero downtime in well over a year, something I'm told has never been seen before in the history of that company. There will be no hiccup at FT, but should it happen we are well equipped to fix it. I seriously doubt performance would be a problem with Versant. That is THE SINGLE REASON we continue to sell millions in licenses every quarter. I wish you the best of luck getting your standards movement headed in a positive direction. Right tool for the right job, -Robert
  47. Re: We need a HDBMS[ Go to top ]

    I wish you the best of luck getting your standard movement headed in a positive direction.
    Robert, Thanks, I've seen Versant perform some pretty impressive throughput in the past, perhaps I'll get back in touch if we get a standard together, they'd be a valuable member. -John-
  48. there are a lot of people (banks et al) implementing XML storage in CLOBS and we need a more generic solution,
    XML was designed primarily as a way to 'transfer' knowledge of an object graph between two points in a language-independent way. It has also been used extensively as a way of expressing configuration for a wide variety of systems... but to use XML as a way of storing high-volume data? I don't think this is a move in the right direction. We already have far too many ways of storing data/objects - please don't encourage people to develop a 'standard' to help people access objects stored in CLOBerized XML! You're trying to solve a problem that shouldn't exist in the first place... Instead of storing data as XML in a CLOB, just parse your XML files into object graphs and store the objects using an ORM. You can then use the existing technologies to index them, query them, etc. When you need to 'transfer' your object graph as XML, convert the appropriate object graph to XML and send it as XML. Using XML as a high-volume storage mechanism would be like storing data for a data-driven web site as HTML in CLOBs - it simply doesn't make sense IMHO. It makes a lot more sense to store the data in database fields and render the HTML on demand.
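    [Editor's note: a minimal sketch of the "parse it, don't CLOB it" approach described above, using only the JDK's built-in DOM parser. The Trade class, element names and attributes are hypothetical, standing in for whatever objects your ORM actually maps.]

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;
import java.io.ByteArrayInputStream;
import java.util.*;

// Turn an XML document into plain objects an ORM could persist,
// instead of storing the raw XML text in a CLOB.
public class XmlToObjects {
    // Hypothetical domain object -- in practice an ORM-mapped entity.
    static class Trade {
        String id;
        double notional;
        Trade(String id, double notional) { this.id = id; this.notional = notional; }
    }

    static List<Trade> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        List<Trade> trades = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("trade");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            trades.add(new Trade(e.getAttribute("id"),
                                 Double.parseDouble(e.getAttribute("notional"))));
        }
        return trades; // hand these POJOs to the ORM rather than CLOBbing the XML
    }

    public static void main(String[] args) throws Exception {
        String xml = "<trades><trade id='t1' notional='1000000'/>"
                   + "<trade id='t2' notional='500000'/></trades>";
        List<Trade> trades = parse(xml);
        System.out.println(trades.size() + " trades, first id=" + trades.get(0).id);
    }
}
```

    The trade-off John raises still applies, of course: this mapping is written against one version of the schema, and schema evolution is exactly where it gets painful.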
  49. Re: We need a HDBMS[ Go to top ]

    Probably a discussion to take up with legendary database guru Michael Stonebraker. In this May's issue of ACM Queue he can be quoted as saying: JDBC and ODBC are the worst interfaces on the planet; SQL could very well disappear; RDBs are built on 30-year-old architectures that have bolted on new features over time; relational databases have inherent scalability problems in the multi-terabyte range; and you don't build a skyscraper floor by floor, year by year, by committee, and expect it not to fall over eventually. http://www.acm.org/acmqueue/digital/Queuevol5no4_May2007.pdf Of course, he also contends that OODBs have failed... "because they went after CAD and CAD didn't want them". I know more than a few companies that would disagree with Michael on that one. Plus, nearly two straight years of multi-million-dollar profitable quarters as a public company would suggest Versant is successful (even the #1 company in the NASDAQ small cap in 2006, according to Business Week). Well, I always say, right tool for the right job. As the nature of the "job" is changing... so will the tools. Don't cry though... just like people, technology, from the moment it is born, is destined to die. And relax, every dart thrown at RDBs as the one-size-fits-all storage technology is not an implied threat to your beloved Hibernate RDB mapping tool. Right tool for the right job, -Robert
  50. Re: We need a HDBMS[ Go to top ]

    I'd also like to point out that the ACM Queue issue that Robert linked to has a very interesting article by John O'Hara about AMQP on page 68 ("Towards a Commodity Enterprise Middleware"). You might think this is off track, but there's an interesting pattern here. Solutions exist for banks to send messages both internally and externally; they work on just about every platform known to man, but JP Morgan et al were still compelled to create a new open wire-level protocol called AMQP to help commoditise the marketplace (and of course reduce cost). I see the same problem with databases: we've got some good open source databases, but we're still using customised and proprietary solutions for XML. What we need is an open standard/API for XML. Have a read of the article, it's very good. If you think it's a daft idea, then ask yourself why several very large blue-chip companies, including three of the world's largest banks, have taken it up, three of them in the last month??? There's a lesson to be learnt here. -John-
  51. Re: We need a HDBMS[ Go to top ]

    RDBMSs have been successful for two reasons: 1. better tools than previous technologies (e.g. IMS), and 2. market momentum - and not because of a superior underlying theoretical model. I'm seen as a dangerous radical for proposing that RDBMSs are the wrong solution to many of the problems we have in my company. People associate hierarchical DBs with their memories of the tooling that supported IMS/DB, rather than the theoretical model. And the tooling was really bad (ASI/Enquiry anyone, for reporting? No thanks). But there's no reason for it to be this way. Adding a new attribute can be made as easy as adding a column in an RDBMS; it just requires tooling. After all, early RDBMSs often required an unload and reload to make schema changes. With more sophisticated schema maintenance and reporting capabilities, these objections belong in the past.
    I think it's easy to point to the deficiencies in the relational model. I'd contend that most data is naturally hierarchical - i.e. an element depends for its identity partly on a parent entity and is not independent. If you examine most real-world DBs you'll see a few core entities, and the rest are really children or intersections. While all data can be represented as tables, it's not the most natural form. Also, the relational model can be supported in a network or hierarchical database: it's just the simple case where every node is a root node. This should ease the mental transition for RDBMS die-hards. But we don't need to hope for a new hierarchical DB or even a new API. We can use the RDBMS with its supporting infrastructure and mature tooling, but add an abstraction layer on top that supports hierarchical data modelling. And that layer exists today in the form of the JSR-170 Java Content Repository API, e.g. Apache Jackrabbit.
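    [Editor's note: the real JSR-170 API lives in javax.jcr and is implemented by Apache Jackrabbit; the toy in-memory class below merely mimics its addNode/setProperty/getNode style to show what hierarchical content modelling looks like. It is an illustration under those assumptions, not the actual API.]

```java
import java.util.*;

// A toy, in-memory analogue of the node-tree style that JSR-170 (javax.jcr)
// exposes: content is a hierarchy of named nodes, each carrying properties.
public class ToyNode {
    final String name;
    final Map<String, ToyNode> children = new LinkedHashMap<>();
    final Map<String, String> properties = new HashMap<>();

    ToyNode(String name) { this.name = name; }

    // Create and return a named child, so calls can be chained.
    ToyNode addNode(String childName) {
        ToyNode child = new ToyNode(childName);
        children.put(childName, child);
        return child;
    }

    // Resolve a slash-separated path, in the spirit of JCR path lookups.
    ToyNode getNode(String path) {
        ToyNode current = this;
        for (String part : path.split("/")) {
            current = current.children.get(part);
            if (current == null) throw new NoSuchElementException(path);
        }
        return current;
    }

    void setProperty(String key, String value) { properties.put(key, value); }
    String getProperty(String key) { return properties.get(key); }

    public static void main(String[] args) {
        ToyNode root = new ToyNode("root");
        ToyNode feed = root.addNode("feeds").addNode("rome");
        feed.setProperty("title", "TSS");
        System.out.println(root.getNode("feeds/rome").getProperty("title")); // TSS
    }
}
```

    The appeal of the JCR layer is exactly this shape: the application sees a hierarchy, while the repository implementation is free to persist it into an RDBMS (or anything else) underneath.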
  52. Oh, I dunno[ Go to top ]

    But today we have 64-bit machines coming out of our ears. The most significant advance from that is that we get 32 more bits for addressing. So, let's just persistently mmap a few TB of virtual disk space into the heap and work with that. Just map it into the Permanent Generation of the JVM, or make a "No, Really, this is the Super Permanent Generation" segment. The difference is that the "NRSPG" segment has to be manually GC'd. ID == a JVM pointer. Voila!
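    [Editor's note: a rough sketch of the mmap idea above using java.nio's MappedByteBuffer, which is about as close as stock Java gets to mapping disk into addressable memory. The file name, mapping size and "offset as ID" scheme are arbitrary illustrations, not a real persistence design.]

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Map a file into memory and treat an offset into the mapping as a crude
// persistent "pointer" -- the spirit of the suggestion above.
public class MmapSketch {
    public static long roundTrip(long value) throws Exception {
        File f = File.createTempFile("mmap", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            buf.putLong(0, value);   // "persist" at offset 0 -- the offset is our ID
            return buf.getLong(0);   // read it back through the mapping
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(42L)); // 42
    }
}
```

    The follow-up objections in the thread apply directly: this gives you addressing, but nothing about concurrent access or transactions.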
  53. Re: Oh, I dunno[ Go to top ]

    ID == a JVM pointer. voila!
    Multi-user? Transactions?
  54. BTW, Joseph[ Go to top ]

    POJO persistence is not the simplest way to go. Far from true. A POJO is not actually equal to data. You can persist POJOs, but that is not the best data representation. Data should be processed, converted to information, stored, recovered and analysed. Data can have several structure types and can represent several things. Thus, POJO persistence is just a convenience for saving what you have at hand, but real data-intensive applications cannot rely on it to solve their needs. William Martinez Pomares.