The 5 most critical things to consider for proper collections usage in Hibernate

  1. First things first: this article was inspired by Burt Beckwith's presentation "Advanced GORM - Performance, Customization and Monitoring" at SpringOne 2GX on Jan 27, 2011. In short, Burt Beckwith discusses potential performance problems when using mapped collections and the Hibernate 2nd-level cache in GORM, along with strategies for avoiding such performance penalties.

    Nevertheless, the performance issues regarding mapped collections that Burt Beckwith pinpoints in his presentation apply to every Hibernate-enabled application in general. After watching his presentation I realized that what he proposes is exactly what I have been doing myself, and have asked my colleagues to do, when developing with mapped collections in Hibernate.

    I present the 5 most critical things to consider when working with Hibernate mapped collections:

    Hibernate mapped collections performance problems - Java Code Geeks



  2. Thanks but it's not new.

    It's a problem of object mapping, and even of object design. In the example provided by the author, there is probably no business need to manage the visits of a library at the library level. The problem is not a performance problem but a design/requirements problem. Of course, if you want to manage the "list of Visit" you will have performance problems... but that's not the real problem.

  3. Is this going to be part of a series? Because I only count 1 "critical thing", not 5 :P

    More seriously: I understand the performance concern, but it still looks fishy to me that we are polluting domain objects with foreign keys and database concerns, such as library_id. I know what it's doing there, but it just doesn't look right to me. It seems as if you were giving up the OOP-ness. Not sure how to solve this problem and keep the purity of the model, though.

  4. Thank you all for your comments,

    @Erik, the provided example is Hibernate mapped collections 101! and as such it is a standard design principle to be able to add/remove "child" elements from a "parent" element. Now, as far as the specific example is concerned, I strongly believe that it is entirely "logical" to be able to manage Visits directly from a Library instance.

    @Andrés, unfortunately there is no *nice* solution to the problem. We just have to give up a little of our OOP-ness for the sake of a huge performance boost.

  5. make it bidirectional

    Why not solve your mapping problem the way it's advised in the Hibernate "bible": make it a bidirectional mapping with inverse="true", so that the child is always in charge.

    So, to add a child to a parent: create the child, set the parent and save the child. That's it (no set is loaded).
    You can add methods like addChild and removeChild to the parent and expose the collection through the unmodifiable collection wrappers (e.g. Collections.unmodifiableSet), so that the collection (set) can't be used to add a child (the JRE will throw an exception).

    This is how I solve it, and it works very nicely. It's logical and straightforward coding.

    Note: Hibernate works fine, AS long as you do it the "Hibernate way" ;)...
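
    A minimal plain-Java sketch of the pattern described in this post. The Library and Visit classes and their method names are hypothetical, and no Hibernate API is used: in a real mapping the visits set would be the lazy, inverse="true" side and the returned child would be saved through the session.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical parent. In a real mapping this set would be lazy and inverse="true".
class Library {
    private final Set<Visit> visits = new HashSet<>();

    // Read-only view: callers cannot add or remove children through it.
    Set<Visit> getVisits() {
        return Collections.unmodifiableSet(visits);
    }

    // Creates the child with its parent already set. The visits set is
    // deliberately not touched, so a lazy collection would not be loaded.
    // Saving the returned child (e.g. via the session) is left to the caller.
    Visit addVisit(String visitorName) {
        return new Visit(this, visitorName);
    }
}

// Hypothetical child. It holds the reference that maps to the foreign key.
class Visit {
    private final Library library;
    private final String visitorName;

    Visit(Library library, String visitorName) {
        this.library = library;
        this.visitorName = visitorName;
    }

    Library getLibrary() { return library; }
    String getVisitorName() { return visitorName; }
}

class BidirectionalSketch {
    public static void main(String[] args) {
        Library library = new Library();
        Visit visit = library.addVisit("alice");
        System.out.println(visit.getLibrary() == library);

        try {
            library.getVisits().add(visit); // bypassing addVisit is rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("direct add rejected");
        }
    }
}
```

    The design choice here is that the only way to attach a child is through the parent's addVisit method, which never reads the collection.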

  6. make it bidirectional

    Unfortunately, making the one-to-many (or many-to-many) mapping bidirectional will not solve the problem. Although the "child" will always be in charge, it is the non-inverse side of the association that makes the changes. That's why you have to maintain a "children" collection on the "father" item, and also add/remove the "child" item to/from it, so that the in-memory model is consistent with the database model when Hibernate's session is flushed.

    I am not saying that Hibernate does not work fine! Hibernate works like a charm, and collection mapping is far from useless. What I am saying is that there are certain cases (especially when a lot of data is involved) where you should consider an alternative approach to Hibernate's collection mapping implementation in order to keep performance at acceptable levels.

    BRs

  7. make it bidirectional

     the non-inverse side of the association makes the changes. That's why you have to maintain a "children" collection on the "father" item.....

     

    Why?
    This is exactly what you need to let go to gain performance through some simple changes like proposed in my post above. Try it, it works fine and straightforward. The collection is ONLY retrieved from the db when someone really wants the whole collection. Then make sure you have your fetching set correctly and you will be fine.

  8. make it bidirectional

    +1. Make it bidirectional (or unidirectional, but in the other direction). This way, adding a visit to a library consists in creating a new visit pointing to the library. If you need the visits for a library, you probably want it for a single day, or a single visitor, so you'll use a query anyway.

    The visits property of the Library could only be useful to ease the navigation in HQL or criteria queries. If it's the case, then make it private and lazy, so that the full collection is never loaded.

    Sorry to be rude, but the article, IMO, shows exactly what you should NOT do.

  9. make it bidirectional

    Well, OK people, I admit that the one-to-many association example that I selected can be a little misleading, since there are many business cases where one can bypass the performance issue without compromising the OOP-ness of the model.

    I should certainly have used a many-to-many association example in order to demonstrate the benefits of the proposed approach more robustly.

    Nevertheless, I must point out that the proposed approach is far from useless, especially in all those cases where we deal with large amounts of data and performance is a major factor.

  10. make it bidirectional

    Here are a few of my Hibernate tips and tricks that I documented along the way:

     

    inverse="true"

    Use this as much as possible in a one-to-many parent-child association (to another entity or value-type that is used as an entity).

    This property is set on the collection tag, such as “set”, and means that the many-to-one side owns the association and is responsible for all db inserts/updates/deletes. It makes the association part of the child.

    It saves a db update for the foreign key, as the foreign key is written directly when the child is inserted.

     

    Especially when using “set” as the mapping type, this can improve performance, because the child doesn’t need to be added to the parent collection, which saves loading the complete collection. That is: due to the nature of a set mapping, the whole collection must be loaded when adding a new child, as that’s the only way Hibernate can guarantee that the new entry isn’t a duplicate, which is a requirement of the JRE Set interface.
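
    To make the cost concrete, here is a toy model in plain Java. ToyLazySet is a hypothetical class written for illustration only, not one of Hibernate's collection classes. It assumes the simplest case: a lazy set that can only decide whether an element is a duplicate by first loading every existing element.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Toy stand-in for a lazily loaded set (not Hibernate code).
class ToyLazySet<T> {
    private final Supplier<Set<T>> loader; // stands in for the SELECT of all rows
    private Set<T> elements;               // null until the first access
    private int loadCount = 0;

    ToyLazySet(Supplier<Set<T>> loader) {
        this.loader = loader;
    }

    // Set.add must report whether the element was already present,
    // so the existing elements have to be loaded before answering.
    boolean add(T element) {
        initializeIfNeeded();
        return elements.add(element);
    }

    int getLoadCount() {
        return loadCount;
    }

    private void initializeIfNeeded() {
        if (elements == null) {
            elements = new HashSet<>(loader.get());
            loadCount++;
        }
    }
}

class ToyLazySetDemo {
    public static void main(String[] args) {
        ToyLazySet<String> visits = new ToyLazySet<>(() -> {
            Set<String> rows = new HashSet<>();
            for (int i = 0; i < 10_000; i++) {
                rows.add("visit-" + i); // pretend these are database rows
            }
            return rows;
        });

        System.out.println("loads before add: " + visits.getLoadCount());
        visits.add("visit-new"); // one small insert triggers a full load
        System.out.println("loads after add: " + visits.getLoadCount());
    }
}
```

    In this model, adding a single element pulls in all 10,000 existing ones, which is the cost that saving the child directly avoids.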

    For a component collection (= a collection containing only pure value types), inverse=true is ignored and makes no sense, as Hibernate has full control of the objects and will choose the best way to perform its CRUD actions.

    For detached DTO objects (not containing any Hibernate objects), Hibernate will delete all value-type children and then insert them again, as it doesn’t know which objects are new and which already exist because they were completely detached. Hibernate treats it as a new collection.

     

    lazy Set.getChilds() is evil

    Be careful using a getChilds() that returns a Set, as it will lazily load all children.

    Don’t use it when you just want to add or remove a single child, as it will first load the complete collection.

     

    always implement equals/hashcode

    Make sure to always implement equals/hashcode for every object that is managed by Hibernate, even if it doesn’t seem important. This also applies to value-type objects.

    If the object doesn’t contain properties that are candidates for equals/hashcode, use a surrogate key, consisting of a UUID for example.

    Hibernate uses equals/hashcode to find out if an object is already present in the db. If it concerns an existing object but Hibernate thinks it’s a new object because equals/hashcode isn’t implemented correctly, Hibernate will perform an insert and possibly a delete of the old value.

    This is especially important for value types in Sets, and must be tested, as it saves db traffic.

    The idea: you are giving Hibernate more knowledge, which it can use to optimize its actions.
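
    A minimal plain-Java sketch of both variants of this tip. The Author and Note classes are hypothetical: Author is assumed to have a natural business key (name plus birth year), while Note is assumed to have none and so falls back on a UUID surrogate assigned at construction.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

// Hypothetical value type whose equality is based on its business key.
final class Author {
    private final String name;
    private final int birthYear;

    Author(String name, int birthYear) {
        this.name = Objects.requireNonNull(name);
        this.birthYear = birthYear;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Author)) return false;
        Author other = (Author) o;
        return birthYear == other.birthYear && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, birthYear);
    }
}

// Hypothetical type with no natural key: equality uses a UUID surrogate.
final class Note {
    private final UUID uid = UUID.randomUUID(); // assigned once, never changes
    private String text;

    Note(String text) {
        this.text = text;
    }

    void setText(String text) { this.text = text; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Note && uid.equals(((Note) o).uid);
    }

    @Override
    public int hashCode() {
        return uid.hashCode();
    }
}

class EqualsHashCodeDemo {
    public static void main(String[] args) {
        Set<Author> authors = new HashSet<>();
        authors.add(new Author("Ada", 1815));
        authors.add(new Author("Ada", 1815)); // same business key, not added again
        System.out.println(authors.size());   // 1

        Set<Note> notes = new HashSet<>();
        Note note = new Note("draft");
        notes.add(note);
        note.setText("final");                    // mutable state is not part of equality
        System.out.println(notes.contains(note)); // true
    }
}
```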

     

    use of version

    Always use the version property with an entity or a value type that is used as an entity.

    This results in less db traffic as Hibernate uses this information to discover if it concerns a new or existing object. If this property isn’t present, it will have to hit the db to find out if it concerns a new or existing object.

     

    eager fetching

    Non-lazy collections (children) are by default loaded through an extra select query that is performed just after the parent is loaded from the db.

    The children can be loaded in the same query as the parent by enabling eager fetching, which is done by setting the attribute fetch="join" on the collection mapping tag. If enabled, the children are loaded through a left outer join.

    Test whether this improves performance. If many joins occur, or if a table with many columns is involved, performance will get worse instead of better.

     

    use surrogate key in value type child object

    For a value-type child in a parent-child relation, Hibernate will construct a primary key that consists of all not-null columns. This can lead to strange primary key combinations, especially when a date column is involved. A date column shouldn’t be part of a primary key, as its millisecond part will result in primary keys that are almost never the same. This results in strange and probably poorly performing db indexes.

    To improve this, we use a surrogate key in all child value-type objects, as the only not-null property. Hibernate will then construct a primary key that consists of the foreign key and the surrogate key, which is logical and performs well.

    Note that the surrogate key is only used for database optimization, and it’s not required to be used in equals/hashcode, which should be based on business properties if possible.

     

  11. make it bidirectional

    Thank you Ed for your tips. They are very useful. I hope you do not mind me reposting them to Java Code Geeks (crediting the original author, of course!)

    BRs

  12. make it bidirectional

    Hi Justin,
    No problem. I hope it's useful for others.

    Ed

  13. The source code should be the SOURCE of any speculation

    https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/main/java/org/hibernate/collection/PersistentList.java

    https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/main/java/org/hibernate/collection/PersistentSet.java

    https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/main/java/org/hibernate/collection/PersistentBag.java

    Interesting methods: add(Object) remove(Object) iterator()

    Yes, you are right, Justin.

    By the way, as far as I remember from looking at the source code, the only mode in which the collection is not fully loaded is extra-lazy mode. In this mode each element is fetched by its own query: if you get three elements, three independent queries are executed. Definitely not the best :(

    Unfortunately, effective transparent persistence over RDBMSs is far, far from perfect. I'm not sure, but ODBMSs do not run into these issues, since they provide true OOP transparent persistence, am I right? In these days of NoSQL, is there a future for ODBMSs?