Autofetch (ORM extension) 1.0 beta released

  1. Autofetch (ORM extension) 1.0 beta released (19 messages)

    Autofetch 1.0 beta has been released. Autofetch automatically specifies prefetch/fetch profiles for queries in ORMs or object databases. Currently, Autofetch only comes with a Hibernate connector, although more may be on the way. Autofetch is an open source project licensed under the LGPL.

    Most object persistence frameworks allow the user to fetch/prefetch associations for objects returned by a query. For example, Hibernate allows a user to specify which associations are lazy in their configuration mapping or to include fetch joins in their queries. Autofetch automatically adds these prefetch directives for the user by dynamically profiling the program's behavior on query results and figuring out how to cluster that data. In other words, as your application uses an object graph retrieved from a query, Autofetch adds prefetching to that query so that the database sees fewer requests. Here is a simple example:

        public List<Employee> getLazyEmployees() {
            ...
            // Invoke the query we have built up and return the list of employees
            return (List<Employee>) criteria.list();
            ...
        }

        public void tellManagersAboutLazyEmployees() {
            for (Employee e : getLazyEmployees()) {
                sendWarningEmail(e.getManager(), e);
            }
        }

        public void printLazyEmployees() {
            for (Employee e : getLazyEmployees()) {
                System.out.println(e.getName());
            }
        }

    Here we have a single method - getLazyEmployees - which is invoked in two different places. In one invocation we would have liked to prefetch the manager association, while in the other no prefetch was necessary (assuming name is a primitive property of Employee). We have three options:
    1. Add an argument to the getLazyEmployees method which tells us what traversal will be run. This complicates our method signatures and defeats the idea of separation of concerns.
    2. Create multiple versions of the getLazyEmployees method. Unfortunately, duplicate code causes maintenance headaches.
    3. Pick one static prefetch specification to use all the time. This is probably the most popular option, but it is difficult when you have recursive associations and may load a lot more data than needed for a particular use case. It is also problematic when traversals get larger.
    The problem above is made worse when the traversals and queries are separated into different architectural layers. It is very troubling if modifications to the information displayed in the view layer require modifying code in the model layer. The prefetch specifications create a subtle performance dependency between the different layers of the application. Autofetch solves this problem by determining the right prefetch specification for a query in a given program context (which includes the program stack). All that is needed to enable Autofetch is adding it to the classpath and modifying how the SessionFactory is acquired:

        Configuration cfg = new AutofetchConfiguration().configure();
        SessionFactory sf = cfg.buildSessionFactory();

    One current limitation of the Hibernate connector is that it only adds prefetch specifications to criteria queries and load/get queries (not HQL). Please try Autofetch out!
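
The idea of choosing a prefetch specification per program context can be pictured with a small sketch. This is not Autofetch's implementation; the class TraversalProfiler, its methods, and the string-based context key are all hypothetical names made up for illustration. It only shows the bookkeeping: association accesses are recorded against a key built from the query and its callers, and the recorded paths are what would be prefetched the next time the same query runs in the same context.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: learns which associations are traversed per query context. */
final class TraversalProfiler {
    // Key: fingerprint of the program context; value: association paths seen in use.
    private final Map<String, Set<String>> profiles = new HashMap<>();

    /** Builds a context key from a query id and the calling frames (both illustrative). */
    static String contextKey(String queryId, List<String> callerFrames) {
        return queryId + "@" + String.join(">", callerFrames);
    }

    /** Records that a result of the query issued in this context had an association accessed. */
    void recordAccess(String contextKey, String associationPath) {
        profiles.computeIfAbsent(contextKey, k -> new TreeSet<>()).add(associationPath);
    }

    /** Association paths to prefetch the next time the same query runs in the same context. */
    Set<String> prefetchPathsFor(String contextKey) {
        return Collections.unmodifiableSet(
                profiles.getOrDefault(contextKey, Collections.emptySet()));
    }
}
```

With the example from the post, an access to "manager" recorded under the tellManagersAboutLazyEmployees context would leave the printLazyEmployees context with an empty prefetch set, even though both call the same getLazyEmployees query.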


  2. Looks interesting

    I skimmed over your paper; this project seems very interesting. I'll really have to try it out sometime. If it actually works, this is a very elegant solution to getting efficient queries without having the view layer mandate its optimal fetching plan (either through specialized calls or using some other param object).
    One current limitation of the Hibernate connector is that it only adds prefetch specifications to criteria queries and load/get queries (not HQL).
    Any plans on getting rid of this limitation? -- Kind regards, Christophe Vanfleteren
  3. HQL queries

    I am working on adding support for HQL queries. The really difficult part is adding the fetch joins into the query string. I really don't want to parse the query...
  4. Re: HQL queries

    ... I really don't want to parse the query...
    I believe it could be done at the AST level...
  5. HQL AST

    ... I really don't want to parse the query...


    I believe, it could be done on AST level...
    Yes, it could be done at the AST level or at the loader level. The problem is that there is no good way of getting to these from a Hibernate extension point such as listeners or interceptors, or from the public API of Query. I will try talking to the Hibernate folks about providing such an extension point. It might also be possible to do something through the JPA fetch profile interface...
  6. Autofetch solves this problem by determining the right prefetch specification for a query in a given program context (includes program stack).
    Nice idea!
  7. Seems interesting. Looks like autofetch-1.0beta.jar is corrupted. I am eager to try it out, as our apps have a lot of problems with all those prefetches.
  8. Fixed

    Thanks for pointing out that the Sourceforge download was not working. There is no problem with the file, but Sourceforge always returns an empty file for one of the downloads. I had to create a new release (with the same files) to fix the issue. Please let me know what you think!
  9. I'd like to propose an alternative approach for lazy fetching that could be combined with Autofetch in order to gain a performance benefit for the first query as well. I need an example schema in order to explain what I mean. This would be the join that fetches the whole object graph for all requisitions:

        select r from Requisition r join r.lines l join l.article a join a.productGroup

    Needless to say, this query is quite expensive, and you don't want to load the whole object graph for every requisition if you are only going to display a list of requisitions. For that,

        from Requisition

    would be enough. However, if you need the whole graph for a single requisition in order to display it in a form, it would be even worse to start with the second query and then traverse the object graph, causing Hibernate to fetch all objects one by one (with little or no optimization, maybe batch fetching).

    The requisition list needs the requisitions only. The requisition form needs a fairly complete object graph (to a certain depth), and some requisition logic might need another part of the object graph. Ideally you should have a specific query for every purpose. You end up in a situation where the persistence code is specifically aimed at a certain UI and logic.

    A possible solution to this problem would be a deferred lazy fetching strategy:
    o at first, only the requisitions themselves are loaded
    o as soon as the first line of the first requisition is touched, all lines of all requisitions are loaded
    o as soon as the first article of a first line is touched, all articles of all lines are loaded
    o as soon as the first product group is touched, all product groups of all articles are loaded

    I was not able to achieve this behavior with Hibernate's own means. Maybe I just didn't get it and somebody could tell me how? However, it could be achieved quite efficiently with subsequent selects with in-clauses (or subselects in some cases):

        from RequisitionLine where id in (requisitionid1, requisitionid2, ...)
        from Article where id in (articleid1, articleid2, ...)

    Any ideas? Comments? Holger Engels
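
The deferred strategy listed above can be sketched for a single association level. This is a minimal illustration, not Hibernate behavior: BatchLazyLoader and its fields are hypothetical, and an in-memory map stands in for the RequisitionLine table. The first touch of any requisition's lines issues one simulated in-clause select for the whole result set, and later touches are served from memory.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: the first touch loads the association for the whole result set. */
final class BatchLazyLoader {
    private final Map<Integer, List<String>> linesTable; // stand-in for the lines table, keyed by requisition id
    private final Set<Integer> resultSetIds;              // ids returned by the original "from Requisition" query
    private Map<Integer, List<String>> loaded;            // stays null until the first touch
    int queryCount = 0;                                   // number of simulated select statements

    BatchLazyLoader(Map<Integer, List<String>> linesTable, Collection<Integer> resultSetIds) {
        this.linesTable = linesTable;
        this.resultSetIds = new LinkedHashSet<>(resultSetIds);
    }

    /** Returns the lines of one requisition; the first call fetches lines for every requisition. */
    List<String> linesOf(int requisitionId) {
        if (loaded == null) {
            loaded = selectLinesForRequisitions(resultSetIds);
        }
        return loaded.getOrDefault(requisitionId, Collections.emptyList());
    }

    // Simulates one select with an in-clause over all requisition ids in the result set.
    private Map<Integer, List<String>> selectLinesForRequisitions(Set<Integer> ids) {
        queryCount++;
        Map<Integer, List<String>> result = new HashMap<>();
        for (Integer id : ids) {
            List<String> lines = linesTable.get(id);
            if (lines != null) {
                result.put(id, lines);
            }
        }
        return result;
    }
}
```

Repeating the same pattern per level (lines, then articles, then product groups) would give one select per association level rather than one per object.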
  10. I'd like to propose an alternative approach for lazy fetching that could be combined with Autofetch in order to gain a performance benefit for the first query as well ..

    ... Comments?

    Holger Engels
    Great ideas! Actually, there have been a couple of academic papers describing this exact approach ("Context based prefetch...": Bernstein, "PrefetchGuide": Han et al.). In my paper, I discuss adding this capability to Autofetch. After adding support for HQL, this is my next highest priority work item. Hibernate has some rudimentary support for the scenario you describe. You mentioned batch fetching, and there is also subselect fetching. The problem with subselect fetching is that it only works one level deep, which means you can still get the n+1 select problem. Batch fetching is actually quite good for small amounts of data, but it has some pathologies for large amounts of data and tends to be too aggressive in some cases. Right now, I would probably recommend a combination of batch fetching and Autofetch for most cases, until a better solution can be designed.
  11. Yes, using sub-selects is a very elegant way to solve the original problem proposed by the post. Hibernate 3 already supports this very well, but it seems that it is not a feature that is widely used/understood.

    Specifically, fetch='subselect' and batch-size='N' are the settings that allow this behavior.

    The great part about subselect is that (in your example) for *all* Requisition objects already in the Session, as soon as *one* of the articles is loaded -- all (or a specific batch size) of the lines are loaded. It's exactly as you say: "as soon as the first line of the first requisition is touched, all lines of all requisitions are loaded."

    See http://www.hibernate.org/315.html for a short explanation of this. The best source on the matter is Gavin and Christian's book 'Java Persistence with Hibernate'.

    This also has the excellent property that the Session-level and 2nd-level caches are checked for the associated entity before the subselect. Whereas a custom HQL query specifying the join would cause the query to always execute and pull from the database, the subselect allows for optimal cache usage on the subselected entity.

    - MarkWPiper
  12. Subselect fetching is not enough

    Yes, using sub-selects is a very elegant way to solve the original problem proposed by the post. Hibernate 3 already supports this very well, but it seems that it is not a feature that is widely used/understood.

    Specifically, fetch='subselect' and batch-size='N' are the settings that allow this behavior.

    The great part about subselect is that (in your example) for *all* Requisition objects already in the Session, as soon as *one* of the articles is loaded -- all (or a specific batch size) of the lines are loaded. It's exactly as you say: "as soon as the first line of the first requisition is touched, all lines of all requisitions are loaded."

    See http://www.hibernate.org/315.html for a short explanation of this. The best source on the matter is Gavin and Christian's book 'Java Persistence with Hibernate'

    This also has the excellent property that the Session-level and 2nd-level caches are checked for the associated entity before the subselect. Whereas a custom HQL query specifying the join would cause the query to always execute and pull from the database, the subselect allows for optimal cache usage on the subselected entity.

    - MarkWPiper
    Subselect fetching is nice, but in Engels's example you will still get the n+1 select problem! When you traverse the first requisition line, you will get all the requisition lines, which is great. However, once you traverse the first article, you will only get the articles for a single requisition line. Since there could be many requisition lines, you still get many queries. Similarly, you will execute one query per article for its product group. The problem is that subselect fetching only works one level deep. Engels described what you need to do, although I would modify it to load all requisition lines once you see the SECOND request, just to be a little more conservative about assuming an iterative pattern. Autofetch will be able to handle Engels's example in one query after the traversal has been executed a single time. What we would like to improve is the first traversal's performance.
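
The more conservative "load everything on the SECOND request" variant mentioned above could look roughly like the following. It is a sketch under stated assumptions: SecondTouchLoader is a hypothetical class, the map stands in for the lines table, and it does not reflect how Hibernate or Autofetch behave. The first cache miss loads only the requested requisition's lines; the second miss is taken as evidence of iteration and loads everything still missing in one simulated query.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: single fetch on the first miss, batch fetch from the second miss on. */
final class SecondTouchLoader {
    private final Map<Integer, List<String>> linesTable; // stand-in for the lines table, keyed by requisition id
    private final Set<Integer> resultSetIds;              // ids returned by the original query
    private final Map<Integer, List<String>> cache = new HashMap<>();
    private int misses = 0;                               // touches that were not served from the cache
    int queryCount = 0;                                   // number of simulated select statements

    SecondTouchLoader(Map<Integer, List<String>> linesTable, Collection<Integer> resultSetIds) {
        this.linesTable = linesTable;
        this.resultSetIds = new LinkedHashSet<>(resultSetIds);
    }

    List<String> linesOf(int requisitionId) {
        List<String> cached = cache.get(requisitionId);
        if (cached != null) {
            return cached;
        }
        misses++;
        queryCount++; // every miss costs exactly one simulated query
        if (misses == 1) {
            // First miss: this may be a one-off access, so load only this requisition.
            cache.put(requisitionId, fetch(requisitionId));
        } else {
            // Later miss: assume an iterative pattern and load all remaining requisitions.
            for (Integer id : resultSetIds) {
                cache.computeIfAbsent(id, this::fetch);
            }
            cache.computeIfAbsent(requisitionId, this::fetch); // covers ids outside the result set
        }
        return cache.get(requisitionId);
    }

    private List<String> fetch(Integer id) {
        return linesTable.getOrDefault(id, Collections.emptyList());
    }
}
```

The trade-off is one extra query when the caller really does iterate, in exchange for not loading the whole level when only a single object is ever touched.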
  13. .. yes exactly .. and subselect fetching is not supported as an override in criteria queries (look at the class FetchMode) ..
  14. This is similar to what JDO specified in the 2.0 release. From the JDO specification (available at http://db.apache.org/jdo/specifications.html):
    A fetch plan defines rules for instantiating the loaded state for an object graph...A fetch plan can be associated with a PersistenceManager and, independently, with a Query and with an Extent. A fetch plan also defines rules for creating the detached object graph... A fetch plan consists of a number of fetch groups that are combined additively for each affected class; a fetch size that governs the number of instances of multi-valued fields retrieved by queries; a recursion-depth per field that governs the recursion depth of the object graph fetched for that field; a maximum fetch depth that governs the depth of the object graph fetched starting with the root objects; and flags that govern the behavior of detachment.
    A "fetch group" is simple: it's just a named list of the fields to be included in the group, and it's defined in your persistent class's metadata. You can specify different fetch plans per use case, thereby allowing you to reuse the same logical query but return differing breadths and depths of your object graph. That way, you don't have to change your query in order to get the data you need for each particular use case. It's good to see the Hibernate community getting this feature. Kudos to the Autofetch team! -matthew
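
To make the "combined additively" part of the quoted specification concrete, here is a small model of fetch groups and a fetch plan. It is deliberately not the javax.jdo API: FetchPlanSketch, defineGroup, addGroup and fieldsToLoad are hypothetical names, and the field names are made up. The sketch only shows that the fields loaded are the union of the fields of all active groups.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a fetch plan: active fetch groups combine additively. */
final class FetchPlanSketch {
    private final Map<String, Set<String>> groups = new HashMap<>(); // group name -> field names
    private final Set<String> activeGroups = new LinkedHashSet<>();

    /** Defines a named fetch group (in JDO this would live in the class metadata). */
    void defineGroup(String name, String... fields) {
        groups.put(name, new LinkedHashSet<>(Arrays.asList(fields)));
    }

    /** Activates a group for this plan, for example one plan per use case. */
    void addGroup(String name) {
        activeGroups.add(name);
    }

    /** Union of the fields of all active groups; unknown group names contribute nothing. */
    Set<String> fieldsToLoad() {
        Set<String> fields = new LinkedHashSet<>();
        for (String group : activeGroups) {
            fields.addAll(groups.getOrDefault(group, Collections.emptySet()));
        }
        return fields;
    }
}
```

A list screen might activate only a "list" group while a form activates "list" and "detail", so the same logical query returns different breadths of the object graph.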
  15. Hibernate already has fetch plans

    This is similar to what JDO specified in the 2.0 release. From the JDO specification (available at http://db.apache.org/jdo/specifications.html):
    A fetch plan defines rules for instantiating the loaded state for an object graph...A fetch plan can be associated with a PersistenceManager and, independently, with a Query and with an Extent.

    A fetch plan also defines rules for creating the detached object graph...

    A fetch plan consists of a number of fetch groups that are combined additively for each affected class; a fetch size that governs the number of instances of multi-valued fields retrieved by queries; a recursion-depth per field that governs the recursion depth of the object graph fetched for that field; a maximum fetch depth that governs the depth of the object graph fetched starting with the root objects; and flags that govern the behavior of detachment.
    A "fetch group" is simple: it's just a named list of the fields to be included in the group, and it's defined in your persistent class's metadata.

    You can specify different fetch plans per use case, thereby allowing you to reuse the same logical query but return differing breadths and depths of your object graph. That way, you don't have to change your query in order to get the data you need for each particular use case.

    It's good to see the Hibernate community getting this feature. Kudos to the Autofetch team!

    -matthew
    Actually, Hibernate already has the equivalent of fetch plans/groups at both the query and configuration level. Autofetch is much more than just that: it automatically specifies the fetch plans for you! It does that by dynamically profiling your application and figuring out the appropriate fetch plans to use for each query at runtime.
  16. Ali Ibrahim - this looks interesting I think! I very vaguely remember having seen a similar idea years ago in one of the early JDO implementations of the time; unfortunately I can't remember which one (maybe LiDO, but really not sure; or another one listed in the Appendix of my Core Java Data Objects book). Will you put together a JPA implementation of your framework that works with e.g. the JPA API of Hibernate, or even better with OpenJPA (simply because that happens to be the JPA implementation I use - then I might even try it).
  17. JPA / OpenJPA

    Ali Ibrahim - this looks interesting I think!

    I very vaguely remember having seen a similar idea years ago in one of the early JDO implementations of the time; unfortunately I can't remember which one (maybe LiDO, but really not sure; or another one listed in the Appendix of my Core Java Data Objects book).

    Will you put together a JPA implementation of your framework that works with e.g. the JPA API of Hibernate, or even better with OpenJPA (simply because that happens to be the JPA implementation I use - then I might even try it).
    I would love to put together a JPA/OpenJPA implementation. I have already started to look at OpenJPA. The main stumbling block is that neither provides enough extension hooks to implement all the functionality I need. For OpenJPA, I might be forced to fork the codebase to integrate it with Autofetch.
  18. Ebean Autofetch...

    I have started putting this into Ebean. Something potentially interesting is that I'm collecting all the query execution stats for a given "query/callstack". What is interesting about that is that I'll be able to get a "total query cost" for a given object graph. That is, add up the cost of the "original" query plus the costs of all the related lazy loading queries. This could be used to compare directly with the same "query/callstack" after "auto tuning/changing" the query to use joins etc. (and probably partial objects) and reducing lazy loading. So yes, for a given "query/callstack" I'm collecting both the object graph usage/traversal and all the related queries. I'm also sure that there will be some scenarios where the lazy loading will outperform the joins (queries with joins are "wider", and that extra cost may be greater than lazy loading if it is highly redundant). Good stuff though... liking this...
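
The "total query cost" bookkeeping described above amounts to summing costs under one key per originating query and call stack. The sketch below is not Ebean code; QueryCostStats, the string key, and the microsecond unit are assumptions chosen for illustration. The original query and every lazy load it triggers are recorded under the same key, so the totals before and after tuning can be compared directly.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: accumulates query cost per originating "query/callstack" key. */
final class QueryCostStats {
    private final Map<String, Long> totalMicros = new HashMap<>();
    private final Map<String, Integer> queryCounts = new HashMap<>();

    /** Records one executed query (the original or a related lazy load) under its origin. */
    void record(String originKey, long micros) {
        totalMicros.merge(originKey, micros, Long::sum);
        queryCounts.merge(originKey, 1, Integer::sum);
    }

    /** Cost of the original query plus all lazy loading queries attributed to it. */
    long totalCost(String originKey) {
        return totalMicros.getOrDefault(originKey, 0L);
    }

    /** Number of queries attributed to this origin. */
    int queryCount(String originKey) {
        return queryCounts.getOrDefault(originKey, 0);
    }
}
```

Comparing totalCost for the same key before and after a tuned query is introduced would also expose the cases where wider joins turn out to cost more than the lazy loads they replace.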
  19. AutoFetch and "Partial" objects...

    The other point to note is that Ebean has "partial" object support. So AutoFetch is not only suggesting the joins but also the properties used. That is, Ebean will be able to use this to just select the properties that are used. This could be used on the original query but I can see this could also be used for the lazy loading queries - future stuff but there is a lot of potential here.
  20. A while ago I found a video tutorial about JPA, entirely in Portuguese, produced by Fabio Kung of Caelum. It shows step by step how to use this library, which is a real lifesaver for anyone who wants to map objects to a database. The tutorial covers everything from downloading the required JARs and configuring the database (MySQL) to using JPA within Eclipse, generating the tables, and so on. Of all the video tutorials I have seen about Java in general, this was one of the best: very well explained and detailed. For those who don't know it, JPA is the library used as the basis for building EJB Entity Beans, which are responsible for maintaining integrity between entity objects and the database. Once again, an excellent tutorial! Congratulations, Fábio! http://www.jornaljava.com/2008/11/video-tutorial-utilizando-jpa-com.html