EJB design: Problem with loading many entity beans.

  1. Problem with loading many entity beans. (13 messages)

    The problem involves loading many entity beans and is one that was also encountered by a Sun representative who visited us recently.
    We deal with pharmaceutical data. One typical data type represents a DNA sequence. As a sequence represents a row in the database, the obvious modeling choice is to make a sequence an entity bean. When you do an ejbFindBy on the entity bean, it is possible to get hundreds or thousands of primary keys returned. The container then calls load with each primary key. The problem is that every load call costs at least one database call, instead of a single call loading the data for all the primary keys at once. That many database calls is very slow. Furthermore, there are typically hundreds of thousands or millions of sequences in these databases, so smart caching would be a problem.
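
    To make the cost concrete, here is a minimal sketch that only counts round trips. It is not container code: FakeSequenceTable and LoadCostDemo are hypothetical names, and an in-memory map stands in for the database. It compares a finder followed by one load per key with a finder followed by one bulk load.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the sequence table; it only counts round trips.
class FakeSequenceTable {
    private final Map<Integer, String> rows = new LinkedHashMap<>();
    int roundTrips = 0;

    FakeSequenceTable(int size) {
        for (int id = 1; id <= size; id++) {
            rows.put(id, "SEQ-" + id);
        }
    }

    // Like a finder: one query that returns primary keys only.
    List<Integer> findAllKeys() {
        roundTrips++;
        return new ArrayList<>(rows.keySet());
    }

    // Like loading a single bean: one query per primary key.
    String loadOne(int key) {
        roundTrips++;
        return rows.get(key);
    }

    // Like a single "WHERE id IN (...)" query: one query for many keys.
    Map<Integer, String> loadMany(List<Integer> keys) {
        roundTrips++;
        Map<Integer, String> result = new LinkedHashMap<>();
        for (Integer key : keys) {
            result.put(key, rows.get(key));
        }
        return result;
    }
}

class LoadCostDemo {
    // Finder plus one load per key: n + 1 round trips.
    static int perBeanLoading(int n) {
        FakeSequenceTable table = new FakeSequenceTable(n);
        for (int key : table.findAllKeys()) {
            table.loadOne(key);
        }
        return table.roundTrips;
    }

    // Finder plus a single bulk load: 2 round trips, whatever n is.
    static int bulkLoading(int n) {
        FakeSequenceTable table = new FakeSequenceTable(n);
        table.loadMany(table.findAllKeys());
        return table.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("per-bean round trips: " + perBeanLoading(1000));
        System.out.println("bulk round trips:     " + bulkLoading(1000));
    }
}
```
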
    For example, if we wanted to show all the sequences returned in a graph, we would have to load all of the entity beans with all of their data. We overcame this by creating a session bean that returns lightweight pass-by-value (PBV) objects containing only the data needed to display them in the graph. Each PBV contained a reference so that it could navigate to the real bean if necessary (to update it, for example).
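
    Roughly, the workaround has the following shape. This is a sketch under assumptions, not our actual code: SequencePlotData and SequenceGraphFacade are made-up names, the "reference" is assumed to be just the primary key, and a hard-coded array stands in for the single query a real session bean would run.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical lightweight pass-by-value object: only what the graph needs.
class SequencePlotData implements Serializable {
    private static final long serialVersionUID = 1L;

    final int sequenceId; // assumed form of the "reference": the bean's primary key
    final String name;
    final int length;

    SequencePlotData(int sequenceId, String name, int length) {
        this.sequenceId = sequenceId;
        this.name = name;
        this.length = length;
    }
}

// Stand-in for the session bean. A real one would issue one query here.
class SequenceGraphFacade {
    // Hypothetical rows: {id, name, bases}. The bases are never copied into the PBV.
    private static final String[][] ROWS = {
        {"1", "seqA", "AGCTAGCT"},
        {"2", "seqB", "AGC"},
    };

    // One call returns every object the graph needs, by value.
    List<SequencePlotData> findPlotData() {
        List<SequencePlotData> result = new ArrayList<>();
        for (String[] row : ROWS) {
            int id = Integer.parseInt(row[0]);
            result.add(new SequencePlotData(id, row[1], row[2].length()));
        }
        return result;
    }
}
```

    A client that wants to update a sequence would pass sequenceId to the entity bean's findByPrimaryKey and work with the real bean from there.
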
    The Sun rep who visited us had implemented an airline booking system over the reference implementation. The final stage of the booking system involved drawing a picture of an airplane superimposing all the reserved seats on top of it. Each seat was represented by an entity bean and as there are around 400 seats on a 747 this was taking quite a while! Their solution was to provide a session bean returning pass by value versions of the seat data, i.e. a similar solution to ours.

    Would anyone like to comment on the ideal solution to this problem? Having to create lightweight PBVs of the real thing seems like a bit of a hack on the EJB model!
  2. Jeremy,
        Most containers won't load the bean until a business method is called. So if you query 100,000 entity beans, they won't be loaded until a business method (such as the method to get a value object) is called. If you do get a value object for every item in the result set, then the following optimizations could help:

    1) There may be CMP implementations that perform bulk loads.

    2) Use dbIsShared. This forces the app server to cache entity bean state in memory, which makes a second call for the same data very fast.

    3) Not use entity beans.

          You may also consider using JDBC to show just the data you need for your particular use case, and load your entity beans when you want more information, such as information about a particular sequence.

        You may be wondering why EJB requires n+1 database calls. This happens in order to support entity caching. That is, the finders return a set of primary keys so that the app server can return a reference to an entity bean that it has already stored in memory. This will really speed up most deployments, but I guess it would also slow down others like yours, which only query particular subsets of data once in a while. But then again, for a system like the one you are building, you probably have the budget for a mega machine with globs of memory, so maybe you can take advantage of caching after all.
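
        As a rough illustration of why keys-first finders make caching possible, here is a minimal sketch. EntityCache is a hypothetical stand-in, not how any particular container is implemented: state is loaded on first use and served from memory afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a container's entity cache, keyed by primary key.
class EntityCache {
    private final Map<Integer, String> loadedState = new HashMap<>();
    int databaseLoads = 0;

    // Called when a business method first needs the bean's state.
    String stateFor(int primaryKey) {
        String state = loadedState.get(primaryKey);
        if (state == null) {
            databaseLoads++;                   // cache miss: one database call
            state = "state-of-" + primaryKey;  // pretend this came from the database
            loadedState.put(primaryKey, state);
        }
        return state;
    }
}

class EntityCacheDemo {
    public static void main(String[] args) {
        EntityCache cache = new EntityCache();
        int[] finderResult = {1, 2, 3}; // a finder returns keys only; nothing is loaded yet

        for (int key : finderResult) {
            cache.stateFor(key);        // first use: one load per key
        }
        for (int key : finderResult) {
            cache.stateFor(key);        // second use: served from memory
        }
        System.out.println("database loads: " + cache.databaseLoads);
    }
}
```
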


  3. By modeling each sequence as an entity bean, the expectation is to have one "remote" business object per sequence.

    In the scenario you explained, one of the applications of these beans is to construct a graph. In such cases, I'm not sure a sequence should be modeled as an entity. In fact, if the data that a sequence represents is not transactional, what's the need for defining it as an entity? If caching is the sole objective, caching dependent objects is much simpler and more effective in this case. So I think a better alternative is to model these as dependent objects.

    I also consider the airline example a bad candidate for demonstrating entity beans. In this case, each object represents some data with one of the attributes representing if a seat is reserved. A better choice would be to model these as dependent objects.

    As long as you model such collections as entities, the problem of n+1 calls cannot be solved, since each sequence entity will be loaded by a finder, and the finder will have to make at least one trip to the database.

    - Subrahmanyam
  4. In the case of drawing a graph with these sequences, it does seem sensible to have them as dependent objects. However, there are other times when you might want to update these remote objects, i.e. you would genuinely need a transactional remote object.

    Perhaps a better way would be to write a 'Value' object (the pattern in which a normal Java class implements the business interface and the entity bean extends it). These Value objects could be returned from a session bean. If one needed to update a sequence it would probably be possible to navigate to the entity bean.
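
    A minimal sketch of that 'Value' object pattern, as I understand it. All names here (Sequence, SequenceValue, SequenceBeanSketch) are invented for illustration, and the bean class leaves out javax.ejb.EntityBean and its lifecycle callbacks so that the sketch compiles on its own.

```java
import java.io.Serializable;

// Business interface shared by the value object and the bean (hypothetical).
interface Sequence {
    String getName();
    String getBases();
}

// Plain serializable class that a session bean could return by value.
class SequenceValue implements Sequence, Serializable {
    private static final long serialVersionUID = 1L;

    protected String name;
    protected String bases;

    SequenceValue(String name, String bases) {
        this.name = name;
        this.bases = bases;
    }

    public String getName() { return name; }
    public String getBases() { return bases; }
}

// Stand-in for the entity bean class: it inherits the fields and getters.
// A real bean would also implement javax.ejb.EntityBean (omitted here).
class SequenceBeanSketch extends SequenceValue {
    SequenceBeanSketch(String name, String bases) {
        super(name, bases);
    }

    // Only the bean offers a modifier; value objects stay read-only.
    void setBases(String bases) {
        this.bases = bases;
    }

    // Copy the bean's current state into a detached value object.
    SequenceValue toValue() {
        return new SequenceValue(name, bases);
    }
}
```
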

    I was thinking about the ejbFind methods as the way that one must navigate to this data. But I suppose it would be OK to get a session bean to return representations of this data as value objects. Session beans generally carry out business logic, so this bean could be thought of as carrying out a specific piece of logic: returning sequence data to plot a graph. Any comments?!

  5. I disagree with the idea of navigating from value objects to respective entities.

    From an implementation point of view, this could mean embedding remote object handles/references in the Value object. This reduces the utility of the value object.

    Apart from this, we should be able to transport Value objects from VM to VM safely with no awareness or concern for EJBs.

    - Subrahmanyam
  6. Without knowing too much about what you are doing (and still at the very beginning of my first EJB impl) it seems like a command pattern implementation would help you.

    How about something like this:
    Use the ejbFind to get the Sequence. The Sequence has value objects that can quickly and easily share your data. Then use a session bean called SequenceService (or SequenceManager) which has a method like

    modifySequence(SequencePK, newSequenceValueData);

    I think this also gives you the right transactional level (um, I don't mean isolation level, I mean it does the right number of steps as one transaction). I suppose you will update the Sequence a number of times and then say - "Yup, save that baby". This would make versioning of sequences easier as well if that is ever a requirement (but if YouArentGonnaNeedIt, don't worry about it).
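
    Something like the following is what I have in mind, as a rough sketch only. SequencePK and SequenceValueData are hypothetical types made up to match the signature above, and a map stands in for the database, so there is no real transaction here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical primary-key type matching the signature above.
final class SequencePK {
    final int id;

    SequencePK(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SequencePK && ((SequencePK) other).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

// Hypothetical value type: the edited data the client sends back.
final class SequenceValueData {
    final String bases;

    SequenceValueData(String bases) {
        this.bases = bases;
    }
}

// Stand-in for the session bean; the map plays the role of the database.
class SequenceService {
    private final Map<SequencePK, SequenceValueData> store = new HashMap<>();

    // One call carries the whole edited value, so the update is one unit of work.
    void modifySequence(SequencePK pk, SequenceValueData newData) {
        if (pk == null || newData == null) {
            throw new IllegalArgumentException("pk and newData are required");
        }
        store.put(pk, newData);
    }

    // Returns the stored value, or null if the key is unknown.
    SequenceValueData find(SequencePK pk) {
        return store.get(pk);
    }
}
```
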

    I plan on doing a lot of things this way, but I'm not sure of all the issues I'll run into yet.

    Thoughts, folks?

    Also, in this case why use EJB at all? Where is the value-add over a JDBC connection.commit() and rollback()? Especially if you have a lot of infrequently used data.


  7. Though I get the general idea of your excellent (I mean it) solution, I would like to extend it.

    - Model the collection of sequences as an entity, and each sequence as a value object.

    - Provide accessors and modifiers on the entity to operate on sequences.

    Of course, you are better off modeling the collection as an entity to benefit from container-managed semantics on the collection.
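
    A minimal sketch of that shape, with invented names and no container involved: SequenceSetSketch stands in for the coarse-grained entity, and each sequence is kept as a plain immutable value (a String of bases here, purely as an assumption).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a coarse-grained entity that owns many sequences (hypothetical).
class SequenceSetSketch {
    // Each sequence is a plain value held inside the entity, keyed by its id.
    private final Map<Integer, String> basesById = new LinkedHashMap<>();

    // Modifier: add or replace one sequence inside the collection.
    void putSequence(int id, String bases) {
        basesById.put(id, bases);
    }

    // Accessor: one sequence by id, or null if it is not in the collection.
    String getSequence(int id) {
        return basesById.get(id);
    }

    // Accessor: a read-only copy of the ids, so callers cannot alter the entity's state.
    List<Integer> getSequenceIds() {
        return Collections.unmodifiableList(new ArrayList<>(basesById.keySet()));
    }
}
```
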

    The moral of the story seems to be - Don't apply client-side OO modeling principles on the server side.

  8. Thanks for all your replies!

    I agree with your solution of having a collection bean, and also with your moral! However, how does that square with the entity bean rule of thumb that there should be one entity bean per row in the database?

    As it happens, one sequence does correspond to one entry in the database (it's not dependent on anything else), so having a sequence entity bean would seem obvious. How does a collection entity bean relate to this rule? Is this reality kicking in?

  9. Reality Kicks In

    The EntityBean rule of thumb is that EntityBeans should be coarse-grained, not fine-grained. So in fact, you do not want one entity bean per row.

    I'd like to hear more about the data involved, but here's my guess at what it would look like (should look like?), given a very poor-man's knowledge of genetics:

    import java.util.List;

    public class SequenceEntity {
      String subjectId; // matches the sequence to an animal type, or a particular lab rat

      List<SequenceMember> sequenceOrder; // the ordered list of sequence members
    }

    public class SequenceMember {
      int chromosomeNumber; // some higher-level ordering, or maybe sequenceOrder above is a List of Lists by gene
      int sequenceNumber;   // the number of this base in the sequence
      char baseLetter;      // the domain is A, G, C, T or U
    }

    So even though each row might look like this:
    animalId, chromo#, seqNumber, baseLetter

    There are actually many fewer entity beans than rows.

    In any case, there is one large Entity consisting of smaller parts. Both are stored in the table. But perhaps you have another table in which additional, high-level information is stored. In that case, that table is the entity and the sequence table is a dependent object.

    Depending on your analysis (looking for patterns in the entire sequence vs. trying to fill in specific parts of a sequence), you may want to have another level between the big entity and the dependent List (such as chromosome, etc.), and this could be a candidate for your entity.

    I also think this is EJB 1.1 reality kicking in. The 2.0 spec (see the JavaWorld article) talks about how EJB 2.0 (supposedly downloadable in beta now from BEA, which is exciting) manages these dependent collections for you. One caveat: so far it looks like Collection is the only collection type you get, though you definitely need a List. I have a major issue with only being able to pass Collections back and forth; I suppose it is a question of how the container is supposed to order them. I think even getting a dependent collection back and ordering it yourself later may be sufficient.

    Hope this helps. Hope to hear more.

    Michael Bushe
  10. Re: Use Command Pattern?

    You should really deviate from the notion that a row in the table should be represented as an EntityBean. In your particular case, I don't think it is necessary to make a Sequence an EntityBean, since it does not need any of the transactional features of an EntityBean. You don't need the overhead of using EntityBeans if your objects are read-only.

    I'm currently implementing a web-based catalog repository for use by our commerce server. I initially implemented Item as an EntityBean. This worked fine when I was using our sample database containing only 1k items. But when we tested it against a full database with 500k items, the server ground to a halt when users came in and made simultaneous searches. Faced with this, I changed my design. I provided a stateless session bean (Catalog) that does nothing but service search requests from users. I also converted the Item EntityBean to a vanilla Java class. The Catalog returns a collection of Item objects fetched from the DB.
  11. Re: Use Command Pattern?

    The advantage of using an entity bean, whether it is to represent a single row of data or a collection of data, is that once it has been instantiated, no database access is required to retrieve data from it.

    The principle is that, since there is only one instance of the entity bean shared between all users, and provided all updates to the data it represents (one or multiple rows) are done via the entity bean, its cached data is always in sync with the database. Therefore transactions are not required for read-only business methods.

    I think the decision on whether or not to use an entity bean should be based on whether it represents a set of data that is frequently 'used' and is not so big that it is unrealistic for it to be permanently cached in the server, rather than whether it represents a set of data that is more often read than updated.
  12. Re: Use Command Pattern?

    Does that mean that if it's frequently used then it is a good candidate for an entity bean, or the opposite?
  13. Let's put your words in a different way. More than how frequently it is used, what matters is whether it is prone to concurrency issues; if it is, it is a perfect candidate for an Enterprise JavaBean.

    Also, it is a very good idea to have read-only data as a normal value object instead of having it as an EJB.
  14. Maybe "the obvious modeling choice is to make a sequence an entity bean" is not the right choice for your architecture.

    I am not an expert in genetics so I cannot really help you, but I strongly advise you to follow an analysis & design process when choosing what to put in EJBs.

    If you've already done that, take a look at the pattern described at http://www.c2.com/cgi/wiki?EjbRoadmap