
Performance and scalability: Cache-based solution to N+1 database calls problem

  1. I would like to describe a solution to the N+1 database calls problem for bean-managed persistence (BMP) entity beans (EJBs). (With BMP, a finder that returns N primary keys costs one query, and the container then calls ejbLoad() on each of the N beans, costing N more queries.) Our organization is considering implementing it, so I would appreciate any comments. Our application is servlet-based, and the solution involves a cache implementation. The following describes it for a read-only entity bean:


    I. Infrastructure


    1. A single custom primary key class (PK for short) that holds all primary key values for all entity beans.

    2. A single custom data class (Data for short) that holds all data for all entity beans.

    3. A singleton containing a map keyed by entity bean ID (possibly the entity bean class name); each value is another map whose entries have PK instances as keys to Data instances.

    4. The singleton contains a map that holds HTTP session IDs as keys to PK lists.

    5. The singleton contains a map that holds HTTP request context IDs as keys to PK lists.
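A minimal sketch of this infrastructure in plain Java, assuming only the JDK (no EJB APIs). The class names `PK`, `Data`, and `EntityCache`, the helper `dataFor`, and the use of `ConcurrentHashMap` are illustrative assumptions, not part of the original proposal.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Item 1: one PK class for every entity bean; holds the key column values.
final class PK implements Serializable {
    private final Object[] values;

    PK(Object... values) { this.values = values.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof PK && Arrays.equals(values, ((PK) o).values);
    }

    @Override public int hashCode() { return Arrays.hashCode(values); }
}

// Item 2: one Data class for every entity bean; column name -> value.
final class Data implements Serializable {
    private final Map<String, Object> fields;

    Data(Map<String, Object> fields) {
        this.fields = Collections.unmodifiableMap(new HashMap<>(fields));
    }

    Object get(String column) { return fields.get(column); }
}

// Items 3-5: the cache singleton (an enum is a convenient Java singleton).
enum EntityCache {
    INSTANCE;

    // Item 3: entity bean ID (e.g. class name) -> (PK -> Data)
    final Map<String, Map<PK, Data>> dataByBean = new ConcurrentHashMap<>();
    // Item 4: HTTP session ID -> PKs found by finders run in that session
    final Map<String, List<PK>> pksBySession = new ConcurrentHashMap<>();
    // Item 5: HTTP request context ID -> PKs
    final Map<String, List<PK>> pksByRequestContext = new ConcurrentHashMap<>();

    // Returns (creating if needed) the PK -> Data map for one entity bean.
    Map<PK, Data> dataFor(String beanId) {
        return dataByBean.computeIfAbsent(beanId, id -> new ConcurrentHashMap<>());
    }
}
```

Because `PK` compares its values element by element, two PK instances built from the same key values find the same `Data` entry.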


    II. Behavior


    1. Each ejbFind<METHOD> method of each entity bean gets the primary keys and the data needed by its ejbLoad() callback from the database. It then puts the PK instances inside the singleton under the entity bean ID as keys to Data instances containing the data retrieved and returns an enumeration of the same PK instances to the EJB container.

    2. Each ejbLoad() callback gets its data from the singleton, not from the database.

    3. Each ejbFind<METHOD> method of each entity bean feeds the singleton a map entry whose key is the ID of the HTTP session in which the finder method was invoked and whose value is the list of PK instances retrieved from the database. Inside the singleton, the entry for a particular HTTP session ID therefore accumulates all PK instances retrieved by all finder invocations made from within that session.

    4. Each ejbPassivate() callback removes from the singleton every PK-to-Data map entry whose PK is not contained in any HTTP-session-ID-to-PK-list map entry.

    5. Each HTTP session invalidation calls a special home interface method that removes from the singleton any map entry whose key is the invalidated HTTP session ID.

    6. An entity bean may have a special finder method that feeds the singleton with a map entry whose key is an HTTP request context ID and whose value is a list of the PK instances retrieved from the database. The HTTP request context ID is a regular HTTP request parameter value used to unify a set of requests as belonging to the same context.

    7. An entity bean may have a special business method that returns bulk data to the client, based on a list of PK instances keyed by a particular HTTP request context ID (in EJB 2.0 this can be done in an ejbHome<METHOD> method).
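Behavior points 1 to 5 can be simulated outside a container to show why the N+1 pattern disappears. This is a hypothetical sketch (Java 16+ records): `OrderBeanSim`, the in-memory `ORDERS_TABLE`, and the `dbQueries` counter stand in for a real BMP bean and database, and the HTTP session ID is assumed to be passed to the finder as a parameter, since an EJB has no access to the HTTP session. Session PKs are kept in a `Set` rather than a list to avoid duplicates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

record PK(List<?> values) {}
record Data(Map<String, ?> fields) {}

// Minimal stand-in for the cache singleton (hypothetical).
final class Cache {
    static final Cache INSTANCE = new Cache();
    final Map<String, Map<PK, Data>> dataByBean = new ConcurrentHashMap<>();
    final Map<String, Set<PK>> pksBySession = new ConcurrentHashMap<>();
    private Cache() {}
}

// Simulated BMP bean; the "database" is a fixed map plus a query counter.
final class OrderBeanSim {
    static final String BEAN_ID = "OrderBean";
    static final Map<Long, Integer> ORDERS_TABLE = Map.of(1L, 100, 2L, 250, 3L, 75);
    static int dbQueries = 0;

    // Points 1 and 3: one query fetches PKs *and* data; both go into the cache.
    static List<PK> ejbFindAll(String httpSessionId) {
        dbQueries++; // stands for one "SELECT id, total FROM orders"
        Map<PK, Data> beanData = Cache.INSTANCE.dataByBean
                .computeIfAbsent(BEAN_ID, id -> new ConcurrentHashMap<>());
        List<PK> found = new ArrayList<>();
        for (Map.Entry<Long, Integer> row : new TreeMap<>(ORDERS_TABLE).entrySet()) {
            PK pk = new PK(List.of(row.getKey()));
            beanData.put(pk, new Data(Map.of("total", row.getValue())));
            found.add(pk);
        }
        Cache.INSTANCE.pksBySession
                .computeIfAbsent(httpSessionId, id -> ConcurrentHashMap.newKeySet())
                .addAll(found);
        return found; // the container would receive these PKs
    }

    // Point 2: ejbLoad reads the cache, never the database.
    static Data ejbLoad(PK pk) {
        return Cache.INSTANCE.dataByBean.get(BEAN_ID).get(pk);
    }

    // Point 4: drop cached data whose PK no longer belongs to any session.
    static void ejbPassivate() {
        Set<PK> inUse = new HashSet<>();
        Cache.INSTANCE.pksBySession.values().forEach(inUse::addAll);
        for (Map<PK, Data> beanData : Cache.INSTANCE.dataByBean.values()) {
            beanData.keySet().retainAll(inUse);
        }
    }

    // Point 5: invoked (via a home method) when an HTTP session is invalidated.
    static void sessionInvalidated(String httpSessionId) {
        Cache.INSTANCE.pksBySession.remove(httpSessionId);
    }
}
```

After one finder call and N ejbLoad() calls, the counter shows a single query. The sketch does not guard against a concurrent ejbPassivate() running between the moment a finder caches its data and the moment it registers the session's PKs; a real implementation would need to handle that race.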


    III. Benefits


    1. All clients share the same data: because the singleton is not an EJB, the data is passed to the ejbLoad() callback by reference rather than by value (no serialized copy is made).

    2. Only data in use by active HTTP sessions is cached.

    3. Unused data is freed by ejbPassivate(), so it is released only when the EJB container actually needs to reclaim memory and therefore calls ejbPassivate().

    4. The maximum memory needed for caching can be estimated from the amount of data in the database, without taking into account the number of concurrent users.

    5. The cache (that is, the singleton) relies on the container's built-in memory management to manage its own memory, because the singleton and the entity beans live in the same JVM. When the container detects there is no memory left to load a new entity bean, there is no memory left for data caching either. The container then calls ejbPassivate() on a number of entity beans, which has the side effect of clearing the cached data related to a number of HTTP sessions.

    6. Browsing mechanisms can be used to invoke business methods only on the EJBs shown on a particular page, avoiding large numbers of RMI calls.

    7. Bulk business methods can be used to get large amounts of data from the database when browsing mechanisms are not possible, as described in the last two behavior points.
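Benefits 6 and 7 can be illustrated with two hypothetical helpers, `Browsing.page` and `Browsing.bulkData` (names assumed, not from the post). The first restricts business-method calls to the PKs on one page; the second returns all cached data for a request context ID in a single call, as an ejbHome<METHOD> method might.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class Browsing {
    // Benefit 6: only the PKs on the requested page are returned, so business
    // methods (and RMI calls) are issued for at most pageSize beans.
    static <K> List<K> page(List<K> pks, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) throw new IllegalArgumentException();
        int from = Math.min(pageIndex * pageSize, pks.size());
        int to = Math.min(from + pageSize, pks.size());
        return pks.subList(from, to);
    }

    // Benefit 7: one bulk call returns the cached data for every PK stored
    // under a request context ID, instead of one remote call per bean.
    static <K, D> List<D> bulkData(Map<String, List<K>> pksByContext,
                                   Map<K, D> cachedData, String contextId) {
        List<D> result = new ArrayList<>();
        for (K pk : pksByContext.getOrDefault(contextId, List.of())) {
            result.add(cachedData.get(pk));
        }
        return result;
    }
}
```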


    RFC

    Guilherme Gusmão da Silva
  2. You might run into problems if you try to handle updates.
  3. OK, I should have read the original post more carefully, "Read Only Entity Bean" ...

    sorry.
  4. We have a situation here: we need to cache some data somewhere. The data does not change frequently, so caching it avoids database hits.
    Can anyone tell me which is the best place to cache data?
    We use JSP and EJB with WebLogic 5.1.
    The options that came to my mind are:
    storing it in a bean with application scope
    storing it in WebLogic workspaces
    storing it in an entity bean referenced from a stateful session bean.
    Can anyone clarify the above, explain each option in detail, and say why the recommended option is better than the others?
    Thanks,
    Nicklesh
  5. That all depends on the sort of data and your caching needs. For read-only/read-mostly database data you can use entity beans; for caching data in the presentation layer (JSP) you can use CacheTag (available starting with SP6, or maybe SP5, I don't remember), which is useful for caching JSP output data or input (calculated) data.
  6. Using workspaces in WebLogic is definitely not a good idea: the next version of WebLogic (6.0) is phasing them out. In fact, I think they are not even supported there.

    How about using a static Java HashMap to store your data?
    --Das
  7. Hi,
    Yes, one can store data in a HashMap. But is it available to more than one session?
    Thanks,
    n
  8. Yes. The HashMap will stay alive as long as the JVM is alive, and it is loaded only once by the JVM (when its class is first used). But you have to make sure you use a synchronized HashMap to avoid threading issues. Also, it should not be part of the EJB; EJBs should query it just like any standalone singleton Java class.

    --Das
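The static-HashMap suggestion can be sketched as a standalone singleton class that EJBs and JSPs call directly. `ReferenceDataCache`, `get`, and `invalidate` are hypothetical names. Note that a static field exists once per class loader, so in an application server the web tier and the EJB tier may not see the same copy if their classes are loaded by different class loaders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Standalone singleton cache (not an EJB); callers use the static methods.
final class ReferenceDataCache {
    // One synchronized HashMap shared by every session using this class.
    private static final Map<String, Object> CACHE =
            Collections.synchronizedMap(new HashMap<>());

    private ReferenceDataCache() {}

    // Returns the cached value, loading it (e.g. from the database) on first use.
    static Object get(String key, Function<String, Object> loader) {
        synchronized (CACHE) { // makes the check-then-load sequence atomic
            Object value = CACHE.get(key);
            if (value == null) {
                value = loader.apply(key);
                CACHE.put(key, value);
            }
            return value;
        }
    }

    static void invalidate(String key) { CACHE.remove(key); }
}
```

Holding the lock while the loader runs keeps the check-then-load atomic, but it also makes every caller wait during a slow load, which is the contention concern raised later in this thread.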
  9. Hi!

    I am no EJB guru, but I really have to learn this stuff now.
    As far as I know, every singleton is a BOTTLENECK!!!
    Especially if you are scaling up to several server machines.

    Imagine you have 10 machines:
    only one can have the singleton!!!
    Otherwise it is not a real singleton.
    If the singleton is very important/needed/central
    (and it sounds very important in your solution),
    all the other 9 machines have to wait until the singleton
    is unlocked.

    Such singletons are always problematic!!!

    This is a very important design discussion in my opinion
    and for me. Am I wrong?

    Thorsten van Ellen

    Please forward your answers to:

    mailto:EJBSingleton at web dot de

    This email address is valid only for 30 days (only this month).
  10. Hi, Thorsten.

    You are right. There are a couple of problems in my solution, and scalability is one of them. One solution is to have only one machine run the singleton, as you said. The other is to synchronize all instances of the singleton running on different machines. In this situation, synchronizing the singleton instances only involves propagating finder method invocations: when a finder method is invoked on one machine, it must be invoked on all machines so that every singleton stays up to date. Each propagated finder call can get its data from the database or from the singleton that originated it. Sure, this means additional infrastructure, but...
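The finder-propagation idea can be simulated in a single JVM. This is a hypothetical sketch: each `CacheNode` stands for one server's singleton, and the direct method call to a peer stands in for whatever remote mechanism (for example RMI or JMS) a real cluster would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Each CacheNode stands for the cache singleton in one server's JVM.
final class CacheNode {
    final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final List<CacheNode> peers = new ArrayList<>();

    void addPeer(CacheNode peer) { peers.add(peer); }

    // A finder invoked on this node: cache the rows it read from the
    // database, then propagate the invocation to every peer node.
    void runFinder(Map<Long, String> rowsFromDatabase) {
        cache.putAll(rowsFromDatabase);
        for (CacheNode peer : peers) {
            peer.applyPropagatedFinder(rowsFromDatabase);
        }
    }

    // Propagated finder: here the peer takes the rows from the originating
    // singleton; it could equally re-run the query against the database.
    void applyPropagatedFinder(Map<Long, String> rows) {
        cache.putAll(rows);
    }
}
```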
  11. Remember that a singleton means a single point of contention, so you get contention not only across servers but also within the same server. If every access goes through one lock, only one request can be processed at any given time.

    M
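How much contention there is within one server depends on how the singleton is locked. As a hedged sketch of one common alternative (a swapped-in technique, not something proposed in the thread), a read-mostly cache guarded by `java.util.concurrent.locks.ReentrantReadWriteLock` lets many readers proceed at once and serializes only writers; it does nothing for the cross-server problem. `ReadMostlyCache` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-mostly cache: many threads may hold the read lock at the same time;
// only writers (e.g. a finder refreshing the cache) take the exclusive lock.
final class ReadMostlyCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void putAll(Map<? extends K, ? extends V> entries) {
        lock.writeLock().lock();
        try {
            map.putAll(entries);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

For a read-only entity bean cache, where writes happen only when finders run, readers rarely wait on each other under this scheme.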