Building a fast access read/write in memory cache

EJB design: Building a fast access read/write in memory cache

  1. Building a fast access read/write in memory cache (7 messages)

    Hi,

    I am new to EJB, so I apologize in advance for any disturbance.

    In my current project we have to build a very fast in-memory clustered cache (the cache has to be replicated to all the nodes). This is a read/write cache (writes outnumber reads); it will be populated at startup and then updated regularly, without the need to store this data back into the database.

    I am evaluating two approaches:

    1. Using a singleton. I am reluctant to use this approach for the following reasons:
         a. The literature I have read recommends using singletons only in domains where they are "read-only".
         b. I am expecting that code will be needed to distribute the events to all the cluster's nodes, so the singletons stay synchronized.

    2. Using a single-instance entity bean (the find... methods always return the same key). This bean will never store anything in the database, and it holds the cached data in a hashmap. As events are received, the hashmap will be updated. Clients will query the hashmap regularly. I feel that this approach will provide us with the following benefits:

      a. Method synchronization will be done by the container.
      b. It may be easier to replicate in the cluster.

    My question is: can an entity bean be used in this way? Are there any better approaches for designing read/write caches?

    I would appreciate any pointers to patterns or literature that tackle this design problem.

    Thanks in advance
    hassib
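For readers weighing approach 1, the following is a minimal sketch of a per-JVM singleton cache with a hook where cluster replication could plug in (point 1b). It is not code from this thread: the names `CalcCache`, `UpdateListener`, and `applyRemote` are hypothetical, it assumes Java 8 or later (for `java.util.concurrent` and lambdas), and the actual transport between nodes is left out entirely.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical callback; a real cluster would forward the update to peer nodes here. */
interface UpdateListener {
    void onUpdate(String key, Object value);
}

/** Hypothetical per-JVM cache: one instance per class loader, not per cluster. */
final class CalcCache {
    private static final CalcCache INSTANCE = new CalcCache();

    // ConcurrentHashMap allows concurrent reads and writes without a global lock.
    private final Map<String, Object> data = new ConcurrentHashMap<>();
    private final List<UpdateListener> listeners = new CopyOnWriteArrayList<>();

    private CalcCache() {}

    static CalcCache getInstance() { return INSTANCE; }

    void addListener(UpdateListener listener) { listeners.add(listener); }

    /** Local write, followed by a notification to every listener (the replication hook). */
    void put(String key, Object value) {
        data.put(key, value);
        for (UpdateListener listener : listeners) {
            listener.onUpdate(key, value);
        }
    }

    /** Applies an update received from a peer node without broadcasting it again. */
    void applyRemote(String key, Object value) { data.put(key, value); }

    Object get(String key) { return data.get(key); }
}

final class CalcCacheDemo {
    public static void main(String[] args) {
        CalcCache cache = CalcCache.getInstance();
        // Stand-in for a network send to the other nodes.
        cache.addListener((key, value) -> System.out.println("replicate " + key + "=" + value));
        cache.put("price", 42);
        System.out.println(cache.get("price"));
    }
}
```

The sketch says nothing about ordering or conflict resolution between nodes, which is where most of the difficulty in a write-mostly replicated cache is likely to sit.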
  2. I think it's going to be very difficult to write an efficient read-write cache (and clustered!) if it's write-mostly. The whole idea of caching is to reduce unnecessary database queries. Then again, if you're keeping all the stuff in-memory for the whole runtime lifecycle of the application, it's not really a cache, is it? It's an in-memory database...
  3. Hi lasse,

    Thanks a lot for your reply.

    Maybe it was not clear in my original question, but we are not actually caching the whole database. We are caching certain calculated data that doesn't need to be persisted. In our problem domain there will be a high volume of received events that will be used in the calculation, and the processed data will be forwarded to the clients.

    Thanks again
    hassib
  4. I don't see why it should not work. But you might want to think over these points and how they affect your requirements.

    (a) Nothing forbids the container from loading multiple bean instances even for the same primary key. All that the container will ensure is that these instances are kept in sync by calling ejbLoad()/ejbStore() appropriately.

    (b) Entity Bean clustering is normally implemented by just invalidating all peer beans in the cluster when any one of them finishes a transaction. Their ejbLoad() is fired and they load the new state from database. Which means, for a frequently-updated entity bean, there will be a whole lot of database reads going on.

    - Ravi
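Point (b) can be illustrated with a toy model. This is only a sketch under stated assumptions, not container code: `FakeDatabase`, `NodeBean`, and `InvalidationDemo` are hypothetical names, one node does all the writing, and every peer reads once after each write.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the database; it only counts reads. */
final class FakeDatabase {
    private int value;
    private int reads;

    int read() { reads++; return value; }
    void write(int v) { value = v; }
    int readCount() { return reads; }
}

/** Hypothetical model of the entity bean instance held by one cluster node. */
final class NodeBean {
    private final FakeDatabase db;
    private int state;
    private boolean valid;

    NodeBean(FakeDatabase db) { this.db = db; }

    void invalidate() { valid = false; }

    int get() {
        if (!valid) {            // models ejbLoad() after an invalidation
            state = db.read();
            valid = true;
        }
        return state;
    }

    void set(int v, List<NodeBean> cluster) {
        state = v;
        valid = true;
        db.write(v);             // models ejbStore() at commit
        for (NodeBean peer : cluster) {
            if (peer != this) peer.invalidate();
        }
    }
}

final class InvalidationDemo {
    /** Runs the given number of writes on node 0; every peer reads after each write. */
    static int simulate(int nodes, int writes) {
        FakeDatabase db = new FakeDatabase();
        List<NodeBean> cluster = new ArrayList<>();
        for (int i = 0; i < nodes; i++) cluster.add(new NodeBean(db));
        for (int w = 1; w <= writes; w++) {
            cluster.get(0).set(w, cluster);
            for (int i = 1; i < nodes; i++) cluster.get(i).get();
        }
        return db.readCount();
    }

    public static void main(String[] args) {
        System.out.println(simulate(4, 1000)); // prints 3000
    }
}
```

In this model the number of database reads grows as writes × (nodes − 1), which is the cost the post warns about for a frequently updated bean.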
  5. Hi!

    With the entity bean approach, I don't think you will have the cache distributed in the cluster. The reason is that you are using non-persisted fields in the bean implementation class, so the data will not be distributed over the nodes in the cluster.

    Another approach is to use the JNDI tree to store the cached objects. (I have been thinking about caching read-mostly data for a long time, and I plan a test to see if that strategy will work.) My idea for WebLogic Server 6.1 and later is:
    - [NOT J2EE STANDARD] Load the objects you want to cache in a start-up class. Bind the objects in the JNDI.
    - By using the same start-up class in all servers in the cluster, you will have the same cached data in the entire system.
    - Now the objects are accessible to all EJBs (and Servlets?) in that server, and can be read. Every EJB can also update the JNDI tree if it wants to change the data in the cache.
    - Access can be wrapped in a nice way using a Singleton helper class. (This is OK as long as the helper class does not break the spec.)
    - The Singleton can be synchronized. That will break the spec, so you must really know what you are doing. It may also result in bad performance if there are many clients.
    - With the Singleton approach, you will have one Singleton instance per ClassLoader, and that is exactly what you want! (The server is allowed to use many class loaders...) But there is still only one JNDI tree per server!!
    - Do not add an extra cache in the Singleton. The singleton instances in (potentially) different class loaders would get out of sync.

    I have no code for this, but the issue is so interesting that I plan to test it soon. I will post the code if I do anything!

    And please report your future work!!

    /Tomas
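The helper idea in the list above could look roughly like the sketch below. It is not Tomas's code and has not been tried on WebLogic: `NameRegistry`, `JndiRegistry`, `MapRegistry`, `CacheHelper`, and the `cache_` name prefix are all hypothetical. Only `javax.naming.Context.lookup(String)` and `Context.rebind(String, Object)` are real JNDI calls. The map-backed registry exists solely so the example runs outside an application server, and Java 8 or later is assumed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.naming.Context;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;

/** Hypothetical minimal view of the naming tree: just the two calls the helper needs. */
interface NameRegistry {
    Object lookup(String name) throws NamingException;
    void rebind(String name, Object value) throws NamingException;
}

/** Adapter over a real JNDI Context, for use inside a server. */
final class JndiRegistry implements NameRegistry {
    private final Context ctx;

    JndiRegistry(Context ctx) { this.ctx = ctx; }

    public Object lookup(String name) throws NamingException { return ctx.lookup(name); }

    public void rebind(String name, Object value) throws NamingException { ctx.rebind(name, value); }
}

/** In-memory stand-in so the sketch can run without a JNDI provider. */
final class MapRegistry implements NameRegistry {
    private final Map<String, Object> tree = new ConcurrentHashMap<>();

    public Object lookup(String name) throws NamingException {
        Object value = tree.get(name);
        if (value == null) throw new NameNotFoundException(name);
        return value;
    }

    public void rebind(String name, Object value) { tree.put(name, value); }
}

/** Helper that wraps registry access and deliberately keeps no cached values of its own. */
final class CacheHelper {
    private static final String PREFIX = "cache_"; // hypothetical naming convention
    private final NameRegistry registry;

    CacheHelper(NameRegistry registry) { this.registry = registry; }

    Object get(String key) throws NamingException { return registry.lookup(PREFIX + key); }

    void put(String key, Object value) throws NamingException { registry.rebind(PREFIX + key, value); }
}

final class CacheHelperDemo {
    public static void main(String[] args) throws NamingException {
        CacheHelper helper = new CacheHelper(new MapRegistry());
        helper.put("price", 10);   // what a start-up class might do
        helper.put("price", 11);   // a later update from an EJB
        System.out.println(helper.get("price"));
    }
}
```

Because the helper holds nothing but a reference to the registry, several helper instances in different class loaders would still see the same values, which matches the last point in the list.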
  6. Hi Tomas,

    Thanks a lot for your reply.

    The JNDI approach looks attractive. However, I have doubts about its performance. In our problem domain, there will always be a sustained high influx of events that will be used to update the cache, and the new values in the cache will be sent to the clients. As I explained in my original question, our case is "write-mostly".

    What are your thoughts on the performance issue?

    Thanks again
    hassib
  7. Hi hassib,

    What you are setting out to write is VERY in-depth and difficult. Trust me, I speak from experience (I work at Tangosol, which is the leading Java clustered caching software provider). Take a look at Coherence (our clustered caching solution) at http://www.tangosol.com/coherence.jsp, which sounds exactly like what you are looking for. We have a 100% reliable caching solution (both replicated and distributed options are supported) which includes a cluster-wide locking mechanism and a JavaBean event model for changes that occur to cached data.

    Regards,
    Rob Misek
    http://www.tangosol.com
  8. I have never seen the product at work but:

    "Enterprise JavaBeans access data from their local caches at in-memory speed instead of connecting to a back-end data server for each database request. This eliminates contention with other processes for server resources and avoids flooding the network with excessive data traffic. Additionally, EdgeXtend maintains data integrity by automatically synchronizing changes with the underlying database using separate transactional caches. "

    Seems to be exactly what you want to do. Take a look at www.persistence.com