Performance and scalability: Session bean with static HashMap cache - scalability problems.

  1. Hi all,
    We are facing severe scalability issues when using a "static HashMap cache" from application EJBs. On the first call, the path is App SLSB -> Cache (SLSB) -> static HashMap (an object graph keyed by a unique string). After that, it is App SLSB -> static HashMap, because we store a reference to the static HashMap to avoid repeated lookups through the Cache SLSB.
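A minimal in-process sketch of the two lookup paths described above, in modern Java. All class, method and key names (`CacheBean`, `AppBean`, `getCacheReference`, `"customer:42"`) are hypothetical stand-ins, and the real Cache SLSB call would be a remote EJB call rather than a plain method call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup path; not the poster's code and not an EJB API.
class CachePathSketch {

    // Stands in for the Cache SLSB: it owns the shared static map.
    static final class CacheBean {
        private static final Map<String, Object> CACHE = new HashMap<>();
        static {
            CACHE.put("customer:42", "Acme Corp"); // placeholder data
        }
        Map<String, Object> getCacheReference() {
            return CACHE;
        }
    }

    // Stands in for an application SLSB.
    static final class AppBean {
        private final CacheBean cacheBean = new CacheBean();
        private Map<String, Object> cachedRef; // null until the first lookup
        int cacheBeanCalls = 0;                 // how often we went through CacheBean

        Object lookup(String key) {
            if (cachedRef == null) {
                // First call: App SLSB -> Cache SLSB -> static map.
                cachedRef = cacheBean.getCacheReference();
                cacheBeanCalls++;
            }
            // Later calls: App SLSB -> static map, using the stored reference.
            return cachedRef.get(key);
        }
    }

    public static void main(String[] args) {
        AppBean bean = new AppBean();
        System.out.println(bean.lookup("customer:42")); // Acme Corp
        System.out.println(bean.lookup("customer:42")); // Acme Corp
        System.out.println(bean.cacheBeanCalls);        // 1
    }
}
```

Only the first lookup pays for the trip through the cache bean; every later `get()` goes straight to the shared map.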

    The static HashMap is loaded with data from the database at the very outset. The data is essentially a large object graph representing business data that is shared by all users. After that initial load, the static data is entirely read-only.
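One common way to express "load once, then read-only" in Java is sketched below, assuming the whole graph fits in memory. `loadFromDatabase()` is a hypothetical placeholder for the real JDBC loading code. Assigning the map to a `static final` field in the class initializer means every thread that uses the class sees the fully built map, and the unmodifiable wrapper turns any accidental write into an exception.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch of a load-once, read-only cache; loadFromDatabase() is hypothetical.
class ReadOnlyCache {

    // Built exactly once during class initialization. The JVM performs class
    // initialization under a lock, so the finished map is safely published
    // to every thread that later uses ReadOnlyCache.
    private static final Map<String, Object> DATA =
            Collections.unmodifiableMap(loadFromDatabase());

    private ReadOnlyCache() {}

    // Placeholder for the real database load.
    private static Map<String, Object> loadFromDatabase() {
        Map<String, Object> m = new HashMap<>();
        m.put("region:EU", "Europe");
        m.put("region:NA", "North America");
        return m;
    }

    static Object get(String key) {
        return DATA.get(key);
    }

    static Map<String, Object> view() {
        return DATA;
    }

    public static void main(String[] args) {
        System.out.println(get("region:EU")); // Europe
        try {
            view().put("region:APAC", "Asia-Pacific");
        } catch (UnsupportedOperationException e) {
            System.out.println("cache is read-only");
        }
    }
}
```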

    Here is how we use it: the application SLSB stores a reference to the static cache after its first use, and from then on calls get() directly on the static HashMap. So we may have, say, 20 beans holding references to the same static singleton HashMap and calling get() on it. The HashMap is not synchronized; the idea was to eliminate the remote calls through the Cache SLSB after the first call.
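The sketch below models that access pattern in isolation: 20 threads (standing in for the 20 bean instances) all call `get()` on one shared, unsynchronized `HashMap`. The thread count and the 1,000-entry map are illustrative assumptions. Sharing a plain `HashMap` this way is only safe because nothing writes to it after it has been published through a `static final` field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: many concurrent readers of one read-only, unsynchronized HashMap.
class ConcurrentReaders {

    // Published once via class initialization; never modified afterwards.
    static final Map<String, Integer> SHARED = buildMap(1_000);

    static Map<String, Integer> buildMap(int size) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < size; i++) {
            m.put("key" + i, i);
        }
        return m;
    }

    // Each reader looks up every key once and returns the sum of the values.
    static long readAll(int readers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int r = 0; r < readers; r++) {
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = 0; i < SHARED.size(); i++) {
                        sum += SHARED.get("key" + i);
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Each reader sums 0..999 = 499500, so 20 readers total 9,990,000.
        System.out.println(readAll(20)); // 9990000
    }
}
```

If any code path still wrote to the map after startup, a plain `HashMap` would no longer be safe to share like this.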

    With normal usage (in our case about 50 concurrent users), this seems to work okay. However, when the number of users increases to about 400, I see the "gets" take *huge* amounts of time. From the usual 9 ms it goes to about 90 *seconds*! And it is simply a get from a static HashMap we're talking about!

    For the time being, we switched the logic to go to the database each time, and, surprisingly, that takes far less time than 90 *seconds*.

    I'm wondering what the reason for this is. Can someone please explain why this could be happening?

    We are running a WebLogic 5.1 container (EJB 1.1).
  2. Hi Pratap,

    I don't have an exact answer for you, but I have seen something similar. The problem in WLS appears to be related to resource synchronization in SLSBs, which only shows up under load. Because of this, under contention you can actually get worse performance than if you went directly to the DB. The best approach is to use a profiling tool such as Optimizeit to confirm that you have a contention problem and pinpoint exactly where it is. From there, your best options are to try WLS 6.1, to use clustering, or to rework your architecture to avoid the lock.

    Also, since effectively you have a synchronization issue, I wonder if Hashtable might not in fact be faster. You might try it.
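Before trying that, it helps to know how `java.util.Hashtable` locks. The sketch below, in modern Java, holds a map's monitor and checks whether a second thread's `get()` is stuck behind it. `Hashtable.get()` is a synchronized method on the table itself, and the wrapper returned by `Collections.synchronizedMap` synchronizes each call on the wrapper object, so in both cases all readers share one monitor. The helper name `readerBlocksWhileLockHeld` is our own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: does a reader's get() wait while another thread holds the map's monitor?
class SingleLockDemo {

    static boolean readerBlocksWhileLockHeld(Map<String, String> map, long waitMs)
            throws InterruptedException {
        CountDownLatch finished = new CountDownLatch(1);
        Thread reader = new Thread(() -> {
            map.get("k");
            finished.countDown();
        });
        boolean blocked;
        synchronized (map) {        // the same monitor a synchronized get() needs
            reader.start();
            blocked = !finished.await(waitMs, TimeUnit.MILLISECONDS);
        }
        reader.join();              // the reader completes once the monitor is free
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> plain = new HashMap<>();
        plain.put("k", "v");
        Map<String, String> table = new Hashtable<>(plain);
        Map<String, String> wrapped = Collections.synchronizedMap(new HashMap<>(plain));

        System.out.println("HashMap reader blocked:         " + readerBlocksWhileLockHeld(plain, 500));
        System.out.println("Hashtable reader blocked:       " + readerBlocksWhileLockHeld(table, 500));
        System.out.println("synchronizedMap reader blocked: " + readerBlocksWhileLockHeld(wrapped, 500));
    }
}
```

On a normal run the plain `HashMap` reader is not blocked, while both synchronized variants are, so swapping `Collections.synchronizedMap` for `Hashtable` is unlikely to change contention behaviour by itself.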

    It's hard to be more specific without code.

    Tim Stefanini
    HashMap is not synchronized, so it cannot be the source of lock contention. What is your SLSB pool size? The contention could be because the pool is too small for the number of concurrent users.
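A toy model of the pool-size point, in modern Java: a fair `Semaphore` with `poolSize` permits stands in for the bean pool, and each simulated user must take a permit before doing its work. This is an illustrative assumption, not how WebLogic implements instance pooling. Callers beyond the pool size simply wait, so the number of beans in use can never exceed the pool size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a bean pool under load; the Semaphore is a stand-in, not a WLS API.
class BeanPoolModel {

    // Runs `users` concurrent requests that each hold a "bean" for workMs,
    // and returns the highest number of beans in use at the same time.
    static int peakBeansInUse(int users, int poolSize, long workMs)
            throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize, true); // fair: callers queue FIFO
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService callers = Executors.newFixedThreadPool(users);
        for (int u = 0; u < users; u++) {
            callers.submit(() -> {
                try {
                    pool.acquire();                 // wait for a free bean
                    try {
                        int now = inUse.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(workMs);       // simulated work inside the bean
                    } finally {
                        inUse.decrementAndGet();
                        pool.release();             // hand the bean back to the pool
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        callers.shutdown();
        callers.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 400 simulated users sharing a hypothetical pool of 100 beans.
        System.out.println("peak beans in use: " + peakBeansInUse(400, 100, 50));
    }
}
```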

  4. Pranab,
    We tried various SLSB pool sizes, going all the way up to 250 beans (for 400 users). But even before we reached that maximum, the server slowed to a crawl.

    The number of application EJBs in use (the offending ones) never reached the configured maximum. I think it peaked at about 150, and then the delays began.

    Any other ideas?
    Thanks for your suggestions.
  5. Tim,
    Thanks for your reply.
    Is the WLS contention issue a known bug? I'm not sure that it was the cause here: we monitored the WLS Console during the stress test and narrowed the offending code down to the calls to the static HashMap from several application EJBs.

    When we replaced the cache code with direct DB access, the timings came straight back down to the 9 ms range.

    > Also, since effectively you have a synchronization issue I
    > wonder if Hashtable might not in fact be faster. You
    > might try it.

    Hmmm... I tried it both with a synchronized HashMap (using Collections.synchronizedMap) and with a regular, unsynchronized HashMap that we managed ourselves. The timings weren't significantly different.
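That comparison can also be repeated outside the container with a rough micro-benchmark like the one below, written for a modern JVM. The thread count, map size and iteration counts are arbitrary assumptions, and JIT warm-up, GC and core count all affect the numbers, so the results are indicative at best. A dedicated benchmarking harness would give more trustworthy figures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Rough sketch: wall-clock time for concurrent reads of a given map.
class MapReadTiming {

    // Starts `threads` readers at the same moment, each doing getsPerThread
    // lookups, and returns the elapsed time in nanoseconds.
    static long timeReads(Map<Integer, Integer> map, int threads, int getsPerThread)
            throws InterruptedException {
        int size = map.size();
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();
                    long sink = 0;
                    for (int i = 0; i < getsPerThread; i++) {
                        sink += map.get(i % size);
                    }
                    // Never true for these non-negative values; it only makes the
                    // result observable so the JIT is less likely to drop the loop.
                    if (sink == -1) System.out.println(sink);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        long t0 = System.nanoTime();
        start.countDown();
        done.await();
        long elapsed = System.nanoTime() - t0;
        pool.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> plain = new HashMap<>();
        for (int i = 0; i < 10_000; i++) plain.put(i, i);
        Map<Integer, Integer> synced = Collections.synchronizedMap(new HashMap<>(plain));
        for (int round = 0; round < 3; round++) { // early rounds double as warm-up
            System.out.printf("plain: %d ms, synchronizedMap: %d ms%n",
                    timeReads(plain, 20, 1_000_000) / 1_000_000,
                    timeReads(synced, 20, 1_000_000) / 1_000_000);
        }
    }
}
```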

    I guess to find the real cause, I'll need to get a profiling tool, huh?
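Short of a full profiler, a thread snapshot taken under load often shows where threads are piling up. The sketch below uses `java.lang.management.ThreadMXBean`, which needs Java 6 or later, so it would not run on the JVM shipped with WebLogic 5.1; on older JVMs a thread dump (for example via `kill -3` on Unix) shows similar information. The method name `blockedThreads` and the report format are our own.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch: list every thread that is BLOCKED on a monitor, with the lock and its owner.
class BlockedThreadReport {

    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Object lock = new Object();
        Thread waiter = new Thread(() -> { synchronized (lock) { } }, "reader-1");
        synchronized (lock) {
            waiter.start();
            // Wait (up to about 5 s) until reader-1 is actually stuck on the monitor.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (waiter.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.yield();
            }
            // Prints something like: reader-1 blocked on java.lang.Object@... held by main
            blockedThreads().forEach(System.out::println);
        }
    }
}
```

In a real stress test, many request threads reported as blocked on the same lock would point at the contention Tim describes.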
    Thanks much for your suggestions.