Optimizing CMP EJB cache strategy in WebLogic


  1. Dmitri Maximovich has written a blog entry on optimizing CMP EJB performance in WebLogic through long-term caching with optimistic concurrency, along with some of the implications and problems caused by doing so. Other application servers might have similar capabilities.

    Read Optimizing CMP Performance in Weblogic with Long-term Caching.


  2. See http://www.jroller.com/page/maximdim/20050328#cache_between_transactions_flush_cache for some good tips on how to flush the cache.
  3. I always see references to how optimistic locking works, and I understand it at the fundamental level of detection and implementation.

    But I never see a good explanation of what happens when you DO detect a problem.

    In an interactive environment, you can bubble the transaction back to the user and, I guess, present them with the latest and greatest version of the data, expecting them to make their changes again. That can be a rather rude User Experience.

    You can abort the process, redo the process, etc.

    It just seems to me that structuring your application to redo transactions based on a concurrency exception VASTLY complicates the application. But maybe I'm mistaken.

    Isn't that what has to happen? It's one of those issues where, when you get the exception, you scratch your noodle and go "now what?", particularly for batch operations where there's no one to ask about correcting the problem.

    So I'm curious how others structure their apps to respond to these kinds of exceptions, as the response is typically handwaved away as an "exercise for the reader" in most discussions of Optimistic Locking that I've seen.
  4. More on EJB/CMP

    Here is a link to another white paper on a similar topic, describing bean states and caches for EJB in general and for Borland Enterprise Server in particular.

    Read: Transaction Commit Options
  5. Will,

    I don't see a contradiction here: either your application needs to detect concurrent updates or it doesn't. Basically you have a choice between three possible 'levels of isolation', if you like:

    - No locking, no optimistic detection. No recovery is needed, but you should be ready to live with lost updates.
    - No locking, optimistic detection. Recovery is needed, but lost updates are not possible. Good performance, because records aren't locked in the database.
    - Pessimistic locking. Records are locked in the database. No need to worry about optimistic detection or lost updates, but performance is bad in multi-user environments, with long-lived transactions, etc.

    So choose what level is best for you and use it. Optimistic locking gives you the ability to detect (and recover from) concurrent updates, but of course, like everything else, it comes at a price: the need to code for recovery. Sometimes that's trivial, sometimes it's more complicated, but there are no unsolvable problems here. In your User Experience example, a better option would be for your application to try to 'merge' the changes (some columns might not have been changed at all, after all) and then present the user with a dialog where he can see the merged changes and has the option to accept/decline/change them (see the sketch after this post).
    Is it more complicated than just overwriting someone else's changes with yours? Sure it is. Is it always acceptable to do that in any application? Certainly not.
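
    For illustration only, here is a minimal sketch in plain Java of what that merge-and-prompt recovery could look like. Everything in it (the map-of-columns shape, the class and method names) is hypothetical, not a WebLogic API:

        import java.util.LinkedHashMap;
        import java.util.Map;
        import java.util.Objects;

        // Hypothetical three-way column merge after an optimistic-concurrency
        // failure. 'base' is what the user originally read, 'theirs' is the row
        // now in the database, 'mine' is the user's edited copy. A column changed
        // by only one side merges silently; a column changed by both sides is a
        // conflict for the user to resolve in a dialog.
        public class OptimisticMerge {

            static final class MergeResult {
                final Map<String, Object> merged = new LinkedHashMap<String, Object>();
                final Map<String, Object> conflicts = new LinkedHashMap<String, Object>();
            }

            static MergeResult merge(Map<String, Object> base,
                                     Map<String, Object> theirs,
                                     Map<String, Object> mine) {
                MergeResult result = new MergeResult();
                for (String col : base.keySet()) {
                    Object b = base.get(col), t = theirs.get(col), m = mine.get(col);
                    boolean iChanged = !Objects.equals(b, m);
                    boolean theyChanged = !Objects.equals(b, t);
                    if (iChanged && theyChanged && !Objects.equals(t, m)) {
                        result.conflicts.put(col, t); // both edited: ask the user
                        result.merged.put(col, m);    // tentatively keep mine
                    } else if (iChanged) {
                        result.merged.put(col, m);    // only I edited: keep mine
                    } else {
                        result.merged.put(col, t);    // theirs, or unchanged
                    }
                }
                return result;
            }

            public static void main(String[] args) {
                Map<String, Object> base = new LinkedHashMap<String, Object>();
                base.put("name", "Acme");    base.put("phone", "555-0100");
                Map<String, Object> theirs = new LinkedHashMap<String, Object>();
                theirs.put("name", "Acme");  theirs.put("phone", "555-0199");
                Map<String, Object> mine = new LinkedHashMap<String, Object>();
                mine.put("name", "Acme Co"); mine.put("phone", "555-0100");

                // I changed only 'name', they changed only 'phone': both merge
                // cleanly, conflicts is empty, and no dialog is needed.
                MergeResult r = merge(base, theirs, mine);
                System.out.println("merged    = " + r.merged);
                System.out.println("conflicts = " + r.conflicts);
            }
        }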
  6. When you get to higher transaction volumes (the number depends on your hardware and configuration, and can be as low as 50 transactions per second), pessimistic update locking can be faster than optimistic detection. In that scenario the recovery process for optimistic locking, which usually boils down to a retry in one form or another (i.e. by a user, or by the specific coding logic you mentioned), can lead to an avalanche effect: the extra retries generated by failures increase transactions per second, which increases failures, and so on (see the retry sketch after this post).

    Most environments are read-mostly, so this is usually not the case. I agree with Dmitri though: you need to look at the problem, requirements and environment at hand. I have found optimistic detection powerful and simple for most scenarios, but there are exceptions. With regard to concurrency warnings, I have found that prompting the user about concurrent updates can actually be a feature in quite a few environments. For example, in CRM or ticketing type scenarios it usually indicates a duplication of work.
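
    To make the avalanche point concrete, here is the shape that retry logic usually takes, as a hedged sketch; the exception type and the numbers are illustrative stand-ins for whatever your server actually throws:

        import java.util.concurrent.Callable;

        // Hypothetical bounded retry around an optimistic-concurrency failure.
        // OptimisticFailureException stands in for whatever your server reports
        // when commit-time verification fails and the transaction rolls back.
        public class BoundedRetry {

            static class OptimisticFailureException extends Exception {}

            static <T> T runWithRetry(Callable<T> tx, int maxAttempts) throws Exception {
                for (int attempt = 1; ; attempt++) {
                    try {
                        return tx.call(); // must re-read fresh data inside the tx
                    } catch (OptimisticFailureException e) {
                        if (attempt >= maxAttempts) {
                            throw e;      // give up and surface it to the caller
                        }
                        // Back off before retrying. Without the attempt cap and
                        // the delay, every failure adds another transaction just
                        // when collisions are already frequent -- the avalanche.
                        Thread.sleep(50L * attempt + (long) (Math.random() * 50));
                    }
                }
            }
        }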
  7. When you get to higher transaction volumes (the number depends on your hardware and configuration, and can be as low as 50 transactions per second)

    How many TPS is considered high?
  8. Depends on the environment

    It's not really an absolute number; it depends more on what the code is doing, the database, the hardware it is running on, the network latency (if distributed), and several other factors. It's not a TPS limit, just a crossover threshold specific to an action in a given environment. You can estimate what that TPS is using a simple probability model based on the time it takes per request vs. the number of requests per second, and get a collision probability (see the sketch after this paragraph). Alternatively, the best approach is to do some good load testing to give you a number where optimistic detection is worse. You always need tolerances for other environmental factors.
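
    For example, if updates to the same hot data arrive independently (a Poisson assumption, deliberately pessimistic since it puts every update on the same rows), the per-transaction collision probability works out as below; the numbers are purely illustrative:

        // Back-of-the-envelope collision estimate for optimistic detection.
        // Assumes updates arrive independently (Poisson) and all contend for the
        // same hot data, so treat the result as an upper bound, not a prediction.
        public class CollisionEstimate {
            public static void main(String[] args) {
                double updatesPerSecond = 50.0; // update TPS against the hot data
                double windowSeconds = 0.020;   // time between read and commit
                // Expected number of competing updates inside our window:
                double lambda = updatesPerSecond * windowSeconds;
                // P(at least one competing update) = 1 - e^(-lambda)
                double pCollision = 1.0 - Math.exp(-lambda);
                // With 50 TPS and a 20 ms window, lambda = 1.0, so about 63% of
                // transactions would hit the retry path -- well past the point
                // where pessimistic locking starts to look attractive.
                System.out.printf("collision probability per tx: %.1f%%%n",
                                  pCollision * 100.0);
            }
        }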

    In a good design you can use optimistic detection for the majority of transactions, which are read-mostly, and pessimistic locking in high-throughput sections (e.g. real-time stats). This usually gives you the best of both worlds.
  9. If you really have long-running, complicated transactions which are not easy to roll back in the case of an optimistic locking exception, and you really need better performance, I think the best solution is to implement them in a stored procedure inside the DB. Enterprise DBs like Oracle do caching themselves and of course have mature transaction handling as well. In my eyes this is the most straightforward and easiest-to-implement solution.
  10. I agree with Manuel; of course optimistic locking is not always faster, everything depends on your specific problem domain. By definition, locking is 'optimistic' because it assumes that lock collisions are rare, which may or may not be true for your specific task. If your application mostly reads data, that's a reasonable assumption; if data is mostly updated (OLTP), that may not be the case. Estimate the collision frequency, run some tests, and then decide what's best for you.

    Btw, the original post was about long-term caching of data, and when you do that you don't want a pessimistic lock to be held in the database, so you pretty much have a choice between optimistic collision detection and no detection at all.
  11. There is no silver bullet

    The point of optimistic concurrency control is that the first time you run the transaction, you can use potentially stale data, and it doesn't muck up the database. Let me explain:

    1. App is deployed on 2 servers
    2. Server 1 runs a TX that includes an EJB with PK "A"
    3. Some time later, Server 2 runs a TX that modifies the EJB with PK "A"
    4. Some time later, Server 1 runs a TX that tries to modify EJB with PK "A"

    In this case, Server 1 can use the cached EJB values for PK "A" b/c it has them from step #2. It doesn't care whether another server has changed the data or not, because it's going to verify that when it tries to commit the TX.
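
    Under the covers, that commit-time verification typically looks like a guarded UPDATE. The following is a generic JDBC illustration of the idea, not WebLogic's actual generated SQL (WebLogic can verify a version column, a timestamp, or the read/modified columns themselves; the table and column names here are made up):

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;

        // Generic sketch of commit-time optimistic verification: the UPDATE only
        // succeeds if the row still carries the version this transaction read.
        public class VerifiedUpdate {
            static void updateBalance(Connection con, String pk,
                                      double newBalance, int versionRead)
                    throws SQLException {
                PreparedStatement ps = con.prepareStatement(
                    "UPDATE account SET balance = ?, version = version + 1 "
                    + "WHERE pk = ? AND version = ?");
                try {
                    ps.setDouble(1, newBalance);
                    ps.setString(2, pk);
                    ps.setInt(3, versionRead);
                    if (ps.executeUpdate() == 0) {
                        // Zero rows matched: another transaction committed first.
                        // Roll back and take one of the recovery paths above.
                        throw new SQLException("optimistic verification failed");
                    }
                } finally {
                    ps.close();
                }
            }
        }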

    What I showed above is the worst case scenario. The best case is when the data is read-intensive, and when the write TXs end up committing the first time (because the data wasn't stale).

    Implementing it with pessimistic transactions would mean both locking the data AND re-reading it. Doing that for every TX could increase the load on the database by 100x or 1000x quite easily.

    That's why WL supports optimistic transactions and the "cache between transactions" optimization. To clean up stale data, WL also has options for clustered messaging, which will invalidate the stale data pretty quickly and mostly reliably.
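
    For reference, those switches live in the WebLogic deployment descriptor. A sketch along these lines should be close (element names as in the WebLogic 8.1 docs; the bean name is made up, and the column-verification choice goes in weblogic-cmp-rdbms-jar.xml, so double-check against your server version):

        <!-- weblogic-ejb-jar.xml: optimistic concurrency plus long-term
             caching for one CMP entity bean (bean name is hypothetical) -->
        <weblogic-enterprise-bean>
          <ejb-name>AccountBean</ejb-name>
          <entity-descriptor>
            <entity-cache>
              <max-beans-in-cache>1000</max-beans-in-cache>
              <concurrency-strategy>Optimistic</concurrency-strategy>
              <cache-between-transactions>true</cache-between-transactions>
            </entity-cache>
          </entity-descriptor>
        </weblogic-enterprise-bean>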

    (And of course if that isn't enough, you always have the option to use Coherence ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Cluster your POJOs!
  12. Why is Coherence better?

    Cameron,

    To clean up stale data, WL also has options for clustered messaging, which will invalidate the stale data pretty quickly and mostly reliably. (And of course if that isn't enough, you always have the option to use Coherence.)

    If WLS is already invalidating the cache across the cluster, why would using Coherence be the better option here? From what I know about Coherence (not much), it could/would invalidate the cache transactionally (I think it's debatable whether this is always superior). Or does it distribute cache updates instead of just invalidation info? It would be interesting to know your opinion. Thanks.
  13. Why is Coherence better?

    If WLS is already invalidating the cache across the cluster, why would using Coherence be the better option here?

    Coherence can avoid the staleness issue altogether, and generally does so with less network traffic than WLS uses for non-reliable invalidation.

    The result is that instead of throwing out (invalidating) the data your app is working with, the cache will have the up-to-date values ready for the app to use.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Cluster your POJOs!
  14. Why is Coherence better?

    Coherence can avoid the staleness issue altogether, and generally does so with less network traffic than WLS uses for non-reliable invalidation.

    For invalidation you just need to transport an object identifier of some sort vs. transporting the whole object state, so how is it possible to use less network traffic in this case?

    Moreover, in WLS your transaction doesn't have to block for cache synchronization; invalidation can happen asynchronously (which doesn't mean it actually does in practice).

    Asynchronous invalidation outside the transaction, combined with optimistic concurrency detection, sounds like a good alternative, and when implemented properly, everything else being equal, it should perform better than a coherent cache, don't you agree?
  15. Why is Coherence better?

    For invalidation you just need to transport an object identifier of some sort vs. transporting the whole object state, so how is it possible to use less network traffic in this case?

    E.g. multicast vs. point-to-point. If you have to multicast the id, it uses a fixed-length packet no matter what. Compared with multicasting across the whole cluster, updating just the node that manages (owns) the value within the cluster, and then point-to-point invalidating only the node(s) that actually have the stale data, will use less network bandwidth.
    Asynchronous invalidation outside the transaction, combined with optimistic concurrency detection, sounds like a good alternative

    It is a good alternative.
    .. and when implemented properly, everything else being equal, it should perform better than a coherent cache, don't you agree?

    Every solution exists for a purpose; there is no "one size fits all". In a large-scale cluster, a coherent partitioned cache scales extraordinarily well compared to an invalidation approach because (with invalidations) you're just transferring the resulting load (cache misses) back to a single-point-of-bottleneck (the database).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Cluster your POJOs!
  16. Why is Coherence better?

    Multicast vs. point-to-point. If you have to multicast the id, it uses a fixed-length packet no matter what. Compared with multicasting across the whole cluster, updating just the node that manages (owns) the value within the cluster, and then point-to-point invalidating only the node(s) that actually have the stale data, will use less network bandwidth.

    I assume that you are referring here to a distributed cache where each piece of data is held on only one cluster node. In this case, speaking strictly about network traffic, you could (potentially) save on the update operation (I guess only when the number of nodes is quite high), but then on a read each node needs to have the object transferred over the network, N-1 times in all, very much the same number of network calls as with a distributed cache with invalidation, so (in terms of network traffic) there isn't much saving here. Since they're all requesting the same piece of data, that data would be cached on the database server as well, so there would be no disk reads in this case either. The overhead is probably higher when reading from the database, because of transactions etc. We could argue about which would be faster, but I guess it depends on the task (I've tried to put rough numbers on it below).
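
    To make the trade concrete, here is a deliberately crude byte-count model in Java. Every number in it is an assumption (object size, id size, reader count), so it only shows which terms matter, not who wins:

        // Rough, illustrative message-size model for one update of a value of
        // 'v' bytes in a cluster, under the two strategies being debated here.
        // All the numbers are assumptions; real protocols differ.
        public class TrafficModel {
            public static void main(String[] args) {
                int v = 400;     // serialized object state, in bytes
                int id = 64;     // datagram carrying just an id (fixed-size packet)
                int readers = 3; // nodes currently caching this value

                // (a) Multicast invalidation: one id-sized datagram on the wire,
                //     then each reader re-fetches the full value from the database.
                int invalidation = id + readers * v; // 64 + 1200 = 1264 bytes

                // (b) Coherent update: push the new value point-to-point to the
                //     owner plus one backup, then id-sized messages only to the
                //     readers holding a stale copy.
                int coherent = 2 * v + readers * id; // 800 + 192 = 992 bytes

                System.out.println("invalidation bytes: " + invalidation);
                System.out.println("coherent bytes:     " + coherent);
                // Which wins depends entirely on v, the number of readers, and how
                // often they re-read -- which is exactly the argument above.
            }
        }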

    Peace ;-)
  17. Why is Coherence better?

    I assume that you are referring here to a distributed cache where each piece of data is held on only one cluster node.

    Owned by one cluster node. Other nodes could also have the same data cached, but the owner node would be aware of that, and would only have to communicate changes or invalidations to nodes that did have the data cached. (This is our LISTEN_PRESENT near cache setting.)
    In this case, speaking strictly about network traffic, you could (potentially) save on update operation

    Correct.
    (I guess only when the number of nodes is quite high)

    I think three nodes would show it ;-)
    so (in terms of network traffic) there isn't much saving here.

    As I said, it depends. One size doesn't fit all, and I wasn't trying to convince you otherwise. That's why we support a large variety of caching options, including replication, partitioning, invalidation, etc.
    The overhead is probably higher when reading from the database, because of transactions etc.

    No, the overhead is higher because all the J2EE servers go to the same single database server. So even if it's a bigger server and it is very well optimized (both of which are generally true), it will still die under the load. Having personally witnessed some of the largest database systems in the world dying (more correctly, "struggling extremely slowly") under load from J2EE applications, I have some reason to believe this. In fact, it was a result of working with the Precise (now Veritas) SQL and J2EE monitoring tools at many different large organizations that we realized that there was a huge need for reliable coherent caches for J2EE applications, and so we wrote Coherence.

    However, I will never be able to convince you of something that you don't want to believe. Drop me an email (my first name at tangosol.com), and I can put you in touch with a reference customer using Weblogic that dropped their page gen time from over 15 seconds average down to 18 milliseconds. As they say, the proof is in the pudding ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Cluster your POJOs!
  18. Why is Coherence better?

    I will never be able to convince you of something that you don't want to believe.

    You will be able to convince me easily by presenting logical explanations :-). I take your point about spreading the load between multiple computers versus hitting one on every request.
    I still have a hard time justifying how multicasting a couple of bytes of object id on the local segment could consume more network traffic than transferring the whole object state (a couple hundred bytes, usually) even to one node (usually two, because of backup storage), though ;-)
  19. Why is Coherence better?

    I still have a hard time justifying how multicasting a couple of bytes of object id on the local segment could consume more network traffic than transferring the whole object state (a couple hundred bytes, usually) even to one node (usually two, because of backup storage), though ;-)

    It may be hard to believe, but 1 byte or 100 bytes or 1000 bytes .. they all seem to use the same amount of network bandwidth in practice (one fixed-length datagram packet).

    With two or three machines, relatively heavy multicast traffic on GigE isn't a big problem. However, that doesn't scale .. again, I've seen the results in the field (we fully support both point-to-point and multicast protocols and dynamically switching between them).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Cluster your POJOs!