TSS Article: Enhancing Web Application Performance with Caching

Discussions

News: TSS Article: Enhancing Web Application Performance with Caching

  1. Effective caching strategies can both lower the memory footprint and speed up your application. In this article, Neal Ford shows you how to implement the Flyweight Design Pattern as a caching mechanism on a sample application. He tests the app to measure heap size and garbage collector activity both before and after implementing caching, using a combination of JMeter and the OptimizeIt Profiler.

    Read Enhancing Web Application Performance with Caching

    Threaded Messages (47)

  2. thread safety ?[ Go to top ]

    Is there any lock from the moment of sorting
    till the List is used to populate output?
    If yes, it is not supposed to be scalable;
    if not, it is not thread safe...

    Just as a mind exercise, I guess the time frame
    between sort and output at 10 ns, considering
    several simple in-memory calls. Take 20 ms of database
    access and 30 ms average overall response time.
    Having 50 threads, we can get 150 TPS. Each request has
    10 ns of unsafe time, or 0.001% (call it "A").

    Next is classical calculation of chances NOT to have collision.
    Approximate situation by assuming random distribution:
    1 - 100%
    2 - 100-A chance to NOT collide
    3 - (100-A)(100-2A)
    ....
    N - ...... (100 - N * A)

    Since A is small -> we have the final (approximate) result as
    100 - A*(N**2)/2 = 100 - 11 = 89 %

    Now we see that A*N is not SO small, thus 89% can be
    slightly revised toward the higher end, to, say, 92%.

    Anyway, a collision will occur approximately once in 12 seconds.

    Alex V.
  3. thread safety ?[ Go to top ]

    Is there any lock from the moment of sorting till a List is used to populate output? If yes, it is not supposed to be scalable; if not, it is not thread safe
    One simple solution is never to invoke sort() on a shared reference. Instead, clone the list whenever it's modified, sort the clone, and then assign the clone's reference to the shared reference. Only writers of the list must synchronize, to preserve the order of writes. Reads of the shared list safely occur without the penalty of synchronization. For a read-mostly list this would happen infrequently, and so be more scalable than you suggest.
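    A minimal sketch of this clone-sort-publish idea (the class and field names here are illustrative, not from the article):

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Readers always see a fully sorted, immutable snapshot;
    // only writers synchronize.
    public class SortedSnapshotHolder {
        // volatile so readers see the newest published snapshot without locking
        private volatile List items = Collections.EMPTY_LIST;

        // Readers: no synchronization, just a reference read.
        public List getItems() {
            return items;
        }

        // Writers: clone, mutate, sort, then publish the new reference.
        public synchronized void add(Comparable item) {
            List copy = new ArrayList(items);
            copy.add(item);
            Collections.sort(copy);
            items = Collections.unmodifiableList(copy);
        }
    }
    ```

    Since readers never touch a list that a writer is sorting, the race the original poster describes cannot occur.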
  4. thread safety ?[ Go to top ]

    Is there any lock from the moment of sorting till a List is used to populate output? If yes, it is not supposed to be scalable; if not, it is not thread safe
    One simple solution is never to invoke sort() on a shared reference. Instead, clone the list whenever it's modified, sort the clone, and then assign the clone's reference to the shared reference. Only writers of the list must synchronize, to preserve the order of writes. Reads of the shared list safely occur without the penalty of synchronization. For a read-mostly list this would happen infrequently, and so be more scalable than you suggest.
    I use a more trivial solution for this problem. Cached objects are always read-only. If I need to sort a list in memory, then I sort a clone, but I never assign this clone to the shared reference. I am not sure it is a very good way, but it works for me. It is possible to implement a very "clever" cache this way (I have not tested this at this time):

    /**
     * returns the shared reference; any "write" on the instance throws
     * UnsupportedOperationException
     */

    Object getReadOnly(Object key);


    /**
     * returns a copy of the cached instance
     */

    Object getForWrite(Object key);

    It is possible to implement some interface for all cacheable objects; it can be implemented by AOP transformation, or manually as wrappers for system classes.

    interface Cachable extends Cloneable {

      void setReadOnly( boolean v );

    }

    It is not very "transparent", but it should not be a very big problem to lose cache transparency for performance.
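    A minimal sketch of the getReadOnly/getForWrite idea, using Lists as the cached values. Note that the JDK's unmodifiable wrappers throw UnsupportedOperationException rather than a custom exception; all names here are illustrative:

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Cached values are always read-only; callers who need to sort or
    // modify ask for a private copy instead.
    public class ReadOnlyCache {
        private final Map cache = Collections.synchronizedMap(new HashMap());

        public void put(Object key, List value) {
            // store an unmodifiable view so any "write" on it throws
            cache.put(key, Collections.unmodifiableList(value));
        }

        // shared reference; mutating the result throws
        // UnsupportedOperationException
        public List getReadOnly(Object key) {
            return (List) cache.get(key);
        }

        // private copy the caller may sort or modify freely
        public List getForWrite(Object key) {
            List shared = (List) cache.get(key);
            return shared == null ? null : new ArrayList(shared);
        }
    }
    ```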
  5. Thread safety[ Go to top ]

    As far as I can see there is a threading issue, at least conceptually.

    There is a fairly big gap between the existence check "if (productDb == null)" and the setting of the value. The code is in the init method, so all is okay. The article mentions this also, but then suggests that the code could be moved to the doGet method, and then there is a problem if another request hops in during the initialisation.

    So IMHO either make the check really thread safe, or just leave it out. But do not suggest moving the code. After all, this is an educational article.
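    For what it's worth, here is a sketch of what a genuinely thread-safe check could look like if the code did live on the request path. The "productDb" name follows the article; loadProducts() is a stand-in for the real database load:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class ProductDbHolder {
        private static List productDb;

        // The whole check-then-set is one synchronized step, so a second
        // request arriving during initialisation simply waits instead of
        // loading the data twice.
        public static synchronized List getProductDb() {
            if (productDb == null) {
                productDb = loadProducts();
            }
            return productDb;
        }

        private static List loadProducts() {
            List products = new ArrayList();
            products.add("sample product"); // placeholder for the real DB query
            return products;
        }
    }
    ```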
  6. Thread safety[ Go to top ]

    To build this type of list I would suggest an initialization servlet. You specify to the container that there should be only one instance of this type of servlet and that init should be called on deployment. This is a great way to initialize these types of caches.

    The doPost/doGet method can then be empty.
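    The container wiring for such an init servlet might look like this in web.xml (the servlet name and class are illustrative):

    ```xml
    <servlet>
        <servlet-name>cache-init</servlet-name>
        <servlet-class>com.example.CacheInitServlet</servlet-class>
        <!-- a positive value tells the container to call init() on deployment -->
        <load-on-startup>1</load-on-startup>
    </servlet>
    ```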
  7. Thread safety[ Go to top ]

    Why use an initialization servlet? Isn't this what ServletContextListeners are for?
  8. Thread safety[ Go to top ]

    Donald: Why use an initialization servlet? Isn't this what ServletContextListeners are for?

    You beat me to it ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  9. Thread safety[ Go to top ]

    I have been using this initialization method for a while and will look at the merits of using ServletContextListeners,

    thanks,
    David
  10. Thread safety[ Go to top ]

    Listeners are fine, but init servlets are nicer since
    * you can have multiple instances of the same class with different init parameters
    * they have a stable identity, so you can remember stuff in an instance variable from init() to destroy() instead of stuffing it into the application context.
  11. Thread safety[ Go to top ]

    To build this type of list I would suggest an initialization servlet. You specify to the container that there should be only one instance of this type of servlet and that init should be called on deployment. This is a great way to initialize these types of caches.
    See: ServletContextListener.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  12. What if we can't...[ Go to top ]

    Some of us are forced to use old app servers that only support the 2.2 Servlet spec :-(

    Ryan
  13. Web Caching!!![ Go to top ]

    The Web Caching explained here is, I feel, an OK technique for storing a small amount of data, but I feel it is of no help when it comes to the caching requirements of a large enterprise application. Do you have any techniques where we can use something similar for large-scale data caching?
  14. Web Caching!!![ Go to top ]

    What do you mean by "large scale"? Do you mean you want to cache much more data, or do it across many machines? If you can explain your requirements, I can suggest some approaches from what we've seen.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  15. Web Caching!!![ Go to top ]

    Hi Cameron,

    Ours is a huge trading application and we are in the middle of a performance improvement process. The requirement is to process 500,00 trades in a given time window. These trades can be in any form... from files, messages, SWIFT etc...

    We perform lots of business validations which refer to our data warehouse data, which is dynamic in nature. But this is the master data against which we perform validations. Now my requirement is to store this data in a cache which will be scalable, near-time (frequently updated) and fast. I want in turn to use this cached data to perform my business validations.

    Sanjeev
  16. Web Caching!!![ Go to top ]

    The Web Caching explained here is, I feel, an OK technique for storing a small amount of data, but I feel it is of no help when it comes to the caching requirements of a large enterprise application. Do you have any techniques where we can use something similar for large-scale data caching?
    Offline operation is a requirement of my mobile web application. Large durable caching is unavoidable. When the Swing client is online it synchronizes its cache (an RDBMS) by a combination of database replication and resubscribing to feeds from non-relational legacy enterprise information systems. Database replication is a conquered problem, and read-only replication is trivial.
  17. This is an antipattern[ Go to top ]

    I think this pattern is useful only in very rare cases and the example the author gives (the product list) is not one of those cases. It just doesn't scale from a design point of view, that is, it gets very complicated and error prone as you add search and sorting criteria. And do you really want a pattern that makes you change your design completely if it turns out that the product list is updated more frequently than you thought? Do you really want sorting done in memory without indexes? What about filtering? Is sorting a single page really a useful thing? What if you want something like a list of all products in a certain price range sorted by category and name? Generally, it doesn't make sense to replicate the functionality of a DBMS in the application.

    I see that this pattern has advantages in comparison to holding a full product list for each session, which is clearly nonsense. But compared with simply retrieving the product list (or just the right page) from the database on each request, this pattern is uselessly complex and inefficient. The DBMS does caching anyway, so if the product list is retrieved very often, it will be in the cache and concurrency, updates and cache expiry are handled automatically. Yes there's a network roundtrip to the DBMS, but on the other hand, we know about the negative effects of in-memory caching on the garbage collector (a full collection takes increasingly more time).

    It should be clearly stated where a pattern like this is appropriate: it is when you have a complex, fairly static object graph that is expensive to recreate. Expensive to recreate means that it has lots of the kinds of relationships that RDBMSs are not particularly good at, like hierarchies or deep containment relationships. Lists of things in different sorting orders and views are exactly the thing you would _not_ want to cache in this way. Views, sorting, filtering and the like need indexes, and those indexes are in the database, so it never makes sense to do these kinds of things outside of the DBMS process. If you find yourself using the subList or subMap or sorting methods in a data-oriented application, this should be the point where you ask yourself: Am I doing something wrong? And the answer will most of the time be: Yes, you're replicating DBMS functionality in your application, and that's bad.

    But if you have such a static complex object graph that has only one useful view, then you should use a professional caching package, that takes care of cache expiry and concurrency.

    -Alexander
  18. This is an antipattern[ Go to top ]

    Yes, it is better not to use an app-level cache in most cases, but sometimes I need to do it for performance reasons and to simulate materialized views in client memory. Probably "product" is not a very good example; a content cache must work better for it (return static pages from memory or a file; the OS caches files too). It is possible to find use cases for an object cache too, but I agree, it is better to find a way without workarounds first.
  19. This is an antipattern[ Go to top ]

    I completely agree with you. I'm not against caching views of complex objects, be that JSP pages or object graphs. What I oppose is creating those views in the application when it's clearly a database task, as in the case of sorted and filtered lists.
  20. I think this pattern is useful only in very rare cases and the example the author gives (the product list) is not one of those cases...I see that this pattern has advantages in comparison to holding a full product list for each session, which is clearly nonsense. But compared with simply retrieving the product list (or just the right page) from the database on each request, this pattern is uselessly complex and inefficient. The DBMS does caching anyway, so if the product list is retrieved very often, it will be in the cache and concurrency, updates and cache expiry are handled automatically.
    The decision as to whether or not the pattern (and caching in general) is useful depends a lot on the complexity of the objects to be cached and how they will be used by the application. I've worked on several "catalog" projects, and I think there is a lot of value in caching products. While first generation product catalogs were generally modeled on a simple database schema, current product "objects" may retrieve data from multiple databases, back-office systems, content management systems, etc. as well as have calculated values (both persisted and non-persisted) to help users find what they're looking for. I would contend that a product list is actually a good example of what to cache in most cases.

    Where things get complicated is in retrieving lists of products. Generally, there are times when you could define common lists of product keys in a cache and times when the list of product keys will have to be retrieved from the database. In either case, depending on the complexity of the product object, it still may make sense to use the product objects stored in the cache once the list of keys is created.
  21. The decision as to whether or not the pattern (and caching in general) is useful depends a lot on the complexity of the objects to be cached and how they will be used by the application. I've worked on several "catalog" projects, and I think there is a lot of value in caching products. While first generation product catalogs were generally modeled on a simple database schema, current product "objects" may retrieve data from multiple databases, back-office systems, content management systems, etc. as well as have calculated values (both persisted and non-persisted) to help users find what they're looking for. I would contend that a product list is actually a good example of what to cache in most cases.Where things get complicated is in retrieving lists of products. Generally, there are times when you could define common lists of product keys in a cache and times when the list of product keys will have to be retrieved from the database. In either case, depending on the complexity of the product object, it still may make sense to use the product objects stored in the cache once the list of keys is created.
    I agree that if you have a product list or product detail objects that are expensive to recreate, and you have only a single view on them, then an in-memory caching strategy might be the right solution. However, if you start adding searching, sorting, grouping and the like, it becomes inefficient and inflexible. Inefficient because you do it without indexes and without the help of an optimizer that calculates the right access paths.

    But performance is not my main concern, since it might not be a problem if the product list is small. My main point is flexibility of change. In my experience, systems that hard-code sorting, searching, grouping, views, etc. become incredibly resistant to change. And that's because it looks so simple in the beginning (just as in the article) but gets out of control very soon. The complexity of a SQL statement grows fairly smoothly as you add searching, sorting and grouping criteria. The complexity increase of procedural code that does the same is insane.

    Again, I'm not against caching as such (although I believe out of process caching makes more sense in most cases) but it should be done by comparing the SQL statement (or some other representation of the query) to see if the same query was submitted and cached before. If, as in your example, the data comes from various, possibly slow, backend systems, I would temporarily store it in the database that services the web application so I can use DBMS facilities to query it.
  22. I think it is better to use something like "city" or "country" as an example; it is not read-only data, but I see no problems with transactions and concurrency control for this kind of data, and it is a good candidate for eliminating joins.
    I found complex queries are a good candidate for this optimization too (it is like materialized views), but I cannot find a good and trivial example for this cache.
    A content cache is a very good optimization for web applications too; it is good for content management systems to cache "news" and "articles", since a single user (the content manager) modifies this content in most cases, and it is possible to generate this content offline and upload it as static HTML pages too (no concurrency control and no DB access for "read"); this kind of "homepage" can scale on an average PC.
  23. This is an antipattern[ Go to top ]

    Hi Alexander
    Generally, it doesn't make sense to replicate the functionality of a DBMS in the application.
    I agree with this sentiment. However, caching that is invisible to the application (i.e. does not add any complexity) is much faster than relying on the DBMS, and a good idea IMHO.
    The DBMS does caching anyway, so if the product list is retrieved very often, it will be in the cache and concurrency, updates and cache expiry are handled automatically. Yes there's a network roundtrip to the DBMS
    In our experience caching in VM is at least an order of magnitude faster than the DBMS.

    In JDO Genie we automatically cache JDOQL queries involving classes that have caching enabled. The cached query results are automatically evicted when classes involved are changed. This is completely invisible to the application so the complexity and design costs you describe are eliminated.

    Cheers
    David
    http://www.jdogenie.com
  24. Performance difference[ Go to top ]

    David: In our experience caching in VM is at least an order of magnitude faster than the DBMS.

    Using our cache software, one of our [reference] customers dropped their dynamic page times from 15 seconds to 18 milliseconds (measuring "time to write last byte.") That's three orders of magnitude, without any content caching (i.e. 100% dynamic page creation with just data caching.)

    Most applications can benefit significantly just by caching reference data that changes rarely or not at all. As for the comments of "is caching the product catalog worth it," the question really is: "Do you want to be able to handle more than a few users on your site?"

    As far as the paper goes, it was explicit in that it was referring only to read-only data. That data is _very_ safe to cache, and you don't even need to worry about clustering concerns (since the data isn't changing.) Most of the time, your friendly neighborhood java.util.Hashtable will do the job quite acceptably.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  25. One technique I've used in the past is to implement a cache using a JNDI object, with bind/rebind operations for locking when writing. That's worked very well for read-only and mostly-read data. When the object changes, it's replicated across the cluster. I have a manager class that centralizes locking of the object, and doesn't lock the object during reads.
  26. One technique I've used in the past is to implement a cache using a JNDI object, with bind/rebind operations for locking when writing. That's worked very well for read-only and mostly-read data. When the object changes, it's replicated across the cluster.
    We're using this technique as well for implementing a cluster-wide read-mostly cache, but provide a version number per JNDI node to avoid reading complex object graphs from JNDI every time, as the deserialization would take too long. This way, each managed complex object is cached JVM-locally, and upon reading, its version is compared to the one in JNDI. If it differs, the new value is read from JNDI; otherwise the local value is returned. Updates are written to JNDI directly, the version is incremented, and the local cache is invalidated.

    Cheers,
    Lars
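    A rough sketch of this version-check scheme, with a plain Map standing in for the JNDI tree (a real implementation would bind and look up entries through InitialContext; all names here are illustrative):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    public class VersionedCache {
        // stands in for the JNDI tree: key -> [version, value]
        private final Map shared = new HashMap();
        // JVM-local cache of the last value and version seen per key
        private final Map localValue = new HashMap();
        private final Map localVersion = new HashMap();

        // writer: store the new value and bump the version
        public synchronized void put(Object key, Object value) {
            Object[] entry = (Object[]) shared.get(key);
            int version = (entry == null) ? 0 : ((Integer) entry[0]).intValue() + 1;
            shared.put(key, new Object[] { Integer.valueOf(version), value });
        }

        // reader: compare versions first; refresh the local copy only on mismatch
        public synchronized Object get(Object key) {
            Object[] entry = (Object[]) shared.get(key);
            if (entry == null) {
                return null;
            }
            Integer version = (Integer) entry[0];
            if (!version.equals(localVersion.get(key))) {
                // version changed: this is where the expensive JNDI
                // deserialization would happen in the real version
                localVersion.put(key, version);
                localValue.put(key, entry[1]);
            }
            return localValue.get(key);
        }
    }
    ```

    The cheap version comparison is what lets readers avoid deserializing the full object graph on every access.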
  27. Performance difference[ Go to top ]

    As for the comments of "is caching the product catalog worth it," the question really is: "Do you want to be able to handle more than a few users on your site?" As far as the paper goes, it was explicit in that it was referring only to read-only data. That data is _very_ safe to cache, and you don't even need to worry about clustering concerns (since the data isn't changing.)
    I agree with you in general, but have some points to add:

    The problem with the aforementioned preconditions is that you will not find many cases where they actually are met, as all instances to be cached must be deeply immutable and no modifications may be made via the database either (the cache would not know when to refresh). On the other hand, a cache holding data that is subject to concurrent updates is not something you want to implement on your own more than once. :)

    I think the problem of these simplifying articles about performance issues is that if things _were_ that simple, performance problems would not appear at all.
    Most of the time, your friendly neighborhood java.util.Hashtable will do the job quite acceptably.
    Actually, HashMap should do better if you have a read-only cache, as you do not need synchronization. And if you want a simple LRU cache with a given maximum size, subclass java.util.LinkedHashMap like this:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LRUCache extends LinkedHashMap {
        private final int maxSize;

        public LRUCache(final int maxSize) {
            // accessOrder = true gives least-recently-used iteration order
            super(maxSize, .75f, true);
            this.maxSize = maxSize;
        }

        protected boolean removeEldestEntry(final Map.Entry eldest) {
            // evict the least recently used entry once the size limit is exceeded
            return size() > maxSize;
        }
    }

    Cheers,
    Lars
  28. This is an antipattern[ Go to top ]

    We have seen great improvements in scalability by caching our read-only data in memory and performing in-memory sorting for simple things like columns or genres.

    We used to go to the database every time for this type of data, and we found that as our subscriber base increased past the two million mark it did not scale. On heavy days like Friday the application server would go into thread starvation, as all the threads would get tied up doing database calls and the network trip.

    Creating simple read-only caches for our products pulled us back from the edge and allowed us to keep the server farm at its current size. We then use Quartz to trigger a refresh of the caches at 3am. Strictly, the refresh could cause a threading problem, but since it is only an issue briefly at 3am (we are a US-only company) I will take the hit. We also put a nice JMX interface on the caches to monitor their use.

    For the more complex types of searches/sorts, yes, we go to the database. However, since 90%+ of people generally just browse our catalog and don't do these types of searches, this still seems a good thing to do.
  29. This is (not) an antipattern[ Go to top ]

    Sorry, meant to change the title
  30. This is an antipattern[ Go to top ]

    We have seen great improvements in scalability by caching our read-only data in memory and performing in-memory sorting for simple things like columns or genres. We used to go to the database every time for this type of data, and we found that as our subscriber base increased past the two million mark it did not scale. On heavy days like Friday the application server would go into thread starvation, as all the threads would get tied up doing database calls and the network trip. Creating simple read-only caches for our products pulled us back from the edge and allowed us to keep the server farm at its current size.... However, since 90%+ of people generally just browse our catalog and don't do these types of searches, this still seems a good thing to do.
    I don't doubt for a second that caching can improve scalability. However, as you say yourself, it only works for simple searches. The problem that I have come across frequently is that requirements change, and what once was a simple search becomes a more complex one; nobody wants to redesign the system, so it degrades into a total mess over time.

    The reason why I think the pattern presented in the article is an antipattern is not so much that it does caching at all, but rather how it does the caching. The caching is not transparent. It hard-codes stuff in the application code that just doesn't belong there. It may make the system more scalable in some situations but it makes the development process scale worse.

    There are better ways to do caching. Why would you not cache query results keyed by query parameters? This way you reduce roundtrips to the DBMS but when the data is eventually refreshed, you pass the job on to the DBMS where it belongs.
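    A sketch of that suggestion: query results cached under a key built from the SQL text plus its parameters, so identical queries hit the cache and a refresh simply hands the work back to the DBMS (runQuery() is a placeholder for the real JDBC call; all names here are mine):

    ```java
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class QueryCache {
        private final Map results = new HashMap();

        public synchronized List getResults(String sql, Object[] params) {
            // the SQL text plus its parameters identify the result set
            List key = Arrays.asList(new Object[] { sql, Arrays.asList(params) });
            List cached = (List) results.get(key);
            if (cached == null) {
                cached = runQuery(sql, params);
                results.put(key, cached);
            }
            return cached;
        }

        // refresh: drop everything and let the DBMS rebuild on the next request
        public synchronized void invalidate() {
            results.clear();
        }

        private List runQuery(String sql, Object[] params) {
            // placeholder; a real implementation would use JDBC here
            return Arrays.asList(new Object[] { sql });
        }
    }
    ```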
  31. This is an antipattern[ Go to top ]

    I must admit I try to take the simplest approach as much as possible to get the deliverable done and meet my functional and non-functional requirements. I know this can go against the grain. I often sit there thinking about all the wonderful things the business could ask for and how I could do them if I implemented some more design. However, I have seldom been successful at guessing requirements.

    "There are better ways to do caching. Why would you not cache query results keyed by query parameters? This way you reduce roundtrips to the DBMS but when the data is eventually refreshed, you pass the job on to the DBMS where it belongs."

    This is a good point. I would try to separate the concerns: caching the product data and static category structure for simple browsing, versus processing/caching the results of user-entered queries. The latter I would hand off either to the database or to Lucene, which I have been looking at.

    You make some good points :-)
    David
  32. In web apps, whether to cache just data or cache entire pages is always a tricky question. Obviously page caching gives a significant benefit (to get a feel for the performance gains, try accessing any web app, intranet or internet, with browser page caching off). Now, the browser's page caching is non-intrusive: it happens without much work by the user or by the server. But browser caching is at each user's end. And not all content is cacheable (most dynamic pages, such as JSPs, are not cached).

    For static pages (cached at the browser), the benefit of caching is the avoidance of the network cost and the cost at the Web Server to serve the page, apart from any scaling costs due to increased loads. Now, the cost (at the server) of serving a dynamic page request is the actual cost of processing the page: the servlet execution cost, which includes the actual code executed, any EJBs accessed, data access (DBMS or any other) and probably access to other systems.

    In the same catalog example used in the article, if instead of caching the data the actual end HTML page (realised by executing the JSP/Servlet) is cached, then apart from avoiding the cost of data access (cached or not) one can also avoid the CPU cost involved in executing the logic in the JSP/Servlet. This is easier said than done. A simple JSP that always has a static realised page is an absolutely trivial case. More often than not, the page would depend on the data at that point in the DB, or depend on input parameters (query string) in the URL, or session state, or header fields, or cookie fields. Any dynamic content cache has to effectively manage all the attributes of a request that identify a specific instance of the realised page. Over and above all, smart and consistent invalidation strategies are needed to ensure no stale/incorrect page gets sent from the cache.

    <vendor>
    Pramati Server includes a very effective and powerful solution for Dynamic Content Caching, covering all the above aspects. Check out the Tech Paper that briefly describes the Dynamic Content Cache capability.
    </vendor>

    Cheers,
    Ramesh
  33. In web apps, whether to cache just data or cache entire pages is always a tricky question.
    Yes, it is a very tricky question. A very trivial feature like a click counter can complicate things: if the page does not scale without a content cache and you need to add this feature after the cache is implemented, the cache strategies become more complicated than the application itself. There are a lot of good ways, but that does not mean I use the "good" way for everything; there must be a good reason to use any workaround or optimization.
  34. Caches are not transactional[ Go to top ]

    Amidst the conversation around caching and its obvious performance gains, let us not forget that, for anything that is updated and not read-only, caches are not transactional.
    Amidst the conversation around caching and its obvious performance gains, let us not forget that, for anything that is updated and not read-only, caches are not transactional.
    Coherence supports transactional caching. You can access any Coherence cache in a transactional manner by either using CacheFactory.getLocalTransaction(NamedCache map) (in which case you will have to manage the transaction lifecycle) or by using our J2CA CacheAdapter (in which case the transaction lifecycle is "driven" by the container managed transaction).

    Coherence also provides cluster-wide locking which allows for the concurrent update of information in a clustered cache. This allows for read and write intensive data to be cached or managed in the application tier.

    Later,
    Rob Misek
    Tangosol, Inc.
    Coherence: It just works.
  36. InCoherence[ Go to top ]

    "Coherence supports transactional caching."

    yes, but I would never purchase a product from someone who thinks developers are paid too much :) j/k
  37. InCoherence[ Go to top ]

    "Coherence supports transactional caching."

    "yes, but I would never purchase a product from someone who thinks developers are paid too much :) j/k"
    Very funny indeed. I had been meaning to jump in on that thread as well. Your comment prompted me to do so here

    Later,
    Rob Misek
    Tangosol, Inc.
    Coherence: It just works.
  38. Tangosol[ Go to top ]

    Where are you guys located?
  39. Tangosol (inBoston)[ Go to top ]

    Hi Rickson,

    We are located in Boston... Davis Square, Somerville to be exact.

    Later,
    Rob Misek
    Tangosol, Inc.
    Coherence: It just works.
  40. BOSTON??? Git a rope![ Go to top ]

    Rob, sorry, too cold for me :) Take care
  41. Coherence inCold[ Go to top ]

    Hi Rickson,

    You can run Coherence in cold weather areas as well ;-).

    Later,
    Rob Misek
    Tangosol, Inc.
    Coherence: It just works.
  42. In web apps, whether to cache just data or cache entire pages is always a tricky question.
    Yes, it is a very tricky question. A very trivial feature like a click counter can complicate things: if the page does not scale without a content cache and you need to add this feature after the cache is implemented, ...
    I wasn't referring to a case where an application has to implement a page-caching solution. Surely this is very complex, and is easier done within the App Server. I was referring to a transparent page-caching solution, either by 3rd-party vendors or by the App Server vendors themselves. Transparent to the application: it should come with no change needed to the app.
    (Pramati's solution is part of the Web Container's caching solution, for static and dynamic content.)

    Cheers,
    Ramesh
  43. I see no problems with plugging in an open source or homemade filter and having the same transparency; I do not think it is an important feature for the container.
  44. I see no problems to plug open source or homemade filter and to have the same transparence, I do not think it is important feature for container.
    It really depends on what it is you are trying to accomplish. For a small, 99% static site with relatively few variables, yes, you can do it easily. We looked at adding functionality like this, and (to put it simply) there were a lot of variables in the "caching function" that made it much more difficult as a general-purpose solution than we originally thought it would be. Take the i18n/l10n issues such as character sets as just one example. Security issues are another example.

    In a way, I guess I'm agreeing with you -- a homemade filter is probably much easier by a long shot b/c it can address the 80/20 rule very easily. A general purpose approach that would suitably address most applications is a very difficult endeavor (IMHO).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  45. Both caches (content and data-access level) are good for some use cases, and both can be useless. The first page of this site is a good example for a content cache. i18n and authorization are not a problem (it is very trivial to generate a composite key). Probably some JSP tag is better for a content cache than a filter in more complex use cases:


    <cache:include page='myPage.do' keyGenerator='secureI18n' expires='2h' />

    or

    <cache:body key='$request.getParameter("id")' expires='2h' >

     <!-- JSP code -->

    </cache:body>



    It is more problematic to invalidate this kind of cache; "clearAll" can solve some problems. Caching all HTTP GET requests and clearing all pages on any POST in a filter is the most trivial approach for pseudo read-only web sites like TSS.
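    As a rough illustration of that "cache every GET, flush everything on POST" policy, here is a minimal sketch in plain Java. The PageCache class and its method names are invented for this example; a real implementation would live inside a servlet filter wrapping the response:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Minimal sketch of the "cache every GET, flush on any POST" policy.
// PageCache and its method names are illustrative, not from any framework.
public class PageCache {
    private final Map<String, String> pages = new ConcurrentHashMap<>();

    // Serve a GET: return the cached page, or render it once and cache it.
    public String get(String uri, Supplier<String> renderer) {
        return pages.computeIfAbsent(uri, k -> renderer.get());
    }

    // Any POST invalidates everything -- crude, but safe for a
    // pseudo read-only site where writes are rare.
    public void onPost() {
        pages.clear();
    }

    public static void main(String[] args) {
        PageCache cache = new PageCache();
        String first = cache.get("/index.jsp", () -> "rendered at t0");
        String second = cache.get("/index.jsp", () -> "rendered at t1");
        System.out.println(first.equals(second)); // second hit served from cache
        cache.onPost();                           // a POST clears all pages
        System.out.println(cache.get("/index.jsp", () -> "rendered at t2"));
    }
}
```

    The obvious cost is that a single POST throws away every cached page, which is exactly why this only pays off on a read-mostly site.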
  46. Caching is an idea that, I think, every Java developer has experimented with at some point, and so have I. My experiments resulted in an approach that I have already used in several projects. My innovations, though, were targeted not at performance by itself but, in the first place, at scalability, extensibility and development cost/speed.
     
    1. I use entity EJBs for data mapping and immutable Value Objects for inter-tier data transfer and as cache entries.
    2. My VO Cache is (roughly) just a Hashtable, with PKs as the keys.
    3. I always delegate sorting/filtering to the underlying DB, because:
        - this is simply the kind of task it does best
        - having two different mechanisms in one app would cause lots of headaches and misleading results in the future

    I have a special component, a QueryEngine, that queries the underlying database and passes back (to the presentation tier) an already filtered and sorted list of entity IDs; I call them ID Vectors. These vectors need not be flat, i.e. one-dimensional; they can be two-dimensional arrays as well, for example when you need to select and sort or group one entity by another (for example, product/supplier pairs). Then, at the presentation tier, these ID vectors are 'dereferenced' into detailed, user-readable form with the help of special VOFactory objects, which hide (wrap) EJB access. Switching this scheme to 'caching factories' should give the clue: instead of bothering the EJB tier, these objects do a cache lookup first.
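    A minimal sketch of that dereferencing step, with invented names (VoFactory, loadFromStore) standing in for the real VOFactory and the EJB lookup behind it:

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

// Sketch of the scheme described above, with invented names: the
// QueryEngine returns an "ID vector", and a caching VOFactory
// dereferences each ID, hitting the backing store only on a miss.
public class VoFactory {
    private final Map<Integer, String> cache = new Hashtable<>(); // PK -> immutable VO
    private int storeHits = 0; // counts trips to the EJB/database tier

    // Stand-in for the entity EJB / database lookup.
    private String loadFromStore(int pk) {
        storeHits++;
        return "VO#" + pk;
    }

    public String lookup(int pk) {
        String vo = cache.get(pk);
        if (vo == null) {          // miss: fetch once, then cache by PK
            vo = loadFromStore(pk);
            cache.put(pk, vo);
        }
        return vo;
    }

    // Dereference a sorted/filtered ID vector into display-ready VOs,
    // preserving the order the QueryEngine chose.
    public List<String> dereference(int[] idVector) {
        List<String> result = new ArrayList<>();
        for (int pk : idVector) result.add(lookup(pk));
        return result;
    }

    public int getStoreHits() { return storeHits; }

    public static void main(String[] args) {
        VoFactory factory = new VoFactory();
        factory.dereference(new int[] {3, 1, 2}); // three misses
        factory.dereference(new int[] {2, 3, 4}); // only 4 is a miss now
        System.out.println(factory.getStoreHits());
    }
}
```

    The point of the split is that sorting and filtering stay in the database while the cache only ever answers by-primary-key lookups, which are trivially correct to cache.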

    To achieve production quality and reuse in subsequent applications, various facilities were needed: EJB-VOCache synchronization, cache overflow handling and so on. None of them is very complex, but they are still too big to explain in detail here.

    --
    Mike Skorik
    J2EE Architect
    www.100kSolutions.com
  47. I have a special component, a QueryEngine, that queries the underlying database and passes back (to the presentation tier) an already filtered and sorted list of entity IDs ... Then, at the presentation tier, these ID vectors are 'dereferenced' into detailed, user-readable form with the help of special VOFactory objects, which hide (wrap) EJB access.
    Don't you fall into the n+1 trap here? A query with, say, 50 results may lead to 51 DB hits, depending on how well your cache is filled. This is IMHO the way you shouldn't do it UNLESS it is technically impossible to return the values within the query (non-relational query, third-party systems involved, ...).

    Admittedly, this is the way we do it too at the moment, but the technical limitations are obvious.

    Or didn't I understand your scheme correctly ?
    Matthias
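    For what it's worth, one common way out of the n+1 trap, assuming the values can be fetched by key at all, is to collect the cache misses and retrieve them in a single batched query (e.g. WHERE pk IN (...)) instead of one lookup per row. A rough sketch with invented names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Escape from the n+1 trap: collect the IDs that miss the cache and
// fetch them in one batched query, instead of one round trip per row.
// BatchingFactory and batchLoad are illustrative names only.
public class BatchingFactory {
    private final Map<Integer, String> cache = new HashMap<>();
    private int queries = 0; // counts round trips to the database

    // Stand-in for "SELECT ... WHERE pk IN (ids)": one query, many rows.
    private Map<Integer, String> batchLoad(List<Integer> ids) {
        queries++;
        Map<Integer, String> rows = new HashMap<>();
        for (int id : ids) rows.put(id, "VO#" + id);
        return rows;
    }

    public List<String> dereference(int[] idVector) {
        // Pass 1: find the misses.
        List<Integer> misses = new ArrayList<>();
        for (int pk : idVector)
            if (!cache.containsKey(pk)) misses.add(pk);
        // Pass 2: one round trip for all misses, however many there are.
        if (!misses.isEmpty()) cache.putAll(batchLoad(misses));
        // Pass 3: assemble in the order the query engine chose.
        List<String> result = new ArrayList<>();
        for (int pk : idVector) result.add(cache.get(pk));
        return result;
    }

    public int getQueries() { return queries; }

    public static void main(String[] args) {
        BatchingFactory f = new BatchingFactory();
        f.dereference(new int[] {1, 2, 3, 4, 5}); // one query for 5 misses
        f.dereference(new int[] {4, 5, 6});       // one query for the lone miss
        System.out.println(f.getQueries());
    }
}
```

    This keeps the "cheap ID vector plus by-key cache" structure but caps the worst case at two round trips per request instead of n+1.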
  48. Matthias,

    Of course, this caching scheme works best with rarely updated lists, or under conditions where there is an 'active area' in a huge amount of data, for example the frequently retrieved first pages of a product catalogue. For tasks where 'random' accesses take place, for example in reporting tasks (with lots of sorting, filtering and ordering options available), I slightly deviate from my canvas and obtain a completely or partially detailed list from the QueryEngine. It's possible to do this without introducing any changes.