Java Development News:

Taking the load off: OSCache helps databases cope

By Andres March

01 Apr 2005 | TheServerSide.com

Do you think that you can solve your J2EE performance problems by just adding more application servers? Why? Chances are that adding more servers will simply stress your infrastructure more, making the problem worse, not better. More queries per second is the last thing the database needs. Your database will not scale as well as your application servers because of its responsibility to handle data replication and consistency across the cluster. What you may need is a caching product that reduces processing requirements, such as querying the database on every request. One of these caches is the open source product OSCache. It may not be the best solution in all cases but, since I am a maintainer of OSCache, it is the one I will be reviewing.

OSCache offers several types of caching:

  • POJO Caching
  • HTTP Response Caching
  • JSP Tag Library Caching
  • O/R Data Access Caching

POJO caching

POJO caching is the best place to start to learn how OSCache works. All the other types of caching build upon this API to provide some particular feature. Caching plain old Java objects is the most flexible approach, but it will not perform as well as other forms of caching that intercept request processing earlier. For example, POJO caching is not tied to any particular view technology; however, it does not eliminate any of the costly view processing that is alleviated by the HTTP response and JSP tag caches. I recommend using one of the more specific cache implementations, if one exists for your situation.

Installation

When using the caching API directly, installation is trivial. Just download the jar and put it in your application's classpath. If you need a configuration other than the default, you will need a file named oscache.properties in your classpath as well.

Configuration

The default configuration will be sufficient to get started with most applications but nobody would want to go to a production environment with the defaults. The OSCache wiki defines all the configurable properties and when to use them. There are several important ones to note at this time.

cache.capacity

You cannot cache everything since you have limited heap space. Even with a very large heap, caching too much can degrade performance due to the need for the cache to manage concurrency. OSCache only synchronizes on cache writes, so performance will be better with a higher read-write ratio. cache.capacity sets the number of objects that can exist in the in-memory cache. The default value is unlimited, so this is definitely something you will want to change at some point. Unfortunately, the capacity cannot be specified in terms of heap usage. This restriction is usually not a problem but you may have to spend some time tuning this parameter to find the ideal value for your application.

cache.blocking

In a highly concurrent environment, multiple threads may try to update the same cache entry at the same time. OSCache synchronizes this operation, so that the updates happen serially. In order to update an entry, the thread first obtains a lock on that entry. This lock is obtained automatically when a thread retrieves a stale entry, so that the thread has a chance to retrieve the most current data and update the cache. cache.blocking dictates how subsequent cache requests for the same expired entry behave. The default of false is desirable in most cases. This setting means that the first thread will be responsible for refreshing the stale entry (the only one given the lock), while the others will just return the stale entry. While this setting allows for the highest concurrency, it may be desirable to set cache.blocking = true in order to prevent stale data from being served at all. As the name suggests, this setting would cause the other threads to block while the first is updating the cache.
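To make the two settings concrete, the behaviour described above can be written down as a small decision function. This is only a sketch of the rules as this section states them, not OSCache's internal logic; the Outcome enum and the decide method are hypothetical names invented for the illustration.

```java
// Illustrative sketch only: models the rules described in the text,
// not the actual OSCache implementation.
enum Outcome { SERVE_FRESH, REFRESH_AS_OWNER, SERVE_STALE, WAIT_FOR_UPDATE }

final class BlockingSketch {

    /**
     * @param stale            whether the requested entry has expired
     * @param updateInProgress whether another thread already holds the update lock
     * @param blocking         the value of cache.blocking
     */
    static Outcome decide(boolean stale, boolean updateInProgress, boolean blocking) {
        if (!stale) {
            return Outcome.SERVE_FRESH;        // nothing to do, return cached value
        }
        if (!updateInProgress) {
            return Outcome.REFRESH_AS_OWNER;   // first thread gets the lock and refreshes
        }
        // Another thread is already refreshing this entry.
        return blocking ? Outcome.WAIT_FOR_UPDATE : Outcome.SERVE_STALE;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, true, false));
        System.out.println(decide(true, true, true));
    }
}
```

The only place the setting matters is the last line of decide: a second thread that finds a stale entry already being refreshed either waits or takes the stale value.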

cache.algorithm

Once the cache reaches its specified cache.capacity, the next entry added will bump an existing entry out of the cache. cache.algorithm allows you to specify the algorithm class that OSCache uses to determine which entry to remove. First In – First Out and Least Recently Used are two algorithms included in the distribution but you are free to provide your own. OSCache also provides an unlimited algorithm that never removes entries from the cache. This algorithm is always used when you do not specify a cache.capacity.
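As a rough illustration of what a Least Recently Used policy does once the capacity is reached, here is a minimal sketch built on the JDK's LinkedHashMap. It is not the algorithm class shipped with OSCache; LruSketch and LruDemo are hypothetical names used only here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a capacity-bound map that evicts the least recently used entry.
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, oldest access first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put; returning true drops the least recently used entry.
        return size() > capacity;
    }
}

final class LruDemo {
    static String run() {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");       // "a" becomes the most recently used entry
        cache.put("c", "3");  // capacity exceeded, so "b" is evicted
        return cache.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "[a, c]"
    }
}
```

A First In, First Out policy would differ only in ignoring reads: the entry inserted earliest is removed regardless of how recently it was used.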

cache.persistence.class

In addition to the memory cache, a persistent cache can be configured as well. The original motivation for the disk cache was to make cached data available across application restarts. The benefit of this configuration depends upon how expensive it is to repopulate the cache from the original data source. It is relatively expensive to serialize and deserialize cache entries to and from a hard disk, so the disk cache should be used with extreme caution. This feature is not intended to allow multiple cache instances to access the same disk repository. It is bad enough that the disk access is inline with the main thread. Multiple instances would cause high cache contention and possibly corruption.

To initialize a disk cache, add the following to your oscache.properties:

<code>
cache.persistence.class=com.opensymphony.oscache.plugins.diskpersistence.HashDiskPersistenceListener
</code>

cache.persistence.overflow.only

A new feature of OSCache makes using a disk cache slightly more feasible. When the property cache.persistence.overflow.only is set to true, the disk cache will only be used once the memory cache.capacity value has been reached. When objects are removed from the memory cache according to the configured algorithm, they are placed in the disk cache. This configuration still has a performance impact related to serialization costs but is useful in some situations. For example, this setup might make sense if you want your memory cache capacity to be only big enough to handle average load but want to provide a larger buffer for the database under peak load.

Usage

The basic usage of the OSCache API is relatively simple. To create a cache you should construct an instance of the GeneralCacheAdministrator. This object can be configured in one of three ways:

  • with the default configuration.
  • with an oscache.properties file on the classpath.
  • with a Properties object passed to the constructor.
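The third option could look roughly like the sketch below. The property names are the ones discussed in the Configuration section, the values are arbitrary, and the LRUCache class name is assumed to be that of the bundled Least Recently Used algorithm, so check it against your OSCache version. OsCacheConfigSketch is a hypothetical name, and the constructor call is left in a comment so that the snippet compiles without the OSCache jar.

```java
import java.util.Properties;

// Sketch only: builds the settings in code instead of oscache.properties.
final class OsCacheConfigSketch {

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("cache.capacity", "1000");  // arbitrary value, tune per application
        props.setProperty("cache.blocking", "false"); // serve stale content while one thread refreshes
        // Assumed class name of the bundled LRU algorithm; verify before use.
        props.setProperty("cache.algorithm",
                "com.opensymphony.oscache.base.algorithm.LRUCache");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With the OSCache jar on the classpath, the object would be passed to the constructor:
        // GeneralCacheAdministrator admin = new GeneralCacheAdministrator(props);
        props.list(System.out);
    }
}
```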

After an instance of the cache administrator has been created, you can use it to add, update, and flush entries in the cache. Consider the following:

<code>
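import com.opensymphony.oscache.base.NeedsRefreshException;
import com.opensymphony.oscache.general.GeneralCacheAdministrator;

// Created once and shared; this one uses the default configuration
GeneralCacheAdministrator admin = new GeneralCacheAdministrator();

String myKey = "myKey";
String myValue;
int myRefreshPeriod = 1000; // refresh period in seconds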
try {
    // Get from the cache
    myValue = (String) admin.getFromCache(myKey, myRefreshPeriod);
} catch (NeedsRefreshException nre) {
    try {
        // Get the value (probably from the database)
        myValue = "This is the content retrieved.";
        // Store in the cache
        admin.putInCache(myKey, myValue);
    } catch (Exception ex) {
        // We have the current content if we want fail-over.
        myValue = (String) nre.getCacheContent();
        // It is essential that cancelUpdate is called if the
        // cached content is not rebuilt
        admin.cancelUpdate(myKey);
    }
}
</code>

Cache Entry Expiration

As you can see, the main effort in using the API is dealing with the NeedsRefreshException. This exception is thrown when the entry is stale. The example above shows a typical way to update an entry that has been requested but is stale. Another feature to note is the second catch block, which provides the failover feature. If an exception occurs while updating a stale entry, the myValue reference is set to the stale entry value through a method of the NeedsRefreshException called getCacheContent(). This way a result can always be returned, even when the database is not available. Finally, the most important aspect of this example is the call to cancelUpdate() at the end of the second catch block. This method releases the update lock on the cache entry. If the update is unsuccessful and this method is not called, other threads could block indefinitely.

So how does a cache entry go stale? Entries are checked for staleness only at the time of retrieval. That is, there is no background thread tying up resources iterating the cache searching for entries to mark stale. The determination that they need refreshing is a product of how they were put in the cache and how they are being retrieved. Each cache entry can be placed in the cache with a custom EntryRefreshPolicy, if desired. The cache filter uses an implementation of this interface that examines the Expires header of the HTTP response. The retrieval of the entries can also include a refresh period in seconds and/or a cron expression. Here is the order of precedence in determining expiration of a cache entry:

  1. A refresh period of 0 will always require a refresh.
  2. The EntryRefreshPolicy is examined, if one exists.
  3. The refresh period is used in the absence of an EntryRefreshPolicy.
  4. The cron expression can force a refresh if none of the above already has.
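The order of precedence above can be sketched as a single method. This is an illustration of the four rules as listed, not the OSCache source; RefreshPolicySketch is a hypothetical stand-in for EntryRefreshPolicy, the cron evaluation is reduced to a boolean, and treating a negative refresh period as "never expires by age" is an assumption.

```java
// Hypothetical stand-in for a per-entry refresh policy; not the OSCache interface.
interface RefreshPolicySketch {
    boolean needsRefresh(long lastUpdateMillis, long nowMillis);
}

final class StalenessSketch {

    static boolean isStale(long lastUpdateMillis, long nowMillis, int refreshPeriodSeconds,
                           RefreshPolicySketch policy, boolean cronMatchedSinceLastUpdate) {
        // 1. A refresh period of 0 always requires a refresh.
        if (refreshPeriodSeconds == 0) {
            return true;
        }
        boolean stale;
        if (policy != null) {
            // 2. A custom policy, if present, decides.
            stale = policy.needsRefresh(lastUpdateMillis, nowMillis);
        } else {
            // 3. Otherwise compare the entry's age with the refresh period.
            //    Assumption: a negative period never expires by age.
            stale = refreshPeriodSeconds > 0
                    && nowMillis - lastUpdateMillis >= refreshPeriodSeconds * 1000L;
        }
        // 4. A cron match since the last update can still force a refresh.
        return stale || cronMatchedSinceLastUpdate;
    }

    public static void main(String[] args) {
        // Entry updated 5 seconds ago, 10 second refresh period, no policy, no cron match.
        System.out.println(isStale(0L, 5_000L, 10, null, false)); // prints "false"
    }
}
```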

The cron expression is a very powerful feature of OSCache. It allows very fine-grained control of the scheduling of cache entry expiration. It has a similar syntax to cron job specifications. If the expression matches a date/time between the last update of the entry and the current date/time, the entry will be considered stale at retrieval time.

Cache Entry Flushing

Of course, if you do not want to wait for an entry to expire, you can execute a flush directly on an entry. This could be tedious with many thousands of entries. To make cache management easier, OSCache has the ability to group entries when putting or updating them in the cache. It is possible to specify a string array of group names to which an entry belongs when placing it into the cache. That is, a group can contain many entries and an entry can belong to many groups. This feature enables flushing by group name instead of by an individual entry's key. For example, you can group entries that must be flushed upon the execution of some business event that invalidates the cache. The Hibernate plugin described later also uses the group functionality to manage the entries for each domain class.
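The many-to-many relationship between entries and groups can be pictured with a small self-contained sketch. It only models the idea: GroupedCacheSketch and its methods are hypothetical names, and removing flushed entries outright is a simplification made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative model of group-based flushing; not the OSCache implementation.
final class GroupedCacheSketch {
    private final Map<String, Object> entries = new HashMap<>();
    private final Map<String, Set<String>> keysByGroup = new HashMap<>();

    // An entry may be registered under any number of groups.
    void put(String key, Object value, String... groups) {
        entries.put(key, value);
        for (String group : groups) {
            keysByGroup.computeIfAbsent(group, g -> new HashSet<>()).add(key);
        }
    }

    Object get(String key) {
        return entries.get(key);
    }

    // Flushing a group drops every entry registered under that group name.
    void flushGroup(String group) {
        Set<String> keys = keysByGroup.remove(group);
        if (keys != null) {
            entries.keySet().removeAll(keys);
        }
    }

    public static void main(String[] args) {
        GroupedCacheSketch cache = new GroupedCacheSketch();
        cache.put("product:1", "Widget", "products", "currencyData");
        cache.put("rate:USD", "1.00", "currencyData");
        cache.flushGroup("products");               // e.g. after a catalog update
        System.out.println(cache.get("product:1")); // prints "null"
        System.out.println(cache.get("rate:USD"));  // prints "1.00"
    }
}
```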

HTTP Response caching

An HTTP response cache is the earliest point at which content can be cached on the server side. This type of cache is suitable when all the information needed in the cache key can be expressed in the URL, including request parameters. For example:

http://www.example.com/myPage.jsp?test=value

This cache becomes less suitable once the URL parameters that make up the key become so diverse that a response needs to be generated per user. For example, a site containing static HTML could cache all the page responses indefinitely in memory using an HTTP response cache and avoid most of the request processing for the site. However, if this site is made up of JSPs and one requirement is to display the user name at the top of each page, the previous scenario is less than enticing. Each page of the site could end up being cached per user, thereby eliminating most of the benefits of a cache. As you will see in the JSP caching section, the cache tag libraries avoid this problem by allowing developers to specify which parts of a JSP should be cached.

OSCache implements HTTP response caching as a servlet filter. The cache filter abstracts usage of the OSCache API in order to make this type of caching simpler.

Installation

Installation of the filter adds only one more step to the setup of OSCache. All that is required is an entry for the filter and its filter mappings in web.xml. Here is an example:

<code>
<filter>
    <filter-name>CacheFilter</filter-name>
    <filter-class>com.opensymphony.oscache.web.filter.CacheFilter</filter-class>
    <init-param>
         <param-name>time</param-name>
         <param-value>600</param-value>
    </init-param>
    <!-- do we really want a cache per session -->
    <init-param>
         <param-name>scope</param-name>
         <param-value>session</param-value>
    </init-param>
</filter>
</code>

Configuration

As you can see from the example, some of the configuration of the filter is also done in web.xml. The init-param elements are optional. The first allows you to specify a time in seconds for the refresh period that is used when getting the cache entry, as described in the previous section. The default is one hour. The second init-param sets a scope for the cache. The default scope is a single cache for the entire application and is recommended. As I have previously stated, I think there are only a small number of special use cases for a cache-per-user configuration.

Usage

Usage of the filter is declarative and also resides in the web.xml. The url-pattern elements are the standard method of indicating to the container when to apply a servlet filter. You just need to create filter-mapping elements like below:

<code>
<filter-mapping>
    <filter-name>CacheFilter</filter-name>
    <url-pattern>*.jsp</url-pattern>
</filter-mapping>
</code>

JSP caching

As mentioned above, a JSP tag library cache can be used to avoid caching user specific or highly dynamic parts of a page. The cache tags are set around only the parts of the page to be cached. This functionality allows parts of the page to be generated dynamically for each request and other parts to be read from the cache. The key can be automatically generated or one can be specified.

Installation & Configuration

There are no additional installation steps to use the JSP tags if you are running in a servlet API 2.3 container. If you are running in an earlier version, you will need to copy oscache.tld to a directory in your web application and declare it in your web.xml. Here is an example:

<code>
<taglib>
    <taglib-uri>oscache</taglib-uri>
    <taglib-location>/WEB-INF/classes/oscache.tld</taglib-location>
</taglib>
</code>

The tag library is made up of only four tags: <cache>, <usecached>, <flush>, and <addgroup>. Before you start using these tags, you will need to declare the tag library in your JSP. Below is the declaration to use in a 2.3 container.

<code>
<%@ taglib uri="/oscache" prefix="cache" %>
</code>

Starting with OSCache 2.1.1, the URI has been changed to http://www.opensymphony.com/oscache . For pre-2.3 API servlet containers, the URI is simply the one you declared in the web.xml.

Usage

The <cache> tag allows you to wrap the content you want cached. The simplest of examples shows how easy it is to use the tags but not the flexibility that they offer:

<code>
<cache:cache>
    ... some jsp content ...
</cache:cache>
</code>

Notice that no key is specified in the example. OSCache uses the request URI and any query parameters to generate the default key. If you will have more than one set of tags on a single page, it is wise to specify a key attribute. Although the tag will give distinct keys to multiple instances within the same page (by appending an integer to the key), data could be retrieved incorrectly if the flow of execution of the page is not the same every time. The other attributes available for this tag enable you to take full control of the refresh policy, groups, and scope.

As with the OSCache API usage, the tag library enables pages to still be executed successfully if an error occurs while retrieving current data. Here is another example:

<code>
<cache:cache key="$product.id" time="-1" groups="currencyData, categories">
     <% try { %>
      ... some jsp content ...
     <% } catch (Exception e) { %>
              <cache:usecached />
     <% } %>
</cache:cache>
</code>

The <usecached> tag replaces the body content with the expired content from the cache by accessing the NeedsRefreshException, similar to the earlier example that uses the API directly. This gives the user the greatest control over exceptional conditions, but it would be nice if, in the future, OSCache provided TryCatchFinally handling in the <cache> tag itself that would serve the expired content in exception cases.

O/R data access cache

Really a subset of POJO caches, data access caches are available in several O/R mapping libraries for objects returned from the database (sometimes referred to as domain objects). Some, like Hibernate, also allow you to plug in third-party cache products, such as OSCache. If HTTP response and JSP tag library caching are insufficient, a data access cache may be your only recourse. POJO caching may be possible above the data access layer, but if you are using a tool like Hibernate, there is good reason to use its cache facilities. First, domain objects retrieved through Hibernate (and other tools like it) may really be proxies that need to be disconnected before being stored across requests. This means you will need to initialize the domain object attributes that will be used by the rest of the application. Second, tools such as Hibernate provide a pluggable architecture that allows different cache implementations to be used within their established (and well-tested) framework. This also keeps the details of cache usage abstracted from core development tasks.

Installation

Setting up OSCache in Hibernate is as simple as specifying an adapter class in the Hibernate configuration file. The only difficulty at this time is that the version of this adapter (called a CacheProvider) that is bundled with Hibernate is not suitable for clustering. See the JGroups section for more on this topic. I recommend using the patches provided by OSCache on the wiki. After they have been downloaded, place an entry in the hibernate.cfg.xml of your application:

<code>
<property name="hibernate.cache.provider_class">my.patched.provider.package.OSCacheProvider</property>
</code>

Configuration

When using OSCache with Hibernate, you have access to all the configuration properties available when using the API directly. In addition, the CacheProvider has the concept of regions which allow you to declaratively specify different refresh periods or cron expressions for each Hibernate domain class. The property names to specify are by default the names of the domain classes. For example:

<code>
com.mypackage.domain.Customer.refresh.period = 4000
com.mypackage.domain.Customer.cron = * * 31 Feb *
</code>

Usage

When using Hibernate with OSCache, you delegate all responsibility for cache management to Hibernate. While many of your data access operations have an effect on the cache, you do not access the cache directly. Be aware that you are giving up fine-grained cache control in this situation.

Clustering with JGroups

In conjunction with determining the type(s) of caching you will need, you will also need to consider how many instances of your application will be running. Since the application will sometimes pull from the cache instead of hitting the database, multiple instances may have to communicate in a cluster to keep their data synchronized.

Running a clustered application can offer some unique challenges of its own. Some cache products try to solve the issue of state management across a cluster. Complete cache management in a cluster would need to include transactional replication of state. The granularity of this replication determines how much data will need to fly around your network. OSCache does not offer this degree of state management in a cluster. OSCache only permits coordinating flushing at this time. This feature allows you to invalidate an entry in the cache and have that invalidation propagated to other nodes in the cluster. Once the entry has been invalidated, the application will go to the database on the next request for that data. While clustered invalidation is more manual and lacks many of the features of a replicated cache, it does have one advantage. Clustered invalidation only necessitates the sending of the cache entry's key, rather than the entry's data, to all the nodes of the network. This means less data and ultimately less processing that the cluster needs to perform.
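The difference between replication and clustered invalidation can be shown with an in-process sketch. Everything here is hypothetical: NodeSketch is an invented class, and a direct method call stands in for the network broadcast that a real cluster would perform. The point is only that a flush sends the key to the peers while a put stays local.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of clustered invalidation; not OSCache or JGroups code.
final class NodeSketch {
    private final Map<String, String> local = new HashMap<>();
    private final List<NodeSketch> peers = new ArrayList<>();

    void join(NodeSketch other) {
        peers.add(other);
        other.peers.add(this);
    }

    // Cached data is never replicated: a put only affects this node.
    void put(String key, String value) {
        local.put(key, value);
    }

    String get(String key) {
        return local.get(key);
    }

    // Only the key travels to the peers, never the cached value.
    void flush(String key) {
        local.remove(key);
        for (NodeSketch peer : peers) {
            peer.onRemoteFlush(key);
        }
    }

    private void onRemoteFlush(String key) {
        local.remove(key);
    }

    public static void main(String[] args) {
        NodeSketch a = new NodeSketch();
        NodeSketch b = new NodeSketch();
        a.join(b);
        a.put("customer:42", "old");
        b.put("customer:42", "old");
        a.flush("customer:42");                   // invalidation reaches node b
        System.out.println(b.get("customer:42")); // prints "null"
    }
}
```

After the flush, node b has to go back to the database on its next request for that key, which is exactly the trade-off described above.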

Conclusion

Caching is sure to give your database a little breathing room when scaling a large web application. As this article has shown, OSCache is a generic caching product that has been specifically tailored towards common caching issues that arise in enterprise development. It focuses on ease of use and requires next to no integration effort when it comes to web content. Give it a shot and judge for yourself!

Biography

Andres March is currently a Senior Software Engineer for Sony Online Entertainment. He most recently led the design and development of EQ2players.com, the leading community site for Everquest2, which was built with Spring, Hibernate, and OSCache. Besides filling the role of lead developer for OSCache, he spends his spare time trying to lose his geek tan, playing sports, and spending time with his family.