Using Terracotta DSO

Java Development News:

By Joseph Ottinger

01 Jan 2007 | TheServerSide.com

Terracotta DSO is an open source technology created by Terracotta, meant to provide clustering to Java at the virtual machine level. It does so by weaving code around specific classes, which will communicate with a specific server process to retrieve and update data as needed.

If this sounds somewhat like what JavaSpaces does... it's not. For one thing, DSO doesn't rely on an API to manipulate the clustered data. For another, DSO is configured slightly differently, with participating JVMs needing to know where the DSO hub is. DSO is also far easier to test from a client-side perspective, as we'll see later in the article.

DSO is deployed as a set of modules that run atop a regular Sun JVM1. A set of scripts to manage the cluster and start the JVM is provided as part of the distribution. The first step after downloading DSO is to run $DISTRIBUTION/dso/bin/make-boot-jar.bat, where $DISTRIBUTION is the home of the Terracotta installation. This creates a set of modules for use with the current JVM. (Any update to your JVM will require re-running this script.)

The next step is to have something worth clustering, which will also show off one of the biggest advantages of DSO: codeless clustering.

Reference data isn't a problem for most applications, until it turns out to be megabytes in size. Once it's large enough, there's a huge advantage in preloading it, and calculating slices of cached data at runtime to cut down on how long it takes to access it2. However, in typical deployments, each runtime application has to calculate the reference data and appropriate subsets of information.

What would be really useful is if we could load the data in one application and have it available in all of the JVMs -- and if one JVM memoized a structure, it'd be nice if that memoization was available to all of the other JVMs, too.

The best feature of DSO for this structure is that we can code with our normal, non-clustered, JVM, with our regular suite of tools and tests... in other words, as if we weren't clustering the application at all.

What we'll do is fairly simple, and admittedly not useful in the real world: we'll build an application that returns sets of words based on prefix strings passed in. At startup, it'll query a URL for the list of words (which is copied from /usr/share/dict/words on my Linux server), then ask for various subsets of words, which it will calculate and cache as they're requested.

We'll then run this process in more than one JVM, to show the clustering in action. The second JVM to run as a DSO process won't have to hit the URL, and won't have to build up every sublist of data, either.

The basic test harness looks like this:

 public static void main(String[] args) {
  WordList list=WordList.getWordlist();
  System.out.println(list.getList("").size());

  Random r=new Random();

  runTest(list, r);
  runTest(list, r);
 }

 private static void runTest(WordList list, Random r) {
  List<String> l=new ArrayList<String>();
  for(char i='a';i<='z';i++) {
   l.add(Character.toString(i));
  }
  while(l.size()>0) {
   String prefix=l.remove(r.nextInt(l.size()));
   long starttime=System.currentTimeMillis();
   int s=list.size(prefix);
   long endtime=System.currentTimeMillis();
    System.out.printf("%s: %d (%d ms)%n", prefix, s, (endtime-starttime));
  }
 }

This isn't horribly exciting: it basically runs two loops, selecting letters at random and showing how long it took to retrieve the sublists from the WordList object.

The WordList's relevant code looks like this:

    public List<String> getList(String prefix) {
        List<String> list = null;
        boolean hasKey;
        synchronized (wordlists) {
            hasKey = wordlists.containsKey(prefix);
        }
        if (hasKey) {
            System.err.println("'" + prefix
                    + "' was requested, and it's been built already.");
        } else {
            buildList(prefix);
        }

        synchronized (wordlists) {
            list = wordlists.get(prefix);
        }
        return list;
    }

There's nothing earth-shattering about this code, either, but the synchronization needs a little explanation. First, we synchronize on wordlists to see whether it contains the key -- simple enough, and very fast (relatively speaking). If the key isn't found, the list is built, which is a slow process. The buildList() method updates the word map like this:

            synchronized (wordlists) {
                if (!wordlists.containsKey(prefix)) {
                    wordlists.put(prefix, w);
                }
            }

So what's happening here? If two threads look for the key at the same moment, both might discover that the map doesn't contain their prefix, so both build the list. When each tries to add its list to the map, it checks that the prefix is still missing. If it isn't missing, two threads contended: both built the word list, but only one stores it into the map. This has a lot of advantages over more aggressive synchronization -- and the cost is relatively small.3 We could have built more precise thread control, but it's not necessary. (Well, not for this explanation. In production, you'd probably want better mutex control.)
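The whole pattern fits in a few lines. Here's a minimal, runnable sketch of the same check-then-build-then-put approach; the class name, the tiny hard-coded word list, and the static size() convenience method (standing in for the size(prefix) call in the test harness) are illustrative assumptions, not the article's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the check-then-build-then-put pattern described above.
public class WordCache {
    private static final List<String> WORDS =
            Arrays.asList("apple", "axe", "bat", "bear", "cat");
    private static final Map<String, List<String>> wordlists =
            new HashMap<String, List<String>>();

    public static List<String> getList(String prefix) {
        boolean hasKey;
        synchronized (wordlists) {       // short lock: just the key check
            hasKey = wordlists.containsKey(prefix);
        }
        if (!hasKey) {
            buildList(prefix);           // slow work happens outside the lock
        }
        synchronized (wordlists) {
            return wordlists.get(prefix);
        }
    }

    // Assumed equivalent of the size(prefix) call in the test harness.
    public static int size(String prefix) {
        return getList(prefix).size();
    }

    private static void buildList(String prefix) {
        List<String> w = new ArrayList<String>();
        for (String word : WORDS) {
            if (word.startsWith(prefix)) {
                w.add(word);
            }
        }
        synchronized (wordlists) {
            // Two threads may have built the same list concurrently;
            // only the first to arrive here stores its copy in the map.
            if (!wordlists.containsKey(prefix)) {
                wordlists.put(prefix, w);
            }
        }
    }
}
```

Note that the worst case here is wasted work, never a wrong answer: a losing thread's list is simply discarded, and the subsequent get() inside the lock returns whichever copy won.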

The truly interesting part about this code is that absolutely nothing makes it look clusterable. It's a simple, single-JVM memoization application. It can be tested in a single JVM, with nothing else involved; JUnit or TestNG tests could be written to exercise the cache and the word lists themselves. At no point is any real care taken, beyond the simplest thread safety.

However, if two of these are run at the same time, each will have to build its cache manually, which isn't acceptable for data that can and should be shared between tasks. This is where DSO comes in, by providing a transparent cache mechanism.

DSO requires a "host process," the actual DSO server itself. DSO clients run using a specific batch file provided by Terracotta DSO, "dso-java.bat". So we have four tests ahead of us: first, a run in a single, plain JVM; next, a run with a single client JVM against the DSO instance; then a run with multiple clients at one time; and lastly, a run with two clients, running sequentially.
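Concretely, the runs use the scripts just named; the classpath and config path below come from my layout (the dso-java command line is the one used later in this article), so adjust them for your installation:

```
rem Test 1: a plain, non-DSO JVM
java -cp . com.wordcalc.Main

rem Tests 2-4: start the DSO server first, then launch each client
rem through the DSO launcher so the instrumentation is applied
start-tc-server.bat
dso-java.bat -cp . -Dtc.config=../tc-config.xml com.wordcalc.Main
```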

Before we can do that, though, we need to build our DSO client configuration. DSO configuration is through a very simple XML file that lists servers and ports used, client log location, instrumented classes, the paths of variable instances that are managed by DSO, and lock conditions.

The relevant (client-specific) section of the configuration file for this article looks like the following:

  <application>
    <dso>
      <instrumented-classes>
        <include>
          <class-expression>com.wordcalc.data.WordList</class-expression>
        </include>
      </instrumented-classes>
      <roots>
        <root>
          <field-name>com.wordcalc.data.WordList.instance</field-name>
        </root>
      </roots>
      <locks>
        <autolock>
          <method-expression>java.util.List com.wordcalc.data.WordList.getList(java.lang.String)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
        <autolock>
          <method-expression>void com.wordcalc.data.WordList.buildBaseWordlist()</method-expression>
          <lock-level>write</lock-level>
        </autolock>
        <autolock>
          <method-expression>void com.wordcalc.data.WordList.buildList(java.lang.String)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>
  </application>

Most of this is very simple, with only the autolock using "special syntax" -- and even there, the syntax is straightforward, mapping to the signatures of the methods that update the word list (and thus need locks that are cluster-aware). An autolock "is meant for the case where the code is written with multiple threads in mind, using the synchronized keyword to demarcate protected areas. A synchronized block or method denotes a section of control flow that will be serialized with respect to thread access. Only a single thread can enter the block at a time.4" The autolock syntax uses AspectWerkz5 syntax for method specifications.
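As an aside, the AspectWerkz pattern language also supports wildcards, so the three explicit autolocks above could plausibly be collapsed into one broader expression. This is a hypothetical variant -- verify the wildcard form against your DSO version's documentation before relying on it:

```
      <locks>
        <autolock>
          <!-- Hypothetical wildcard: any return type, any method,
               any argument list on WordList. -->
          <method-expression>* com.wordcalc.data.WordList.*(..)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
```

The trade-off is precision: the explicit form documents exactly which methods need cluster-aware locks, while the wildcard silently covers methods added later.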

Back to our code! The first test has already run; on a single (non-DSO) JVM on my workstation, it reports a runtime of 49587 milliseconds.

Now, let's start up the DSO server ("start-tc-server.bat") and run the Main class under DSO (with "dso-java.bat -cp . -Dtc.config=../tc-config.xml com.wordcalc.Main"): this affects the runtime, boosting it up to a whopping 49667 ms. (Note the sarcasm; this is just as likely to be attributed to network issues retrieving the word list, or task priority issues. This is not a significant difference.)

That completes our second test. Next, we shut down the DSO server and repeat the process with two client instances. (We shut down the server for reasons that will become clear in our fourth test, which runs two client instances sequentially.)

To run the third test, we start up the DSO server again (remember, we killed it after the second test), and then start up two DSO clients as close together as possible. The runtimes dropped to forty-one and forty-two seconds (including the time to fetch the base list itself for both processes, because they both initialized at the same time), a decent improvement -- and the most interesting thing is that, in most cases, only one of the processes reported building the list for any given prefix.6 (During the tests, collisions where the same list was built twice were uncommon.)

Now for the fourth test: shut down the DSO server again, and run only one client. After it finishes, run one more client before shutting down the DSO server. The first client run takes 49597 ms, which is equivalent to our original run. The second run, though, takes 1432 ms. This is a huge improvement over the first run: we had a 100% hit rate on our memoization, even though the second JVM never ran any of the list-building code itself.

So what happened? It's pretty simple. The configuration file for DSO specified that the object root for the word list was to be managed by DSO, so when the word list is used by a client JVM, it checks to see if it has the current data in "local view." If it doesn't, it pulls what it needs from the DSO server; if the DSO server doesn't have it, then the local client builds it, and updates the DSO-managed instance.

This makes the data available to every JVM that attaches to that DSO server -- even if the original JVM that built the data isn't running. It only pulls the data it needs, so even if our list is large, the check to see if the word list contains a given prefix is pretty small (and depending on the implementation of the size() method, might continue to be small, checking only the size of the keyset.) Every DSO client is a peer with every other, so standard JVM multithreading semantics apply to the client code -- there's absolutely nothing in our word list client that has to change to leverage DSO as a cache.

This can be very useful. In one case, an application used 31MB of raw data for a rules engine, which it then sliced into memoized pieces (much as our test has, except in more dimensions and with far more data). Each application startup spent roughly a minute and a half initializing this data, with more time spent at runtime memoizing common requests.

If this were to be made DSO-aware, initialization would happen one time. Updates could be made on the fly, storing the changes to a database as well as directly to the runtime dataset, where clients could automagically use the correct (updated) data. There are other distributed models in which a specific, dedicated client could be the only one that accesses the entire dataset at once (or slices could be pulled from the database, instead of generating subsets of data from an in-memory cache), which would lower end-process memory requirements.

This is one of the primary focuses of DSO -- providing clustered cache capabilities with no client-code alterations -- and it does it very well.

Footnotes

1Given the nature of what DSO does, it's not very surprising that it is currently bound to a specific JVM vendor, and must also be matched to the specific JVM version before running.
2This is called “memoization;” see http://en.wikipedia.org/wiki/Memoization for more details.
3Note that it's silly to expect that every situation is the same! This code works, on the assumption that building a list has no side-effects, and with the knowledge that building the list once per process (instead of "just once") is an acceptable cost. This is probably a correct assumption most of the time, but your mileage may vary.
4Page 39, http://terracottatech.com/product-docs/TerracottaDSOGuide.pdf .
5AspectWerkz' site: http://aspectwerkz.codehaus.org/
6This can still be improved, but this is left as an exercise to the reader. Note that DSO is fully able to leverage the concurrency capabilities of the JVM, so as your code gains fine-grained control for multiple threads in a single JVM, it'll gain the same kinds of benefits for multiple DSO clients.