External Cache vs. In-Process Cache

  1. External Cache vs. In-Process Cache (20 messages)

    There is an interesting article in the latest issue of JDJ that compares the performance of external caching vs. in-process caching. The author, Helen Thomas, makes an interesting argument that using an external cache may decrease the frequency and duration of major GC cycles, thereby improving CPU utilization and overall performance.

    Read the article at Debunking the Myth of In-Process Application Layer Caching in J2EE Architectures

    Regards,
    Dmitriy Setrakyan
    Fitech Labs, Inc.

    Threaded Messages (20)

  2. I read this article and realized once again why JDJ is a freebie. That magazine has too many articles written by vendors that are thinly veiled marketing pitches for their products.

    I have a novel idea for caching large datasets with very fast performance: add RAM to your box, increase the footprint of your JVM, and cache the data locally.

    If you need a cache that is shared between JVMs, then you will need to explore more complicated options. But a local cache will work in most situations.
  3. |I read this article and realized once again why JDJ is a freebie.
    |That magazine has too many articles written by vendors that are
    |thinly veiled marketing pitches for their products.


    I agree. Caching is mostly about reducing I/O and serialization overhead, which can provide a 2-4 order-of-magnitude speed improvement. Reducing GC overhead due to a cache is an interesting academic exercise, but is relatively unimportant in the grand scheme of things. If you care that much about speed, you shouldn't be using Java in the first place (stick to C or assembly, which again can give you a 1-2 order-of-magnitude speed improvement). People use Java so that they don't have to worry about GC.
  4. |I have a novel idea for caching large datasets with very fast
    |performance: add RAM to your box, increase the footprint of
    |your JVM, and cache the data locally.

    That works well on average, but only if your measurement interval is sufficiently long. With recent Sun JVMs and moderate heap sizes (i.e. <256MB), I've seen that typical GC intervals are <30ms and take <5% of processor cycles, but once every 60s there is a GC pause of around 1s. For some latency-sensitive applications that is unacceptable.

    What causes the 1s pause? If RMI is being used then the JVM periodically executes a full mark-sweep GC in order to remove stale objects that haven't been released normally by the "other" client JVM (presumably due to a JVM or network crash).

    The only work-arounds that I'm aware of are to (a) reduce the heap size by one of the techniques mentioned elsewhere in this thread, or (b) use the relevant command line argument to increase the interval between the "RMI collection" GCs to once per hour, or whatever.

    To summarise: increasing heap size isn't a panacea; other techniques can be necessary.
  5. I have seen the problem with RMI GC that you describe (a full GC happens regularly at that 60s interval). You can effectively disable RMI GC by setting

    -Dsun.rmi.dgc.client.gcInterval=360000000
    -Dsun.rmi.dgc.server.gcInterval=360000000

    or some other sufficiently large number.

    As far as overall heap size goes, you don't want it to get too big. Full GCs do happen -- you want to try to avoid them as much as possible. Add RAM to your box, but divide the RAM among multiple JVMs. This will approximate the "parallel GC" that is promised for JDK 1.4.
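    For example, a complete launch line combining a bounded heap with those DGC settings might look like this (the heap size and main class are illustrative, not from the article):

      java -Xms256m -Xmx256m \
           -Dsun.rmi.dgc.client.gcInterval=360000000 \
           -Dsun.rmi.dgc.server.gcInterval=360000000 \
           com.example.CacheServer

    Running several such JVMs, each with a modest heap, keeps any single full-GC pause short.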
  6. RMI is evil

    I don't understand what you are talking about.

    If the GC takes extra time because of RMI, then keep your cache in-process and skip RMI altogether. That more people in the industry don't see this point just proves their ignorance.

    A cache turns into an XML/serialization exercise if you keep writing data back and forth over sockets as well as holding it in memory. Therefore such caches are evil.

    And the point of the article is that by separating out your cache you can slow down your application? Great, give me some more. Please.
  7. What the article shows is that with in-process caching done right, you can get response times down to 10ms, and with external caching done right, you can get response times down to 40ms. That additional 30ms external access time hardly seems to be a compelling argument for switching to external caching.

    On the other hand, there are some very compelling arguments for both external and off-heap caching. For example, if you have too many cached objects on the heap, move them off the heap by serializing them and writing them to a Java NIO direct buffer or a memory mapped file. Or if you want to have a very large cache, run multiple servers in a cluster and spread the cache over the entire cluster. The net result is that you can have (e.g. with Coherence) an in-memory cache of virtually unlimited size without affecting the JVM heap size, and thus not affecting GC time.
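    A minimal sketch of the off-heap idea, using nothing beyond the JDK (this is not Coherence's actual implementation; the class name and sizing are illustrative): serialize the value and park the bytes in a direct buffer, which lives in native memory rather than on the GC-managed heap.

      import java.io.*;
      import java.nio.ByteBuffer;

      public class OffHeapSlot {
          private final ByteBuffer buf;

          public OffHeapSlot(int capacity) {
              // Direct buffers are allocated outside the Java heap,
              // so their contents add nothing to the GC workload.
              buf = ByteBuffer.allocateDirect(capacity);
          }

          public void put(Serializable value) throws IOException {
              ByteArrayOutputStream baos = new ByteArrayOutputStream();
              new ObjectOutputStream(baos).writeObject(value);
              buf.clear();
              buf.put(baos.toByteArray()); // copy serialized form off-heap
          }

          public Object get() throws IOException, ClassNotFoundException {
              byte[] bytes = new byte[buf.position()];
              buf.flip();
              buf.get(bytes); // copy back on-heap, then deserialize
              return new ObjectInputStream(
                      new ByteArrayInputStream(bytes)).readObject();
          }
      }

    The trade-off, as the next post notes, is that every access now pays the serialization cost.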

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  8. |move them off the heap by serializing them and writing them to a
    |Java NIO direct buffer or a memory mapped file.


    Cool idea although I wonder if you would lose any advantages gained due to serialization and object creation overhead. Personally I suspect that you would, but it would be interesting to see benchmark numbers.

    I did some playing around with serialization performance a while ago and was shocked to see how slow it was. For instance (if I remember correctly), deserializing a Java DOM object took just as long as creating the DOM by parsing the XML from scratch. That was surprising to me. Maybe the situation has changed recently, but either way it impressed on me to be wary of serialization overhead. Also, I assume the object would take up more space in serialized form than as instantiated in memory (probably only slightly, depending on the object), so you would lose out there a bit too.
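    A rough harness for this kind of measurement, assuming nothing beyond the JDK (the sample object and iteration count are illustrative; a real benchmark would need warm-up runs and multiple trials):

      import java.io.*;
      import java.util.ArrayList;
      import java.util.List;

      public class SerializationTimer {
          public static void main(String[] args) throws Exception {
              // Build a sample object graph to round-trip.
              List<String> sample = new ArrayList<String>();
              for (int i = 0; i < 1000; i++) {
                  sample.add("row-" + i);
              }

              long start = System.currentTimeMillis();
              for (int i = 0; i < 1000; i++) {
                  // Serialize to bytes, then deserialize back.
                  ByteArrayOutputStream baos = new ByteArrayOutputStream();
                  new ObjectOutputStream(baos).writeObject(sample);
                  ObjectInputStream in = new ObjectInputStream(
                          new ByteArrayInputStream(baos.toByteArray()));
                  in.readObject();
              }
              long elapsed = System.currentTimeMillis() - start;
              System.out.println("1000 round-trips took " + elapsed + "ms");
          }
      }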
  9. cost of serialization

    I made some comments on serialization in my blog:

    Incidentally, the biggest cost of accessing Java objects over the network is typically serialization/deserialization -- not network time! This is a great reason to try to move up to JDK 1.4.1, which has improved the performance of serialization rather dramatically. In a lot of cases, it used to actually be faster to serialize as XML and deserialize by parsing it! Now, JDK 1.4.1 runs within about 25% of hand-coded custom serialization speed, which is pretty impressive.

    Additionally, we found that ObjectInputStream is the #1 reason why deserialization is so slow, so we introduced a DataInputStream-based optimization in our 2.1 release which has improved application performance by 30% and dropped memory utilization (for external cache servers) by over 50%! We call it ExternalizableLite, and it is optimal for tree (not graph) based objects, like XML.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  10. More on serialization...

    ... our product xNova™ comes with a service that allows marshalling/de-marshalling of metadata-based objects up to 50x faster than native serialization for Java or .NET, generating payloads 5x smaller than native serialization routines. So, if we hypothetically reduce network overhead to zero, you can exchange data objects between Java and .NET, .NET and .NET, or Java and Java up to 50x faster, and reduce your network traffic up to 5x compared to JDK 1.4.x serialization, for example (let alone .NET serialization)...

    Thanks,
    Nikita Ivanov.
    Fitech Labs, Inc.
  11. JCACHE: cost of serialization

    Cameron,

     Any thoughts on when JCACHE PFD will be available?

    -- Kumar.
  12. JCACHE: cost of serialization

    Kumar: Any thoughts on when JCACHE PFD will be available?

    A first-cut reference implementation is available internally. I can't project when the JCache spec itself will appear, though. I hope sooner rather than later, and I continue to push for it.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  13. cost of serialization

    Any chance you can share ExternalizableLite?
  14. cost of serialization

    Hi Matthew,

    The interface itself is very basic. If you've registered at our site, you can see the online doc. It is designed to support simple object serialization (e.g. simple value object) and tree-based serialization (e.g. XML), but not graph-based (e.g. A refs B refs C refs A). If you have used the Externalizable interface, then this interface should be very easy to understand, having two similar methods:

      void readExternal(DataInput in)
      void writeExternal(DataOutput out)

    We didn't intend it so much for hand coding, since that is error-prone and annoying. We implemented it on our XML objects and our XML-bound value objects (our XmlBean framework).
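    To make the pattern concrete, here is a hand-coded sketch (the interface is reconstructed from the two methods quoted above, and PersonBean is a made-up value object, not Tangosol code):

      import java.io.*;

      interface ExternalizableLite {
          void readExternal(DataInput in) throws IOException;
          void writeExternal(DataOutput out) throws IOException;
      }

      public class PersonBean implements ExternalizableLite {
          private String name;
          private int age;

          // Write fields in a fixed order -- no class descriptors,
          // which is where ObjectOutputStream spends much of its time.
          public void writeExternal(DataOutput out) throws IOException {
              out.writeUTF(name);
              out.writeInt(age);
          }

          // Read fields back in the same fixed order.
          public void readExternal(DataInput in) throws IOException {
              name = in.readUTF();
              age = in.readInt();
          }
      }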

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  15. I certainly agree that more memory, JVM tuning, or effective cache spooling will provide a better solution in most cases. However, what did strike me as an interesting point is that an external cache can be used effectively for devices that are naturally limited in the CPU/memory department, such as PDAs or palm devices.

    It is still an open question, though, whether a trip to an external cache would be faster than a trip to the data store (especially for simple queries). Most commercial databases do very comprehensive caching of their own.

    Anyway, it was interesting to see that there could be a valid (but rare) case where GC overhead is bigger than a network trip + serialization.

    Regards,
    Nikita Ivanov.
    Fitech Labs, Inc.
  16. Why either/or?

    I wasn't too impressed with this article. Its main point is that in low-memory environments, in-memory caches don't work well. For server-side programming, low memory isn't an issue.

    Also, why either/or? We use both an in-memory cache and an external cache, which gives us the best of both worlds.
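    A minimal read-through sketch of that layering, assuming hypothetical ExternalCacheClient and Database interfaces (none of this is from the article):

      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      public class TwoLevelCache {
          interface ExternalCacheClient {
              Object get(String key);
              void put(String key, Object value);
          }

          interface Database {
              Object load(String key);
          }

          private final Map<String, Object> local =
                  new ConcurrentHashMap<String, Object>();
          private final ExternalCacheClient external;
          private final Database db;

          public TwoLevelCache(ExternalCacheClient external, Database db) {
              this.external = external;
              this.db = db;
          }

          public Object get(String key) {
              Object value = local.get(key);      // level 1: in-process
              if (value == null) {
                  value = external.get(key);      // level 2: external cache
                  if (value == null) {
                      value = db.load(key);       // miss everywhere: data store
                      if (value != null) {
                          external.put(key, value);
                      }
                  }
                  if (value != null) {
                      local.put(key, value);      // populate the local level
                  }
              }
              return value;
          }
      }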
  17. Thanks for the discussion. I read the article but wasn't quite sure what to think of it. It did seem like a bit of an advertisement, but it brought up some interesting issues.

    Steve
  18. Were the tests valid?

    The test with the hard-reference cache shows steady CPU utilization and the fastest response times up until the application starts running out of memory and CPU utilization hits 100% because GC never stops.

    Although this scenario worked well to strengthen the point presented in the article, it can hardly be used when discussing real-life applications that utilize hard-reference caches. The vast majority of hard-reference caches offer efficient LRU and aging eviction policies that negate the very problem depicted in the test (see the sketch after the next paragraph).

    On the other hand, in-process caches would offer little value in a certain class of cases where an application frequently requests large chunks of data (e.g. hundreds of megabytes). In such cases, caching data externally on local or network hard drives (note that an external cache is not necessarily remote) may prove more useful.
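    As a sketch of how cheap such an eviction policy is to get in-process, using only the JDK (the class name and capacity handling are illustrative):

      import java.util.LinkedHashMap;
      import java.util.Map;

      public class LruCache<K, V> extends LinkedHashMap<K, V> {
          private final int maxEntries;

          public LruCache(int maxEntries) {
              // true = order entries by access, i.e. LRU order
              super(16, 0.75f, true);
              this.maxEntries = maxEntries;
          }

          // Called by LinkedHashMap on every put; returning true evicts
          // the least-recently-used entry, bounding the cache's footprint.
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
              return size() > maxEntries;
          }
      }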

    Regards,
    Dmitriy Setrakyan
    Fitech Labs, Inc.
  19. They're cheating!

    Hey, they're cheating!

    The external cache runs on another machine, one not present in the other tests. And although the app server runs on a four-CPU machine, the clock of each CPU is much lower than the external cache machine's (450 MHz vs. 900 MHz). I know that clock speed isn't the best indicator of performance (like Intel vs. AMD), but the difference is too big (2x!). Anyone who has studied a little computer science knows that 4 parallel CPUs don't mean 4x the performance (Amdahl's law), so 4x450 MHz < 1800 MHz (MUCH less).

    Besides that, the GC in JDK 1.3.1 doesn't run well in a multiprocessor environment; the parallel GC was only introduced in JDK 1.4.

    There are just too many flaws in the test methodology for it to be valid in any manner...
  20. Well, what if memory is not an issue? It seems that 64-bit processors and OSs are more and more popular. If there is enough demand, 100GB desktop machines could probably be around within 6-12 months. That would probably make servers with 1TB a commodity... Is object prevalence the ultimate solution to persistence? (www.prevayler.org)
     -- Krzysztof
  21. If you have infinite heaps, then you will have infinite pauses for GC. See my other posting about RMI-induced full mark-sweep GCs.

    There's a classic networking paper (J. Nagle, "On Packet Switches with Infinite Storage", IEEE Transactions on Communications, COM-35(4):435-438, April 1987) that points out that if you attempt to overcome network delays by increasing the size of network FIFO buffers beyond a certain point, the end applications will time out and die, and/or increase the load on the network until all packets are dropped.

    Long GC pauses will have the same effect.