With Coherence 3.7, Oracle Continues to Drive Innovation in the Data Grid Market

Discussions

News: With Coherence 3.7, Oracle Continues to Drive Innovation in the Data Grid Market

  1. Today, Oracle releases the latest evolution of Coherence, its flagship distributed in-memory data grid product: Oracle Coherence 3.7 is now available for eager adopters.

    For those watching this space with interest, the basics of the Oracle press release, which boasts ‘dramatically more data storage capabilities’ and ‘intelligence and dynamic load balancing of client connections’, likely won’t capture anyone’s imagination. That’s the typical marketing hype we expect to hear from any of the players in and around this data grid space, be it Oracle, GigaSpaces, GridGain or Terracotta.

    What is more interesting about this latest release, though it doesn’t necessarily make it into the marketing hype, is the effort the Coherence team seems to be putting into efficiently managing the capacity of the underlying Java Virtual Machine infrastructure upon which Coherence lives. “With this release of Coherence, we’re making incredibly more efficient use of memory, both inside and outside of the JVM. And we’re managing that memory space with a level of efficiency the industry hasn’t seen before. With Coherence 3.7, we can manage four times as much data in the same sized heap space,” says Cameron Purdy, Vice President of Development at Oracle.

    Of course, this raises the question of how they’ve quadrupled the size of the object graphs they can stuff into a finite memory space. The Coherence team throws around all sorts of terms that sound scientific while smelling like Haitian voodoo: real-time dynamic de-duping in memory, data storage compression that goes far beyond simply zipping up data, and intelligent data structures optimized for dynamic compression.

    And, like the other vendors in this space, Oracle is continuing to push through the shortcomings in the garbage collection algorithms that plague applications that must achieve massive scale. But again, announcements of these types of victories are becoming more and more standard in this sector of the industry. You always have to dig deeper to find out what these press releases are really saying about the state of the industry.

    Some good news for those worrying about the future of the various application servers that have found themselves under the control of Oracle Corporation is that Coherence continues to promote integration with the GlassFish and WebLogic application servers. “Oracle Coherence 3.7 introduces native integration with Oracle GlassFish Server via the Coherence Web SPI for GlassFish, providing 'no code change' installation and configuration of Coherence Web, making it dramatically easier for Oracle GlassFish Server users to scale their applications.”

    This is a very interesting segment of the industry to watch. One would expect the smaller companies in this segment to move with much greater speed and agility, introducing new features and functionality into the market before a giant like Oracle can catch up. But the release of Coherence 3.7 shows that Oracle is just as adamant about leading and innovating in this sector as any of the other key players in the industry.

    http://www.oracle.com/us/products/middleware/coherence/index.html

  2. Hi Cameron -

    For those watching this space with interest, the basics of the Oracle press release, which boasts ‘dramatically more data storage capabilities’ and ‘intelligence and dynamic load balancing of client connections’, likely won’t capture anyone’s imagination. That’s the typical marketing hype we expect to hear from any of the players in and around this data grid space ..

    It's pretty easy to demonstrate. For data capacity, with no extra config, start up a couple command-line Coherence nodes, and in each type:

    cache ram-test

    Assuming you have a flash drive with some space on it, you can now start adding data, well in excess of the heap size.

    Regarding dynamic load balancing, check out this presentation (posted today):

    http://www.youtube.com/watch?v=z4FAs-FXPO0

    There's an entire channel on Coherence-related topics: http://www.youtube.com/user/OracleCoherence

    Of course, this raises the question of how they’ve quadrupled the size of the object graphs they can stuff into a finite memory space. The Coherence team throws around all sorts of terms that sound scientific while smelling like Haitian voodoo: real-time dynamic de-duping in memory, data storage compression that goes far beyond simply zipping up data, and intelligent data structures optimized for dynamic compression.

    The answer is pretty simple: Java provides some very nice APIs (like the Collections APIs), and most of the implementations are very general-purpose. When we built Elastic Data, we focused on reducing memory utilization for the memory-resident portion of it, and as part of that we focused on reducing object count (since objects may have an automatic 32- or even 64-byte overhead on large JVMs). The net effect of this work is that we average just over one object per cache entry in a large system, which is far less than even something as simple as a HashMap (which in our case would be around five objects per entry, since both the key and value are of type Binary, which holds a reference to an immutable byte[] in a way that is similar to String holding a char[]). Note that the number of objects is almost identical whether the data is on heap or on disk!
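    The object-count idea can be illustrated with a small sketch. This is an invented toy, not Coherence's actual implementation: instead of one key object, one value object, and one Map.Entry per mapping, all values are packed into a single shared byte[] "journal", and the index stores only a primitive offset/length pair encoded in a long. The names here (PackedStore, journal) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy sketch of reducing per-entry object count: values live in one shared
 * byte[] rather than one object per value, so object count stays near
 * constant regardless of entry count.
 */
public class PackedStore {
    private byte[] journal = new byte[1024];
    private int used = 0;
    // key -> (offset << 32) | length: one small boxed long per entry
    // instead of a separate value object graph
    private final Map<String, Long> index = new HashMap<>();

    public void put(String key, byte[] value) {
        ensureCapacity(used + value.length);
        System.arraycopy(value, 0, journal, used, value.length);
        index.put(key, ((long) used << 32) | value.length);
        used += value.length;
    }

    public byte[] get(String key) {
        Long ticket = index.get(key);
        if (ticket == null) return null;
        int off = (int) (ticket >>> 32);          // high 32 bits: offset
        int len = (int) (ticket & 0xFFFFFFFFL);   // low 32 bits: length
        byte[] copy = new byte[len];
        System.arraycopy(journal, off, copy, 0, len);
        return copy;
    }

    private void ensureCapacity(int needed) {
        if (needed > journal.length) {
            byte[] bigger = new byte[Math.max(needed, journal.length * 2)];
            System.arraycopy(journal, 0, bigger, 0, used);
            journal = bigger;
        }
    }

    public static void main(String[] args) {
        PackedStore store = new PackedStore();
        store.put("greeting", "hello".getBytes(StandardCharsets.UTF_8));
        store.put("farewell", "goodbye".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get("greeting"), StandardCharsets.UTF_8)); // hello
    }
}
```

    Because the journal is just bytes, the same layout works whether it lives on heap, off heap, or on disk, which is presumably why the object count stays the same in either case.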

    Some good news for those worrying about the future of the various application servers that have found themselves under the control of Oracle Corporation is that Coherence continues to promote integration with the GlassFish and WebLogic application servers. “Oracle Coherence 3.7 introduces native integration with Oracle GlassFish Server via the Coherence Web SPI for GlassFish, providing 'no code change' installation and configuration of Coherence Web, making it dramatically easier for Oracle GlassFish Server users to scale their applications.”

    Not just that .. we also continue to support JBoss, WebSphere, and other application servers -- in case any of their users want high-speed, high-scale session management ;-). The difference is that it's easier to use Coherence with WebLogic and GlassFish because it plugs in through an SPI.

    This is a very interesting segment of the industry to watch. One would expect the smaller companies in this segment to move with much greater speed and agility, introducing new features and functionality into the market before a giant like Oracle can catch up. But the release of Coherence 3.7 shows that Oracle is just as adamant about leading and innovating in this sector as any of the other key players in the industry.

    It is a very exciting part of our industry, and there's a lot of great competition. Fortunately for us, Oracle Coherence (back then Tangosol Coherence) was the first to market with clustered caching (2001), partitioned caching (2002), map/reduce style functionality (2005), parallel query (2002), and many other features. It may be a little harder (being a big company like Oracle) to get stuff quickly into the market -- Elastic Data is almost a year old and it's just now being released! -- but there are also positive aspects, such as working closely with the language and JVM teams, working on extremely cutting-edge systems like Exalogic, and being able to influence (and be influenced by) the most popular commercial application server (WLS) and the most popular open source application server (GlassFish).

    Peace,

    Cameron Purdy | Oracle Coherence

    http://coherence.oracle.com

    Of course, this raises the question of how they’ve quadrupled the size of the object graphs they can stuff into a finite memory space. The Coherence team throws around all sorts of terms that sound scientific while smelling like Haitian voodoo: real-time dynamic de-duping in memory, data storage compression that goes far beyond simply zipping up data, and intelligent data structures optimized for dynamic compression.

    The answer is pretty simple: Java provides some very nice APIs (like the Collections APIs), and most of the implementations are very general-purpose. When we built Elastic Data, we focused on reducing memory utilization for the memory-resident portion of it, and as part of that we focused on reducing object count (since objects may have an automatic 32- or even 64-byte overhead on large JVMs). The net effect of this work is that we average just over one object per cache entry in a large system, which is far less than even something as simple as a HashMap (which in our case would be around five objects per entry, since both the key and value are of type Binary, which holds a reference to an immutable byte[] in a way that is similar to String holding a char[]). Note that the number of objects is almost identical whether the data is on heap or on disk!

    In the Sun/Oracle x64 JVM, objects seem to have an overhead of only 16 bytes: 8 for the object header + 8 for the reference (http://stackoverflow.com/questions/1425221/how-much-memory-does-a-hashtable-use/1425455#1425455). So I don't think just serializing everything into a byte array is enough to "quadruple the size of the object graph". It does look like an interesting thing to do, though, especially if your key and value types are fixed ahead of time and you know deserialization is cheap.

  4. Hi Dan -

    In the Sun/Oracle x64 JVM, objects seem to have an overhead of only 16 bytes: 8 for the object header + 8 for the reference (http://stackoverflow.com/questions/1425221/how-much-memory-does-a-hashtable-use/1425455#1425455).

    The explanation that you referenced is incomplete and often incorrect, although there are configurations of specific JVMs that will cause those JVMs to work as he described. There is object header size, reference size, field alignment, and object alignment. As the heap size grows, to keep the reference size smaller, the object alignment is increased. That means that with a larger heap and 32-bit compressed pointers, alignment will be some power-of-two number of bytes, such as 8, 16, 32, or 64. (I'm unaware of implementations that go larger than that, but there likely are some.) Basically, a certain number of the 32 bits in the pointer are reserved for special uses (such as encoding an Integer object into a pointer), and the remaining bits represent the 64-bit address (of which at least 33 bits are implicitly zero, thanks to a left shift of the compressed pointer value). To accomplish this, particularly since some of the least significant bits of the address are zero, the object must be located on a power-of-two boundary.
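    The arithmetic behind the shift is easy to sketch. Assuming a 32-bit compressed reference shifted left by log2(alignment) bits, the addressable heap is 2^32 * alignment bytes; the 8-byte figure is HotSpot's commonly cited default object alignment, and the larger alignments trade per-object padding for a larger addressable heap:

```java
/**
 * Back-of-the-envelope compressed-pointer arithmetic: a 32-bit reference
 * left-shifted by log2(alignment) bits can address 2^32 * alignment bytes.
 */
public class CompressedOops {
    static long addressableHeapBytes(int objectAlignment) {
        return (1L << 32) * objectAlignment;
    }

    public static void main(String[] args) {
        for (int align : new int[] {8, 16, 32}) {
            System.out.printf("alignment %2d bytes -> %d GB addressable%n",
                    align, addressableHeapBytes(align) >> 30);
        }
        // alignment  8 bytes -> 32 GB addressable
        // alignment 16 bytes -> 64 GB addressable
        // alignment 32 bytes -> 128 GB addressable
    }
}
```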

    So I don't think just serializing everything into a byte array is enough to "quadruple the size of the object graph". It does look like an interesting thing to do, though, especially if your key and value types are fixed ahead of time and you know deserialization is cheap.

    Since the data are already flowing over the wire, the format is already serialized. At any rate, considering the post you linked to:

    So, putting it together (for 32/64 bit Sun HotSpot JVM): HashMap needs 24 bytes (itself, primitive fields) + 12 bytes (slot array constant) + 4 or 8 bytes per slot + 24/40 bytes per entry + key object size + value object size + padding each object to multiple of 8 bytes

    OR, roughly (at most default settings, not guaranteed to be precise):

    • On 32-bit JVM: 36 bytes + 32 bytes/mapping + keys & values
    • On 64-bit JVM: 36 bytes + 56 bytes/mapping + keys & values

    Since we know the keys and values are binary data, we can do a lot better than a hashtable, which is what we generally have used in the past (albeit a more highly concurrent implementation). We also employ data de-duping, although the 4:1 ratio was accomplished without de-duping.
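    The de-duping mentioned here can be sketched in miniature. This is an invented illustration, not Coherence internals: identical serialized values are stored once and shared, much the way String interning shares identical strings. ByteBuffer.wrap is used as the map key because its equals/hashCode are content-based.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy value de-duplication: putting two keys with byte-identical values
 * stores the payload once and shares the reference.
 */
public class DedupStore {
    // Canonical copy of each distinct payload, keyed by content.
    // (Sketch caveat: callers must not mutate arrays after put.)
    private final Map<ByteBuffer, byte[]> canonical = new HashMap<>();
    private final Map<String, byte[]> entries = new HashMap<>();

    public void put(String key, byte[] value) {
        byte[] shared = canonical.computeIfAbsent(ByteBuffer.wrap(value), k -> value);
        entries.put(key, shared);
    }

    public byte[] get(String key) {
        return entries.get(key);
    }

    public static void main(String[] args) {
        DedupStore store = new DedupStore();
        store.put("k1", "same payload".getBytes(StandardCharsets.UTF_8));
        store.put("k2", "same payload".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.get("k1") == store.get("k2")); // true: one shared copy
    }
}
```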

    Peace,

    Cameron Purdy | Oracle Coherence

    http://coherence.oracle.com/

  5. Check out Infinispan

    Where I work, we've been having great luck with Infinispan. I really like the direction they're taking it, they have responded promptly to any bugs we've filed, and of course it's free.

    Its Hibernate (and now Hibernate Search) integrations are seamless, and so far the performance is great. I'd recommend checking it out -- you should only spend the big money on Coherence if an open source solution like Infinispan doesn't meet your needs. It certainly seems to meet ours.

  6. More to cost than license

    Hi Jeff -

    We all like stuff that's valuable and free. Unfortunately, there's often not much of a business model there. Even Red Hat -- with its large sales team and excellent brand -- hasn't been able to recoup its initial investment in JBoss (let alone the ongoing costs). While it's nice to be able to give stuff away for free, I'm sure that your employer Attensa, for example, charges for the products and services that they provide. It's a normal part of business.

    Nonetheless, your point is quite valid: If you can find what you need for free, why pay? That's why the investments that we continue to make in Coherence are focused on making the software easier to use, more reliable in production, more cost-efficient to operate, etc. We do have to continue to add significant value to the product in order to sell it when its competitors provide software for free. The good news for us is that it is a healthy business, and Coherence is used in a large number of systems that don't have any desire to swap out a reliable piece of software that does its job well -- and helps those companies solve their own unique business problems.

    At any rate, if you ever decide to look at the total costs of ownership and not just the license costs for the software, then I hope you'll include Coherence in your analysis. Similarly, if you find yourself looking for high end capabilities, such as synchronous replicas across data centers (for true HA systems), ultra-low latency implementations, ultra-large capacity implementations, etc., then I hope you'll consider Coherence.

    Peace,

    Cameron Purdy | Oracle Coherence

    http://coherence.oracle.com/

  7. Congrats!

    I like the focus on ease of use.

    Nikita Ivanov.

    GridGain Systems.

  8. Congrats!

    Thanks Nikita .. as you know, it's a lot harder to make something usable than just to make it work. I'm glad we're finally taking the time to make some of these capabilities more accessible .. I wish we'd had more time to do some of this earlier ;-)

    Peace,

    Cameron Purdy | Oracle Coherence

    http://coherence.oracle.com/

  9. Congrats!

    Why did you wait until version 3.7 to implement a loadAll(...) method?

  10. Congrats!

    Why did you wait until version 3.7 to implement a loadAll(...) method?

    There are now two ways that loadAll() ends up getting called:

    1) When a getAll() comes to the Read/Write Backing Map, and

    2) When a Bundler (multi-threaded read-coalescing) is used, and there is more than one key.

    The reason all this wasn't done sooner is just a matter of many other items that got prioritized higher in our project queue; for example, we give priority to any bugs or resiliency issues. Also, any time that there's a multi-entry operation, the complexity of optimizing the implementation increases, because first you have to ensure that the entire pipeline (from client API all the way through to the backing map) is conducive to passing through the operations as multi-entry operations, and secondly the logic on the backing map itself becomes much more complex.

    The good news is that the applications don't have to change any code to take advantage of it -- the APIs remain unchanged, but the implementations are now more efficient.
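    The read-coalescing idea can be sketched with a toy loader interface. This is a hypothetical shape invented for illustration, not the Coherence API: a loader that only knows how to fetch one key issues N round trips for N keys, while a loadAll-aware implementation turns a bulk read into a single trip.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of read coalescing: bulk loadAll vs. per-key load. */
public class BulkLoadSketch {
    interface Loader<K, V> {
        V load(K key);

        // Default fallback: one round trip per key. A bulk-capable
        // backing store should override this with a single batched read.
        default Map<K, V> loadAll(Collection<? extends K> keys) {
            Map<K, V> result = new LinkedHashMap<>();
            for (K key : keys) {
                V value = load(key);
                if (value != null) result.put(key, value);
            }
            return result;
        }
    }

    /** Counts backing-store round trips so the batching benefit is visible. */
    static class CountingLoader implements Loader<String, String> {
        int roundTrips;

        public String load(String key) {
            roundTrips++;
            return "v:" + key;
        }

        @Override
        public Map<String, String> loadAll(Collection<? extends String> keys) {
            roundTrips++; // one bulk query instead of one per key
            Map<String, String> result = new LinkedHashMap<>();
            for (String key : keys) result.put(key, "v:" + key);
            return result;
        }
    }

    public static void main(String[] args) {
        CountingLoader loader = new CountingLoader();
        loader.loadAll(Arrays.asList("a", "b", "c"));
        System.out.println("round trips: " + loader.roundTrips); // 1, not 3
    }
}
```

    As the post notes, the calling code doesn't change at all; only which of the two paths gets exercised underneath a bulk read does.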

    Peace,

    Cameron Purdy | Oracle Coherence

    http://coherence.oracle.com/