Discussions

News: Tangosol Announces Coherence 1.2 Clustered Cache

  1. Tangosol today announced the 1.2 release of Coherence, its clustered cache product. The new version provides distributed caching, automatic cluster restart support, HTTP session replication for WebSphere, and more. Built in pure Java, Coherence adds clustering to any J2EE server.

    Check out Tangosol Coherence 1.2.


    Press Release
    -----------------------
    Tangosol adds Distributed Caching and robust automatic cluster restart capabilities in Coherence 1.2 for better memory utilization across the cluster and fail-over/fail-back support. New features reinforce Tangosol Coherence as an essential component for scaling up J2EE applications to run on multiple servers. Built in pure Java, Coherence adds clustering to any J2EE server and delivers instant access to data in a WebLogic, WebSphere or other J2EE server cluster while ensuring data integrity, managing failover and drastically reducing end-user wait times.

    Tangosol, Inc. today announced the 1.2 release of its enterprise-class clustered cache product. With support for the Distributed Cache and Automatic Cluster Restart, the new version builds on the rich feature set, rock-solid reliability and amazingly scalable performance of the 1.1 release, and addresses the needs of customers with very large caches who need to minimize the cache's memory footprint and recover seamlessly from physical network failures. For IBM WebSphere 4 customers, Coherence 1.2 includes an in-memory HTTP session replication module that provides improved performance, scalability and reliability to high-scale WebSphere applications.

    Additionally, Coherence provides:

    HTTP session replication for Apache Tomcat 4

    Support for sticky and non-sticky load-balancing

    A peer-to-peer clustering model that eliminates single points of failure

    Both fully replicated and partitioned data storage in a cluster

    The only clustered size-limited cache implementation

    Graceful handling of physical network interruptions

    Detection of and recovery from dead and unresponsive cluster nodes


    Download a free evaluation today at: http://www.tangosol.com/coherence.jsp

    Source code for the HTTP session replication modules for IBM WebSphere 4 and Apache Tomcat 4 is available with a FREE development license. To request your free development license, email sales at tangosol dot com.

    Threaded Messages (44)

  2. Interesting. I always thought that clustering is in the realm of the App Server vendors (As part of their value add and differentiation).

    Didn't know there's a marketplace for 3rd party AppServer clustering software. Am I missing some subtle points here ?
  3. Harold: "Interesting. I always thought that clustering is in the realm of the App Server vendors (As part of their value add and differentiation)."

    It is a value-add for some vendors. BEA (for example) provides a peer-to-peer clustering model with clustering support for JNDI, HTTP sessions, JMS, and stateful EJBs. Probably 50%-60% of our customers are deployed on BEA WebLogic, yet they needed the additional features and reliability that Coherence provides.

    We have had several discussions with J2EE app server vendors about providing our clustering functionality to them. We don't have plans to add J2EE app server functionality -- the margins in that market are too small unless you are one of the top players. (That's why there has been a recent exodus of players; if they were making money, they wouldn't be leaving the market.)

    Harold: "Didn't know there's a marketplace for 3rd party AppServer clustering software. Am I missing some subtle points here?"

    We do a lot more than clustering app servers, although we do that quite well for Apache Tomcat 4 and IBM WebSphere 4. Our product serves as an in-memory data store in a clustered application tier. If you have experience with high-scale J2EE applications, you'll probably have experienced what I call the "single point of bottleneck": the database. For some of our customers, we've reduced their database usage by 99.9%+ and their overall infrastructure costs by over half. With the replicated cache, each application server has all the "cached" data locally, so access is instantaneous. With our concurrency features, the data integrity is maintained, whether on a single server or replicated across 100 servers. With our distributed cache, if you have 100 servers each caching a half GB of data, you have a 50GB cache!

    We've focused heavily on making sure that our clustering is fault tolerant and extremely reliable. With the improvements that we've made in our 1.2 release, we expect that our customers will _never_ have cluster downtime except for scheduled maintenance and upgrades. (In our 1.1.3 product, unexpectedly long GCs and other interruptions, including physical network cabling issues, could cause cluster connections to be lost. We added fault tolerance solutions for both of these classes of problems.) We stress and regression test each release on up to 24 cluster nodes for long durations.

    Our software is not for everyone. Clustering works best when you have more than one JVM, for example ;-) ... and if you don't need data integrity in your caches, you can use asynchronous JMS messages to invalidate singleton caches across a cluster (Dimitri's Seppuku pattern; sketched below). If you do require highly available clusters and application-tier in-memory data stores with data integrity, then I'm certain you'll choose Coherence. (It also has the added bonus of being the highest-performance clustered cache product and the only synchronized replicated cache for J2EE.)
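
    As an aside, that JMS invalidation approach looks roughly like this (a minimal sketch of the pattern, not Coherence code; the wiring and names are illustrative):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import javax.jms.*;

    // Each server keeps a local singleton cache and evicts entries when an
    // invalidation message arrives on a shared topic. Assumes the
    // TopicConnection has been started.
    public class InvalidatingCache implements MessageListener {
        private final Map cache = Collections.synchronizedMap(new HashMap());
        private final TopicSession pubSession;
        private final TopicPublisher publisher;

        public InvalidatingCache(TopicConnection conn, Topic topic) throws JMSException {
            pubSession = conn.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
            publisher = pubSession.createPublisher(topic);
            TopicSession subSession = conn.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
            // noLocal=true: don't evict our own fresh copy on our own message
            subSession.createSubscriber(topic, null, true).setMessageListener(this);
        }

        public Object get(Object key) {
            return cache.get(key); // null means "reload from the database"
        }

        public void put(Object key, Object value) throws JMSException {
            cache.put(key, value);
            // tell the other nodes to drop their now-stale copies
            publisher.publish(pubSession.createObjectMessage((java.io.Serializable) key));
        }

        public void onMessage(Message msg) {
            try {
                cache.remove(((ObjectMessage) msg).getObject());
            } catch (JMSException e) { /* log and ignore in this sketch */ }
        }
    }

    Note that this only invalidates; it provides no concurrency control, which is exactly the gap described above.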

    Frankly, anyone who's ever had to scale up a J2EE application to more than one server knows how nice it would be to have some way of maintaining some data in the application so that the data is live (not having to be read from the database), replicated (to avoid single-point-of-failure) and always in sync across the servers. That's what Coherence does, and it does it really well.

    Thanks for your questions, and I hope this answers them.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  4. <quote>If you have experience with high-scale J2EE applications, you'll probably have experienced what I call the "single point of bottleneck": the database. </quote>

    Cameron,
    although I agree with you that the database can be a very serious bottleneck in J2EE applications, it is my experience that this is also the part of multi-tiered J2EE applications that scales the best. Database vendors like Oracle/IBM/Sybase have decades of experience in making their DB products as performant as possible, and the results are (you will agree) impressive. They have a lot more experience in clustering, caching algorithms, etc. than, for example, the young and still fairly immature (although things are changing) J2EE vendor community. In my experience with high-volume J2EE apps, it has always been the J2EE application (in most cases, sometimes because of the messy DB queries that junior J2EE programmers produce) or the J2EE container (sometimes) that caused problems.

    Any thoughts?

    Rik
  5. Hi Rik,

    Rik: "In my experience with high-volume J2EE apps, it has always been the J2EE application (most cases, and sometimes the cause was related to the messy DB queries that junior J2EE programmers sometimes produce) or the J2EE container (sometimes) that caused problems."

    We've worked with Java performance and scalability problems on over 100 different companies' (and gov't) J2EE applications -- some of the largest in the world. The statistics below are my best-guess summary of what we've seen:

    10% -> the app server is the bottleneck because of poorly built Java
    5% -> the database is the bottleneck because of poorly built SQL or poorly designed schemas
    5% -> the configuration of the app server (e.g. connection pool is too small) is the bottleneck
    75% -> the database is the bottleneck because it can't handle the load
    5% -> other

    J2EE apps scale almost linearly, but the scale of the app causes an almost linear (and in some cases super-linear) growth of load on the database. While a 64-CPU e10000 running a popular RDBMS can handle an amazing number of requests, you'd be surprised how many requests a cluster of 4-CPU e450s can throw at it, bringing the application to a crawl. Another thing to consider is the cost of scale. A fully configured Sun e15k will cost you about US$80k per CPU to run Oracle (including h/w and software). We save some of our customers _obscene_ amounts of money, while drastically improving response times and providing some room for growth on the scalability side.

    Caching data in the application tier makes perfect sense if you can do it without losing data integrity. Think about it: Where is the data used? It's used a little bit in the database (joins, constraints, etc.), but much more importantly, it's used in the application tier -- for everything! You've got presentation, business logic, security processing, etc. etc. etc. all going on in the application tier, and it's having to go to the database umpteen times to get the data.
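
    To make that concrete: the basic shape of application-tier caching is trivial in a single JVM (a minimal sketch -- the hard part, and what Coherence adds, is keeping it correct across a cluster):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Minimal read-through cache: check the local map first, and hit the
    // database only on a miss. Loader stands in for your real DAO/JDBC code.
    public class ReadThroughCache {
        public interface Loader { Object load(Object key) throws Exception; }

        private final Map cache = Collections.synchronizedMap(new HashMap());
        private final Loader loader;

        public ReadThroughCache(Loader loader) { this.loader = loader; }

        public Object get(Object key) throws Exception {
            Object value = cache.get(key);
            if (value == null) {           // miss: one database round trip
                value = loader.load(key);
                cache.put(key, value);     // subsequent reads are local
            }
            return value;
        }
    }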

    Look at the ECperf results. Notice the CPU balance between app servers and databases. Especially in EJB apps, the database tends to get absolutely hammered. (I gave a presentation on this last month at WBUG. You can find the slides on our site in the Coherence download section at http://www.tangosol.com/coherence.jsp).

    Rik: "although I agree with you that the database can be a very serious bottleneck in J2EE applications, it is my experience that this is also the part of the multi-tiered j2ee applications that scales the best. Database vendors like Oracle/IBM/Sybase have decades of experience in making their DB products as performant as ever, and the results are (you will agree) impressive."

    Now I can agree with you ;-) ... if you get rid of all the wasted usage on the database for non-transactional data (like HTTP sessions, like lists of stuff loaded to put HTML pages together, like basically everything presentation-related, like security information if you keep it in the database, etc.), then the database is a very scalable, excellent resource for transactional operations.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  6. "10% -> the app server is the bottleneck because of poorly built Java
    5% -> the database is the bottleneck because of poorly built SQL or poorly designed schemas
    5% -> the configuration of the app server (e.g. connection pool is too small) is the bottleneck
    75% -> the database is the bottleneck because it can't handle the load
    5% -> other "

    The "database can't handle the load" seems to be a somewhat meaningless statement. Not handling load means one of two things:

    a) there's not enough hardware
    b) it's being accessed poorly.

    Usually the case is (b), which correlates to your observation that "the database is the bottleneck because of poorly built SQL or poorly designed schemas". This includes physical data design, locking contention analysis, transaction decomposition, etc. I tend to agree with Thomas Kyte (http://asktom.oracle.com) -- most application failures are a result of PEOPLE not understanding the database and not using it effectively, such as those who see Entity Beans or O/R mappers as a way of "black boxing" the database by just throwing objects into it.

    Obviously an object cache is a performance enhancer for certain classes of systems (read mostly, browse-oriented, like web or e-commerce sites). Any tiered cache usually provides performance enhancements -- with correctness tradeoffs, usually due to the limitations of most cache coherency and concurrency control mechanisms.

    But I would suggest that improving database access is the first goal of an application. Adding an object cache because of a lack of good DBAs on board is really just wallpapering over the problem -- it might work now, but it's just temporary glue.
  7. Well said.

    A prerequisite for a good EJB developer is to know his/her database inside out, or at least know how to tune SQL statements. A tendency in the OOP world is to grab all the data from the database and do the manipulation (join, filter) in application-tier code. This can lead to poor use of database resources. Stored procedures should have their place in today's J2EE/EJB-centric (especially CMP) applications, although the current CMP spec does not support them.
  8. Stu

    You have many good points that I agree with and champion on my projects. The only point I would like to comment on is that in some high-volume transactional apps, the database can be the major source of bottlenecks. I agree that you have to ensure good schema, access, and SQL designs. But beyond that, once you get to high volumes, you can run into:

    - network interface (even with aggregated gigabit nics)
    - high levels of processor interrupts due to TCP/IP handling at the database server node partly due to
       - chattiness of the OCI protocol
       - mismatch of the default SDU/TDU (2K) vs TCP/IP's MTU (1.5K) which you cannot override with JDBC Type IV
    - contention for latches on critical structures such as cache buffer chains, the shared pool, etc.
    - bottlenecks at the redo logs
    - and many others that come with high transactional volumes

    An object cache for queries against read-only tables can easily shed 1/3-1/2 of all SQL sent to the database. An object cache that can replicate updates to the cluster can eliminate another 1/6-1/3 of the remaining SQL, further reducing network and DB loads.

    My only problem is that (in some cases) over-caching on the mid-tier can cause JVM heap bloat. But I digress.
  9. There is no disagreement: both the database and the "application" must be optimized. A good DBA can do a lot! But that is data-store access and integrity optimization. A caching system, in general, belongs at the application level. The extent and duration of caching are more "business rules" than a "data-integrity" aspect of the database level.
  10. Stu, talk to your DBAs, because they will disagree with you most strongly... when you look at products like Coherence, you are only adding additional functionality to YOUR Java code... cache is good... and it's better over all of the tiers!

    We have a large system that was sending over 1.2 million SQL statements for 15,000 transactions an hour... It is truly about persistence between the database and your application... any caching you add can only help... bad design + bad coding = database issues.

    I think you should try to be a team player... look at the whole picture... bad DBAs abound, but so do some very bad development managers and lead developers.
  11. Hi Stu,

    Stu: "The "database can't handle the load" seems to be a somewhat meaningless statement. Not handling load means one of two things:
    a) there's not enough hardware
    b) it's being accessed poorly."

    While both are true (you can solve most database problems with more hardware, and you can improve the database access efficiency of most applications), there comes a point of diminishing returns and a point of no return.

    I gave the example of the e10k for a reason -- to upgrade an e10k at the time was impossible. It took 64 CPUs and then you're done. (The 128 CPU Primepower from Fujitsu was not an option at the time for them.) So there comes a point where you basically can't scale up the hardware. There were options such as OPS (already in use) but that only allowed two servers. So once you exhaust US$10MM on just the database hardware and software and another similar amount on database consultants, how do you scale up the hardware?

    Regarding the "poor access" of the SQL database, yes you are right that there are many applications that should be improved in this regard. However, if there is data that is commonly involved in business operations in the application tier, even if the access is 100% optimized, it can still bog the database server! And it can still cause poor performance of the application! Oracle spends more CPU cycles on the client side alone than what it costs to grab something from a cache on the app server! (i.e. Oracle uses more cycles just in the JDBC driver running in the JVM that is running the J2EE app server.) _Then_ after burning those cycles, it has to wait on a round trip network request to an Oracle server -- minimum network time of 2ms. Under reasonable load, Oracle on big hardware (e.g. an e10k) cannot respond in less than 10-12ms. Once the statement cache latches are saturated, it can't respond in less than 50-60ms!

    Those are just base latency figures. You have to add the cost of the actual query processing on top of that.

    From my personal experience, I think it's better to avoid getting into that situation in the first place. J2EE application servers scale horizontally very well and have predictable costs (e.g. the n+1 clustering model). Relational databases scale very poorly horizontally and thus are capped by the ceiling of vertical scale. While they scale very well vertically (Oracle, Sybase, even IBM UDB), vertical scale is naturally more expensive (a 64-CPU machine costs much, much more than 32x a 2-CPU machine), and it has a limit (128 CPUs on Fujitsu, 72 CPUs on Sun, etc.)

    One way to make more effective use of the scale of the J2EE tier and more effective use of the database is to cache in the J2EE tier. It's the same reason that CPUs in high-end servers have huge caches even if their memory subsystems are awesome! It's the same reason that database servers have incredible cache systems even if their I/O subsystems are awesome! For high-scale applications, and for cost-effective scale of not-so-high-scale applications, caching is _always_ the most effective solution.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  12. Cameron,

    Something interesting popped up from your texts. So can somebody safely infer that you can get the same speed from a combination of cheaper database/cache servers as from big (huge) iron? IMHO that's the main selling point of your solution then.
    DODO
  13. Hi DODO:

    DODO: "Something interesting popped up from your texts. So can somebody safely infer that you can get the same speed from a combination of cheaper database/cache servers as from a big(huge) iron? imho that's the main sale point of your solution then."

    Often that is the case. However, it is just as often a need for faster response times (less latency on data accesses). For example, if you are checking security information that is stored in the database, each HTTP request may have to hit the database 1 or more times (BTW - we've actually seen this in a 15,000+ concurrent user application), and that causes a lot of unnecessary requests to the database and also adds a measurable amount of time to the HTTP request processing. Using caching can significantly lessen the database load (often by 50% or more) and drop the application response times way down, all without losing any functionality!

    So I'd say we see customers selecting Coherence for several reasons: about 30% of the decisions are cost-driven, about 40% are for scalable performance reasons and maybe 30% for reliability (failover) reasons.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  14. Cameron,

    "For example, if you are checking security information that is stored in the database, each HTTP request may have to hit the database 1 or more times[...] Using caching can significantly lessen the database load.."

    Wouldn't App Server caching achieve the same result? (Of course, App Server caching generally kicks in only when exclusive database access mode is enabled.)

    Cheers,
    Ramesh
  15. Hi Ramesh,

    Cameron: "For example, if you are checking security information that is stored in the database, each HTTP request may have to hit the database 1 or more times[...] Using caching can significantly lessen the database load.."

    Ramesh: "Wouldnt App Server caching achieve the same result? (ofcourse, App Server caching generally kicks in only when exclusive database access mode is enabled)."

    The answer is, "It depends."

    1. It depends on the data access logic. If the application is loading the data via JDBC, the app server will not cache it, whether clustered or not. If it is an entity EJB that is accessing the database, then the app server might cache the EJB.

    2. It depends on the application server. Some application servers do no caching of entity EJBs. Some cache only read-only data. Some cache only if there is exclusive access from one JVM to the database (i.e. they do not cache in a cluster). The latest releases (e.g. WebLogic 7.0) now even have some caching options (optimistic) that can work for read-mostly data in a cluster.

    The main value of caching in a cluster is identical to the value of caching when there is only one JVM. The difference is that in a cluster, you don't have the singleton cache pattern (etc.) that allows you to accomplish the caching. That is what Coherence provides: the ability to share live data among any number of JVMs, just like a singleton cache can share data among threads in a single JVM.
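
    For reference, the single-JVM singleton cache pattern I'm referring to is just this (a minimal sketch):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // One shared Map for all threads in the JVM. This is exactly what does
    // NOT exist across JVMs -- the gap a clustered cache fills.
    public class SingletonCache {
        private static final Map INSTANCE = Collections.synchronizedMap(new HashMap());
        private SingletonCache() {}
        public static Map getInstance() { return INSTANCE; }
    }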

    For caching at the JDBC level (almost transparently), there are several options too. There's a company called TimesTen that makes FrontTier, which is a JDBC driver that goes to an in-memory database server (a separate, non-clustered database server) that sits in front of your regular database server. There's an Oracle appliance cache company -- I can't remember the name, but they were at eWorld and they had a rack-mounted appliance that you could use to query against Oracle, and the appliance would rarely have to go to the Oracle box.

    For maintaining a coherent cache in the cluster, there's only our Coherence product.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  16. Hi Cameron,

    <quote>
    For maintaining a coherent cache in the cluster, there's only our Coherence product.
    </quote>

    As much as I like your product (and I really do. Your personalization product is my all-time favorite), I find this statement very hard to believe. Do you plan on publishing any feature-comparison papers and/or performance-comparison scenarios vs. Gemstone/Persistence/etc.? Do you plan to publish scenarios and results using Coherence and built-in 7.0 optimistic caching? Or Coherence vs. PCA? I can imagine that the cache performance will totally depend on the usage scenario.

    --
    Dimitri
  17. Hi Dima,

    Dima: "As much as I like your product (and I really do. Your personalization product is my all-time favorite)"

    Thank you ;-) ... we've got some improvements coming there too.

    Dima: "I find this statement very hard to believe. Do you plan on publishing any feature-comparision papers, and/or performance comparision scenarios vs Gemstone/Persistence/etc. Do you plan to publish scenarios and results using Coherence and built-in 7.0 optimistic caching? Or Coherence vs PCA? I can imagine that the cache performance will totally depend on the usage scenario."

    The statement on performance, you mean? (Or that we have the only coherent cache?) As a disclaimer, you are absolutely correct that performance depends on the usage scenario. Our customers are caching and/or storing data in a cluster of machines (typically multiple server boxes, sometimes multiple JVMs per box). In our replicated cache, the data is always local and in the JVM (in Java format, so to speak) and the access path is only a little more complicated than a Hashtable's, so it is obvious that for reads, our cache is as close to the "theoretical maximum performance" as possible. For concurrency operations, we use a series of optimizations that ensure that communication has to occur with at most one other JVM (which typically takes one round trip, less than 2ms), and often is done locally, so again it's difficult to conceive of an implementation that would be theoretically faster (lower latency). For update operations, we use multicast (if there are multiple other members to update), and our TCMP protocol can provide throughput pretty close to the theoretical maximum of a network; it uses burst mode (no sync ack/nack required), so it can reliably transfer (for example) megabytes of update data to all other cluster members in less than a second. So at the theoretical level, it is difficult to imagine any other implementation providing a replicated data store with any higher level of performance.

    As I pointed out earlier in this thread, there are solutions for multiple JVMs on one machine (not a cluster), such as GemFire, which operates with much less overhead than a network protocol (no data to move -- it is in shared memory). So there are obviously situations in which we are not the highest throughput / lowest latency -- even for doing what we do best (sharing data among JVMs).

    Regarding Gemstone, we have not benchmarked it (Facets or PCA) in house. Some customers and potential customers have. Regarding Persistence, my experience with it is pre-EdgeXtend (they had no clustering story). We've never run into a customer using or evaluating EdgeXtend. When we looked at the documentation, we could not determine its underlying implementation, but it appears to be related to the PowerTier O/R engine and maybe it is built on JMS. I'm guessing it is more like ObjectStore (aka Javlin at http://www.exln.com/) and enJin (http://www.versant.com/products/enjin/). We've been benchmarked by eval customers against both (to share objects in a cluster) and came out ahead, but take it with a grain of salt because I don't have the details (either what the tests were or what the result numbers were).

    However, and I think more importantly, a customer quote was "Oh my God, is that all I have to do to integrate this? I'm already done." And, another, "I did some research and i always got the same products : Facet, Javlin ... I think that there are two big problems with these products :
    - They are not easy to develop with (I'm mainly a developper and I hate working with "not funny" API)
    - They are very expensive.
    So finally, I found a thread on http://theserverside.com which talks about your product and it made me happy !"

    There are certainly newer entries into the clustering/caching space, including some of the O/R, JMS and OODBMS vendors, but we're holding our own on performance, reliability and scalability, and so far we are winning on ease of integration and price.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  18. DODO

    There is obviously a point of diminishing returns, but yes, in our experience, the initial set of caching bought lots of savings at the database server end. Let me explain.

    Starting with a baseline of no caching, we need XX processors on a 10K or 15K. We scaled the app until a breaking point -- e.g., network or database. We then cached heavily read-accessed reference tables (after doing the due diligence and analysis). That immediately removed a lot of traffic to the network and DB and allowed further scaling.

    As we started to cache the other tables, the cost/pain of doing so increased and the benefits decreased. A typical diminishing-returns curve.

    That led to our next round of optimization: API/Servlet/JSP response caching. Again, the first round would be for high-volume, high-likelihood-of-hitting-the-same-requests reference caching.

    Again, watch out for heap bloat (e.g., large old-generation heap sizes and GC costs).

  19. Cameron,

    Excellent response. I agree with your points. Here are a couple of comments.

    - Obviously there are production databases that run tremendous OLTP loads without a middle-tier cache. How have they done this? With optimized application access paths and hardware. I think there's a debate about the cost-effectiveness of this approach, but not about the viability of the approach. I would note that many OLTP databases aren't CPU-bound, they're I/O-bound. This is really a balance of RAM vs. disk heads, not the 64-CPU limit of an E10k.

    - Most JDBC CPU problems are driven from drinking up too many results from the database instead of chopping the cursor off appropriately. I'd wonder whether the CPU involved with Oracle's JDBC driver would be any different than with just accessing a cache. It's mainly iteration-driven load, not computational. How would it be different in a middle-tier cache, when you have to iterate over the same amount of data to ship it to the client?

    - The new version of OPS (9i RAC) has real potential for horizontally scaling databases. Compaq/Tandem NonStop SQL is also well-known for its horizontal scalability; it's used at cell-phone switches to store call data (which, you can guess, is, uh, high volume).
  20. Stu,

    Stu: "Obviously there are production databases that run tremendous OLTP loads without a middle-tier cache. How have they done this? With optimized application access paths and hardware. I think there's a debate about the cost-effectiveness of this approach, but not about the viability of the approach. I would note that many OLTP databases aren't CPU-bound, they're I/O-bound. This is really a balance of RAM vs. disk heads, not the 64-CPU limit of an E10k."

    The "olden" rule applies: Every application is different. Oracle and Sybase are completely I/O bound for some applications, and for others, it is possible to have all the data or the most-used data in memory (and in Oracle's case you can actually specify what data to keep in memory) and avoid most I/O bound issues.

    Similarly there are applications that don't need more than one or two CPUs on the database server, but need an almost infinite amount of I/O bandwidth. Data mining comes to mind.

    Most of the work that we see is with OLTP ... most of our customers are deploying web-based J2EE applications that use Oracle, Sybase, MS SQL Server, UDB. We do, quite often, see virtually 100% CPU utilization on the database server, and using Precise's Indepth for Oracle products it's possible to break out the time between query processing, I/O waits, etc.

    Stu: "Most JDBC CPU problems are driven from drinking up too many results from the database instead of chopping the cursor off appropriately. I'd wonder whether the CPU involved with Oracle's JDBC driver would be any different than with just accessing a cache."

    Oracle's driver actually parses the SQL on the client side before sending it off. It uses an unbelievable amount of cycles. I wasn't even talking about the cost of processing the result sets ;-).

    Stu: "It's mainly iteration-driven load, not computational. How would it be different in a middle-tier cache, when you have to iterate over the same amount of data to ship it to the client?"

    The data is already in memory and in Object form. No stream reading or deserialization. Compare a 10000 row iteration from JDBC vs. an entrySet().iterator() over a Hashtable. On my notebook, the Hashtable takes 30ms total! (Code below)

    // Assumes a java.util.Map (e.g. a Hashtable) already populated with
    // the 10,000 cached objects; no I/O or deserialization occurs.
    for (Iterator iter = map.entrySet().iterator(); iter.hasNext(); ) {
      Map.Entry entry = (Map.Entry) iter.next();
      Object key = entry.getKey();
      Object val = entry.getValue();
    }

    Stu: "The new version of OPS (9i RAC) has real potential for horizontally scaling databases."

    I've seen it. You're absolutely right, it has real potential, particularly to solve parts of the scalability problem, but it won't do anything about the latency problem (e.g. the overhead of accessing a database), and it doubles the cost of Oracle ($80k/CPU).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  21. "I've seen it. You're absolutely right, it has real potential, particularly to solve parts of the scalability problem, but it won't do anything about the latency problem (e.g. the overhead of accessing a database), and it doubles the cost of Oracle ($80k/CPU)."

    I am glad this point came up - I mean the cost. To me this should be the main point when marketing a distributed cache. In brief:

    N * cache-license + db-license < N * db-license

    However, having a cache is not the same as having a true parallel database, like DB2 on Parallel Sysplex or Oracle RAC, at least in the general case. A case in point is concurrency control: a cache typically relies on rollbacks, whereas a database server relies on blocking.

    The fact that JDBC has high overhead can be solved at the spec level. Also, latency grows with applied load, and in the general case you will probably get better latency at full load by using Parallel Sysplex or RAC than by using a software cache. This is probably true even with JDBC factored in. Again, I am talking about the general case. That's because you get to a point where your network buffer is overflowing with invalidation messages and your transactions have to run many times because they keep getting rolled back. At that point the latency goes out the window.

    Also, if you need speed, consider replacing HashMap with TreeMap ...

    Guglielmo
  22. Hi Guglielmo,

    Thanks for your comments!

    Guglielmo: "having a cache is not the same as having a true parallel database"

    Not yet. ;-)

    Guglielmo: "A case in point is concurrency control: a cache typically relies on rollbacks, whereas a database server relies on blocking."

    Our "Synchronized Cache" service and our "Distributed Cache" service both offer full pessimistic concurrency control through a locking/leasing API. (Despite the name "Synchronized Cache", our TCMP protocol and all of our cluster services are completely asynchronous. The "Synchronized" refers to the ability to do a synchronized update, i.e. lock-read-write-unlock.)

    We also offer a rules-based optimistic resolution (our "Optimistic Cache" service), which is what you describe with the "rollbacks". It uses a combination of factors, including cluster membership identity and data versioning, to determine when to accept and reject cache updates.

    Guglielmo: "The fact that JDBC has high-overhead can be solved at the spec level."

    Yes, and of course the drivers could be tighter. In the case of the Oracle driver, caching prepared statements returns up to a 30% performance increase in some apps because of the particular inefficiency that I described.

    Guglielmo: "latency grows with applied load, and in the general case you will probably get better latency at full load by using parallel sysplex or RAC than by using a software cache."

    That is certainly true for a replicated cache, whose scalability will degrade as a function of the frequency*size of cache updates.

    OTOH, our distributed (partitioned) cache scales linearly far beyond our means to test it, but since only some percentage of the data is local to any cluster member, there is an additional latency for read operations on the cache. (We show customers how to handle this by layering a size-limited local cache with auto-expiry on top of the distributed cache for read-only purposes; a sketch of the idea follows.)
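
    Here is one way to picture that layering, using plain JDK collections (a minimal sketch only -- it is not our implementation, and it ignores expiry and coherency of the front map):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class FrontedCache {
        private static final int MAX_LOCAL = 1000;

        private final Map back;  // the distributed cache (most data is remote)
        private final Map front = new LinkedHashMap(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry eldest) {
                return size() > MAX_LOCAL;  // evict the LRU entry past the limit
            }
        };

        public FrontedCache(Map back) { this.back = back; }

        public synchronized Object get(Object key) {
            Object value = front.get(key);
            if (value == null) {
                value = back.get(key);      // may cost a network hop
                if (value != null) front.put(key, value);
            }
            return value;
        }
    }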

    Guglielmo: "That's because you get to a point where your network buffer is overflowing with invalidation messages ...."

    Unlike other implementations, Coherence does not use invalidation messages; rather, it actually maintains a coherent cache. In our tests, we were actually able to reliably replicate data in a cluster faster than other products could invalidate data over JMS.

    Guglielmo: "Also, if you need speed, consider replacing HashMap with TreeMap ..."

    In our tests, the HashMap was usually about 3x faster than TreeMap, but YMMV. The advantage of TreeMap is a guaranteed worst case (roughly 2 log n comparisons per lookup) and ordered content. The performance is often based on the cost of hashCode()/equals() vs. the cost of compareTo(), and in larger maps it depends on the effectiveness (the "perfectness") of the hashCode(). In a map of 100 items, for example, you would average about 6 compares (5.5 for a found item, 6 for a not-found item) in a TreeMap, vs. exactly one hashCode() and on average just over one equals() in a HashMap.
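
    If you want to check the ratio on your own JVM, a quick-and-dirty micro-benchmark is easy to write (the absolute numbers are meaningless; the ratio is what's interesting):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    public class MapBench {
        public static void main(String[] args) {
            Map hash = new HashMap();
            Map tree = new TreeMap();
            for (int i = 0; i < 100; i++) {
                String key = "key-" + i;
                hash.put(key, new Integer(i));
                tree.put(key, new Integer(i));
            }
            time("HashMap", hash);
            time("TreeMap", tree);
        }

        static void time(String label, Map map) {
            long start = System.currentTimeMillis();
            for (int n = 0; n < 1000000; n++) {
                map.get("key-" + (n % 100)); // cost dominated by hashCode() vs. compareTo()
            }
            System.out.println(label + ": " + (System.currentTimeMillis() - start) + " ms");
        }
    }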

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  23. "Thanks for your comments!"

    You too for your details ...

    "Our "Synchronized Cache" service and our "Distributed Cache" service both offer full pessimistic concurrency control through a locking/leasing API. (Despite the name "Synchronized Cache", our TCMP protocol and all of our cluster services are completely asynchronous. The "Synchronized" refers to the ability to do a synchronized update, i.e. lock-read-write-unlock.)"

    It helps to be asynchronous when the communication takes a long time, but with a fast network (1Gb or more, say) you can also do synchronous updates using virtualized communication (i.e. without going through the kernel). Any comments on this?

    "Unlike other implementations, Coherence does not use invalidation messages; rather, it actually maintains a coherent cache. In our tests, we were actually able to reliably replicate data in a cluster faster than other products could invalidate data over JMS."

    Do you use multicasting, or point-to-point, or some combination of both? What do you do about flow control?

    "In our tests, the HashMap was usually about 3x faster than TreeMap, but YMMV. The advantage of TreeMap is a guaranteed worse case (I think: 2 * (n log n)) and it has ordered content. The performance is often based on the cost of hashCode()/equals() vs. the cost of compareTo(), and in larger maps it depends on the effectiveness (the "perfectness") of the hashCode(). In a map of 100 items, for example, you would average about 6 compares (5.5 for found item, 6 for not-found item) in a TreeMap, vs. exactly one hashCode() and on average just over one equals() in a HashMap."

    That's the reads. On the other hand, the memory performance of a tree (on writing) is superior to that of an array that needs to constantly be resized.

    Guglielmo
  24. Guglielmo: "It helps to be asynchronous when the communication takes a long time, but with a fast network (1Gb or more, say) you can also do synchronous updates using virtualized communication (i.e. without going through the kernel). Any comments on this?"

    I'm curious as to what you mean by "synchronous updates using virtualized communication". I don't know the term.

    We use asynchronous communication for several reasons:

    1) We support massive parallel processing using multiple threads. Multi-threaded communication is best scaled over asynchronous channels. (There is no blocking in the communications protocol.)

    2) We support WANs. Latency is a leading indicator of performance degradation with synchronous communications.

    3) With n-point communication (peer to peer clustering), particularly with respect to unpredictable Java GC, asynchronous communication is not impacted by route times and unresponsive servers.

    In other words, it's all about avoiding blockages in the clustering and communication protocols so that we can scale significantly better yet without paying any latency penalty.

    In our internal testing, we will often see 60-80 round trips (req/resp, polls, etc.) before we even see the first ACK. (Control messages are queued, throttled _and_ bundled to help with flow control. That helps significantly with frame relay.) What that means in reality is that we can sustain (on some higher latency networks) over 60 times the throughput that for example a TCP/IP connection would support.

    Guglielmo: "Do you use multicasting, or point-to-point, or some combination of both? What do you do about flow control?"

    We use both multicast and unicast datagram sockets. Traffic is sent on the most effective route based on rules.

    We have several flow control options, including general burst mode (configurable) and ACK and NACK throttling (each individually configurable).

    Since Coherence is designed for switched LANs (and private WANs) it does not use dynamic loss-rate and RTT-based flow control, but it does provide a healthy set of options for tuning. See the tangosol-coherence.xml file in the coherence.jar file and the documentation in the accompanying DTD.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  25. "I'm curious as to what you mean by "synchronous updates using virtualized communication". I don't know the term."

    The term "virtualized communication" is from the book "In Search of Cluster". You probably know this but somehow we are using different terms. There is a spec called VIA (Virtual Interface Architecture) which specifies a standard interface for sending packets without going through the kernel. The basic idea is to use the network card without making context switches and unnecessary copies of data (ideally, no copies,) while still preserving memory protection.

    The cache used in Parallel Sysplex is accessed synchronously using this type of mechanism, and I believe RAC works the same way, and that its late arrival is due to the fact that Oracle was waiting for standard VIA drivers to become available so they could write a portable implementation.

    Clearly this approach is best only on really fast networks. If you have gigabit ethernet (because that's the traffic you are expecting) you shouldn't be doing a system call to send and receive each packet, because that will limit your throughput.

    Out of curiosity, did you find it necessary to implement any particular ordering for your multicast protocol? I wrote a Java implementation of a totally-ordered, reliable multicast protocol, and I managed to get a throughput of about 75Mb/s. I'd like to use the work I did somehow, but I recently concluded that there's not a strong demand for this in your average OLTP system. But it was fun to watch it run ...

    Guglielmo
  26. Hi Guglielmo,

    Guglielmo: "The term "virtualized communication" is from the book "In Search of Cluster". You probably know this but somehow we are using different terms. There is a spec called VIA (Virtual Interface Architecture) which specifies a standard interface for sending packets without going through the kernel. The basic idea is to use the network card without making context switches and unnecessary copies of data (ideally, no copies,) while still preserving memory protection."

    I haven't seen or read it. I'm going to have to do some research. Apparently, I've been spending too much time on TheServerSide and not enough time reading ;-). Remember though that we are only using capabilities exposed to pure Java.

    Guglielmo: "Out of curiosity, did you find it necessary to implement any particular ordering for your multicast protocol? I wrote a Java implementation of a totally-ordered, reliable multicast protocol, and I managed to get a throughput of about 75Mb/s."

    Yes, our protocol is ordered and reliable, although underneath (at the wire level) there is no ordering or reliability ;-). Our protocol excels in a chaotic environment for several reasons:

    1) it is a constant-order O(K) implementation, so performance (CPU) for any given operation does not degrade with load

    2) it is relatively memory efficient; we avoid memory copies whenever possible, and other stuff like that

    Regarding your 75Mb/s, is that on 100Mb ethernet or on 1000Mb ethernet? If it's on 100Mb ethernet, that is very impressive. (You supposedly _cannot_ get higher than 89.)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  27. "I haven't seen or read it. I'm going to have to do some research. Apparently, I've been spending too much time on TheServerSide and not enough time reading ;-). Remember though that we are only using capabilities exposed to pure Java."

    It's a wonderful book. Be sure to read the author's own "dead computer sketch", inspired by Monty Python's "dead parrot sketch"...

    "Regarding your 75Mb/s, is that on 100Mb ethernet or on 1000Mb ethernet? If it's on 100Mb ethernet, that is very impressive. (You supposedly _cannot_ get higher than 89.)"

    You are right - I got it wrong - it's been a while. The throughput was 5000 messages per sec, times 1500 bytes, times 8 bits, which is 60 megabit-per-sec. Sorry about that. That was on fast ethernet, not gigabit. The protocol is a token-based reliable multicast protocol with total ordering. The test I was referring to used two 1.4Ghz p4 machines with 400Mhz rambus memory, on fast ethernet (on a hub), and intel 10/100 nics, jdk1.3.1_02 (server jvm, I think.)

    Guglielmo
  28. Guglielmo: "The throughput was 5000 messages per sec, times 1500 bytes, times 8 bits, which is 60 megabit-per-sec. Sorry about that. That was on fast ethernet, not gigabit. The protocol is a token-based reliable multicast protocol with total ordering. The test I was referring to used two 1.4Ghz p4 machines with 400Mhz rambus memory, on fast ethernet (on a hub), and intel 10/100 nics, jdk1.3.1_02 (server jvm, I think.)"

    Between 2 notebooks, Pentium III 600mhz, 256MB RAM (100mhz FSB), 100Mb integrated ethernet, 100Mb hub, JDK 1.3.1_01 with 64MB heap, Coherence (which operates at a slightly higher level than our TCMP protocol) was transferring at 8.1MB/s, which is an effective 65Mb/s. Overall traffic (including our TCMP overhead etc.) would have been over 70. (That was Coherence version 1.1.0 if I remember correctly. We have added a few improvements since then, but not any that would significantly increase throughput.)

    We do not use total ordering because of the unpredictable duration of GC and because of scalability concerns. There is an almost-total ordering alternative based on whisper numbers (last known packet ids), but none of our cluster services (the cluster service itself, also the replicated cache, optimistic cache, and distributed cache services) require total ordering.

    Have you had a chance to use Coherence yet?

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  29. "Between 2 notebooks, Pentium III 600mhz, 256MB RAM (100mhz FSB), 100Mb integrated ethernet, 100Mb hub, JDK 1.3.1_01 with 64MB heap, Coherence (which operates at a slightly higher level than our TCMP protocol) was transferring at 8.1MB/s, which is an effective 65Mb/s. Overall traffic (including our TCMP overhead etc.) would have been over 70. (That was Coherence version 1.1.0 if I remember correctly. We have added a few improvements since then, but not any that would significantly increase throughput.)"

    Cool. Thanks for this little bit of data, which is a bit of a sanity check.

    "but none of our cluster services (the cluster service itself, also the replicated cache, optimistic cache, and distributed cache services) require total ordering."

    I think that total ordering is not really needed unless you are doing replication. Otherwise you can just use FIFO order. I was building a symmetrically replicated OLTP system, but as I said there's not much of a demand for it.

    "Have you had a chance to use Coherence yet?"

    Not yet. But I think after all these details I have a pretty good idea of what your product does. I will definitely keep it in mind if I need to buy a cache ...

    Guglielmo
  30. "I haven't seen or read it. I'm going to have to do some research. Apparently, I've been spending too much time on TheServerSide and not enough time reading ;-). "

    In Search of Clusters, 2nd edition, is also probably one of the most lucid and humorous highly technical books you'll find out there.

    it's on my listmania list
    http://www.amazon.com/exec/obidos/tg/listmania/list-browse/-/55ZNUPHBP0P1/
  31. Thanks for the link!

    We have the Transaction Processing (J.Gray) book. Very good read for someone implementing a recoverable TM.

    I'm planning to get the cluster book. Looks interesting and it comes highly recommended ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  32. "Oracle's driver actually parses the SQL on the client side before sending it off. It uses an unbelievable amount of cycles. I wasn't even talking about the cost of processing the result sets ;-). "

    If you're using PreparedStatements with bind variables, probably lesson #1 in Oracle performance management, the driver just stores a hashtable of the PStmts, and there's no need for a soft-parse in the database. And even for drivers that DON'T do this, you can use session cached cursors to reduce shared pool & library cache latch contention significantly.
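
    To illustrate the difference (table and column names made up; statement cleanup omitted for brevity):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class BindVariables {
        // Anti-pattern: a new SQL string per call, so nothing can be reused.
        static ResultSet badQuery(Connection con, int id) throws Exception {
            Statement stmt = con.createStatement();
            return stmt.executeQuery("SELECT name FROM users WHERE id = " + id);
        }

        // Lesson #1: one SQL text, values supplied as bind variables, so the
        // parsed statement can be reused by the driver and the database.
        static ResultSet goodQuery(Connection con, int id) throws Exception {
            PreparedStatement ps = con.prepareStatement(
                "SELECT name FROM users WHERE id = ?");
            ps.setInt(1, id);
            return ps.executeQuery();
        }
    }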

    Sorry, that was a complete Oracle geek-out. I'll behave now. :)

    As for the Hashtable Iterator vs. JDBC ResultSet iteration, yes, I'm with you on that benefit; it's what I call "cache bias": the data is shaped into an in-memory format and access paths that are appropriate to your application's "read mostly" data areas.
  33. Stu: "If you're using PreparedStatements with bind variables, probably lesson #1 in Oracle performance management, the driver just stores a hashtable of the PStmts, and there's no need for a soft-parse in the database. And even for drivers that DON'T do this, you can use session cached cursors to reduce shared pool & library cache latch contention significantly."

    Correct. We actually market a custom driver ("N2O J2EE Accelerator for Oracle RDBMS") just for this purpose (caching Oracle prepared statements). Other products (like BEA WebLogic starting with 5.1sp9) have included that functionality since then.
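
    The core idea of statement caching is simple (an illustrative sketch only, not the N2O implementation -- single connection, no eviction, not thread-safe):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    public class StatementCache {
        private final Connection con;
        private final Map stmts = new HashMap();  // SQL text -> PreparedStatement

        public StatementCache(Connection con) { this.con = con; }

        public PreparedStatement prepare(String sql) throws SQLException {
            PreparedStatement ps = (PreparedStatement) stmts.get(sql);
            if (ps == null) {
                ps = con.prepareStatement(sql);  // the expensive parse happens once
                stmts.put(sql, ps);
            }
            return ps;
        }
    }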

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  34. Hi Cameron,
    <quote>
    Now I can agree with you ;-) ... if you get rid of all the wasted usage on the database for non-transactional data (like HTTP sessions, like lists of stuff loaded to put HTML pages together, like basically everything presentation-related, like security information if you keep it in the database, etc.), then the database is a very scalable, excellent resource for transactional operations.
    </quote>
    Isn't this the sort of thing that DB vendors have recommended a middle-tier DB for?
    What is the relationship with a TopLink (or JDO) type cache?
  35. Hi Adam,

    Adam: "Isn't this the sort of thing that DB vendors have recommended a middle tier DB for?"

    I would guess so, but you still have latency (ms) and cost ($) issues.

    Coherence _is_ an application-tier data store. Our 1.x products are non-transactional, but that is perfect for things like HTTP sessions and caching. Using the concurrency control API, you can accomplish a lot of things in the app tier that previously you had to use a database for. Our 2.x product (currently in pilot) is fully transactional (local/global transactions, 1PC/2PC, JTA, read-committed/repeatable-read/serializable isolation, optimistic/pessimistic concurrency), and that will fill in the few remaining functional areas that customers are looking for when it comes to app-tier data stores and caches.

    Adam: "What is the relationship with a TopLink (or JDO) type cache?"

    TopLink has a good single-VM cache, and they use RMI for multiple VMs, so they can support two VMs reasonably (RMI, which is point to point, doesn't scale well; consider a fully connected graph at n*(n-1)/2 connections for n points). I haven't used the latest TopLink, but this was still an issue fairly recently. We don't have any customers yet in production using our cache with TopLink, but I know of a few in evaluation or integration stages, and a few more TopLink customers that we've worked with that we're working on convincing. Now that Oracle owns TopLink, perhaps they'll license a clustered cache for it ;-).

    We've also looked at integrating with Cocobase (http://www.thoughtinc.com) ... currently they use hsql for in-cluster sync/cache. Right now, TopLink probably has more market share by revenue but Cocobase probably has more market share by number of customers. Both seem to be very good products for O/R mapping.

    I haven't had a chance to look at the specific JDO implementations yet. I think JDO has a huge amount of promise; we'll see.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  36. Hi Cameron,

    <quote>
    It also has the added bonus of being the highest-performance clustered cache product and the only synchronized replicated cache for J2EE
    </quote>

    Do you have any performance (and feature) comparisons to Gemstone PCA (Facets?), or to Javlin (Excelon), or to Persistence (PowerTier)? They all offer distributed transactional caching solutions.

    <quote>
    We've focused heavily on making sure that our clustering is fault tolerant and extremely reliable.
    </quote>

    That's a fact. I was able to brutalize (with considerable effort) previous versions, but not anymore ;-)

    --
    Dimitri


  37. Dimitri: "Do you have any performance (and feature) comparisions to Gemstone PCA (Facets?), or to Javlin (Excelon), or to Persistence (PowerTier)? They all offer distributed transactional caching solutions."

    No, but note that our transactional solution is not yet commercialized (although the pilot software is included in the 1.2 release ... if you look in the lib directory you'll see the RAR).

    Our performance information is based on our own internal testing and from customers who evaluated our software with respect to other solutions. According to the customers that have tested our software, for what they were doing, we were the best performing solution, and in most cases we were the least expensive solution.

    One reason for the performance may be that our solution is pure Java, so there is no translation phase, no special JVM, no JNI, etc. There are undoubtedly faster products for certain point solutions, like Gemstone's shared-memory model for multiple processes on a single server. Also, some of those other products are full-blown object-oriented databases, so looking only at the distributed cache aspect is a major disservice to them.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  38. <quote>
    some of those other products are full-blown object-oriented databases, so looking only at the distributed cache aspect is a major disservice to them.
    </quote>

    Why not? They refocused on the distributed caching aspect themselves -- Gemstone, for example, after exiting the J2EE server business.

    --
    Dimitri
  39. I have some questions, if anybody cares to answer them. Mostly it's just curiosity, so it's not urgent ...

    1) What is the CPU overhead of using Tangosol 1.2? In the case of optimistic concurrency control? In the case of pessimistic concurrency control?

    2) How does Tangosol 1.2 handle updates to the database (Oracle and Sybase)? Does the database update the cache?

    Guglielmo
    Guglielmo: "1) What is the CPU overhead of using Tangosol 1.2? In the case of optimistic concurrency control? In the case of pessimistic concurrency control?"

    The CPU overhead is generally very low, but it is relative to the amount of work that you are asking the cache services to do. Our implementation of message handling in the cluster is a "constant-order" implementation, which ensures that the CPU utilization does not grow more than linearly with additional load and number of nodes in the cluster.

    Guglielmo: "2) How does Tangosol 1.2 handle updates to the database (Oracle and Sybase)? Does the database update the cache?"

    Tangosol Coherence is an in-memory cache. The cache services expose events (insert, update, delete) that notify for all changes to the cache. The synchronization with the database, if necessary, is accomplished by the developer.
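
    The hypothetical shape of such an event hook (illustrative names only, not the actual Coherence 1.2 API):

    import java.util.EventListener;
    import java.util.EventObject;

    // The application registers a listener and performs the database
    // writes itself in response to cache changes.
    interface CacheListener extends EventListener {
        void entryInserted(EventObject evt);
        void entryUpdated(EventObject evt);
        void entryDeleted(EventObject evt);
    }

    class DatabaseSyncListener implements CacheListener {
        public void entryInserted(EventObject evt) { /* INSERT via JDBC */ }
        public void entryUpdated(EventObject evt)  { /* UPDATE via JDBC */ }
        public void entryDeleted(EventObject evt)  { /* DELETE via JDBC */ }
    }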

    Our 2.0 release is transactional, which means that if you work with the caches in a transaction, the server will use a two-phase commit protocol with the database(s) and the cache(s) to roll forward or back the transactions, keeping the cache in perfect sync with the database. Note that the updates to the cache are still being done by the developer.

    The 2.0 pilot (using standard CCI interfaces) is actually included in 1.2. Email our support at tangosol dot com address if you are interested in how to use the pilot or to get the necessary WebLogic 7.0 patches for WebLogic support.

    Gene Gleyzer
    Tangosol, Inc.
  41. Interesting. I always thought that clustering is in the realm of the App Server vendors (As part of their value add and differentiation).

    Didn't know there's a marketplace for 3rd party AppServer clustering software. Am I missing some subtle points here ?
  42. Interesting. I always thought that clustering is in the realm of the App Server vendors (As part of their value add and differentiation).

    Didn't know there's a marketplace for 3rd party AppServer clustering software. Am I missing some subtle points here ?
  43. Interesting. I always thought that one post was sufficient. Am I missing some subtle points here? ;-)
  44. Yes, you really have missed the main point! It was coherent replication!
  45. Coherence 1.2.1 has now been released with transparent redundancy of distributed data and immediate distributed failover without any data loss.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!