Tangosol Releases New Version of Coherence

  1. Tangosol Releases New Version of Coherence (22 messages)

    Tangosol has released the latest version of Coherence (1.2.1), its pure-Java distributed cache product for J2EE application servers. The new version adds configurable redundancy of data and configurable compression of its cluster communication protocol.

    Check out Coherence.

    Press Release
    --------------------------
    Somerville, Mass. -- September 19, 2002 -- Tangosol, Inc. announces the immediate availability of Coherence 1.2.1, its enterprise-class clustered data management and caching product. With new support for data redundancy managed by its Distributed Cache service, Coherence is the only Java solution that provides support for massive data caching and management in a clustered J2EE application. By splitting up a massive cache of data among the servers in a cluster, Coherence allows high-scale enterprise applications to increase their effective cache potential through cost-effective horizontal scale. By automatically balancing data management across the cluster, and ensuring that each piece of data is stored redundantly, Coherence-enabled applications can survive server failure, immediately failing over data management responsibilities to available nodes. By leveraging an asynchronous peer-to-peer clustering protocol, Coherence scales far higher than competing solutions and eliminates all single points of failure.

    Also new in release 1.2.1 is improved WAN clustering support through configurable compression of the cluster communication protocol. Coherence is the only clustering software designed to work effectively and efficiently on high-latency low-bandwidth networks, and now it can efficiently communicate higher volumes of data over low-bandwidth networks.

    Coherence 1.2.1 provides a new filter API for snapping in network filters, such as encryption filters and compression filters. Coherence customers now have the widest variety of options for managing the security and bandwidth utilization of their clustered applications.

    Additionally, Coherence provides:

    - Coherence HTTP Session Clustering replicates HTTP sessions in-memory for BEA WebLogic 7, IBM WebSphere 4, Apache Tomcat 4, and all Servlet 2.3 compliant application servers with support for both sticky and stateless load-balancing.

    - Coherence is now certified by IBM as a "Powered by WebSphere" application (new!)

    - Coherence Clustered Data Stores provide coherent replicated and distributed (network-partitioned) data storage in a cluster using the same standard Java API.

    - Coherence Size-Limited Cache provides the only clustered size-limited cache implementation with automatic timed expiration of data, scheduled cache flushes and a combination Most Recently Used + Most Frequently Used algorithm.

    - Coherence Network Fault Tolerance gracefully handles physical network interruption.

    - Coherence Death Detection automatically and quickly detects dead and unresponsive servers and fails over their responsibilities.

    Download a free evaluation today at: http://www.tangosol.com/coherence.jsp

    Tangosol Coherence is licensed per CPU for production use, and is available at no charge for research and development use.

    Source code for the Coherence HTTP session replication modules for BEA WebLogic 7, IBM WebSphere 4, Apache Tomcat 4, and all Servlet 2.3 compliant application servers is available with a FREE development license. To request your free development license, email sales at tangosol dot com.

    Don't get fluster-clucked. Get Coherence.

    Threaded Messages (22)

  2. Hi,

    I'm glad to see this product evolve. However, I wonder if one could replace the low-level protocol with the one we use: TIBCO RV.
    Tell me if I'm wrong, but I think PowerTier comes with such a feature....

    Thanks,

    Laurent.
  3. Laurent: "i'm glad to see this product evolve."

    Thanks. Have you had a chance to look at any of the recent releases?

    Laurent: "However, i wonder if one could replace the low level protocol by the one we use : tibco RV."

    No, unfortunately not. The underlying protocol in Coherence is cluster-aware, and if there had been a pre-existing reliable cluster-aware protocol, we probably would have used it. I think Tibco had the same problem too, which is why they ended up buying Talarian. With respect to their Rendezvous product, you can use it via JMS but you have to go through Tibco Enterprise for JMS (last time I checked).

    Also, we don't have the same federated store-and-forward approach, since all the end points in a Coherence cluster can message any other node (or collection of nodes) in a Coherence cluster directly and efficiently.

    One of our customers, before evaluating Coherence, evaluated JMS-based caching solutions (both their own implementation and one from a JMS vendor called SpiritSoft) and found Coherence to be significantly more reliable and more performant at scale.

    Laurent: "Tell me if i'm wrong, but i think powertier comes with such a feature...."

    I don't believe so, but it has been a while since we've run into PowerTier. I know that some of the engineers at Persistence Software (maker of PowerTier) had managed to replace the default Persistence messaging framework with Rendezvous, but I think that was done through their professional services organization for a specific customer. That status could have changed though, particularly since PowerTier has been de-emphasizing their application server role and working to show their product integrating with other vendors' solutions (such as BEA WebLogic).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  4. Laurent: "However, I wonder if one could replace the low-level protocol with the one we use: TIBCO RV."

    SpiritCache also implements a distributed cache, but it is based on the JCache JSR, supporting any network topology you wish, from 1 tier to N tiers, with peer-based, hierarchical, and clustered configurations and pluggable eviction, storage, and pooling strategies.

    SpiritCache uses the JMS standard to distribute messages around the network, so any JMS provider can be used. The use of JMS provides a standards-based approach to distributed caching, allowing best-of-breed messaging to be used on your network with fault tolerance, loose coupling between your caches, and a high-performance transport. JMS queues provide load balancing across loosely coupled tiers (great for loading caches), while JMS topics provide efficient data distribution (great for distributing updates).
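
    For illustration, here is a minimal sketch of that topic pattern -- publishing a cache update that every subscribed cache can apply locally. The JNDI names and the message layout are assumptions for the example, not SpiritCache's actual API:

        // Hypothetical sketch: distributing a cache update over a JMS topic.
        // The JNDI names and message layout are assumptions.
        import javax.jms.*;
        import javax.naming.InitialContext;

        public class CacheUpdatePublisher {
            public static void main(String[] args) throws Exception {
                InitialContext ctx = new InitialContext();
                TopicConnectionFactory factory =
                    (TopicConnectionFactory) ctx.lookup("TopicConnectionFactory");
                Topic topic = (Topic) ctx.lookup("cache/updates");

                TopicConnection connection = factory.createTopicConnection();
                TopicSession session =
                    connection.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
                TopicPublisher publisher = session.createPublisher(topic);

                // Every cache subscribed to the topic receives the update
                // and applies it to its local copy.
                MapMessage update = session.createMapMessage();
                update.setString("key", "customer:42");
                update.setString("value", "updated data");
                publisher.publish(update);

                connection.close();
            }
        }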

    Finally, to actually answer your question :-) SpiritSoft also has a free JMS provider for RV, so SpiritCache will happily work with RV as the underlying messaging transport - for example to cache market data or arbitrary data being sent on your RV message bus.

    If you want to read more about using JCache and JMS to implement distributed caching you could try reading our JavaOne presentation.

    There are also some white papers.

    James
  5. Hi James,

    Thanks for your comments. I really enjoyed your JavaOne presentation. Can you comment on when JSR 107 will be published? We've been waiting over a year for the API (it was supposed to be published mid-June 2001). Unfortunately, due to how the JCP works, there is no public access to the API until the JSR is submitted for public comments, and strangely, my application to join the JSR was rejected.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  6. JCache JSR

    Hi Cameron

    I'm glad you liked the JavaOne presentation :-)

    Unfortunately, like a lot of JSRs, JSR 107 went a bit dormant due to various political issues, mostly around licensing.

    Thankfully, things have now cleared up (and there can be an open source reference implementation, etc.), so there is much greater impetus in the Expert Group; expect much quicker movement from here on in. Hopefully we can make significant progress on getting a public draft out real soon.

    Alas, I'm not the spec lead of this JSR and do not determine deadlines, so I cannot give a firm date by which the first public draft will be out - but we all hope it will be soon.

    James
  7. I just wanted to add a note as a happy Coherence user. We've had the product embedded in Jive Forums for quite some time now and have never had a significant issue with it. Tangosol has been excellent to work with. Just as importantly, the technology is very solid. We did an extensive evaluation of different solutions such as JavaGroups, building our own cache on top of JMS, SpiritCache, and JavaSpaces. Coherence was clearly superior to the other solutions in terms of stability, speed, and ease of use.

    - Matt Tucker
    Jive Software
  8. If you have ever asked questions like, "How do I implement a cluster-wide singleton?", or, "How do I avoid making database round trips for shared, cluster-wide data?", you owe it to yourself to download, install, and give Coherence a test drive.

    Downloading, installing, and getting going is a snap. It literally takes 10 minutes.

    One of the most interesting features of Coherence is that it runs in a separate JVM from your app server. This non-intrusive model allows Tangosol to use the JDK 1.4 java.nio features to create file-backed buffers to manage the shared cache. Very cool, and it allows you to keep running your app server cluster on a different JVM version.
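
    To make that concrete, here is a rough sketch of the JDK 1.4 feature in question (illustrative only, not Tangosol's actual code) -- backing a buffer with a file so that the cached data lives outside the Java heap:

        // Illustrative only -- a file-backed buffer via JDK 1.4 java.nio.
        import java.io.RandomAccessFile;
        import java.nio.MappedByteBuffer;
        import java.nio.channels.FileChannel;

        public class MappedCacheStore {
            public static void main(String[] args) throws Exception {
                RandomAccessFile file = new RandomAccessFile("cache.dat", "rw");
                FileChannel channel = file.getChannel();

                // Map 64 MB of the file into memory; data written here lives
                // outside the Java heap, so it adds nothing to GC pressure.
                MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 64 * 1024 * 1024);

                buffer.putInt(0, 12345);              // write at offset 0
                System.out.println(buffer.getInt(0)); // read it back

                channel.close();
            }
        }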

    Coherence has demonstrated replicating up to 10 GB of cache data and has been deployed successfully in 24-node clusters. It has the ability to monitor for and distinguish network failures from JVM death, and it allows a JVM to dynamically leave or join a cluster.

    But, please don't take my word for it. Go download it and see for yourself.

    Bill
  9. > One of the most interesting features of Coherence is that it runs in a separate JVM from your app server.

    Hrm. Cameron, can you elaborate please?

    --
    Dimitri
  10. >> One of the most interesting features of Coherence is that it runs in a separate JVM from your app server.

    > Hrm. Cameron, can you elaborate please?

    Coherence has a "replicated" and a "distributed" service (compare). The replicated service replicates data across all cluster nodes that sign up for that service. The distributed service (sometimes we call it "partitioned") spreads the data across the cluster so that each server holds 1/n of the data in a cluster of n servers. In other words we "load balance" the data management across the cluster.

    There are two main options with the distributed service. The first is the configurable number of backups, which is typically 0 or 1 but can be greater. With 1 backup, you can pull the plug on any server in the cluster and the service fails over without losing any data (including "in flight" data from the other servers). With 2 backups, you could lose two servers without losing any data (and so on).

    The other option is called "local storage enabled". It specifies whether or not a JVM is actually going to store any data at all. With the application server running on JDK 1.3, you can set all the application server JVMs to local-storage-enabled of false. Then you can run a separate set of "cache server JVMs" on JDK 1.4 that run the distributed cache with local-storage-enabled of true. Doing so lets those JVMs hold the cache using memory-mapped files and direct buffers (both new 1.4 features). Here's a bit of a blurb from an email that I wrote on it recently: "Per named cache, it supports up to 2GB off heap (either direct buffers or memory mapped files). It features transparent incremental prime-modulo expansion (rehashes one bucket per cache access until the entire modulo is covered). It also features linear compaction, so on a memory mapped file, the "garbage collection" is tuned for disk I/O, and again, it is incremental, so no one operation is penalized unduly. We expect that the performance will be similar to on-heap data access. Plus, it can cache a couple GB of data with just a 16MB Java heap."
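
    As a hypothetical sketch of what that looks like from application code (the CacheFactory and NamedCache names here are assumptions for illustration, not necessarily the exact 1.2.1 API), the point is that the code is identical whether or not the local JVM stores any data:

        // Hypothetical sketch -- class names are assumptions, not
        // necessarily the exact 1.2.1 API.
        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        public class CacheClient {
            public static void main(String[] args) {
                // Whether this JVM stores a partition of the data is purely a
                // deployment setting (local-storage-enabled); the code is the
                // same on app-server JVMs and on dedicated cache-server JVMs.
                NamedCache cache = CacheFactory.getCache("orders");
                cache.put("order:1", "some value");
                System.out.println(cache.get("order:1"));
            }
        }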

    As a BIG DISCLAIMER, I will say this, after working with NIO on 32-bit Windows, you will _certainly_ want to use a 64-bit OS to host those large caches using NIO. Windows just can't handle anything close to 2GB in a process address space without getting its knickers in a twist, and frankly I doubt that any other 32-bit OS is significantly better.

    We're also looking at supporting up to 4 petabytes per cache -- still using pure Java (maybe NIO, maybe not). It's a question of diminishing returns, of course. Memory mapped files only support up to a 2GB window on the file, and the same is true for all Java NIO buffers (including direct buffers).

    BTW - I personally worked with a customer using WebLogic and our distributed cache service, and they were able to cut their app server heap from 800MB to 40MB. ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  11. We have been following Coherence since 1.1.x and have implemented a prototype of our new vehicle search engine product with it to prove to ourselves that it was a viable solution. After loading over 500,000 items into the cache and being able to retrieve thousands per second, let me just say that the Coherence cache is top shelf. I don't even want to think about the amount of time I almost wasted trying to implement our own home-grown caching solution.

    We're so happy with how our initial testing went that we are definitely purchasing licenses for our search engine, and we are considering applying the Coherence cache as a replacement for JMS in our batch processing system.

    The people at Tangosol are great about lending a hand with your evaluation questions as well. I look forward to "playing" with this new release. Keep up the good work guys!
  12. Good news about the new release. We have been using Coherence in a clustered application server environment. We ended up using Coherence for several portions of our distributed application...

    - for a notification system instead of JMS. Our notification system is blazing fast.

    - for a distributed task queue, where distributed servers can execute a task based on how busy they are.

    - for distributed synchronized state with very high performance.

    Overall, Coherence is easy to learn, works great and is blazingly fast for us.

    Dorothy
    Informative, Inc.
  13. Hi all, I am wondering to what extent the fact that the programmer is aware that there is a cache impacts his data access methods.

    Obviously, when I need some data from a database, whenever possible I will do a bulk load. However, this strategy will not work well if there is a cache, since the cache cannot be used in that case. If I know there is a cache, I will fetch objects one by one whenever possible to maximise the use of the cache.
  14. Hi Christian,

    Christian: "i am wondering to which extend the fact that the programmer is aware that there is a cache impact his data access methods."

    I'd like to point out first that Coherence is much more than a cache. You can use Coherence out-of-the-box to do HTTP session replication (WebLogic 7, WebSphere 4, Tomcat 4, plus all Servlet-2.3 containers like Resin, Orion, etc.) Coherence also provides cluster information (JVM membership, when they started, what machine they are on, etc.) and cluster-wide concurrency (locking/leasing) which allows you to coordinate tasks across the cluster.

    Regarding the impact of caching on how data access is done, the programmer is definitely aware that there is a cache, and it does impact the data access methods.

    In our 2.x product (coming in Q4), in addition to what Coherence already provides, we will add transactional CMP EJB caching without any code changes. (We can accomplish that because the beans and the transactions are fully described by the XML descriptors.)

    Christian: "Obviously, when i need some data from a database, when ever possible i will do some bulk load"

    Agreed. Relational databases are well-tuned for set-based access. Forcing a relational database to do multiple "reads" using a primary key to load a row is disastrous from a performance point of view and has single-handedly given CMP EJBs a bad name (mostly due to poor CMP engine implementations).

    OTOH, there are many times that an application is loading one piece of data, and cannot be set-optimized. In those cases particularly, a clustered coherent cache makes perfect sense.

    Christian: "If i know there is a cache, i will fetch object one by one when ever possible to maximise the use of the cache."

    Exactly. However, you can do much more than that. Several of our customers have developed mechanisms for selecting data directly from the cache, and we're working with one customer to add XQuery-style selection to their distributed caching (which allows the XQuery processing to be load-balanced across the entire cluster!)

    So some caches are read-only, some are read-write, some are even read-through/write-through (interposed on top of an existing data access mechanism). Some are just partial caches, and some will hold entire data sets. Some are accessed just by key access, and some are accessed by advanced selection methods.
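
    As a generic illustration of the read-through/write-through idea (not Coherence-specific code; the DataStore interface below is hypothetical, standing in for any existing data access mechanism such as a JDBC DAO):

        import java.util.HashMap;
        import java.util.Map;

        // Hypothetical stand-in for an existing data access mechanism.
        interface DataStore {
            Object load(Object key);
            void store(Object key, Object value);
        }

        // Generic read-through/write-through wrapper (sketch only).
        class ReadWriteThroughCache {
            private final Map cache = new HashMap();
            private final DataStore store;

            ReadWriteThroughCache(DataStore store) { this.store = store; }

            public synchronized Object get(Object key) {
                Object value = cache.get(key);
                if (value == null) {           // miss: read through to the store
                    value = store.load(key);
                    cache.put(key, value);
                }
                return value;
            }

            public synchronized void put(Object key, Object value) {
                store.store(key, value);       // write through to the store first
                cache.put(key, value);         // then update the cache
            }
        }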

    For most applications, caching provides the biggest bang for the buck from a price/performance point of view. How much you take advantage of it is completely up to you.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  15. Distributed cache...

    Thanks for this quick reply, Cameron!

    I have another question regarding the distributed cache.

    When we have a farm of dedicated application server machines where each machine has more than 2 GB of RAM, the ideal solution appears to be to distribute the cache among all the JVMs running on a single host (typically one JVM for the application server plus cache, and other dedicated JVMs for the cache) and then replicate this "global cache" across the cluster.

    I wonder if Coherence can provide such a feature, and in that case whether the communication between JVMs on the same server is optimized (JVMs on the same host).
  16. Distributed cache...

    Hi Christian,

    Christian: "I have another question regarding the distribute cache. When we have a farm of dedicated application server machine where each machine has more than 2Go ram, the ideal solution appear to be able to distribute cache between all the jvm running on a single host ( typically one jvm for the application server+cache and other dedicated jvm for the cache) and then replicate this "global cache" among the cluster. I wonder if coherence can provide such feature and in that case if the communication between jvm on the same server are optimized (jvm on same host)."

    Actually, with a farm of machines (which Coherence makes into a cluster), the distributed cache sums the resources from those machines, so with 16 machines with 2GB of cache each, you could have 16GB of cache with a full level of redundancy (so losing a machine would not lose any data).

    While going across a switched backbone is not quite as fast as a loopback, it is very close. To drop latency even further, we provide a "near cache", which is a size-limited local write-through cache that sits on top of the distributed cache (sketched below).
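
    Conceptually, the near cache behaves something like this sketch (illustrative only, not the actual implementation):

        import java.util.LinkedHashMap;
        import java.util.Map;

        // Conceptual near cache: a small LRU front over the (remote)
        // distributed cache. Illustrative only -- not Coherence's code.
        class NearCache {
            private static final int MAX_LOCAL = 1000;

            // JDK 1.4 LinkedHashMap in access order, evicting the eldest
            // entry once the local front exceeds MAX_LOCAL entries.
            private final Map front = new LinkedHashMap(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry eldest) {
                    return size() > MAX_LOCAL;
                }
            };
            private final Map back;            // the distributed cache

            NearCache(Map back) { this.back = back; }

            public synchronized Object get(Object key) {
                Object value = front.get(key);
                if (value == null) {           // local miss: one network hop
                    value = back.get(key);
                    if (value != null) {
                        front.put(key, value); // keep a local copy
                    }
                }
                return value;
            }

            public synchronized void put(Object key, Object value) {
                back.put(key, value);          // write through to the cluster
                front.put(key, value);
            }
        }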

    Also, to minimize heap size with JDK 1.3, some of our customers run multiple distributed cache JVMs (sans app server) per machine, so that they can use up gigabytes of memory without suffering noticeable GC pauses. We will largely address that problem with our NIO implementation (v2.0 / Q4 '02) that allows a cache to sit on a memory mapped file or direct buffer. Also, the JVMs are improving with each JDK, so it is less of an issue with each release.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  17. Distributed cache...

    Cameron: "Also, to minimize heap size with JDK 1.3, some of our customers run multiple distributed cache JVMs (sans app server) per machine"

    Yes, that was my main concern: to be able to leverage servers that have much more memory than the JVM in which the application server runs (plus the kernel) could use.

    Entry-level Intel servers based on the E7500 chipset can go up to 12 GB of registered ECC memory, for example, whereas before I would have said that 3 GB is enough (1 GB kernel + 2 GB JVM). To be able to run a "cache store" in other JVM(s) sounds pretty neat ;)

    Cameron: "We will largely address that problem with our NIO implementation (v2.0 / Q4 '02) that allows a cache to sit on a memory mapped file or direct buffer."

    That sounds better and better.
  18. Hi Cameron,
    A couple of questions regarding Coherence....

    " You can use Coherence out-of-the-box to do HTTP session replication (WebLogic 7, WebSphere 4, Tomcat 4, plus all Servlet-2.3 containers like Resin, Orion, etc.) Coherence also provides cluster information (JVM membership, when they started, what machine they are on, etc.) and cluster-wide concurrency (locking/leasing) which allows you to coordinate tasks across the cluster....
    In our 2.x product (coming in Q4), in addition to what Coherence already provides, we will add transactional CMP EJB caching without any code changes. "

    In what situations does Coherence provide advantages over these similar features in WebLogic?
    1. in-memory session replication (httpsession and stateful EJBs)

    2. Clustered JMS with transparent loadbalancing and failover.

    3. Cluster-wide Entity (CMP or BMP) Bean caching for read-only and read-mostly cases (arguably, read-write doesn't benefit much from caching)


    What types of environments do your customers have? (non-WebLogic?)

    regards,
    Matt


  19. Hi Matt,

    Matt: "In what situations does coherence provide advantages over these similar features in WebLogic?"

    First, to be clear, we don't compete directly against BEA ... in fact, about 60% of our customers run on the clustered BEA WebLogic platform.

    What we do offer are a few overlapping features and several features that are not found anywhere else, including with BEA WebLogic clustering.

    "1. in-memory session replication (httpsession and stateful EJBs)"

    Ours is a slightly different approach to HTTP session replication. Our replication is based on Coherence, which is a coherent data store or cache. Our "out of the box" solution uses a replicated cache, which means that every session is on every server; that conceivably scales less well (e.g. in a cluster of 8 or more servers), but it does mean that every session is up to date on every server at all times, so any server (or group of servers) can die without losing an HTTP session. We also provide custom solutions for session management through our professional services organization (these typically combine replicated caches, distributed caches, and a backing JDBC store for overflow), and some very large WebLogic clusters are running Coherence, with several more scheduled to go live in Q4.

    OTOH, WebLogic replicates only on changes, only on access, and only to a secondary. That means that with WebLogic, if you are cycling the entire cluster one machine at a time, you will still lose sessions (this was verified by a BEA engineer on their newsgroups): if a secondary goes down, the new secondary doesn't actually get the session data until that session is used by a new HTTP request, so if you cycle the next machine 5 minutes later and the user hasn't hit a page in that period, the user loses their session.

    WebLogic also clusters stateful session EJBs (same primary/secondary approach), which we do not do.

    The main reason that people use Coherence is that the quality is extraordinarily high, and the availability levels are simply unmatched by any other clustering solution that is available, either an integrated one or an add-on option. Our performance and scalability are also extraordinarily high, as our customers have previously attested to. Lastly, we do cost less, but that's usually not a deciding factor for our customers, since most of them already have WebLogic clustering licenses.

    "2. Clustered JMS with transparent loadbalancing and failover."

    Coherence is not a JMS server. Some of our customers use it for things that only JMS has been able to solve in the past, and I think you can see that in some of the discussion above. Since Coherence provides a locking API and a coherent data store (with event notification) in the cluster, it's obvious that it can be used for scheduling background tasks, monitoring for changes, etc. However, we don't pretend that it is a JMS server. That's a crowded market with some good solutions already available, albeit none with true load balancing and few with transparent failover.

    WebLogic does _not_ have transparent loadbalancing and failover for clustered JMS queues, although in 7.0 it introduces features that are close to transparent failover.

    "3. Cluster-wide Entity (CMP or BMP) Bean caching for read-only and read-mostly cases (arguably, read-write doesn't benefit much from caching)"

    People use our software for entity bean caching today, but until we expose transaction support, it's not a perfect fit. (Currently we ship an adapter (connector architecture) with Coherence that works with WebLogic 7 and supports 2-phase commits via JTA, but it is not yet commercialized.)

    Right now, caching is usually used for non-transactional purposes (e.g. security data, display-only data) and the transactional work is reserved for EJBs that work with the database.

    You can define performance gains in terms of several metrics:

    rd = average database read cost
    rc = average cache read cost
    wd = average database write cost
    wc = average cache write cost
    w = percentage of total operations that are writes (0.0 <= w <= 1.0)
    h = cache hit percentage: the probability that a read will hit cache (0.0 <= h <= 1.0)

    To calculate the cost of an average database operation:
    cd = rd * (1-w) + wd * w

    To calculate the cost of an average database operation that may be cached (using read-through/write-through caching):
    cc = (rc * h + rd * (1-h)) * (1-w) + (wd + wc) * w

    As you can clearly see ;-) the cost can only increase if the cost of a cache read is higher than the cost of a database read, or if the cost of a cache write times the probability of a write exceeds the benefits of reading from the cache.

    With the replicated cache, rc=0ms (i.e. rounding error). That's a pretty compelling argument, since rd averages around 10ms for simple lightly-loaded applications and an order of magnitude worse in many scenarios. Note that w is usually under 0.02, even for intensive transactional applications!

    We typically see a reduction in database reads well over 90% (I'd give you the real figures, but you wouldn't believe me ;-). Also, if writes are frequent, using a write-behind caching scheme will cut the database updates by a significant margin too, although that is application- and usage-pattern-specific. (Write-behind caching is feasible because the unpersisted data will not be lost even if a server goes down.)
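
    To make the arithmetic concrete, here is the model in runnable form. The rd, rc and w values follow the figures above; wd, wc and h are assumptions for the sake of the example:

        public class CacheCostModel {
            public static void main(String[] args) {
                double rd = 10.0;  // avg database read cost (ms), per above
                double rc = 0.0;   // avg cache read cost (replicated cache)
                double w  = 0.02;  // fraction of operations that are writes
                double wd = 80.0;  // avg database write cost (ms) -- assumed
                double wc = 2.0;   // avg cache write cost (ms) -- assumed
                double h  = 0.90;  // cache hit rate -- assumed

                double cd = rd * (1 - w) + wd * w;
                double cc = (rc * h + rd * (1 - h)) * (1 - w) + (wd + wc) * w;

                System.out.println("average cost, database only: " + cd + " ms");
                System.out.println("average cost, with cache:    " + cc + " ms");
                // roughly 11.4 ms vs. 2.6 ms with these numbers
            }
        }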

    "What types of environments do your customers have? (non-WebLogic?)"

    We see about 60% BEA WebLogic, 25% IBM WebSphere and 15% other (Oracle, Sun, HP, Caucho, Orion, Borland, Tomcat, and even non-J2EE). Traditionally, we've had a great relationship with BEA and are well known by many WebLogic customers. On the database side, we see about 70% Oracle and 30% other (Sybase, DB2/UDB, occasionally MS SQL Server). For platforms, we see (in this order) Sun, IBM, HP, Intel (Windows and Linux about the same %). For JMS, a lot of customers use whatever comes with the server, but we see just about everything (IBM MQS, SwiftMQ, SonicMQ, Fior., Tibco, SpiritSoft). There's a big performance delta (almost unbelievable) between some of these offerings. That would be a good article in and of itself!

    There is no "typical" cluster size, rather it is a curve with the biggest lump between 2 and 4 servers and it decreases from there. That's because clustering is often used for availability reasons, even when scalability is not required, so we see a lot of 2-server clusters. With an n+1 approach, if you need 3 servers, you'll have 4, so we often see customers with 3 or 4 servers in the 4xCPU size (e.g. Sun e450). Clusters bigger than 16 servers are rare, but those are the ones that we like the best ;-), because the software really shines there. With the distributed cache, we're starting to see the cluster architecture being designed around the Coherence capability, such as a large number of small (1x or 2xCPU) RAM-heavy servers. By the end of Q4, there will be several multi-multi-GB caches in production on our distributed cache implementation, which isn't bad considering it was just released in Q3!

    I wasn't sure exactly what info you were looking for, so if I didn't answer the question fully, let me know.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  20. Thank you, very thorough answer! But I'm still a little confused:

    Since the cache is normally non-transactional, am I not correct in presuming that for normal OLTP transaction costs, "read-write data" (w >= 0.5) would not benefit much from caching (since rd is low to start with):

    cc = (rc * h + rd * (1-h)) * (1-w) + (wd + wc) * w

    for a transactional cache (where cache writes add no value, so wc drops out) really becomes

    cc = (rc * h + rd * (1-h)) * (1-w) + wd * w

    and the delta becomes (which would be small if rd is small to start with):

    delta_cc = cc - cd = (w-1) * h * (rd - rc), i.e. (less than half) * (rd - rc) for w >= 0.5 and h near 1

    Obviously, if every two reads were separated by at least one write, the cache would not add any value.

    So the cache benefit is proportional to rd (assuming rc is small) and limited by the chance of repeated reads before a write. Is this interpretation correct?


    Thanks,
    Matt
  21. Matt: "Thank you, Very thorough answer! But I'm still a little confused:"

    Me too now ;-)

    Matt: "Since the cache is normally non-transactional, then am I not correct in presuming that for normal OLTP transaction costs "read-write data" (w >= 0.5)would not benefit much from caching (since rd is low to start with):"

    I'll give you real-world numbers from an enterprise application with a high-end (64xCPU) database server:

    rd = 11ms (single row select, pk access)
    wd = 78ms (single row insert/update, pk access)
    rc = 0ms
    wc = <2ms

    Matt: "Obviously, If every two reads were separated by at least one write the cache would not add any value."

    We have never seen a case, other than logging (write-only), in which caching did not cut the average latency. That is because rc and wc are tiny, and rd and wd are (relatively) huge.

    Further, as you cut the database usage, the "important" database work will actually run faster, so even processes that don't benefit directly from the cache will be faster.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  22. Cameron,

    What type of collection do you use for large sets of cached data?

    Does data set size not affect cache read cost at all?

    matt
  23. Matt: "What type of collection do you use for large sets of cached data?"

    By default we use SafeHashMap, which is a high-concurrency thread-safe hashed Map implementation. See com.tangosol.util.SafeHashMap (or its subclass, com.tangosol.util.ObservableHashMap).
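
    For example (a trivial sketch; since SafeHashMap implements java.util.Map, it drops in anywhere a Map is expected -- the default constructor is assumed here):

        import com.tangosol.util.SafeHashMap;
        import java.util.Map;

        public class SafeHashMapExample {
            public static void main(String[] args) {
                // A thread-safe java.util.Map implementation, usable from
                // concurrent threads without external synchronization.
                Map cache = new SafeHashMap();
                cache.put("key", "value");
                System.out.println(cache.get("key"));
            }
        }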

    Matt: "Does data set size not affect cache read cost at all?"

    On the replicated cache, since the data is local, it is a function of read access on a hashed data structure whose modulo is typically higher than its size, which (theoretically) provides O(k) access times (constant order implementation). That's not necessarily true with huge data sets because hash functions are notoriously clumpy (basically the "harmonic" of a function in number theory).

    On the distributed cache, we use the same underlying data structures, but each server only manages 1/n of the data (for n servers in a cluster) so although there is the potential for one round trip on the network to do the get, the actual data access inside the data structure is still O(k). For the distributed cache, 99% of the latency (typically 1-2ms) is in the network itself.
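
    Conceptually (a toy illustration, not our actual partitioning algorithm), you can picture key ownership as a simple modulo over the member count:

        // Toy illustration only -- not Coherence's actual algorithm.
        // Each key maps to exactly one of n cluster members, so each member
        // manages roughly 1/n of the data, and a get() costs at most one
        // network round trip to the owning member.
        public class PartitionDemo {
            static int owner(Object key, int memberCount) {
                return (key.hashCode() & 0x7FFFFFFF) % memberCount;
            }

            public static void main(String[] args) {
                System.out.println(owner("customer:42", 4)); // index 0..3
            }
        }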

    Have you had a chance to download and test it? I'm curious what your first impressions are. Thanks!

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!