
News: Q&A with Tangosol's Cameron Purdy and Peter Utzschneider

  1. This short Q&A with Tangosol's Cameron Purdy and Peter Utzschneider describes what Tangosol's flagship Coherence product is: a data grid with notification, clustering, routing, and transactional capabilities.

Q: Where did the term "Enterprise Hashmap" come from? What types of problems does Tangosol Coherence help developers solve?

A few years back at a TSS Symposium conference in Vegas, Geir Magnusson Jr. was joking around and said, "Tangosol Coherence is the world's most expensive enterprise hash-map." Before working on the Geronimo and Harmony Apache projects at IBM and Intel, Geir was writing high-volume trading systems in New York, so he is well acquainted with the requirement for scalable, reliable, performant data management. That's the problem that Tangosol Coherence solves: how to manage large amounts of live, transactional data that is being simultaneously accessed and manipulated by any number of servers; how to have that data instantly available to those servers; how to be notified when any of that data changes; how to efficiently route and respond to those changes; and how to make the data management infrastructure completely resilient to server failure.

Q: Coherence was originally called a "clustered cache". Is Coherence an actual datastore? What is a Data Grid, and how does Coherence fit in a grid?

When we first released Coherence, there were no terms to describe it, so we made up the term "clustered caching." It's a good term for what we were providing: an in-memory cache of data managed by a server cluster. If you run an application on ten servers, Coherence makes that application's live data available to the application on all ten servers, the same live data on every one. Tangosol Coherence is often used as a data store for things such as user and security data for implementing single sign-on (SSO), user session data for Java EE application servers, and monitoring and statistical information for managing running applications. However, Coherence is not a database, so the master copy of data is usually managed persistently in a database. In concert with our cache partitioning technology, we pioneered clustered read-through, write-through, write-behind and refresh-ahead caching technologies to efficiently provide persistent data integration. In addition to supporting much higher performance and scalability, Coherence also provides an option that insulates against database failure by queueing transactions for write-behind when the database is unavailable.

Regarding the term "grid," it is definitely overused, but the goal is clear: applications, services and components all need to be dynamically deployable, relocatable and scalable. Their functionality and the data that they depend on must all be continuously available, surviving failures of servers, networks and even data centers. Infrastructure capacity needs to be dynamically manageable, growing to meet the demands of the business. Data Grid technology addresses the data access and management concerns that businesses face as they move toward a more dynamic and virtualized infrastructure. Unlike caching, a Data Grid provides data as a service to many applications across an enterprise, and has to provide security and management capabilities as part of that. A Data Grid not only manages data; it also manages the processes that are permitted against that data, and the automated event routing and responses for that data. As an enterprise service, a Data Grid must also be language- and platform-neutral.
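As a concrete illustration of the "enterprise hash map" idea, here is a minimal sketch against the Coherence 3.x Java API (the cache name, key and values are invented). A NamedCache behaves like a java.util.Map shared by every server in the cluster, and listeners deliver the change notifications described above:

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.AbstractMapListener;
    import com.tangosol.util.MapEvent;

    public class EnterpriseHashmapSketch {
        public static void main(String[] args) {
            // Obtain a named, cluster-wide cache. NamedCache extends
            // java.util.Map, hence the "enterprise hash map" joke.
            NamedCache cache = CacheFactory.getCache("trades");

            // Change notification: every node sees the same live data and
            // can react when any node modifies it.
            cache.addMapListener(new AbstractMapListener() {
                public void entryUpdated(MapEvent evt) {
                    System.out.println(evt.getKey() + " -> " + evt.getNewValue());
                }
            });

            // Plain Map semantics, but the entry is visible cluster-wide.
            cache.put("trade:1", "pending");
            Object state = cache.get("trade:1");
            System.out.println("state = " + state);

            CacheFactory.shutdown();
        }
    }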
Q: Tell me about how you are starting to open up Coherence to other platforms beyond Java. How are you doing this, and what is the plan for the future?

The short answer is that our data management uses a platform- and language-neutral Portable Object Format (POF), and our clustering uses a platform- and language-neutral Portable Invocation Format (PIF). The combination is called PIF/POF, and it allows us to talk to and from almost any language, platform, operating system and hardware. Today, customers of ours are using Microsoft C# with Coherence.NET to build rich client applications for Windows. We can support literally hundreds of thousands of concurrent desktop applications, each with a secure, live, shared, real-time view of data. And this PIF/POF technology can support C++, C#, Ruby, Python, VB, Excel, PHP, Perl and just about any other language.

Q: I understand that in this 3.2 release you have enhanced a lot of your server clustering technology. Can you explain what you have done and what the actual customer benefits are?

In this release, we have replaced major portions of our clustering technology, and the results are amazing. We can now detect a remote garbage collection within milliseconds and dynamically reshape the cluster's traffic to avoid overwhelming that node. Our flow control is so precise that it will automatically achieve the theoretical maximum throughput on a ten-gigabit network as easily as on a ten-megabit network, and will automatically tune itself to balance that throughput against packet loss for optimal results. We have cut the number of context switches dramatically, and we now dynamically adjust the level of algorithmic parallelism for each node to optimize for the server hardware given the current load factors. The self-tuning algorithms, flow control and traffic-shaping features have propelled us years ahead of the technology curve, and help to explain how Coherence is the lowest-latency, highest-throughput and most scalable clustered data management system in existence. For our customers, their applications will run unchanged, but they will run even faster and more reliably on their existing hardware. This 3.2 release is also notable because it is the first release in which we have focused specifically on cluster latency, largely in response to requests from telecom and financial services customers. With these improvements, on commodity hardware and under relatively heavy load, we are able to handle well over 99% of typical operations in under two milliseconds, and that's with "five nines" resiliency!

Q: What are some of the technical challenges that customers are using Coherence for today?

We talk about the general uses as being distributed caching, analytics, transactions and event processing. I think that caching is fairly well understood, while some of the work our customers do with analytics and transactions is pretty advanced. For example, Coherence provides the ability to parallelize operations across the entire cluster, including standard aggregations and more involved calculations such as Monte Carlo simulations and complicated compliance checks. All of this is delivered with a simple programming model, as Coherence handles both distribution of processing and recovery from server failures. Customers are using this to scale out analytics using commodity hardware and their existing skill sets.
Likewise, years of investment in hardened fault-tolerance and data integrity features mean that transactions can be processed entirely in memory, a key competitive differentiator for us. Customers have started using our core data grid technology as a transaction processing platform in combination with our Coherence*Extend technology as a service interface. This allows customers to implement SOA architectures without the requirement for cumbersome interchange technologies. Applications on any platform simply use the appropriate Coherence platform technology to use the data grid as a service; all of the data and interfaces appear and function natively. Our customers are already working with our .NET implementation, and our cross-platform C++ implementation will be available in the market early next year.

Q: How do you see your market space evolving moving forward?

The push toward SOA and virtualization is driving many of the investments that our customers are making. In the past year, the industry press has covered announcements from Wachovia, Wells Fargo, Putnam Investments, Starwood Hotels and GEICO, each of which has publicly adopted Tangosol Coherence as a cornerstone of its grid, SOA and virtualization initiatives. For us, this means we have to make adoption faster, development simpler, operations more transparent and access universal.

Q: So when is the "Enterprise Hashmap" book coming out?

Very funny, Joe. If you're interested in the online version of the book, just take a peek at our Wiki, located at http://wiki.tangosol.com.
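The parallel analytics described above go through Coherence's aggregation API: a filter selects entries and an aggregator computes over them on the storage nodes. A minimal sketch against the Coherence 3.x API (the cache name and the getDesk/getNotional accessors are invented for illustration):

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.aggregator.DoubleSum;
    import com.tangosol.util.filter.EqualsFilter;

    public class ParallelSumSketch {
        public static void main(String[] args) {
            NamedCache positions = CacheFactory.getCache("positions");

            // Both the filter and the aggregator are shipped to the storage
            // nodes; each node computes a partial sum over its own partitions
            // in parallel, and only the partial results cross the network.
            Double exposure = (Double) positions.aggregate(
                    new EqualsFilter("getDesk", "fx"),  // entries to include
                    new DoubleSum("getNotional"));      // value to sum per entry

            System.out.println("Total notional: " + exposure);
            CacheFactory.shutdown();
        }
    }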

    Threaded Messages (21)

  2. Performance expectations

    Good interview. Cameron presented Coherence at the Software Developers Forum (http://www.sdforum.org) last year, and the distributed hashmap idea was something I could get my head around quickly. To developers given a choice between making a call to a hashmap and going through a Hibernate/JDBC driver interface to get at transactional data, I imagine the hashmap is easier to maintain. What do you think? Secondly, how should someone considering Coherence evaluate performance and scalability? For instance:
    - In a Coherence system with, say, 50 servers connected by gigabit Ethernet, what is the latency between when one node updates a value and when the value is available to the other 49 servers?
    - How does one measure network overhead for Coherence's updates?
    - How does Coherence perform over long periods of time? For instance, is there a performance hit if my transactions take 2-3 hours each?
    - Some of the data I'd like to store is in XML form, and I'd like to use an XPath or XQuery expression to query the hashmap to find my data. Is this possible using Coherence?
    -Frank http://www.pushtotest.com http://www.xquerynow.com
  3. Re: Performance expectations

    I have used Coherence on two projects and loved it. In my very personal opinion, it's a graceful product: well thought out, lovingly engineered, carefully polished. For me, it's a Swiss Army knife, and I am no Jack the Ripper. Coherence is easy to use, but make no mistake about it: professional tools require professional handling. To answer Frank's questions, from my poking around... (i) Coherence is no replacement for a relational database, but it gives you a chance not to use the database for something it wasn't really intended for (cf. the CAP theorem). (ii) It depends on your topology and what you are trying to persist. For a common case of a *partitioned* cache on a blade, it may be below 1 ms, and it doesn't have to be related to the number of nodes. (iii) The latency is OK; quite good, actually, if you don't use transactions. Their proprietary transport will try to adjust to the QoS of the actual network, but basically it's UDP underneath. There are better products if all you care about is latency. Coherence gives you high scalability, intelligent state partitioning, and excellent data coherence (doh!). In my experience, switching to Infiniband from GigE is a big win. But also, I've seen strictly linear, by-the-ruler scaling to nearly 400 cache nodes. YMMV. (iv) You might feel the pain, not that it has anything to do with Coherence. :-) The cluster will be there when you are back. (v) No idea.
  4. Re: Performance expectations

    (iii) The latency is OK; quite good, actually, if you don't use transactions. Their proprietary transport will try to adjust to the QoS of the actual network, but basically it's UDP underneath. There are better products if all you care about is latency. Coherence gives you high scalability, intelligent state partitioning, and excellent data coherence (doh!). In my experience, switching to Infiniband from GigE is a big win. But also, I've seen strictly linear, by-the-ruler scaling to nearly 400 cache nodes.
    Infiniband works fine (we've done extensive testing with 3.2), but we're not seeing the benefits that we would expect from it. We are working closely with the Infiniband vendors and the JVM vendors to figure out why, since Infiniband should _theoretically_ spank all the other options. (It should have all of: fewer context switches, higher bandwidth and lower latency.) We are running full-duplex 20Gb Infiniband, which should give us 2.5GB/s in each direction. It could be the OS (an experimental Linux kernel build from Intel), the drivers (Mellanox, I think), the JVM (Sun in this case), etc., since they all have to work together to do the routing through Infiniband instead of via the entire IP stack. At any rate, these companies have been great to work with, and we'll get to the bottom of it with them, which will mean that Java developers will be able to push data around using the current datagram sockets in Java while actually having the data fly over raw Infiniband (no IP stack emulation!). We are also testing on 10Gb Ethernet, and pushing ~400MB/s per node through Coherence (which is a lot of data getting pushed out of one Ethernet port on a commodity server). With our "pure Java" simple test framework (not the guaranteed-delivery high-level functions in Coherence) we can push 600MB/s over the network (!!!), which is the theoretical limit of the PCI-X I/O bus in that server, so you can see that Java has come a long, long way. At the other extreme, we still do lots of testing with 100Mb Ethernet, and even 10Mb Ethernet (to test the self-tuning features). Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
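    A throughput probe of the sort Cameron describes can be as simple as the following. This is a minimal sketch using only the standard JDK datagram API, not Tangosol's actual test framework:

        import java.net.DatagramPacket;
        import java.net.DatagramSocket;
        import java.net.InetAddress;

        // Minimal one-way datagram throughput probe: blast fixed-size UDP
        // packets at a receiver for five seconds and report MB/s sent.
        // No flow control, no delivery guarantees; it exercises only the
        // raw socket path that a UDP-based transport ultimately sits on.
        public class UdpBlast {
            public static void main(String[] args) throws Exception {
                InetAddress target = InetAddress.getByName(args[0]); // receiver host
                int port = 9999;
                int size = 1468; // payload that fits within a 1500-byte MTU
                byte[] buf = new byte[size];
                DatagramSocket socket = new DatagramSocket();
                DatagramPacket packet = new DatagramPacket(buf, size, target, port);

                long bytes = 0;
                long end = System.currentTimeMillis() + 5000;
                while (System.currentTimeMillis() < end) {
                    socket.send(packet);
                    bytes += size;
                }
                socket.close();
                System.out.println((bytes / 5) / 1e6 + " MB/s sent");
            }
        }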
  5. On Infiniband

    Cam, Sergey, We met with Cisco about their Infiniband and GigE offerings, and although they were talking about a theoretical latency of 4ms if you used MPI, if you layered IP over Infiniband you were looking at about 14ms latency. Their opinion was that if you were looking at around 8-10ms for GigE, then running IB everywhere might not really help much. Which would kind of sit with Cam's experience. What sort of improvement did you see, Sergey? Who'd you buy yer kit off? What was the cost in relation to typical GigE kit? Regards, Max
  6. sorry

    I meant microseconds
  7. Re: On Infiniband

    I am not ready to provide full technical details, partly for non-technical reasons. Suffice it to say, all of them were somewhat typical financial risk systems, not Google or anything :-) I tested on at least 4 different blades. One of them was Dell; another, Egenera. All but one ran dual Xeons on Linux 2.4 (large memory build, i.e. not so ancient). The worker thread count was set to a minimum. I cannot quite recall the IC make, 'cause "that's what you got, son," I was customarily told. Please keep in mind that a 200-machine grid IS expensive, and not everyone needs it. "A nice toy, that Lear Jet!" Dad would say (he was a civil engineer). The numbers are, of course, very app-specific. Let's say I have a message processing time of 17 ms over GigE (with the 15% confidence, if you insist). A profiler would show at least 85% of CPU time in the Coherence code. Switch to IB: get 7 ms. Switch to IB + the competition: get 2 ms (at a 25% redelivery rate! Coherence has almost 0). Scale the competition out to 16 nodes and it dies; Coherence was actually hyperlinear between 4 and 14 nodes. @Guglielmo: But, of course, RDMA is faster. Not that I care. The fastest array processing algorithm, especially on HotSpot: void handle(int[] data) { return; } Don't pick nits. A break would be helpful. :-)
  8. Correction

    I was multitasking and typed into the wrong edit box. :(
    with the 15% confidence, if you insist
    Sorry, that was really embarrassing... The numbers I presented were quite accurate, i.e. small std dev. Processing a message involved 2 cache reads and 2 writes (in a TX), but almost no calculations.
  9. Re: On Infiniband

    But, of course, RDMA is faster. Not that I care. The fastest array processing algorithm, especially on HotSpot:
    void handle(int[] data) { return; }
    I don't understand what you mean by bringing up an example of a trivial workload. As for me, what I was saying is that the most effective design for a system with virtualized communication is different from the most effective design for a system with Ethernet, which relies on context switches. Of course, you should see a big improvement in latency and throughput, but if you solve the problem again top-down, assuming virtualized communication, you will probably get even better results.
    Don't pick nits. A break would be helpful. :-)
    "I never gave them hell, I just tell the truth and they think it's hell." Truman, 1956 http://home.att.net/~howingtons/dem/truman.html Seriously, I think we would all benefit from having a java api for RDMA. Specifically, it should be modeled on UDAPL: http://www.datcollaborative.org/udapl.html Then we could use infiniband like it was meant to be used.
  10. Re: On Infiniband

    [..] what I was saying is that the most effective design for a system with virtualized communication is different from the most effective design for a system with Ethernet, which relies on context switches.
    It does not "rely" on context switches per se; it just so happens that all the major implementations of the driver stack tend to require ring changes and thread changes for certain operations. It's typical hardware driver stuff on the ring side, and obvious requirements for thread switches due to the IP ports being spread across any number of user mode applications.
    "I never gave them hell, I just tell the
    truth and they think it's hell."
    Truman, 1956
    That's an odd selection for a quote in this conversation. I prefer this one:
    Good breeding consists in concealing how much we think of ourselves and how little we think of the other person. - Mark Twain
    Guglielmo:
    I think we would all benefit from having a Java API for RDMA. Specifically, it should be modeled on UDAPL ..
    That sounds reasonable. We were willing to invest in native RDMA bindings (to skip MPI, which is not well regarded outside of academia). However, the IB vendors themselves counseled strongly against it and directed us to the solution that we are currently working on.
    Then we could use Infiniband like it was meant to be used.
    Herein is the problem: We can choose to be the slave to technology or the master of it. I can certainly appreciate the brilliant engineering behind IB, but when it comes to _using_ it, I would prefer to use it in the manner that actually helps our customers, rather than to use it in the manner that makes IB itself most content.
    I am the master of my fate: I am the captain of my soul. - Henley
    ;-) Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
  11. Re: On Infiniband

    Good breeding consists in concealing how much we think of ourselves and how little we think of the other person.
    - Mark Twain
    I think highly of you, so this quote doesn't bother me in the least.
    We were willing to invest in native RDMA bindings (to skip MPI, which is not well regarded outside of academia). However, the IB vendors themselves counseled strongly against it and directed us to the solution that we are currently working on.
    That's good advice for them. They prefer to build one API instead of two, and they would like their API to have the greatest possible market penetration. Solution: just stick a socket API on top of Infiniband. But for you that is not good advice. You care about what the API does, not about Mellanox's market penetration.
    Herein is the problem: We can choose to be the slave to technology or the master of it. I can certainly appreciate the brilliant engineering behind IB, but when it comes to _using_ it, I would prefer to use it in the manner that actually helps our customers, rather than to use it in the manner that makes IB itself most content.
    It might make _you_ more content if you use IB via UDAPL, because you'll probably get more out of the hardware. Once you have IB, pushing the data to where it's going to be used is not such a good fit. I know it works, and you are getting great performance because IB is so good, but I am saying that unless you have had enough of innovating, you can go beyond that. But I am saying that to the techie, not the businessman.
    I am the master of my fate:
    I am the captain of my soul.
    - Henley
    Somebody else could build a competing product which _assumes_ RDMA. So, yes, you can make your own choices, and you should consider going after this because there might be something here. Guglielmo
  12. Re: On Infiniband

    Good breeding consists in concealing how much we think of ourselves and how little we think of the other person.
    - Mark Twain


    I think highly of you, so this quote doesn't bother me in the least.
    Nor was it meant to. I was going for the ironic angle. It is one of my favorite quotes, although with Mark Twain, it's incredibly hard to pick only a few to have as favorites.
    We were willing to invest in native RDMA bindings (to skip MPI, which is not well regarded outside of academia). However, the IB vendors themselves counseled strongly against it and directed us to the solution that we are currently working on.


    That's good advice for them. They prefer to build one API instead of two, and they would like their API to have the greatest possible market penetration. Solution: just stick a socket API on top of Infiniband. But for you that is not good advice. You care about what the API does, not about Mellanox's market penetration.
    Very true. If our performance is not what we expect with the UDP API to IB translation (which is supposed to have very little overhead), then we'll revisit it. Like I said, we were more than willing to do the hard work in the first place.
    It might make _you_ more content if you use IB via UDAPL, because you'll probably get more out of the hardware.
    Right. As above.
    you should consider going after this because there might be something here.
    Believe it or not Guglielmo, your previous posts on IB were what made me start investigating supporting IB in the first place, so perhaps Mellanox has you to thank ;-) Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
  13. Re: On Infiniband

    Believe it or not Guglielmo, your previous posts on IB were what made me start investigating supporting IB in the first place, so perhaps Mellanox has you to thank ;-)
    In case they approach you trying to give me a large cash award let me know and I'll give you my account number and routing number ;-) On a serious note, thanks for saying that. It was nice to hear.
  14. Re: On Infiniband

    Sorry for posting this in the wrong forum. Being new to the concept of XTP, I've been fascinated by the capabilities of Coherence. And wherever I've searched on the web, it's all about how great the product is and how amazing Cameron is :) But I wanted to know if there are any downsides to introducing Coherence into the enterprise, like more disk space required on each clustered server equivalent to N% of the database size, etc. Appreciate any response to this, and forgive me for posting a novice question. ~~Karthik
  15. Apologies

    Thank you, Cameron, for bringing up that quote from Mark Twain. I admit my previous post was arrogant and, with copy-and-paste errors and all, comically pompous. Guglielmo, I apologize for that posting. I simply misunderstood what you were saying, but even then, I should've chosen a more civilized tone. I deeply regret turning a useful discussion into a rude and meaningless flame. I am sorry, guys.
  16. Re: Apologies

    Thank you, Cameron, for bringing up that quote from Mark Twain. I admit my previous post was arrogant and, with copy-and-paste errors and all, comically pompous.

    Guglielmo, I apologize for that posting. I simply misunderstood what you were saying, but even then, I should've chosen a more civilized tone. I deeply regret turning a useful discussion into a rude and meaningless flame.

    I am sorry, guys.
    Don't worry, it wasn't too bad. I will tell you though, it takes a certain kind of man to be able to apologize. More power to you! I look forward to seeing you again here on TSS.
  17. Re: Performance expectations

    Infiniband works fine (we've done extensive testing with 3.2), but we're not seeing the benefits that we would expect from it. We are working closely with the Infiniband vendors and the JVM vendors to figure out why, since Infiniband should _theoretically_ spank all the other options. (It should have all of: fewer context switches, higher bandwidth and lower latency.)

    We are running full-duplex 20Gb Infiniband, which should give us 2.5GB/s in each direction.
    I am sure you are going to solve the problem, but this is not really "using" Infiniband. If you really want to use Infiniband for data sharing, you should use Infiniband's RDMA capability (remote direct memory access), not use it as an asynchronous transport (like UDP). But of course Coherence was designed with UDP in mind, so this would require a complete redesign. Specifically, with RDMA you would not push the data to where it's going to be used. You can use it from anywhere you like, because the latency will be ten microseconds or so. You would only keep replicas of data for fault tolerance purposes. BTW, I think the IB latency record is PathScale's HCA (host channel adapter). It plugs directly into HyperTransport, thus skipping the bridge to PCI: http://www.deltacomputer.de/produkte/cluster/Pathscale_HTX.shtml
  18. Re: Performance expectations

    I am sure you are going to solve the problem, but this is not really "using" Infiniband.

    If you really want to use Infiniband for data sharing, you should use Infiniband's RDMA capability (remote direct memory access), not use it as an asynchronous transport (like UDP). But of course Coherence was designed with UDP in mind, so this would require a complete redesign.

    Specifically, with RDMA you would not push the data to where it's going to be used. You can use it from anywhere you like, because the latency will be ten microseconds or so. You would only keep replicas of data for fault tolerance purposes.
    Coherence provides much more than raw shared memory (a la RDMA); it exposes object management services. To consume those services, we must pass messages to and from those services, and hence the comms-based view of the world (more like MPI than RDMA). If the service were just raw memory access, you would be correct, and I have seen RDMA access latencies as low as 2.5us. Peace, Cameron Purdy Tangosol Coherence: The Java Data Grid
  19. Re: Performance expectations

    Coherence provides much more than raw shared memory (a la RDMA); it exposes object management services. To consume those services, we must pass messages to and from those services, and hence the comms-based view of the world (more like MPI than RDMA).

    If the service were just raw memory access, you would be correct
    What services do you need beyond knowing where the data is? I guess I must be missing some major piece of functionality here. I thought Coherence was designed to alleviate the database bottleneck, where the workload characteristics allow it.
    , and I have seen RDMA access latencies as low as 2.5us.
    PathScale reports 1.4us, I believe, but that's probably an empty packet.
  20. Re: Performance expectations

    Coherence provides much more than raw shared memory (a la RDMA); it exposes object management services. To consume those services, we must pass messages to and from those services, and hence the comms-based view of the world (more like MPI than RDMA).

    If the service were just raw memory access, you would be correct


    What services do you need beyond knowing where the data is? I guess I must be missing some major piece of functionality here. I thought Coherence was designed to alleviate the database bottleneck, where the workload characteristics allow it.
    How about resilience of addressing in the scenario where you assumed you knew where the data was, but suddenly that assumption is no longer valid (the node you assumed you should go to dies, or the data moves)? How about locking? How about redundancy? I do not really know Infiniband. Does it provide these? Not to mention the other kinds of services which Coherence provides, although they really don't fit under the name cache... BR, Robert
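    For context, Coherence does expose cluster-wide locking directly on the cache API. A minimal sketch against the Coherence 3.x API (the cache name and key are invented for illustration):

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        public class ClusterLockSketch {
            public static void main(String[] args) {
                NamedCache cache = CacheFactory.getCache("accounts");
                Object key = "account:42";

                // lock() takes a cluster-wide lock on one key; -1 waits forever.
                if (cache.lock(key, -1)) {
                    try {
                        Integer balance = (Integer) cache.get(key);
                        int updated = (balance == null ? 0 : balance.intValue()) + 100;
                        cache.put(key, Integer.valueOf(updated));
                    } finally {
                        cache.unlock(key); // always release, even if the update fails
                    }
                }
                CacheFactory.shutdown();
            }
        }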
  21. Re: Performance expectations

    How about resilience of addressing in the scenario where you assumed you knew where the data was, but suddenly that assumption is no longer valid (the node you assumed you should go to dies, or the data moves)?
    Yes, you have to do that. But since the latency is so low, you can keep that data in a central directory (replicated for availability).
    How about locking?
    Yes, you have to do that. But since the latency is so low, you can keep that data in a central directory (replicated for availability).
    How about redundancy?
    If you are caching data which is in a database, you don't need to replicate it. Just roll back the transaction and reload. If you want to replicate it because reloading it would take too long, then go ahead and replicate it; but then how come the first time you loaded it, it was okay to get it from the database? If you do not have a database at all, then you do have to replicate the data. But the data never has to move except when one of the replicas leaves the cluster.
    I do not really know Infiniband. Does it provide these?
    No. But it's designed to turn a cluster into a NUMA machine (Non-Uniform Memory Access). It's not intended to speed up UDP. As I said, it's a different mindset altogether.
    Not to mention the other kinds of services which Coherence provides, although they really don't fit under the name cache...
    The problem of how to scale an application on a cluster when you have the benefit of virtualized communication (i.e. no context switches) has already been solved by IBM's coupling facility (CF): http://researchweb.watson.ibm.com/journal/sj/362/nick.html Except that IBM then had to build its own hardware, whereas Infiniband is accessible to anyone.
  22. Re: Performance expectations

    Just to answer a couple of these:
    - In a Coherence system with, say, 50 servers connected by gigabit Ethernet, what is the latency between when one node updates a value and when the value is available to the other 49 servers?
    - How does one measure network overhead for Coherence's updates?
    See the following URL: http://forums.tangosol.com/thread.jspa?threadID=1158&tstart=0 It shows Coherence handling over 99% of operations in less than 2ms while failover is occurring (a worst-case scenario). IIRC, each of those operations was a composite of four 1KB operations, so a virtual 4KB operation (with a full level of clustered redundancy) completing in less than 2ms with one of the servers dying while the test was running. In terms of network overhead, it comes down to where your application tells Coherence that it needs the data, times the size of the data that needs to be moved. That would seem to be the theoretical minimum, but with some of our algorithms, it is magically even less than that. All of this was much improved in our latest (3.2) release.
    - Some of the data I'd like to store is in XML form, and I'd like to use an XPath or XQuery expression to query the hashmap to find my data. Is this possible using Coherence?
    Yes, but it is not necessarily the most efficient choice (there are XML databases and various more specialized products). However, Coherence will handle it without blinking, because the "objects" that are managed can be defined by you, the "extractors" that pull data out of those objects to index (or query on) can be defined by you, and the queries themselves can be defined by you. For example, if your objects have a toString() that generates XML, then you can write a query (Filter interface) on the toString() of those objects that uses XPath or XQuery to evaluate the results of the query, and it will be done in parallel across whatever servers you have in the cluster (e.g. with Sergey's 400 nodes, each would do 1/400th of the work). Peace, Cameron Purdy Tangosol Coherence: Clustered Caching for Java
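    A sketch of the approach Cameron describes, using the Coherence 3.x Filter interface and the standard javax.xml.xpath API; the usage line at the end invents a cache query for illustration:

        import java.io.Serializable;
        import java.io.StringReader;
        import javax.xml.xpath.XPath;
        import javax.xml.xpath.XPathConstants;
        import javax.xml.xpath.XPathFactory;
        import org.xml.sax.InputSource;
        import com.tangosol.util.Filter;

        // A Filter that evaluates an XPath expression against each cached
        // object's XML toString(). Coherence serializes the filter out to
        // the storage nodes, so each node evaluates only its own slice of
        // the data in parallel. Error handling and caching of the compiled
        // expression are omitted for brevity.
        public class XPathFilter implements Filter, Serializable {
            private final String expression;

            public XPathFilter(String expression) {
                this.expression = expression;
            }

            public boolean evaluate(Object o) {
                try {
                    XPath xpath = XPathFactory.newInstance().newXPath();
                    Boolean match = (Boolean) xpath.evaluate(expression,
                            new InputSource(new StringReader(o.toString())),
                            XPathConstants.BOOLEAN);
                    return match.booleanValue();
                } catch (Exception e) {
                    return false; // malformed XML or XPath: treat as a non-match
                }
            }
        }

        // Usage (cache name and expression invented):
        //   Set entries = cache.entrySet(new XPathFilter("/trade[@status='open']"));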