Discussions

News: TechTalk with Cameron Purdy on Caching

  1. TechTalk with Cameron Purdy on Caching (40 messages)

    Cameron Purdy, JCache spec lead, founder of Tangosol, and TSS regular, talks about caching theory such as read-through, write-through, and write-behind caching; Tangosol's Coherence product (which also powers TSS); the JCache API; and a number of in-the-field caching stories from working with truly large-scale enterprise applications.

    Watch: Cameron Purdy on Caching

    Cameron Purdy will also be presenting at TheServerSide Java Symposium.

    Threaded Messages (40)

  2. Thanks a million

    That was an excellent interview. Thanks to TSS for doing it, and thanks to Cameron for taking the time, too.
    I was at a certain government organization, that shall remain nameless, that had two E10,000's that they had completely saturated with 200 concurrent users and a Java application.

    This sentence made me laugh so hard. I've seen this happen: bad design and code can easily bog down a Sun E-class server.

    peter
  3. Cameron was saying in his interview:
    Our protocol is called TCMP, [...] basically it is a clustered protocol, meaning that down at the protocol level it actually has full awareness of all the issues of clustering; all the members in the cluster, you know the various states that they might be going through, whether they have gone into a zombie state or a dead state, things like that. The reason we did not use JMS, [...] is because it doesn't have a concept of clustering per se. You can have clustered JMS, or clustered JMS implementations out there, but it doesn't actually provide information like this server has come up and joined peer-to-peer clustering, and this server has died, and this server is exposing these services, and things like that.

    Of course, Cameron's reply is very true in regard to JMS, and absolutely understandable for anyone who has spent at least a couple of sleepless nights struggling with the headaches of clustering. But...

    I don't even know if Cameron is going to read this thread, but still, it would be very interesting to hear his take on JavaGroups. How does it compare with TCMP today? Did they create TCMP because JavaGroups wasn't there yet, or are there other issues still left?

    IMHO, JavaGroups is a de facto standard protocol for clustered communication; it is quite solid and used in a lot of projects, so if it is missing something, it would be very interesting to hear about it from a person whose main product happens to be a leading product in the J2EE world heavily utilizing clustering (I mean Coherence).
  4. JavaGroups

    it would be very interesting to hear his take on Javagroups

    I won't speak for Cameron, but I'll bet you a lot of money that JavaGroups is not a candidate for replacing "TCMP".

    God bless,
    -Toby Reyelts
  5. JavaGroups

    it would be very interesting to hear his take on Javagroups

    He's given his views on JavaGroups a few times here, as you can see if you search some of the older threads.
  6. it would be very interesting to hear his take on JavaGroups. How does it compare with TCMP today? Did they create TCMP because JavaGroups wasn't there yet, or are there other issues still left?

    IMHO, JavaGroups is a de facto standard protocol for clustered communication; it is quite solid and used in a lot of projects, so if it is missing something, it would be very interesting to hear about it from a person whose main product happens to be a leading product in the J2EE world heavily utilizing clustering (I mean Coherence).

    The story of why we didn't use JavaGroups is actually somewhat humorous: basically, Google let us down ;-) .. which is to say that we didn't really know what we were looking for regarding clustered protocols when we were searching for a pre-existing solution, so JavaGroups didn't turn up in our list of potential solutions. Had we known about it, we certainly would have at least considered using it, since it would have seemed to offer a head start.

    However, having since evaluated JavaGroups on more than one occasion, I think that we made the right choice. As an explanation: the first two versions of TCMP were scrapped before they ever saw the light of day, because they exhibited "we can't quite eliminate the edge conditions" problems. That's the same issue that JavaGroups has, and so using it presents a series of unsolvable problems, such that you cannot achieve (as in, logically predict or mathematically model) reliable clustering with it, meaning that you will perpetually be solving that endless last 1% of problems. Traditional approaches to programming cannot solve that last 1% of distributed computing problems, which is why we modeled TCMP (the third iteration, the one that shipped in Coherence 1.0) as a finite state machine.
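    To make the finite-state-machine idea concrete, here is a toy Java sketch: every member state and every legal transition is enumerated up front, so an illegal transition is rejected by construction and the whole model can be checked exhaustively. The states and transitions here are hypothetical illustrations; TCMP's actual model is proprietary and certainly richer.

```java
import java.util.EnumMap;
import java.util.EnumSet;

/**
 * Toy sketch of modeling cluster membership as a finite state machine.
 * The state names (and the transitions) are invented for illustration.
 */
public class MemberStateMachine {

    public enum State { ANNOUNCING, JOINING, ACTIVE, ZOMBIE, DEAD }

    // Every legal transition is enumerated up front, so the protocol's
    // behavior can be reasoned about exhaustively instead of case by case.
    private static final EnumMap<State, EnumSet<State>> LEGAL =
            new EnumMap<>(State.class);
    static {
        LEGAL.put(State.ANNOUNCING, EnumSet.of(State.JOINING, State.DEAD));
        LEGAL.put(State.JOINING,    EnumSet.of(State.ACTIVE, State.DEAD));
        LEGAL.put(State.ACTIVE,     EnumSet.of(State.ZOMBIE, State.DEAD));
        LEGAL.put(State.ZOMBIE,     EnumSet.of(State.ACTIVE, State.DEAD));
        LEGAL.put(State.DEAD,       EnumSet.noneOf(State.class));
    }

    private State state = State.ANNOUNCING;

    public State state() { return state; }

    /** Apply a transition, rejecting anything not in the enumerated set. */
    public void transition(State next) {
        if (!LEGAL.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next);
        }
        state = next;
    }
}
```

    The point of the sketch is not the particular states, but that the set of reachable situations is closed: nothing the network does can put a member into a state that was not modeled.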

    Perhaps someone from Intamission (Autevo) could comment more specifically .. they use JavaGroups to implement JavaSpaces, and I've heard some stories third-hand ..

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  7. JavaGroups

    Cameron is right. It's the edge conditions that are the major problems with group membership and, yes, state machines are a must given the complexities that can arise in the edge conditions. A friend in IBM research once said that there are few areas of computer science that are rocket science but this is one of them.

    WebSphere now has something probably very similar to TCMP in WebSphere 5.1 XD and WebSphere 6.0, which we use in the High Availability Manager component. This lets us do out-of-the-box peer failover for our transaction managers as well as messaging engines. WebSphere 6.0 just requires a shared file system (like NFS v4 or IBM SAN FS): place your cluster's transaction logs in a directory per server on that file system, check a box, and you'll have in-doubt transaction recovery in around 12 seconds. When a server crashes, we fail the transaction manager over to a surviving cluster member, and it then does recovery. When the original server restarts, it fails back.

    We also used it to replace the message transport used for memory-to-memory replication, and WebSphere 6.0 just trounces the older versions of WebSphere as a result, despite using a reliable message transport rather than best-effort.

    Billy
    WebSphere HA Architect (IBM).
  8. was 5 vs was 6

    Billy,
     So would you suggest moving immediately to 6.0 if one is doing failover/clustering? I've never seen WAS clustering go well (even when good OO development was done).
  9. was 5 vs was 6

    So would you suggest moving immediately to 6.0 if one is doing failover/clustering? I've never seen WAS clustering go well (even when good OO development was done).

    We tested WebSphere 5.1 with ND at the IBM labs here in Boston, and once it's configured well (which can be a challenge ;-), it runs well. I am obviously not allowed to disclose performance numbers without written permission (etc.), but I can say that it did well on clustered HTTP session management. It scaled well, performed pretty well, and behaved well. I haven't seen 6.0 yet (hint: Billy).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  10. was 5 vs was 6

    WAS 6 is a big jump over 5.1. DRS now uses the HAManager messaging component for session replication traffic. It works over multicast, unicast, or the channel framework (HTTP tunneling, SSL, anything you want). It's an ultra-high-performance reliable pub/sub message transport, capable of easily saturating Gb Ethernet even on a single-processor box in primitive tests. It's not a JMS transport, though; it's a cluster message transport.

    We're just finishing performance comparisons with WAS 5.1 session replication now, and let's just say it isn't even close: 6.0.2 runs away with it. We added some fixes from the testing, and these fixes are not in the GA product, so I'd hang on till 6.0.2 before doing any benchmarking. I believe 6.0 is at least as fast as the third-party clustering solutions that are out there, although I'm sure Cameron has some tricks up his sleeve :)

    The high availability side has been radically changed in 6.0, and I believe it's now way ahead of the game.

    Check out my blog for more information: http://www.billynewport.com
  11. was 5 vs was 6

    Just to back up what Cameron was saying: I've been rolling out a medium-sized enterprise app (approx. 2,500 users) for a client, using WAS ND 5.1 on AIX to take care of clustering duties. It has scaled well for me and been rock solid for months now. We did have some initial performance problems and spent a fair amount of time tracking those down – they were invariably problems in our code, though, and nothing to do with WAS. I've not seen 6 in action yet, though.
  12. was 5 vs was 6

    Just to back up what Cameron was saying: I've been rolling out a medium-sized enterprise app (approx. 2,500 users) for a client, using WAS ND 5.1 on AIX to take care of clustering duties. It has scaled well for me and been rock solid for months now. We did have some initial performance problems and spent a fair amount of time tracking those down – they were invariably problems in our code, though, and nothing to do with WAS. I've not seen 6 in action yet, though.

    Oh, don't get me wrong: a lot of apps do work fine with WAS, and many more with some changes. But I have a theory for why they do. :) I don't think this issue belongs to WAS alone. If it weren't a problem, Cameron's company would have little work to do (at least the part of it that solves this problem). Unfortunately, we suffer from the round-peg, square-hole problem in software development.
  13. That's the same issue that Javagroups has, and so using it presents a series of unsolvable problems, such that you cannot achieve (as in, logically predict or mathematically model) reliable clustering with it, meaning that using it you will perpetually be solving that endless last 1% of problems. Traditional approaches to programming cannot solve that last 1% of distributed computing problems, which is why we modeled TCMP (the third iteration, the one that shipped in Coherence 1.0) as a finite state machine.

    I don't see the contradiction. As far as I understand, the JavaGroups "core" does provide virtually synchronous group communication (GMS, messages reliably delivered in views, message ordering) and thus forms the classical foundation for state machine replication (after all, Bela did work with Ken Birman). I'm by no means an expert, but as far as I've learnt, at least parts of "that last 1%" are provably impossible to solve. Are you saying TCMP can do something that JGroups does not, i.e. that JGroups is the "traditional approach"? Or did you just find that the JGroups protocols are buggy/incomplete?
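    For readers unfamiliar with the term, the state-machine-replication idea referenced here can be sketched in a few lines: if every replica applies the same deterministic commands in the same agreed total order, all replicas end up in the same state. The toy model below simply assumes the total order is already given, which is exactly the hard part the group-communication layer is supposed to provide; the names are illustrative, not any library's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of state machine replication: replicas that apply the same
 * totally ordered, deterministic command log reach identical states.
 */
public class Replica {
    private final Map<String, Integer> state = new HashMap<>();

    /** A deterministic command against the replicated state. */
    public interface Command { void apply(Map<String, Integer> state); }

    /** Deliver the agreed total order of commands (the hard part!). */
    public void deliver(List<Command> totalOrder) {
        for (Command c : totalOrder) c.apply(state);
    }

    public Map<String, Integer> state() { return state; }

    // Two sample deterministic commands.
    public static Command set(String k, int v) { return s -> s.put(k, v); }
    public static Command inc(String k) { return s -> s.merge(k, 1, Integer::sum); }
}
```

    The whole debate in this thread is about whether the layer underneath can actually be trusted to deliver that agreed order through membership changes, merges, and failures.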

    I'm not insisting that JGroups is the solution to your problem (I actually never developed a lot of trust in it) - I'm just trying to understand whether you developed something that has capabilities beyond JGroups and its friends (Ensemble, Spread, ...)

    Matthias
  14. I'll give a bit of a personal view on JavaGroups. We looked at it a long time ago (two years ago or more) when we were developing our clustering capabilities, and it was so badly designed that a running joke was that we “just didn’t believe it compiles” :-)

    Furthermore, JavaGroups' design (at least then) appeared to be more of a post-grad C-port exercise than seriously designed software. It was the usual academic mix: good ideas, horrible implementation.

    Again, it could have changed by now, so don't take my word for granted. It has been used by JBoss (openly), so they might have fixed its design by now.

    I think the capabilities of Tangosol's protocol are more or less the same as any other vendor's in this area. They are not trivial to implement, but they are well known and have been well documented for decades, I guess. I think the difference between JavaGroups and Tangosol's protocol (in our example) is that TCMP solves a specific problem, while JavaGroups is just a compilation of everything under the sun.

    My 2 cents,
    Nikita.
  15. TechTalk with Cameron Purdy on Caching

    I'll give a bit of a personal view on JavaGroups. We looked at it a long time ago (two years ago or more) when we were developing our clustering capabilities, and it was so badly designed that a running joke was that we “just didn’t believe it compiles”

    This is ludicrous. JGroups' design is one of its strong points. The protocol stack design has totally proven its worth.
  16. I don't see the contradiction. As far as I understand, Javagroups "core" does provide virtually synchronous group communication (GMS, messages reliably delivered in views, message ordering) and thus forms the classical foundation for state machine replication

    I would suggest that you compare the actual source code to the list of claims you just made. You will find that the two do not match. I am neither the first nor the only one to point this out, nor am I attempting to be negative. Bela has done interesting academic work, and I have no desire to detract from the knowledge he has added to, or the attention he has brought to, this field.

    What I'm saying is that the architecture for a working n-point distributed system is significantly different from a client/server architecture, a multi-threaded local architecture, or (most obviously) a single-threaded architecture.

    In other words, when you build a skyscraper, you put more time into the design and into the foundation of the building. You plan for it to become a certain building. It would be pointless to spend so much time and build such a foundation for a little house, but it's also foolish to try to build a skyscraper on top of a little house's foundation.

    Multi-machine communication is far more complex than multi-process communication, which is far more complex than multi-thread communication. There is simply no determinism in such an environment, and to provide a deterministic API on top of a nondeterministic environment, without being able to illustrate it as a set of finite states, is a ludicrous proposition.
    I'm just trying to understand whether you developed something that has capabilities beyond JGroups and its friends (Ensemble, Spread, ...)

    It's not about capabilities; it's about conceptual provability.

    In a cluster, one cannot prove that something works precisely because of the nondeterministic nature of the environment; one can only attempt to prove that it does not work, and fail to do so.

    If you do not build a model that allows the application of logic to prove that an implementation will work, then I do not believe that an implementation can work, even if under mild conditions it appears to do so.

    Define the finite state machine, prove its finite nature, illustrate all its potential transitions, and only then can one advance to a discussion of capabilities.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  17. We are in the process of finding a clustered caching solution.

    It appears that most cluster cache products actually push changes to all caches in the cluster. The cluster cache will ensure data consistency across the cluster.

    We are quite concerned about the communication overhead of such an approach.

    Given that our data all come from databases, are all version-stamped (supporting optimistic locking), and must be transactionally consistent, we are considering the following approach:
    1) each server has a simple local cache
    2) write through to the database(s) directly with optimistic locking (version check & update)
    3) use a distributed communication mechanism like JMS to distribute object-invalidation messages to the other servers. This communication can in fact be async, so it incurs no immediate communication overhead (because optimistic locking provides the guarantee of consistency.)

    Can Cameron or someone compare the two approaches?
  18. Oh .. missing a step, if that's not obvious:
    3) use a distributed communication mechanism like JMS to distribute object-invalidation messages to the other servers.
    4) the other servers remove stale objects upon receiving invalidation requests and load the data later, on demand, again from the databases
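    The four steps above can be sketched in Java. The `Bus` below is an in-memory stand-in for a JMS topic (publisher/subscriber), and the synchronized block stands in for an atomic `UPDATE ... WHERE version = ?`; all names are illustrative, not any product's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Sketch of the proposed approach: local cache per server, write-through
 * with an optimistic version check, asynchronous invalidation to peers.
 */
public class InvalidatingCache {

    public record Row(String value, long version) {}

    /** In-memory stand-in for a JMS topic carrying invalidation keys. */
    public static class Bus {
        private final List<Consumer<String>> subs = new ArrayList<>();
        public void subscribe(Consumer<String> s) { subs.add(s); }
        public void publish(String key) { subs.forEach(s -> s.accept(key)); }
    }

    private final Map<String, Row> db;     // shared "database"
    private final Map<String, Row> local = new ConcurrentHashMap<>();
    private final Bus bus;

    public InvalidatingCache(Map<String, Row> db, Bus bus) {
        this.db = db;
        this.bus = bus;
        // step 4: on an invalidation message, drop the stale local entry
        bus.subscribe(local::remove);
    }

    /** Step 1: read through the local cache, loading from the DB on a miss. */
    public Row get(String key) {
        return local.computeIfAbsent(key, db::get);
    }

    /** Steps 2-3: write through with a version check, then invalidate peers. */
    public boolean update(String key, String newValue, long expectedVersion) {
        synchronized (db) { // stands in for an atomic versioned UPDATE
            Row cur = db.get(key);
            if (cur == null || cur.version() != expectedVersion) {
                return false; // optimistic conflict: someone else won
            }
            db.put(key, new Row(newValue, expectedVersion + 1));
        }
        local.remove(key);
        bus.publish(key); // async in a real system; peers drop stale copies
        return true;
    }
}
```

    Note that, just as the post argues, the invalidation message can lag safely: a peer holding a stale copy will simply fail its next version check and reload.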

    Thanks
  19. I would suggest that you compare the actual source code to the list of claims you just made.

    Well, your claim contradicts the intent and documentation of JGroups: "The current set of protocols shipped with JGroups provide Virtual Synchrony properties" (http://www.jgroups.org/javagroupsnew/docs/blocks.html).

    If that documentation is wrong, fine, then JGroups makes false claims, is plain buggy and may be well beyond repair. I understand that. But you make it look like their approach is fundamentally wrong.

    Note that I'm not talking about their mixed bag of RPC-Dispatcher, distributed hashtables, ... I can well believe they're fundamentally broken. I'm interested in the core JChannel with failure detector, GMS, merge, flush, and, say, total ordering and whatever it takes to constitute a VSYNC stack. How is that a "client/server architecture" or "a deterministic API on top of a nondeterministic environment without being able to illustrate it as a set of finite states"?
    I am neither the first nor the only one to point this out

    If you don't want to repeat your analysis here, can you provide a pointer?

    Thanks
    Matthias
  20. TechTalk with Cameron Purdy on Caching

    Note that I'm not talking about their mixed bag of RPC-Dispatcher, distributed hashtables, ... I can well believe they're fundamentally broken.

    They are not broken. The point is that I have really never put much effort into the building blocks, because my focus has always been protocol design. Besides, blocks like {Replicated,Distributed}Tree and {Replicated,Distributed}Hashtable are deprecated and no longer maintained; they're essentially precursors to JBossCache.
  21. TechTalk with Cameron Purdy on Caching

    I would suggest that you compare the actual source code to the list of claims you just made. You will find that the two do not match.

    Wrong: JGroups *does* provide vsync, check out JGroups/conf/vsync.xml for a sample stack.
  22. TechTalk with Cameron Purdy on Caching

    I don't see the contradiction. As far as I understand, Javagroups "core" does provide virtually synchronous group communication (GMS, messages reliably delivered in views, message ordering) and thus forms the classical foundation for state machine replication (after all, Bela did work with Ken Birman).

    JGroups itself does *NOT* implement any particular communication paradigm. Think of it as an aspect-oriented framework for messaging. You can use it like a JBoss interceptor stack (synchronously) or like SEDA (asynchronously).
    It is each particular *protocol stack* that defines the QoS offered by JGroups. A protocol stack is similar to an aspect stack; it is defined by
    - the number and types of protocols, and
    - their order

    So, yes, one of the paradigms JGroups ships with is Virtual Synchrony; others are the default (relaxed VSYNC), total order, causal order, *no order*, and PBCAST.

    So when someone criticizes JGroups, the specific stack used needs to be mentioned.
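    The stack idea can be illustrated with a toy sketch: the same two layers composed in different orders produce different behavior, which is exactly why criticism has to name the specific stack. The layer names below are invented for illustration, not actual JGroups protocols.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy sketch of a composable protocol stack: the QoS is defined by which
 * layers you stack and in what order, not by the transport itself.
 */
public class ProtocolStack {
    /** A protocol layer transforms a message on its way down the stack. */
    public interface Layer { String down(String msg); }

    private final List<Layer> layers = new ArrayList<>();

    /** Layers are added top to bottom; order changes the stack's behavior. */
    public ProtocolStack add(Layer l) { layers.add(l); return this; }

    /** Send a message down through every layer to the transport. */
    public String send(String msg) {
        for (Layer l : layers) msg = l.down(msg);
        return msg;
    }

    /** An illustrative layer that tags the message with the property it adds. */
    public static Layer tag(String name) {
        return msg -> name + "(" + msg + ")";
    }
}
```

    Reordering `add(tag("SEQ")).add(tag("FRAG"))` versus `add(tag("FRAG")).add(tag("SEQ"))` yields differently wrapped messages, the toy analogue of two stacks with different QoS.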
    I'm by no means an expert but as far as I learnt at least parts of "that last 1%" are provably impossible to solve.

    There are lots of last 1% problems in CS. E.g. read Nancy Lynch's paper on the subject (forgot the title).
  23. TechTalk with Cameron Purdy on Caching

    There are lots of last 1% problems in CS. E.g. read Nancy Lynch's paper on the subject (forgot the title).

    According to her MIT paper, "Designing a Caching-Based Reliable Multicast Protocol":

    "Several studies [1, 3, 5] have observed that packet losses in multicast communication are bursty, i.e., links drop numerous multicast packets while temporarily congested. ... This scheme demonstrates how packet loss locality can be actively used to reduce the recovery latency and the bandwidth overhead of multicast error control. Moreover, in view of increasing our confidence in the correctness and performance of our protocol, we use a rigorous design approach."

    This problem is what the WebLogic manual's "How Clusters Work" chapter calls a "multicast storm".
  24. According to her MIT paper Designing a Caching-Based Reliable Multicast Protocol ..

    Thanks for the link, Brian. BTW - I'm going to be presenting at the TSS symposium on this subject (clustering protocols), so if you're going to be there, try to make it to the presentation. (Since I only get in that day, it's unlikely that the previous night's activities are going to derail me .. ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  25. Hi Brian,

    I reviewed the paper. The difference between what that paper describes and what I am describing is that the paper deals with a hierarchical distribution of data, e.g. streaming video over the Internet, while I am describing a system of many active peers participating in n-point communication, each communicating with the others. There may be some overlap, but they are different problem domains in general.

    When I present at the TSS symposium, I'll cover why multicast itself is not the solution, as it is only an efficient solution for fairly narrow sets of problems.

    In the meantime, do a Google search on "deterministic broadcast" and "probabilistic broadcast". You'll find some very interesting papers, which are actually very relevant to the paper you pointed me to ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  26. TechTalk with Cameron Purdy on Caching

    "we can't quite eliminate the edge conditions" problems. That's the same issue that Javagroups has, and so using it presents a series of unsolvable problems, such that you cannot achieve (as in, logically predict or mathematically model) reliable clustering with it, meaning that using it you will perpetually be solving that endless last 1% of problems.

    This is very fuzzy. More details wouldn't hurt...
    Traditional approaches to programming cannot solve that last 1% of distributed computing problems, which is why we modeled TCMP (the third iteration, the one that shipped in Coherence 1.0) as a finite state machine.

    Even fuzzier. So finite state machines solve world hunger *and* that last 1%. Hmm. I cannot really refute that, because (1) you don't define what that 1% is, and (2) I don't know what TCMP is or what its goal is.

    If TCMP were available publicly (open source?), we could see what it does. But as it is, we have to take your FUD on JGroups on faith, and cannot verify it against TCMP.

    If you have specific criticism of JGroups, it would be nice if you shared it here; but please stop spreading FUD.
  27. Do we need the synch'ed caches?

    So you can say, well, I'm going to work on a certain item within the cache, so I'm going to lock it first, then I'm going to get it, guaranteed to be the up-to-date-in-the-cluster value, I'm going to modify it, put it back in the cache and unlock it ...

    I'd like to point out that in typical enterprise apps (which in real life are rarely anything more than a web interface to an RDBMS), most operations are browsing ones and do not require any synchronization and/or locking when they touch data. So, on average, across the number of end-user interactions with a system, there might be only 1-10% updates in the total number of data hits. And, given the above, I'm still unsure: is it worth complicating a system with a third-party commercial product only for the ability to pull from another server instead of pulling from the database ...?
  28. Distribute only cache modifications

    And, given the above, I'm still unsure: is it worth complicating a system with a third-party commercial product only for the ability to pull from another server instead of pulling from the database ...?

    I do not know exactly how Coherence works, but distributed caches should only distribute modifications to cache entries (puts, deletes, clears, in java.util.Map terms) and NOT distribute read-only cache access (gets, contains, ..., in java.util.Map terms). This way, inter-node communication will be much lower for read-mostly data.
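    A minimal sketch of that rule, with a hypothetical broadcast hook standing in for the group transport (not any product's actual API): reads stay entirely local, and only mutations generate cluster messages.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/**
 * Sketch of a cache that replicates only mutations (put/remove), never
 * reads. The broadcast hook is illustrative, standing in for whatever
 * group transport a real clustered cache would use.
 */
public class ReplicatedWritesMap {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final BiConsumer<String, String> broadcast; // (key, value or null)
    private int messagesSent = 0;

    public ReplicatedWritesMap(BiConsumer<String, String> broadcast) {
        this.broadcast = broadcast;
    }

    /** Reads are purely local: no cluster traffic at all. */
    public String get(String key) { return local.get(key); }

    /** Mutations update locally and notify peers. */
    public void put(String key, String value) {
        local.put(key, value);
        messagesSent++;
        broadcast.accept(key, value);
    }

    public void remove(String key) {
        local.remove(key);
        messagesSent++;
        broadcast.accept(key, null); // null value signals a delete
    }

    /** Called when a peer's broadcast arrives. */
    public void onPeerUpdate(String key, String value) {
        if (value == null) local.remove(key); else local.put(key, value);
    }

    public int messagesSent() { return messagesSent; }
}
```

    For read-mostly data the message count tracks only the 1-10% of operations that are updates, which is the point of the post above.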
  29. Caching and Query?

    ... only for the ability to pull from another server instead of pulling from the database ...?
    I do not know exactly how Coherence works, but distributed caches should only distribute modifications to cache entries.

    Supposedly, optimizing reads is the whole point of any cache, be it an app server data cache or the cache in a CPU. But I found something while implementing an in-house O/R persistence engine with a cache: the biggest problem for an application cache is that you don't always look up an object by primary key (the case where you can get the object directly); a lot of the time you still have to send a query to the database to find a collection of objects, which means you still have to hit the database. (Hibernate can also cache query results, but I'm not sure if that works in a clustered environment.)

    Does Coherence also do querying by itself?

    Thanks
  30. Caching and Query?

    Supposedly, optimizing reads is the whole point of any cache, be it an app server data cache or the cache in a CPU. But I found something while implementing an in-house O/R persistence engine with a cache: the biggest problem for an application cache is that you don't always look up an object by primary key (the case where you can get the object directly); a lot of the time you still have to send a query to the database to find a collection of objects, which means you still have to hit the database. (Hibernate can also cache query results, but I'm not sure if that works in a clustered environment.) Does Coherence also do querying by itself?

    Coherence has features that optimize write-intensive applications too, including write-behind caching.

    Coherence does querying against the cache as well. That means that if you pre-load a cache with the entire set of data you're working with, you can parallel-query across the cluster instead of going back to the database. (It supports indexing and cost-based evaluation as well.)
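    As a rough illustration of querying a pre-loaded cache (this is not Coherence's actual API, just the idea): a filter can scan cached values, and an extractor-based index can narrow an equality query to only the matching keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Toy queryable cache: predicate queries over cached values, with an
 * optional extractor-based index for equality queries. Illustrative only.
 */
public class QueryableCache<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final Map<Function<V, Object>, Map<Object, Set<K>>> indexes = new HashMap<>();

    public void put(K key, V value) {
        data.put(key, value);
        // keep every registered index up to date
        indexes.forEach((ex, idx) ->
            idx.computeIfAbsent(ex.apply(value), x -> new HashSet<>()).add(key));
    }

    /** Maintain an index over the given extracted attribute. */
    public void addIndex(Function<V, Object> extractor) {
        Map<Object, Set<K>> idx = new HashMap<>();
        data.forEach((k, v) ->
            idx.computeIfAbsent(extractor.apply(v), x -> new HashSet<>()).add(k));
        indexes.put(extractor, idx);
    }

    /** Full-scan query over cached values. */
    public List<V> query(Predicate<V> filter) {
        return data.values().stream().filter(filter).collect(Collectors.toList());
    }

    /** Index-assisted equality query: touch only the matching keys. */
    public List<V> queryEquals(Function<V, Object> extractor, Object value) {
        Map<Object, Set<K>> idx = indexes.get(extractor);
        if (idx == null) return query(v -> value.equals(extractor.apply(v)));
        List<V> out = new ArrayList<>();
        for (K k : idx.getOrDefault(value, Set.of())) out.add(data.get(k));
        return out;
    }
}
```

    In a real clustered cache each node would run the filter over its own partition in parallel and the results would be merged, which is where the "more machines == more horsepower" scaling comes from.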

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  31. Caching and Query?

    Coherence has features that optimize write-intensive applications too, including write-behind caching.

    Coherence does querying against the cache as well. That does mean that if you pre-load a cache with the entire set of data that you're working with, you can parallel-query across the cluster instead of going back to the database. (It supports indexing and cost-based evaluation as well.)

    Does it mean the cache has to load all the data, even if some of it has never been read or written? (Since otherwise you have no way to know if you've missed some data that match the criteria.) Does that mean the cache is effectively a full distributed database in itself?

    Do real deployments use Coherence's full querying capability, or do they still send the queries to the database?
  32. Caching and Query?

    Coherence has features that optimize write-intensive applications too, including write-behind caching.

    Coherence does querying against the cache as well. That does mean that if you pre-load a cache with the entire set of data that you're working with, you can parallel-query across the cluster instead of going back to the database. (It supports indexing and cost-based evaluation as well.)

    Does it mean the cache has to load all the data, even if some of it has never been read or written? (Since otherwise you have no way to know if you've missed some data that match the criteria.)

    Yes, exactly!
    Does that mean the cache is effectively a full distributed database in itself?

    No, it's just a clustered transactional cache that supports parallel queries and indexing. Databases store things; we just work with them in memory (hence the "cache" term).

    (We do support disk paging, etc., but those features are for the purposes of caching, not storage. The reason that a database, such as Oracle, is valuable is that it will hold your data pretty predictably for 20 years or more, while employees and applications etc. will come and go.)
    Do real deployments use Coherence's full querying capability, or do they still send the queries to the database?

    Sure, both. One of our customers actually implemented both to see which was faster. Databases are amazingly optimized for queries, so it was not surprising that the database was faster on some of the queries. Coherence has the benefit of scaling out (more machines == more horsepower) so it also wasn't surprising that some queries were faster with Coherence. The customer ended up doing some queries directly to the database, and some directly to Coherence; the ones that would go to the database could always fail over to Coherence if the database went down, letting the read-only operations survive database failure. (The write-behind caching allowed their read/write operations to survive database failure too, by re-queueing updates until the database came back up.)

    However, that's a pretty advanced use case. Here are some other reasons why it's handy to be able to query caches:

    1. Invalidation - to be able to select a group of cache items that you know you want to throw away

    2. Objects - if you want to be able to query for objects based on their properties (or using the methods on the objects themselves to determine whether to include them in the results or not) then you'd search against the cache

    3. Federated - if the objects in the cache represent data that's expensive to load, e.g. because it is federated (it comes from multiple sources and is coalesced into a single object) then it is often much (much) faster to pre-load the cache and query against it

    4. Custom Indexes - if you know how to code the best searches by coding your own indexing (e.g. spatial, etc.) then querying against the cache is often much more efficient
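    The write-behind re-queueing behavior mentioned earlier (updates surviving a database outage) can be sketched as a toy, with a pluggable store callback standing in for the database; the names are illustrative, not Coherence's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

/**
 * Toy write-behind cache: writes hit the cache immediately and are
 * queued for the database; if the store fails (database down), the
 * write is re-queued and retried on a later flush.
 */
public class WriteBehindCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Deque<String> pending = new ArrayDeque<>(); // keys awaiting store
    private final BiPredicate<String, String> store; // false == DB unavailable

    public WriteBehindCache(BiPredicate<String, String> store) {
        this.store = store;
    }

    public String get(String key) { return cache.get(key); }

    /** Writes hit the cache at once; the database write is deferred. */
    public void put(String key, String value) {
        cache.put(key, value);
        if (!pending.contains(key)) pending.add(key); // coalesce repeat writes
    }

    /** Drain the queue once; failed writes go back on the queue. */
    public void flush() {
        for (int n = pending.size(); n > 0; n--) {
            String key = pending.poll();
            if (!store.test(key, cache.get(key))) {
                pending.add(key); // DB unavailable: re-queue for later
            }
        }
    }

    public int pendingCount() { return pending.size(); }
}
```

    Coalescing repeated writes to the same key is also why write-behind can reduce database load: only the latest value reaches the store.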

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  33. WAN Clustering

    Nice interview.

    Concerning WAN-Clustering you could use 2 SwiftMQ Routers on each side to replicate the changes.

    -- Andreas
  34. WAN Clustering

    Concerning WAN-Clustering you could use 2 SwiftMQ Routers on each side to replicate the changes.

    One of the additional communication channels we are planning to support for WAN clustering is JMS, but I don't know if that will be done for the initial 3.0 release.

    A lot of the issues of WAN clustering are related to what the operations people tell you that you MUST use to communicate between data centers. Basically, if they say "use JMS", then we'll have to use JMS.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  35. WAN Clustering

    A lot of the issues of WAN clustering are related to what the operations people tell you that you MUST use to communicate between data centers.

    But JMS is plain TCP (at least our routing connections). Who should care about what you use internally to sync your clusters over a WAN?

    -- Andreas
  36. WAN Clustering[ Go to top ]

    A lot of the issues of WAN clustering are related to what the operations people tell you that you MUST use to communicate between data centers.

    But JMS is plain TCP (at least our routing connections are). Why should anyone care what you use internally to sync your clusters over a WAN?

    You're thinking like an engineer ;-)

    The reason is that they go through all the validation of the JMS product: testing it on the network, watching the traffic, watching the behavior. Then they want everyone to use their official choice because (a) they spent all that time, (b) they spent a ton of money to make it an enterprise standard, and maybe (c) having tested it, they trust it.

    There are other possible reasons too:

    1) better ability to monitor usage, by collecting stats at a higher level (e.g. per queue)
    2) other management tools that come with the JMS product
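The replicate-changes-over-a-channel pattern being discussed can be sketched without a real JMS provider. The `ChangeTopic` class below is a hypothetical in-process stand-in for a JMS topic — in a real deployment you would publish via a `javax.jms` `Topic` (e.g. through SwiftMQ) instead:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// In-process stand-in for a pub/sub channel: each "data center" subscribes
// and receives every published cache change. Names here are illustrative.
public class ChangeTopic {
    private final List<BiConsumer<String, String>> subscribers = new ArrayList<>();

    public void subscribe(BiConsumer<String, String> subscriber) {
        subscribers.add(subscriber);
    }

    public void publish(String key, String value) {
        for (BiConsumer<String, String> s : subscribers) {
            s.accept(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> east = new ConcurrentHashMap<>();
        Map<String, String> west = new ConcurrentHashMap<>();

        ChangeTopic topic = new ChangeTopic();
        topic.subscribe(east::put);  // each side applies remote changes locally
        topic.subscribe(west::put);

        // A cache write in one data center is published and applied everywhere.
        topic.publish("user:42", "logged-in");
        System.out.println(east.get("user:42") + " / " + west.get("user:42"));
    }
}
```

The operational argument in the post is precisely that the channel itself (JMS vs. raw TCP) is what the data-center people standardize on, even though the replication logic above is channel-agnostic.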

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  37. An excellent overview of caching, application server scalability, and where Java stands now.

    I would never use a caching technology that works well in only 99.9999% of cases. The immediate analogy is with the ACID guarantees of databases: they have to hold 100% of the time. I looked at JavaGroups, and while it doesn't claim to be a caching technology, one might think it could be used to broadcast data changes from one node to all other nodes in the cluster. But implementing cache consistency requires more than broadcasting data (which is the easy part).
    It's as if I knew how to implement B*-trees on disk for persisting data and now "just" had to add transaction support to have a database. That last "little part" is the hard one to implement, and it must work 100% of the time even if it's needed only once in a while.

    I would add that some trading applications may implement write-behind caching, but this is very dangerous and incorrect. Once such an application receives a trade request, the request should be persisted and acknowledged, and every step in the processing of the request should ensure that it is not lost. So at the very least, persist the data in a local, temporary, durable cache if you cannot persist it to the database.
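The "persist locally before acknowledging" idea can be sketched as a durable journal: the request is forced to local disk first, so a later write-behind flush to the database can fail without losing it. The class and file format below are illustrative only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.stream.Stream;

public class DurableJournal {
    private final Path journal;

    public DurableJournal(Path journal) {
        this.journal = journal;
    }

    // Append the request and force it to disk before returning; only after
    // this succeeds should the caller acknowledge the trade request.
    public void record(String request) throws IOException {
        Files.writeString(journal, request + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);
    }

    // Number of journaled requests not yet confirmed in the database.
    public long pending() throws IOException {
        try (Stream<String> lines = Files.lines(journal)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("trades", ".log");
        DurableJournal j = new DurableJournal(p);
        j.record("BUY 100 IBM");
        j.record("SELL 50 SUNW");
        System.out.println(j.pending());
        Files.delete(p);
    }
}
```

A production system would also need to replay and truncate the journal after the database write succeeds; this sketch only shows the durability step that makes the acknowledgement safe.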
  38. Cameron (and all the other gurus),

    How do you perform housekeeping activities and communication between different cache instances on different servers in a portable manner?

    More specifically, given the lack of a standard work manager API (yet!!) inside of a J2EE container, how do you deal with thread creation? Many specs like EJB, Servlet forbid us from creating housekeeping threads inside servers.

    Also, if you have to communicate with instances of cache on other machines, you will be using a network listener like a ServerSocket or a DatagramSocket or something else. How do you create such resources given that there is no standard way of doing so inside of a J2EE container, especially when many specs prohibit them?

    Can we use JCA (I have never used it) here?

    Dushy
  39. Many specs like EJB, Servlet forbid us from creating housekeeping threads inside servers.

    True, the application components are restricted in what they do. However, system libraries are not.

    For example, consider the JDBC drivers that you use from within an EJB container that open sockets and call native code and read and write OS files, etc.

    Those limitations in the spec are designed to guarantee portability, and to avoid instability, and application developers are wise to avoid crossing them unless there are specific reasons to.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  40. I don't agree with much of that post. Eg...
    Many specs like EJB, Servlet forbid us from creating housekeeping threads inside servers.

    I'm fairly convinced a servlet can start a thread. The J2EE and Servlet specifications mention it. Tomcat allows it.

    Servlet 2.4 spec:
    "This type of servlet container should support this behavior when performed on threads created by the developer, ...".

    J2EE 1.4 spec:
    "If a web component creates a thread, the J2EE platform must ensure...".
    Also, if you have to communicate with instances of cache on other machines, you will be using a network listener like a ServerSocket or a DatagramSocket or something else. How do you create such resources given that there is no standard way of doing so inside of a J2EE container, especially when many specs prohibit them?

    The J2EE spec allows all the socket activity you say isn't allowed:
    "The interoperability requirements for the current J2EE platform release allow: J2EE applications to connect to legacy systems using CORBA or low-level
    socket interfaces. ..."

    The J2EE spec says servlets with the "typical set of permissions" can connect sockets, presumably including multicast datagrams.
  41. I don't agree with much of that post. Eg...
    Many specs like EJB, Servlet forbid us from creating housekeeping threads inside servers.
    I'm fairly convinced a servlet can start a thread.

    If you deploy a JCA resource adapter in a J2EE app server, the RA will usually not be allowed to create threads or listen on sockets. To get this permission, you have to add the required security permissions to the ra.xml. Usually these permissions are either granted by default during deployment or an admin has to grant them explicitly:

          <security-permission-spec>
            grant {
              permission java.util.PropertyPermission "*", "read";
              permission java.lang.RuntimePermission "getClassLoader";
              permission java.lang.RuntimePermission "createClassLoader";
              permission java.lang.RuntimePermission "modifyThread";
              permission java.lang.RuntimePermission "stopThread";
              permission java.lang.RuntimePermission "modifyThreadGroup";
              permission java.net.SocketPermission "*", "accept, connect, listen";
              permission java.io.FilePermission "<<ALL FILES>>", "read, write, delete, execute";
            };
          </security-permission-spec>

    -- Andreas