Article: Wang Yu on Uncovering J2EE Clustering

Discussions

  1. J2EE clustering is a popular way to build mission-critical applications. When designing large, scalable, mission-critical applications, architects need a solid knowledge of J2EE clustering. In this article, Wang Yu takes you on a tour of the technology inside J2EE clustering and explains the features and limitations of some popular J2EE clustering products.

    Read Under the Hood of J2EE Clustering

    Threaded Messages (104)

  2. Clustering ...[ Go to top ]

    improves scalability and reliability? Really?

    hmmm.

    .V
  3. Clustering ...[ Go to top ]

    Clustering ...

    improves scalability and reliability? Really?

    OK, Vic, I'll bite.

    Yes, well-implemented clustering improves scalability and reliability.

    Do you have a different POV?

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  4. Clustering ...[ Go to top ]

    improves scalability and reliability? Really? hmmm. .V

    Is this one of those "any complexity is bad" type comments? Or is there a specific issue or architecture flaw that you were referring to?
  5. Clustering ...[ Go to top ]

    Matt, Cameron,

    Yes, KISS works great. I am quite glad that people associate my name w/ KISS.

    Let me give you my personal experience w/ clustering for scalability and reliability, and not one w/ J2EE:
    A client (a car company) placed a cluster of 3 large (terabyte) DB servers for reliability. The synchronization mechanism kept knocking the servers off. After about a year of consulting... they removed the cluster at the client's insistence.
    The server never went down again, and since it no longer needed to synchronize, it was much faster.

    Clustering works for a limited set of requirements, such as static read-only data and smaller volumes. I have heard from clients about EJB clusters taking a week to deploy and data getting more and more out of sync over time because of synchronization. (Of course, those are the clients who would call me and tell me, not the "oh, EJB is a joy" pretenders.) So I share the readers' real-life comments. Some things only work in laboratory conditions.

    Also, any time someone says... J2EE... try Groovy. It can use all the 3rd-party JARs... except it's KISS. I think annotations made the language less readable; they were just there to replace XDoclet, anything for EJB. Let's see how many "XYZ is better than the EJB persistence API" articles people post next year. What a trifecta: Sparc + iPlanet EJB + JSF. Let's cluster it ;-), that will make it even faster.

    .V
  6. Clustering ...[ Go to top ]

    Vic -
    Yes, KISS works great. I am quite glad that people associate my name w/ KISS.

    I didn't mention KISS, nor did anyone else.

    However, a good clustering architecture can be an important part of a KISS strategy, since it will greatly simplify what the application needs to do to accomplish its scalability and availability goals, assuming that those goals require clustering.

    (My first advice to people considering clustering is this: Ask yourself if you even need it. Most applications don't actually need instant failover, 100% availability, and seamless handling of server and software failures. Most applications don't need more than one server's worth of hardware. However, if those are real requirements, then clustering may be the right solution.)
    Let me give you my personal experience w/ clustering for scalability and reliability, and not one w/ J2EE: A client (a car company) placed a cluster of 3 large (terabyte) DB servers for reliability. The synchronization mechanism kept knocking the servers off. After about a year of consulting... they removed the cluster at the client's insistence.

    The article is on J2EE clustering, not database clustering (except for an explanation of how Sun does session management). The topic of database clustering has almost nothing to do with J2EE application clustering, so you are way off-topic and thus drawing poor conclusions from even poorer analogies.

    Regarding database clustering, with the current state of the art, database clustering for full availability purposes should only be done as master/slave. For partial availability purposes (i.e. localization of failure), federation can also work well. Database clustering is much more complicated than J2EE clustering, because a database has to cluster both transactions and the resulting persistent data. J2EE clustering (in general) is not involved with persistent storage, which greatly simplifies the model.
    Clustering works for a limited set of requirements, such as static read-only data and smaller volumes.

    For J2EE clustering, I would have to disagree. Refer to one of our customers (http://tangosol.com/customers.jsp) that does online package tracking, for example. That application is neither static nor low-volume. We have several customers with production applications handling over 100,000,000 pages per day, and at least one of them does no page caching (i.e. my understanding is that all their pages are 100% dynamic, and all done in Java). With the volumes of data that these applications use, they have no alternative but to cache and/or manage large portions of that data within the J2EE cluster.

    For availability purposes, clustering can remove single points of failure (SPOFs). When you can arbitrarily lose any server at any time, and have no end user impact, it makes a pretty compelling story. We've had a certain government application in production without a second of downtime for over three years now.
    I have heard from clients about EJB clusters taking a week to deploy and data getting more and more out of sync over time because of synchronization.

    First of all, there's no excuse for bad software, and there's plenty of it out there. I've lost a lot of hair trying to get various J2EE servers to cluster, so I know exactly what you're talking about. However, those problems were all resolved three or more years ago, and now even Tomcat can cluster a couple servers.

    Second, there is no reason to statefully cluster EJBs unless you have built a J2EE application that relies on HA stateful session EJBs. If you do use stateful session EJBs, most of the major app servers will now cluster them, and it's really not that hard to set up. (IMHO The only entity EJB clustering that should be done is for read-only entity EJBs used for caching, which also is simple from my experience.)
    So I share the readers' real-life comments. Some things only work in laboratory conditions. Also, any time someone says... J2EE... try Groovy. It can use all the 3rd-party JARs... except it's KISS. I think annotations made the language less readable; they were just there to replace XDoclet, anything for EJB. Let's see how many "XYZ is better than the EJB persistence API" articles people post next year. What a trifecta: Sparc + iPlanet EJB + JSF. Let's cluster it ;-), that will make it even faster.

    I can't really follow your argument here. You are confusing me severely. If you want to, drop me an email (cpurdy at tangosol dot com) and I'll introduce you to someone technical who is actually using clustering in production with J2EE applications, and maybe a real world use case can change your mind.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  7. Clustering ...[ Go to top ]

    Is this one of those "any complexity is bad" type comments?
    Cameron, I was referring to the above quote; maybe you missed it.

    Scott Ferguson is probably one of the best J2EE programmers IMO, and he had a message post (can't find the link) where he said words to the effect that share-nothing "clustering" is best. I followed his advice and set up 6 Resin servers; it worked great. KISS.

    As long as you can sell something else, good for you. Maybe EJBs worked out well for you. I tend to stress-test things to validate the glossy brochure, but whatever works for your organization.

    .V
  8. Clustering ...[ Go to top ]

    Scott Ferguson is probably one of the best J2EE programmers IMO, and he had a message post (can't find the link) where he said words to the effect that share-nothing "clustering" is best.

    Sure, I know Scott and Steve. Pretty amazing guys. On a related note, see: http://caucho.com/press/2005-01-12.xtp

    The term "share[d] nothing" refers to the ability to cover a spectrum with no overlap. You could use that term to mean one of two different things in this case:

    - a server farm (servers that run independently and do not talk to each other)

    - a shared-nothing organization of data and/or responsibility within the cluster (e.g. Tangosol Coherence partitioned caching)

    Both are perfectly valid approaches. If you don't need clustering communication among servers, then it's obviously best to avoid it (e.g. Apache server farms). OTOH, if you need to manage large amounts of data or large responsibility sets, then a shared-nothing (e.g. partitioned) approach is best.
    I followed his advice and set up 6 Resin servers; it worked great. KISS.

    Were they clustered? Resin has its own clustering built in, or you can run them independently.
    Maybe EJBs worked out well for you. I tend to stress-test things to validate the glossy brochure, but whatever works for your organization.

    It's not what works for me, but what works for our customers. EJBs aren't perfect, but they aren't a guaranteed failure either. I've never claimed that EJBs follow the KISS principle, but in a large enough organization, standards can be the KISS, even when those standards are not themselves good KISSers. I'm just a pragmatist.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  9. Clustering ...[ Go to top ]

    Were they clustered?
    Not clustered!!!

    Round-robin sticky sessions in hardware.

    "Share nothing". Well... they share the DB.

    .V
  10. Clustering ...[ Go to top ]

    Were they clustered?

    Not clustered!!!

    Round-robin sticky sessions in hardware.

    "Share nothing". Well... they share the DB.

    OK, understood.

    The decision is simple from my POV: If you need to have up-to-date data in the application tier, and there is more than one server, then you cluster. It doesn't sound like this application had those requirements, so you didn't have to cluster.

    However, your experience with one application that had fairly moderate requirements doesn't translate universally, any more than if I took the requirements from various stock and financial exchanges and applied them to web sites.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  11. Clustering ...[ Go to top ]

    If you need to have up-to-date data in the application tier, and there is more than one server, then you cluster. It doesn't sound like this application had those requirements, so you didn't have to cluster. However, your experience with one application that had fairly moderate requirements doesn't translate universally, any more than if I took the requirements from various stock and financial exchanges and applied them to web sites.

    You can cluster EJB for large/dynamic, especially if you can get it to work in production and have an ROI margin. I disagree, for one. IMO EJB and clusters work for small/static, and it sounds like you think that share-nothing is for small/static.

    One time we need to do a public face-off using a VLDB and dynamic data, and see which approach has the best ROI and all the ilities (reliability, scalability...) over time; say, keep the test running for a month. An EJB cluster (Sparc/iPlanet and the commercial HashMap? ;-) vs. Apache iBATIS share-nothing (on Linux, with the free softmap). Say a BBS application as the requirement. There is a stress test like that from Rice.
    I know already, people will say the test was not fair. Life's not fair!

    And I did a lot more than one application over 20 years. A bit of a put-down, eh? ;-)

    .V
  12. Clustering ...[ Go to top ]

    IMO EJB and clusters work for small/static, and it sounds like you think that share-nothing is for small/static.

    I've given you several real-world examples where clustering was required, including a package tracking app that I know you've heard of and equities exchanges. And I've given a basic rule of thumb for knowing when clustering will help. You keep dodging the questions I ask, and trying to change the topic to EJBs.
    You can cluster EJB ..

    You keep trying to drag the conversation back to EJBs. Why? Clustering has nothing to do with EJBs.
    One time we need to do a public face-off using a VLDB and dynamic data, and see which approach has the best ROI and all the ilities (reliability, scalability...) over time; say, keep the test running for a month. An EJB cluster (Sparc/iPlanet and the commercial HashMap? ;-) vs. Apache iBATIS share-nothing (on Linux, with the free softmap). Say a BBS application as the requirement. There is a stress test like that from Rice. I know already, people will say the test was not fair. Life's not fair!

    OK, I can't follow at all. Are you saying that this is a possible scenario? One you implemented? One you tested with several different implementations? I'm guessing that "Rice" is the university, so this is some university application? What are you asking or suggesting?
    And I did a lot more than one application over 20 years. A bit of a put-down, eh? ;-)

    I'll believe that you've done more than one app as soon as you stop pretending that all apps have the same exact set of requirements. ;-)

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  13. Clustering ...[ Go to top ]

    One time we need to do a public face-off using a VLDB and dynamic data, and see which approach has the best ROI and all the ilities (reliability, scalability...) over time; say, keep the test running for a month. An EJB cluster (Sparc/iPlanet and the commercial HashMap? ;-) vs. Apache iBATIS share-nothing (on Linux, with the free softmap). Say a BBS application as the requirement. There is a stress test like that from Rice.
    OK, I can't follow at all. Are you saying that this is a possible scenario? One you implemented? One you tested with several different implementations? I'm guessing that "Rice" is the university, so this is some university application? What are you asking or suggesting?


    http://www.theserverside.com/news/thread.tss?thread_id=12698

    There are tests called RUBiS and RUBBoS. There should be an opportunity for whatever you are advocating (EJB or not, flip-flop?) to go against what I am advocating (KISS share-nothing), and see the reliability and scalability using an app like the above.

    Stock trading, right; if you keep repeating it, you'll start believing it. Yours is bigger, fine w/ me. I think PHBs might believe it about your magic hashmap that replicates faster than network latency. Same as the rest of your credibility.

    .V
    http://beta.roomity.com
  14. Clustering ...[ Go to top ]

    There are tests called RUBiS and RUBBoS.

    Thanks; I'll take a look.
    .. whatever you are advocating (EJB or not, flip flop?)

    I was pretty clear with my recommendations on how EJB-based applications should cluster, and when they have to.

    The decision as to whether or not EJBs should be used for a particular project is another topic completely. I'll let Rod Johnson and Bruce Tate fight that one out with Bill Burke.
    .. to go against what I am advocating (KISS share-nothing) and see the reliability and scalability using an app like the above.

    Since the link you provided was for a single server test, I'm still trying to figure out your conclusion. The last time I checked, eBay wasn't running on a single server.

    I started by looking at the Java code for TPCW. It is a stateless application used as a database benchmark, and as such the availability of the application in an environment with redundant servlet containers will approach the availability of the database itself. Likewise, the scalability of the application is tied directly to the scalability of the database. Having looked at the code, I am certain that one can do significantly better from a scalable performance perspective using Java clustering, and unless your database has 100% availability, I am likewise certain that one can improve the availability of the application by using Java clustering. (Example: dealer.com, who converted from a TPCW-style application once they hit the inevitable scalability limit, and who subsequently survived database outages with no impact on availability, thanks to Java clustering.)

    Regarding the bidding example, all I can say is Betfair.com. Over 80 million pages per day. Surges up to 12,000 transactions per second.

    Regarding the BBS example, try Jive Software. Their software powers the developer portals at Sun, BEA, IBM and Oracle, not to mention JavaLobby.
    Stock trading, right; if you keep repeating it, you'll start believing it.

    I offered to introduce you to some customer reference accounts. Offer still stands: cpurdy at tangosol dot com. We're used for real-time exchanges, order book systems, real-time compliance systems, deferred compliance reporting, real-time risk analysis, analysis systems (e.g. the second largest mutual fund), and even real-time automated trading systems (e.g. one of the most successful hedge funds).
    I think PHBs might believe it about your magic hashmap that replicates faster than network latency. Same as the rest of your credibility.

    Ah, now I see what's going on here. I won't take it personally. The offer still stands: Talk to a company or two that's actually doing a real large-scale cluster, and see if I'm full of shit or not.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  15. Clustering ...[ Go to top ]

    I offered to introduce you to some customer reference accounts. Offer still stands: cpurdy at tangosol dot com. We're used for real-time exchanges, order book systems, real-time compliance systems, deferred compliance reporting, real-time risk analysis, analysis systems (e.g. the second largest mutual fund), and even real-time automated trading systems (e.g. one of the most successful hedge funds).

    Anyone who works in the financial industry will know that "the second largest" mutual fund company generates a ton of trading traffic 24/7. Assuming the company is public, one can easily look at their quarterly SEC filings to get an idea of how many assets they manage :)

    peter
  16. Clustering ...[ Go to top ]

    It is a stateless application used as a database benchmark, and as such the availability of the application in an environment with redundant servlet containers will approach the availability of the database itself. Likewise, the scalability of the application is tied directly to the scalability of the database. Having looked at the code, I am certain that one can do significantly better from a scalable performance perspective using Java clustering, and unless your database has 100% availability, I am likewise certain that one can improve the availability of the application by using Java clustering.
    How "Java clustering" can replace database, is it just a better database than "traditional" RDBMS ? Doe's it use more scalable communication protocols or "better" concurrency control, cache and indexing algorythms ?
  17. Clustering ...[ Go to top ]

    It is a stateless application used as a database benchmark, and as such the availability of the application in an environment with redundant servlet containers will approach the availability of the database itself. Likewise, the scalability of the application is tied directly to the scalability of the database. Having looked at the code, I am certain that one can do significantly better from a scalable performance perspective using Java clustering, and unless your database has 100% availability, I am likewise certain that one can improve the availability of the application by using Java clustering.

    How "Java clustering" can replace database, is it just a better database than "traditional" RDBMS ? Doe's it use more scalable communication protocols or "better" concurrency control, cache and indexing algorythms ?

    Java clustering should not replace a database, nor should it replace database clustering.

    What Java clustering is really good at is removing the pointless redundant load on the database. Consider the stock quantity example from TPC-W, or the shopping cart itself. Why would you want the cart to be transactionally maintained by a database? Why would you want to ask the database for the current stock level? If you don't have to ask the database stupid questions, then you make the database much more likely to be available to handle the real questions.
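
    (As a rough sketch of what "not asking the database stupid questions" can look like: the cache here is just a cluster-shared java.util.Map, and StockDao is a made-up stand-in for whatever JDBC code the application already has.)

    import java.util.Map;

    // Read-through stock lookup: page views are served from a cluster-shared map,
    // and only a cache miss falls through to the database.
    public class StockLevels {
        public interface StockDao { int selectQuantity(int itemId); } // stand-in for the app's JDBC code

        private final Map stockCache; // itemId -> quantity, assumed to be cluster-shared
        private final StockDao dao;

        public StockLevels(Map clusterSharedMap, StockDao dao) {
            this.stockCache = clusterSharedMap;
            this.dao = dao;
        }

        public int quantityFor(int itemId) {
            Integer key = Integer.valueOf(itemId);
            Integer qty = (Integer) stockCache.get(key);
            if (qty == null) {
                qty = Integer.valueOf(dao.selectQuantity(itemId)); // one SELECT instead of one per page view
                stockCache.put(key, qty);
            }
            return qty.intValue();
        }
    }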

    Regarding availability, each tier in a multi-tiered system contributes to a lack of availability. If the data tier goes down, the app goes down. If the J2EE tier goes down, then the app goes down. Now you're 2x as likely to be unavailable because you used a 2-tier server-side architecture. So you improve the [perceived*] availability of the J2EE tier by clustering it, and you improve the availability of the data tier by clustering it. And in some cases, if you have clustered the J2EE tier, you can actually continue running the application even when the data tier is unavailable (see the dealer.com link above).

    * The reason that I explicitly said "perceived" above is that Vic is correct in one way: If you run multiple servers in the app tier and sticky load balance across them, they don't have to be clustered in order for application availability to be higher than server availability. In other words, if one server goes down, the application doesn't go down. The perception is an end-user perception, since any data managed in that end-user's session will be lost when that server fails, unless that session is persisted to a shared store (e.g. the database) or clustered.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  18. Clustering ...[ Go to top ]

    The last time I checked, eBay wasn't running on a single server.

    Hey Cameron, can you expand on this? Are you implying that eBay uses clustering? I thought their architecture was very anti-clustering.

    -Tinou
  19. they probably do cluster[ Go to top ]

    The last time I checked, eBay wasn't running on a single server.

    Hey Cameron, can you expand on this? Are you implying that eBay uses clustering? I thought their architecture was very anti-clustering. -Tinou

    I don't work for eBay and I don't know their actual setup, but I would imagine eBay at least considers clustering for critical pieces. Things like transactions should have failover; otherwise eBay would carry a significant risk. This is based on the assumption that hardware will fail, and fail when one least expects it, which usually means at the worst time.

    peter
  20. Clustering ...[ Go to top ]

    Stock trading, right; if you keep repeating it, you'll start believing it. Yours is bigger, fine w/ me. I think PHBs might believe it about your magic hashmap that replicates faster than network latency.
    Probably my applications are very small and that makes me a lame programmer, but I do not understand how hashmap replication helps stock exchange or money transfer applications.
  21. Clustering ...[ Go to top ]

    ...I do not understand how hashmap replication helps stock exchange or money transfer applications.

    E-Trade has a 2-second execution guarantee, which is infeasible without the reliability of clustering.
  22. Clustering ...[ Go to top ]

    ...I do not understand how hashmap replication helps stock exchange or money transfer applications.
    E-Trade has a 2-second execution guarantee, which is infeasible without the reliability of clustering.
    As I understand it, a "2-second execution guarantee" is very slow for a web page or a database transaction; probably the distributed hashmap lags if it takes 2 seconds.
  23. Clustering ...[ Go to top ]

    ...I do not understand how hashmap replication helps stock exchange or money transfer applications.
    E-Trade has a 2-second execution guarantee, which is infeasible without the reliability of clustering.
    As I understand it, a "2-second execution guarantee" is very slow for a web page or a database transaction; probably the distributed hashmap lags if it takes 2 seconds.

    The magnitude (2 seconds) of the guarantee is immaterial. Whether 2 seconds or 20 seconds, the trading industry couldn't make execution guarantees without clustered failover, which is why I found the "I don't understand" comment above so strange.



    If an app server crashes, how could E-Trade achieve its execution guarantee without clustering? Clustered failover is paramount, regardless of whether the guarantee is 2 seconds or 20 seconds. Clustered failover keeps the trading industry in business.
  24. Clustering ...[ Go to top ]

    Probably it is a misunderstanding; clustering and cache replication are very different things. As I understand it, Vic advocates clustering without application-level data replication, but there is nothing simple or stupid about that; it is just the right way to cluster web applications.
  25. Clustering ...[ Go to top ]

    Probably my applications are very small and that makes me a lame programmer, but I do not understand how hashmap replication helps stock exchange or money transfer applications.

    First, we don't do "hashmap replication"; the "expensive hashmap" comment is tongue-in-cheek. I forget if it was Dion or Geir who came up with it, but it was a joke. (I thought it was funny .. guess you had to be there. ;-)

    Our 1.0 product (released back in 2001) supported replicated data that was accessed and updated via the java.util.Map API (with some minor extensions for things like locking). We chose the Map API because that's what most people were using for local caches, e.g.
    Map cache = new Hashtable();

    So if you replaced that with:
    Map cache = CacheFactory.getReplicatedCache();

    .. then you ended up with a Map that would provide a coherent replication scheme across any sized cluster. For availability, if a server fails, the data is not lost because it has been replicated.

    The best part with replication is that the app could now change the Map and not constantly reload from the database. In one early customer app, this dropped database utilization by 50% just by caching a few values that didn't even change that often.
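
    (To make the usage pattern concrete, here is roughly what that looks like in application code; the factory call is the one from the snippet above, while the "fx-rates" key, the cast target, and refreshedRates are placeholder names invented for illustration.)

    Map cache = CacheFactory.getReplicatedCache();

    // Every member reads its own in-memory copy, so there is no database round trip...
    Object rates = cache.get("fx-rates");

    // ...and a put() is propagated to the other members, so the next get() on any
    // node sees the new value instead of forcing a reload from the database.
    cache.put("fx-rates", refreshedRates);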

    Unfortunately, replication has some down-sides. We quickly found that the "any sized cluster" under load was actually a lot smaller size than "any". In fact, we were able to build load tests that would saturate a network with only two nodes, and thus we knew that there was going to be a real-world scalability wall.

    So we invented clustered cache partitioning, and included a partitioned cache implementation in our 1.2 release (mid-2002). Now we could support much larger clusters, and much larger data sets, with linear scale-out on a switched network architecture. This architecture is a "shared nothing" architecture. (Hence my question to Vic, who was referring to "not clustered" as "shared nothing".) Then, for companies like GEICO Insurance and a big financial services company that must go un-named, we invented near caching, which (under certain circumstances) drops read latencies on partitioned caches to near-zero.
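
    (Conceptually, a near cache is just a small local front map consulted before the partitioned back cache. A crude sketch with invented names, leaving out the invalidation and eviction that make the real thing coherent:)

    import java.util.HashMap;
    import java.util.Map;

    // Crude near cache: check the local front map first, fall back to the
    // (remote, partitioned) back cache on a miss. The hard part in a real
    // product is invalidating or updating front entries when the back changes.
    public class NearCache {
        private final Map front = new HashMap(); // local, per-JVM
        private final Map back;                   // partitioned across the cluster

        public NearCache(Map partitionedBackCache) {
            this.back = partitionedBackCache;
        }

        public Object get(Object key) {
            Object value = front.get(key);
            if (value == null) {
                value = back.get(key);     // at most one network hop
                if (value != null) {
                    front.put(key, value); // subsequent reads are local, near-zero latency
                }
            }
            return value;
        }
    }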

    Since then, we added agent invocation capabilities, transactionality, J2CA, querying, JMS support (for going through firewalls and to support rich client trading systems), JMX, WAN clustering, customizable partitioning, and a high-scale / highly-reliable HTTP Session Management module (Coherence*Web).

    Using an exchange as an example, you can maintain all buyers and sellers in memory, keyed by the instrument that they are attempting to trade, and internally ordered by (inversely) their buy and sell prices. Whenever a new request comes in, it can be instantly matched, including partial matches, against that information. If a server goes down, there is typically a sub-second pause while the rest of the cluster picks up the work that was being done by that failed server. (It's not master/slave, it's a cooperative peer-to-peer model using a mesh architecture.) So not only does the transaction go through correctly, but none of the pending data (pending buys and sells) are lost. We have a production-proven MTTR for a failed server of less than 1s, and properly configured an application can approach 100% availability. (Being a statistical model, one cannot actually give a true MTTF until an application is actually unavailable.)
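
    (For readers who haven't seen an order book in code, here is a toy sketch of the kind of in-memory structure being described; every name is invented, and a real book would also track order IDs, aggregate quantity per price level, and handle partial fills.)

    import java.util.Collections;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Toy book for a single instrument: bids sorted highest price first,
    // asks sorted lowest price first, so the best possible match is firstKey().
    public class OrderBook {
        private final SortedMap bids = new TreeMap(Collections.reverseOrder()); // price -> quantity
        private final SortedMap asks = new TreeMap();                            // price -> quantity

        public void addBid(Double price, Integer qty) { bids.put(price, qty); }
        public void addAsk(Double price, Integer qty) { asks.put(price, qty); }

        // An incoming bid can be matched immediately (possibly partially)
        // whenever it meets or beats the best resting ask.
        public boolean bidCrosses(Double bidPrice) {
            return !asks.isEmpty() && bidPrice.compareTo((Double) asks.firstKey()) >= 0;
        }
    }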

    Today, most exchanges use bucketing because the underlying messaging and data management systems cannot handle the throughput required by real-time trading. That's why there's always a 2-second minimum theoretical delay on the current [un-named exchange] implementation (1 second bucketing window, 1 second match window, messages batched for the leading and trailing edges).

    However, there are simpler examples. Look at the TPCW_Database.java file that Vic pointed to. Every operation that the user can do translates to database operations. Want to view your cart? Go talk to the database. Add an item? Update a quantity? Go talk to the database. Search? Display an item? Ditto.

    One reason that people like to use the database is because the database takes care of the data without data corruption, right? With that in mind, try to figure out how the stock quantity is not corrupted by the TPC-W application, since it does not specify SERIALIZABLE as an isolation level. (Check out how order entry updates the stock value.) At any rate, the answer is that the application will corrupt the database unless the transactions are SERIALIZABLE, since the stock quantities are being updated without any optimistic concurrency check, from a piece of data that was read from the database earlier in the transaction. In other words, not only is the app slower and less scalable and less available and more expensive to scale, but it also corrupts the application's data.
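
    (A sketch of the missing optimistic check, against a hypothetical item table with an i_id key and an i_stock column; the names are assumed, not taken from the benchmark code. The UPDATE only succeeds if the stock still holds the value read earlier, and zero rows updated means another order got there first and the application should re-read and retry.)

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Optimistic stock decrement: succeeds only if i_stock still holds the value
    // read earlier in the transaction; a return of false means a concurrent order
    // changed it, and the caller must re-read and retry rather than overwrite.
    public class StockUpdater {
        public boolean decrementStock(Connection con, int itemId,
                                      int qtyReadEarlier, int qtyOrdered) throws SQLException {
            PreparedStatement ps = con.prepareStatement(
                "UPDATE item SET i_stock = ? WHERE i_id = ? AND i_stock = ?");
            try {
                ps.setInt(1, qtyReadEarlier - qtyOrdered);
                ps.setInt(2, itemId);
                ps.setInt(3, qtyReadEarlier); // the optimistic concurrency check
                return ps.executeUpdate() == 1;
            } finally {
                ps.close();
            }
        }
    }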

    My conclusion is this: Clustering that data would prove much faster (you can't beat 0 ms for accessing the shopping cart from an HTTP session, for example). It would also prove much more scalable (you can't bottleneck on the database if you aren't talking to it).

    (Regarding the data corruption issue, it is completely avoidable with or without clustering, but an optimistic solution would further complicate and slow down the database-bound version of the application.)

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  26. Clustering ...[ Go to top ]

    Probably this system is very modern; the stock exchange systems I know use legacy custom protocols (custom transfer control, encryption, and compression), and in my practice the web frontend never sees any trading-system database. It can cache statistical information like "Bid/Ask", but this stuff does not need any replication. Is it so expensive to reload this stuff once per second? Load on the database does not depend on request count if stale data can be tolerated (and it is tolerated in this case).
  27. Clustering ...[ Go to top ]

    Probably this system is very modern; the stock exchange systems I know use legacy custom protocols (custom transfer control, encryption, and compression), and in my practice the web frontend never sees any trading-system database.

    FIX may be the most common of the legacy securities protocols. FIX can benefit from failover replication (see below), especially if the brokerage guarantees 2-second execution and the primary app server goes down. Sure, the FIX endpoints need reconnection, but nothing can do this faster than automated failover.
    It can cache statistical information like "Bid/Ask", but this stuff does not need any replication. Is it so expensive to reload this stuff once per second? Load on the database does not depend on request count if stale data can be tolerated (and it is tolerated in this case).

    It helps to think of a brokerage app server as an intermediary between an upstream order entry system and a downstream exchange. Toward the top of the app server's OSI protocol stack may come requests to resend in either direction. To avoid forwarding resend requests to the opposite endpoint, the app server in the middle would durably cache recent messages, especially for open orders. That durable cache is subject to failover.
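
    (A rough sketch of such a resend cache, with every name invented; the Map is assumed to be backed by whatever replicated or durable store the middleware provides.)

    import java.util.Map;

    // Recent outbound messages are kept keyed by sequence number in a
    // cluster-backed map, so a resend request can be answered locally, and can
    // still be answered after a failover, instead of being forwarded to the
    // opposite endpoint.
    public class ResendCache {
        private final Map recentMessages; // Long seqNum -> byte[] raw message

        public ResendCache(Map clusterBackedMap) {
            this.recentMessages = clusterBackedMap;
        }

        public void record(long seqNum, byte[] rawMessage) {
            recentMessages.put(Long.valueOf(seqNum), rawMessage);
        }

        public byte[] messageFor(long seqNum) {
            return (byte[]) recentMessages.get(Long.valueOf(seqNum));
        }
    }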
  28. from a slightly different perspective[ Go to top ]

    Probably this system is very modern; the stock exchange systems I know use legacy custom protocols (custom transfer control, encryption, and compression), and in my practice the web frontend never sees any trading-system database.

    FIX may be the most common of the legacy securities protocols. FIX can benefit from failover replication (see below), especially if the brokerage guarantees 2-second execution and the primary app server goes down. Sure, the FIX endpoints need reconnection, but nothing can do this faster than automated failover.
    It can cache statistical information like "Bid/Ask", but this stuff does not need any replication. Is it so expensive to reload this stuff once per second? Load on the database does not depend on request count if stale data can be tolerated (and it is tolerated in this case).

    It helps to think of a brokerage app server as an intermediary between an upstream order entry system and a downstream exchange. Toward the top of the app server's OSI protocol stack may come requests to resend in either direction. To avoid forwarding resend requests to the opposite endpoint, the app server in the middle would durably cache recent messages, especially for open orders. That durable cache is subject to failover.

    All good points. I'd like to add a bit. If I take the example above and expand it to a full-blown OMS (order management system), the need for data replication becomes clearer. An OMS gets a constant stream of orders and bulk orders, which means it is under high load and requires fault tolerance.

    Say a customer sends a bulk order of 20K transactions, which is rather common for large institutions. Usually, the bulk order is divided into smaller chunks and routed to a specific node for processing. The bulk order in many cases will be reorganized into several larger or smaller bids/buys. A large order will be broken into smaller ones, and small orders will be grouped into large orders.

    The actual order is typically saved to the database when it enters the OMS, but that's just the beginning of a complex business process that tries to match orders for optimal execution and profit. Without doing this type of processing in memory, one would have to do it all with stored procedures, which isn't feasible AFAIK. Basically, the order-matching process is combinatorial. Once the OMS produces a bid order, it sends a message out through FIX/SWIFT or some other protocol. Multiple responses are returned and the OMS has to pick the best match.

    Based on my limited knowledge, failover clustering is required. Is there a better way to handle these types of scenarios?

    peter
  29. from a slightly different perspective[ Go to top ]

    Multiple responses are returned and the OMS has to pick the best match.
    Handshaking during best-execution determination sounds time- and bandwidth-intensive. Maybe the institutionals do it, but I only know day trading.
    based on my limited knowledge, failover clustering is required. Is there a better way to handle these types of scenarios?
    Here's another example. A brokerage's electronic reach usually spans time zones, so day-trade orders are durably queued downstream of order entry if an exchange hasn't opened yet. There's a spike when the exchange opens and the financial network flushes its queues. Without failover, the brokerage is screwed if the queueing system's server crashes during the market's opening rush. So I disagree with Juozas's suggestion that failover is not a business requirement between the order entry system and the exchange.
  30. from a slightly different perspective[ Go to top ]

    Multiple responses are returned and the OMS has to pick the best match.

    Handshaking during best-execution determination sounds time- and bandwidth-intensive. Maybe the institutionals do it, but I only know day trading.

    AFAIK, it's only institutional trading and not consumer day traders. Of the systems I know, day trading and institutional trading are in separate systems. Makes sense, considering how different they are from an execution standpoint.
    Here's another example. A brokerage's electronic reach usually spans time zones, so day-trade orders are durably queued downstream of order entry if an exchange hasn't opened yet. There's a spike when the exchange opens and the financial network flushes its queues. Without failover, the brokerage is screwed if the queueing system's server crashes during the market's opening rush. So I disagree with Juozas's suggestion that failover is not a business requirement between the order entry system and the exchange.

    Why should that matter? :) It's not like the customers will be angry and go to your competitors. So what if a few thousand orders are lost? It's only money.

    peter
  31. As I understand it, you are just trieng to misinterpret my posts. Do you have some reason to do it?
  32. As I understand it, you are just trieng to misinterpret my posts. Do you have some reason to do it?

    Excuse my silly joke; obviously I meant that in an ironic way. It's a good thing I don't make a living as a comedian. I'd be broke and homeless.

    On a more serious note: it's clear to you and me that data replication in a trading system is a real requirement. Without it, none of the big financial firms would be able to operate.

    peter
  33. from a slightly different perspective[ Go to top ]

    As I understand it, you are just trieng to misinterpret my posts. Do you have some reason to do it?
    Excuse my silly joke; obviously I meant that in an ironic way.

    Ohhhh. So that was what the :) was for! :)

    It is funny seeing you try to be funny.
    "You might be a geek ... if you think this is funnny - http://users.characterlink.net/The-Cookie-Jar/math_jokes_04.html
     

    Of course, maybe Juozas was trying to be humorous too by including "trieng" in that post.
  34. from a slightly different perspective[ Go to top ]

    Thanks, I will never buy a spellchecker if you will be so helpful.
  35. What suggestion do you mean?
  36. Clustering ...[ Go to top ]

    Yours is bigger, fine w/ me. I think PHBs might believe it about your magic hashmap that replicates faster than network latency. Same as the rest of your credibility. .V http://beta.roomity.com

    There was probably the makings of a good insult in there somewhere, but I'm buggered if I can figure out what it was.
  37. Clustering ...[ Go to top ]

    Yours is bigger, fine w/ me. I think PHBs might believe it about your magic hashmap that replicates faster than network latency. Same as the rest of your credibility. .V http://beta.roomity.com
    There was probably the makings of a good insult in there somewhere, but I'm buggered if I can figure out what it was.

    I didn't quite get that one either???
  38. Rice...[ Go to top ]

    ...I'm guessing that "Rice" is the university, so this is some university application?

    Mmmm. I like rice.
  39. Clustering ...[ Go to top ]

    Is this one of those "any complexity is bad" type comments?

    Cameron, I was referring to the above quote; maybe you missed it. Scott Ferguson is probably one of the best J2EE programmers IMO, and he had a message post (can't find the link) where he said words to the effect that share-nothing "clustering" is best. I followed his advice and set up 6 Resin servers; it worked great. KISS. As long as you can sell something else, good for you. Maybe EJBs worked out well for you. I tend to stress-test things to validate the glossy brochure, but whatever works for your organization. .V

    I fail to see how that applies to all types of applications. If a developer is working on an application that doesn't have real-time requirements + HA + fault tolerance, then sure, share-nothing works great. I've seen that approach work great first-hand on a large site.

    But it doesn't work for real-time applications. Just because one well-respected programmer makes a statement doesn't make it true or applicable for all situations.

    peter
  40. Clustering ...[ Go to top ]

    Is this one of those "any complexity is bad" type comments?
    Cameron, I was referring to the above quote; maybe you missed it. Scott Ferguson is probably one of the best J2EE programmers IMO, and he had a message post (can't find the link) where he said words to the effect that share-nothing "clustering" is best. I followed his advice and set up 6 Resin servers; it worked great. KISS. As long as you can sell something else, good for you. Maybe EJBs worked out well for you. I tend to stress-test things to validate the glossy brochure, but whatever works for your organization. .V

    (If I remember correctly) Scott has posted before about "Share Nothing Clustering", as in advocating clustering in the right circumstance. How does this support your statements?

    I think that most people would vote you as president of the KISS club, which is not a bad thing in itself.

    I also feel that applications should be as simple as possible, but sometimes requirements and application goals require some level of complexity. Some extreme apps that I have worked on for the Navy require lots of complexity. My point is that you seem to feel that KISS is the answer to every question. If this is not how you really feel, then you might want to change the wording of your posts, as this is how you come across.

    ~Matt
  41. Clustering ...[ Go to top ]

    (If I remember correctly) Scott has posted before about "Share Nothing Clustering", as in advocating clustering in the right circumstance. How does this support your statements?

    I agree w/ him. I cluster, I just don't replicate. I am just arguing here to make the point that I think clustering is overused and complexity can reduce reliability. In theory it's supposed to have no single point of failure, and I believed it. Then... in practice, the synchronization mechanisms would undo the reliability; we even had strange cases of different data depending on what box you were on.

    Cameron, sorry, it wore out my patience. I do believe that your product adds value to customers. I am so tired of the Magic EJB and its Magic capabilities.

    .V
  42. Clustering ...[ Go to top ]

    I agree w/ him. I cluster, I just don't replicate.

    Right, and I think that is a perfectly valid approach for applications whose session state is non-existent or represented in the database, assuming that the cost model and scalable performance (i.e. storing it in the database) makes sense.
    I am just arguing here to make the point that I think clustering is overused and complexity can reduce reliability.

    I'd agree with the second half; I don't personally see many apps being clustered that don't require it. (Then again, you'd expect me to say that. ;-)
    In theory it's supposed to have no single point of failure, and I believed it. Then... in practice, the synchronization mechanisms would undo the reliability; we even had strange cases of different data depending on what box you were on.

    I've seen that with a few different clustering implementations, both "free" ones and very expensive ones. That's one of the reasons why we wrote Coherence.
    Cameron, sorry, it wore out my patience. I do believe that your product adds value to customers.

    My offer still stands ;-) .. and look us up if you're ever in Boston.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  43. Clustering ...[ Go to top ]

    .. and look us up if you're ever in Boston.

    I love New England. I'll ring you when I plan to go; 2 reasons to go now. (The other is Faneuil Hall, good food.)

    .V
  44. Clustering ...[ Go to top ]

    Clustering is a funny topic. Many people have a specific mental model of what they believe clustering is all about. The problem is that different people can have very different mental models, and there isn't any well-known reference to nail down a definition (or should I say, rather, definitions).

    Clustering most often means resource transparency at the "machine" level (for some definition of machine :-) ). The best example I can think of is my personal introduction to the concept via the VAX VMS cluster (oh crap, I've dated myself again!). In a VMS cluster many machines could appear to be one big one, and you could achieve very interesting scaling with little application development effort. How many "real" machines didn't matter, just that their collective resources were transparently available to you.

    Coherence and some other clustering technologies are somewhat like that. I'll try not to take the analogy too far, but they're about resource transparency. In many respects it can look like you have a giant JVM, when in fact there may be tens or hundreds of physical JVMs under the covers. Conceptually and at a high level I think it's clear that this is a very appealing model.

    But of course the devil is in the details, and I think this is partially where the conversation has gone astray. A common belief about clustering technologies is that you have concurrency and synchronization issues, and that they don't scale well beyond a few machines. And that the good old "shared nothing" approach is ideal.

    There are a lot of problems with those beliefs. First, the concept of "shared nothing" is bullshit. Such systems are almost never shared nothing - instead, the problem of sharing is merely pushed out from the application layer to the database layer. For many people this is adequate - but the reality that there is a shared database under it all can also torpedo this for truly high-performance systems. The simple fact is that databases don't scale well at all beyond a single machine, and getting good performance with higher and higher loads requires gigantic, expensive database machines (and even then you still might be screwed).

    To solve this, many people go with local caching schemes, and for them this is adequate. But local caching doesn't work if your data tends to be global in nature and not per-user. It doesn't work for the simple reason that you can't keep all your various "shared nothing" caches in sync with each other (they won't be). To many people, being out of sync for a second/minute/hour/day/<whatever> is acceptable.

    However, for some applications it is not.

    Now, as I mentioned many people are of the belief that clustering technologies do not scale. To an extent they are justified in this belief, because there's lots of awful clustering technologies out there. But good clustering software which does not take naive approaches really does scale very, very well, and often does far better than a database ever can.

    If you re-read Cameron's posts you see that he talks about partitioning. This is a common thread to many highly scalable clustering technologies. The basic idea is this: you take your problem set, whatever it may be, and you carve it up. You present to the user a cohesive, transparent view of resources, but under the covers you carve up those resources according to what's physically available. You then pin those resources to specific machines (with dynamic failover etc etc). What do you get? Whenever you are dealing with a given resource, you only talk to a small piece of the cluster. To get "X" does not mean involving the entire cluster. Likewise, to save "X" does not mean involving the entire cluster - instead you only deal with the small sliver of the cluster that deals with that piece of the pie.
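
    (The core trick is nothing more exotic than deterministically mapping each key to one owning member. A back-of-the-envelope sketch, ignoring the rebalancing, backup copies, and failover that real products add:)

    import java.util.List;

    // Pin each key to exactly one owner, so reading or writing "X" involves
    // only that member rather than the whole cluster. Real partitioned caches
    // also rebalance when membership changes and keep backup copies elsewhere.
    public class Partitioner {
        private final List members; // e.g. a list of member addresses like "host1:7001"

        public Partitioner(List members) {
            this.members = members;
        }

        public Object ownerOf(Object key) {
            int positiveHash = key.hashCode() & 0x7fffffff;
            return members.get(positiveHash % members.size());
        }
    }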

    Where people go wrong is that they see full-replication clustering systems and think that's the only option. They envision a system where an update to one value means 15 writes if there are 15 servers, 20 writes if there are 20, and they imagine the exponential explosion of network traffic (and process interrupts) if all 20 processes are chattering away at once. Partitioned clustering technologies don't work like that and don't have such data traffic explosions.

    Now imagine what this means in real life. You can go with a database running on a single machine. Then start spending money like water on SANs, tens of CPUs, tens of gigabytes of memory, etc. to keep up with growing user bases. You start doing database replicas (master/slave) to offload some of the burden. You start doing all sorts of interesting things to get around the fact that you have a single-machine bottleneck.

    With clustering software, you replace that single-machine bottleneck with many machines, each of which is responsible for a sliver of the overall work. Where you might need a 16-CPU box with 32GB of RAM on the database side, you might well end up getting better performance with 5 2-CPU boxes clustered with a good distributed clustering system. Some people may cry "global synchronization problems!" but they don't generally exist - that's why you partition in the first place, to avoid global synchronization issues. And as Cameron mentioned, you don't have to worry about database persistence issues and you avoid more bottlenecks there (e.g. no transaction log).

    Not everybody needs this, but when you do need it you often need it bad. And believe it or not, from an application developer's point of view such software is very easy to use. You let smart product people sweat bullets making the clustering software fast and reliable so dumb application programmers like me don't have to :-)
  45. Clustering ...[ Go to top ]

    To solve this, many people go with local caching schemes, and for them this is adequate. But local caching doesn't work if your data tends to be global in nature and not per-user. It doesn't work for the simple reason that you can't keep all your various "shared nothing" caches in sync with each other (they won't be). To many people, being out of sync for a second/minute/hour/day/<whatever> is acceptable.
    As I understand it, we are talking about J2EE clustering (a web frontend for a legacy system, or just a homepage-type "application").
    Stale "global" data is tolerated ("news", "products", "Bid/Ask", ...) in this case, and it does not make sense to use replication for this stuff (it just needs to be invalidated in some way, like a timed event). "Per-user" data are transactions, and I do not believe a cache can scale better than a database in this case; it is possible to cluster the database too if it makes sense. Probably a "distributed J2EE cache" is a very good technology, but it is very hard to find a use case for this stuff.
  46. Clustering ...[ Go to top ]

    "per-user" data are transactions and I do not believe cache can scale better than database in this case, It is possible to cluster database too if it make sence

    I agree with your definition of transactions as per-user data.
    Most transaction-processing applications, as you rightly mentioned, are based on database technology and stored procedures for handling transaction processing.
    This approach can in most cases handle hundreds to thousands of transactions per second per machine.
    There are several converging trends, especially in the financial industry, that drive organizations to look for alternatives to this approach.
    The move to electronic trading (as an example) is one of the causes of exponential growth in transaction volumes.
    In parallel, the requirement to achieve higher throughput and lower latency, driven by competition and regulatory rules, increases the challenge even more. The traditional approach of pouring more money into hardware becomes relatively impractical due to the cost and due to the fact that there is a limit on how well this type of application can utilize the CPUs of those machines. In addition, disk I/O remains a limiting factor in terms of throughput.
    So if everything worked as well with traditional database technology as you suggest, then we wouldn't be talking today about clustering and scale-out techniques. The fact that many organizations realize the limits of what they can achieve with this approach and are actively looking for alternatives is a strong indication that this is indeed the case.
    When looking into alternatives, it is important to understand the changes in the technology ecosystem that make things possible today that were impossible or impractical in the past. Those changes are the main enabling factor behind a new wave of alternative technologies (such as distributed caching and grids) that address the data distribution and processing challenges. Two of the main enabling factors are memory capacity and network speed; both have increased significantly over the past few years. The emergence of concepts such as SOA and grid computing, which are becoming more widely used to improve utilization and scalability, increased the awareness that perhaps the traditional approach is not the only one possible. Organizations started to see this not just as an alternative solution to their challenges but also as a potential competitive advantage over their competition.

    An interesting quote indicating this can be found in one of the recent publications from Sharon Reed (CTO for global markets trading technology, Merrill Lynch): "The major initiatives are building up our low-latency infrastructure, moving toward a service-oriented architecture (SOA) and leveraging grid computing."

    I agree with your observation that caching alone may solve only part of the problem and therefore can potentially become useless under such scenarios.
    Since you're not the only one who is skeptical about the ability of caching-based technology to deliver on the promise of scalability and performance under these conditions, we decided to take a proactive approach: we took a real production application from one of the biggest financial exchanges and ran it against our transaction-processing middleware (which is based on a JavaSpaces caching implementation at its core layer).
    We used our JavaSpaces cache product as the core underlying virtualization technology and provided, in addition, load balancing that distributes (partitions) the transactions between the available processing units. In each processing unit we used parallel-processing techniques to maximize utilization. With this combined capability we were able to demonstrate not just better throughput and utilization per machine but also truly linear scalability (within a machine and between machines) at a much lower cost (using commodity hardware).
    Unfortunately I cannot share this report with you; however, if you are interested in more information about this experience, feel free to contact me directly: natis@gigaspaces.com

    You can also refer to our solution page on our site for a reference on a similar architectural approach that we used for another financial application.
     

    Nati Shalom
    CTO GigaSpaces
  47. As I understand it, we are talking about J2EE clustering (a web frontend for a legacy system, or just a homepage-type "application").
    Stale "global" data is tolerated ("news", "products", "Bid/Ask", ...) in this case, and it does not make sense to use replication for this stuff (it just needs to be invalidated in some way, like a timed event).

    I see. You define the application to tolerate stale data and say voila - clustering is not needed!

    Now let's come back to reality. As I mentioned - some applications do not tolerate stale data very well. You cannot define them away because they exist (I know because I've written some of them :-) ). On such systems you also don't want to have massive re-load times in case a component in the system goes down. In these cases you really, really do want to use a clustering technology, and a database alone will not meet your needs.
    It is possible to cluster the database too if it makes sense.

    Have you done it? Have you tried? Clustering databases is very hard and the results are often... not what you would expect. Most of the time people end up not clustering their databases but instead partitioning them in some manner, either via replication or manual partitioning of their datasets.
    Probably "distributed j2ee cache" is a very good technology, but it is very hard to find a use case for this stuff.

    It's very easy to find such use cases, they are quite abundant. I think the problem is that you are assuming that all problems solved with Java and J2EE are a simple "web frontend for a legacy system or just a homepage". Sorry Juozas, but there are _much_ richer and more complex applications being written out there in Java and J2EE, and it is in these applications that clustering can be a great benefit.
  48. As I mentioned - some applications do not tolerate stale data very well. You cannot define them away because they exist (I know because I've written some of them :-) ). On such systems you also don't want to have massive re-load times in case a component in the system goes down.
    So like with stock trading, if the primary FIX server goes down, the hope is that the secondary could resume without having to renegotiate message sequence numbers with the counterparty. I.e., data as trivial as protocol message sequence numbers deserves replication.
    It is possible to cluster the database too if it makes sense.
    Have you done it? Have you tried? Clustering databases is very hard and the results are often...not what you would expect.
    I understand there are two underlying ways to physically replicate live data with commodity hardware and without introducing a single point of failure: a) in volatile form, in replica hosts' RAM (e.g., Tangosol) or b) durably, on a shared drive (either SAN or network-attached RAID). So is database replication done in volatile memory or durably?
  49. I do not cluster databases myself and I am not looking for "easy" solutions. Clustering is transparent to the application developer, isn't it? My job is just to make it possible, and I think the best way to do that is simply not to go looking for a problem to solve with a cache.
  50. So like with stock trading, if the primary FIX server goes down, the hope is that the secondary could resume without having to renegotiate message sequence numbers with the counterparty. I.e., data as trivial as protocol message sequence numbers deserves replication.

    There are many examples where clustering can help tremendously, the example you give sounds quite valid to me. In my case there's just a great deal of shared data among servers which really can't be out of date (given well known synchronization points) and where pulling from the DB all the time is way too expensive because there's too much data, and too much activity.
    I understand there are two underlying ways to physically replicate live data with commodity hardware and without introducing a single point of failure: a) in volatile form, in replica hosts' RAM (e.g., Tangosol) or b) durably, on a shared drive (either SAN or network-attached RAID). So is database replication done in volatile memory or durably?

    Most DB replication techniques I've seen are variations of having something listen to the database transaction log and replicate those changes out to another database. This is usually done durably. But this is a master/slave sort of relationship. Inserts have to happen in the master, you can't insert into the slave.

    True clustering of databases is really hard - that is, trying to get N instances of a database coherently talking to one another, and where all the instances are peers.
  51. Probably it is very hard and is not the "True clustering", but it looks like NASDAQ.com uses it without problems
    http://msdn.microsoft.com/SQL/sqlperf/default.aspx?pull=/library/en-us/dnsql2k/html/sql_mtappdcache.asp or is it just bullshit and trolling ?
  52. Probably it is very hard and is not the "True clustering", but it looks like NASDAQ.com uses it without problemshttp://msdn.microsoft.com/SQL/sqlperf/default.aspx?pull=/library/en-us/dnsql2k/html/sql_mtappdcache.asp or is it just bullshit and trolling ?
    From what I've read in that article, it describes the clustering of databases responsible for the NASDAQ web sites only, meaning mostly read-only data being cached, which simplifies the problem very much (the article even cites that it is less than 1 GB of data). It is not about clustering of their trading servers, which would be a completely different problem. For some kinds of problems, simple read-only database clustering can work, but for other, more specific ones with much more complex requirements it may not, as many people here have already pointed out. It is up to us to apply the best solution to the problem. In any case, DB clustering can be a maintenance nightmare. The person who works beside me is responsible for a system which has used a distributed clustered database since '97, and I have seen the hell she's been through all this time due to the maintenance nightmare this system generates every time an update must be done.

    Regards,
    Henrique Steckelberg
  53. Yes, I am talking about web sites and I do not know any j2ee trading systems (it does not mean they do not exist), is it not clear?
  54. Yes, I am talking about web sites and I do not know any j2ee trading systems (it does not mean they do not exist), is it not clear?
    A few from Google's first result page for ' "fix protocol" j2ee '

    Starpoint, FixMA, Nybot, ULBridge, FinancialFusion, Javelin
  55. FIX is a protocol, it is not a trading system
    http://www.fixprotocol.org/what-is-fix.shtml, I use JAVA to communicate with a trading system without problems (it is not FIX in my case). I am not sure it is possible to sell it, but I am sure it is possible to use j2ee to implement a trading system (a backend on a stock exchange) too.
  56. I am not sure it is possible to sell it, but I am sure it is possible to use j2ee to implement a trading system (a backend on a stock exchange) too.

    From what I've seen, it's not possible to sell anything *but* Java for new trading systems. We're working with at least half a dozen such projects right now. Even more humorous is that some of these "new" Java trading systems are replacing old trading systems written in .. Java.

    Most of the exchanges are not Java-based, but that's just a matter of time. Furthermore, none of these systems could be referred to as "monolithic" -- when you hear about "an exchange", it is probably 1000 different applications that make up "the exchange", and while a good number of the supporting apps are written in Java, most of the core engines of the big equities exchanges are not.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  57. Yes, "system" is many different applications and probably
     different programming languages without JAVA serialization support are used in the system. How client side cache integrates with legacy applications ? Do you relapace all applications (including not cache enabled JAVA applications) at the same time ?
  58. Yes, "system" is many different applications and probably different programming languages without JAVA serialization support are used in the system.

    Absolutely! Java serialization is _only_ for tightly bound systems, i.e. basically only for an application talking to itself.
    How does a client-side cache integrate with legacy applications? Do you replace all applications (including non-cache-enabled JAVA applications) at the same time?

    There's no single answer to that question. Here are a few of the things that I've seen:

    1) There are a number of services that these companies buy. For example, you've heard of "market data feeds", but executing trades is similar (except that the data is flowing in the opposite direction). Each of these services has its own proprietary wire formats (and a few use standardized formats, like the mentioned fixml), and some of these services even come with proprietary hardware. (One even has its own proprietary international network.) For these services, there are usually APIs provided by the vendor, or sometimes by a third party, so "integration" is done on an app-by-app basis. The big financial services firms insulate themselves somewhat by having an internal group "wrap" the vendor API, and in turn they provide their own API to the rest of the organization.

    2) The term "trading systems" means different things to different people. For example, there are a few companies that we are working with that refer to their desktop applications as "trading systems". (I personally refer to them as "trading clients", not "trading systems", and when I say "trading system", I am referring to the application that owns the order book transactions themselves.) Older trading clients range from Powerbuilder (no kidding!) to Smalltalk (quite popular!) and C++. Newer ones are mostly Java and .NET. (I've yet to see one done in HTML.) For Java, I've only seen Swing, but I know of one company that was investigating the use of SWT (they called it an "eclipse rich client", which I assume refers to some client library or framework). At any rate, these trading clients allow "traders" to monitor the market and submit trades .. basically, it's like what a day-trader would see, except that the professional trader is working on behalf of someone else. I've also seen trading clients for automated trade systems, but they are primarily for human monitoring of algorithmic trading (what I call the "just in case scenario). Anyway, the integration for the client to the trading system used to be done with things like IIOP, but we've also seen custom socket-based approaches, JMS-based integration (I know of three clients using JMS for this, one of which has a .NET rich client) and even one XML-based messaging approach. The thing about the client apps is that they display a lot of data, and they want to see changes to that data very quickly, but the actual number of transactions from the client is relatively low. In other words, the read/write ratio is extremely high, but the events have to be delivered in as close to real-time as possible. We built a JMS extension to our clusterd caching to allow the market data caches -- including the real time events -- to be accessible on these rich clients (over JMS, through firewalls, etc.)

    3) Trading systems themselves are basically a "hub" for processing incoming requests (e.g. from trading clients) to "do" trades. They are conceptually very simple (despite how "cool" and complex the term "trading system" sounds). In practice they are usually very complex, but most of that complexity comes from integration with other systems, and most of the integration is related to legal stuff. Basically, the trading system takes orders and makes sure that it doesn't ever lose any information about those orders, it checks each step against a series of complicated rules (compliance, which is often a *series* of separate applications), and then if conditions appear to be ripe, it issues requests to the actual trade execution system.

    4) A trade execution system is an application that interfaces to one (or more) of those third party services that I mentioned earlier. Basically, it tells some service "I have been told that XYZ costs $ZZ.ZZ and I want to buy NNN of them but only if you can fill PP% of the order at or below $YY.YY within a time window of T seconds." Sometimes the trade execution system is part of the trading system itself, but it's usually separate, and there's usually one for each exchange that an organization deals with. In fact, you can buy these "off the shelf" or semi-off-the-shelf (90% done and customizable). These systems can be nightmares to deal with, because you don't necessarily get instant confirmation of anything, and you don't want to accidentally issue the same order twice, etc. Most of the older execution systems that I've seen are proprietary socket-based messaging (that's the "API" facing the trading system, and it's built for speed, speed and speed) but I've been involved with a project in London and more recently one in NYC that are building their own custom execution engines, and they are all Java based and integrate via JMS (or similar).

    5) Compliance systems are built in to everything, but there are also separate compliance systems. Compliance just means "keeping your ass out of jail". There are internal compliance systems that are watching for insider trading patterns, because a couple years ago some of the big companies were doing preferential (or self) trades called "market timing", and they got sued and people went to jail. There are massive compliance systems that are basically just rules engines for making sure that various funds don't violate their own prospectuses (e.g. don't ever hold more than Y% of the fund in any one equity). There are complicated systems for making sure that leveraged trades are not going to bankrupt either the client or the firm trading on behalf of the client. You've got all sorts of rules, both internal and legal, and most of these companies are bound by laws from many countries. If you want to know where the real money is in trading systems, make compliance simpler and faster. Even a little bit simpler can save a company tens of millions of dollars, and a little bit faster can help them make more money (since some parts of compliance are pre-trade).

    Anyhow, take all this with a grain of salt, because I don't actually write any of these systems, I just get to see them and sometimes I even get to help architect specific parts of them. For example, we've helped with three different architectures this year to support trading systems that can survive data center failure (using WAN clustering), and we have a couple more that are doing POCs to prove the same.

    Regarding integration of these systems, there is no "one way" to do it. They integrate at the database (which a majority of the time turns out to be Sybase!) They integrate over JMS. They use ESBs. They have custom socket-based messaging. They use XML over HTTP. They use proprietary message buses. They use CORBA. They use DCOM and non-IIOP-RMI (no I'm not kidding -- I've seen both, although these are both frowned upon heavily now.) Basically, it's a little bit of everything, and most of this software gets replaced on about a three-to-seven year cycle, so there's _always_ some new application having to integrate with a slew of old applications.

    And to get back to the original point, they don't use clustering for integration. They use clustering to make _an application_ be resilient to server failure, and they use clustering to make _an application_ be more scalable. To give just one quick example, I mentioned the trading clients, which can number in the thousands, each of which is accessing thousands of individual pieces of data per second and receiving probably on the order of one hundred events per second. To support so many trading clients (and to handle server failures without interruption), we've seen the JMS cache extension that I mentioned used with the HA capabilities of clustered Weblogic JMS, so that they can scale out to handle a virtually unlimited number of clients, and never have a single-point-of-failure in the trading system environment (thanks to JMS failover). Of course, the trading client machine itself is a potential SPOF, but only from the point of view of the trader who is sitting there. ;-)
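
    As a rough illustration of that last point, here is a minimal sketch of a rich client subscribing to pushed market-data events over plain JMS. This is generic javax.jms usage with made-up destination names, not any particular vendor's cache-over-JMS extension:

        import javax.jms.*;
        import javax.naming.InitialContext;

        // Generic JMS subscriber sketch: the provider pushes price updates as they
        // happen, instead of the client polling a database or cache.
        // The JNDI names used here are hypothetical.
        public class MarketDataSubscriber {
            public static void main(String[] args) throws Exception {
                InitialContext ctx = new InitialContext();
                ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
                Topic topic = (Topic) ctx.lookup("jms/topic/MarketData");

                Connection conn = cf.createConnection();
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageConsumer consumer = session.createConsumer(topic);

                // Events arrive asynchronously; the client only reads, it rarely writes.
                consumer.setMessageListener(new MessageListener() {
                    public void onMessage(Message message) {
                        try {
                            MapMessage m = (MapMessage) message;
                            System.out.println(m.getString("symbol") + " -> " + m.getDouble("last"));
                        } catch (JMSException e) {
                            e.printStackTrace();
                        }
                    }
                });
                conn.start();
            }
        }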

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  59. It looks like "trading systems" means the same for us, I work with "trading system clients" only, I was forced to use Delphi (no kidding) to implement one of them. It is very good if JAVA replaces this stuff.

    Thanks, it was very informative.
  60. tiny bit more information[ Go to top ]

    It looks like "trading systems" means the same for us, I work with "trading system clients" only, I was forced to use Delphi (no kidding) to implement one of them. It is very good if JAVA replaces this stuff.Thanks, it was very informative.

    thought I'd contribute a bit more info on the compliance side of things. many of the mutual fund companies have multiple compliance systems. my experience is limited to 1940Act, 2A7 (which is a section of 1940Act), FSA and a few other government regs. Along with government regulations, institutional traders also have to comply with firm and customer compliance rules. Firm level rules are very similar to diversification rules and account level rules are mostly arbitrary. Handling these types of processes has to be fault tolerant, because they involve legal issues. At no point should a compliance report simply "disappear" due to bad design or malice. For those cases, fail-over is needed to avoid stiff fines from the SEC. But given compliance reports can be large, it's also good to replicate the data to improve scalability as a factor of transaction rates.

    I've probably run 200+ benchmarks on compliance systems and it's very easy to generate several hundred to thousands of rows of data for a single bulk order. Losing a few rows of data in a compliance report would be a bad thing :) There's a lot of ways to cluster this stuff, but at the end of the day, what matters is not losing data.

    peter
  61. Probably it is very hard and is not the "True clustering", but it looks like NASDAQ.com uses it without problemshttp://msdn.microsoft.com/SQL/sqlperf/default.aspx?pull=/library/en-us/dnsql2k/html/sql_mtappdcache.asp or is it just bullshit and trolling ?

    That's one way of caching the data for read operations, but there's another method that might be easier. Instead of having a middle tier for "read-only" data cache, one could simply partition the data and route the request using filtering at the router. Data changes can be pushed out to the front end webservers on a regular interval and it avoids the need to use COM+. This approach is pretty flexible and has proven it works well in many "real-world" cases.

    peter
  62. Probably it is very hard and is not the "True clustering", but it looks like NASDAQ.com uses it without problems

    I think that's an excellent example! There's a web site that spent millions of dollars on an overly-complicated database-as-caching infrastructure (just look at the picture!) when they could have achieved the same thing for a tiny fraction of the cost, and with a much shorter implementation time to boot.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  63. I think they know how to spend and to make money, but it is interesting to know about easy ways too.
  64. I think they know how to spend and to make money, but it is interesting to know about easy ways too.

    I didn't mean to imply their approach is bad or anything like that. Just that there are other ways to cluster and reduce complexity. Each approach has its weaknesses and strengths, so it really needs to be based on actual requirements.

    peter
  65. Clustering ...[ Go to top ]

    To solve this many people go with local caching schemes, and for them this is adequate. But local caching doesn't work if your data tends to be global in nature and not per-user. It doesn't work for the simple reason of keeping all your various "shared nothing" caches in sync with each other (they won't be). To many people being out of sync for a second/minute/hour/day/<whatever> is sufficient.

    As I understand it, we are talking about J2EE clustering (a web frontend for a legacy system or just a homepage-type "application"). Stale "global" data is tolerated ("news", "products", "Bid/Ask", ...) in this case and it does not make sense to use replication for this stuff (it just needs to be invalidated in some way, like a timed event). "per-user" data are transactions, and I do not believe a cache can scale better than a database in this case. It is possible to cluster the database too if it makes sense. Probably a "distributed j2ee cache" is a very good technology, but it is very hard to find a use case for this stuff.

    I'm not sure I understand the thinking. For something like news, product descriptions or other read-only data, no clustering is needed. For a bid/ask scenario, it all depends. If you're talking about something like eBay, then sure, you can do it every few minutes.

    For things like mobile applications, real-time trading systems or command control systems, clustering is a real need. There are tons of scenarios that simply won't work without clustering and replication. Go ask charles schwab, putnam, wellington, citibank, fidelity, UBS, bank of america, and state street bank whether they need clustering, fault tolerance and data replication.

    peter
  66. Clustering ...[ Go to top ]

    If j2ee is used to implement real-time trading systems then probably it needs some kind of data replication (I am not sure cache replication can solve the problem). Real-time trading systems were implemented many years ago and I see j2ee used to implement web frontends only. A web frontend can be clustered and made fault tolerant without hashmap replication; it is not a real-time system and that simplifies the problem a lot.
  67. Clustering ...[ Go to top ]

    If j2ee is used to implement real-time trading systems then probably it needs some kind of data replication (I am not sure cache replication can solve the problem).

    It sounds a bit like you're guessing here. There's no need to guess. Check out magazines/web sites like "Wall Street & Technology" and the like, and you'll see what Java is used for in the financial services world. I've also recently interviewed a bunch of people from various wall street style firms (banks, brokerages, etc) and all of the big names are using Java in critical new development applications. Not just web sites but very complex systems for order management, ticket matching, risk computation and analytics, trading/blotter apps, etc etc etc.

    In those types of apps caches often play a very important role, and clustering plays another closely related one.
    Real-time trading systems were implemented many years ago and I see j2ee used to implement web frontends only.

    Now it sounds like you're just trolling, but I'll give you the benefit of the doubt and answer it straight.

    In financial markets at least, things don't sit still. Huge amounts of money are on the line and with such gigantic motivators (trillions of dollars in aggregate) people become very, very innovative in finding ways to get that money. And IT systems are one of the biggest ways they do this. Electronic trading, real time risk exposure numbers, and the ability to consistently process more and more trades every year directly helps firms make more money.

    This world isn't static, it's always changing. Every year new security and financial products are invented, new risk measures are used, new trading models, and always more volume. In such an environment you can't just use a trading system that was "implemented many years ago". The systems have to be constantly upgraded to move with the business needs, and in regular cycles apps just become too top-heavy and baroque and rewrites happen as the cycles reach their beginning (very roughly it seems to be about 3-7 years, depending on the firm/group/business environment/etc).

    And the technology changes too, hardware and software. While the very rough view from 10,000 feet designs of trading systems may not have changed too much in the past 10 years, the implementation details of trading systems have changed dramatically. In '95 people still talked about things like Powerbuilder, and some systems were still using things like curses/VT102 and 3270 emulators talking to mainframes. People were still iffy about C++ and there was a ton of legacy C code around. Today the legacy systems tend to be C++, with C a relative rarity. There are still some vt and 3270 apps floating here and there but they're getting rarer and rarer. People are transitioning off X11/Motif style GUI fat clients into web clients, or into Swing or C# fat clients. And they're finding that Java in the middle tiers is just as fast as C++ and has real implementation advantages.

    Why do you think the big app server vendors still have customers and make tons of money off of them? Why do you think technologies like JMS are so popular? Why do people buy a partitioned/distributed hash map? Why do all the major market data systems have Java interfaces (or even all-Java back-to-back implementations)?

    The answer is simple: because Java is used for a lot more than simple web sites and LAMP-style applications. That stuff has its place but the IT world is a great deal bigger than that. For the past 3 years I myself have been doing exclusively Java server side stuff (well, with a brief foray into C#, but that's another story...). And for those 3 years none of it has involved a web server or HTML (though it is J2EE). And yes, it all required some form of application tier clustering. And my situation is hardly unique, at least not on wall street.
  68. Clustering ...[ Go to top ]

    No, I am not trolling, I just do not have information about it.
  69. Clustering ...[ Go to top ]

    People were still iffy about C++ and there was a ton of legacy C code around. Today the legacy systems tend to be C++, with C a relative rarity.

    I agree that Java is rapidly becoming the predominant language of choice for financial institutions. But that doesn't imply 'C' or 'C++' development is taking a back seat.
    I have recently visited some of the largest investment banking houses where C++ and sometimes 'C' is still being used for next-generation applications. Not just for the next direct market data access or acquisition systems - yes, even order management systems.

    I believe the perception around Java vs C++ speed is, to some extent, a little tough to change. With development teams steeped in years of C++ development it is a cultural shift. When you really sit down and talk to some of these savvy developers, you realize that it isn't so much about Java matching C++ on speed, but about the unavoidable GC pauses in Java environments.
    Sure, 1.5, with policies for configuring the maximum permitted pause in a VM, goes a long way in easing some of the concern. But then, the VM provides no guarantee, and in real-life applications it comes down to how the app is designed - the use of long-lived objects, abundance of heap for a pauseless collector, etc.

    Take this case - one of the largest NY-based investment banks wants to redo their OMS such that the round trip - from order acquisition through a FIX gateway, all the way through matching, compliance, exchange routing and confirmation back to the customer, in all about 5 hops across applications - is completed in no more than 25 ms. No exceptions. It always works - it handles market open/close spikes.
    They chose to do this using primarily C++ (and interfacing to Java systems).
    And they're finding that Java in the middle tiers is just as fast as C++ and has real implementation advantages.Why do you think the big app server vendors still have customers and make tons of money off of them? Why do you think technologies like JMS are so popular? Why do people buy a partitioned/distributed hash map? Why do all the major market data systems have Java interfaces (or even all-Java back-to-back implementations)?The answer is simple: because Java is used for alot more than simple web sites and LAMP-style applications. That stuff has its place but the IT world is a great deal bigger than that.

    Wouldn't you say there are quite a few outfits that are considering new "direct market access" handlers, besides the use of providers (Reuters, Bloomberg, etc.), that will primarily be based on C or C++?
    On data management, the most talked about topic at 2005 SIA was "design for low latency".

    But, overall, your points are well made. Java has come a long way and with the endless choices available, it is a lot cheaper and easier.

    Cheers!
    Jags Ramnarayan
    GemStone Systems (http://www.gemstone.com)
  70. Clustering ...[ Go to top ]

    I'd like to point out one error in this otherwise excellent article on clustering. For in-memory replication of servlet session state, the article states that the load balancer must remember which server hosts the secondary if the primary fails, adding complexity to the load balancer. This is not correct. In WebLogic, the load balancer may route traffic to any node in the cluster. This node becomes the new primary, and the session state is copied from the secondary. The session id contains enough information for the new primary to locate the secondary which has a copy of the state. It's really quite clever.
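
    To make the idea concrete, here is a minimal sketch of a session identifier that carries enough routing information for any node to locate the replica. The "!"-separated format and the class below are purely illustrative assumptions, not WebLogic's actual encoding:

        // Illustrative only: a session id that embeds the primary and secondary
        // server ids, so that whichever node receives the request can locate the
        // replica. The format is a made-up example, not any vendor's encoding.
        public final class ClusterSessionId {
            private final String sessionKey;
            private final String primaryServerId;
            private final String secondaryServerId;

            private ClusterSessionId(String sessionKey, String primary, String secondary) {
                this.sessionKey = sessionKey;
                this.primaryServerId = primary;
                this.secondaryServerId = secondary;
            }

            static ClusterSessionId parse(String cookieValue) {
                String[] parts = cookieValue.split("!");   // e.g. "abc123!server7!server3"
                return new ClusterSessionId(parts[0], parts[1], parts[2]);
            }

            String key() {
                return sessionKey;
            }

            /** Where a newly elected primary should copy the session state from. */
            String replicaToCopyFrom(String myServerId) {
                return myServerId.equals(primaryServerId) ? secondaryServerId : primaryServerId;
            }
        }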

    Don Ferguson
    VP Engineering
    Terracotta, Inc --- Naturally Clustered Java
  71. JDBC Clustering ...[ Go to top ]

    Don,

    How does your product differ from the TimesTen DB product in relation to JDBC clustering?

    Thanks
    T.N.
  72. JDBC Clustering ...[ Go to top ]

    I believe TimesTen mirrors a subset of the rows of an oracle database in memory, and provides a JDBC driver that can operate on those rows. Our JDBC product caches result sets rather than rows, such that repeated queries are served via the cached results. This provides high performance, since there's no relational math to compute the queries. We furthermore provide a second-level disk-based cache that is hosted in a separate server (aka the L2). This allows us to maintain arbitrarily large amounts of cached query data, without filling up the java heap of your application.

    Unlike some caches, we also recognize when cache entries become invalid through deep integration with Oracle.
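
    As a rough sketch of the difference: caching at the result-set level means the cache key is the query plus its bind values, and invalidation drops whole cached results rather than updating individual rows. The class below is a simplified assumption of how such a cache could look, not the actual Terracotta or TimesTen implementation:

        import java.util.List;
        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Sketch of result-set caching keyed by (SQL text + bind values), as opposed
        // to mirroring individual rows. Invalidation here is explicit and coarse;
        // real products track which cached queries a given table change affects.
        public class QueryResultCache {
            private final Map<String, List<Object[]>> cache = new ConcurrentHashMap<>();

            public List<Object[]> get(String sql, Object... binds) {
                return cache.get(key(sql, binds));
            }

            public void put(String sql, List<Object[]> rows, Object... binds) {
                cache.put(key(sql, binds), rows);
            }

            /** Drop every cached result whose query mentions the changed table. */
            public void invalidateTable(String tableName) {
                cache.keySet().removeIf(k -> k.toLowerCase().contains(tableName.toLowerCase()));
            }

            private String key(String sql, Object... binds) {
                return sql + "|" + java.util.Arrays.deepToString(binds);
            }
        }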
  73. more than one error[ Go to top ]

    I'd like to point out one error in this otherwise excellent article on clustering. For in-memory replication of servlet session state, the article states that the load balancer must remember which server hosts the secondary if the primary fails, adding complexity to the load balancer. This is not correct.
    It’s indeed an excellent article – covering so many products and areas is really remarkable – but unfortunately there is more than one error. On JNDI, WebLogic actually has both a global and a local tree
    http://e-docs.bea.com/wls/docs81/jndi/jndi.html#475689
     so 1. you can intentionally put an object only into the local tree; 2. custom objects are forced into local trees if they are modified by a node other than the one that originally inserted them, or if they are placed under the same name by different nodes before the nodes were able to see each other. Hence the statement that the global tree allows storing and retrieving custom objects is, in this case, of limited applicability.
    WebSphere, on the other hand, also provides local, cell and global trees, so you can explicitly access a particular node's local tree or the global one.
    http://publib.boulder.ibm.com/infocenter/wsdoc400/index.jsp?topic=/com.ibm.websphere.iseries.doc/info/ae/ae/cnam_name_space_partitions.html
    As for the initial context, a list of URLs is accepted not only by JES/JBoss but by all 4 servers
    http://e-docs.bea.com/wls/docs81/jndi/jndi.html#475689
    http://publib.boulder.ibm.com/infocenter/wasinfo/v5r1//index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/rnam_example_prop2.html
  74. On idempotent operations[ Go to top ]

    Idempotent operations (functions) are:

    If f maps X into Y, and Y is a subset of X, then f is idempotent if, for all x in X, f(f(x)) = f(x).

    (see: http://en.wikipedia.org/wiki/Idempotent )

    So, basically, f is idempotent if f(f(x)) = f(x); f(x) = x is idempotent, but f(x) = x*x is not!
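
    A quick numeric illustration of the definition (assuming nothing beyond the math above):

        // Applying an idempotent function twice gives the same result as applying it once.
        public class IdempotencyExample {
            public static void main(String[] args) {
                double x = -3.0;

                // Math.abs is idempotent: abs(abs(x)) == abs(x)
                System.out.println(Math.abs(Math.abs(x)) == Math.abs(x));   // true

                // f(x) = x * x is not: (x*x)*(x*x) != x*x in general
                double once = x * x;                 // 9.0
                double twice = once * once;          // 81.0
                System.out.println(twice == once);   // false
            }
        }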

    Artur
  75. On idempotent operations[ Go to top ]

    For higher level operations, such as the article describes, idempotency means that the load balancer (or the end user) can issue the same request multiple times, and each time the result will be the same and there will be no additional side-effects of re-submission.

    All applications with end users should be based on an architecture that incorporates these concepts. It is also very useful for web services, because idempotency can provide transactional-like guarantees in environments that cannot use or provide transactions.
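
    One common way to get that property at the application level is to have the caller supply a unique request id, remember the result of each request, and replay the stored result on re-submission instead of executing the side effect again. The sketch below assumes exactly that; the class and method names are hypothetical:

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Sketch of making a non-idempotent operation safe to re-submit: for a
        // duplicate request id the server replays the stored result instead of
        // executing the side effect a second time. All names are hypothetical.
        public class IdempotentOrderService {
            private final Map<String, String> completed = new ConcurrentHashMap<>();

            public String placeOrder(String requestId, String orderDetails) {
                // computeIfAbsent guarantees the side effect runs at most once per id.
                return completed.computeIfAbsent(requestId, id -> executeTrade(orderDetails));
            }

            private String executeTrade(String orderDetails) {
                // ... the real, non-idempotent work (book the order, charge the account)
                return "confirmation-for-" + orderDetails;
            }
        }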

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  76. On idempotent operations[ Go to top ]

    Did anyone else notice that the example of a non-idempotent operation (file deletion) is actually idempotent?

    Don Ferguson
    VP Engineering
    Terracotta, Inc
    Naturally Clustered Java
  77. On idempotent operations[ Go to top ]

    Indeed.
    Did anyone else notice that the example of a non-idempotent operation (file deletion) is actually idempotent?Don FergusonVP EngineeringTerracotta, IncNaturally Clustered Java
  78. On idempotent operations[ Go to top ]

    Did anyone else notice that the example of a non-idempotent operation (file deletion) is actually idempotent?Don FergusonVP EngineeringTerracotta, IncNaturally Clustered Java

    I think it depends on whether your delete operation fails when the resource to be deleted isn't present. If it does, it's not idempotent.
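
    The two flavors of file deletion in the JDK itself illustrate the distinction:

        import java.io.File;
        import java.nio.file.*;

        public class DeleteIdempotency {
            public static void main(String[] args) throws Exception {
                Path path = Paths.get("orders.tmp");
                Files.createFile(path);

                // Non-idempotent flavor (if the caller treats the exception as failure):
                // the second call throws because the file is already gone.
                Files.delete(path);            // succeeds
                try {
                    Files.delete(path);        // throws NoSuchFileException
                } catch (NoSuchFileException e) {
                    System.out.println("second delete failed: " + e.getFile());
                }

                // Idempotent flavor: repeating the call changes nothing and never fails,
                // it just reports whether a file was actually removed this time.
                File f = new File("orders.tmp");
                System.out.println(f.delete());    // false - nothing left to delete, no error
                System.out.println(f.delete());    // still false, still no error
            }
        }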
  79. Another...utopia?[ Go to top ]

    Well, I think it's another thing that is like a utopia. Things and concepts that only exist in the words of architects, especially when they are talking at conferences.

    All seems to be so perfect, so...easy!

    I really believe that the real problem is to make it work in the real world. About 2 years ago I was working on a project using:
    struts+ejb+JBoss

    At the end of the development stage, we started trying to make our application clustered. I would like to tell you here that JBoss is a powerful (maybe the most powerful) application server, and we didn't have much knowledge about it.

    But we were not so stupid... the team had 12 developers, of which 8 had the Java Programmer Certification (including me), 5 had the Web Developer Certification, and 1 had the Business Components Development Certification (or something like this).

    We bought books, we tried very hard... but in the end... the damn application clustering didn't work.

    Today I think this "perfect world" of Software Development that they try to show us is just a dream, a nice story to hear, but impossible to implement.
  80. Another...utopia?[ Go to top ]

    I don't see how the issues you had with JBoss clustering imply that J2EE/JEE clustering is invalid. I have had great success building clustered applications using products like WebLogic. I don't know if any open source product can deliver this level of quality; it would be interesting to find out.
  81. it's not impossible[ Go to top ]

    Well, I think it's another thing that is like an utopy. Things and concepts that only exists at words of Architects, specially when they are talking at conferences.All seems to be so perfect, so...easy!I really believe that the real problem is to make it works at real world. About 2 years ago I was working on a project using:struts+ejb+JBossAt the end of Development stage, we started to try to make our application clustering. I would like to tell you here that JBOSS is a powerfull (maybe the most) application server, and we hasn't much knowledge about it.But we were not so stupid...the team had 12 developers, which 8 had Java Programmer Certification (including me), 5 had Web Developer Certification, and 1 had Business Components Development Certification (or something like this).We bought books, we have tried very hard...but at last...the damn application clustering didn't work.Today I think this "perfect world" of Software Development that they try to show us, is just a dream, a nice history to hear, but impossible to be implemented.

    I've used a couple of different flavors of clustering and they did work. Though not all of them are Java or J2EE. I've used weblogic, resonate, veritas and some custom clustering written in C++ for Netscape. There's a couple more I can't remember off the top of my head. The point is, clustering definitely works and it isn't impossible. It is definitely hard, but far from impossible.

    clustering is one area where OSS has a big handicap. the only way to make sure the clustering works on a wide variety of network configurations is to build a nice lab, but OSS typically doesn't have that luxury. out of curiosity, did you ever call JBoss and get a JBoss consultant out to get clustering to work?

    peter
  82. Clustering Open Source Software...[ Go to top ]

    <peter>
    clustering is one area where OSS has a big handicap. the only way to make sure the clustering works on a wide variety of network configurations is to build a nice lab, but OSS typically doesn't have that luxury. out of curiosity, did you ever call JBoss and get a JBoss consultant out to get clustering to work?
    </peter>

    surely OSS J2EE clustering/load balancing capabilities are not that mature compared with commercial counterparts. But they still work quite well. We have used Enhydra for 5 years and it has Enhydra Director (for Apache 1.x) and Enhydra Conductor (for Apache 2.x) for load balancing of web containers, which is very easy to install (per InstallShield) and to manage. Also, in Enhydra Enterprise you can use SSB load balancing, which is based on the JOnAS J2EE app server. I think this would be the same in JBoss...

    Cheers,
    Lofi.
  83. Right there are good solutions[ Go to top ]

    clustering is one area where OSS has a big handicap. the only way to make sure the clustering works on a wide variety of network configurations is to build a nice lab, but OSS typically doesn't have that luxury. out of curiousity, did you ever call JBoss and get a JBoss consultant out to get clustering to work?

    surely OSS J2EE clustering/load balancing capabilites are not that mature in comparable with commercial counterparts. But they still work quite good. We use Enhydra since 5 years and it has Enhydra Director (for Apache 1.x) and Enhydra Conductor (for Apache 2.x) for load balancing of web containers which is very easy to install (per InstallShield) and to manage. Also in Enhydra Enteprise you can use SSB load balancing which is based on JOnAS J2EE app server. I think this would be the same in JBoss...Cheers,Lofi.

    I didn't mean to give the impression that OSS clustering is bad. Tomcat has session clustering and it works. I haven't used Enhydra clustering, so I can't say. My point is that OSS clustering typically hasn't been tested on a wide variety of network configurations and may not be as mature as commercial clustering products. If we look at OSS in general, Beowulf clusters work well and many people have been successful using them.

    clustering can be an effective tool, but it does require experience and patience to get the setup right.

    peter
  84. Clustering Tomcat[ Go to top ]

    I didn't mean to give the impression that OSS clustering is bad. Tomcat has session clustering and it works.

    Yes, though it's fully-replicated state sharing, so as others have pointed out it doesn't scale beyond a few nodes.
    For the record, I have written a more scalable implementation, which does buddy replication and session transfer (for the case when a client fails over to a non-secondary node), along the lines of some commercial products. I offered it to the Tomcat dev team but they passed, apparently satisfied with their current impl. Anyone interested in trying it can download from http://sourceforge.net/projects/tomcat-jg

    Another note: the article lumps JBoss with Weblogic in the scalable single-replication camp, but I believe this is incorrect. Even the provided JBoss clustering article describes Tomcat's own clustering feature.
  85. Clustering Tomcat[ Go to top ]

    I didn't mean to give the impression that OSS clustering is bad. Tomcat has session clustering and it works.

    Yes, though it's fully-replicated state sharing, so as others have pointed out it doesn't scale beyond a few nodes. For the record, I have written a more scalable implementation, which does buddy replication and session transfer (in case when client fails over to non-secondary node), along the lines of some commercial products. I offered it to the Tomcat dev team but they passed, apparently satisfied with their current impl. Anyone interested in trying it can download from http://sourceforge.net/projects/tomcat-jgAnother note: the article lumps JBoss with Weblogic in the scalable single-replication camp, but I believe this is incorrect. Even the provided JBoss clustering article describes Tomcat's own clustering feature.

    My knowledge of clustering is limited, so I could be totally wrong. Beyond pairs, fully replicated clustering quickly runs into performance problems. I've read over the current code a few times, so that statement is true as far as I know.

    But to put things in perspective, most people don't need more than pair setup. Going beyond that usually means the fault tolerance requirement is very high, so tomcat's clustering wouldn't be appropriate for a few reasons.

    1. it's likely the clustering has to fail over to another hosting facility
    2. the state replication would have to go over WAN
    3. the setup would be a large cluster, with a variety of servers and systems

    As much as I like tomcat, in that case I'd get a license of coherence, gigaspaces or some other commercial product that is proven. Tomcat's clustering is a great entry point for applications that have low fail over requirements.

    peter
  86. Clustering Tomcat[ Go to top ]

    My knowledge of clustering is limited, so I could be totally wrong. Beyond pairs, fully replicated clustering quickly runs into performance problems.

    Very true. Even with only two servers, full peer-based replication is far from optimal.

    The reason has to do with responsibility. The concept behind full replication is shared responsibility, which implies an overhead for decision-making. (This is the reason why Coherence uses an "issuer" assigned with entry granularity for replication purposes.)
    As much as I like tomcat, in that case I'd get a license of coherence ..

    Coherence does support clustered session management for Tomcat :-)

    Our session management is designed for horizontal scale-out, and linearly scales to 100+ servers. This is proven by actual production applications.

    The article I linked to previously explains how we do it:

    http://dev2dev.bea.com/pub/a/2005/05/session_management.html

    It shows how the linear scale can be accomplished, why there are no single points of failure, how the cluster automatically reacts to server failure (re-balancing and automatically eliminating the potential for single points of failure), how it deals with sessions that fail to serialize without losing data, how it automatically detects changes on the attribute level, etc.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  87. Clustering Tomcat[ Go to top ]

    Yes, though it's fully-replicated state sharing, so as others have pointed out it doesn't scale beyond a few nodes. For the record, I have written a more scalable implementation, which does buddy replication and session transfer (in case when client fails over to non-secondary node), along the lines of some commercial products. I offered it to the Tomcat dev team but they passed, apparently satisfied with their current impl. Anyone interested in trying it can download from http://sourceforge.net/projects/tomcat-jg.
    But to put things in perspective, most people don't need more than pair setup. Going beyond that usually means the fault tolerance requirement is very high, so tomcat's clustering wouldn't be appropriate for a few reasons.1. it's likely the clustering has to fail over to another hosting facility2. the state replication would have to go over WAN3. the setup would be a large cluster, with a variety of servers and systems
    Clustering is used for both availability and scalability. For availability, a single backup node is usually sufficient (if not, the state should arguably be stored in the DB). But you can easily need to scale out beyond two nodes, and once you do, you still want each unit of session data replicated only to one backup node. That's the value-add of the tomcat-jcluster plugin, compared to native Tomcat full-replication.
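
    For what it's worth, the buddy idea boils down to each node replicating the sessions it owns to exactly one chosen peer, so the per-node replication cost stays roughly constant as the cluster grows. Below is a minimal sketch of one deterministic way to pick the buddy; it is only an illustration of the scheme, not the plugin's actual code:

        import java.util.List;

        // Sketch of buddy replication: each node backs up its sessions to exactly
        // one other node instead of broadcasting to all peers. Illustration only.
        public class BuddySelector {
            private final List<String> members;  // current cluster view, same order on every node
            private final String self;           // this node's id

            public BuddySelector(List<String> members, String self) {
                this.members = members;
                this.self = self;
            }

            /** The single peer that receives backup copies of this node's sessions. */
            public String buddy() {
                int me = members.indexOf(self);
                return members.get((me + 1) % members.size());
            }

            /** On failover, any node can ask the failed primary's buddy for the state. */
            public String buddyOf(String failedNode) {
                int idx = members.indexOf(failedNode);
                return members.get((idx + 1) % members.size());
            }
        }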

    Rob
  88. Clustering Tomcat[ Go to top ]

    Yes, though it's fully-replicated state sharing, so as others have pointed out it doesn't scale beyond a few nodes. For the record, I have written a more scalable implementation, which does buddy replication and session transfer (in case when client fails over to non-secondary node), along the lines of some commercial products. I offered it to the Tomcat dev team but they passed, apparently satisfied with their current impl. Anyone interested in trying it can download from http://sourceforge.net/projects/tomcat-jg.
    But to put things in perspective, most people don't need more than pair setup. Going beyond that usually means the fault tolerance requirement is very high, so tomcat's clustering wouldn't be appropriate for a few reasons.1. it's likely the clustering has to fail over to another hosting facility2. the state replication would have to go over WAN3. the setup would be a large cluster, with a variety of servers and systems

    Clustering is used for both availability and scalability. For availability, a single backup node is usually sufficient (if not, the state should arguably be stored in the DB). But you can easily need to scale out beyond two nodes, and once you do, you still want each unit of session data replicated only to one backup node. That's the value-add of the tomcat-jcluster plugin, compared to native Tomcat full-replication.Rob

    looks interesting. I took a quick look. I didn't realize it's based on Filip's original clustering implementation using JavaGroups. I could be wrong, but Filip is one of the active developers on the current clustering implementation. I'm sure there are cases where both approaches are appropriate.

    peter
  89. it's not impossible[ Go to top ]

    I agree with Peter; to realize a clusterable application you need a good design and deep knowledge of the product. I've used WebSphere, JBoss + Tomcat, and WebLogic with success.
    Believe me, it is not impossible!
  90. it's not impossible[ Go to top ]

    Believe me, it is not impossible!

    Anything is possible given a certain amount of time and resources. A better question would be whether it's worth it.

    The answer, of course: it depends.

    --
    Igor Zavialov
    Factoreal Financial Data and Technical Analysis solutions.
  91. it's not impossible[ Go to top ]

    Well, I think it's another thing that is like an utopy. Things and concepts that only exists at words of Architects, specially when they are talking at conferences.All seems to be so perfect, so...easy!I really believe that the real problem is to make it works at real world. About 2 years ago I was working on a project using:struts+ejb+JBossAt the end of Development stage, we started to try to make our application clustering. I would like to tell you here that JBOSS is a powerfull (maybe the most) application server, and we hasn't much knowledge about it.But we were not so stupid...the team had 12 developers, which 8 had Java Programmer Certification (including me), 5 had Web Developer Certification, and 1 had Business Components Development Certification (or something like this).We bought books, we have tried very hard...but at last...the damn application clustering didn't work.Today I think this "perfect world" of Software Development that they try to show us, is just a dream, a nice history to hear, but impossible to be implemented.
    I've used a couple different flavor of clustering and they did work. Though not all of them are Java or J2EE. I've used weblogic, resonate, veritas and some custom clustering written in C++ for Netscape. There's a couple more I can't remember off the top of my head. The point is, clustering definitely works and it isn't impossible. It is definitely hard, but far from impossible.clustering is one area where OSS has a big handicap. the only way to make sure the clustering works on a wide variety of network configurations is to build a nice lab, but OSS typically doesn't have that luxury. out of curiousity, did you ever call JBoss and get a JBoss consultant out to get clustering to work?peter


    Emic Networks provides clustering for JBoss.
    http://www.emicnetworks.com/index.php?option=com_content&task=view&id=26&Itemid=58
  92. it's not impossible[ Go to top ]

    We (GigaSpaces) have been involved in making clusters work for quite some time now and have been pioneers in the P2P clustering architecture based on shared memory virtualization.
    Recently I have come across many applications that are looking to achieve linear scalability in a grid fashion, i.e. scale out dynamically over commodity HW.

    As suggested in this article, it is fairly easy to distribute the business logic using a load balancer; however, in the end the business logic depends on the middleware stack, i.e. the database or the JMS, which becomes the sticky point.
    Needless to say, the middleware becomes the bottleneck, and therefore with such an approach linear scalability remains a nice dream.

    Taking that background into account, we realized that without solving the DB and messaging bottleneck there would be no way we could achieve the desired linear scalability. In 2004 we came out with the concept of virtual middleware that leverages shared-memory cluster (known as Distributed Cache) technology as a means to virtualize the messaging and data layers. What that means is that the queue can span multiple machines and the data can span multiple machines dynamically. The next step was to eliminate the N-tier approach (that is a topic for another conversation), which we found to be one of the main reasons for clustering complexity.
    Instead of the N-tier approach we broke the tiers into services that can be collocated in each processing box. The transactions between the processing boxes were partitioned based on the correlation ID. To achieve the DYNAMIC scalability we monitored the backlog (the amount of pending transactions).
    When the backlog reached a certain threshold we simply launched more processing instances, and vice versa, using a new layer of service grid that we recently added to our product.
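
    A rough sketch of the two ideas just described - routing each transaction to a partition by hashing its correlation ID, and watching the pending backlog to decide when more processing units are needed. The types and thresholds here are assumptions for illustration, not the GigaSpaces API:

        import java.util.List;
        import java.util.concurrent.BlockingQueue;

        // Sketch: (1) partition work by correlation id so related transactions always
        // land on the same processing unit, and (2) watch the pending backlog to
        // decide when to add capacity. Transaction and the queues are hypothetical.
        public class CorrelationRouter {
            private final List<BlockingQueue<Transaction>> partitions;
            private final int backlogThreshold;

            public CorrelationRouter(List<BlockingQueue<Transaction>> partitions, int backlogThreshold) {
                this.partitions = partitions;
                this.backlogThreshold = backlogThreshold;
            }

            public void route(Transaction tx) throws InterruptedException {
                int p = Math.abs(tx.correlationId().hashCode() % partitions.size());
                partitions.get(p).put(tx);
            }

            /** True when the pending backlog suggests another processing unit is needed. */
            public boolean needsMoreCapacity() {
                int backlog = partitions.stream().mapToInt(BlockingQueue::size).sum();
                return backlog > backlogThreshold;
            }

            public interface Transaction {
                String correlationId();
            }
        }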

    You can find more information about this approach on the GigaSpaces web site.
    An interesting case study on how this approach works for financial applications can be found in our solutions section: http://www.gigaspaces.com/solutions.html

    Nati Shalom
    CTO GigaSpaces
  93. Shared Memory or Java Shared Memory[ Go to top ]

    In 2004 we came out with the concept of virtual middleware that leverages shared-memory cluster (known as Distributed Cache) technology as a mean to virtualize the messaging and data layer. What that means is that the queue can span across multiple machines and the data can span across multiple machines dynamically.

    Is it really "shared memory"? Shared memory to me means the ability to virtualize the heapstack and allow a simple API to access the heap stack and "Put/Get" any kind of object on it without having to muck around with pointers and keep track of who's got it. I don't think anything can do that without some some very OS and Java specific byte and binary code enhancement. C++ and Java can't live in the same environment, natively, and unfortunately, that's what Wall St. has...a lot C++.

    JavaSpaces and caches can't do it, but you can make some nice workarounds for it if it's important. The question is, is all that patching worth it? I would rather use a monolithic 64-bit system and cluster two or three of those. Forget 32-bit and the use of COTS. 64-bit AMD machines are just as cheap as their 32-bit counterparts. In the end, what has become evident to me is that all of these workarounds existed to get around 32-bit limitations, because the only scale-out solution used to be buying a 64-bit SMP Solaris machine (too expensive). Fast forward to today and that story becomes a little murky. A good solution would have shared-memory interoperability between Java and C++, and today it looks like vendors will have to build a flavor for each one.

    Check out GemFire Enterprise...they have a native C++ and Java implementation that has pure interoperability.
  94. Another...utopia?[ Go to top ]

    We bought books, we have tried very hard...but at last...the damn application clustering didn't work.

    Why not ?

    When I was a university student I even wrote a load balancing algorithm for EJBs. I used JBoss to implement it, because I had to make some changes in the app server's code.

    http://maris.site.lv/loadbalancing/
  95. Another...utopia?[ Go to top ]

    We bought books, we have tried very hard...but at last...the damn application clustering didn't work.
    Why not ?When I was a university student I even wrote a load balancing algorithm for EJB's. I used JBoss to implement it, because I had to do some changes in app. server's code.http://maris.site.lv/loadbalancing/
    Probably he is talking about state replication with transaction semantics, with things like entity beans, application-level caches, distributed locking (stuff to replace the database for some reason). There are many products for this stuff, but it is very hard to trust this kind of architecture itself.
  96. Too many assumptions to count[ Go to top ]

    I would love to jump in here and add value, but I am not sure there is much to dispute. The original post is simultaneously based on sound logic and incorrect in that it does not take into account the entire scope of the state of the art.

    Terracotta fundamentally believes that business logic and infrastructure logic are discrete and separable. A motherboard manufacturer does not partner with Microsoft or Sun or Redhat to build APIs that expose HW failures to the end user developer (what would I do with a "RAM failed a parity check" error), just like BEA, IBM, and JBoss should not partner with the JVM to expose clustering to the end user. It is all application agnostic and, while customization and tuning is required, custom APIs and errors [to continue the example] are not.

    To be honest, companies exist today and are making money, and I would consider their relative success empirical evidence that you can indeed design cleanly for one VM and scale at runtime to many. (This is not to say that you can take an arbitrary single-node app and cluster it. But I could show you customer apps where the delta between single-node and clustered is small, and in the case of TC usually zero).

    As for serialization, and copying the data everywhere, and all the other assumptions about clustered object caches, these are just plain inaccurate. TC pushes the envelope on the state of the art and has solved every one of those problems. We have:
        + zero serialization required
        + fine grained replication
        + push deltas only to the subset of servers that need them
        + complete preservation of object identity

    I have personally been a part of a team that built one of the largest commerce sites in the world, and it has fully clustered, fault-tolerant sessions, and developers don't do anything special to get it. Note that I have friends at the other large commerce sites and they have the same--it is definitely an "understood" problem.

    I would love to spend time online or in person with anyone out there who will download our software, try it, and tell us how it could work better / more transparently. But, I am 100% confident that we can add value with a totally api-free solution. Give us a try: http://www.terracottatech.com/
  97. For further reading on the subject of clustering HTTP sessions, including information on Weblogic clustering and our own Coherence*Web module: http://dev2dev.bea.com/pub/a/2005/05/session_management.html

    Other commercial approaches include using a clustered database (Sun "Clustra"). Other open source projects include Caucho Resin and WADI by Jules Gosnell (Codehaus).

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  98. I think this article would be better called "Wang Yu on Uncovering Load Balancing and J2EE Clustering". I think those concepts are quite often mistaken for one another and intermixed, which I've noticed in this article as well.

    You can use both technologies together or separately, but in themselves these are two different paradigms.
    Load balancing is one of the key technologies behind clustering
    That is not true since data replication, failover detection and fallback detection are the main features of clustering.

    Data replication speaks for itself; failover detection is a mechanism used to inspect the primary node for failure, be it a non-response (health check failed) issue or one of the services configured for the clustered environment failing its health check; and fallback is the mechanism for inspecting the primary node for a healthy state so that traffic can fall back to it.

    None of these are part of load-balancer functionality. All a load balancer does is figure out, based on its load-balancing algorithm (round robin or a response-time threshold, for example), the best route and the corresponding node to route the request to.
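
    To make that contrast concrete, here is a minimal round-robin selection sketch in Java (the node addresses are made up); note that it does nothing but pick the next node in order -- no replication, no failover, no fallback:

        import java.util.concurrent.atomic.AtomicInteger;

        // Minimal round-robin selection: cycle through the nodes in order.
        // No health checks, no replication, no failover -- just routing.
        public class RoundRobinBalancer {
            private final String[] nodes;
            private final AtomicInteger counter = new AtomicInteger();

            RoundRobinBalancer(String[] nodes) {
                this.nodes = nodes;
            }

            String nextNode() {
                // Math.abs guards against the counter wrapping to a negative value.
                return nodes[Math.abs(counter.getAndIncrement() % nodes.length)];
            }

            public static void main(String[] args) {
                RoundRobinBalancer lb =
                    new RoundRobinBalancer(new String[] {"10.0.0.1", "10.0.0.2", "10.0.0.3"});
                for (int i = 0; i < 6; i++) {
                    System.out.println(lb.nextNode());
                }
            }
        }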

    A similarity between a cluster and a load balancer is that both use some kind of health-check algorithm (TCP-stack level 3 - port availability - and level 7 - application availability, for example), but where in a clustered environment it is expected behaviour to fail over to a secondary node and fall back to the primary with minimal data loss and performance degradation, in a load-balanced environment losing one of the nodes, although tolerable, is not a good thing since it will degrade performance.
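
    As a rough illustration of the simpler "port availability" style of health check mentioned above, something along these lines would do (host and port are placeholders); an application-availability check would go further and issue a real request:

        import java.io.IOException;
        import java.net.InetSocketAddress;
        import java.net.Socket;

        // Simplest possible health check: is the node's port accepting connections?
        // An application-level check would go further and issue a real request.
        public class PortHealthCheck {
            static boolean isUp(String host, int port, int timeoutMillis) {
                try (Socket socket = new Socket()) {
                    socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                    return true;            // port reachable
                } catch (IOException e) {
                    return false;           // any connect failure counts as "down"
                }
            }

            public static void main(String[] args) {
                System.out.println(isUp("localhost", 8080, 1000) ? "up" : "down");
            }
        }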

    Such a setup, BTW, would be a perfect example of load balancing and clustering used together: it affords failover for each node behind the load balancer, as well as data replication across those nodes for stateful applications.

    Another scenario would be using the "sticky" option on the load balancer to assign a client (by session or by machine id, for example) to a particular node behind the load balancer to handle all of its requests for some defined duration (session, preset interval, etc.). In this case each node can be clustered to its own secondary counterpart to achieve failover and retention of the stickiness upon primary failure; this scenario does not require replication of data across the nodes behind the load balancer since every client gets only one node.
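
    A sticky assignment can be as simple as hashing the session id to a node, as in this illustrative sketch (node names are made up); the same session id always maps to the same node, which is why no cross-node replication of session data is needed while that node stays healthy:

        // Sticky assignment: the same session id always maps to the same node,
        // so session data need not be replicated across nodes while that node is healthy.
        public class StickyRouter {
            private final String[] nodes;

            StickyRouter(String[] nodes) {
                this.nodes = nodes;
            }

            String nodeFor(String sessionId) {
                // A stable hash of the session id picks the node deterministically.
                return nodes[Math.abs(sessionId.hashCode() % nodes.length)];
            }

            public static void main(String[] args) {
                StickyRouter router = new StickyRouter(new String[] {"node-a", "node-b"});
                System.out.println(router.nodeFor("abc123"));   // always routes to the same node
                System.out.println(router.nodeFor("abc123"));
            }
        }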

    Anyway, the point is that it is not necessary to use load balancing and clustering together, and many times it would be ill-advised to go such an expensive route. Where, for example, state replication is the goal and uptime is critical, but the number of concurrent connections is low enough to be handled by one server, one would cluster, but not load balance until necessary.

    Or, if the application is stateless and all the data is persisted in queues or an RDBMS, or is otherwise centrally located and accessible, one would consider load balancing for high-traffic environments, but would not necessarily cluster, unless the budget is tight and the number of load-balanced nodes is calculated against the traffic requirements without significant padding. Clustering alone, though, will not provide any improvement in performance.

    One exception to that is something like MOSIX or real-time clustering (Linux clustering), which is more of a grid-computing concept than a real clustering environment as represented by products like MS Cluster, JBoss Cluster, GeoCluster, etc. Such systems allow real-time distribution and processing across multiple physical nodes. Personally, I've heard of very few of those implemented outside of scientific, research-lab, or calculation-intensive environments.

    Anyway, my 2,000,000 cents... :-)

    Sincerely,

    Artem D. Yegorov
  99. Thank you Wang,
     
    I found your article very interesting and informative. I especially enjoyed the section describing some of the shortcomings of current solutions as it highlighted a number of areas that we are working hard to solve at Terracotta.
     
    Here is a short list:
     
    Problem: Objects in the HttpSession need to be Serializable
    Solution: We do not rely on Java Serialization. Objects do not have to implement Serializable for DSO (Distributed Shared Objects).

    Problem: Storing large or numerous objects in the session should be avoided
    Solution: One shouldn't store anything in the session that isn't needed. However, with our fine grained approach to dealing with object changes we don't have to serialize whole trees of objects whenever a change occurs. We only send around the actual changes. We dynamically load and unload parts of Object trees and changes to Objects are only sent to VMs that contain the changed Object. The above features combine to allow us to support large Sessions in our Virtual Heap.
     
    Problem: Limitations on cross-referenced attributes
    Solution: Since we maintain true Object Identity we don't suffer from problems in this area. No Object explosion and one can use == where practical.
     
    Problem: setAttribute requirements
    Solution: We trap fine-grained changes to objects at the field level and have no such requirement (the conventional setAttribute idiom is sketched after this list).
     
    Problem: Regular Caching not necessarily efficient in a distributed environment.
    Solution: Our Virtual Memory management, fine grained change detection and fine grained locking allow us to have quite efficient distributed caches.
     
    Problem: Static Variables not in session and so not distributed
    Solution: While I’m not a big fan of using static variables for Singleton implementations, I do believe the design pattern of having some objects use the Singleton pattern is useful; I just find that injection is a cleaner implementation. That said, using Terracotta DSO one can mark a variable, static or not, as shared and turn it into a cluster-wide variable.
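
    On the setAttribute point above: the conventional idiom that containers with setAttribute-based change detection expect looks roughly like the sketch below (the attribute name and class are illustrative, and it assumes the javax.servlet API on the classpath). Mutating the attribute in place is not enough, so it has to be re-set to trigger replication:

        import java.util.ArrayList;
        import java.util.List;
        import javax.servlet.http.HttpSession;

        // Illustrative only (assumes the javax.servlet API on the classpath).
        // Containers that detect changes via setAttribute replicate an attribute
        // only when it is re-set; mutating the object in place is not enough.
        public class CartUpdater {
            @SuppressWarnings("unchecked")
            void addItem(HttpSession session, String sku) {
                List<String> cart = (List<String>) session.getAttribute("cart");
                if (cart == null) {
                    cart = new ArrayList<String>();
                }
                cart.add(sku);                        // the in-place change alone may not replicate
                session.setAttribute("cart", cart);   // re-set the attribute to trigger replication
            }
        }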

    Some of the other topics covered such as distributing java.util.Timer and JNDI are areas we are currently exploring. We are very excited about the prospects for layering in services and the above-solved problems are a big part of why.
     
    Thanks again for your excellent article explaining the current landscape and I hope I have helped show where we fit in.
     
    Cheers,
     
    Steve
    www.terracottatech.com
  100. Resources about this article[ Go to top ]

    1 J2EE clustering, Part 1 http://www.javaworld.com/jw-02-2001/jw-0223-extremescale.html
    2 J2EE clustering, Part 2 http://www.javaworld.com/javaworld/jw-08-2001/jw-0803-extremescale2.html
    3 Clustering J2EE with Jini http://www.artima.com/intv/cluster.html
    4 Session Management for Clustered Applications http://dev2dev.bea.com/pub/a/2005/05/session_management.html
    5 High-impact Web tier clustering http://www-106.ibm.com/developerworks/java/library/j-cluster1/
    6 Java theory and practice: State replication in the Web tier http://www-128.ibm.com/developerworks/java/library/j-jtp07294.html
    7 The role of JNDI in J2EE http://www-128.ibm.com/developerworks/java/library/j-jndi/
    8 Websphere interview http://www-128.ibm.com/developerworks/java/library/j-jndi/
    9 Clustering with JBoss 3.0 http://www.onjava.com/pub/a/onjava/2002/07/10/jboss.html
    10 JBoss Clustering document http://docs.jboss.org/jbossas/clustering/JBossClustering7.pdf
    11 Tangosol Homepage http://www.tangosol.com/
    12 Tomcat Cluster http://www.onjava.com/pub/a/onjava/2004/03/31/clustering.html
    13 Implementing Highly Available and Scalable Solutions Using the WebLogic Cluster http://www.samspublishing.com/articles/article.asp?p=101737&rl=1
  101. new framework[ Go to top ]

    Can stand-alone applications be migrated transparently to a cluster architecture?

    answer: YES

    Why? If you use the Jdon Framework, you only need to change the XML configuration to migrate easily to EJB.

    you can download the sample application:
    http://prdownloads.sourceforge.net/jdon/samples.rar?download
  102. First point of contact..[ Go to top ]

    I must admit that the article was quite informative, covering a good amount of the J2EE spectrum. We are also in the process of evaluating clustering options (between various J2EE application servers and 3rd-party products, if required) that can provide good scalability and availability.
    1) The load balancer or the HTTP server (FPOC - first point of contact), whichever is used as the FPOC for users (as their URL is published), becomes a SPOF (single point of failure). Just to elucidate: if the site name is www.abc.com, then this translates to a single IP address (so a SPOF); now what if the machine that represents that IP goes down? What different ways can we avoid this SPOF? Round-robin DNS has inherent limitations and thus cannot be used. The solution seems to be somewhere at the networking-protocol level, where one should be able to reach multiple machines with the same IP.

    2) As far as I have read about WebLogic, it decides the secondary server instance based on the preferred replication group setup and prioritizes the machines which are not part of the preferred replication group and are not on the same machine (if there are multiple instances on the same machine). This would mean that the secondary server is chosen at the server level and not the session level as described in the article. Correct me if I am wrong?

    3) As we are evaluating these app servers, I was curious to know whether there exists any testing framework (with sample test cases) to test these extreme conditions in a cluster under load.
  103. First point of contact..[ Go to top ]

    1) The load balancer or the HTTP server (FPOC - first point of contact), whichever is used as the FPOC for users (as their URL is published), becomes a SPOF (single point of failure). Just to elucidate: if the site name is www.abc.com, then this translates to a single IP address (so a SPOF); now what if the machine that represents that IP goes down? What different ways can we avoid this SPOF? Round-robin DNS has inherent limitations and thus cannot be used. The solution seems to be somewhere at the networking-protocol level, where one should be able to reach multiple machines with the same IP.

    There are several solutions to this.

    1. The global DNS entry for the web site address itself can resolve to more than one IP address. You will see this for various web sites such as "yahoo.com" -- if you look it up multiple times, you will get different IP addresses. (A quick lookup sketch follows this list.)

    2. The ISP(s) that host the network containing those IP(s) can provide various redundancy options. For example, some sites are actually hosted from multiple locations, even though the site has a single IP. (One big US telecom customer of ours does this across seven datacenters.)

    3. There are products known as "global load balancers" that can be used to load balance across multiple sites.

    4. Within a single site, a "local load balancer" (e.g. BigIP, f5, Cisco LocalDirector, etc.) can be used to spread the load coming to a single IP across multiple servers. (I've seen this used to load balance across a cluster of close to 150 servers, but you definitely need to disable the stateful inspection! ;-) These devices can be configured in an HA manner, so that two such devices are claiming the same IP address and receiving all the same IP traffic (they actually manage it at a layer below TCP/IP), but one of the devices (the backup) is doing nothing until the primary fails.
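
    For point 1, a quick way to see multi-address DNS in action is a plain Java lookup like the sketch below (whether a given name actually returns several addresses depends on the site and your resolver):

        import java.net.InetAddress;
        import java.net.UnknownHostException;

        // A single DNS name can resolve to several IP addresses; resolvers typically
        // rotate through them (round-robin DNS). Results depend on the site and resolver.
        public class DnsLookup {
            public static void main(String[] args) throws UnknownHostException {
                for (InetAddress address : InetAddress.getAllByName("yahoo.com")) {
                    System.out.println(address.getHostAddress());
                }
            }
        }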

    Using something like an Apache plug-in to load-balance is not a very HA or high-throughput solution in comparison, but it's (a) easier and (b) cheaper for small apps and (c) usually good enough if HA is not a requirement.
    2) As far as I have read about WebLogic, it decides the secondary server instance based on the preferred replication group setup and prioritizes the machines which are not part of the preferred replication group and are not on the same machine (if there are multiple instances on the same machine). This would mean that the secondary server is chosen at the server level and not the session level as described in the article. Correct me if I am wrong?

    I have heard both. To find out for certain, you can go through BEA support and ask for the question to be forwarded up the chain, but you'd have better luck on the BEA newsgroups (AFAIK also accessible through dev2dev). I've always had good luck getting responses from Weblogic engineers there.
    3) As we are evaluating these app servers, I was curious to know whether there exists any testing framework (with sample test cases) to test these extreme conditions in a cluster under load.

    The best thing is to have a set of typical-usage scenarios for your own application (which you should have anyway for regression testing), and to mix those up in a heavy load test, shrinking think time to zero and increasing the number of users until your servers start to back up their request queues. At that point, you can see if the solution you selected will actually hold up. Assuming it does hold up, then you know that you can handle peak load -- in other words, the application will work fine except that response times will degrade as more users are added. Of course, then the fun testing begins, because you have to test the various failure scenarios.

    Make sure that your tests aren't just submitting requests, but also checking results for correctness. Otherwise, the tests will be useless for regression testing, and they'll also be useless for testing failover!
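
    As a rough illustration of that last point, a load-test request should verify the response body and not just the status code; something along these lines would do (the URL and the expected marker string are placeholders):

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.HttpURLConnection;
        import java.net.URL;

        // A load-test request is only useful if the response is checked for correctness,
        // not just for a 200 status. URL and expected marker are placeholders.
        public class CheckedRequest {
            static boolean looksCorrect(String url, String expectedMarker) throws Exception {
                HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
                connection.setConnectTimeout(2000);
                connection.setReadTimeout(5000);
                if (connection.getResponseCode() != 200) {
                    return false;                               // transport-level failure
                }
                StringBuilder body = new StringBuilder();
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(connection.getInputStream()))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        body.append(line);
                    }
                }
                return body.indexOf(expectedMarker) >= 0;       // content-level check
            }

            public static void main(String[] args) throws Exception {
                System.out.println(looksCorrect("http://localhost:8080/cart", "Order total"));
            }
        }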

    If you happen to be testing with Coherence*Web, we may be able to provide some additional tests that we use internally, but nothing is as good as your own application as the test.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  104. First point of contact..[ Go to top ]

    Thanks for the prompt reply. Following is my reply on the same points. Overall, it would help if you could explain the implementation of these solutions or refer me to a site where I can find that information.

    1) For the first question I am dividing the reply into four sub-headings which correspond in order to your suggestions:
    a) My interest would lie in the implementation of this. If it is implemented through round-robin DNS, where the DNS name is translated into a set of IP addresses at run time, then I feel that it is not a best-in-class approach: e.g. if a node fails, requests would still be forwarded to that node; secondly, many local DNS servers would cache the IP address and keep forwarding requests to the same IP address without doing a new lookup.
    b) Same question here: what is the implementation for this?
    c) Just a question here: how is the global load balancer made redundant in case it fails?
    d) This is in reference to an active-passive setup, but here also my concern is the implementation (even if it is at an OSI layer below TCP/IP), i.e. which device or protocol converts a single DNS name to multiple IP addresses dynamically, and if there is one, would that device/protocol take care of failover of a load-balancer node?

    I agree with your assessment on the Apache plug-in for HA.

    2) Point taken; I will post the question to the WebLogic site.

    3) I agree with your suggestion; the only catch is that we are doing a research analysis of various clustering solutions (to give you a brief background, we belong to a big IT consultancy firm and are currently in the process of evaluating these options so that we can recommend them to our client base), and only then would we start testing them in our labs. I was aiming to reduce the effort of building an application for this.

    We would definitely be testing Coherence*Web as well, and we would be more than happy to bug you at that time to understand the product better.

    Thanks and Regards,
    Kapil
  105. Clustering for JMS and MDB[ Go to top ]

    Under the Hood of J2EE Clustering is an excellent article, but it is silent on clustering of JMS and MDB. Though individual vendors have their own implementations, are there any standards for it?

    Regards,
    Himanshu B