High Availability for J2EE Platform-Based Applications

Discussions

News: High Availability for J2EE Platform-Based Applications

  1. A recent article on Sun's Dot Com Builder site tests the availability and fault tolerance of various clustering topologies possible in deploying J2EE applications. The ECperf application was used for the tests. The results suggested that a multi-tier clustering approach (physically separating the jsp/servlet layer from the EJB across machines) yields the best availability.

    Check out High Availability for J2EE[tm] Platform-Based Applications.
  2. Ahem! Doesn't it look like a WebLogic cluster availability test? Even though nothing is said about which application server they used, I have a very strong impression that they did use WebLogic. Just compare the report with the official clustering documentation on the BEA documentation site: the terms they use are just the same, and so are the clustering configurations. Anyway...

    I wonder whether this might not be an attempt to push people to buy more application server licenses. IMO, for performance reasons, it is *much* better to collocate the tiers on a single machine and avoid network calls. So if your cluster spans different machines, you lose all the benefit of the multi-tier approach. If you're running the multi-tier cluster on a multi-homed server (like the one they used for their tests), however, it may be fine under heavy load. But how many people can afford a 24-processor Solaris machine?
  3. I don't agree that collocating everything on the same machine gives better performance. We have seen with different applications that the network cost does not significantly affect response time. Furthermore, hosting the Web container and the EJB container on different machines really improves scalability (and thus performance). More generally, the network does not prove to be an issue for performance or scalability in a clustered environment, where the bottleneck is usually the CPU. At most, the network very slightly affects response time when clients access the server over the LAN.
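
    A back-of-envelope check of this claim (all figures below are illustrative assumptions, not measurements from the article): when per-request processing is CPU-bound at tens of milliseconds, one extra LAN hop barely registers.

    ```java
    // Rough arithmetic: share of response time contributed by one LAN hop
    // between tiers, under assumed (not measured) figures.
    public class LatencyShare {
        public static void main(String[] args) {
            double lanRoundTripMs = 0.5;   // assumed LAN round trip between tiers
            double cpuProcessingMs = 50.0; // assumed per-request processing time
            double share = lanRoundTripMs / (cpuProcessingMs + lanRoundTripMs) * 100.0;
            System.out.printf("Network share of response time: %.1f%%%n", share);
        }
    }
    ```

    With these figures the hop is about 1% of the response time; with a much cheaper request (say 2 ms of processing) the same hop would be around 20%, which is roughly the disagreement in this thread.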
  4. Yann,

    It seems like you are looking for a high availability solution on low cost hardware.

    I am posting some content here, hoping that it might benefit readers as they evaluate the clustering technology they choose to ensure no loss of service.

    FYI:
    Oracle9i Application Server supports three levels of clustering on the middle tier: Web Server, J2EE Server, and Web Cache clusters. In addition, applications hosted on Oracle9iAS can take advantage of the high availability features of Oracle9i database RAC.

    Oracle9iAS HTTP Server enables HTTP processes to work in a cluster configuration. Oracle9iAS Containers for J2EE enables the creation of J2EE “cluster islands” – collections of servers where state is replicated to improve availability and scalability in a transparent manner. Oracle9iAS also allows caches to be deployed in a clustered environment in front of an application server farm: to reduce the risk of denial-of-service attacks and improve the availability of dynamic and static content, Oracle9iAS Web Cache enables multiple cache instances to work together as a single logical cache. Nodes participating in the cache cluster communicate with one another to request cacheable content.

    For more information please listen to this free seminar:
    Oracle9i Application Server Advanced Clustering

    Thanks...
  5. Hi Sudhakar,

    Sudhakar: "FYI: Oracle9i Application Server supports ...."

    Not that you're hiding it, but you should probably point out that you work for Oracle. BTW - Are you working with the JSR 107 for caching?

    Sudhakar: "Oracle9iAS Containers for J2EE enables creation of J2EE “cluster islands” – collection of servers where state is replicated to improve availability and scalability in a transparent manner."

    When you get a chance, if you're interested in high availability and caching, check out our Coherence product at http://www.tangosol.com/ or drop me a line at cpurdy at tangosol dot com.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  6. Emmanuel Cecchet said:
    I don't agree that collocating everything on the same machine gives better performance. We have seen with different applications that the network cost does not significantly affect response time. Furthermore, hosting the Web container and the EJB container on different machines really improves scalability (and thus performance).

    More generally, the network does not prove to be an issue for performance or scalability in a clustered environment, where the bottleneck is usually the CPU. At most, the network very slightly affects response time when clients access the server over the LAN.

    ------

    My experience is that network traffic is not the only cost of splitting Servlets/JSPs out from EJBs. An additional (and sometimes quite significant) cost comes from the need to serialize the objects flowing back and forth across that interface. This generates additional garbage on the heap (increasing the frequency and cost of GC) and eats CPU time, not counting the time consumed by the RMI and network layers.

    Deploying these two layers together allows most J2EE containers to short-circuit the serialization protocol. True, the Business Facade pattern reduces the number of round trips between the layers, but the cost is still there.

    In my experience, a single layer of J2EE containers works very well from a performance, scalability, and availability perspective and can reduce the total cost significantly due to both efficiency gains and probable reduction in number of hosts and software licenses.

    That said, I think the best solution varies depending on the complexity of the two layers (Servlet/JSP versus EJB), so I would expect to revisit these decisions on each new system.

    Chuck
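
    The serialization cost Chuck describes is easy to see with a small stand-alone test; the DataBean class here is a made-up example, and absolute numbers will vary by JVM and hardware.

    ```java
    import java.io.*;

    // Measures the cost of pushing a value object through Java serialization,
    // which a collocated deployment can skip by passing references in-process.
    public class SerializationCost {
        static class DataBean implements Serializable {
            String name = "customer";
            int[] values = new int[100];
        }

        public static void main(String[] args) throws Exception {
            DataBean bean = new DataBean();
            int iterations = 10000;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                new ObjectOutputStream(buf).writeObject(bean);  // serialize (burns CPU, creates garbage)
                new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))
                    .readObject();                              // deserialize on the "other tier"
            }
            long elapsedMs = (System.nanoTime() - start) / 1000000;
            System.out.println(iterations + " serialize/deserialize round trips: " + elapsedMs + " ms");
        }
    }
    ```

    Every one of those round trips also produces heap garbage that a local call would not, which is the GC pressure mentioned above.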
  7. One thing that stuck out to me was the load balancing examples. I don't believe either scenario (one hardware load balancer, or Round Robin DNS) is realistic for load balancing.

    First, as the article pointed out, one hardware load balancer is your single point of failure (we won't mention the fact that so is the single firewall in their configuration). Why would you want this setup? For a proper hardware load balancing scenario, you should have two hardware load balancers that can fail over to each other. This is pretty standard with hardware-based load balancers. In fact, most hardware load balancers on the market today are specially priced in pairs -- there's a reason for that.

    Second, Round Robin DNS is only acceptable if you're not doing SSL. Once you throw SSL into the mix, Round Robin goes out the door, since you can't guarantee (at least to my knowledge) that a client will go back to the same SSL web server throughout the entire session.

    If you have a critical web application, I would think that none of the given scenarios in the article are appropriate. What does everyone else think?
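
    The stickiness problem can be seen in a few lines: plain round robin gives no guarantee a client returns to the same server, while hashing a stable key (a session or SSL client identifier) does. A sketch, with made-up server names:

    ```java
    import java.util.List;

    // Round robin vs. session-hash routing: only the latter guarantees a
    // given session keeps landing on the same server.
    public class Affinity {
        static String pickRoundRobin(List<String> servers, int requestNumber) {
            return servers.get(requestNumber % servers.size()); // varies per request
        }

        static String pickBySessionHash(List<String> servers, String sessionId) {
            int idx = Math.abs(sessionId.hashCode() % servers.size());
            return servers.get(idx); // same session -> same server, every time
        }

        public static void main(String[] args) {
            List<String> servers = List.of("web1", "web2");
            System.out.println(pickRoundRobin(servers, 0));           // web1
            System.out.println(pickRoundRobin(servers, 1));           // web2
            System.out.println(pickBySessionHash(servers, "abc123")); // stable choice
            System.out.println(pickBySessionHash(servers, "abc123")); // same as above
        }
    }
    ```

    Round Robin DNS sits even further from the request path than this sketch: the client caches one resolved address, so whether a session sticks depends on resolver caching, not on any per-request decision you control.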
  8. We use IIS->Weblogic with round-robin load-balancing. Our app is entirely SSL and I haven't seen any problem with losing/skipping sessions yet.

    To comment on the main article, right now we have a fairly simple, almost-read-only reporting app spread across 4 WebLogic instances on two machines. The only EJB we have is a message-driven bean that tracks user activity and report response time. I've been considering setting this bean up on its own instance just so it won't bog down any of the main app instances. (The messages are very lightweight, and I don't care how long they take to get to their destination.) In the future we are going to add an asynchronous batch mode for large reports that I think would be better off running on a separate instance. I guess this article sort of validates that idea. Anyone know of any pitfalls to this approach?

    thx,
    Matt
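
    The fire-and-forget shape described here can be sketched in plain Java (an analogy using java.util.concurrent, not actual JMS/MDB code): the request path enqueues a tracking event and returns immediately, while a separate consumer -- in Matt's deployment, the message-driven bean on its own instance -- drains the queue at its own pace.

    ```java
    import java.util.concurrent.*;

    // Plain-Java analogy of a lightweight tracking MDB: producers never wait
    // on the consumer, so slow tracking cannot bog down the request path.
    public class TrackingOffload {
        public static void main(String[] args) throws Exception {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>();
            ExecutorService worker = Executors.newSingleThreadExecutor();

            // The "MDB": consumes tracking events at its own pace.
            worker.submit(() -> {
                try {
                    while (true) {
                        String event = queue.take();
                        if (event.equals("STOP")) break;
                        System.out.println("tracked: " + event);
                    }
                } catch (InterruptedException ignored) { }
            });

            // The request path: enqueue and return immediately.
            queue.put("user=42 report=sales elapsed=120ms");
            queue.put("STOP");
            worker.shutdown();
            worker.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
    ```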
  9. I totally agree with what you are saying here in regard to load balancers and DNS round robin. However, the point of the exercise was to test the availability features built into current app servers (current at the time of testing).
  10. If the point was testing high availability, why not investigate more thoroughly?

    With JSPs and servlets there is (nearly) no reason for the user to know that a transaction failed at all (i.e., that 8 percent can be further eliminated by catching exceptions and retrying the access).

    Load balancing can be more intelligent. In fact, I am very surprised that they bothered with a "never-test-anything" round-robin approach at all; I would not even call it load balancing if the load balancer does not check whether the web server is up. (Not to mention content checking from the load balancer, which could again significantly improve the test results. Although this area of discussion might be too far from J2EE.)

    However, my main disappointment with the article is that the test results only show what you would have expected anyway after reading the specifications (and knowing the type of application they were running to get the percentages). I also did not get why they worry about "information-shared clusters"; I have not had any such problem using BEA WebLogic.

    Unfortunately, my lack of experience does not allow me to comment on the comparison of a single server to clustered setups. However, if "high availability" is the purpose -- and we mean HIGH -- I would not even consider these setups.

    One thing left to add: I find it very promising that they could recover the web server -- with running services -- in one minute. Considering a 24-hour day, the risk of the server being down might not call for J2EE clustering capabilities at all. Unfortunately, with larger web servers this issue would not be this simple...
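
    The catch-and-retry point above can be sketched as a small wrapper; RemoteCall is a made-up stand-in for whatever remote invocation the application performs.

    ```java
    // Retry wrapper: a failed call during failover need not surface to the
    // user if the client catches the exception and tries again.
    public class Retry {
        interface RemoteCall<T> { T invoke() throws Exception; }

        static <T> T withRetry(RemoteCall<T> call, int attempts) throws Exception {
            Exception last = null;
            for (int i = 0; i < attempts; i++) {
                try {
                    return call.invoke();
                } catch (Exception e) {
                    last = e; // server may be failing over; try again
                }
            }
            throw last; // all attempts exhausted
        }

        public static void main(String[] args) throws Exception {
            int[] calls = {0};
            // Fails once (simulating a node going down), then succeeds on retry.
            String result = withRetry(() -> {
                if (calls[0]++ == 0) throw new Exception("connection refused");
                return "ok";
            }, 3);
            System.out.println(result); // ok
        }
    }
    ```

    Note this is only safe for idempotent operations; retrying a non-idempotent transaction blindly can do the work twice.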

  11. Joshua,
    >>
    Second, Round Robin DNS is only acceptable if you're not doing SSL. Once you throw SSL into the mix, Round Robin goes out the door, since you can't guarantee (at least to my knowledge) that a client will go back to the same SSL web server throughout the entire session.
    >>

    We are using SSL and DNS RR with a cluster of two WebLogic servers. We haven't lost a session; I also did a lot of testing by killing the primary server, and it still works fine.

    Nilesh
  12. Some have commented that collocating the servlet/JSP tier and the EJB tier is key for performance, while others have lambasted it for lack of scalability.

    I think it really depends on the application. For the web application I recently spent a lot of time working on, there was a fairly high number of JavaBeans sent to JSPs for rendering -- enough that the number-one source of multi-tier latency (multi-tier meaning separate web and app tiers) was not network traffic but serializing and deserializing these JavaBeans. I'm not kidding; we checked.

    So going to a single tier (web-app) gave us fairly significant performance gains. It also allowed us to make better use of significant application-level caching of data objects, by requiring less duplication of those caches and yielding a higher cache hit rate, since sessions stuck to a single web server. In effect, the lack of load balancing helped cache efficiency here.

    Our web application was unusual in that almost no JSP output was cacheable in itself, and there was very little static content. If that is not the case, the ability of a multi-tier architecture to scale up the web tier without scaling up the app tier becomes very valuable, and load balancing the app tier becomes even more important. So keep that in mind.

    - Lawrence
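
    A toy simulation of the cache effect described here (all parameters are made up): with per-node caches, sticky routing sends a session's repeat requests to the node that already cached its data, while random routing spreads the same requests -- and their cache misses -- across nodes.

    ```java
    import java.util.*;

    // Compares per-node cache hits under sticky vs. random routing. Each node
    // caches a session's data the first time it sees that session.
    public class StickyCacheSim {
        static int simulate(int nodes, int sessions, int reqs, boolean sticky, Random rnd) {
            List<Set<Integer>> caches = new ArrayList<>();
            for (int n = 0; n < nodes; n++) caches.add(new HashSet<>());
            int hits = 0;
            for (int s = 0; s < sessions; s++)
                for (int r = 0; r < reqs; r++) {
                    int node = sticky ? s % nodes : rnd.nextInt(nodes);
                    if (!caches.get(node).add(s)) hits++; // already cached -> hit
                }
            return hits;
        }

        public static void main(String[] args) {
            Random rnd = new Random(1);
            int stickyHits = simulate(4, 1000, 10, true, rnd);
            int randomHits = simulate(4, 1000, 10, false, rnd);
            System.out.println("sticky hits: " + stickyHits + ", random hits: " + randomHits);
        }
    }
    ```

    With sticky routing every session misses exactly once; with random routing each node a session touches takes its own miss, so total hits drop.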
  13. IMHO this just scratches the surface of making your system highly available, and the whole subject is enough to give you a headache. Just the fact that the biggest source of downtime is software upgrades makes me wonder how much hardware HA is really necessary.

    Also, I don't really believe that splitting the web tier and the EJB tier gets you a more available system. In fact, I am inclined to believe the opposite. The reason is that by co-locating resources you can use local method calls, which increases maximum throughput. That means you can do the same amount of work with fewer boxes, and therefore get a longer mean time between failures. I hope that's right, but if I am missing something, do tell me.

    Guglielmo
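
    The fewer-boxes argument can be made concrete with simple probability (the 5% annual failure figure is an illustrative assumption): if each box fails independently, the chance of seeing at least one failure grows with the number of boxes.

    ```java
    // P(at least one box fails) = 1 - P(no box fails), for independent boxes
    // with an assumed (illustrative) per-box annual failure probability.
    public class FailureRate {
        public static void main(String[] args) {
            double perBoxAnnualFailure = 0.05; // assumed 5% chance a given box fails in a year
            for (int boxes : new int[] {2, 4, 8}) {
                double anyFailure = 1 - Math.pow(1 - perBoxAnnualFailure, boxes);
                System.out.printf("%d boxes: %.1f%% chance of at least one failure per year%n",
                                  boxes, anyFailure * 100);
            }
        }
    }
    ```

    Of course a cluster is meant to survive a single box failing; the point is only that more boxes mean more failure events to handle, not that more boxes mean less availability.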
  14. Regarding the conversation about pairs of h/w load balancers, that really doesn't start to scratch the surface on the extreme end. Several clients that we are working with now have fully duplicated sites for disaster-recovery (geographically distributed) with the ability to quickly (almost instantly) switch over from one to the other. That's in addition to Q/A setups that basically match the production environments (I've seen what must be $20+MM of hardware at one site to duplicate the production system just for Q/A and load testing purposes). There are sites where even major upgrades must be done without any downtime, and where an "act of God" (or man-made equivalent) must not take down the application.

    Isn't it amazing how far Java has come since the juggling duke?

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    http://www.tangosol.com/