Performance and scalability: session state replication clustering

  1. session state replication clustering (7 messages)

    Hi,

    I wanted to know if a product like Tangosol or GemStone is really necessary for clustering/session state replication. We are using Sun Java System Web Server 6.1, and the documentation doesn't talk about session state replication. Any comments? Thanks

  2. session state replication clustering

    I wanted to know if a product like Tangosol or GemStone is really necessary for clustering/session state replication. We are using Sun Java System Web Server 6.1, and the documentation doesn't talk about session state replication. Any comments?

    Sun's app server (or some specific editions of it) does have session clustering built in to the product, so if you were only doing session clustering then you probably would not need Coherence.

    We've tested the Sun implementation (built on top of the Clustra product that they bought) and it works fine for most applications. There are different reasons that some of the big customers of Sun app server (e.g. a certain big airline) use Coherence*Web. For example, one such customer selected Coherence*Web for its reliability. The dynamic linear scalability of Coherence*Web, including unlimited session size support, is a big factor for several Sun app server customers. Other customers are looking for specific functionality, including clustered SessionContext support, clustered ServletContext support, custom data models for sessions, the ability to cluster only specific sessions (or specific data within a session), etc.

    There's also a lot of state management outside of session management. Tangosol Coherence allows you to share, coordinate and transact against ANY data among all of your servers.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Powering the world's busiest Java sites
  3. app server

    Cameron,

    Thanks a lot for your timely help and responses. We are currently using Sun Java System Web Server at the enterprise level ... For the failover scenario and session state replication, Sun Java System Application Server is necessary. I'd like someone's suggestion as to what the right approach is at this point. Use a failover/load-balancing product and interface it with the web server, or migrate to the application server?
  4. app server

    Use a failover/load-balancing product and interface it with the web server, or migrate to the application server?

    A load-balancing product is used to spread the load over multiple VMs. Those VMs could be running just a web container or a full app server; either way, we tend to call it an app server.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Also supports clustered spatial indexes.
  5. session state replication clustering

    In my experience, share-nothing clusters front-ended with a load balancer (like F5 or even Apache) scale much better than those based on distributed caches replicating the session state. They are also simpler to maintain.

    Of course, the promise of session replication is that if a box goes down, ALL the current sessions will survive and continue.
    With a primitive load balancer, sticky sessions have to be on so that all the incoming traffic for a user goes to the same server. If that server crashes, its users lose their sessions and have to log in again.

    It is a trade-off: much better scalability and lower cost with a load balancer, versus alleged session survivability, worse scalability and higher cost with a distributed cache.
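
    To make the share-nothing scenario concrete, here is a minimal Java sketch of it. Everything in it is hypothetical (the StickyBalancer class, the server names, the session ids); it is a toy model of sticky routing, not how F5 or Apache implement it. It only shows that when a server dies, the sessions held in its memory die with it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical): share-nothing servers behind a sticky load balancer. */
class StickyBalancer {
    // Each server holds its sessions in local memory only; nothing is replicated.
    private final Map<String, Map<String, String>> localSessions = new HashMap<>();
    private final List<String> liveServers = new ArrayList<>();
    // The balancer's affinity table: session id -> server that owns the session.
    private final Map<String, String> affinity = new HashMap<>();
    private int nextIndex = 0;

    StickyBalancer(List<String> servers) {
        for (String server : servers) {
            liveServers.add(server);
            localSessions.put(server, new HashMap<>());
        }
    }

    /** Sticky routing: reuse the owning server while it is alive, else round-robin. */
    String route(String sessionId) {
        if (liveServers.isEmpty()) {
            throw new IllegalStateException("no live servers");
        }
        String owner = affinity.get(sessionId);
        if (owner == null || !liveServers.contains(owner)) {
            owner = liveServers.get(Math.floorMod(nextIndex++, liveServers.size()));
            affinity.put(sessionId, owner);
        }
        return owner;
    }

    /** Stores the user in the memory of whichever server the session is routed to. */
    void login(String sessionId, String user) {
        localSessions.get(route(sessionId)).put(sessionId, user);
    }

    /** Returns the logged-in user, or null if the routed server has no such session. */
    String currentUser(String sessionId) {
        return localSessions.get(route(sessionId)).get(sessionId);
    }

    /** Simulates a crash: the server's in-memory sessions vanish with it. */
    void crash(String server) {
        liveServers.remove(server);
        localSessions.remove(server);
    }
}
```

    In this model, a session created on one server returns null from currentUser once that server has crashed, which corresponds to the user being sent back to the login page. Sessions on the surviving servers are unaffected.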
  6. In my experience, share-nothing clusters front-ended with a load balancer (like F5 or even Apache) scale much better than those based on distributed caches replicating the session state. They are also simpler to maintain.

    Yes. That is a truism. You don't cluster sessions in order to get better scalability, since it involves "extra" work.

    Of course, the promise of session replication is that if a box goes down, ALL the current sessions will survive and continue.

    It depends on the implementation. Our approach is to get as close as possible to losing *no* data whatsoever.

    With a primitive load balancer, sticky sessions have to be on so that all the incoming traffic for a user goes to the same server. If that server crashes, its users lose their sessions and have to log in again.

    There are several fallacies here:

    1) The load balancer (e.g. F5) should not have sticky sessions enabled because it will cause the load balancer to become a bottleneck. (It can drop the number of connections that it can handle by 100x!)

    2) If the load balancing is not sticky, that does not mean that sessions are lost.

    3) If servers fail over, that does not mean that users have to log back in.
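
    Points 2 and 3 can be illustrated with a small Java sketch. The ReplicatedSessionStore class and its primary/backup placement rule are assumptions made up for this example; this is not a description of how Coherence*Web or the Sun app server replicate sessions. Each session is written to two nodes, so a request that lands on any surviving node can still find it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical): every session is written to two nodes. */
class ReplicatedSessionStore {
    // Node name -> that node's in-memory copy of session data.
    private final Map<String, Map<String, String>> nodes = new LinkedHashMap<>();

    ReplicatedSessionStore(List<String> nodeNames) {
        for (String name : nodeNames) {
            nodes.put(name, new HashMap<>());
        }
    }

    /** Writes the session to a primary node and to the next node as a backup. */
    void put(String sessionId, String value) {
        List<String> live = new ArrayList<>(nodes.keySet());
        if (live.isEmpty()) {
            throw new IllegalStateException("no live nodes");
        }
        int primary = Math.floorMod(sessionId.hashCode(), live.size());
        int backup = (primary + 1) % live.size();
        nodes.get(live.get(primary)).put(sessionId, value);
        nodes.get(live.get(backup)).put(sessionId, value);
    }

    /** Finds the session on any live node that still holds a copy; null if none does. */
    String get(String sessionId) {
        for (Map<String, String> local : nodes.values()) {
            String value = local.get(sessionId);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Lists the live nodes that currently hold a copy of the session. */
    List<String> holders(String sessionId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : nodes.entrySet()) {
            if (e.getValue().containsKey(sessionId)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Simulates a crash: the node and its local copies disappear. */
    void crash(String node) {
        nodes.remove(node);
    }
}
```

    With three nodes, crashing one holder of a session still leaves a copy on the other, so get keeps returning the data and the user stays logged in. The sketch does not re-create the lost copy after a crash, so a second failure could still lose the session; a real product would have to handle that.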

    It is a trade-off: much better scalability and lower cost with a load balancer, versus alleged session survivability, worse scalability and higher cost with a distributed cache.

    In a system that provides linear scale-out for session management, it does not impact scalability, it impacts throughput per node. And yes, it does tend to cost more, both because of the lower throughput per node and because if you are doing serious clustered session management, you're probably using a high-end solution like Coherence*Web.

    Therefore, the trade-off is much simpler: higher cost vs. lower reliability.
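
    A back-of-the-envelope Java sketch of the scalability vs. throughput-per-node distinction follows. The model and every number in it are assumptions chosen for illustration (ideal linear scale-out, a fixed percentage of each node's capacity spent on session management); they are not measurements of any product.

```java
/** Toy model (hypothetical numbers): ideal linear scale-out with a fixed per-node overhead. */
class ScaleOutModel {
    /**
     * Total requests per second for a cluster in which each node loses
     * overheadPercent of its capacity to session management.
     */
    static long clusterThroughput(int nodes, long perNodeRps, int overheadPercent) {
        long effectivePerNode = perNodeRps * (100 - overheadPercent) / 100;
        return nodes * effectivePerNode;
    }
}
```

    Under these assumptions, a node that could serve 1000 requests per second without session clustering serves 800 with a 20% overhead, and four such nodes serve 3200. Going from one node to four still multiplies throughput by four; what changes is how many nodes (and how much money) a given load requires.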

    Peace,

    Cameron Purdy
    Tangosol Coherence: High-Scale and HA for HTTP Session Management
  7. Wow, if I ever have a product to push on a public forum, I know who to learn from.

    My point was that session replication (no matter how "high-end") and load-balanced share-nothing cluster members are mutually exclusive solutions with listed advantages and disadvantages.

    As far as the fallacies go, yes, you have to have sticky sessions on the F5 in the scenario where cluster members do not share anything (there is no distributed session cache).

    And yes, the sessions bound to this particular server will be lost if this server goes down because there would not be a failover - they share nothing.

    As far as F5 becoming a bottleneck, the amount of work needed to correctly maintain a shared cache is orders of magnitude greater than the amount of work required for F5 to examine the request (cookie or session-rewritten URL) and direct the traffic. So whatever slowdown F5 suffers will be amply compensated by the fact that my servers will not have to wait for the session cache to synchronize.

    And sorry for speaking in truisms, these forums are visited by many seeking answers for situations they have not faced before and the voice of experience is just as important as a vendor's pitch.
  8. Wow, if I ever have a product to push on a public forum, I know who to learn from.

    You're talking to someone who tells people that they should only be clustering their servers if their business requirements actually _require_ failover, so I find your comment potentially insulting. (Not that I am personally insulted .. actually, it was kind of a compliment.)

    I don't suggest that people cluster just so that I can sell more software, and I've talked quite a few people _out_ of clustering because I thought it was just a waste of time and money for their applications.

    My point was that session replication (no matter how "high-end") and load-balanced share-nothing cluster members are mutually exclusive solutions with listed advantages and disadvantages.

    I thought I reinforced that point. We should probably only argue about things that we disagree on ;-)

    As far as the fallacies go, yes, you have to have sticky sessions on the F5 in the scenario where cluster members do not share anything (there is no distributed session cache).

    Not true. The hardware load balancer is NOT required to do sticky load balancing. For example, it can round-robin to an Apache farm which does intelligent (and sticky) load balancing to one or more JEE apps.

    HLB and stickiness, yet the HLB is not providing the stickiness. It is possible, and it is fairly commonly done.
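
    Here is a minimal Java sketch of stickiness that lives in the web tier rather than in the HLB. It assumes, purely for illustration, that the session id carries a route suffix naming the app server that owns it (e.g. "abc123.app2"), a convention some Apache connector setups use. The WebTierRouter class and the backend names are hypothetical.

```java
import java.util.List;

/** Toy web-tier router (hypothetical): stickiness comes from the session id, not the HLB. */
class WebTierRouter {
    /** Extracts the route suffix from an id such as "abc123.app2"; null if there is none. */
    static String routeOf(String sessionId) {
        if (sessionId == null) {
            return null;
        }
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return null;
        }
        return sessionId.substring(dot + 1);
    }

    /**
     * Sends the request to the backend named in the session id when that backend
     * is known; otherwise falls back to round-robin on the request counter.
     */
    static String chooseBackend(String sessionId, List<String> backends, int requestCount) {
        String route = routeOf(sessionId);
        if (route != null && backends.contains(route)) {
            return route;
        }
        return backends.get(Math.floorMod(requestCount, backends.size()));
    }
}
```

    Because the routing decision depends only on the request itself, any web-tier instance can make it. The HLB is then free to round-robin across the web tier without tracking sessions.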

    And yes, the sessions bound to this particular server will be lost if this server goes down because there would not be a failover - they share nothing.

    Correct. A perfectly acceptable scenario for most apps.

    As far as F5 becoming a bottleneck, the amount of work needed to correctly maintain a shared cache is orders of magnitude greater than the amount of work required for F5 to examine the request (cookie or session-rewritten URL) and direct the traffic.

    That doesn't matter. The F5 represents a single point of bottleneck, so even if it is more efficient to do stickiness at the F5 level, it's often _better_ to do it in a stateless web tier. OTOH, I'm referring to high-scale environments; on a small-scale system, the F5 will have no problem with it.

    (The only reason that I know that the F5 has problems with the load is because I have seen it, at real customer sites, on real production applications. They could not do stickiness scalably at the HLB level.)

    So whatever slowdown F5 suffers will be amply compensated by the fact that my servers will not have to wait for the session cache to synchronize.

    That is a misleading argument. If the F5 becomes the bottleneck, then you're stuck. Period. Further, I never suggested getting rid of stickiness; I just clarified that stickiness as an implementation was not required, and if it is used, it is not required to be implemented in the HLB. (e.g. it could be implemented by an Apache mod.)

    And sorry for speaking in truisms, these forums are visited by many seeking answers for situations they have not faced before..

    I was specifically referring to the truism of clustered session management versus no clustered session management.

    Regarding applications that are completely stateless in the JEE tier, those applications tend to scale very badly because they end up delegating all the load of the entire cluster onto a database, which becomes a single point of bottleneck.

    .. and the voice of experience is just as important as a vendor's pitch.

    This is a conversation. If you have suggestions and ideas that differ from someone else's (e.g. mine), then you should raise them. If I disagree with you, it's not because I am a bad person (or that I am pitching something), but rather it is because my experience and knowledge in some way don't match yours. I am trying to be respectful of your opinion and experience, and I'd ask that you try to do the same for me, and perhaps one or both of us will end up learning something from the conversation.

    Further, it is very unconstructive to completely dismiss someone's experience and opinions just because that person works for a company that works in an area related to the discussion. In fact, that person might just accidentally have gained some knowledge of the topic space from being exposed to it in various customer accounts.

    In this case, I have made it exceedingly clear where I work, so you can certainly take that into account if you believe I am being purposefully misleading in my answers. In fact, if I am being misleading, you should point that out. You attempted to label my comments as a vendor pitch, instead of explaining which parts of my comments were incorrect. I would be glad to discuss the technical specifics with you, and I would ask that you avoid the off-topic accusations.

    Peace,

    Cameron Purdy
    Tangosol Coherence: High-Scale and HA for HTTP Session Management