News: WADI 0.9 Released: Open source distributed session management

  1. This article discusses the recent release of WADI. WADI is a new open source project that targets the reliable management of HttpSessions for distributed web applications hosted in Tomcat or Jetty clusters.

    A note from Jules

    I thought I would post to let you know that I have recently released WADI-0.9 (http://wadi.codehaus.org).

    WADI (Web Application Distribution Infrastructure) is an open source (Apache 2.0) technology currently focused on simplifying the management of stateful distributed webapps running on Tomcat/Jetty clusters.

    The specific problem that WADI currently chooses to address is that of node maintenance. This involves the shutting down of a node that may be hosting a number of active http sessions whilst allowing these sessions to remain transparently available to their clients.

    There are two traditional approaches to this problem:

    Bleeding - This is the low-cost/high-management solution. You somehow configure your load-balancer tier to route all requests which may create new sessions to other nodes, only allowing clients already 'stuck' to the node in question to continue talking to it. You then wait for the sessions on this node to die off naturally, either through explicit invalidation via e.g. log-out, or implicit timeout after a period of inactivity. Unfortunately, manually reconfiguring your load-balancers all the time will be error-prone and your sessions will take an indeterminate time to die off since users may keep them alive for as long as they like, just by frequently revisiting your site. This is not a solution for sites with long lived sessions.

    Replication - This is a high-cost/low-management solution, essentially developed for sites that require sessions to survive the catastrophic failure of their hosting node. Changes to sessions are copied off-node as soon as is practicable. If the original node dies or is shut down, one of the copies can be promoted and take over from the original so that the user's session is transparently maintained. This is useful functionality if you can afford the substantial overhead in terms of extra hardware and reduced performance that replication necessarily involves. There is also an increased risk factor, as what were unrelated processes are now interconnected by a complex piece of middleware that may become a single point of failure. In the common case where we are simply concerned with the controlled shutdown and maintenance of nodes in our cluster, this can seem an unrealistic price to pay.

    WADI is the low-cost/low-management solution. To overcome the problem, it combines features that are found in many different open source HTTP session managers but that have not, until now, been brought together in a single one. A clean shutdown of a Tomcat or Jetty running a webapp whose session manager has been configured to be WADI will evict all sessions within it to a shared store. A subsequent request, directed to a different node (since the session's original host is now unavailable), will find the session in this store and load it transparently underneath the incoming request. If a node receives a request for which the session is not held locally or in the shared store, it can broadcast a query for its location, then open a connection direct to the responding node and demand the session's migration to it down this connection. In this way, WADI provides a simple maintenance path, whilst transparently preserving existing sessions. Rather than pay in terms of cycles and IO upon every change to a session throughout its lifetime, cost is only incurred in the eviction and migration of sessions when this is strictly necessary. For those sites that do require protection from catastrophic node failure, WADI will ship with optional in-vm replication functionality in a forthcoming release.
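
    The lookup order described above (local, then shared store, then a query to the other nodes) can be sketched roughly as follows. This is only an illustration: Session, SessionSource, SharedStore and Node are hypothetical names rather than WADI classes, the broadcast is stood in for by a loop over a list of peers, and the atomic hand-over of ownership that a real cluster needs is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// All names here are hypothetical; this is not WADI's API.
class Session {
    final String id;
    Session(String id) { this.id = id; }
}

// Anything that can hand over ownership of a session: the shared store or a peer node.
interface SessionSource {
    // Removes and returns the session if held, so that only one owner exists at a time.
    Optional<Session> take(String sessionId);
}

// In-memory stand-in for the shared store.
class SharedStore implements SessionSource {
    private final Map<String, Session> sessions = new ConcurrentHashMap<>();

    void put(Session session) { sessions.put(session.id, session); }

    @Override
    public Optional<Session> take(String sessionId) {
        return Optional.ofNullable(sessions.remove(sessionId));
    }
}

class Node implements SessionSource {
    private final Map<String, Session> local = new ConcurrentHashMap<>();
    private final SharedStore store;
    final List<SessionSource> peers = new ArrayList<>(); // stands in for the broadcast query

    Node(SharedStore store) { this.store = store; }

    void create(Session session) { local.put(session.id, session); }

    boolean holds(String sessionId) { return local.containsKey(sessionId); }

    // Look locally first, then in the shared store, then ask the peers.
    Optional<Session> locate(String sessionId) {
        Session held = local.get(sessionId);
        if (held != null) return Optional.of(held); // usual case: affinity held, no extra cost
        Optional<Session> found = store.take(sessionId);
        for (SessionSource peer : peers) {
            if (found.isPresent()) break;
            found = peer.take(sessionId);
        }
        found.ifPresent(session -> local.put(session.id, session)); // this node now owns it
        return found;
    }

    // Clean shutdown: evict every local session to the shared store.
    void shutdown() {
        for (String id : new ArrayList<>(local.keySet())) {
            Session session = local.remove(id);
            if (session != null) store.put(session);
        }
    }

    @Override
    public Optional<Session> take(String sessionId) {
        return Optional.ofNullable(local.remove(sessionId));
    }
}
```

    In this sketch, a request landing on a node that already holds the session never touches the store or the peers, which is the pay-only-when-necessary property described above.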

    Please, if interested, visit the website (http://wadi.codehaus.org), browse the FAQ, download and play with the release and join the mailing lists. We look forward to all feedback.

    Thanks for your time,

    Jules

  2. Production ready?

    The WADI documentation refers to two goals: ctrl-c resilience, where sessions are replicated upon graceful shutdown, and kill -9 resilience, which you refer to here as the high-cost, low-maintenance solution. In our situation the former is the most useful: we rarely have node failures, but often lose sessions when restarting applications. Do you think that WADI is close to production stability for ctrl-c functionality?

    Is there a time-line for a 1.0 release, which I assume would have kill -9 functionality too?
  3. Production ready?

    You sound like exactly the audience that WADI currently addresses :-)

    Node maintenance is something that I see being done on a regular basis. JVMs and h/w blowing up happens now and then, but at an exponentially lower frequency.

    WADI is close to production readiness as far as this functionality goes. There are a few things that I should like to add, the most important one being the ability to redistribute sessions directly to other nodes upon your own shutdown, rather than to the shared store. This would mean the removal of the only single point of failure from WADI, which would make me happy :-)

    If you are interested in deploying WADI, jump on to the user list and let's talk.

    Jules
  4. Now here's a subject I have some interest in .. we just finished something similar (our Coherence*Web module,) although probably with a different set of requirements and a much different approach.
    WADI is a new open source project that targets the reliable management of HttpSessions for distributed web applications hosted in Tomcat or Jetty clusters.
    Just out of curiosity, how does it differ from the stuff that is built in (or is supposed to soon be built in) to Tomcat? Any relation to what Filip Hanik (sp?) published a couple of years ago?
    The specific problem that WADI currently chooses to address is that of node maintenance. This involves the shutting down of a node that may be hosting a number of active http sessions whilst allowing these sessions to remain transparently available to their clients.
    Hmm. That could even be done pretty easily to disk, right?
    Replication - This is a high-cost/low-management solution, essentially developed for sites that require sessions to survive the catastrophic failure of their hosting node.
    Yup .. basically the ability to lose (or shut down) a server without affecting end users who may have session data. I'd call it something besides replication .. maybe just "session redundancy" .. since replication can imply that it is replicated across the entire cluster.
    Changes to sessions are copied off-node as soon as is practicable.
    We've seen different approaches used for this. WebSphere typically does it on a delay for example, so you still have a window of time that recent updates to sessions will be lost.

    Some servers do it at the end of (or by the end of) the HTTP request.

    Some servers do it as the data changes, within the request processing itself. That's probably the most expensive way in terms of CPU and network.
    If the original node dies or is shut down, one of the copies can be promoted and take over from the original so that the user's session is transparently maintained. This is useful functionality if you can afford the substantial overhead in terms of extra hardware and reduced performance that replication necessarily involves.
    It depends what you are comparing to. A lot of apps store the sessions in the database; that is really expensive (cost and performance) for any site that has any load.

    OTOH, comparing to the ability to just manage sessions locally and lose them on node failure, then everything else is more expensive ;-)
    There is also an increased risk factor, as what were unrelated processes are now interconnected by a complex piece of middleware that may become a single point of failure.
    You mean the database?
    A clean shutdown of a Tomcat or Jetty running a webapp whose session manager has been configured to be WADI will evict all sessions within it to a shared store.
    What do you support? Databases? Shared disk? Both?
    A subsequent request, directed to a different node (since the session's original host is now unavailable), will find the session in this store and load it transparently underneath the incoming request.
    That does mean that there will be an extra cost for every request that has a non-locally-active session ID, including old / expired IDs, right?
    If a node receives a request for which the session is not held locally or in the shared store, it can broadcast a query for its location, then open a connection direct to the responding node and demand the session's migration to it down this connection.
    Same issue as above. That sounds like it could create scalability issues. This is optional?

    Anyhow, I'm interested in which approach(es) you are taking and what your reasons were.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  5. Thanks for the posting - I'll do my best to cover your points...

    As far as differentiating itself from Tomcat offerings - particularly Filip's...

    TC offers a number of different session managers, written by different authors and offering various session persistence and replication mechanisms. In writing WADI, I considered all these (and more) approaches, then rationalised and unified them into a single pay-as-you-go solution - WADI.

    TC offers both persistent and replicated session managers. WADI combines these into a single manager that can leverage both approaches to support the other. Both approaches suffer from a common problem, which I am not convinced has been adequately resolved in TC - "How can the container know when it is safe to serialise a session?" - This was the first problem that WADI addressed (A RWLock in every session - app requests take a R-lock, container threads take a W-lock). Once this is sorted, the eviction of inactive sessions to long-term storage or their marshalling up and copying off-node at any time becomes a possibility.
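
    A minimal sketch of the lock-per-session idea just described, assuming java.util.concurrent's ReentrantReadWriteLock as the RWLock. LockedSession and its methods are hypothetical names, not WADI's classes. Request threads share the read lock, so they do not block each other, and the container serialises only when it can take the write lock, that is, when no request is using the session.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical class for illustration; not part of WADI.
class LockedSession {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // The map is itself thread-safe because several requests may hold the read lock at once.
    private final ConcurrentHashMap<String, Serializable> attributes = new ConcurrentHashMap<>();

    // Application side: run one request while holding the shared (read) lock.
    void withRequest(Consumer<Map<String, Serializable>> request) {
        lock.readLock().lock();
        try {
            request.accept(attributes);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Container side: serialise only if no request is currently using the session.
    // Returns null when the session is busy, so the caller can retry later.
    byte[] trySerialize() {
        if (!lock.writeLock().tryLock()) {
            return null;
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(attributes);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

    Using tryLock rather than lock is a design choice of this sketch: an eviction thread that finds the session busy simply skips it and tries again on its next sweep, instead of stalling behind a long request.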

    As far as Filip's manager goes, and other upcoming work that I am aware of - these are solutions based on in-vm replication. Most of them replicate to all other nodes in their cluster. This doesn't scale, so you have to manually partition the cluster into smaller pieces. This complicates load-balancer coordination etc. WADI has the beginnings of a dynamic topology manager with pluggable topologies. This will be responsible for organising WADI nodes into a self-arranging, self-healing substrate upon which the WADI session manager will replicate its sessions. If you shut down the whole cluster, sessions will revert to their persistent state, in long-term storage - in a purely replicated scenario, all the sessions would evaporate with the last node to exit - assuming that it had enough RAM to contain them.

    Evicting inactive sessions to disk is a little more complex than it looks at first glance, because you are in a cluster and, if session affinity breaks down (which WADI is built to withstand, since this happens on node shutdown) you end up in situations where one node might be evicting a session to shared store and another might be looking for it. You have to be quite precise about the ownership of the session and migrate this around as atomically as possible. You also have to be sure that the session is inactive - I describe WADI's solution above.

    I'm happy to rephrase 'replication' to 'redundancy' - we are on the same wavelength...

    As far as approaches to session redundancy go, there are many different points at which you could ship your copy - there is also the question of what you ship, whole session or delta (although the scope of object identity becomes an issue in deltas, because distributed sessions do not have remote semantics like EJBs). Redundancy is a goal of WADI, and one to which I have given serious thought. Whichever of these 'when' and 'what's WADI finally chooses to support, as well as the 'to-where's (one of which I describe above) shall be pluggable, so that users can choose the one that most closely suits their deployment requirements.

    In terms of 'where' WADI [will] send inactive and backup session[s] [deltas], only a filebased solution is currently implemented. The interface for storage, however, is clearly demarcated and I envisage adding a number of HA-backends (jdbc, WADI-cluster, JavaSpace, etc...) as soon as the direction of WADI's development becomes less vertical and more horizontal. Hopefully the community may donate some.
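
    To make the idea of a demarcated storage interface with a file-based implementation concrete, here is a rough sketch. SessionStore and FileSessionStore are hypothetical names and the interface is not WADI's actual one. The sketch also assumes that session ids are safe to use as file names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical storage contract; other back-ends (JDBC etc.) would implement the same interface.
interface SessionStore {
    void save(String sessionId, Serializable state);

    // Returns the stored state and removes it from the store, or returns null if absent.
    Serializable load(String sessionId);
}

// File-based back-end: one file per session in a directory shared by the nodes.
class FileSessionStore implements SessionStore {
    private final Path dir;

    FileSessionStore(Path dir) { this.dir = dir; }

    @Override
    public void save(String sessionId, Serializable state) {
        Path tmp = dir.resolve(sessionId + ".tmp");
        Path target = dir.resolve(sessionId + ".ser");
        try {
            // Write to a temporary file first, then move it into place,
            // so that a reader is unlikely to see a half-written session.
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(state);
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Serializable load(String sessionId) {
        Path file = dir.resolve(sessionId + ".ser");
        if (!Files.exists(file)) {
            return null;
        }
        Serializable state;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            state = (Serializable) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        try {
            Files.delete(file); // the loading node now owns the session
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return state;
    }
}
```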

    WADI is designed on a pay-as-you-go basis, so, yes, you pay if a request lands on a node that does not own the relevant session. You pay in terms of having to load it from store or organise migration of its ownership from another node. However, this is the exceptional case. You do NOT pay in the usual case, where the session affinity support in your load-balancer routes a request to the node that owns its session. This means that WADI uses minimal resources (unless you switch on replication - NYI), until something like node shutdown or session-affinity breakdown occurs. Then, instead of giving up, it will try everything it can to locate the session and either route the request to the session or the session to the request.

    Currently, if you send WADI a request with a dead session id, it is costly. However I plan to either add timeouts to session cookies so that they evaporate at the same time as the session, or maintain a list of dead sessions, or both, to ameliorate this situation. Once again, though, this should be an exceptional case.
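
    The 'list of dead sessions' option could look something like the sketch below. It is purely an assumption about one possible shape, since the post only states the plan: DeadSessionList is a hypothetical name, and the caller passes in the current time so that the behaviour is easy to check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; remembers recently dead session ids so that a request
// carrying one can be rejected without searching the store or the cluster.
class DeadSessionList {
    private final Map<String, Long> forgetAt = new ConcurrentHashMap<>();
    private final long retentionMillis;

    DeadSessionList(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    // Record that a session has been invalidated or has timed out.
    void markDead(String sessionId, long nowMillis) {
        forgetAt.put(sessionId, nowMillis + retentionMillis);
    }

    // True if the id is known to be dead; entries past their retention period are dropped.
    boolean isDead(String sessionId, long nowMillis) {
        Long expiry = forgetAt.get(sessionId);
        if (expiry == null) {
            return false;
        }
        if (nowMillis >= expiry) {
            forgetAt.remove(sessionId, expiry);
            return false;
        }
        return true;
    }

    int size() {
        return forgetAt.size();
    }
}
```

    A node would consult isDead before falling back to the shared store or a cluster-wide query. Entries here are only dropped when they are looked up, so a real implementation would also need a periodic sweep to bound memory.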

    WADI functionality is layered, so that WADI can provide exactly what you require, without your incurring the cost of further functionality that you don't. If you don't want the ability to migrate sessions between nodes, you can unplug it.

    This all probably seems a little fragmented, because it answers disjointed questions rather than being planned as a coherent whole - so apologies there. I hope to soon find time to write some fresh doc for the project, which I will put up on the site. If you have any further questions, please get in touch and we can take it from there...

    Finally, I should mention that WADI is not an island, but part of the Apache Geronimo project. WADI replication will sit on top of ActiveCluster and ActiveMQ and be consumed by a number of Geronimo containers. I expect WADI technology to push back into the Geronimo clustering layer. Many aspects of Http and EJB session preservation are similar, and I have been talking to the OpenEJB team about sharing WADI solutions with them - So I have a bit of work to do :-)


    Jules
  6. WADI

    Hi Jules,
    Thanks for the posting
    Ditto.
    TC offers both persistent and replicated session managers. WADI combines these into a single manager that can leverage both approaches to support the other. Both approaches suffer from a common problem, which I am not convinced has been adequately resolved in TC - "How can the container know when it is safe to serialise a session?" - This was the first problem that WADI addressed (A RWLock in every session - app requests take a R-lock, container threads take a W-lock). Once this is sorted, the eviction of inactive sessions to long-term storage or their marshalling up and copying off-node at any time becomes a possibility.
    Right .. we had to do the same thing. Do you just lock locally, or across the cluster? (Since we don't require sticky load balancing, we have to manage the locks at the cluster level .. but it's built into our API so it was easy.)
    As far as Filip's manager goes, and other upcoming work that I am aware of - these are solutions based on in-vm replication. Most of them replicate to all other nodes in their cluster. This doesn't scale, so you have to manually partition the cluster into smaller pieces.
    Yup. That's correct, and looks a bit hackish, inflexible and hard to manage no less.
    Evicting inactive sessions to disk is a little more complex than it looks at first glance, because you are in a cluster and, if session affinity breaks down (which WADI is built to withstand, since this happens on node shutdown) you end up in situations where one node might be evicting a session to shared store and another might be looking for it. You have to be quite precise about the ownership of the session and migrate this around as atomically as possible. You also have to be sure that the session is inactive - I describe WADI's solution above.
    Exactly. Plus, sessions have multiple states. For example, the events for created / destroyed indicate initial / final state transitions signified by the session listener. Then you have active / inactive states signified by the activation listener. Then you have binding events signified by the HttpSessionBindingListener and HttpSessionAttributeListener. Where those events get delivered is an interesting question in a cluster. For example if you page sessions to disk, and they expire (e.g. 30 minute lifetime) then who expires them? What server(s) get those events? ;-)
    I'm happy to rephrase 'replication' to 'redundancy' - we are on the same wavelength...
    Yeah, with Coherence*Web we just called it "session management" to avoid specifying the redundancy topology. (It is configurable, so it _could_ be fully replicated if you want, but by default it is automatically partitioned.)
    Redundancy is a goal of WADI, and one to which I have given serious thought. Whichever of these 'when' and 'what's WADI finally chooses to support, as well as the 'to-where's (one of which I describe above) shall be pluggable, so that users can choose the one that most closely suits their deployment requirements.
    Yup. I've seen different approaches. We figured that everyone would take the "never lose any data" approach if they had the choice and it didn't cost too much in terms of scalable performance. Building it on top of our partitioned cache service gave us linear scale out past 100 servers, and the HTTP session data is backed up by another server before the HTTP request even completes. The only weirdness is that HTTP session data isn't transactional, so there's no "commit" mechanism per se between the client (the browser) and the HTTP session. (This is a problem even for apps that run on only one JVM, since it's impossible to be sure whether the client got the page back or not.)
    In terms of 'where' WADI [will] send inactive and backup session[s] [deltas], only a filebased solution is currently implemented. The interface for storage, however, is clearly demarcated and I envisage adding a number of HA-backends (jdbc, WADI-cluster, JavaSpace, etc...) as soon as the direction of WADI's development becomes less vertical and more horizontal. Hopefully the community may donate some.
    If you need any help on a Coherence back-end, let me know :-)
    Currently, if you send WADI a request with a dead session id, it is costly. However I plan to either add timeouts to session cookies so that they evaporate at the same time as the session, or maintain a list of dead sessions, or both, to ameliorate this situation. Once again, though, this should be an exceptional case.
    Yup, that's what I figured.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  7. WADI

    Locking : I only lock locally - the session is its own lock. This avoids the overhead of a distributed lock manager, or if you like, it exists, but its locks are distributed between the nodes, each of which is a very specialised lock manager, with locks that migrate on and off it...

    We are agreed on the TC managers then :-) It would be unfair to be too critical. I make the point in the WADI FAQ (I think), that whereas the TC guys have to concern themselves with the building of an entire servlet container, I can spend all my time thinking about the problems of managing state in a cluster. These are two very different problem spaces, so I would expect WADI to have a distinct advantage here...

    The event types specified by the spec make perfect sense until you switch on the distributable tag :-), then you find that whilst you may get notification of a session's creation on one node, you hear about its destruction on another node..etc... Resources which you might be managing via e.g. BindingListeners need to be distributed etc.. (WADI has to make a special effort to reload sessions that have expired in longterm store, simply so that it can make the correct notifications on a node somewhere in the cluster) You have to completely shift mindset. This will, I think, be a stumbling block for many legacy webapps as they are ported to clustered deployments. There are lots of listeners that WADI has to service. All this notification baggage clutters up the code, so WADI currently uses AspectJ to weave in notification as well as e.g. validation and replication aspects.

    Re 'when', I will probably support 'immediate' (your "never lose any data"), 'request-group' (at the end of an overlapping set of requests) and 'timeout' (a period after which any changes are guaranteed to have been flushed to the replication layer). I'm still not entirely happy with the semantics around 'immediate', because there will be app threads running in the container and it will be hard (and possibly not worthwhile) to try to isolate the session from them in order to replicate either just the invocation causing the change, or the whole session... The lack of transactionality, and therefore isolation of the change from the rest of the session contributes to this issue.
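
    As an illustration of the 'request-group' policy only, the sketch below flushes a session to the replication layer when the last of an overlapping set of requests finishes, and only if something changed. RequestGroupTracker is a hypothetical name, and the flush action is passed in as a Runnable because the replication layer itself is not yet implemented.

```java
// Hypothetical helper; one instance per session.
class RequestGroupTracker {
    private final Runnable flush; // whatever ships the session (or a delta) off-node
    private int inFlight;         // requests currently using the session
    private boolean dirty;        // has the session changed since the last flush?

    RequestGroupTracker(Runnable flush) {
        this.flush = flush;
    }

    synchronized void requestStarted() {
        inFlight++;
    }

    synchronized void sessionChanged() {
        dirty = true;
    }

    // Flush once the overlapping group of requests has drained and there is something to ship.
    synchronized void requestFinished() {
        if (inFlight == 0) {
            throw new IllegalStateException("requestFinished without requestStarted");
        }
        inFlight--;
        if (inFlight == 0 && dirty) {
            dirty = false;
            flush.run();
        }
    }
}
```

    In terms of this sketch, 'immediate' would amount to calling the flush from sessionChanged, and 'timeout' to calling it from a timer whenever the dirty flag is set.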

    I shall have to look at Coherence now ! My todo list grows longer and longer :-)

    I'm still thinking about the dead session id thing - I think that explicit knowledge of all the nodes in the cluster, which WADI currently does not have, but will as soon as the replication stuff is integrated, will help here. Ultimately I think a list of dead sessions, which time out after a prolonged period, may be needed, but I have tried to avoid the extra overhead so far - we will see how much of a problem this is.

    Thanks for the questions, I was beginning to think that I was the only person on the planet who thought about this sort of thing in such obsessive detail...


    Jules
  8. without session replication ?

    This sounds interesting ....

    I have had problems getting the Tomcat 5 session replication to work with a complex web app, so this sounds like a possible alternative.

    Forgive my ignorance, but I don't really understand how this would work without session replication (which is not yet implemented). How do you maintain consistent session state across all nodes in the cluster ?

    The only way I can think of is that the request is intercepted and the session ID used to direct the request to the node on which the session resides. If this is the case then doesn't this make the load balancer kind of obsolete ?

    (I'm using a hardware load balancer by the way)

    Regards,
    Greg
  9. without session replication ?

    The session only lives in one place (the servlet spec is very specific about this), so it is fragile - if its VM dies, it evaporates. This does not preclude the possibility of migrating the session from vm to store, from store to vm and from vm to vm, or of proxying or redirecting a request from the node on which it lands to the node which owns the session.

    In a non-replicating deployment, WADI can use all of the above tricks to ensure that, if a breakdown in session affinity occurs, requests will still be processed in the same vm as their corresponding session. This is particularly useful at node shutdown, when a node will disappear from underneath an unsuspecting load-balancer, which may then randomly place subsequent requests for the same session all over the rest of the cluster until affinity is reformed.

    Many deployments currently use the 'hammer' of fullscale session replication to crack the 'nut' of convenient node shutdown/maintenance without session loss. WADI currently provides an ergonomically shaped pair of nutcrackers with which this problem can be more efficiently resolved.

    WADI will provide full replication functionality in future, so that if you are dealing with a coconut, we still have the solution :-)

    I hope that makes things clearer :-)


    Jules