Discussions

News: New Article: "In Memory Session Replication In Tomcat 4" Posted

  1. In this article we will discuss the theory behind HTTP session replication and clustering, and how it is used within the J2EE model. In the second half of the article, an example is provided of how session replication across a cluster can be implemented using Tomcat in conjunction with JavaGroups, an open source 'group' communication toolkit.

    Read In Memory Session Replication In Tomcat 4.

    Threaded Messages (96)

  2. Thanks for reading my article. I hope you have enjoyed it.

    If you actually want to try this code out, please make sure that you don't have any line feeds in your protocol stack configuration,

    that is, the protocolStack attribute in your <Manager> element.

    /Filip
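    To make the line-feed warning concrete, here is a sketch of what the <Manager> element should look like with the whole protocolStack value kept on a single line. The stack parameters are the ones from the example discussed later in this thread; treat the addresses and timeouts as placeholders for your own network:

    ```xml
    <!-- Goes inside the <Context> for your web application in server.xml.
         The entire protocolStack value must stay on ONE line: any line feed
         inside the attribute value breaks the JavaGroups stack parsing. -->
    <Manager className="org.apache.catalina.session.InMemoryReplicationManager"
             protocolStack="UDP(mcast_addr=228.1.2.3;mcast_port=45566;ip_ttl=32):PING(timeout=3000;num_initial_members=6):FD(timeout=5000):VERIFY_SUSPECT(timeout=1500):pbcast.STABLE(desired_avg_gossip=10000):pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):UNICAST(timeout=5000;min_wait_time=2000):MERGE2:FRAG:pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false)"/>
    ```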

  3. It's a great article. As someone new to this, could you please let me know how I should resolve the following error?
    java.lang.NoSuchMethodError
    at org.apache.catalina.session.InMemoryReplicationManager.createSession(InMemoryReplicationManager.java:194)
  4. Use Tomcat 4.0.3.
    It solved my problem.
  5. I just installed two instances of Tomcat 4.0.3 and configured Apache using mod_jk. Load balancing is now working fine (without session replication). After this
    I added the <Manager> tag (the default specified on the site) to server.xml on both servers.

    <Manager className="org.apache.catalina.session.InMemoryReplicationManager" protocolStack="UDP(mcast_addr=228.1.2.3;mcast_port=45566;ip_ttl=32)
    :PING(timeout=3000;num_initial_members=6):FD(timeout=5000):
    VERIFY_SUSPECT(timeout=1500):pbcast.STABLE
    (desired_avg_gossip=10000):pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):
    UNICAST(timeout=5000;min_wait_time=2000):MERGE2:FRAG:pbcast.GMS (join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false)">
    </Manager>

    Now the server is not serving the page; nothing comes back.
    Do I have to do something else?
  6. Hi,
    I have the same problem. I tried to combine this session replication approach with load balancing using the AJP13 connector and mod_jk. I added the mentioned "<Manager..." line with the protocolStack attribute unchanged to both of the Tomcat instances. When I try to browse http://localhost/examples/servlet/SessionExample, what I get is a "...NullPointerException at org.apache.catalina.session.InMemoryReplicationManager.createSession(InMemoryReplicationManager.java:212)..." in the localhost_examples_log file.

    I would appreciate it if anyone could help...

    Baris...
  7. Hi Baris,

    It is open source, so you will have the source code to debug the problem. Drop Filip an email if you find it so he can update his example. (You will find his email higher up in this thread.)

    Also you should evaluate the Coherence solution that I provided a link to above. If you have any problems with it, let me know.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  8. Did anybody try this?

     Please help me with the configuration.
     Here are the steps I did:
     
     1)
     
     I installed Tomcat 4.0.3 twice, under different names,
     as
         c:\tomcat1
         c:\tomcat2
     Here I didn't change the port for tomcat1 in /conf/server.xml,
     but I changed the port for tomcat2 in /conf/server.xml from
     8080 to 8088.
     
     2) I copied javagroups.jar and tomcat-javagroups.jar
        in both tomcats
           tomcat1/server/lib/
           tomcat2/server/lib/

     3) I opened both Tomcats' server.xml files
           tomcat1/conf/server.xml
           tomcat2/conf/server.xml
        and pasted in the code, i.e.
           <Manager className="org.apache.......>
       
        I think I may have made a mistake here: I don't
        know where to put the above <Manager..> tag
        in server.xml. I put it on top of the already
        commented-out <Manager...> tag under
        <Context path="examples"...>,
        because the article doesn't clearly say which context to use.
        There are 3 contexts in my server.xml:
        1. Tomcat Root Context
        2. Tomcat Manager Context
        3. Tomcat Examples Context
        and of these the Root Context is already commented out.
        Should any other context be commented out...?

      4) I installed Apache_1.3.23-win32-x86-no_src in
         c:\
         and put the following lines in apache/conf/httpd.conf:

    "ProxyPass /examples1 http://localhost:8080/examples
     ProxyPassReverse /examples1 http://localhost:8080/examples
     ProxyPass /examples2 http://localhost:8088/examples
     ProxyPassReverse /examples2 http://localhost:8088/examples"

         
     5) I started the 2 Tomcats like this:
         c:\tomcat1\bin\startup
         c:\tomcat2\bin\startup
        But only one Tomcat is running, on 8080; the other
        is not. Also, 8080 shuts down when I issue shutdown
        on the command line in either tomcat1 or tomcat2.

        What is wrong, and what should I do? If anybody
        can explain, I will be very thankful.
        Awaiting your reply...
      

        thanks. ( venu_y2k at yahoo dot com )
  9. Tomcat 4.1.18 Trouble

    I read the article and am trying to get everything to work with tomcat 4.1.18. I got over the initial hurdles of having to drop the JMX lines at the beginning of server.xml. However, as soon as I put in the Manager section, the context will no longer start up. No exceptions are thrown and the final line in the context log file is this:

    2003-01-21 17:32:17 NamingContextListener[/Standalone/localhost/mycontext]: Resource parameters for UserTransaction = null

    The server will then not respond, and it will also not shut down anymore since the startup failed to initialize the shutdown port.

    I'm not sure how much I would have to edit the protocolStack section within the Manager tag. I just changed the IP address to 127.0.0.1.

    Has anybody seen this before?
  10. Tomcat 4.1.18 Trouble

    I have a similar problem with Tomcat 4.1.8. It generates errors as below at startup.
    ServerLifecycleListener: createMBeans: MBeanException
    java.lang.Exception: ManagedBean is not found with InMemoryReplicationManager
            at org.apache.catalina.mbeans.MBeanUtils.createMBean(MBeanUtils.java:530)
            at org.apache.catalina.mbeans.ServerLifecycleListener.createMBeans(ServerLifecycleListener.java:422)
            at org.apache.catalina.mbeans.ServerLifecycleListener.createMBeans(ServerLifecycleListener.java:651)
    ....
    and then looping:

    java.lang.StackOverflowError
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:179)
            at org.apache.catalina.session.StandardSession.expire(StandardSession.java:601)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:149)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:179)


    There is no problem with tomcat 4.0.2 at all though. Does anyone know why?

    Regards,

    Alvin
  11. Great article (and thanks for the JavaSpaces work that you have done).

    The code in this article seems to take the approach of broadcasting changes to the entire cluster.

    BEA WLS takes the approach of choosing one backup server, and only sending state changes to that server.

    I wonder if anyone has done any testing, to see any performance trade off between both approaches?

    That would be an important experiment.

    Cheers,

    Dion
  12. Fantastic article - very interesting. I have just one question with regards to using Apache as a front end to multiple Tomcat engines. You reference the paper by Pascal Forget at http://www.ubeans.com/tomcat/index.html for a description of using Apache etc.

    This paper is also very useful, but it uses session affinity, such that established sessions are sent to the same Tomcat engine with each request. Your article obviates this need, and in fact session affinity is undesirable if the engine fails.

    Am I correct in thinking the session affinity is enabled with the jvmRoute=<engineTag> attribute for <Engine/> in the server.xml file ? If so, can we simply ignore the step in Pascals instructions for this to get the best of both worlds, i.e. Apache front end with JavaGroups session sharing ?

    Jon.
  13. Hi Jon,
    you are correct. The jvmRoute is used to uniquely identify a Tomcat instance when Apache and mod_jk sit in front.
    And you are also correct that it is not needed when you are using JavaGroups, since all the Tomcat instances will have the same set of session information.

    And if Apache is doing a round robin (not using sticky sessions) it will work as well.

    /Filip
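    For reference, this is roughly what the jvmRoute setting Jon asked about looks like in server.xml. The name "tomcat1" is just an illustrative worker name; if you do keep sticky sessions, it must match the corresponding worker name in mod_jk's workers.properties:

    ```xml
    <!-- server.xml on the first instance: jvmRoute uniquely identifies this
         Tomcat to mod_jk so sticky sessions can route back to it. With the
         JavaGroups replication approach described above, it can be omitted. -->
    <Engine name="Standalone" defaultHost="localhost" jvmRoute="tomcat1">
      <!-- Host and Context definitions as before -->
    </Engine>
    ```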

  14. This is a great article. Actually, we have a home-grown HTTP session clustering system running, using RMI as the communication protocol. Like BEA, we have a primary server and a secondary server for every session. The performance of the system is great even under very heavy load. However, the administration is not as easy. The solution provided in this article is a "transparent" solution. I have 2 questions:

    1. Has the performance of the solution been tested? Some numbers would help a lot.

    2. Does anyone know whether or not a similar solution will work for JRun?

    Thanks,

    Adam
  15. Nice article. I like the design, but I do fear the amount of network overhead would be quite expensive in a large clustered installation. Any benchmarks or performance testing done on this?
  16. Hi Adam and Clay.
    Both of you are making good observations. Since this implementation propagates session changes to all members in the cluster, I would not create large clusters. Instead, create cluster partitions of two or three Tomcat instances each.

    Also notable is that this implementation was done without changing any existing Tomcat source code. It can therefore be further optimized, in terms of serialization and how much data is actually transferred across, if Tomcat itself is optimized for the same. This is work in progress.

    I haven't made any benchmarks yet but will do as I move forward with a more generic implementation for Tomcat.

    thanks for your feedback
    /Filip
  17. Filip,
          I realize the issue with a large number of nodes in the cluster when the state information is replicated to all the nodes in the cluster. An alternate design could be one where each node replicates the session information to a configurable number of other nodes (say 1 or 2). The node can dynamically choose the individual nodes where the session is to be replicated, and the selection of the individual nodes can be different for different sessions.
          This way, the memory and network overhead on the nodes will not be significant. This would require using it with sticky load-balancing. But if a request goes to a node which does not have the state information, it can query the cluster for the session information (similar to how WebLogic manages it).
          Any comments on how this may be done with JavaGroups? Also, how would you compare this with the approach of defining multiple clusters?

    -Mahesh
  18. Dear Mahesh, thanks for your questions, I'll try to answer them in the same order they are coming in.

    >Filip,
    > I realize the issue with the large number of nodes
    >in the cluster when the state information is replicated to
    >all the nodes in the cluster. An alternate design could be
    >where the node replicate the session information to a
    >configurable number of other nodes (say 1 or 2). The node
    >can dynamically choose the individual nodes where the
    >session is to be replicated. The selection of the
    >individual nodes can be different for different sessions.

    > This way, the memory and network overhead on the
    >nodes will not be significant. This would require using it
    >with sticky load-balancing.
    >with sticky load-balancing.

    >But, if request goes to a node
    >which does not have the state information, it can query
    >the cluster for the session information (similar to how
    >WebLogic manages it).

    > Any comments on how this may be used with
    >JavaGroups. Also, how would you compare this with the
    >approach of defining multiple clusters?


    Question 1: Dynamic peer selection for replication

    This is possible to do with a minor change to my code. However, it becomes extremely hard for the load balancer to know where to direct a request that has to fail over, since the load balancer will not know who the secondary server (peer) is. And if you are using a hardware load balancer, which is pretty common, this will be hard to configure, because the load balancer will not know which server is the secondary.

    Therefore, it is easier to designate the target for the secondary server up front. You can do that in several ways.

    One is to simply dedicate a single multicast address to each cluster partition. Another is to set up JavaGroups to use TCP instead of multicast, and thereby send the replication data directly to a peer selected up front. JavaGroups offers great flexibility in messaging.

    I think I answered all your questions, if not, just let me know and I will try to clarify this for you.

    /Filip
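    As a rough illustration of the TCP alternative Filip mentions, a JavaGroups stack can be built on TCP with an explicit peer list instead of IP multicast. The protocol parameters and host names below are illustrative assumptions, not a tested configuration; consult the JavaGroups documentation for the exact properties your version supports:

    ```xml
    <!-- Hypothetical sketch: replace UDP multicast with TCP plus TCPPING,
         naming the designated peer(s) up front. node1/node2 and all
         parameter values are placeholders, kept on one line as required. -->
    <Manager className="org.apache.catalina.session.InMemoryReplicationManager"
             protocolStack="TCP(start_port=7800):TCPPING(initial_hosts=node1[7800],node2[7800];port_range=1;timeout=3000;num_initial_members=2):FD(timeout=5000):VERIFY_SUSPECT(timeout=1500):pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false)"/>
    ```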
  19. <quote>
    However, it will become extremely hard for the load balancer to know where to direct a request that has to fail over since the load balancer will not know who the secondary server (peer) is.
    </quote>

    As far as I know this is how session replication in WebLogic 6.x works: for the load balancer it is not necessary to know which server was the secondary. This is exactly what allows it to work with hardware load balancers.

    --
    Dimitri
  20. Dear Dimitri,

    >As far as I know this is how session replication in
    >WebLogic 6.x works - for the load balancer it is not
    >necessary to know which server was the secondary. This is
    >exactly what allows it to work with hardware load balancers.

    Complete documentation of how WebLogic does session replication, and how to configure it with hardware load balancers, can be found at
    http://edocs.beasys.com/wls/docs61/cluster/servlet.html#1008984

    thanks for your comments,
    /Filip
  21. Filip,

    You mention running into an issue with a dynamic replication group. Wouldn't you run into the same issue with a hardware load-balancer when designating a static replication group? Well, I suppose you can work around it by configuring the load-balancer with the knowledge and intelligence to fail over only to the replication group. Is this what you had in mind?

    However, with a static replication group, if all the nodes of the replication group were to go down (not necessarily at the same time, but within some time interval), then you run into QoS issues. This is not an issue if you are choosing the replication group dynamically (and differently for different sessions). Of course, if all the servers of the replication group fail at the same time, then you have the same issue.

    With dynamic peer selection, you would want to definitely use sticky load-balancing. This can be easily configured on load-balancer using cookies or session parameters.

    Also, if the primary node were to fail, the load-balancer could send the request to one of the remaining live nodes. If that node does not have the session info, it can request it from the cluster, where the secondary node can satisfy the request. From that point on, the node the request went to becomes the primary node. Of course, this requires additional logic, and requires modifying the servlet container to obtain the session info from the cluster.

    Thanks,
    -Mahesh

  22. Hi Mahesh,

    >You mention running into an issue with a dynamic
    >replication group. Wouldn't you run into the same issue
    >with a hardware load-balancer when designating a static
    >replication group? Well, I suppose you can work around it
    >by configuring the load-balancer with the knowledge and
    >intelligence to fail over only to the replication group.
    >Is this what you had in mind?

    I'm definitely not a load balancing expert, hence I won't say what will or won't work. Certainly the clustering code that I wrote can easily be modified to send replication data to individual nodes in the cluster, chosen with some selection algorithm.

    The hardware load balancers I worked with in the past were definitely not rocket science. The way they worked was like this:
    1. You configure the load balancer by giving it a static number of routes it can take.
    2. When a request hits the load balancer, it chooses one of the routes.
    3. It performs a health check on that route (it will check whether a port is open on a certain server).
    4. If the health check fails, it will take the next route and repeat the check until it finds a server that responds positively.

    This means that, yes, if all nodes in a cluster are gone, there is nowhere to forward the request to. And with the hardware load balancers that I worked with, you couldn't program them to query the cluster for information. BEA's Apache module can probably do this, but that is software load balancing, customized to their cluster. You couldn't put BEA's Apache module in front of a Tomcat cluster, because they don't have an interface to talk through.

    But in the scenario where nodes go up and down, that shouldn't be a problem with Tomcat/JavaGroups, since when a node comes online, the very first thing it does prior to announcing that it is ready is to retrieve the entire session state from another node. This is an example of where the peer-to-peer interaction happens.

    I just realized that I'm digging into load-balancing territory; my apologies, I'm not an expert in this field, so I can only speak from past experience.

    But I hope that I shed some light on your question, if not, just let me know and we will start over :)

    thanks for your feedback
    /Filip
  23. If I remember correctly, JBoss uses the JavaGroups package to implement its clustering/failover services. Does anyone know if similar network overhead problems will be present in the JBoss 3.0 release as a result?
  24. >If I remember correctly, JBoss uses the JavaGroups package
    >to implement its clustering/failover services. Does
    >anyone know if similar network overhead problems will be
    >present in the JBoss 3.0 release as a result?

    Clustering always means network overhead. You can't cluster and use fewer resources.

    But you can trust that the JBoss developers, on JBoss clustering, and I, on Tomcat clustering, will do everything we can to minimize the amount of data we send.
    The rest we leave up to you: if you put 18 servers in one cluster partition, you will get less bang for the buck than putting 2 servers in each of 9 cluster partitions.

    This is why your design and your implementation have the biggest effect on cluster performance, because hopefully the platform that you are building on top of will give you the options to cluster in different ways.

    I hope this helps
    /Filip
  25. <quote>
    If you put 18 servers in one cluster partition, you will get less bang for the buck than putting 2 servers in 9 cluster partitions.
    </quote>

    That is true when you replicate session state to all cluster members, but if you do point-to-point replication from primary to secondary only, then this is not an issue.

    --
    Dimitri
    <quote>
    If you put 18 servers in one cluster partition, you will get less bang for the buck than putting 2 servers in 9 cluster partitions.
    </quote>

    >When you replicate session state to all cluster members,
    >but if you do point-to-point replication from primary to
    >secondary only then this is not an issue.

    You are right: since the initial implementation replicates to all members in a cluster partition, you would want to divide the servers into 9 partitions. That way you achieve exactly the same thing. You can do that by configuring the protocolStack in your server.xml file.

    /Filip
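    Concretely, partitioning falls out of the protocolStack configuration: instances that should replicate to each other share a multicast group, and each partition gets its own. A sketch, reusing the stack from the article's example (the addresses, ports, and instance names are arbitrary illustrations):

    ```xml
    <!-- Nodes in partition A (e.g. tomcat1 + tomcat2), all on one line: -->
    <Manager className="org.apache.catalina.session.InMemoryReplicationManager"
             protocolStack="UDP(mcast_addr=228.1.2.3;mcast_port=45566;ip_ttl=32):PING(timeout=3000;num_initial_members=2):FD(timeout=5000):VERIFY_SUSPECT(timeout=1500):pbcast.STABLE(desired_avg_gossip=10000):pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):UNICAST(timeout=5000;min_wait_time=2000):MERGE2:FRAG:pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false)"/>

    <!-- Nodes in partition B (e.g. tomcat3 + tomcat4) use the same stack
         but a different multicast group, e.g.
         UDP(mcast_addr=228.1.2.4;mcast_port=45568;ip_ttl=32), so the two
         partitions never see each other's replication traffic. -->
    ```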
  27. Great article! We were writing a similar article (using Coherence instead of Javagroups of course!) to show the delta between replicated (local on all nodes) and partitioned (local on one or a subset of nodes) caches ... both ways can be completely transparent, just as you did with clustered (your example) vs. non-clustered (the Tomcat default).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  28. What do you guys think of the new JCache JSR? Will Coherence / JavaGroups be part of it? SpiritSoft Cache was demo'd at JavaOne, and they talked about many caching techniques (local caches, caches placed at different nodes, etc.) and how they can all work together to get great performance.
  29. Dion: "What do you guys think of the new JCache JSR? Will Coherence / JavaGroups be part of it? SpiritSoft Cache was demo'd at JavaOne, and they talked about many caching techniques (local caches, caches placed at different nodes, etc.) and how they can all work together to get great performance."

    I had a conversation with Bela Ban (JavaGroups founder) about this particular topic. I'm currently trying to join JSR 107 (JCache). It is good to be standardizing the caching concept (interoperability, portability, etc.), but I had some issues with the JSR as it was originally presented (I felt that some of the approaches were too complex and some items simply did not belong in the JSR).

    Regarding implementation, you can be sure that the final JSR will be implemented by (among others) Oracle (which submitted the JSR), SpiritSoft (SpiritCache is based on the current JSR spec), and Tangosol (Coherence), and there will undoubtedly be implementations based on JavaGroups (which is LGPL).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  30. Filip,
    First of all, a first rate article on clustering. Thank you!

    I had some questions/comments about some of your bullet points at the end of the article.

    Re: Bullet - Your session state is all you need:
    Great point. I keep seeing programmers stuffing every single thing into the session; I even saw some code which uses the session for parameter passing between servlet methods!! In reality a LOT of it can be passed simply through the request object, as you mentioned. Moving to a clustered environment would certainly unveil some sleazy coding practices. :-)

    Re: Bullet - Be aware of what you are storing in your session:
    I've found it very useful in my projects to organize/group together related attributes into classes. For instance, instead of saving each attribute relating to a user profile (login, access privileges, etc) as separate Strings, I would group them into a UserProfile object. This keeps this object easily manageable & used in a uniform manner (via some standard interface.)
    However, it looks like this approach may adversely affect session replication performance, huh? Though these objects individually may not reach sizes as great as the 1MB you mention, I wonder what the effect of such a coding practice would be. Any comments?

    Re: Bullet - Design with clustering in mind:
    I'm not sure if I understood why each Tomcat instance stores 50% more session data. On an average, at any point of time, ALL the sessions on each instance must be identical, right? So, regardless of the "distribution" of original user sessions, once the replication occurs all instances are identical as far as the sessions are concerned.

    In other words, if there are 300 users being served by two Tomcat nodes, each node will have 300 sessions. If this is increased to three Tomcat nodes, each node will still have 300 sessions, right?

    So, why would an instance store 50% more? As compared to what?

    Thanks for a great read!
    --Das
  31. Hi Das,
    thanks for your comments, very good indeed.


    >Re: Bullet - Be aware of what you are storing in your
    >session:
    >I've found it very useful in my projects to organize/group
    >together related attributes into classes.

    That sounds like a healthy thing to do. The reason it is actually better to store your UserProfile object, instead of storing the 10 attributes it consists of separately, is that each time you set an attribute, a cluster message is sent.
    Hence your network IO increases if you store separate attributes. Remember that JG adds some overhead to each message sent out. And usually objects like UserProfiles are small enough that grouping them is not a bad idea, and you can have the class implement the Externalizable interface and hence take care of the serialization yourself. The disadvantage is that changing just the age in the UserProfile object causes the whole object to be replicated.
     
    So you decide when to make the cut off.
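    A minimal sketch of the grouping idea, assuming a hypothetical UserProfile class (the field names and layout are illustrative, not from the article). Externalizable lets you control exactly what bytes go over the wire when the session attribute is replicated:

    ```java
    import java.io.Externalizable;
    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectOutput;

    // Hypothetical example: one session attribute holding the whole profile,
    // so one setAttribute call (one cluster message) replaces many.
    public class UserProfile implements Externalizable {
        private String login;
        private String[] privileges;

        // Externalizable requires a public no-arg constructor.
        public UserProfile() { }

        public UserProfile(String login, String[] privileges) {
            this.login = login;
            this.privileges = privileges;
        }

        public String getLogin() { return login; }
        public String[] getPrivileges() { return privileges; }

        // Write only the fields we need, in a fixed order.
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(login);
            out.writeInt(privileges.length);
            for (String p : privileges) out.writeUTF(p);
        }

        // Read them back in the same order.
        public void readExternal(ObjectInput in) throws IOException {
            login = in.readUTF();
            privileges = new String[in.readInt()];
            for (int i = 0; i < privileges.length; i++) privileges[i] = in.readUTF();
        }
    }
    ```

    In the servlet you would then call session.setAttribute("userProfile", profile) once per change to the object, instead of ten separate setAttribute calls, sending one replication message instead of ten.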


    >So, why would an instance store 50% more?
    >As compared to what?

    yeah, the 50% sentence is kind of complicated; I knew I didn't formulate it well enough.

    There are a couple of different ways of looking at it:
    1. 300 sessions per Tomcat instance: if you add a third server to the existing cluster of two, your cluster instantly uses 50% more memory (RAM).

    2. I.e., if you ran a cluster of 2 servers plus one server outside the cluster, the clustered servers would hold 200 sessions each and the independent server 100, an equal load of 100 users per server. If you add the independent server to the cluster, each of the original clustered servers has to take on those additional 100 sessions, hence an increase of 50%.

    >Thanks for a great read!

    You are very welcome! Thanks for the feedback.

    /Filip
  32. Hi Filip,

    > That sounds like a healthy thing to do. The reason this
    > is actually better to store your UserProfile object
    > instead of 10 attributes that it consist of separately,
    > is that for each time you set a attribute, it sends a
    > cluster message.

    If I group them into a UserProfile object, the above "cluster message" problem still persists: each time I change an attribute, the entire UserProfile object is replicated. :-) Yes, the protocol overhead "per message" may be reduced. I wonder what percentage of the "message" is protocol overhead and whether it is significant.

    > There are a couple of different ways of looking at it:

    Cool. I suspected as much. :-)

    Thanks for the reply!
    --Das
  33. If I am reading this correctly, I can use this to break the J2EE rule that two webapps can't share a session. So, can I use this to share sessions between more than one web application on the same Tomcat server?

    This could save me a whole lot of work integrating two web applications together, letting me "cheat" until the merge is complete.

    Am I right about this?

    -Pete
  34. >If I am reading this correctly, I can use this to break the
    >J2EE rule that two webapps can't share a session. So, can
    >I use this to share sessions between more than one web
    >application on the same tomcat server?

    Hi Pete,
    unfortunately this will not do it for you; at least I can't at this very moment see how it would work.
    The session ID in Tomcat is global to the entire servlet container, but it is the logic inside the Catalina code that separates sessions between different contexts.

    The javax.servlet.http.HttpServletRequest.getSession() method doesn't allow you to specify where you retrieve your session from.

    So I would say, no, I don't think this code will allow you to share sessions between webapps.

    /Filip
  35. Hear, hear for JavaGroups! I've used it at my previous company as the multicast protocol for our general-purpose cluster-wide data Smart Cache:

    http://www.theserverside.com/patterns/thread.jsp?thread_id=10610

    I've compared the reliability of JavaGroups with WL 5.1 JMS, and JavaGroups wins hands down because it is serverless yet has the guaranteed-delivery characteristics of JMS. I have yet to compare JG with the multicast implementation of WL 6.1/7.0 JMS topics, but regardless, it's refreshing to have an open-source alternative to expensive servers.

    Gene
  36. Hi all,

    Does anybody know of any 'serverless' JMS implementation that uses UDP multicast as its transport protocol?
    (Persistent messages are not a must.)

    Regards,
    Mileta
  37. Hi, Mileta

    Have a look at a serverless multicast product: i-Bus from Softwired (http://www.softwired-inc.com)

    Regards,
    Tibi
  38. Gene Chuang on April 11, 2002 wrote:
    Hear hear for JavaGroups! I've used in my previous company as the multicast protocol for our general-purpose cluster-wide data Smart Cache:

    Thanks Gene. You will be glad to hear that JavaGroups development has been moving forward and a lot of work has been put into performance enhancements lately, especially speeding up serialization internally.

    Filip
  39. Nice publicity for JavaGroups, but this replication approach is probably the least efficient and least scalable in-memory replication implementation imaginable.

    Thanks to Bela Ban and others for a great product, but this is probably the wrong demonstration of its powers.
  40. 1) You're my favorite superhero for doing this; I've been trying to find a way to cut back on BEA WLS licenses, and this may just be the ticket.

    2) I've seen a lot of people bashing the broadcast-to-all-servers approach... Has anyone gone the next step and adapted this to use primary/backup pairs? Obviously you'd have to write a new component, or update the current Apache component that talks to Tomcat (mod_jk?).

  41. Hi Jason,

    >1) You're my favorite super hero for doing this, i've been
    >trying to find a way to cut back on BEA WLS licenses and
    >this may just be the ticket

    well thank you.

    >2) I've seen a lot of people bashing the broadcast to all
    >server approach...

    the broadcast to all servers works well in small clusters/cluster partitions. I'm currently working on a primary/secondary approach as we speak. The two are not too far from each other, and the amount of code to add to the existing implementation is pretty limited.

    I'm actually thinking of making it an option in server.xml so that you can select which method you want to use.

    And to tell you the truth, there will be no need to rewrite anything on the Apache side. Give me a couple of weeks, and I will be back with something good :)

    thanks for writing me,
    Filip
  42. Hi,

    I was involved in session replication back in 1992 (ish), before this new-fangled web stuff. So, stepping back, we now have:

    a) Servers which are hot-swappable, so for failover read disaster recovery, i.e. someone nukes your building. Local hardware failures in a good server are really unlikely, and the cost/benefit of coding complexity against decent hardware means it isn't worth it, other than for disaster recovery.

    b) Disaster recovery means your machines are in different physical locations, so your network topology had better be optimised for clustering, i.e. your cluster traffic is not on your backbone.

    c) Network failure is a real issue. How is traffic routed between your failover boxes? What happens if your internal traffic dies, but the load balancer (layer) still believes both (or all) your servers are alive? What happens on network rejoin?

    What worries me about these 'clustering/fail over' solutions is that they are fundamentally missing the point. In terms of load balancing, the replication introduces as much processor load as servicing the requests. And the replication solution presumes that the server hardware is inferior to the networking hardware.

    Dual firewalls, dual switches, dual web servers, dual DB servers, in separate buildings. Now that is the real challenge, not a 'cluster' of servers on a LAN. Let's face it: stick this 'cluster' into a single case, and you have a multi-processor server with built-in fault tolerance. But then you are letting the hardware vendor do all your work and saving on software costs. Decent fault-tolerant PC servers are way cheaper than 2 or 3 software engineers, or even a WebLogic license.

    Jonathan
    Hmm. What an interesting article. I realize that your point is well taken, but I think you fail to realize that the original thread was addressing one aspect of the problem, not providing a holistic solution.

    You're right, you do need true disaster recovery. We do it with worldwide DNS (to partition load) at the top and full failover at every level, from web down to database.

    I like the idea of cluster/failover because I don't understand the philosophy of paying a million dollars a pop for ultra redundant super computer wannabes that can still have the power cable kicked by a maintenance guy.

    I'd much rather have 30 $5000 boxes sitting there. They are practically disposable. But licensing a 16 cpu machine is a lot cheaper than licensing 30 2 cpu machines.
  44. Jason: "I like the idea of cluster/failover because I don't understand the philosophy of paying a million dollars a pop for ultra redundant super computer wannabes that can still have the power cable kicked by a maintenance guy."

    That's a little misleading ;-) considering power setups ... but a good point nonetheless.

    Jason: "you do need to have true disaster recovery."

    Some sites do. The question is, what disasters are worth handling by up-front investment. Some sites (their "logic" and processes) are fine on a single CPU / single box / single power supply / single network. Other sites that we've seen since Sept 11 have been fully duplicated right down to the e10k's (expensive little database boxes).

    As you increase your requirements, your prices increase dramatically. How seamless is the failover? Mid-transaction? Session level? Or can you have 5 minutes of no accessibility while you swap which one is the primary and which is the backup (e.g. veritas et al stuff in the real world with geo dist). We've witnessed some of each, and having instant failover without losing in-flight transactions is expensive and (among other things) very hard to test.

    Jason: "I'd much rather have 30 $5000 boxes sitting there. They are practically disposable. But licensing a 16 cpu machine is a lot cheaper than licensing 30 2 cpu machines."

    Not cheaper to put in, but perhaps cheaper to administer / maintain. How about (best Dr. Evil voice) one million dollars?

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Very good comeback. BUT, I reckon hot-swappable PC servers retailing at under 20K each are fine. How many sites have millions of concurrent users? How many have 1000 concurrent users? Concurrent meaning actually clicking at the same instant?

    And PC power is still doubling, so how fast will a 20K server be in a year? In two years? 8GHz ain't far away. Clustering simply isn't worth it for 99.9% of sites.

    I've been reading up on single-server JBoss installations under LoadRunner handling 6000 requests per minute, running on a 933MHz box. How many real sites need more than this? Not page displays, but dynamic content handlers; the static web servers can handle GIFs etc. And if your site does need more than this, the real bottleneck is going to be the database anyway, and no matter what you do with EJB clustering you are going to have to have huge servers.

    And clustering means more cables for your cleaner to kick out :)

    Like the Sun Fire BTW.

    Jonathan
  46. One final point. I do not ever buy into the idea that 30 networked machines will be more resilient than a single decent server, with internal hardware designed for resilience.

    Thinking from every aspect. ie to manage, upgrade, architect, debug, install, power, cool, house, purchase, replace..... etc.

    I am playing devil's advocate a bit, 'cos I also know the performance of PC arrays is going to be astounding very soon. And I seem to remember that IBM recently produced an array of 300 PCs that was lightning fast. Can't remember where I read that.

    Jonathan

  47. Hi Jonathan,

    Jonathan: "I've been reading up on single server JBoss servers under load runner handling 6000 requests per minute, running on a 933MHz box. How many real sites need more than this? - not page displays, but dynamic content handlers - the static web servers can handle gifs etc."

    First, most high-scale apps do offload all static content to other servers (apache farms et al).

    Second, and here's the crux of the "scalable performance" issue, it is a rare occurrence that a real app handles 6000 requests per minute on a single CPU box. Why? The honest truth hurts, but here it is: real apps are big and complex and slow, because it's too hard to build a very well optimized big app. I've even seen apps that chew up multiple e10k's (90+% CPU util on all domains). They can be optimized to a point, but basically when it comes to optimizations, it's the law of diminishing returns. We're not talking about toy apps like the PetStore example ... we're not talking about requests that use less than 2ms processor time (e.g. 6000 requests per minute on a single CPU box). And more importantly, you are spot on that ...

    Jonathan: "And if your site does need more than this the real bottle neck is going to be the database anyway"

    That's exactly right. Almost all applications bottleneck on the database when they scale to extremes. And (in terms of the "scalable performance" issue) that's another area that clustering can help. For example, by moving data into the app server cluster while maintaining data integrity -- and removing that load from the database -- our Tangosol Coherence product can save customers hundreds of thousands of dollars on the database tier.

    Jonathan: "Like the Sun Fire BTW."

    I figured you would ;-). Doesn't run Quake though.

    Jonathan: "One final point. I do not ever buy into the idea that 30 networked machines will be more resilient than a single decent server, with internal hardware designed for resilience."

    Here is the even more difficult and important issue: availability and reliability. That's the bigger selling point for clustering with most of our customers -- how do they ensure up time and provide _predictable_ scalable performance. If you aren't up, it doesn't matter how fast you are (no double entendre intended). Clustering is all about reliability of service, because for some customers that "5 minutes" of downtime during business hours is _completely_ unacceptable.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  48. Hi,

    >For example, by moving data into the app server cluster >while maintaining data integrity.

    This is the bit I am not convinced by, and it's really where this thread started. All the J2EE boys are trying to distance themselves from the database and provide in-memory clusters of cached data. For some reason they believe that it's more efficient and resilient for them to do all this on two-year-old slow technology (Java app servers) rather than let the database replicate itself and rely on database optimisations.

    I know there is always scope for an in-memory cache for certain data types; I do this myself. Moving result sets into the client is super fast for the user and reduces load!

    But this fashion for downplaying 20 years of database technology in favour of a Java abstraction is very weird. Jobs for the boys, IMHO.

    Jonathan
    Jonathan: "This is the bit I am not convinced by, and it's really where this thread started. All the J2EE boys are trying to distance themselves from the database and provide in-memory clusters of cached data. For some reason they believe that it's more efficient and resilient for them to do all this on two-year-old slow technology (Java app servers) rather than let the database replicate itself and rely on database optimisations."

    It's a very simple concept:

    1) Databases don't scale well: it gets increasingly (exponentially?) expensive to scale a database, and there is a hard limit to how far it scales

    2) Data localization: application logic in the application tier performs better when the data it is operating on is available locally

    No amount of database optimizations will solve either one of the above. There are database caches (TimesTen etc.), OODBMS caches for RDBMS (Javlin etc.), Oracle cache appliances, ... why??? Because the database doesn't scale.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    May I ask why the "messaging" term is popping up here? I guess I don't understand how JavaGroups works, but I would think that once you change a session variable on one server, that change has to propagate to all other servers before you return control to the user. Otherwise the user may come back on another server before the replication took place and not find the changed value. Maybe less likely in the real world and more likely in a load-testing application? But I would like to believe that the clustering solution is theoretically correct as well. So shouldn't a clustering solution be synchronous in nature?

    Database throughput and clustering information can be found on the TPC site.

    Caching is a different thing. I would distinguish between reading data and changing data -- session or persistent (or both!). So caching the read data is great, and I would never replicate that to all servers. If something is not in the cache of the current server the user was hitting, go read it again. The only issue is that you might want to use that replication multicasting technology to invalidate the cached data of all servers once it is changed on one of them.

    I agree with Jonathan, let the database do its transaction management, reliable updates and possibly clustering. If you want complete session failover, put the session data in the database. If you keep session data in memory, use sticky sessions (always go back to the same server).
  51. Hi Christo,

    Christo: "May I ask, why is the "messaging" term popping up here? I guess I don't understand how JavaGroups work ..."

    JavaGroups is a messaging platform. It's somewhat higher level than multicast UDP and somewhat lower level than JMS.

    Christo: "I would think that once you change a Session variable on one server, that one has to propagate to all other servers before you return the control to the user. Otherwise the user may come back on another server before the replication took place and not find the changed value."

    Even worse than that. Multiple requests per user can be in flight in parallel ... e.g. multiple clicks (gee, this credit card transaction is taking a while to process, maybe I'll just click again a couple of dozen times), HTML frames (however many frames there are, that many requests will be sent in parallel by most browsers), etc.
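    As a toy illustration of the hazard described here, the following sketch (class and attribute names are my own invention, not from the article's code) has parallel requests updating one session attribute; an atomic merge keeps the count correct where a naive read-then-write on a plain HashMap could lose updates:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a session's attribute map hammered by parallel requests.
public class SessionRace {
    static final Map<String, Integer> attributes = new ConcurrentHashMap<>();

    public static int hammer(int threads, int perThread) {
        attributes.put("hits", 0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // merge() is atomic on ConcurrentHashMap; a separate
                    // get()+put() pair here could lose updates when requests overlap.
                    attributes.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return attributes.get("hits");
    }
}
```

    The same overlap happens across cluster nodes, which is why the replication layer has to define its concurrency semantics rather than hope requests arrive one at a time.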

    Christo: "Maybe less likely in the real world and more likely in a load-testing application?"

    No. Real world users are infinitely more devious than load testers.

    Christo: "But I would like to believe that the clustering solution is theoretically correct as well. So shouldn't a clustering solution be synchronous in nature?"

    You've got to figure that out yourself, since the Servlet spec doesn't specify that aspect of the behavior.

    Christo: "Caching is a different thing. I would distinguish between reading data and changing data -- session or persistent (or both!). So caching the read data is great, and I would never replicate that to all servers."

    It depends. Our Coherence product is used for read-only, read-mostly and even very dynamic data ... it provides a cluster-wide synchronized data store with concurrency control, so it compensates for the worries that you expressed (although it isn't transactional until our 2.0 release). If you want more detail, check out the benefits overview document (pdf).

    Christo: "The only issue is that you might want to use that replication multicasting technology to invalidate the cached data of all servers once it is changed on one of them."

    The Seppuku pattern (on this site somewhere) does just that using JavaGroups ... good call!
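    For readers unfamiliar with the pattern, the cache-side half can be sketched in a few lines. The transport that delivers the invalidation key (JavaGroups multicast in the Seppuku write-up) is omitted, and all names here are illustrative assumptions:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the receiving end of a multicast cache-invalidation scheme.
public class InvalidatingCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>();

    public void put(String key, Object value) { local.put(key, value); }

    // null means "not cached" -- the caller re-reads from the database.
    public Object get(String key) { return local.get(key); }

    // Called when an invalidation message for 'key' arrives from another node.
    public void onInvalidate(String key) { local.remove(key); }

    public boolean contains(String key) { return local.containsKey(key); }
}
```

    Only keys travel on the wire, never data, which is what makes invalidation so much cheaper than replication for read-mostly caches.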

    Christo: "I agree with Jonathan, let the database do its transaction management, reliable updates and possibly clustering."

    I hope you don't mind huge bills and slow apps ;-)

    Christo: "If you want complete session failover, put the session data in the database. If you keep session data in memory, use sticky sessions (always go back to the same server)."

    Databases can't handle the load even for session management in some apps. Use databases for what they're good for -- persistent storage of transactional data.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  52. Cameron,

    Thanks for your reply.
    I would like to ask if it is fair to compare Tangosol Coherence with JavaGroups and OSCache (http://sourceforge.net/projects/opensymphony/). If yes, could you please tell me the advantages of Coherence over JavaGroups+OSCache? If not, could you please tell me the differences?

    Thanks.
    Sam
  53. Sam: "I would like to ask if it is fair to compare Tangosol Coherence with JavaGroups and OSCache (http://sourceforge.net/projects/opensymphony/)."

    There are many cases where it is very fair to compare them, and many more cases where it is not. Start with these ServerSide threads:

    Tangosol releases Coherence: Distributed Cache Product
    A.C.E. Smart Cache: Speeding Up Data Access
    Seppuku pattern

    Sam: "If yes, could you please tell me what are the advantages of Coherence over JavaGroups+OSCache? If not, could you please tell me the differences?"

    JavaGroups is a messaging infrastructure, and it has some simple replication examples that come with it (DistributedHashtable I think). Dima used it (JavaGroups messaging) for his original Seppuku asynchronous cache invalidation implementation on Weblogic, for example. You can use JavaGroups as the basis for a clustering implementation, e.g. if you wanted to build a clustered cache or an HTTP session failover implementation (e.g. this article).

    OSCache has a great description of itself: "A JSP tag library and set of classes to perform fine grained dynamic caching of JSP content. It also has persistent on disk caches, and can allow your site to have graceful error tolerance (eg if an error occurs like your db goes down, you can serve the cached content so people can still surf the site almost without knowing)." It is a caching library, not a clustering implementation.

    For a description of Coherence, check out our Coherence product page. Coherence uses an underlying n-point peer-to-peer messaging protocol called TCMP, which is similar in nature to JavaGroups. It has higher performance (less latency) and throughput (7x-8x) with lower memory utilization (90% lower under load) and lower CPU utilization (50% lower under load), and it is designed and tested for reliability under massive load (i.e. 24x7).

    The Coherence product itself is a replicated data store that can be used as a synchronized map of data (synced across the cluster to solve data integrity issues) or a size-limited cache. Check out our overview document (.pdf). JavaGroups doesn't support this, but that's really a "doesn't support this yet": you can download JavaGroups, code the functionality, publish the code back, and then it will exist. ;-)

    Coherence is a commercial product which reflects man-years of research, development and testing, and Tangosol provides support and upgrades. JavaGroups is LGPL and OSCache has an Apache-compatible license, so the source is "free" (the one as in speech, the other as in beer).

    You can also research Oracle's caching solutions and SpiritSoft's ... they are both based on Oracle's JCache API which is currently being evaluated by the JCP (JSR 107) (which BTW I am still trying to join). According to several of our customers, Coherence is faster, more scalable and more reliable, but if you are doing multi-tier flow-down semi-connected (etc.) caching, I would suggest the SpiritSoft solution, and if you are an Oracle-only shop, I'd suggest the Oracle solution ;-).

    The net result? Whether you like sync or async, "free" or commercial, OS or memory based, vendor or sourceforge solution, there's a Java caching or replication solution waiting to be had. That's the great thing about this market.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  54. Hi,

    Could someone please tell me why the article uses JavaGroups for the communication instead of JMS? I would think JMS is more popular for messaging.

    Thanks for any comments.
    Sam
  55. Hi Sam,

    JavaGroups is a lower level messaging system than JMS. You could implement some JMS features on top of JavaGroups, for example.

    That said, you could use JMS topics to accomplish most of this functionality.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  56. Great article and interesting software!

    Two questions:

    - Is there a mechanism to keep the JavaGroups coordinator alive? In case the computer hosting the coordinator goes down..

    - When you are talking about network usage with several nodes, is it still true with dedicated physical network for JavaGroup?
    >Is there a mechanism to keep the JavaGroups coordinator

    >alive? In case the computer hosting the coordinator goes
    >down..

    There is always a coordinator. If the coordinator goes down, another node will assume the role of coordinator.

    >When you are talking about network usage with several
    >nodes, is it still true with dedicated physical network
    >for JavaGroup?

    I'm not sure what configuration you have in mind. I'm sure there are ways to optimize it, and if you have any ideas, we would love to hear from you!

    Filip
  58. Very interesting article,

    In your article you wrote "when a new server joins the cluster..." and gave a scenario with "When a third Tomcat instance TC3 is started up on the network....". What protocols enable you to achieve this fail-back, i.e. the "<get-all-sessions>", when a new node enters an already existing replication group?

    Regards
    Uri Lukach
  59. Hi Uri,
    >In your article you wrote "when a new server joins the
    >cluster..." and gave a scenario with "When a third Tomcat
    >instance TC3 is started up on the network...." what
    >protocols enable you to achieve this fail back, i.e.
    >the "<get-all-sessions>" when a new node enters to an
    >already existing replication group

    JavaGroups is a group communication protocol that handles membership notifications. So each manager gets notified that a new member has joined the group.

    But in my implementation it is actually the new member that asks a cluster node for the current state upon startup.

    Filip
    I just read your source code (the four files) and I am thinking about something you stated in the article: you said that you hadn't had to change code in the Tomcat source. I have great doubts about this: certainly somewhere in Tomcat there must be something like a "new StandardManager()" that had to be replaced with "new InMemoryReplicationManager()", right?

    Otherwise, how is this Manager created in the first place?

    Some other questions that I have are maybe generic to all this web/servlet/jsp server:
    What is the role of a "Manager" vs a "Principal"?
    What is a "Realm"?

    Thank you! You've done great work!
  61. In Tomcat there is a configuration file that specifies the session manager.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  62. Thanks Cameron,
    you're a solid backup :)

    Filip
  63. Filip,

    I just tracked down the exact setting. It's the Context/Manager element in the conf/server.xml file. See:

    http://www.tangosol.com/faq-coherence.jsp#tomcat4
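    For comparison, the article's JavaGroups-based manager is declared through the same element. Note Filip's earlier warning in this thread: the protocolStack value must be one continuous line with no line feeds (this sketch reuses the stack posted above):

```xml
<!-- Inside the <Context> element in conf/server.xml; keep the protocolStack
     attribute value on a single line with no line feeds. -->
<Manager className="org.apache.catalina.session.InMemoryReplicationManager"
         protocolStack="UDP(mcast_addr=228.1.2.3;mcast_port=45566;ip_ttl=32):PING(timeout=3000;num_initial_members=6):FD(timeout=5000):VERIFY_SUSPECT(timeout=1500):pbcast.STABLE(desired_avg_gossip=10000):pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):UNICAST(timeout=5000;min_wait_time=2000):MERGE2:FRAG:pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false)"/>
```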

    BTW we have posted the Tomcat-4 session manager implementation that plugs into the Coherence replicated cache. Check it out if you get a chance. (Source is available through our developer license, which is free.)

    One of the choices we had to make was how to do events on the session attributes (etc.) ... should they be done on every machine upon session expiry or should they just be done on the one machine that expires the session? Obviously either way is arguably correct, but we used to have headaches with Weblogic when they didn't have this working just right.

    Rob's probably going to write up how to configure the Apache load balancer for Tomcat4 ... it is easy to mess up, but works pretty well on Linux once configured correctly.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  64. Hi Cameron,
    I've been meaning to create an interface for Tomcat session replication independent of the underlying messaging system.
    This interface would be part of Jakarta-commons, and hence we could develop session managers as part of the Tomcat source. The Apache license and Javagroups' LGPL license are currently not compatible, so they say, hence if we abstract out the messaging interfaces, we could work within the Tomcat CVS to improve the session replication mechanism.

    My current gig ends next week; after that I will have some time to whip something together.

    let me know if you would be interested in collaborating on this with me.

    mail at filip dot net

    best
    Filip
  65. Hi Filip,

    I've emailed you the source for our Tomcat module. Let me know if it fits in with the interfaces that you're thinking about.

    Our implementation was done at a much finer grained level than yours. We don't serialize the session object at all (just the attributes, and those individually).

    We also set it up to do cluster-wide evaluation of timeouts, so any server that gets hit updates the last-accessed time and although the entire cluster is processing the sessions, the invalidation events will only occur on one machine.
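    The per-attribute idea can be sketched with plain JDK serialization. These class and method names are assumptions for illustration, not Coherence's API: only the one attribute that changed is turned into bytes and applied on the replica, rather than serializing the whole session object:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;

// Illustrative sketch of attribute-level (rather than whole-session) replication.
public class AttributeDelta {

    // Serialize a single attribute to the byte[] that would go on the wire.
    public static byte[] toBytes(String name, Serializable value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(name);
            out.writeObject(value);
            out.close();
            return buf.toByteArray();
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Apply a received delta to a replica's attribute map.
    public static void apply(Map<String, Object> attrs, byte[] delta) {
        try {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(delta));
            String name = (String) in.readObject();
            attrs.put(name, in.readObject());
            in.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

    The win is bandwidth: a session holding a large shopping cart and one frequently updated counter only ships the counter on most requests.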

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Tangosol Coherence: Clustered Coherent Caching for Java and J2EE
    Firstly, apologies for the long post, and thanks for a great article (and the code, of course!)

    I have been having trouble getting the replication happening across a multicast network (in fact I just tried it across two servers on the same box and it worked like a treat straight away) and was wondering if I was missing anything obvious, like a configuration switch or something.

    So far the following tasks have been performed:
    1. Followed the instructions as per the article, installed in separate VMware Linux clients.
    1a. Note: if a route is not available for the multicast packet (a default route will do), the server will freeze.

    2. Setup a network analyser, and hit the test page.
    2a. network analyser sees multicast packets, and the data field shows session id and other session information.
    2b. serverA does not register session for serverB

    3. Maybe VMWare is not supporting the multicast packet.
    3a. compiled Javagroups-2.0 so I could use the McastReceiverTest
    3.b multicast tests recommended in the article worked OK.

    4. Enable debugging for tomcat-javagroups
    4a. would like to see some more debug info (the debugs show that only local addresses are being listened to)

    5. Compile tomcat-javagroups.src.jar with some more logging lines.
    5a. Got more debug information, but could still not see any multicast packets from other than the localhost.

    This is where things got a little interesting. There is a line in the code under run() that says:
    //we are only interested in our own messages

    Unfortunately I'm out of my depth trying to read the actual code following it, so I don't know if it has anything to do with my problem or not.

    Has anyone else had similar issues? Is there any extra configuration required if I want to use multiple hosts? Are the hostnames (and their resolution) in the server.xml file significant for the replication process?

    Any help would be appreciated.

    Regards,
    Lyall
  67. Lyall,

    Check the TTL setting that is being used for multicast. You may need to raise (or occasionally even lower) it.

    For an explanation of TTL, see our FAQ:
    http://www.tangosol.com/faq-coherence.jsp#mcast
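    The ip_ttl=32 in the article's protocolStack maps onto the standard multicast socket option. A minimal JDK-only check of the setting looks like this (class and method names are illustrative):

```java
import java.net.MulticastSocket;

// Set and read back the multicast time-to-live on a UDP socket.
public class TtlCheck {
    public static int ttlAfterSet(int ttl) {
        try {
            MulticastSocket sock = new MulticastSocket();
            sock.setTimeToLive(ttl);   // 0 = host only, 1 = subnet, 32 = site
            int result = sock.getTimeToLive();
            sock.close();
            return result;
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

    Each router hop decrements the TTL, so a value that is too low silently drops replication traffic between subnets while everything still works on one box.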

    Whatever the problem is, when you figure it out, could you post back the solution?

    Are you using mod_jk? mod_jk2? Also, if you're doing testing, are you considering doing performance / load tests?

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  68. Cameron,

    Thanks for the reply. I've left the TTL at the default of 32, so it should be right (I tested moving it to 0 and the packets disappeared off the network). I'll try more tests with smaller increments than 32 ;). Also tried specifying a bind_port and bind_addr but that just hung the server.

    However, I didn't install mod_jk (I didn't know what it did so it did not get installed). Sounds like a mistake. I'll try that and get back to you.

    Yes, after getting this going I am planning some performance tests. It will be interesting to see how far this thing will go, and how much traffic it will generate. Given most servers run 100M full-duplex in a server environment, I'm not expecting network load to be a problem, more how much processor time serverB will spend processing serverA's updates.

    If I get any decent results I'll put them up.
  69. Lyall: "I'll try more tests with smaller increments than 32 ;). Also tried specifying a bind_port and bind_addr but that just hung the server."

    Odd. So the JavaGroups multicast test stuff worked, but not the JavaGroups Tomcat plug-in. There are mailing lists on the SourceForge site for JavaGroups ... try those.

    Lyall: "I didn't install mod_jk (I didn't know what it did so it did not get installed). Sounds like a mistake."

    It's the load-balancing module. It allows an array of Apache servers to balance (including sticky) to a cluster of Tomcat servers running the Javagroups or the Tangosol Coherence clustering module.

    Lyall: "Yes, after getting this going I am planning some performance tests. It will be interesting seeing how far this thing will go, and how much traffic it will generate."

    Cool. If you haven't had a chance yet, make sure you download and include Coherence 1.1.3.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Tangosol Coherence: Clustered Coherent Caching for Java and J2EE
  70. Cameron,

    Debugging shows that the problem is somewhere in the multicast group merge protocols.

    It attempts it:

    --------=>Module=MERGE2.FindSubgroups.run()<=-------------
    Message=found multiple coordinators: [atom:32806, a:1035]; sending up MERGE event

    ------------------------------------------------------------
    --------=>Module=CoordGmsImpl.merge()<=-------------
    Message=coordinators in merge protocol are: [a:1035, atom:32806]

    ------------------------------------------------------------

    but fails:
    --------=>Module=UDP.receive()<=-------------
    Message=received (mcast) 694 bytes from /192.168.174.130:45566 (size=694 bytes)

    ------------------------------------------------------------
    --------=>Module=UDP.handleIncomingUdpPacket()<=-------------
    Message=Message is [dst: 228.1.2.3:45566, src: a:1035 (2 headers), size = 295 bytes], headers are [[UDP:group_addr=TomcatReplication] [NAKACK: MSG, seqno=16, range=null] ]

    ------------------------------------------------------------
    --------=>Module=NAKACK.handleMessage()<=-------------
    Message=[atom:32806] discarded message from non-member a:1035

    ------------------------------------------------------------

    Now just have to work out why. Might have to try those javagroups mailing lists ;)

    Regards,
    Lyall
  71. Lyall: "Debugging shows that the problem is somewhere in the multicast group merge protocols."

    Sorry, can't help you there. We had some problems with Javagroups when we tried it related to reverse DNS timeout, but no multicast issues.

    If you have the time, test with the latest Coherence 1.2 module for Tomcat, which now includes Tomcat authentication support.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Tangosol Coherence: Clustered Coherent Caching for Java and J2EE
  72. Cameron,
    I downloaded the Coherence and SpiritCache products and am planning to test them in the same environment: WLS 6.1 SP5 clustered (possibly 7 or 8). I am sure you have done some testing in similar environments.

    The problem we are trying to solve is the following: we are implementing RBAC ( www.nist.gov/rbac ), role-based access control. The model's data, of course, belongs in cache, and updates are not frequent. We are considering several implementation options; the caching middleware mentioned above is just two of our list of 5+ options. The one that I am considering vs. using your software is this:
    - load the cache on startup
    - publish a list of cache object keys that require update through JNDI, using UserTransaction, while also updating the db
    - rely on the server's JNDI replication (???)
    - on every request, before checking the cache, check the "update required list" first and update the cache object from the db if required
    - run a separate thread to check the "update required list" to upload changes from the db and keep it small
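    The pull-through scheme in the list above can be sketched as follows. The loader function stands in for the database read, and all names are assumptions, not a recommended design:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a cache that consults an "update required list" before each read.
public class PullThroughCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final Set<String> updateRequired = ConcurrentHashMap.newKeySet();
    private final Function<String, Object> loader;   // stands in for the db read

    public PullThroughCache(Function<String, Object> loader) { this.loader = loader; }

    // A writer publishes the stale key here (via JMS/JNDI in the real design).
    public void markStale(String key) { updateRequired.add(key); }

    public Object get(String key) {
        // Check the "update required list" before trusting the cached value.
        if (updateRequired.remove(key) || !cache.containsKey(key)) {
            cache.put(key, loader.apply(key));
        }
        return cache.get(key);
    }
}
```

    This is the "pull" side of the trade-off: nodes refresh lazily on demand, at the price of one extra set lookup per read.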

    The choice is between performing a push vs. performing a pull each with several configuration options.

    The functionality that this approach is missing is the use of SoftReferences to free least-recently-used cache objects under heavy load, which we don't expect for the first release anyway.

    I hope I described the problem in enough detail to have a meaningful
    conversation.

    I would appreciate it if you could help me with the following questions:
    1. Have you generated demo certs (i.e. with a localhost subject)? The certs are required to start up Managed Servers using NodeManager (two-way SSL authentication between the admin server and NodeManager). I have all the links to the SSL configuration docs for WLS 6.1 by BEA; I am wondering how you set up your testing environment.
    2. Do you use UserTransaction for cluster wide transaction support?
    3. Have you noticed that in the article on session replication for Tomcat (I am assuming that you have read it, of course) Filip doesn't mention how he provides consistency of session ids across the cluster ("...create a new session and replace the new session id...")? I haven't looked at the code yet. Any idea, off the top of your head, whether that presents a problem (overriding an existing valid session)?
    4. Do you exploit the least-recently-used concept in your product? If yes, do you use a standard Java implementation, a proprietary one, or both? Let us say I wouldn't want to periodically update the cache unless there is a need for that (need more memory or need to update).
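    On question 4, for reference: one stock JDK way to get least-recently-used eviction (as an alternative to SoftReferences) is java.util.LinkedHashMap in access order; the size bound here is an arbitrary illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-bounded map that evicts the least-recently-used entry on overflow.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);            // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;        // evict once the bound is exceeded
    }
}
```

    Touching an entry with get() moves it to the back of the iteration order, so the eldest entry is always the least recently used one.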

    Thanks in advance,
    Vadim

    P.S. Also thanks to everyone who contributed to a very informative
    and educational thread.
    Vadim: I downloaded the Coherence and SpiritCache products and am planning to test them in the same environment: WLS 6.1 SP5 clustered (possibly 7 or 8). I am sure you have done some testing in similar environments.


    Yes. We've tested with WLS 5, 6, 7 and 8. We have some customers that are (or at least were) on 4.x as well.

    Vadim: The problem we are trying to solve is the following: we are implementing RBAC (www.nist.gov/rbac), role-based access control. The model's data, of course, belongs in a cache. Updates are not frequent. We are considering several implementation options: the caching middleware products mentioned above are just two of our 5+ options. The one that I am considering vs. using your software is this:
    - load the cache on startup
    - publish a list of cache object keys that require update through JNDI using UserTransaction while also updating the db
    - rely on the server's JNDI replication (???)
    - on every request, before checking the cache, check the "update required list" first and update the cache object from the db if required
    - run a separate thread that checks the "update required list" to apply changes from the db and keep the list small


    You definitely can't rely on the WLS JNDI replication. You can use JMS topics or something like that instead. You are going to trade off between the cost of writing / debugging / maintaining it yourself vs. the cost of buying a trusted solution like Coherence.
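
    Stripped of the transport layer (JNDI, JMS, or otherwise), the cache-plus-"update required list" pattern Vadim lists can be sketched in plain Java. All class and method names below are illustrative, not from any product:

    ```java
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: a cache that is loaded on startup and refreshed
    // from the database only for keys flagged in an "update required" set.
    class InvalidatingCache {
        private final Map<String, Object> cache = new ConcurrentHashMap<>();
        private final Set<String> updateRequired = ConcurrentHashMap.newKeySet();
        private final Map<String, Object> db; // stand-in for the real database

        InvalidatingCache(Map<String, Object> db) {
            this.db = db;
            cache.putAll(db); // load the cache on startup
        }

        // In the real design this call would arrive via the cluster-wide
        // "update required" list published alongside the db update.
        void markStale(String key) {
            updateRequired.add(key);
        }

        // On every request: consult the update list before trusting the cache.
        Object get(String key) {
            if (updateRequired.remove(key)) {
                cache.put(key, db.get(key)); // refresh from the db only when needed
            }
            return cache.get(key);
        }
    }
    ```

    The key trade-off is exactly the one discussed above: this works only if every node sees the markStale calls, which is what the (unreliable) JNDI replication or a JMS topic would have to provide.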

    Vadim: I would appreciate if you could help me with following questions:
    1. Have you generated demo certs (i.e. with a localhost subject)? The certs are required to start up Managed Servers using NodeManager (two-way SSL authentication between the admin server and NodeManager). I have all the links to the SSL configuration docs for WLS 6.1 by BEA. I am wondering about how you set up your testing environment.


    Unfortunately, I did not set up the test environment, so I cannot provide you with an answer to that.

    Vadim: 2. Do you use UserTransaction for cluster wide transaction support?

    We support that in WebLogic 6.1 or later (since you use WLS) as well as on other app servers. That is accomplished through the J2CA adapter that Coherence ships with.

    Vadim: 3. Have you noticed that in the article on session replication for Tomcat (I am assuming that you have read it, of course) Filip doesn't mention how he provides consistency of session IDs across the cluster ("...create a new session and replace the new session id ...")? I haven't looked at the code yet. Any idea, off the top of your head, if that presents a problem (overriding an existing valid session)?

    That is not a problem in the Coherence implementation. You can get the source code for the Coherence HTTP Session Replication module with your free developer license. Just drop an email to sales at tangosol dot com requesting a free developer license, and you will get a response asking for whatever information is necessary to provide the license.

    Vadim: 4. Do you use a least-recently-used (LRU) eviction concept in your product? If so, do you use a standard Java implementation, a proprietary one, or both? Let us say, I wouldn't want to periodically update the cache unless there is a need for that (need more memory or need to update).

    Coherence 2.2 supports unlimited caches, LRU caches, LFU caches, LRU+LFU "balanced hybrid" caches, pluggable eviction policies, overflow caches, unlimited disk-backed caches, LRU disk-backed caches, NIO direct buffer and file-mapped caches, etc. All of these implementations are part of Coherence, which is a commercial product; Coherence does not utilize any third party code or libraries, including open source.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  74. Hi Cameron,
               Since JNDI replication is a strict no-no as far as a distributed cache goes, how about using stateful EJBs for Vadim's problem? The stateful EJB acts as a wrapper to the cache and would work well in a clustered environment. Your thoughts on this...

    Satish Mandalika
  75. Satish: Since JNDI replication is a strict no-no as far as a distributed cache goes, how about using stateful EJBs for Vadim's problem? The stateful EJB acts as a wrapper to the cache and would work well in a clustered environment. Your thoughts on this...

    Stateful EJBs end up on only one server, with the possibility (with WLS) of a backup on a secondary server. They can usually survive the failure of the primary server, but that's not the same as a clustered cache.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  76. Stateful EJBs end up on only one server, with the possibility (with WLS) of a backup on a secondary server. They can usually survive the failure of the primary server, but that's not the same as a clustered cache.

    So you are saying that in a 3-node cluster the solution fails because the stateful EJB is replicated only across the primary and secondary nodes. That is a good point, but do you think this solution would work if we decide to go with a 2-node cluster?

    Does Tangosol maintain the cache on all instances participating in the cluster?

    Satish
  77. Satish: So you are saying that in a 3-node cluster the solution fails because the stateful EJB is replicated only across the primary and secondary nodes. That is a good point, but do you think this solution would work if we decide to go with a 2-node cluster?

    Even in a two-node cluster, one of the two nodes always has to go across the network to talk to the stateful session bean. Further, in the entire cluster, only one thread at a time can access the stateful session bean. It's just not a good fit for most uses.

    Satish: Does Tangosol maintain the cache on all instances participating in the cluster?

    Coherence replicated caches do exactly that.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  78. I just installed two instances of Tomcat 4.0.3 and configured Apache using mod_jk. Load balancing is working fine (without session replication). After this I added the <Manager> tag (the default specified on the site) to server.xml on both servers.

    <Manager className="org.apache.catalina.session.InMemoryReplicationManager" protocolStack="UDP(mcast_addr=228.1.2.3;mcast_port=45566;ip_ttl=32)
    :PING(timeout=3000;num_initial_members=6):FD(timeout=5000):
    VERIFY_SUSPECT(timeout=1500):pbcast.STABLE
    (desired_avg_gossip=10000):pbcast.NAKACK(gc_lag=10;retransmit_timeout=3000):
    UNICAST(timeout=5000;min_wait_time=2000):MERGE2:FRAG:pbcast.GMS (join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=false)">
    </Manager>

    Now the server is not serving the page. Nothing is coming back.
    Do I have to do something else?

  79. I am attempting to set this up using Tomcat 4.0.3. I get the following error message:

    Exception during startup processing
    java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/catalina/session/StandardManager

    Does anyone know how to resolve this issue?

    Gary Carrington
    gary@naccrra.org
  80. It sounds like you're either missing a JAR in one of the configs or you've put the module too high up in the classloading hierarchy.

    Start with a fresh install of Tomcat 4.0.3 ... make sure you are installing the module as described in the instructions.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  81. Very interesting article.

    I have only one question: how do you communicate the session ID to the client?
    It is often done through a cookie named JSESSIONID, but this cookie is returned by the client only to the server that sent it, i.e. the server that created the session.
    When the same client connects to a different server in the same cluster, this second server already stores a session for the client, but can't recognize him because the JSESSIONID cookie is not returned to it.

    You tested your installation with two instances on the same machine, so the JSESSIONID cookie is returned to both instances.
    What about two instances on two different machines?

    Thanks.
  82. Hi Anna,

    Filip mentioned that he didn't actually set everything up as he suggested in his document. You'll need to set up a load balancer (hardware or software) and the cookies will appear to come from it, so they will go back to it and get routed to the appropriate servers.

    If you need a simple load balancer, we just posted one today as part of our HTTP session-replication support for Servlet 2.3. Go to http://www.tangosol.com/coherence.jsp and download the item labeled "Coherence HTTP Session-Replication Module for Servlet 2.3 Compliant Servers". Inside is a coherence-loadbalancer.jar which is extremely easy to use (PDF documentation is included).

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  83. Hi,

    I'm not sure that this is the right place to ask this but please bear with me, I'm a sysadmin and C programmer, only recently come to Java.

    I'm trying to set this up to prove to my boss that it will work, as per the article, using Tomcat 4.0.4 and Apache 1.3.26 with mod_proxy, both vanilla installs, and I'm getting a NullPointerException in InMemoryReplicationManager, thus:

    java.lang.NullPointerException
    at org.apache.catalina.session.InMemoryReplicationManager.createSession(InMemoryReplicationManager.java:212)
    at org.apache.catalina.session.InMemoryReplicationManager.createSession(InMemoryReplicationManager.java:242)

    The offending code is:
    log("Replicated Session created at "+mChannel.getLocalAddress() + " with ID="+session.getId(),2);

    The error seems to be in the mChannel.getLocalAddress() call. I'm trying to track it down myself, but as a Java newbie, any help would be massively appreciated!


    Thanks, and apologies once again if this is the wrong forum for this.

    Malcolm
  84. Code: log("Replicated Session created at "+mChannel.getLocalAddress() + " with ID="+session.getId(),2);

    A NullPointerException there means that either "mChannel" is null (likely) or "session" is null. It's a bug in the code. I think Filip posted his email address somewhere above in this discussion.

    Your other option is to use the Coherence module for Tomcat 4 or the Coherence module for Servlet 2.3 compliant servers. Both available from http://www.tangosol.com/coherence.jsp.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  85. Hello, I had the same problem. It was solved by commenting out the WARP connector in Tomcat's server.xml.

    I have two instances of Tomcat on a single PC, and it looks like there was a clash between those two connectors (maybe it is enough just to put them on different ports). Watch your catalina.out for potential error messages.

    I disabled all connectors except the one HTTP connector that I use.

    Greetings
    Jakub Cerny
  86. Hello Malcolm,

    I faced the same problem. After a bit of debugging, I removed the 'min_wait_time' attribute from the UNICAST protocol, and I was then able to bring up session clustering in Tomcat. This attribute was not supported by the JavaGroups library (version 2.0.3) that I had on my system.

    You can also try without specifying the protocolStack attribute on the manager, which will result in the JChannel being configured with the default protocol stack (look at the JChannel source code for the default protocol stack).

    regards,

    Priyesh K
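
    Following Priyesh's suggestion, omitting the protocolStack attribute entirely would look like the fragment below (a sketch based on the article's <Manager> examples; whether the default stack suits your network is something to verify against the JChannel source):

    <!-- Sketch: rely on JChannel's built-in default protocol stack by
         simply leaving out the protocolStack attribute. -->
    <Manager className="org.apache.catalina.session.InMemoryReplicationManager">
    </Manager>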
  87. I've installed and tested it, and it seems to work well with simple classes. However, you must be careful when using classes with properties, since objects are only replicated when setAttribute is called. Suppose on your login page you do:

        User u = new User();
        ...
        session.setAttribute("user",u);

    And on another page you do:

        User u = (User) session.getAttribute("user");
        u.setPagesServed(u.getPagesServed() + 1);

    The update to the user object will not be replicated unless you put it back in the session context as follows:

        session.setAttribute("user",u);

    Failing to do so would cause the object to become out of sync across the various servers. Since this is not necessary in a non-replicated environment, I'll bet that most programmers do not normally do it, which could introduce some really subtle errors into the application!

    - Jim Ronan
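
    The pitfall above can be demonstrated with a toy stand-in for a replicating session (illustrative classes only, not the actual Tomcat code): replication is triggered only by setAttribute, so an in-place mutation never reaches the other node.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Toy stand-in for a replicating session: a copy of the attribute is
    // "replicated" to the other node only when setAttribute is called.
    class ToySession {
        private final Map<String, Object> local = new HashMap<>();
        final Map<String, Object> replica = new HashMap<>(); // the "other node"

        void setAttribute(String name, Object value) {
            local.put(name, value);
            replica.put(name, deepCopyOf(value)); // replication happens here only
        }

        Object getAttribute(String name) {
            return local.get(name);
        }

        // Stand-in for the serialization-based copy a real manager would make.
        private Object deepCopyOf(Object value) {
            if (value instanceof User) {
                User copy = new User();
                copy.pagesServed = ((User) value).pagesServed;
                return copy;
            }
            return value;
        }
    }

    class User {
        int pagesServed;
    }
    ```

    Mutating the User after the initial setAttribute leaves the replica stale until the object is put back into the session, which is exactly the subtle bug Jim describes.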
  88. New project tomcat-javagroups on SourceForge[ Go to top ]

    FYI,

    Bruce Duncan has created a project on SourceForge to maintain the code written by Filip and used in this article. The reason is that the Tomcat folks didn't want to integrate this code into their codebase because of licensing issues with JavaGroups, and we (mainly Filip) were still getting patches from developers. This project is supposed to be the main repository for this code until the Tomcat team decides to integrate it into their codebase.

    The URL is http://sourceforge.net/projects/tomcat-jg/

    So if you patched the code, please feel free to submit the patch to any of the developers, or become a developer yourself.

    Bruce, Filip and I are currently the developers.

    Just yesterday I implemented *synchronous* replication, check it out (needs to be tested first though)...

    Bela Ban
  89. Greetings -

    Could you please clarify the comment in the text of this article ("In a real world scenario, this is not how you would do load balancing")? Is the intent of the discussed configuration strictly academic, or does the comment pertain only to the one section that it's in ("Testing The Installation")?

    Thanks!
  90. Problem with invalidating sessions[ Go to top ]

    Hello. I am having problems when I invalidate a session. It seems like it is making circular calls to the expire function. I've modified the session example in Tomcat to test this, and it shows the same problem.

    if (dataName != null && dataValue != null) {
        session.setAttribute(dataName, dataValue);
        if (dataName.equalsIgnoreCase("invalidate"))
            session.invalidate();
    }

    The Tomcat log looks like this:
    ----- Root Cause -----
    java.lang.StackOverflowError
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:149)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:179)
            at org.apache.catalina.session.StandardSession.expire(StandardSession.java:601)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:149)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:179)
            at org.apache.catalina.session.StandardSession.expire(StandardSession.java:601)
            ... (the same three frames repeat until the stack overflows)
  91. Problem with invalidating sessions[ Go to top ]

    Hello. I am having problems when I invalidate a session. It seems like it is making circular calls to the expire function. I've modified the session example in Tomcat as shown below to test this, and it shows the same problem.

    I added a check so that if you put in a session value called "invalidate", it will invalidate the session. This check goes in around line 67 of SessionExample.java.
    if (dataName != null && dataValue != null) {
        session.setAttribute(dataName, dataValue);
        if (dataName.equalsIgnoreCase("invalidate"))
            session.invalidate();
    }

    The Tomcat log looks like this:
    ----- Root Cause -----
    java.lang.StackOverflowError
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:149)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:179)
            at org.apache.catalina.session.StandardSession.expire(StandardSession.java:601)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:149)
            at org.apache.catalina.session.ReplicatedSession.expire(ReplicatedSession.java:179)
            at org.apache.catalina.session.StandardSession.expire(StandardSession.java:601)
            ... (the same three frames repeat until the stack overflows)
  92. I have session replication configured on two servers with Tomcat 4.0.6. I would like for the sessions to be replicated in a shorter amount of time. Is this possible?
  93. I have session replication configured on two servers with Tomcat 4.0.6. I would like for the sessions to be replicated in a shorter amount of time. Is this possible?

    No. The replication that Filip did occurs initially (and immediately) on startup, then uses messages while running to stay in sync.

    If you need a higher performance solution, consider Coherence.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  94. Dear Filip,

    Hi. While removing an attribute from the Tomcat session, I am facing a stack overflow problem. Also, the replication is not happening at all. I am using the default configuration specified by Filip.

     <Manager className="org.apache.catalina.session.InMemoryReplicationManager"
                        debug="10"
                        printToScreen="true"
                        saveOnRestart="false"
                        maxActiveSessions="-1"
                        minIdleSwap="-1"
                        maxIdleSwap="-1"
                        maxIdleBackup="-1"
                        pathname="null"
                        printSessionInfo="true"
                        checkInterval="10"
                        expireSessionsOnShutdown="false"
                        serviceclass="org.apache.catalina.cluster.mcast.McastService"
                        mcastAddr="228.1.2.3"
                        mcastPort="45566"
                        mcastFrequency="500"
                        mcastDropTime="5000"
                        tcpListenAddress="auto"
                        tcpListenPort="4001"
                        tcpSelectorTimeout="100"
                        tcpThreadCount="2"
                        useDirtyFlag="false">
                   </Manager>

    Since useDirtyFlag is false, any changes to the objects should get replicated. But in my case not even setAttribute() calls are getting replicated.

    Any help ?

    Thanks

    Pankaj
  95. How many hits it can support[ Go to top ]

    Hi,

    I would like to know, if we do clustering like this for Tomcat servers, how many hits (client requests) at an instance can be supported by the Apache/Tomcat load-balancing techniques?

    thanks,
    sudhakar

    sudhakar_koundinya@yahoo.com
  96. I would like to know, if we do clustering like this for Tomcat servers, how many hits (client requests) at an instance can be supported by the Apache/Tomcat load-balancing techniques?

    It depends on the application, the implementation of the clustered HTTP session management, and the configuration. You will have to load test your application, and scale up the number of servers as you go, in order to determine that.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  97. Great article.
    I would like to know whether the HttpSessionActivationListener interface of the Servlet 2.3 API is relevant in this context.

    In order to keep the session object small and minimize the network overhead, the big objects are declared transient.
    For each main Tomcat instance I define a Tomcat failover instance.
    If the main instance crashes, is the sessionDidActivate method called for each session on the failover instance?
    It would be a great place for rebuilding the transient objects.

    Your opinion would be appreciated.

    Fred
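
    Independent of whether the container fires that callback on failover (which is exactly Fred's open question), the rebuild-transients-on-restore idea can be sketched with plain Java serialization. All names below are illustrative:

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    // Illustrative session attribute: the large derived object is transient,
    // so it is not shipped across the network, and is rebuilt lazily after
    // the object is restored on another node (the kind of work Fred would
    // put into sessionDidActivate).
    class UserProfile implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String userId;              // small, replicated state
        private transient String expensiveReport; // large, rebuilt on demand

        UserProfile(String userId) {
            this.userId = userId;
        }

        // Rebuild the transient state if it was lost in transit.
        String getReport() {
            if (expensiveReport == null) {
                expensiveReport = "report-for-" + userId; // stand-in for real work
            }
            return expensiveReport;
        }

        // Helper to simulate replication: serialize, then deserialize a copy.
        static UserProfile roundTrip(UserProfile p) throws IOException, ClassNotFoundException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (UserProfile) in.readObject();
            }
        }
    }
    ```

    After the round trip, the transient field comes back null and is rebuilt on first use; sessionDidActivate would be the natural place to do that eagerly, if the replication manager invokes it.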