Discussions

News: Monitoring Session Replication in J2EE Clusters

  1. Monitoring Session Replication in J2EE Clusters (26 messages)

    "Clustering has become one of the most common and most emphasized features of J2EE application servers, because, along with load balancing, it is the fundamental element on which the scalability and reliability of an application server rely." Fermin Castro of Oracle discusses a tool for obtaining metrics for the most-common processes involved in an application server cluster: serialization and IP multicast. It also shows how to interpret those metrics and identify potential issues in a cluster. Finally, it briefly presents some hints for solving those problems.

    Conclusion
    There are many reasons why the environment of a cluster may change through the lifecycle of an application. Thus, it is necessary to regularly monitor the cluster's proper functioning and prevent performance and stability issues. This article showed how to use a simple Java tool to obtain various metrics related to correct cluster behavior. It also discussed why it is important that serialization and IP multicast take place rapidly and covered how to monitor both. The next step: apply the necessary corrections once the problem is detected. But that's a subject that's sufficiently complex to merit its own article.
    Read: Monitoring Session Replication in J2EE Clusters
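
    For readers who want a feel for what such a tool measures, a minimal multicast probe might look like the sketch below (the group address and port are illustrative placeholders, not the article's actual values): it sends a timestamped datagram to a multicast group and reports how long the datagram takes to loop back.

    import java.net.*;

    // Illustrative multicast round-trip probe: send a timestamped datagram
    // to a multicast group and measure how long it takes to loop back on
    // the same host. Group address and port are arbitrary examples.
    public class MulticastProbe {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("237.0.0.1"); // example group
            int port = 7001;                                        // example port
            MulticastSocket socket = new MulticastSocket(port);
            socket.joinGroup(group);

            long sent = System.currentTimeMillis();
            byte[] out = Long.toString(sent).getBytes();
            socket.send(new DatagramPacket(out, out.length, group, port));

            byte[] in = new byte[64];
            DatagramPacket packet = new DatagramPacket(in, in.length);
            socket.receive(packet); // blocks until the datagram loops back
            long start = Long.parseLong(new String(packet.getData(), 0, packet.getLength()));
            System.out.println("Multicast round trip: "
                    + (System.currentTimeMillis() - start) + " ms");
            socket.leaveGroup(group);
            socket.close();
        }
    }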

    Threaded Messages (26)

  2. Nice but theory

    A nice article which describes the concepts behind in-memory session replication. The question is whether the results you get from the test program really help you find bottlenecks in your clustered application.

    As the author says, the algorithms of the application servers are different and so not comparable to the results of the real application. And if session data doesn't change a lot between two requests, the effort for replication is low.

    The author is right that sessions should be between 3-5KB, but what I see with our customers is that sessions exceed this limit. 30-50KB is "normal", and I've seen sessions that held about 1MB of data.

    The consequence is that you need good clustering algorithms and APIs that can manage session data > 5KB. If you save session data to a database, you also need a piece of software that manages this for you - especially a good caching algorithm to reduce the number of database accesses.
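
    For what it's worth, a rough way to see what your sessions actually weigh is to serialize each attribute and add up the bytes - a quick sketch, nothing more:

    import java.io.*;
    import java.util.Enumeration;
    import javax.servlet.http.HttpSession;

    // Rough estimate of a session's replication payload: serialize each
    // attribute and sum the byte counts. Attributes that are not
    // serializable are flagged, since they would break replication anyway.
    public class SessionSizer {
        public static long estimate(HttpSession session) {
            long total = 0;
            for (Enumeration e = session.getAttributeNames(); e.hasMoreElements();) {
                String name = (String) e.nextElement();
                try {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    ObjectOutputStream oos = new ObjectOutputStream(buf);
                    oos.writeObject(session.getAttribute(name));
                    oos.close();
                    total += buf.size();
                } catch (IOException ex) {
                    System.err.println("Not serializable: " + name);
                }
            }
            return total;
        }
    }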

    Mirko
    codecentric
    "Your code is our source"
  3. Title a misnomer?

    The story is really about testing multicast performance on your network and nothing more. It does not monitor session replication, nor does it monitor the performance of session replication, which will be very dependent on the app server in question.
  4. Nice but theory

    (Disclaimer: Our Coherence product includes a session management module, Coherence*Web, which addresses all of the problems discussed in the article, and is used for high-scale applications like the one discussed in this article.)
    As the author says, the algorithms of the application servers are different and so not comparable to the results of the real application. And if session data doesn't change a lot between two requests, the effort for replication is low.
    Some approaches send just the modified attributes, some send the entire session. Each has its advantages and disadvantages; for example, while sending the entire session might be slower, it does support complex object graphs (where the same object shows up in more than one session attribute's object graph.) We support both for that reason.

    Furthermore, the mention of multicast should throw up a red flag -- any server using multicast to manage session information will scale extremely poorly under load and probably cause all sorts of network issues. That's why they have to suggest that you use small clusters and tiny sessions (3-5KB):
    # Reduce the number of replicating nodes in your cluster (some application servers allow you to isolate the nodes that work together in small groups).
    # Reduce the HTTP session object to the minimum amount of relevant information.
    # Save your session information to a database or file regularly, freeing the HTTP session object of your application.
    The problem I have with the suggestions is that the server vendor is putting the onus on the application developer for a piece of functionality that should just work (tm).

    The real problem, though, is this statement:
    This implies that whenever your servlet/JSP engine is using multicast for replicating objects across application server instances, the objects will not appear on other nodes before 300 msecs have passed. Consequently, if a failover happens and the load balancing takes place in less than 300 msecs (which you can expect from most application servers), your client request will find an older version of the object in the new node and the application will become inconsistent.
    In other words, even if you follow all of the suggestions, the approach being used will leave a big window of opportunity for data loss. That might be acceptable as an option ("put a check in this checkbox to allow your app to lose data but it will make the server run faster") but it shouldn't be default behavior.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  5. Nice but theory

    In other words, even if you follow all of the suggestions, the approach being used will leave a big window of opportunity for data loss.

    So what do we have to do? :-)
    What is the latency in Coherence?

    Dmitry
    http://www.servletsuite.com
  6. Nice but theory

    To minimize the potential for data loss, the session update should be synchronous, and should be processed by the time the request processing completes.

    It is relatively common for application servers to asynchronously update the session data in the cluster, which is what permits session data to be easily lost.
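
    To make that concrete, here is a minimal sketch of the synchronous model (BackupStore is a hypothetical interface standing in for whatever actually holds the copy - a peer node, a replicated cache, or a database): the backup call blocks on the request thread, so the response cannot complete before the copy is safe.

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.*;

    // Hypothetical backup target: a peer node, replicated cache, or database.
    interface BackupStore {
        void backup(HttpSession session);
    }

    public class SynchronousReplicationFilter implements Filter {
        private final BackupStore store;

        // A real filter would locate its store in init(); injected here
        // to keep the sketch self-contained.
        public SynchronousReplicationFilter(BackupStore store) {
            this.store = store;
        }

        public void init(FilterConfig config) {}

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            chain.doFilter(req, res);
            HttpSession session = ((HttpServletRequest) req).getSession(false);
            if (session != null) {
                store.backup(session); // blocks until the backup is acknowledged
            }
        }

        public void destroy() {}
    }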

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  7. Nice but theory

    Mirko-> I hope I haven't confused you by using the term "session". I am referring to the amount of information directly attached to the HTTPSession object; this is not referring to the memory that a user can consume in one session inside a container. If you really mean that an application is placing 1MB in the httpsession, I would suggest periodically saving part of that information to a database or file and cleaning up the memory; otherwise, I hardly believe that any system like that would scale (500 concurrent users would imply 500MB just in the httpsession, going back and forth between nodes...NO GOOD, no matter what replication mechanism you use, even if you use Cameron's :-) It would be, simply, a bad design.
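
    For illustration, the idea is roughly the following (LargeObjectDao is a hypothetical DAO, not a real API): move the big object out of the session into a database row and keep only its key, so replication carries a small reference instead of the payload.

    import javax.servlet.http.HttpSession;

    // Hypothetical DAO for persisting large objects outside the session.
    interface LargeObjectDao {
        String save(String sessionId, String name, Object value);
    }

    public class SessionOffloader {
        public static void offload(HttpSession session, String name, LargeObjectDao dao) {
            Object big = session.getAttribute(name);
            if (big != null) {
                String key = dao.save(session.getId(), name, big);
                session.removeAttribute(name);            // free the memory
                session.setAttribute(name + ".key", key); // replicate the key only
            }
        }
    }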

    Cameron-> I am sure that you are totally aware that even worse scalability issues can arise when using a synchronous model for session replication. Among others: worse performance for "normal" sessions and coupling between nodes... These need to be handled with much care, since they affect not only big sessions but any session being replicated.

    If an application really needs to store a lot of information in the session (which is something that I think can be avoided most of the time) there are several better solutions that application servers provide (directly or through frameworks) and that you can take advantage of. Saving data to a database (like I suggest in the article) will scale and guarantee no data loss much better than any in-memory mechanism. Even Cameron's :-)

    Cheers!
  8. Nice but theory

    Cameron-> I am sure that you are totally aware that even worse scalability issues can arise when using a synchronous model for session replication. Among others: worse performance for "normal" sessions and coupling between nodes... These need to be handled with much care, since they affect not only big sessions but any session being replicated. If an application really needs to store a lot of information in the session (which is something that I think can be avoided most of the time) there are several better solutions that application servers provide (directly or through frameworks) and that you can take advantage of. Saving data to a database (like I suggest in the article) will scale and guarantee no data loss much better than any in-memory mechanism. Even Cameron's :-) Cheers!
    I think saving data to a database will gain you effectively very little over replicating to a dedicated different cluster node, in the way that, for example, BEA WebLogic works. Instead of bothering all your cluster boxes with session replication, just talk to a single dedicated failover node.

    You would still need sticky sessions, because otherwise you'd need to pull the session data out of the database for every request, which does not seem to be very clever.

    What you might gain is transactional atomicity, but that is about it. The data needs to travel the wire nevertheless, one way or the other.

    There used to be a quite nice way of preserving session data - stick them into each HTTP request/page. Of course there would be some security issues and more marshalling, but on the other hand, you'd end up with fully independent "cluster nodes".
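
    A minimal sketch of that approach (PageStateCodec is illustrative and uses java.util.Base64 from modern JDKs; a real version would also sign or encrypt the field because of the security issues mentioned):

    import java.io.*;
    import java.util.Base64;

    // Serialize a state object into a hidden form field; the next request
    // posts it back and the server deserializes it.
    public class PageStateCodec {
        public static String encode(Serializable state) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(buf);
            oos.writeObject(state);
            oos.close();
            return Base64.getEncoder().encodeToString(buf.toByteArray());
        }

        public static Object decode(String field) throws IOException, ClassNotFoundException {
            byte[] bytes = Base64.getDecoder().decode(field);
            return new ObjectInputStream(new ByteArrayInputStream(bytes)).readObject();
        }
    }
    // In the page: <input type="hidden" name="state" value="...encoded state..."/>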
  9. Nice but theory

    Saving data to a database (like I suggest in the article) will scale and guarantee no data loss much better than any in-memory mechanism.
    What you might gain is transactional atomicity, but that is about it.
    Committing session data to the database each time the page is served is inconvenient if the regular data tables are used, because it would be hard to return the database to its previous state, so separate tables are needed to keep temporary session data. One can choose to keep a database transaction open while a user is browsing back and forth, but this approach may lock shared data for an undetermined time, which is not the best practice in the database world. Database transactions are supposed to be as short as possible.
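
    For illustration, the separate-table approach could look roughly like the sketch below, assuming a dedicated HTTP_SESSION_DATA(ID, DATA) table and autocommit turned off; the point is that each request commits its own short transaction against a table nothing else locks:

    import java.sql.*;

    public class JdbcSessionStore {
        // Writes the serialized session in one short transaction.
        public void save(Connection con, String sessionId, byte[] blob) throws SQLException {
            PreparedStatement ps = con.prepareStatement(
                "UPDATE HTTP_SESSION_DATA SET DATA = ? WHERE ID = ?");
            try {
                ps.setBytes(1, blob);
                ps.setString(2, sessionId);
                if (ps.executeUpdate() == 0) {
                    // first save for this session id
                    PreparedStatement ins = con.prepareStatement(
                        "INSERT INTO HTTP_SESSION_DATA (ID, DATA) VALUES (?, ?)");
                    ins.setString(1, sessionId);
                    ins.setBytes(2, blob);
                    ins.executeUpdate();
                    ins.close();
                }
                con.commit(); // keep the transaction as short as possible
            } finally {
                ps.close();
            }
        }
    }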
    There used to be a quite nice way of preserving session data - stick them into each HTTP request/page. Of course there would be some security issues and more marshalling, but on the other hand, you'd end up with fully independent "cluster nodes".
    * Ugly URLs for GET method
    * Necessity to keep strict page order
    * URL cannot be modified by a user; a user cannot select a page of his choice
    * Back button would not work, because prior URL does not have newer session data (oh, I forgot, you advocate for showing snapshots of past pages instead of keeping View in sync with the Model)

    This is a very unfriendly approach. But it is not suitable for 1MB data anyway.

    I prefer to store session data in memory and to synchronize it with database at checkpoints.
  10. Nice but theory

    * Ugly URLs for GET method
    * Necessity to keep strict page order
    * URL cannot be modified by a user; a user cannot select a page of his choice
    * Back button would not work, because prior URL does not have newer session data (oh, I forgot, you advocate for showing snapshots of past pages instead of keeping View in sync with the Model).
    Speaking of which: this is EXACTLY what good user experience is all about. If I press "back" on a "page" I would indeed expect that the "state" of the application mimics exactly the state of the page and is NOT a state that is only known to the server but hidden from the browser.

    Your comment means you either have to re-request the page from the server with your proper "model" data populated, or you have to "code your own back button" or "back button protection". Unfortunately, users do have some control over how the back button and caching behave, and it is pretty funny that you choose to ignore that.

    As for 1 meg of session data: You are right, I would not put it in the web page. On the other hand, if you have 1 meg of session data in the first place, something seems to be fairly wrong, in most application scenarios. Not because there is no such amount of session data, but because the design would severely limit the number of users you'd be able to support, say, on a 4 Gig box.
  11. Nice but theory

    * Ugly URLs for GET method
    * Necessity to keep strict page order
    * URL cannot be modified by a user; a user cannot select a page of his choice
    * Back button would not work, because prior URL does not have newer session data (oh, I forgot, you advocate for showing snapshots of past pages instead of keeping View in sync with the Model).
    Speaking of which: this is EXACTLY what good user experience is all about. If I press "back" on a "page" I would indeed expect that the "state" of the application mimics exactly the state of the page and is NOT a state that is only known to the server but hidden from the browser. Your comment means you either have to re-request the page from the server with your proper "model" data populated, or you have to "code your own back button" or "back button protection".
    Well, there are two different cases: one is when you browse online information, like a hypertext book or a dictionary or an online newspaper. In this case you expect to return to the previous page and to see the same information. Yes, I said online newspaper, not news website. Because a newspaper is static, you can go back to the previous page and read it again. A news website does not have to be static, so if you started with a home page which contains headlines, you may see different information when you return "back" to this page. This is the difference between static hypertext and a live application.

    A web application is the clearest example of a live website. With a web application you do not browse some text and pictures, you work with data. Each and every application tries its best to supply the user with the most recent and up-to-date data, but you suggest going "back" and working with something which does not exist anymore. It should not be possible to return "back" after you have submitted the shopping basket and to submit it again, simply because the basket is not some information you read from the screen; it is valuable data you work with (and pay for).

    Again, as a user of data, I would prefer to be able to jump to different parts of an application and to retain all the data that I previously collected or entered. Your strict page order does not allow that. It restricts a user to only one path, which is not what web users are used to. You want to bring back the old times of terminals with "enter this" and then "enter that". Why?
    Unfortunately users do have some control on how back button and cacheing behaves and it is pretty funny that you choose to ignore that.
    Absolutely not. Don't say that you don't know about cache control headers in the HTTP response and about cache control pragmas on the pages themselves. They allow you to tell a browser that a particular page is dynamic and should not be cached. Most browsers play by the rules; it is only Firefox which thinks itself too smart and caches everything it can. This is simply a Firefox bug.
    As for 1 meg of session data: You are right, I would not put it in the web page. On the other hand, if you have 1 meg of session data in the first place, something seems to be fairly wrong, in most application scenarios. Not because there is no such amount of session data, but the design would severely limit the numbers of user you'd be able to support, say, on a 4 Gig Box.
    The example with 1MB data was not mine, but as you can read in one of the previous posts, there are applications with session data of about 100K, and they work pretty well. Anyway, I doubt that anyone in their right mind would stick anything bigger than a couple of kilobytes into the request.
  12. Nice but theory

    Cache control headers in the HTTP response ... allow you to tell a browser that a particular page is dynamic and should not be cached. Most browsers play by the rules; it is only Firefox which thinks itself too smart and caches everything it can. This is simply a Firefox bug.
    I want to apologize, this is not correct. In my defense, I was not the only one who thought it was a bug. Mozilla/Firefox shows the cached page if a user goes back and forward in page history, even if a page is marked as "no-cache". To ensure that a page is reloaded, use the "no-store" response header. MSIE reloads a page if it is marked as "no-cache" while going back and forward in history.
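
    In servlet terms, headers matching the behavior described above would look like the following sketch; setting both directives (plus the HTTP/1.0 Pragma for old proxies) covers each browser case:

    import javax.servlet.http.HttpServletResponse;

    public class CacheControl {
        public static void disableCaching(HttpServletResponse response) {
            // "no-store" forces Mozilla/Firefox to re-fetch on back/forward;
            // "no-cache" is enough for MSIE.
            response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate");
            response.setHeader("Pragma", "no-cache"); // HTTP/1.0 proxies
            response.setDateHeader("Expires", 0);     // already expired
        }
    }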
  13. Nice but theory

    As for 1 meg of session data: You are right, I would not put it in the web page. On the other hand, if you have 1 meg of session data in the first place, something seems to be fairly wrong, in most application scenarios. Not because there is no such amount of session data, but because the design would severely limit the number of users you'd be able to support, say, on a 4 Gig box.
    If the sessions were kept in memory, the problem would be much worse than that, since JVMs need to keep their heap sizes down to avoid GC penalties. However, the sessions don't _all_ have to be kept in memory all the time; that's what overflow caching is for.
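
    A minimal sketch of the overflow idea, using an access-ordered LinkedHashMap as an LRU cache and a hypothetical OverflowStore standing in for the disk or database tier:

    import java.util.*;

    // Hypothetical slower tier (local disk, database) for evicted sessions.
    interface OverflowStore {
        void write(Object key, Object value);
    }

    public class OverflowCache extends LinkedHashMap {
        private final int maxInMemory;
        private final OverflowStore store;

        public OverflowCache(int maxInMemory, OverflowStore store) {
            super(16, 0.75f, true); // access order gives LRU behavior
            this.maxInMemory = maxInMemory;
            this.store = store;
        }

        protected boolean removeEldestEntry(Map.Entry eldest) {
            if (size() > maxInMemory) {
                store.write(eldest.getKey(), eldest.getValue()); // spill, don't lose
                return true;
            }
            return false;
        }
    }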

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  14. Nice but theory

    If you really mean that an application is placing 1MB in the httpsession, I would suggest periodically saving part of that information to a database or file and cleaning up the memory; otherwise, I hardly believe that any system like that would scale (500 concurrent users would imply 500MB just in the httpsession, going back and forth between nodes...NO GOOD, no matter what replication mechanism you use, even if you use Cameron's :-) It would be, simply, a bad design.
    Again, you're missing the point. The application server has to provide the API, but it's up to the application server vendor how smart the implementation is behind the API. If you kept all of the HTTP sessions in memory, then yes, you'd run out of memory if they were all 1MB+.

    However, you do have choices, such as monitoring the number / sizes of sessions, and rolling them out to a DB or just to a local disk (which is much faster and less expensive and more scalable than a database.)
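
    Rolling a session out to local disk can be as simple as serializing its attributes to a file named after the session id - a sketch, with the directory and file naming left up to the deployment:

    import java.io.*;

    public class DiskSessionStore {
        private final File dir;

        public DiskSessionStore(File dir) {
            this.dir = dir;
        }

        // Serialize the session's attribute map to disk so it can be
        // dropped from the heap.
        public void write(String sessionId, Serializable attributes) throws IOException {
            ObjectOutputStream oos = new ObjectOutputStream(
                new FileOutputStream(new File(dir, sessionId + ".ser")));
            try {
                oos.writeObject(attributes);
            } finally {
                oos.close();
            }
        }

        // Read it back on the next request that needs it.
        public Object read(String sessionId) throws IOException, ClassNotFoundException {
            ObjectInputStream ois = new ObjectInputStream(
                new FileInputStream(new File(dir, sessionId + ".ser")));
            try {
                return ois.readObject();
            } finally {
                ois.close();
            }
        }
    }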

    To say "it's a bad design" is a cop-out. If an app server can't handle even 1MB of user data, can you please tell me why someone would pay thousands of dollars per CPU for that app server? You can't just say "well, the customers are stupid to do it that way." If the customers (in this case, application developers) need to store 1MB in a session, then you need to find a way to support it, or you rightly won't sell much software.

    Sorry for the rant, but I think that Oracle has enough resources to solve the problem, instead of blaming the customer. ;-)
    Cameron-> I am sure that you are totally aware that even worse scalability issues can arise when using a synchronous model for session replication. Among others: worse performance for "normal" sessions and coupling between nodes... These need to be handled with much care, since they affect not only big sessions but any session being replicated.
    I'm not sure what design you are assuming, but a latency-implicit operation (such as backing up data onto another node in a cluster) is certainly not a "scalability issue". In fact, it is quite similar to doing something with a JDBC connection (high latency but very low resource utilization on the client,) which is why web applications that use a database typically have more threads (since at any time, most will be in a blocking state waiting for the database.)

    As far as the size of the session, that does not necessarily have to impact performance. If you think about it, it is the amount of data being changed during a request that affects the minimum theoretical cost of making sure that a session has at least one up-to-date backup in the cluster. (We do have customers with 1MB+ sessions, and we explicitly designed to support such situations, including the obvious abilities to roll out to disk and/or databases.)
    If an application really needs to store a lot of information in the session (which is something that I think can be avoided most of the time) there are several better solutions that application servers provide (directly or through frameworks) and that you can take advantage of.
    Again, I'm not sure why you would suggest recoding an application to use some proprietary framework when there is a standard API that you (the app server vendor) get to implement called javax.servlet.http.HttpSession. That is your personal invitation to provide "a better solution." ;-)
    Saving data to a database (like I suggest in the article) will scale and guarantee no data loss much better than any in-memory mechanism.
    No, saving session data to a database won't scale any better than the database scales, and the database is already typically the single-point-of-bottleneck, and it is almost always the most expensive point in the application infrastructure. Storing non-transactional non-persistent data into an expensive transactional + persistent data store seems like an ideal way to waste money and slow an application down.

    However, you don't have to take my word for it. There are load-testing tools available to test the relative merits of different solutions. If you provide the 100-server cluster and the Oracle licenses, I'll donate the Coherence licenses to do a scalable performance test. ;-)

    BTW - if you're located in the bay area, I'll be in east bay tomorrow (Tuesday) night at the East Bay BEA user group. I'd be glad to discuss the topic further.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  15. Nice but theory

    You (and anyone in this thread, of course) can reach me at fermin dot castro at oracle dot com.

    Cheers!
  16. Nice but theory

    ...rolling them out to a DB or just to a local disk (which is much faster and less expensive and more scalable than a database.)
    Totally. NFS is the best cluster fabric I know: pervasive, free, simple, fast, and with very useful transactional atomicity guarantees that approach the power of a tuple space. NFS, yet another brilliant Sun invention.
  17. Nice but theory

    ...rolling them out to a DB or just to a local disk (which is much faster and less expensive and more scalable than a database.)
    Totally. NFS is the best cluster fabric I know: pervasive, free, simple, fast, and with very useful transactional atomicity guarantees that approach the power of a tuple space. NFS, yet another brilliant Sun invention.
    Right, but even non-NFS local mounts (EXT2, even FATx) will work fine, because it's just "overflow" for the memory on that particular cluster node.

    However, with NFS you could do some even more interesting things ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  18. Nice but theory

    Right, but even non-NFS local mounts (EXT2, even FATx) will work fine, because it's just "overflow" for the memory on that particular cluster node. However, with NFS you could do some even more interesting things ;-)
    Yes, given the right lifecycle modelling, failover can be accomplished using NFS, which is something a private drive can't do. When requests and/or sessions live in NFS, any idle host can pick up where a prior host left off. The required transactional atomicity is satisfied by NFS. E.g., an atomic take of a checkpoint file (e.g., a serialized HttpSession) from a shared data space can be accomplished with the Unix 'mv' command. NFS guarantees that exactly one worker gets the pickle file. No need to designate a specific failover host, and no need for middleware.
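
    In Java terms the same atomic take looks like this, assuming (as asserted above) that the shared filesystem honors atomic rename:

    import java.io.File;

    public class NfsTake {
        // Competing workers each try to rename the shared checkpoint file
        // into their own work area; rename is atomic, so exactly one wins.
        public static File take(File checkpoint, File workArea) {
            File claimed = new File(workArea, checkpoint.getName());
            if (checkpoint.renameTo(claimed)) {
                return claimed; // this worker now owns the session
            }
            return null;        // another worker picked it up first
        }
    }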
  19. Not just marketing

    Cameron, this is the right point - it is not the app server vendors who should decide what is good and what is bad design, but the customers. (The vendors should give some hints on that, though.) And customers have to live with their developers...

    I've tested Coherence with one of our customers - and I have to say that it worked great. The customer had session sizes averaging 100K - and Coherence managed them in a very performant and easy way. In the end the customer chose to go with zSeries, where some other mechanisms are available for clustering (WLM).

    I know many projects with that session size - if you integrate with legacy systems like IMS or CICS, customers often put the former "scratch pad area" data in the HTTP session, replacing 3270 with HTML. Making tables for that data in a database would result in duplication of the data and in a lot of work. So a "just working" app server is what customers want.

    - Mirko -
    codecentric
    your code is our source
  20. Nice but theory

    To minimize the potential for data loss, the session update should be synchronous, and should be processed by the time the request processing completes. It is relatively common for application servers to asynchronously update the session data in the cluster, which is what permits session data to be easily lost.
    Hello Cameron,

    do you know which app servers do asynchronous in-memory replication and which do not? As far as I know, WebLogic replicates session state just between two server instances (primary and secondary server), i.e. point-to-point. But I do not know whether that is done synchronously or not.


    Best regards,
        Dirk
  21. Nice but theory

    do you know which app servers do asynchronous in-memory replication and which do not? As far as I know, WebLogic replicates session state just between two server instances (primary and secondary server), i.e. point-to-point. But I do not know whether that is done synchronously or not.
    No, I don't know. Only the open source ones are obvious in their implementations (being open source ;-) and they all seem to be async, with one even spinning up a new thread to handle each HTTP session modification.

    IBM WebSphere has HTTP session persistence to a database that appears to support both synchronous and asynchronous options, and I think that those options are also available for the in-memory session replication that they do. However, I haven't been able to find anyone using their in-memory session replication yet, so if you know anyone using it, drop me an email (cpurdy _at_ tangosol.com).

    BEA WebLogic seems to be async, but it's also lossy under load (it discards messages that don't get through fast enough.) That is (IMHO) a big problem. However, to its credit, it was the first one that was out in the market doing the clustering feature, and it's pretty stable and quick as long as the load (and the cluster size) is reasonable. (I personally learned J2EE clustering on WebLogic, and a lot of the terminology I use etc. comes from there.)

    I haven't tested Orion (which Oracle AS is based on) in a couple of years, but it used to be pretty quick, and also supported ServletContext clustering, which I think is a nice option. Orion was one of my favorite J2EE servers to work with. We don't support Oracle AS yet with Coherence*Web, so I haven't gotten to see how Oracle AS is set up in this respect.

    Caucho Resin is sync over TCP/IP, I think, using a ring algorithm, but I haven't actually configured / tested it myself. However, it is open source (not "free and open" but "open" as in "published".) (We are adding support for Caucho Resin in our upcoming 2.5 release, which should be entering pre-release today.) For a web container (servlet & JSP), I think that Resin is one of my favorites.

    There are some other commercial offerings to do add-in HTTP session management as well. I know some other vendors in our market (e.g. SpiritSoft) have a filter-based module for doing HTTP session management, for example. I don't know any of their technical implementations, and whether they are sync or async or both.

    As I mentioned, the sync approach should in practice be significantly faster than a database approach under light load (non-transactional, API instead of SQL) and can be quite scalable as long as it's point-to-point (not multicast) and spreads the load over the cluster. I believe Caucho Resin also meets all those requirements, for example. I don't have test data in front of me, but with sticky load balancing optimizations (which most sites will want to use) the sync latency on a full HTTP session update can be as low as 2ms or so, and can be higher as the amount of changes to the session increases. That compares favorably with the 300ms number mentioned in the article.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  22. I'm familiar with the clustered environment. Just today I was thinking "this project sure is a cluster**ck".
  23. You should get one of our "fluster clucked" shirts ;-)
  24. What kind of HTTPSession sizes are people working with ?
    You can qualify your answers with specific scenarios/techniques etc.

    Our session size is 2 Meg and I thought that was outrageously large. (WebLogic 8.1, session affinity turned on)
    I thought keeping them in the 10 to 15KB range was the norm.

    Maybe not ?
  25. You should try to keep session sizes down to the 2-4KB range, but that is extremely hard if you are doing anything complex.

    Typical apps are probably in the 50KB range, from what I've seen.

    500KB and even 1MB+ are not that unusual, but for many concurrent users, it can be a big problem.

    There is an article on dev2dev about session management that talks about how to make some attributes transient to avoid the clustering overhead with WebLogic.

    http://dev2dev.bea.com/pub/a/2005/05/session_management.html
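
    The technique in miniature (a hypothetical session attribute, not code from the dev2dev article): fields marked transient are skipped during serialization, so bulky, recomputable state never crosses the wire and is rebuilt lazily after failover:

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    public class SearchResults implements Serializable {
        private final String query;        // small; replicated with the session
        private transient List cachedRows; // bulky; skipped by serialization

        public SearchResults(String query) {
            this.query = query;
        }

        public List getRows() {
            if (cachedRows == null) {          // null again after failover
                cachedRows = runSearch(query); // hypothetical recompute step
            }
            return cachedRows;
        }

        private List runSearch(String q) {
            return new ArrayList(); // stand-in for the real re-query
        }
    }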

    Peace,

    Cameron Purdy
    Tangosol Coherence: High-Scale and HA for HTTP Session Management
  26. Agreed. Keeping an HttpSession size down can be quite hard sometimes. A useful monitoring utility for this (and many other things too!) is MessAdmin. Check it out!
  27. Replication Protocol

    I know this is an old post, but I saw some comments on session replication using multicast. Although it is true that WebLogic clusters use multicast for certain replication capabilities (such as JNDI replication), in-memory session replication uses a synchronous RMI call to a predetermined secondary host. As of WebLogic 10gR3, asynchronous session replication has also been added. Also of note is that Coherence is now also an Oracle product, and very complementary to WebLogic Server.