
News: High availability Tomcat: Configuration Options

  1. High availability Tomcat: Configuration Options (74 messages)

    Tomcat 5.5 has touted many performance improvements. There are also a lot of questions about how to tune performance for the server, the machine, and more. When should you front Tomcat with Apache? This article tries to answer these questions.
    If you run only one instance of Tomcat, you lose requests/sessions whenever you upgrade or restart your site. In this article, author Graham King presents simple steps for connecting a pair (or more) of Tomcats to Apache using the JK2/AJP (Apache JServ Protocol) connector and to each other using Tomcat 5's clustering capabilities. Any of the Tomcat servers can be stopped or started without affecting users. With an Apache/Tomcat cluster in place, you can easily adjust your configuration for a range of load-balancing and failover scenarios.
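    As a rough sketch of the wiring the article describes (worker names, hosts, and the mount path here are illustrative, not taken from the article), the AJP side can look like this:

```
# workers.properties -- two AJP13 workers behind one load balancer
worker.list=loadbalancer

worker.tomcat1.type=ajp13
worker.tomcat1.host=192.168.0.11
worker.tomcat1.port=8009
worker.tomcat1.lbfactor=1

worker.tomcat2.type=ajp13
worker.tomcat2.host=192.168.0.12
worker.tomcat2.port=8009
worker.tomcat2.lbfactor=1

worker.loadbalancer.type=lb
worker.loadbalancer.balanced_workers=tomcat1,tomcat2
```

    In httpd.conf, requests are then mounted onto the balancer with something like `JkWorkersFile conf/workers.properties` and `JkMount /myapp/* loadbalancer` (directive names per the mod_jk documentation; check your connector version, since JK and JK2 differ here).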

    High availability Tomcat

    Threaded Messages (74)

  2. You can achieve HA Tomcat without Apache. If you don't need the extras Apache brings to the table (a typical J2EE application doesn't need anything from Apache), just don't bother with it; it opens security holes for nothing and adds complexity. You can dispatch requests in a number of ways: many hardware firewalls can load-balance HTTP/HTTPS requests with sticky connections, and iptables can do it as well.
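    For example, round-robin dispatch with iptables alone can be sketched like this (addresses are illustrative, and this assumes a kernel with the `statistic` match module available; note that plain DNAT gives no sticky sessions, so you would still need session replication or affinity elsewhere):

```
# Send every other new connection on port 80 to the first Tomcat,
# the rest to the second (run as root on the gateway box)
iptables -t nat -A PREROUTING -p tcp --dport 80 \
    -m statistic --mode nth --every 2 --packet 0 \
    -j DNAT --to-destination 10.0.0.11:8080
iptables -t nat -A PREROUTING -p tcp --dport 80 \
    -j DNAT --to-destination 10.0.0.12:8080
```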
  3. You can achieve HA Tomcat without Apache. If you don't need the extras Apache brings to the table (a typical J2EE application doesn't need anything from Apache)
    In my experience Apache is very good at handling HTTPS, and I certainly do not want to redo everything in Java. What is wrong with leveraging a proven platform?
  4. hardware accelerators

    Many of the heavy HTTPS sites I know use hardware accelerators, so using Apache for HTTPS isn't common for large sites. It might be common for small sites, but large e-commerce setups tend to use hardware accelerators. Some food for thought.
  5. hardware accelerators + Apache

    I thought many hardware accelerators were plugged into
    systems running Apache?
  6. hardware accelerators

    Many of the heavy HTTPS sites I know use hardware accelerators, so using Apache for HTTPS isn't common for large sites. It might be common for small sites, but large e-commerce setups tend to use hardware accelerators. Some food for thought.
    Does somebody know what software runs on those 'hardware' accelerators?
  7. from what I know

    An SSL/HTTPS accelerator handles all the encryption, and the web server (Tomcat, say) just sees plain HTTP. No software is needed in many cases, unless you're using an SSL/HTTPS ethernet card. Rainbow Technologies used to sell network-attached accelerators, but I haven't looked at their offerings in a few years.
  8. from what I know

    An SSL/HTTPS accelerator handles all the encryption, and the web server (Tomcat, say) just sees plain HTTP. No software is needed in many cases, unless you're using an SSL/HTTPS ethernet card. Rainbow Technologies used to sell network-attached accelerators, but I haven't looked at their offerings in a few years.
    It is clear that there is no need for additional software on, let's say, Tomcat.

    My question was about the software that runs inside that 'hardware' accelerator. Could it be that the king is naked and there is a stripped-down Apache inside?
  9. Hardware Accelerators

    The system I'm familiar with is the F5 load balancers, and they run a BSD variant.

    The 'hardware accelerator' portion of HTTPS is a bit of hardware used to quickly create the keys necessary for an SSL connection. Once the connection is set up and negotiated, the hardware is out of the loop, and the ongoing encryption etc. is done by software.

    So, if your application is something like a typical storefront with lots of short-term users, then an accelerator can help you, because it can speed up the creation of new channels.

    But if your application is set up to handle a bunch of static users, users who log in at the beginning of the day, run all day, then log out, as in a remote call center or something, then an SSL accelerator won't gain you much at all.
  10. Hardware Accelerators

    Wow, thanks a lot!
  11. Some well-known vendors of accelerators

    Rainbow Technologies and nCipher used to be popular solutions back in 1999-2001. According to Rainbow's site, they merged with SafeNet. IBM and Cisco also sell SSL accelerators. In the case of Cisco, some of the models have SSL cards, while others have it built in. I don't work for any of these companies, but some people might find this useful.

    http://www.safenet-inc.com/
    http://www.ncipher.com/index.php

    I'm totally biased, but the great thing about using SSL accelerators is that they make it easier to stress- and load-test webapps. BigIP used to be a popular solution also, but I don't know if that is still true today.
  12. Apache

    1) I suppose that Apache was chosen as a free solution, like Tomcat. A HW load balancer will cost you something.

    2) IP tables as a load balancer are mentioned in the article. But I have not heard about this solution. Do you know somebody +/- of this solution???

    3) It can be a good idea for a J2EE app to serve some static files (e.g. pictures) from a C/C++ web server (e.g. Apache) on another machine for performance reasons.

    4) Closed-source J2EE servers that use Tomcat as a web container also support reverse proxy plugins for Apache.
  13. typo

    Do you know somebody +/- of this solution???
    Does somebody know +/- of this solution???
  14. I agree with you in a pure Java environment. But I think the Apache httpd server is the only way to go when many different technologies build up the site (Java, scripting languages like Perl, Python or PHP, CGI, or streaming media). Many non-Java apps just offer an Apache httpd (or IIS, in some cases) front end (Mailman, Zope, Webmin, ...). So the simplest way to achieve HA for the internal Tomcat apps is through Apache httpd.


     P.S.:

     For plain TCP load balancing I suggest this page:
    http://pythondirector.sourceforge.net/
  15. In my experience Apache is very good at handling HTTPS, and I certainly do not want to redo everything in Java
    You can leverage apache HTTPS stack, it works quite well.
    What is wrong with leveraging a proven platform?
    Keep it simple, that's all; the less the better. The more layers you have, the more issues will arise.
    It can be a good idea for a J2EE app to serve some static files (e.g. pictures) from a C/C++ web server
    Caching, gzipping and such are really not a good reason to use Apache; using a filter you can gzip and cache resources pretty easily (OSCache, Ehcache, and so on will do it for you).
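    A filter-based setup like that can be sketched in web.xml; the class name below is OSCache's documented CacheFilter, but the URL pattern and cache time are illustrative:

```
<!-- web.xml sketch: caching selected responses with OSCache -->
<filter>
    <filter-name>CacheFilter</filter-name>
    <filter-class>com.opensymphony.oscache.web.filter.CacheFilter</filter-class>
    <init-param>
        <!-- cache entries for 10 minutes -->
        <param-name>time</param-name>
        <param-value>600</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CacheFilter</filter-name>
    <url-pattern>*.jpg</url-pattern>
</filter-mapping>
```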
    IP tables as a load balancer are mentioned in the article. But I have not heard about this solution. Do you know somebody +/- of this solution???
    Have a look at this link: http://linas.org/linux/load.html There are many ways to load balance a tcp/ip connection under linux leveraging just the OS stack or using a little free daemon.
  16. Bleh, no preview nor edit button?? Anyway, sorry for the typo, I meant: you can leverage the Tomcat HTTPS stack...
  17. Bleh, no preview nor edit button?? Anyway, sorry for the typo, I meant: you can leverage the Tomcat HTTPS stack...
    Less is MORE :), so I like SSL configured once on Apache for a few Tomcat instances running on different machines.

    PS: I adopted the policy of typing messages in a word processor and copying them into the forum’s text area…
  18. Shame for TSS

    Bleh, no preview nor edit button??

    It is a shame for TSS that there is no post preview and/or edit option when almost all forums on the Internet have them!
    A spellchecker could help too!

    MC
  19. Actually, you're complaining to the wrong party. Browsers simply ship pathetic text areas; spell checking should be their domain. And how often, after having typed for like 30 minutes, have I typed ^-W and closed my window? Any warning from the browser? "The form has been modified. Are you sure you want to close?" would be a valuable addition.

    Browsers as application UIs just don't cut it.
  20. Firefox Extensions

    Spell Checking: use SpellBound

    Tab Recovery: Use UndoCloseTab

    Later,
    Rob Misek
    Tangosol, Inc.
    Coherence: It just works.
  21. Firefox Extensions

    Spell Checking: use SpellBound
    Tab Recovery: Use UndoCloseTab

    Thanks. Still, they're extensions. For things that should have been in there for years. Zero install?

    Does UndoCloseTab bring back my form contents? I doubt it. That sounds as good as a ReopenOldVersionOfFileAfterAccidentallyClosingIt-Plugin for Word.

    Matthias
  22. When should you front Tomcat with Apache?

    There are many good reasons to have a "web tier" in front of Tomcat (or WebLogic or WebSphere or whatever J2EE container you're using). Apache isn't the fastest, but it seems to work well. Here are some of the reasons to add a web tier:

    1) Buffering stream input (data coming from the browser) - this is important for applications with different speed / quality client connections (e.g. dial-ups vs. DSL).

    2) Buffering stream output (data going back to the browser) - this is even more important for applications with different speed / quality client connections - you don't want to waste a thread in your app tier waiting on data to get acked through a dialup connection.

    3) Connection buffering and throttling (lots of incoming connections) - the handling for these conditions is often more efficient and more graceful in a web server; why waste your app tier resources handling this?

    4) SSL - assuming you don't hardware accelerate it in your hardware load balancer, it's typically much more efficient to handle the SSL in the uber-scalable web tier than in the app tier.

    5) Security - there can be a firewall between the web tier and the app tier, and a firewall in front of the web tier, and only one port (assuming no ssl) open in either firewall.
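    Points 4 and 5 together can be sketched as an Apache virtual host that terminates SSL and forwards a single port to the app tier (host names, paths, and certificate locations here are illustrative):

```
<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile    /etc/httpd/ssl/server.crt
    SSLCertificateKeyFile /etc/httpd/ssl/server.key
    # Only one plain-HTTP port open through the inner firewall:
    ProxyPass        /app http://appserver.internal:8080/app
    ProxyPassReverse /app http://appserver.internal:8080/app
</VirtualHost>
```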

    A "typical" configuration that we see is:

    1) Firewall
    2) Redundant hardware load balancers
    3) Farm of web servers (e.g. Apache)
    4) Firewall
    5) Redundant hardware load balancers (note: these can even be the same ones as #2 above)
    6) Cluster of app servers

    When I say it's "typical", I mean it's almost a rule of thumb for Internet-facing applications; we'll see a config like this (with only minor differences) in 8 or 9 out of 10 companies.

    OTOH, if the app has light load, there's no reason to do any of this complexity, other than for the security reasons.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  23. 2) Buffering stream output (data going back to the browser) - this is even more important for applications with different speed / quality client connections - you don't want to waste a thread in your app tier waiting on data to get acked through a dialup connection.
    I thought this happens even in your "typical" configuration. Once an app server receives a request, it responds to the user directly, doesn't it?
  24. 2) Buffering stream output (data going back to the browser) - this is even more important for applications with different speed / quality client connections - you don't want to waste a thread in your app tier waiting on data to get acked through a dialup connection.

    I thought this happens even in your "typical" configuration. Once an app server receives a request, it responds to the user directly, doesn't it?

    Yes, and writing bytes to that TCP/IP socket (e.g. when you flush the buffer or exceed its size -- see ServletResponse#flushBuffer() and #getBufferSize()) means that you have to wait until those bytes are acked by the other end of the line. Do you want your app server machine spending threads to wait on slow client connections? It's the "service" thread for each request that will be blocked doing that write, which means that slow clients will act as a form of distributed denial-of-service (DDOS) attack.

    Read the docs on Apache mod_proxy to see how Apache can be helpful for this issue.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
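    With mod_proxy in front, the web tier reads the backend response into its own buffers, so Tomcat's service thread is released while Apache dribbles bytes out to the slow client. A minimal sketch (directive names per the Apache mod_proxy docs; host and values illustrative):

```
ProxyPass        /app http://tomcat.internal:8080/app
ProxyPassReverse /app http://tomcat.internal:8080/app
# Buffer used when reading from the backend connection
ProxyIOBufferSize 65536
# Give up on a stalled backend, independent of client speed
ProxyTimeout 60
```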
  25. I need some help ...

       How reliable and good is it to use the keytool utility from Java in case I don't have an expensive SSL certificate?
  26. There are many good reasons to have a "web tier" in front of Tomcat
    Many people see Tomcat as a servlet container only; it's a web tier as well, and it can buffer input/output just as well as Apache, I believe.
    When I say it's "typical", I mean it's almost a rule of thumb for Internet-facing applications; we'll see a config like this (with only minor differences) in 8 or 9 out of 10 companies.
    Well, it's not my fault if most companies blindly followed the J2EE/EJB hype. Remember: "divide and conquer (horizontal scalability: Firewall -> Web Farm -> Firewall -> App Farm -> DB)", they said and advocated pretty much everywhere. How weird, that sounds like your "typical" configuration.....

    Reality is really different: most web apps are in Perl/PHP/ASP/JSP/servlets, there is no EJB (so no so-called app tier), just a database. The common configuration really is:

    1) Firewall which can load balance
    2) Clustered web servers
    3) Database (Master/slave - Multi master)

    My main gripe about this Apache/Tomcat thing is that any time I hear about Tomcat, I see Apache in the corner. The old days when Tomcat was just a servlet container are gone, and I say: give Tomcat a chance to prove that it can handle the job by itself. It does not need to be proxied by Apache to provide performance and/or HA/scalability.
  27. There are many good reasons to have a "web tier" in front of Tomcat

    Many people see Tomcat as a servlet container only; it's a web tier as well, and it can buffer input/output just as well as Apache, I believe.

    Christian, you are free to believe what you will. After you read the Tomcat sources and the Apache sources, I doubt you will come to the same conclusion -- unless your faith is stronger than your desire to learn the truth.
    When I say it's "typical", I mean it's almost a rule of thumb for Internet-facing applications; we'll see a config like this (with only minor differences) in 8 or 9 out of 10 companies.

    Well, it's not my fault if most companies blindly followed the J2EE/EJB hype ..

    OK, I'm not sure where the defensiveness is coming from. Nor do I know where the EJB references come from. Systems running non-Java CMS (you know, all the major media sites) also front with a rack of Apaches. What does that have to do with J2EE? Or even Java?
    Reality is really different: most web apps are in Perl/PHP/ASP/JSP/servlets, there is no EJB (so no so-called app tier), just a database. The common configuration really is:
    1) Firewall which can load balance
    2) Clustered web servers
    3) Database (Master/slave - Multi master)

    I prefer to describe reality than to try to define it. What I described above reflects my experience in many different large organizations in many different industries, primarily in the US and Europe. My background for working with these companies is related to Java and J2EE, so I can't speak much to "perl/php/asp". And of course, you forgot Tcl .. ;-)
    My main gripe about this Apache/Tomcat thing is that any time I hear about Tomcat, I see Apache in the corner. The old days when Tomcat was just a servlet container are gone, and I say: give Tomcat a chance to prove that it can handle the job by itself. It does not need to be proxied by Apache to provide performance and/or HA/scalability.

    Again, my main issue with your conclusion is that you have made your conclusion because you want it to be true, not because you have any evidence to suggest that it is true. I am not saying that you are wrong -- you could very well be right -- but I am saying that you have done yourself a huge disservice by purposefully avoiding empirical evidence as you have made your case.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  28. Tomcat Standalone

    I agree with Cameron that we should back our statements with real-life experiences. It will be for the benefit of all.

    Our product uses Tomcat solely as a web server and servlet container -- including the SSL stack. It is live and deployed at at least 5 sites. It has performed well for us so far, with an average volume of around 500 messages per day -- each message can run up to 100KB, with extremes of around 10MB in message size.

    Again -- I cannot speak to how well Tomcat might do on its own if the transaction count is huge -- I don't have that experience. But given our environment and the transaction size described, Tomcat on its own has performed admirably.

    Romin.
  29. Tomcat Standalone

    Looking through this thread, it is pretty easy to guess who has developed Java server-based applications and who has developed, deployed, and supported Java server-based applications in production.

    A DMZ arrangement as Cameron described, with two firewalls, load balancers, Apache or IIS as web server proxies, and application servers (Tomcat, WebLogic), is a pretty standard system architecture for companies operating mid- to large-scale Java server-based applications. Security and scaling requirements really push you in this direction.

    As for backing it up with real-life experience: all J2EE deployments (notice I said J2EE, not EJB) I have worked on since 1997 (Aon Insurance, Virgin Mobile USA, one of the large US banks, and a few failed dot-coms ;-) ) have used DMZ-based system architectures.
  30. Well, it's not my fault if most companies blindly followed the J2EE/EJB hype. Remember: "divide and conquer (horizontal scalability: Firewall -> Web Farm -> Firewall -> App Farm -> DB)"

    nobody is dividing anything ...

    "web farm" does not mean the J2EE servlet container in this context; it means the C/C++ web server

    so you have:
    - firewall
    - C/C++ web server
    - firewall
    - J2EE servlet container (+ J2EE EJB container) (+ J2EE MQ)
    - firewall
    - DB

    IMO there is no technical reason to split the servlet and EJB containers into different JVMs; the scalability of this solution is better, and with sticky sessions and no session failover it is almost linear
  31. scalability of this solution is better

    well scalability is similar, but performance will be better

    Anyway, does somebody know a technical reason to split the servlet and EJB containers into separate JVMs???
  32. There are reasons, not so much to split the Web Container from the EJB Container, but to deploy the business logic code in separate VMs from the websites that use the logic.

    In these sorts of large organizations the web sites and business logic code are generally developed, deployed and managed by different groups.

    However even in "Business Logic" VMs it is generally necessary to have a Servlet Container to support Web Service access.
  33. Does somebody know a technical reason to split the servlet and EJB containers into separate JVMs???

    In general, no.

    And just to be clear, this is a totally different question than splitting the "web tier" and the "app tier" as described above in this thread. The "web tier" is just Apache or similar handling static content requests or proxying to other systems, while the "app tier" is the entire J2EE stack (or whatever parts of it you choose to use) -- NOT split out from the same JVM.

    There are some projects that I've seen that split out their servlet work from what they call their "business tier" (and sometimes it's even called a "data tier" but it's not the database, but rather the EJBs or similar that manage the abstract data model), but splitting it out is done for reasons other than purely technical, such as different independent groups owning different parts of an application, or perhaps even for security reasons.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  34. Setting up a DMZ with Apache acting as a proxy in the middle is pretty standard practice these days for most companies, if for no other reason than security. You don't want people hacking into your JMX console and doing some real damage.

    I would also serve up all static content like images, PDFs, etc. from the Apache server in the DMZ. Once again, the idea is not to tie up those app server threads that should be responding to business logic and dynamic content requests.
  35. JK2 Now Unsupported

    I was just playing around with Tomcat clustering and load balancing using Apache the other day. I noticed that JK2 is no longer supported.

    See the announcement here.

    For the benefit of the lazy, the basic reason is quoted below:
    JK2 has been put in maintainer mode and no further development will take place. The reason for shutting down JK2 development was the lack of developers interest. Other reason was lack of users interest in adopting JK2, caused by configuration complexity when compared to JK.

    It looks like they are building a native module for Apache 2.1/2.2 and will support that. But for all other web servers (including Apache 1.3), mod_jk is still actively supported.

    We are using mod_jk on our company's site, since our web server is Apache 1.3 and is used for other purposes (like some PHP sites). So we use it to communicate with our app server (Tomcat) on another machine.
  36. A bit more complicated than that

    The discussion on what to do with JK2 and improving its ease of use was rather long, but the choice was made carefully. The Tomcat developers are working more closely with the httpd developers to make it more consistent and easier for users to install. If a website is already using a mix of Perl, PHP, CGI and servlets, then using an Apache front end makes a lot of sense. Sites that don't have those requirements are better off using just Tomcat by itself.
  37. A bit more complicated than that

    If a website is already using a mix of Perl, PHP, CGI and servlets, then using an Apache front end makes a lot of sense. Sites that don't have those requirements are better off using just Tomcat by itself.

    Hi Peter,

    I think even in that last case, there are chances one would like to use httpd. For example, feeling more confident by having the Java server behind a trusted and resilient Apache httpd server. Or using mod_perl to do good filtering of bogus requests and stop them before they reach the app server. Under some circumstances there is even no need for any connectors, just Apache's mod_proxy. Of course, these situations imply that we are more comfortable with the extra features that exist in httpd than with those implemented by Coyote when exposing Tomcat to the outside world, and that we have enough resources (time, people) to support this configuration.
  38. good point

    That's a good point, but in terms of filtering, I find that using Cisco gear to filter can be a desirable option in many cases. Not all cases, but in many, like filtering out specific URL patterns; most routers today can handle that.
  39. Reasons for Using an Apache Front End

    When should you front Tomcat with Apache?

    Here are a few reasons (based on our experience hosting a number of Java users).

    It makes sense to use Apache when you are hosting multiple domains on a server and some of the domains are just hosting regular content (i.e. not webapps).

    Tomcat does not 'do' PHP/Perl/Ruby/Mason. Apache is required if you need to use PHP et al. for some part of your site, or for some domains (e.g. running phpMyAdmin).

    Getting SSL working with Apache/Tomcat is easier than doing it with Tomcat alone.

    Configuring virtual hosts in Apache is easier than in Tomcat.

    Apache adds some nifty front end features (e.g. mod_rewrite, mod_gzip).

    Getting Tomcat to run on port 80 requires the use of iptables (unless you're willing to run Tomcat as a privileged user, ugh).
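    That iptables redirect is a one-liner (sketch; assumes the nat table is available and you run it as root):

```
# Forward incoming port 80 to Tomcat's default HTTP connector
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
```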

    --
    Peter
    RimuHosting - Java Hosting
  40. Reasons for Using an Apache Front End

    Probably one of the best reasons to use Apache is the simple Unix ethos: use the best tool, the one specifically written for the role.

    Apache sits on more web servers than any other HTTP server software. That gives it better development support, and it also attracts a large portion of the malicious community (though IIS does a great job of attracting that community as well! ;) ); as such, one can be sure that over time, in terms of security and reliability, httpd is likely to be one of the most resilient pieces of software ever written.

    There is a lot of talk of putting Tomcat out there in the open, but not many people can answer the security concerns that this raises. HTTP on the face of it seems like a simple protocol, but Microsoft has shown over the years just how proficient malicious people can be, and as such I wouldn't dismiss the hard-earned resilience of Apache.
  41. that is not true

    Probably one of the best reasons to use Apache is the simple Unix ethos: use the best tool, the one specifically written for the role.

    Apache sits on more web servers than any other HTTP server software. That gives it better development support, and it also attracts a large portion of the malicious community (though IIS does a great job of attracting that community as well! ;) ); as such, one can be sure that over time, in terms of security and reliability, httpd is likely to be one of the most resilient pieces of software ever written.

    There is a lot of talk of putting Tomcat out there in the open, but not many people can answer the security concerns that this raises. HTTP on the face of it seems like a simple protocol, but Microsoft has shown over the years just how proficient malicious people can be, and as such I wouldn't dismiss the hard-earned resilience of Apache.

    If you look at the critical exploits of httpd vs. Tomcat over the last 5 years, Tomcat has had fewer. In many cases, Tomcat can serve up static files faster than httpd, though it is much better to off-load all static files to a separate system. Using a dedicated image server is standard practice on large sites, so having Apache front Tomcat is actually not standard in many cases, in my experience.

    This is especially important for sites that handle lots of HTTPS traffic. Putting both httpd and Tomcat on the same system, with httpd handling encryption, is actually a bad recommendation. A 2GHz CPU can handle roughly 15-20 concurrent requests efficiently; more than that will consume all the CPU resources. For HTTPS traffic, you're much better off getting a cheap SSL/HTTPS-enabled ethernet card. Considering how cheap they are now, it's a more reliable solution.
  42. that is not true

    If you look at the critical exploits of httpd vs. Tomcat over the last 5 years, Tomcat has had fewer

    Well, obviously: how many people use Tomcat as an external web-facing HTTP server? The fact that critical exploits are found and dealt with is what makes httpd reliable; if Tomcat began to become popular as a web server, the number of exploits would skyrocket. It's a function of how many people are trying to crack the software, how well written it is, and the installed user base - on all of which Tomcat doesn't compare well, and hence it's a poor comparison on straight numbers.

    Generally, large sites use httpd to serve the (non-secure) static content and pass the dynamic content requests through to Tomcat - that's pretty much standard practice in the large-scale sites I've seen, and it has shown to be a very scalable solution.

    That's a big claim, to say that Tomcat can serve static content faster than httpd - I'd want to see real-world stats for that.

    As far as where you deploy Tomcat, that's a matter of architecture, dictated by your scalability/availability and security requirements - all issues that have been well covered by Cameron et al.
  43. have you bothered to read tomcat code?

    If you look at the critical exploits of httpd vs. Tomcat over the last 5 years, Tomcat has had fewer

    Well, obviously: how many people use Tomcat as an external web-facing HTTP server? The fact that critical exploits are found and dealt with is what makes httpd reliable; if Tomcat began to become popular as a web server, the number of exploits would skyrocket. It's a function of how many people are trying to crack the software, how well written it is, and the installed user base - on all of which Tomcat doesn't compare well, and hence it's a poor comparison on straight numbers.

    Generally, large sites use httpd to serve the (non-secure) static content and pass the dynamic content requests through to Tomcat - that's pretty much standard practice in the large-scale sites I've seen, and it has shown to be a very scalable solution.

    That's a big claim, to say that Tomcat can serve static content faster than httpd - I'd want to see real-world stats for that.

    As far as where you deploy Tomcat, that's a matter of architecture, dictated by your scalability/availability and security requirements - all issues that have been well covered by Cameron et al.

    I'm no expert, but I've spent quite a bit of time going over Tomcat's code. In my biased opinion, the code bases of Apache httpd and Tomcat are basically equal. The reason httpd has had more critical exploits is its design and the inherent difficulty of writing C. I suck at C, but it is far easier to write insecure code in C than in Java. I've run over 500 benchmarks on Tomcat, starting with Tomcat 3.3.1 all the way to 5.0.21. If someone is comparing Tomcat 3.3.1 to httpd 1.3.x for static content, then httpd definitely wins. If you're comparing httpd 2.0 to Tomcat 5.5, then it's going to be equal. In some cases, Tomcat will be faster for static content. Don't take my word for it; do the test yourself.

    Tomcat has come a long way, and the guys have worked their butts off. If you look at recent benchmarks comparing Tomcat 5 to other web servers, it's in the top 5 now. Both httpd and Tomcat are mature. Both can be used by themselves, without something else helping. Using the two together is generally a good idea, but it's really annoying to hear people make claims based on old perceptions of Tomcat. I know of several large sites that get 10 million+ pageviews a day using Tomcat without httpd. Some of the new features, like HttpSession replication, have come a long way. How many web servers, not counting EJB containers, have session replication? IIS still doesn't have built-in session replication, so let's give credit to the Tomcat developers. Of course, someone can always use database-driven HTTP sessions, but that is different from the built-in HttpSession replication in Tomcat 5.5.
  44. Reasons for Using an Apache Front End

    I've dropped Apache in favour of Squid as a server accelerator for Tomcat. It has made the whole setup much easier; read here for more details.

    BTW, Squid can take care of the connection issues that Cameron spoke about, and it will also decrypt SSL for you. Plus, you can create a tomcat user for the servlet container and then start Squid as root on port 80; when it starts, it will swap over to "nobody". You could leave port 8080 open to test (un)cached versions of the site, or use iptables to restrict access to port 8080.

    Anyone any thoughts? It has been working *really* well for me on a media-intensive site.
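    For reference, the Squid-2.5-era accelerator setup being described boils down to a few squid.conf lines (values illustrative; these `httpd_accel_*` directives were replaced by `http_port ... accel` options in later Squid releases):

```
# Accept client traffic on port 80 and accelerate a local Tomcat
http_port 80
httpd_accel_host 127.0.0.1
httpd_accel_port 8080
httpd_accel_single_host on
httpd_accel_with_proxy off
```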
  45. Reasons for Using an Apache Front End[ Go to top ]

    I have no experience with Squid, but it certainly sounds promising from your description. I'll take a look at your write-up; thanks for the link.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  46. Squid is just a cache (proxy cache). Of course you can get better results if you cache your static resources, but you can achieve this using a filter and OSCache/Ehcache; it will have the same effect.
  47. You're missing Cameron's point about connections: Tomcat doesn't deal with clients on slower, less reliable connections (people with 56k and less, mobile devices) as well as Apache does. It's true that you could write filters, or use existing implementations like OSCache to cache content, and/or gzip the response if the user-agent supports it, but why repeat the functionality of an established server like Squid?
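    To put a rough number on the gzip point, compressing repetitive HTML with the JDK's own java.util.zip typically cuts the markup to a small fraction of its size. This is a standalone sketch, not a servlet filter; the page content is made up:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compresses a byte array the same way a gzip response filter would.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(raw);
        gz.close(); // close() flushes the gzip trailer
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A fake table-heavy page; real HTML compresses similarly well.
        StringBuilder page = new StringBuilder("<html><body>");
        for (int i = 0; i < 500; i++) {
            page.append("<tr><td>row ").append(i).append("</td></tr>");
        }
        page.append("</body></html>");
        byte[] raw = page.toString().getBytes();
        byte[] zipped = gzip(raw);
        System.out.println(raw.length + " bytes -> " + zipped.length + " bytes gzipped");
    }
}
```

    Over a 56k line, that difference is exactly the time a blocking service thread stays tied to the socket.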

    Also, how are you getting around the port 80 issue? Are you using a firewall to forward to 8080?

    Tomcat is getting better; there was a time when it simply wasn't a production-ready servlet container, never mind web server. But it still has a way to go until it can provide as reliable a connection as Apache can. I suggested using Squid because it's the simplest method of installing a production platform that meets all my requirements.
  48. Reasons for Using an Apache Front End[ Go to top ]

    I forgot to mention that Squid already has a management interface that you would otherwise have to implement.
  49. perhaps some benchmarks?[ Go to top ]

    In 2003, there were several threads on tomcat-user regarding Tomcat 5 performance for static content. I don't have the links handy, but I do remember several users who had problems setting up JK2 and decided to stress test Tomcat with and without Apache. In a few cases, Tomcat performed just as well as Apache. This doesn't mean Tomcat is faster than Apache for static content. It does show Tomcat 5 has come a long way and can handle static content just fine.

    I personally prefer to offload all static content to a dedicated webserver running Apache. This way, updating the website is simpler: rather than copy all images and static files to every single server, I just update Apache. This has several benefits. If the traffic spikes 10x, I can generate a static page and serve it from Apache without hammering my database or Tomcat. This assumes updating the router to redirect all traffic for the dynamic URL to the static file.

    It would be a benefit to have some rough benchmarks of how a website performs with and without Squid.
  50. perhaps some benchmarks?[ Go to top ]

    Do you have a particular benchmark that you'd like to see repeated, or just some rough figures for (un)cached pages?

    I too prefer to host static and dynamic content on different machines because it allows me to off-load more work to the network, but it is marginally more work on my part ;)
  51. some ideas[ Go to top ]

    Actually, after my last post, I thought about running some benchmarks on my Linux box at home comparing Apache 1.3 vs Tomcat 5. Hopefully it won't take me too much time to run them in the next few days. I was thinking of doing the following tests and posting the results to Tomcat's resource page.

    Concurrent threads: 5, 10, 15, 20, 25, 30
    page size in KB: 1, 5, 10, 20, 40, 80, 160
    image size KB: 1, 5, 10, 20, 40, 80, 160
    with and without gzip compression for HTML

    that's not really extensive, but it should provide a baseline of how Tomcat handles static files. If you want to work together on this and write up the results, feel free to email me woolfel AT gmail DOT com.
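    If it helps, the fixed-size test pages for the matrix above can be generated up front so the runs stay comparable. A throwaway sketch; the file names and target directory are just examples:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class GenPages {
    // Pads an HTML page out to exactly the requested size in KB.
    static byte[] page(int kb) {
        StringBuilder sb = new StringBuilder("<html><body>"); // 12 chars
        while (sb.length() < kb * 1024 - 14) {               // leave room for the close
            sb.append('x');
        }
        sb.append("</body></html>");                          // 14 chars
        return sb.toString().getBytes();
    }

    public static void main(String[] args) throws IOException {
        int[] sizesKb = {1, 5, 10, 20, 40, 80, 160}; // the sizes listed above
        Path dir = Files.createTempDirectory("bench");
        for (int kb : sizesKb) {
            Files.write(dir.resolve("page_" + kb + "k.html"), page(kb));
        }
        System.out.println("wrote " + sizesKb.length + " pages to " + dir);
    }
}
```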

    peter
  52. some ideas[ Go to top ]

    It's also more interesting to see a mix of user connection speeds -- some coming in on T1s (often low latency / low packet loss) and some coming in over south-pole dialups (high latency / ludicrous packet loss). It's those slow conns that chew up perfectly good threads for no good reason ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  53. some ideas[ Go to top ]

    It's also more interesting to see a mix of user connection speeds -- some coming in on T1s (often low latency / low packet loss) and some coming in over south-pole dialups (high latency / ludicrous packet loss). It's those slow conns that chew up perfectly good threads for no good reason ;-)

    yeah, totally agree. I wish I had the resources to do that kind of testing. In the past I've done that by asking friends and family to assist :) Though isn't everyone on broadband now? Just kidding.
  54. some ideas[ Go to top ]

    Cameron: agreed, necessary, but difficult to test. Do you have any ideas on how to simulate a slower, less reliable connection?

    Peter: I'll certainly help you out on this, I'll mail you later.
  55. some ideas[ Go to top ]

    Cameron: agreed, necessary, but difficult to test. Do you have any ideas on how to simulate a slower, less reliable connection?

    There are software solutions, like those built into some of the load generators, but those only approximate the effect.

    There are hardware solutions, but they run around $100k-$200k for a good one. It's been a while since I saw one, but the idea is that you can simulate tons of different scenarios on the network, including losing packets, etc. (Yeah, for $200k, it better be able to lose some packets ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  56. some ideas[ Go to top ]

    Cameron: agreed, necessary, but difficult to test. Do you have any ideas on how to simulate a slower, less reliable connection?
    There are software solutions, like those built into some of the load generators, but those only approximate the effect. There are hardware solutions, but they run around $100k-$200k for a good one. It's been a while since I saw one, but the idea is that you can simulate tons of different scenarios on the network, including losing packets, etc. (Yeah, for $200k, it better be able to lose some packets ;-)

    LOL, man you made me laugh. Back when my friends worked at ISPs, I could borrow some bandwidth. There are companies that specialize in this kind of testing and have a whole bank of accounts with various ISPs for that purpose. They are generally very expensive, though, since it takes a lot of resources to do a full-blown "realistic" load test. I find that buffering the page generation and sending the completed page is a good way to minimize the effects of bad dial-up connections. Though an even better solution is to gzip the HTML.
  57. some ideas[ Go to top ]

    LOL, man you made me laugh. [..] I find that buffering the page generation and sending the completed page is a good way to minimize the effects of bad dial-up connections. Though an even better solution is to gzip the HTML.

    Case in point, Peter:

    1) Five Tomcat servers
    2) 100 service threads (which is a lot for a dual proc Linux) per server
    3) Clients running IE over dialup
    4) Site is graphically intensive
    5) Images are served by Tomcat

    Question: How many clients does it take to cause a DDOS?
    Answer: Roughly one more than 100 (assuming IE is issuing 5 image requests in parallel).

    Potential also exists for issues with HTTP keep-alive. Clients can soak all the threads quite easily.
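    Spelled out, the arithmetic above (using only the numbers assumed in this post, not measurements):

```java
public class ThreadExhaustion {
    // One blocking service thread is tied up per in-flight connection,
    // so capacity is total threads divided by parallel requests per client.
    static int clientsToSaturate(int servers, int threadsPerServer, int parallelPerClient) {
        return (servers * threadsPerServer) / parallelPerClient;
    }

    public static void main(String[] args) {
        // Numbers assumed above: 5 Tomcats, 100 threads each,
        // IE fetching ~5 images in parallel over dialup.
        int n = clientsToSaturate(5, 100, 5);
        System.out.println(n + " slow clients saturate every service thread");
    }
}
```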

    "Real world" testing isn't always about "speed" -- sometimes (like the above example) it's about impedence mismatches.

    This is what NIO is supposed to solve, by the way ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  58. some ideas[ Go to top ]

    LOL, man you made me laugh. [..] I find that buffering the page generation and sending the completed page is a good way to minimize the effects of bad dial-up connections. Though an even better solution is to gzip the HTML.
    Case in point, Peter: [..] Question: How many clients does it take to cause a DDOS? Answer: Roughly one more than 100 (assuming IE is issuing 5 image requests in parallel). [..] "Real world" testing isn't always about "speed" -- sometimes (like the above example) it's about impedance mismatches. This is what NIO is supposed to solve, by the way ;-)

    Right, except that I wouldn't bother serving images from Tomcat. I thought the latest IE will only open 2 connections to any single webserver, so you'd have to open up multiple instances of IE to really produce more than 2 concurrent requests to the same webserver. It's rather unlikely for a single user to open more than 3 IE instances to the same website. Even if someone is using an older version of IE, that would be 4 concurrent connections per IE instance according to the HTTP 1.0 spec.

    DDOSing a server is rather easy to do :) Which is why most routers have some support for preventing DDOS. Of course, there's always the /. effect, which achieves the same result. In practice, I find the bandwidth gets saturated way before the webserver melts down.
  59. some ideas[ Go to top ]

    In practice, I find the bandwidth gets saturated way before the webserver melts down.

    What I'm suggesting is that, with Tomcat (thread == connection) you can have a DDOS result from very low load, if the sockets take too much time to flush the data through because of connection speed / reliability issues.

    Take a thread dump of a production site with dialup users hitting Tomcat directly. All the service threads will be in a blocking native write to a socket.

    In other words, all threads "doing something", maybe even hundreds of threads doing something, but CPU 90% free, memory 90% free, T1 90% free, etc. In other words, it's not solvable with a faster server, more memory, more CPUs, or wider pipe on the data center.

    Plain and simple: there's an impedance mismatch .. in Java, only NIO (non-blocking buffered socket I/O) _could_ solve it in some cases, if the servlets have finished all their writing and have already returned, and so the server is just spooling the buffered output to the NIO channels.
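    A loopback sketch of that "spooling to NIO channels" idea in plain JDK NIO -- no servlet machinery, and every name here is illustrative:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NioSpoolSketch {

    // Spools a fully buffered "response" to a socket, writing only when the
    // kernel can accept more bytes -- the writer never blocks on a slow reader.
    static int spool(byte[] response) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0));

        SocketChannel client = SocketChannel.open();
        client.configureBlocking(false);
        boolean connected = client.connect(
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()));
        SocketChannel peer = server.accept(); // stands in for the slow dialup reader

        Selector selector = Selector.open();
        client.register(selector, connected ? SelectionKey.OP_WRITE : SelectionKey.OP_CONNECT);

        ByteBuffer buf = ByteBuffer.wrap(response);
        while (buf.hasRemaining()) {
            selector.select();
            for (SelectionKey key : selector.selectedKeys()) {
                SocketChannel ch = (SocketChannel) key.channel();
                if (key.isConnectable() && ch.finishConnect()) {
                    key.interestOps(SelectionKey.OP_WRITE);
                } else if (key.isWritable()) {
                    ch.write(buf); // writes whatever fits right now; never blocks
                }
            }
            selector.selectedKeys().clear();
        }
        int written = buf.position();
        client.close();
        peer.close();
        server.close();
        selector.close();
        return written;
    }

    public static void main(String[] args) throws IOException {
        int n = spool("page already rendered by the servlet".getBytes());
        System.out.println("spooled " + n + " bytes without blocking a thread");
    }
}
```

    One selector thread like this can babysit thousands of slow sockets, which is exactly what a thread-per-connection container can't do.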

    (BTW - my numbers were wrong. I was assuming 5 sockets, instead of 4.)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  60. some ideas[ Go to top ]

    In practice, I find the bandwidth gets saturated way before the webserver melts down.
    What I'm suggesting is that, with Tomcat (thread == connection) you can have a DDOS result from very low load, if the sockets take too much time to flush the data through because of connection speed / reliability issues. Take a thread dump of a production site with dialup users hitting Tomcat directly: all the service threads will be in a blocking native write to a socket. In other words, all threads "doing something", maybe even hundreds of threads doing something, but CPU 90% free, memory 90% free, T1 90% free, etc. It's not solvable with a faster server, more memory, more CPUs, or a wider pipe at the data center. Plain and simple: there's an impedance mismatch .. only NIO _could_ solve it in some cases, if the servlets have finished all their writing and the server is just spooling the buffered output to the NIO channels. (BTW - my numbers were wrong. I was assuming 5 sockets, instead of 4.)

    yeah, you're absolutely right, but I don't see an easy way to multiplex connections with NIO and still remain compliant with the servlet spec. Even though the latest spec has removed the explicit requirement of a single-threaded model, it's not at all clear to me that retrofitting NIO is feasible. From the blogs by the Jetty developers and my own research on SEDA and Haboob, I would much rather abandon spec compliance than graft NIO on.

    I think at some point, to get measurably better scalability and durability against DDOS or massive concurrent load, the servlet spec stops being useful. One intriguing idea I keep thinking about is "would it be feasible to use JavaSpaces or JXTA" as a general-purpose application container. I don't really know enough to be able to answer the question, but it's fun to consider wild and bizarre ideas every now and then.

    peter
  61. some ideas[ Go to top ]

    yeah, you're absolutely right, but I don't see an easy way to multiplex connections with NIO and still remain compliant with the servlet spec.

    This is the use of Apache in front of Tomcat that I was referring to. Tomcat spits out the result, Apache buffers it (and 10000 other responses deferred by AOL dialups) and handles that with relatively small amounts of resources. I think it does, anyway; even though I've read a lot of code and documentation, I'm always a bit lost in Apache.
    One intriguing idea I keep thinking about is "would it be feasible to use JavaSpaces or JXTA" as a general-purpose application container. I don't really know enough to be able to answer the question, but it's fun to consider wild and bizarre ideas every now and then.

    If you like distributed scalable architectures, I hope you'll take a look at our Coherence software. It's a commercial product that's been very successfully used by sites that were suffering from _purposeful_ DDOS attacks. If you have budget, you can put a couple thousand CPUs into one giant geographically-distributed Java/J2EE app server. :-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  62. some ideas[ Go to top ]

    Here's something on the topic; see 12.7.3 in http://modperlbook.org/html/ch12_07.html
  63. mod_proxy[ Go to top ]

    Here's something on the topic; see 12.7.3 in http://modperlbook.org/html/ch12_07.html

    A couple of the Tomcat and httpd guys have been working on porting the features from jk2 to mod_proxy, so hopefully as things progress, setting up Apache + Tomcat will get easier.

    peter
  64. mod_proxy[ Go to top ]

    I looked at mod_proxy initially, but it doesn't forward SSL. mod_perl seems to do so (it mentions support for CONNECT) so I'll try that. The only issue is that Squid is much easier to configure for this kind of situation, but I need to simulate clients with poor connections to see how it performs. I don't even know anyone with a 56k connection that I could ask to test!
  65. mod_proxy[ Go to top ]

    I looked at mod_proxy initially, but it doesn't forward SSL. mod_perl seems to do so (it mentions support for CONNECT) so I'll try that. The only issue is that Squid is much easier to configure for this kind of situation, but I need to simulate clients with poor connections to see how it performs. I don't even know anyone with a 56k connection that I could ask to test!

    I'm not sure that mod_proxy has advanced to that point. You're probably better off asking on tomcat-dev to see where mod_proxy is at today. Testing with a 56k modem is gonna be a tough one, considering I don't even own a regular modem anymore. Come to think of it, I haven't owned a modem for 5 years now, if you exclude the one that comes with a laptop.

    I'll generate some files tonight and post them.

    peter
  66. Ahh ok, now I see[ Go to top ]

    yeah, you're absolutely right, but I don't see an easy way to multiplex connections with NIO and still remain compliant with the servlet spec.
    This is the use of Apache in front of Tomcat that I was referring to. Tomcat spits out the result, Apache buffers it (and 10000 other responses deferred by AOL dialups) and handles that with relatively small amounts of resources. I think it does, anyway; even though I've read a lot of code and documentation, I'm always a bit lost in Apache.
    One intriguing idea I keep thinking about is "would it be feasible to use JavaSpaces or JXTA" as a general-purpose application container. I don't really know enough to be able to answer the question, but it's fun to consider wild and bizarre ideas every now and then.
    If you like distributed scalable architectures, I hope you'll take a look at our Coherence software. It's a commercial product that's been very successfully used by sites that were suffering from _purposeful_ DDOS attacks. If you have budget, you can put a couple thousand CPUs into one giant geographically-distributed Java/J2EE app server. :-)

    I thought that starting with JDK 1.4.2 the IO classes use the NIO stuff internally, so in theory it should handle this specific type of issue better? At least better than JDK 1.3.1. Whether that achieves what Apache does is a different story. I don't know the Apache httpd code at all; I usually end up feeling very confused and dizzy afterwards.

    Hopefully, as the JVM improves, IO performance will improve so that a ton of really slow, bad connections doesn't impact reliability as much.

    peter
  67. The problem isn't the servlet spec...[ Go to top ]

    ...the problem is in implementation of servlet containers.

    Take, for example, BEA Weblogic - they have had multiplexed IO via native libraries for years, and have recently added an NIO implementation.

    Tomcat & Jetty are limited by the "thread-per-connection" design, and to go to the next level (and be serious in the enterprise solution space) they need to embrace scalability solutions.

    Read the rest of my comments on this topic @ http://chrislee.typepad.com
  68. mod_jk and URI mapping[ Go to top ]

    I find that there is a lot more flexibility in mapping URIs to my servlets by using Apache mod_jk, which gets around the limitation that server.xml doesn't support wildcard paths to my servlets.
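    For illustration, the kind of wildcard mapping meant here is a one-liner in the Apache config; the path and worker name below are made-up examples, with the worker defined in workers.properties:

```apache
# httpd.conf: send everything under /reports/ to the Tomcat worker
# named "ajp13worker" (hypothetical); Apache keeps serving the rest.
JkMount /reports/* ajp13worker
```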
  69. this is not similar ?[ Go to top ]

    Isn't this similar to WADI (http://wadi.codehaus.org/)? It is very, very good.
  70. Any suggestions?[ Go to top ]

    I'm currently working on a site with growing pains and not too much money for expensive hardware. Currently we use Apache2, mod_jk2 and Tomcat4 (about 14 Tomcat instances for one Apache server).

    The main reason for using Apache2/mod_jk2 is to use it for sticky-session load balancing. I was taken a bit by surprise to see that mod_jk2 is not going to be supported any longer. Now I'm wondering what the correct path forward should be. mod_jk2 works OK for now, but once in a while we get segmentation faults and everything goes south (this is the main instability problem of the site). Everything is usually happy after a full restart, but it's not ideal to do this very often.

    Here are the options I can come up with:

    1) Invest in a HW load balancer
    2) Wait for Apache 2.1 and use the new mod_ajp features
    3) Go back to mod_jk

    Suggestions? Other/better options?
  71. My vote for apache[ Go to top ]

    While I do agree that Apache adds a level of complexity to the config, it also provides some redundancy and ease of maintenance.

    With a few HW load balancers monitoring and balancing Apache, and mod_jk monitoring and balancing Tomcat, I can work on servers or deal with a failure without most users ever knowing about it. If I run everything right off of Tomcat and I lose a box, the impact is greater than losing a single Tomcat server or an Apache server, since there are layers of redundancy built into each. This also allows me to take a Tomcat server offline for an upgrade and not impact the front-end capacity I have for servicing incoming requests. Unless you are running on a shoestring budget and only have enough money for 4 servers and a connection to the net, there is no reason I see for not breaking out the components. I also get the added security of not having any of my hosts directly connected to the net. They all sit in private space and I use the load balancer as the public interface.

    I think the argument for whether or not to use Tomcat for everything is based on your situation.
  72. some benchmark results[ Go to top ]

    For the last few weeks, I've been running some benchmarks. Here are some early results and graphs.

    http://cvs.apache.org/~woolfel/tc_results.html

    enjoy

    peter
  73. Disable output buffering ?[ Go to top ]

    We have a requirement to push data to clients using HTTP tunneling. An Apache HTTP server acts as a load balancer in front of Tomcat.

    When we access the data feed URL by connecting directly to Tomcat (on port 8080), the app works as expected -- data is flushed immediately to clients.
    In contrast, when accessing the app via Apache (mod_jk), the output is buffered (8,192 bytes), and this is not exactly what we need.

    Does anyone know a proper way to overcome this buffering?
  74. Disable output buffering ?[ Go to top ]

    add in mod_jk.conf:

    JkOptions +FlushPackets
  75. Disable output buffering ?[ Go to top ]

    add in mod_jk.conf:
    JkOptions +FlushPackets

    I have a similar problem, but when reading. When the user posts directly to Tomcat, I can detect, after a short timeout, if the connection has been broken. When the post goes through Apache/mod_jk there is a very long timeout, ~5 min, after a break occurs before the InputStream read throws an error. I have reduced Apache's timeout without success.

    Any idea if mod_jk could be involved, is there an appropriate option I can set?