Scott Oak presents a good approach to testing container scalability in light of the ongoing debates about performance in the Glassfish and Tomcat containers. Performance results depend on what is measured, how it is measured, and how the system is configured. Scott's methodology is well thought out, and the test results are enlightening rather than surprising. He wrote:
"What does it mean to scale to N number of users, where N is large? The answer is highly dependent on your benchmark, and in particular to the think time that your benchmark uses. It's very easy to scale to 16,000 users if they each make a request every 90 seconds: that's on the order of 180 requests/second... I'll explore some of the considerations you need to examine in order to benchmark a large system properly."
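The think-time arithmetic in that quote can be checked with a few lines of Python. This is only a rough sketch: the function name `request_rate` and the optional `response_time_s` parameter are assumptions made for illustration, and the 1-second think time in the second call is a hypothetical comparison point, not a figure from Scott's tests.

```python
def request_rate(users: int, think_time_s: float, response_time_s: float = 0.0) -> float:
    """Approximate steady-state requests/second for a fixed pool of users.

    Each user issues one request per cycle, where a cycle is the think time
    plus the response time (assumed to be 0 unless given).
    """
    cycle_s = think_time_s + response_time_s
    if cycle_s <= 0:
        raise ValueError("think time plus response time must be positive")
    return users / cycle_s


# The case from the quote: 16,000 users, one request every 90 seconds.
print(f"{request_rate(16_000, 90):.0f} requests/second")  # prints "178 requests/second"

# Hypothetical comparison: the same users with a 1-second think time.
print(f"{request_rate(16_000, 1):.0f} requests/second")   # prints "16000 requests/second"
```

The same user count produces request rates that differ by a factor of 90, which is why a "users supported" figure means little unless the think time is stated alongside it.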
The test scales from 30 to 5000 concurrent users while monitoring system performance, error rates, and container initialization. In the end, there is no clear winner between Glassfish and Tomcat. How well either scales depends on configuration, environment, load curve, and problem domain rather than on the raw container implementation.