Performance and scalability
Thread per connection : NIO, Linux NPTL and epoll
I have been benchmarking Java NIO with various JDKs on Linux. The server runs on a 2-CPU 1.7 GHz machine with 1GB RAM and an Ultra160 SCSI 36GB disk.
With Linux kernel 2.6.5 (Gentoo) I had NPTL turned on and support for epoll compiled in. The server application was designed to support multiple dispatch models:
1. Reactor with iterative dispatch and multiple selector threads. Essentially, the accepted connections were load-balanced across a varying number of selector threads. The benchmark then stepped through thread counts to determine experimentally the optimal number of threads and the connections-per-selector ratio.
2. A simple concurrent blocking dispatch model. This is essentially a reader-thread-per-connection model.
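For readers who have not seen the second model written out, here is a minimal sketch of a reader-thread-per-connection server. It is not the benchmark code: the class name `ThreadPerConnectionServer`, the `bytesRead` counter and the 8 KB buffer are assumptions made for illustration, and counting bytes stands in for the real un-marshalling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the blocking model: one reader thread per accepted connection. */
final class ThreadPerConnectionServer {
    /** Total bytes read across all connections (stand-in for un-marshalling work). */
    final AtomicLong bytesRead = new AtomicLong();
    private final ServerSocket listener;

    ThreadPerConnectionServer(int port) throws IOException {
        listener = new ServerSocket(port); // port 0 lets the OS pick a free port
    }

    int port() {
        return listener.getLocalPort();
    }

    /** Starts the accept loop on its own daemon thread. */
    void start() {
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket client = listener.accept();   // blocks until a client connects
                    Thread reader = new Thread(() -> readLoop(client));
                    reader.setDaemon(true);
                    reader.start();                      // dedicated thread for this connection
                }
            } catch (IOException e) {
                // listener was closed: stop accepting
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    /** Blocking read loop; the thread sleeps in read() while the connection is idle. */
    private void readLoop(Socket client) {
        byte[] buf = new byte[8192];
        try (Socket s = client; InputStream in = s.getInputStream()) {
            int n;
            while ((n = in.read(buf)) != -1) {
                bytesRead.addAndGet(n);
            }
        } catch (IOException e) {
            // connection dropped; let the thread exit
        }
    }

    void stop() throws IOException {
        listener.close();
    }
}
```

To try it, construct the server with port 0, call `start()`, and connect clients to `port()`. With N persistent connections the JVM holds N reader threads, which is exactly the case that depends on the kernel's threading implementation.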
The client application opens concurrent persistent connections to the server and starts blasting messages. The server just reads the messages and does basic un-marshalling to ensure each message is OK.
Results were interesting:
1. With NPTL on, the Sun and Blackwidow 1.4.2 JVMs scaled easily to 5000+ threads. The blocking model was consistently 25-35% faster than using NIO selectors. A lot of the techniques suggested by the EmberIO folks were employed: using multiple selectors, and doing multiple (2) reads if the first read returned the Java equivalent of EAGAIN. Yet we couldn't beat the plain thread-per-connection model with Linux NPTL.
2. To work around the not so performant/scalable poll() implementation on Linux, we tried using epoll with the Blackwidow JVM on a 2.6.5 kernel. While epoll improved the overall scalability, the performance still remained about 25% below the vanilla thread-per-connection model. With epoll we needed a lot fewer threads to reach the best performance we could get out of NIO.
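The selector-side techniques mentioned above (several selector threads with accepted connections spread across them, plus a second read when the first returns nothing) can be sketched roughly as follows. This is an illustration under assumptions, not the benchmark server: the class names, the round-robin assignment, the pending queue and the byte counter are all hypothetical, and a read returning 0 is taken here as the Java counterpart of EAGAIN.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: accepted connections are balanced round-robin over N selector threads. */
final class MultiSelectorReactor {
    final AtomicLong bytesRead = new AtomicLong();
    private final ServerSocketChannel listener;
    private final SelectorLoop[] loops;

    MultiSelectorReactor(int port, int selectorThreads) throws IOException {
        listener = ServerSocketChannel.open();               // stays in blocking mode for accept()
        listener.socket().bind(new InetSocketAddress(port)); // port 0 = any free port
        loops = new SelectorLoop[selectorThreads];
        for (int i = 0; i < loops.length; i++) loops[i] = new SelectorLoop();
    }

    int port() { return listener.socket().getLocalPort(); }

    void start() {
        for (SelectorLoop loop : loops) {
            Thread t = new Thread(loop);
            t.setDaemon(true);
            t.start();
        }
        Thread acceptor = new Thread(() -> {
            int next = 0;
            try {
                while (true) {
                    SocketChannel ch = listener.accept();
                    ch.configureBlocking(false);
                    loops[next].adopt(ch);                   // hand off to the next selector thread
                    next = (next + 1) % loops.length;
                }
            } catch (IOException e) { /* listener closed */ }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    void stop() throws IOException {
        listener.close();
        for (SelectorLoop loop : loops) loop.selector.close();
    }

    private final class SelectorLoop implements Runnable {
        private final Selector selector;
        private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();
        private final ByteBuffer buf = ByteBuffer.allocate(8192);

        SelectorLoop() throws IOException { selector = Selector.open(); }

        /** Called by the acceptor; registration itself happens on the selector thread. */
        void adopt(SocketChannel ch) { pending.add(ch); selector.wakeup(); }

        public void run() {
            try {
                while (selector.isOpen()) {
                    selector.select();
                    SocketChannel ch;
                    while ((ch = pending.poll()) != null) ch.register(selector, SelectionKey.OP_READ);
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (key.isValid() && key.isReadable()) readOnce(key);
                    }
                }
            } catch (IOException | ClosedSelectorException e) { /* selector closed: exit */ }
        }

        private void readOnce(SelectionKey key) {
            SocketChannel ch = (SocketChannel) key.channel();
            try {
                buf.clear();
                int n = ch.read(buf);
                if (n == 0) n = ch.read(buf);                // second attempt if the first got nothing
                if (n > 0) bytesRead.addAndGet(n);
                else if (n < 0) { key.cancel(); ch.close(); } // peer closed the connection
            } catch (IOException e) {
                key.cancel();
                try { ch.close(); } catch (IOException ignored) { }
            }
        }
    }
}
```

Channels are queued and registered by the owning selector thread rather than by the acceptor, because registering from another thread can block while a `select()` is in progress. Varying the `selectorThreads` argument corresponds to the `st` column in the numbers reported in this post.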
Here are some numbers:
(cc = concurrent persistent connections, bs = blocking server mode flag (Y/N), st = number of server threads, ct = connections handled per thread, thruput = throughput of the server)
cc,bs,st,ct,thruput
1700,N,2,850,1379
1700,N,4,425,1214
1700,N,8,212,1240
1700,N,16,106,1140
1700,N,32,53,1260
1700,N,64,26,1115
1700,N,128,13,886
1700,N,256,6,618
1700,N,512,3,184
1700,Y,1700,1,1737
As the last line shows, the vanilla blocking server (thread per connection) produced the best throughput, even with 1700 threads active in the JVM.
With epoll, the best run used 2 threads, each handling around 850 connections in its selector set. But its throughput (1379) is still about 21% below the blocking server's (1737); put the other way, the blocking server is about 26% faster!
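The gap between the two best rows depends on which number is taken as the baseline, so it is worth spelling out the arithmetic. The small helper below is only an illustration (the class and method names are made up); the two throughput figures come straight from the table above.

```java
import java.util.Locale;

/** Hypothetical helper: relative gap between two throughput figures. */
final class ThroughputGap {
    /** How far `value` falls below `baseline`, as a percentage of `baseline`. */
    static double percentBelow(double value, double baseline) {
        return 100.0 * (baseline - value) / baseline;
    }

    /** How much `value` exceeds `baseline`, as a percentage of `baseline`. */
    static double percentFaster(double value, double baseline) {
        return 100.0 * (value - baseline) / baseline;
    }

    public static void main(String[] args) {
        double blocking = 1737; // bs=Y, st=1700 row
        double bestNio = 1379;  // bs=N, st=2 row (best selector run)
        System.out.printf(Locale.ROOT, "NIO is %.1f%% below blocking%n",
                percentBelow(bestNio, blocking));
        System.out.printf(Locale.ROOT, "blocking is %.1f%% faster than NIO%n",
                percentFaster(blocking, bestNio));
    }
}
```

Measured against the blocking server, the best selector run is roughly 20.6% slower; measured against the selector run, the blocking server is roughly 26.0% faster. Both describe the same pair of rows.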
The results show that NIO selectors coupled with the OS polling mechanism (in this case the efficient epoll rather than select/poll) carry a significant overhead compared to the cost of context switching 1700 threads on an NPTL Linux kernel.
Without NPTL, of course, it's a different story. The blocking server just melts at 400 concurrent connections! We have run the test up to 10K connections, and the blocking server outperformed the NIO selector-based server by the same margin. Moral of the story: NIO arrives on the scene a little too late. With adequate RAM and better threading models (NPTL), the performance gains of NIO don't show up.
Sun's JVM doesn't support epoll(), so we couldn't use epoll with it. The normal poll()-based selector from Sun didn't perform as well. We needed to reduce the number of connections per thread to a small number (~6-10) to get numbers comparable to the epoll-based selector. That meant running a lot more selector threads, which kind of defeats the purpose of multiplexed IO.
The benchmarks also dispel the myth created by Matt Welsh et al (SEDA) that a single-threaded reactor can keep up with the network. On 100Mbps ethernet that was true: the network got saturated before the server CPUs. But with a > 1Gbps network, we needed multiple selectors to saturate the network. A single selector's performance was abysmal (5-6x slower than concurrent connections).
For applications that want fewer threads for debuggability etc., NIO may be the way to go. The 25-35% performance hit may be acceptable to many apps, and fewer threads means easier debugging: it's a pain to attach a profiler or a debugger to a server hosting 1000+ threads :-) . Bottom line: with better MT support in kernels (Linux already has it with NPTL), one needs to reconsider the thread-per-connection model.
Rahul Bhargava CTO, Rascal Systems