Thread per connection: NIO, Linux NPTL and epoll



  1. I have been benchmarking Java NIO with various JDKs on Linux. The server runs on a
    machine with two 1.7 GHz CPUs, 1 GB of RAM and a 36 GB Ultra160 SCSI disk.

    With Linux kernel 2.6.5 (Gentoo), I had NPTL turned on and epoll support
    compiled in. The server application was designed to support multiple
    dispatch models:

    1. Reactor with iterative dispatch and multiple selector threads. Accepted
    connections were load-balanced across a varying number of selector threads.
    The benchmark then applied a step function to determine experimentally the
    optimal number of threads and the optimal ratio of connections per selector.

    2. A simple concurrent blocking dispatch model was also supported. This is
    essentially the reader-thread-per-connection model.
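    A minimal sketch of the first dispatch model, assuming a single blocking acceptor thread that hands accepted connections round-robin to a fixed pool of selector threads. The names (MultiSelectorReactor, SelectorWorker, nextWorkerIndex) are hypothetical and not taken from the benchmark code; the un-marshalling step is left out.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the "reactor with multiple selector threads" model:
// one blocking accept loop hands each new connection to one of N selector threads.
public class MultiSelectorReactor {

    static final class SelectorWorker implements Runnable {
        final Selector selector;
        // Registering from another thread can block while select() is running,
        // so accepted channels are queued and registered by this worker itself.
        final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<>();

        SelectorWorker() throws IOException { selector = Selector.open(); }

        void assign(SocketChannel ch) {
            pending.add(ch);
            selector.wakeup(); // make select() return so the new channel gets registered
        }

        @Override public void run() {
            ByteBuffer buf = ByteBuffer.allocateDirect(8192);
            try {
                while (true) {
                    selector.select();
                    SocketChannel ch;
                    while ((ch = pending.poll()) != null) {
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    }
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        SocketChannel c = (SocketChannel) key.channel();
                        buf.clear();
                        try {
                            if (c.read(buf) < 0) c.close(); // EOF; closing also cancels the key
                            // otherwise: un-marshal the bytes now in buf (omitted)
                        } catch (IOException e) {
                            c.close(); // drop only the failing connection
                        }
                    }
                }
            } catch (IOException | ClosedSelectorException e) {
                // selector closed or failed: this worker stops
            }
        }
    }

    private final SelectorWorker[] workers;
    private int next = 0; // touched only by the accept thread

    public MultiSelectorReactor(int selectorThreads) throws IOException {
        workers = new SelectorWorker[selectorThreads];
        for (int i = 0; i < selectorThreads; i++) workers[i] = new SelectorWorker();
    }

    // Round-robin load balancing of accepted connections across selector threads.
    int nextWorkerIndex() {
        int i = next;
        next = (next + 1) % workers.length;
        return i;
    }

    public void serve(int port) throws IOException {
        for (int i = 0; i < workers.length; i++) {
            Thread t = new Thread(workers[i], "selector-" + i);
            t.setDaemon(true);
            t.start();
        }
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(port)); // bind(SocketAddress) needs Java 7+
            while (true) {
                SocketChannel ch = server.accept(); // blocking accept on this thread
                workers[nextWorkerIndex()].assign(ch);
            }
        }
    }
}
```

    Queuing the channel and calling wakeup() avoids the classic problem of register() blocking behind a select() running in another thread; varying the constructor argument is the knob the step-function benchmark would turn.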

    The client application opens concurrent persistent connections to the server
    and starts blasting messages. The server just reads the messages and does
    basic un-marshalling to ensure each message is OK.
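    For comparison, the blocking model can be sketched as one reader thread per accepted socket. The post does not describe the message format, so this sketch assumes a hypothetical framing (a 4-byte big-endian length followed by the payload), and "un-marshalling" here only means reading each complete frame. The class name and the default port 9000 are also assumptions.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of the blocking "reader thread per connection" model.
// Assumed wire format: 4-byte big-endian length, then that many payload bytes.
public class BlockingServer {

    static final int MAX_MESSAGE = 1 << 20; // sanity limit for the assumed length prefix

    // Reads frames until the peer closes the stream; returns the number of messages read.
    static long readMessages(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(raw));
        long count = 0;
        while (true) {
            int len;
            try {
                len = in.readInt();           // blocks only this connection's thread
            } catch (EOFException eof) {
                return count;                 // clean close between messages
            }
            if (len < 0 || len > MAX_MESSAGE) throw new IOException("bad length " + len);
            byte[] payload = new byte[len];
            in.readFully(payload);            // the whole frame must arrive
            count++;
        }
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 9000;
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket s = server.accept();
                // One platform thread per connection; with NPTL each is a kernel thread.
                new Thread(() -> {
                    try (Socket sock = s) {
                        readMessages(sock.getInputStream());
                    } catch (IOException e) {
                        // error on this connection only; other threads are unaffected
                    }
                }).start();
            }
        }
    }
}
```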

    Results were interesting:

    1. With NPTL on, the Sun and Blackwidow 1.4.2 JVMs scaled easily to 5000+ threads. The blocking
    model was consistently 25-35% faster than using NIO selectors. We employed many of the techniques
    suggested by the EmberIO folks: using multiple selectors, and doing a second read if the first
    read returned the Java equivalent of EAGAIN (zero bytes from a non-blocking read). Yet we couldn't
    beat the plain thread-per-connection model with Linux NPTL.

    2. To work around the not-so-performant/scalable poll() implementation on Linux, we tried using
    epoll with the Blackwidow JVM on a 2.6.5 kernel. While epoll improved the overall scalability,
    performance still remained 25% below the vanilla thread-per-connection model. With epoll we
    needed far fewer threads to reach the best performance we could get out of NIO.
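    The second-read trick from result 1 can be sketched like this, assuming the "EAGAIN equivalent" is a non-blocking read() returning 0. readWithRetry is a hypothetical helper, not code from EmberIO or from the benchmark, and a Pipe stands in for a socket so the example runs on its own.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;

// Hypothetical sketch: retry a non-blocking read once (or a few times) when it returns 0,
// instead of going straight back to select().
public class RetryRead {

    // Tries up to maxAttempts reads (the post used 2) and returns the first non-zero result:
    // bytes read (> 0), -1 at end of stream, or 0 if every attempt found nothing.
    // buf must have space remaining, otherwise read() always returns 0.
    static int readWithRetry(ReadableByteChannel ch, ByteBuffer buf, int maxAttempts)
            throws IOException {
        int n = 0;
        for (int attempt = 0; attempt < maxAttempts && n == 0; attempt++) {
            n = ch.read(buf);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();                  // stands in for a socket in this demo
        pipe.source().configureBlocking(false);
        ByteBuffer buf = ByteBuffer.allocate(64);
        System.out.println(readWithRetry(pipe.source(), buf, 2)); // nothing written yet
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
        System.out.println(readWithRetry(pipe.source(), buf, 2)); // bytes now available
    }
}
```

    The idea is that a second read is cheaper than another trip through select() when data arrives a moment after the readiness event; per the results above, it was still not enough to close the gap with the blocking model.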

    Here are some numbers:

    (cc = concurrent persistent connections, bs = blocking-server mode flag,
    st = number of server threads, ct = connections handled per thread,
    thruput = server throughput)

    cc, bs, st, ct, thruput

    As you can see, the last line shows that the vanilla blocking server (thread per connection)
    produced the best throughput, even with 1700 threads active in the JVM.

    With epoll, the best run used 2 threads, each handling around 850 connections in its
    selector set. Even so, its throughput is 25% below the blocking server's!
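    The relationship between the table's columns can be sketched as a sweep plan. This assumes ct = ceil(cc / st) and a doubling step for st, because the post does not give the actual step sizes; SweepPlan is a hypothetical name, and the sketch generates configurations only, no throughput figures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a step-function sweep over the table's (cc, bs, st, ct) columns.
// The doubling step for st is an assumption; the original step sizes are not given.
public class SweepPlan {

    static final class Config {
        final int cc; final boolean bs; final int st; final int ct;
        Config(int cc, boolean bs, int st, int ct) {
            this.cc = cc; this.bs = bs; this.st = st; this.ct = ct;
        }
        @Override public String toString() { return cc + ", " + bs + ", " + st + ", " + ct; }
    }

    // ct = ceil(cc / st): connections handled by each selector thread
    static int connectionsPerThread(int cc, int st) { return (cc + st - 1) / st; }

    static List<Config> plan(int cc, int maxSelectorThreads) {
        List<Config> rows = new ArrayList<>();
        for (int st = 1; st <= maxSelectorThreads; st *= 2) {
            rows.add(new Config(cc, false, st, connectionsPerThread(cc, st)));
        }
        rows.add(new Config(cc, true, cc, 1)); // blocking mode: one reader thread per connection
        return rows;
    }

    public static void main(String[] args) {
        for (Config c : plan(1700, 16)) System.out.println(c);
    }
}
```

    For cc = 1700, st = 2 gives ct = 850, which is the shape of the best epoll run described above; the last generated row corresponds to the blocking server with 1700 threads.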

    The results show that the cost of NIO selectors coupled with the OS polling mechanism (in
    this case the efficient epoll rather than select()/poll()) carries significant overhead
    compared to the cost of context-switching 1700 threads on an NPTL Linux kernel.

    Without NPTL, of course, it's a different story: the blocking server just melts at 400
    concurrent connections! With NPTL we have run the test up to 10K connections, and the
    blocking server outperformed the NIO selector-based server by the same margin. Moral of the
    story: NIO arrives on the scene a little too late. With adequate RAM and better threading
    models (NPTL), the performance gains of NIO don't show up.

    Sun's JVM doesn't support epoll(), so we couldn't use epoll with it. The normal poll()-based
    selector from Sun didn't perform as well: we needed to reduce the number of connections per
    thread to a small number (~6-10) to get numbers comparable to the epoll-based selector.
    Running that many more selector threads kind of defeats the purpose of multiplexed IO.
    The benchmarks also dispel the myth, created by Matt Welsh et al. (SEDA), that a single-threaded
    reactor can keep up with the network. On 100 Mbps Ethernet that was true: the network
    saturated before the server CPUs did. But on a 1 Gbps+ network, we needed multiple selectors
    to saturate the network. A single selector's performance was abysmal (5-6x slower than the
    thread-per-connection server).
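    Which selector implementation a given JVM actually uses can be checked through the standard SelectorProvider API. A minimal sketch (WhichSelector is a hypothetical name); the printed class names depend on the JDK vendor and version, and the java.nio.channels.spi.SelectorProvider system property can name a different provider class, provided the JDK actually ships one.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.nio.channels.spi.SelectorProvider;

// Hypothetical helper: report which SelectorProvider (poll-, epoll-based, ...) this JVM uses.
// Override with -Djava.nio.channels.spi.SelectorProvider=<fully qualified class name>.
public class WhichSelector {

    static String providerName() {
        return SelectorProvider.provider().getClass().getName();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SelectorProvider: " + providerName());
        try (Selector s = Selector.open()) {   // the Selector class comes from that provider
            System.out.println("Selector class:   " + s.getClass().getName());
        }
    }
}
```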

    For applications that want fewer threads, NIO may be the way to go; the 25-35% performance
    hit may be acceptable to many apps. Fewer threads also mean easier debugging: it's a pain to
    attach a profiler or a debugger to a server hosting 1000+ threads :-) . Bottom line: with
    better multithreading support in kernels (Linux already has it with NPTL), one needs to
    reconsider the thread-per-connection model.

    Rahul Bhargava
    CTO, Rascal Systems


  2. History


    Before version 2.6 of the Linux kernel, processes were the schedulable entities, and there was no real support for threads. However, the kernel did support a system call, clone, which creates a copy of the calling process that shares the caller's address space. The LinuxThreads project used this system call to provide kernel-level thread support (most earlier pthread implementations on Linux worked entirely in userland). Unfortunately, it had a number of issues with true POSIX compliance, particularly in the areas of signal handling, scheduling, and inter-process synchronization primitives.

    To improve upon LinuxThreads, it was clear that some kernel support and a rewritten threads library would be required. Two competing projects were started to address the requirement: NGPT (Next Generation POSIX Threads), worked on by a team that included developers from IBM, and NPTL, by developers at Red Hat. NGPT was abandoned in mid-2003, at about the same time NPTL was released.

    NPTL was first released in Red Hat Linux 9. Old-style Linux POSIX threading was known for having trouble with threads that occasionally refuse to yield to the system, because it did not take the opportunity to preempt them when one arose; Windows was known to handle this better at the time. Red Hat claimed that NPTL fixed this problem in an article about Java on Red Hat Linux 9 on the Java website.[3]

    NPTL has been part of Red Hat Enterprise Linux since version 3, and in the Linux kernel since version 2.6. It is now a fully integrated part of the GNU C Library.

    There is a tracing tool for NPTL called the POSIX Thread Trace Tool (PTT), and the Open POSIX Test Suite (OPTS) was written for testing the NPTL library against the POSIX standard.


  3. nice talk about EJB

    EJB servers are required to support the UserTransaction interface for use by beans whose @TransactionManagement annotation has the value BEAN (this is called bean-managed transactions, or BMT). The UserTransaction interface is exposed to EJB components either through the EJBContext interface, using the getUserTransaction method, or directly via injection using the general @Resource annotation. Thus, an EJB application does not interface with the Transaction Manager directly for transaction demarcation; instead, the bean relies on the EJB server to provide support for all of its transaction work, as defined in the Enterprise JavaBeans Specification. (The underlying interaction between the EJB server and the TM is transparent to the application; the burden of implementing transaction management is on the EJB container and server provider.)

  4. nice info

    Just wanted to say thank you for sharing this; I hope I can fix it.

