NIO not faster than IO, according to... TSS itself?


  1. Paul Tyma, in "Kill the myth please. NIO is *not* faster than IO," wounds the heart by repeating a post he originally found here on TSS: "Thread per connection : NIO, Linux NPTL and epoll," by Rahul Bhargava, which includes a test showing that java.nio is not faster than java.io when the Native POSIX Thread Library (NPTL) is used on Linux. The post is part of the basis for a talk of Paul's at SD West in March 2008. The test is from 2004, but Mr. Tyma is saying the results are still valid. Here's his description:
    [The talk is on...] why NIO is really not the best way anymore (post linux 2.6 kernel and NPTL) and multithreaded I/O is the new old-way of doing things. Its a fun talk and largely discusses the internals of Mailinator and how it runs a few thousand simultaneous threads without breaking a sweat. On top of that, as it turns out, in pure throughput, IO smokes NIO in all tests I tried. And I'm not alone - Rahul Bhargava of Rascal Systems did a very nice analysis of this and posted it, sadly, in some forums at theserverside.com.
    For one thing, ouch -- sad that it was here on TSS, where you could manage to find it, Mr. Tyma? Et tu?

    It's still interesting data - the original test first used multiple threads reading the input streams, then a polling mechanism, and the result was that java.io was able to handle far more connections. Without seeing the original code, it's hard to validate the actual test. ("Performance testing is hard!", said Barbie, and test harnesses are known to affect the results of the tests they run.) It's also difficult to tell from Mr. Tyma's post whether he's discussing modern JVMs and OS platforms, or whether he's using the same JVMs and issues from 2004. Even if he's using the 2004 edition of everything, though, the comment has relevance - because many developers are still deploying on older JVMs and OSes.

    What's the verdict? Without seeing the test itself, it's hard to say. Most modern servlet containers are moving to NIO, because of performance - which would imply that Mr. Tyma is incorrect, that things have changed over the past few years, or that the original test was flawed. In addition, NPTL is, at first glance, only relevant to Linux, so any assertion that NIO is slower than java.io would seem to depend on the underlying OS.
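
    For readers who have not seen the "thread per connection" style the test refers to, here is a minimal sketch of it using blocking java.io streams. It is not the code from Rahul Bhargava's test (which isn't shown here); the class name BlockingEchoServer, the echo protocol and the roundTrip client helper are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a thread-per-connection echo server on blocking streams.
class BlockingEchoServer {
    private final ServerSocket serverSocket;

    BlockingEchoServer() throws IOException {
        // Port 0 lets the OS pick a free port; loopback keeps the demo local.
        serverSocket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // Starts the accept loop; every accepted socket gets its own thread.
    void start() {
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket client = serverSocket.accept(); // blocks until a client connects
                    Thread worker = new Thread(() -> handle(client));
                    worker.setDaemon(true);
                    worker.start();
                }
            } catch (IOException e) {
                // accept() fails once stop() closes the server socket; end the loop
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
    }

    // Echoes lines back until the client disconnects; blocks only this connection's thread.
    private static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            // connection dropped; try-with-resources has already closed the socket
        }
    }

    void stop() throws IOException {
        serverSocket.close();
    }

    // Blocking client used to exercise the server: sends one line, returns the echoed line.
    static String roundTrip(int port, String message) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(message);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        BlockingEchoServer server = new BlockingEchoServer();
        server.start();
        System.out.println(roundTrip(server.port(), "hello"));
        server.stop();
    }
}
```

    The point of the sketch is only the shape: each connection costs one thread, and readLine() parks that thread in the kernel while it waits. Whether that is cheap or expensive is exactly what the thread below argues about.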

    Threaded Messages (26)

  2. The problem hides in plain sight

    The problem with the whole discussion hides in plain sight: "Thread per connection". This is where pretty much every web server on the market goes wrong - if you use a thread-per-connection model, rather than something more SEDA-like, you will always severely limit the number of concurrent requests that can be handled (say 150 requests instead of 10000).
  3. Blocking is an operating system's job. It ought to be good at it. One of the temporary problems is that we're in the transition to 64-bit systems, but in a few years running out of virtual memory for the thread stacks should become a non-issue.
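
    The remark about thread stacks can be checked with back-of-the-envelope arithmetic. The sketch below assumes 512 KiB of stack reserved per thread; that figure is an assumption for illustration, since the real value depends on the JVM, the OS and the -Xss setting. The thread counts 150 and 10000 are the ones mentioned in the previous post.

```java
// Back-of-the-envelope check; the per-thread stack size is an assumed figure, not a JVM default.
class ThreadStackBudget {
    // 4 GiB is the upper bound of a 32-bit address space; usable user space is smaller in practice.
    static final long ADDRESS_SPACE_32_BIT = 1L << 32;

    // Address space reserved for thread stacks, in bytes.
    static long reservedBytes(int threads, long stackBytesPerThread) {
        return threads * stackBytesPerThread;
    }

    // True if the stacks alone would still fit under the 32-bit bound.
    static boolean fitsIn32Bit(int threads, long stackBytesPerThread) {
        return reservedBytes(threads, stackBytesPerThread) < ADDRESS_SPACE_32_BIT;
    }

    public static void main(String[] args) {
        long stack = 512L * 1024; // assumed 512 KiB per thread
        for (int threads : new int[] {150, 10_000}) {
            System.out.printf("%d threads: %d MiB reserved, fits in 32-bit: %b%n",
                    threads,
                    reservedBytes(threads, stack) / (1024 * 1024),
                    fitsIn32Bit(threads, stack));
        }
    }
}
```

    Under that assumption, 150 threads reserve 75 MiB while 10000 threads reserve 5000 MiB, which is more than a 32-bit process can address at all. On a 64-bit system the same reservation is a tiny fraction of the address space, which is the poster's point.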
  4. Old stuff

    The original post from Rahul is from June 2004 and discusses JDK 1.4.2 from Blackdown, which is prehistoric technology by today's standards. At that time, Sun's support for Linux was not at the level it is today, not to mention that java.nio's implementation was a vast bag of bugs and limitations. It is much better today - perhaps because Sun ate its own dog food with projects like Glassfish/Grizzly, so they had some extra incentive to fix all the bugs and performance issues in java.nio that affected these projects.

    I'm not saying that NIO is better than IO, nor the opposite. I'm just saying that we'd need some updated data, with modern JVMs (Java 6 at least), to make ANY meaningful statement on the performance of I/O in Java.

    Another thing that changed from 2004 to 2008 is CPU technology, now moving fast into the many-core era. It's obvious that a naive, single-threaded NIO system will not scale anymore. We need at least some multithreading, with either library. But it's still possible that NIO, with a thread-per-core model, scales better than IO with a thread-per-connection model (many more threads). Or that there's no big difference anymore, because operating systems are being forced to scale much better for multithreaded apps, so the cost of managing thousands of threads has gone down in recent years.
  5. JVM problems?

    Asynchronous IO is _definitely_ much faster than multithreaded IO on lots of parallel connections. Try to benchmark Apache (which uses thread/process per connect) and Lighttpd (asynchronous IO) on multiple parallel connections - Apache dies miserably very soon.
  6. The latest edition of Tomcat: The Definitive Guide includes a chapter on the performance of the various connectors. I was quite surprised to see the "regular" connector smoke both the nio and apr ones.
  7. What's the verdict? Without seeing the test itself, it's hard to say. Most modern servlet containers are moving to NIO, because of performance
    Are you sure that's why? I think Willie hit the nail on the head in the first reply. The reason to use NIO is for scalability. It allows you to be able to support more requests than there are threads. It would seem to me that the proper path for high scalability and performance would be multi-threaded NIO. If you look at the scalability of the Erlang-based YAWS web server, it can handle an extremely large number of concurrent requests. I doubt it's single-threaded and I doubt it's running a thread for each request.
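
    To make "more requests than there are threads" concrete, here is a minimal sketch of a java.nio selector loop in which one thread serves every connection. It is an illustration only, not how any particular servlet container does it; the class name SelectorEchoServer, the echo protocol and the roundTrip client helper are assumptions, and the write path is deliberately simplified.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical demo class: one thread multiplexes all connections through a Selector.
class SelectorEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        // Port 0 lets the OS pick a free port; loopback keeps the demo local.
        server.socket().bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096); // one buffer shared by all connections
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until some channel is ready
                if (!selector.isOpen()) {
                    break;
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // stop() closed the selector, or the loop hit an I/O error; end the loop
        }
    }

    // Reads whatever is available and writes it straight back.
    private static void echo(SocketChannel client, ByteBuffer buf) {
        try {
            buf.clear();
            int n = client.read(buf);
            if (n < 0) { // peer closed the connection
                client.close();
                return;
            }
            buf.flip();
            // Simplification: spins if the socket buffer is full. A real server
            // would register OP_WRITE and resume when the channel is writable.
            while (buf.hasRemaining()) {
                client.write(buf);
            }
        } catch (IOException e) {
            try {
                client.close();
            } catch (IOException ignored) {
                // already closing
            }
        }
    }

    void stop() throws IOException {
        selector.close();
        server.close();
    }

    // Blocking client used to exercise the server: sends bytes, reads the same number back.
    static String roundTrip(int port, String message) throws IOException {
        byte[] out = message.getBytes(StandardCharsets.UTF_8);
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.getOutputStream().write(out);
            s.getOutputStream().flush();
            byte[] in = new byte[out.length];
            new DataInputStream(s.getInputStream()).readFully(in);
            return new String(in, StandardCharsets.UTF_8);
        }
    }
}
```

    The thread count here is fixed at one no matter how many clients connect, which is the scalability argument. The price is that every read and write goes through the selector bookkeeping, and a slow handler stalls all connections, which is why the multi-threaded NIO variant suggested above is the usual next step.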
  8. Erlang YAWS

    Actually, Erlang Yaws IS single-threaded (usually) and it also runs a thread per request :)

    Erlang uses its own lightweight threads - like 'green threads' in old JDKs but even more lightweight. Thread creation in Erlang is an _extremely_ fast operation (a new thread creation requires about 100 bytes on the heap), so it's not unusual to have hundreds of thousands of threads running in a typical Erlang application.

    Erlang also uses non-blocking IO to map its own threads to one OS-level thread.
  9. Re: Erlang YAWS

    Actually, Erlang Yaws IS single-threaded (usually) and it also runs a thread per request :)

    Erlang uses its own lightweight threads - like 'green threads' in old JDKs but even more lightweight. Thread creation in Erlang is an _extremely_ fast operation (a new thread creation requires about 100 bytes on the heap), so it's not unusual to have hundreds of thousands of threads running in a typical Erlang application.

    Erlang also uses non-blocking IO to map its own threads to one OS-level thread.
    But if I have multiple cores available, would I not be able to take advantage of them? My understanding is that Erlang was developed for running apps on machines with a large number of processors. The threads are abstracted from the programmer so fully that it doesn't matter whether they are running concurrently or not. Am I mistaken? I think it's possible for Java threads to be implemented as green threads, but they are usually backed by the OS for performance reasons.
  10. Erlang YAWS

    But if I have multiple cores available, would I not be able to take advantage of them?
    There are two ways to support multicore CPUs:
    1. Run several single-threaded Erlang instances on one or several machines and use Erlang's support for transparent distributed systems.
    2. Use the Erlang VM in multithreaded mode (it's a fairly new feature, about 2 years old). Actually, it's sometimes slower than the first option due to locking overhead on all operations.
    Both choices are completely transparent for Erlang apps.
    My understanding is that Erlang was developed for running apps on machines with a large number of processors.
    No, not quite. Erlang was designed to support massively multithreaded distributed applications.
    The threads are abstracted from the programmer so fully that it doesn't matter whether they are running concurrently or not.
    Yes. In fact, almost all Erlang programs required exactly zero changes when the multithreaded VM was introduced.
    Am I mistaken?

    I think it's possible for Java threads to be implemented as green threads, but they are usually backed by the OS for performance reasons.
    No, green threads in Java had a lot of issues. For example, a blocking system call (from JNI or VM) can block all threads. Some JNI semantics are also not very green-thread-friendly.
  11. Re: Erlang YAWS

    My understanding is that Erlang was developed for running apps on machines with a large number of processors.

    No, not quite. Erlang was designed to support massively multithreaded distributed applications.
    I'm not sure what distinction you are making in your response, but this is what Joe Armstrong (the creator of Erlang) said: "The Erlang philosophy was always to build system with lots of cheap processors and allow them to fail. We don't prevent failure; we live with it and recover when failures occur. That's what Erlang was designed to do." http://www.ddj.com/201001928?cid=RSSfeed_DDJ_All
  12. Re: Erlang YAWS

    My understanding is that Erlang was developed for running apps on machines with a large number of processors.

    No, not quite. Erlang was designed to support massively multithreaded distributed applications.


    I'm not sure what distinction you are making in your response, but this is what Joe Armstrong (the creator of Erlang) said:

    "The Erlang philosophy was always to build system with lots of cheap processors and allow them to fail. We don't prevent failure; we live with it and recover when failures occur. That's what Erlang was designed to do."

    http://www.ddj.com/201001928?cid=RSSfeed_DDJ_All
    I think the distinction he was trying to make is that Erlang wasn't necessarily designed for machines with a lot of processors on-board, it was designed to work well in a distributed environment. It just happens to map nicely onto machines with a lot of processors. The paragraph after the one you quoted from seems to contain this distinction: "Today multi-cores are really like "distributed system on a chip" with very high-speed message passing. Since we have share-nothing and concurrency, Erlang programs map beautifully onto multi-cores. Ericsson is shipping products on dual-cores that run virtually twice as fast as the uni-cores with only tiny changes to the code."
  13. Re: Erlang YAWS

    My understanding is that Erlang was developed for running apps on machines with a large number of processors.

    No, not quite. Erlang was designed to support massively multithreaded distributed applications.


    I'm not sure what distinction you are making in your response, but this is what Joe Armstrong (the creator of Erlang) said:

    "The Erlang philosophy was always to build system with lots of cheap processors and allow them to fail. We don't prevent failure; we live with it and recover when failures occur. That's what Erlang was designed to do."

    http://www.ddj.com/201001928?cid=RSSfeed_DDJ_All


    I think the distinction he was trying to make is that Erlang wasn't necessarily designed for machines with a lot of processors on-board, it was designed to work well in a distributed environment. It just happens to map nicely onto machines with a lot of processors. The paragraph after the one you quoted from seems to contain this distinction:

    "Today multi-cores are really like "distributed system on a chip" with very high-speed message passing. Since we have share-nothing and concurrency, Erlang programs map beautifully onto multi-cores. Ericsson is shipping products on dual-cores that run virtually twice as fast as the uni-cores with only tiny changes to the code."
    I don't understand why you are interpreting the term 'processor' to mean 'computer'. Do you think he's that loose with terminology? A "system with lots of processors" could be a single machine with many processors, a lot of single-processor machines or a lot of machines with multiple processors. In any event the distinction is really irrelevant to the point. A "system with lots of processors" is inherently concurrent. Erlang is inherently concurrent. If YAWS is built on Erlang and Erlang is inherently able to run on multiple processors, it seems very strange to claim that YAWS is single-threaded.
  14. YAWS

    If YAWS is built on Erlang and Erlang is inherently able to run on multiple processors, it seems very strange to claim that YAWS is single-threaded.
    YAWS most often IS single-threaded (i.e. it uses just one OS-level thread). But at the same time, it can have thousands of Erlang processes.
  15. Re: Erlang YAWS

    Actually, Erlang Yaws IS single-threaded (usually) and it also runs a thread per request :)

    Erlang uses its own lightweight threads - like 'green threads' in old JDKs but even more lightweight. Thread creation in Erlang is an _extremely_ fast operation (a new thread creation requires about 100 bytes on the heap), so it's not unusual to have hundreds of thousands of threads running in a typical Erlang application.

    Erlang also uses non-blocking IO to map its own threads to one OS-level thread.
    Couple of things:
    1) I thought JRockit supported n/m threads? N green threads running on M kernel threads? Wonder if that would improve Tomcat et al. vs. the Yaws benchmark?
    2) Is there any reason the JVM couldn't support the same massively concurrent threading as erlang? Shouldn't we be getting close to an UBER VM that all languages can run on?
    3) I saw the YAWS vs. Apache benchmark. One person commenting on it said that yeah, YAWS could scale 10 times more than Apache, but the result was each thread was processing like 10 bytes / second, making the server pretty much worthless anyway.
    -- Bill Burke
    JBoss, a division of Red Hat
    http://bill.burkecentral.com
  16. Re: Erlang YAWS

    2) Is there any reason the JVM couldn't support the same massively concurrent threading as erlang? Shouldn't we be getting close to an UBER VM that all languages can run on?
    Scala has an Actor model (à la Erlang) in its standard library. It might be worth looking into how that's designed to see what has already been done in this area on the JVM.
  17. What's the verdict? Without seeing the test itself, it's hard to say. Most modern servlet containers are moving to NIO, because of performance


    Are you sure that's why? I think Willie hit the nail on the head in the first reply. The reason to use NIO is for scalability. It allows you to be able to support more requests than there are threads.
    I KNEW someone was going to pick on that - and I hoped they wouldn't. You're right, James - but in my defense I would like to point out that "performance" can mean runtime performance (how many ns it takes to respond) *or* scalability (how many users it can support).
  18. I would like to point out that "performance" can mean runtime performance (how many ns it takes to respond) *or* scalability (how many users it can support).
    That's actually a good point. Maybe that's where this is all coming from. Whenever people use the same term for different things, there will be disagreement.
  19. hey guys, please excuse me

    Excuse me. As I remember from an official Java document, they also rewrote the IO package based on the refactored NIO package. So....
  20. Re: hey guys, please excuse me

    Excuse me. As I remember from an official Java document, they also rewrote the IO package based on the refactored NIO package. So....
    Interesting - but difficult to find. I looked at the JDK sources for 1.6, and found a lot of references to java.nio.channels.FileChannel, which might be what you're referring to - but I'm not sure if this is the same as rewriting IO based on NIO. I mean, a FileChannel isn't the same as a polling reader... at least, not that I could see. Any further references? I couldn't find any.
  21. My experience seems to agree

    We built a device emulation system to stress test a piece of Telecom Lab Automation software (a WebLogic-based app). Initially we used IO for the emulation system, and it was not responsive enough when the test automation software made thousands of connections to it in a very short time. So we shifted the emulation system to NIO and found that, although it easily supported more socket connections than the IO-based system, reading requests and sending responses was considerably slower. So NIO helped with scalability, but not so much when it came to performance.
  22. Re: My experience seems to agree

    Why is NIO so slow, then?
  23. Re: My experience seems to agree

    Why is NIO so slow, then?
    NIO is not slow. There were some bugs in the early JDK 1.4 releases, but those got cleaned up by 1.4.2_06 or something like that. There are lots of ways to write bad code using NIO, while there aren't as many easy and obvious ways to write bad code using blocking sockets ("old" IO). So NIO isn't slow, although for a single connection it may be marginally slower than blocking IO as a trade-off for its flexibility.
    Peace,
    Cameron Purdy
    Oracle Coherence: Data Grid for Java and .NET
  24. Re: My experience seems to agree

    So NIO isn't slow, although for a single connection it may be marginally slower than blocking IO as a trade-off for its flexibility.
    I had to write a server for an online game some time ago. I didn't even think about synchronous IO, because the server had to support thousands of concurrent players. Creating a thread for each of them would be crazy. Of course, it was multi-threaded - several IO threads were reading data from ServerSocketChannel and pushing commands to a thread pool queue. BTW, I improved the server's performance a lot (about 20%) when I switched from JDK 1.5 to 1.6 and enabled biased locking. I am very sceptical when I see suggestions that creating thousands of threads is a good thing. Sooner or later the overhead of context switching would outweigh all the performance advantages that synchronous IO provides.
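
    The hand-off described here (IO threads read, a pool executes) might look roughly like the sketch below. It is not the poster's game server; CommandDispatcher, execute and the "ok:" reply format are hypothetical names invented for illustration, and the socket reading itself is left out so that only the hand-off is shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: IO threads hand decoded commands to a fixed worker pool.
class CommandDispatcher {
    private final ExecutorService workers;

    CommandDispatcher(int workerThreads) {
        workers = Executors.newFixedThreadPool(workerThreads);
    }

    // Called by an IO thread after a non-blocking read has filled readBuffer.
    // The bytes are copied out so the IO thread can reuse the buffer immediately.
    Future<String> dispatch(ByteBuffer readBuffer) {
        readBuffer.flip();
        byte[] copy = new byte[readBuffer.remaining()];
        readBuffer.get(copy);
        readBuffer.clear();
        String command = new String(copy, StandardCharsets.UTF_8);
        Callable<String> task = () -> execute(command);
        return workers.submit(task); // queued; runs on a pool thread, not the IO thread
    }

    // Placeholder for the game logic (an assumption; the post does not describe it).
    private static String execute(String command) {
        return "ok:" + command.trim();
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) throws Exception {
        CommandDispatcher dispatcher = new CommandDispatcher(4);
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put("move north\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(dispatcher.dispatch(buf).get());
        dispatcher.shutdown();
    }
}
```

    The design choice is that the number of threads is set by the pool size rather than by the number of players, so the context-switching cost the poster worries about stays bounded.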
  25. Re: My experience seems to agree

    We built a device emulation system to stress test a piece of Telecom Lab Automation software (a WebLogic-based app).

    Initially we used IO for the emulation system, and it was not responsive enough when the test automation software made thousands of connections to it in a very short time.

    So we shifted the emulation system to NIO and found that, although it easily supported more socket connections than the IO-based system, reading requests and sending responses was considerably slower.

    So NIO helped with scalability, but not so much when it came to performance.
    If you don't mind my asking, what version of the JDK and JVM were used? My impression of Sun is that they don't worry about optimizing a first release but put a lot of effort into optimization afterward. NIO was introduced in 1.4 and they're now up to 1.6 with 1.7 looming. I have to wonder if optimizations have been made in the NIO libraries.
  26. Version


    If you don't mind my asking, what version of the JDK and JVM were used? My impression of Sun is that they don't worry about optimizing a first release but put a lot of effort into optimization afterward. NIO was introduced in 1.4 and they're now up to 1.6 with 1.7 looming. I have to wonder if optimizations have been made in the NIO libraries.
    It was JDK 1.4.2.
  27. Re: Version

    It was JDK 1.4.2.
    I have to wonder if using 1.6 would produce different test results.