Java Development News:

EmberIO - Dispelling NIO Myths

By Mike Spille

28 Apr 2004 | TheServerSide.com

Apparently I've gotten XA stuff completely out of my system - and now I'm in a full-blown NIO obsession to compensate :-). I'm talking about EmberIO, of course, which as I briefly reported yesterday has been released on Sourceforge at version 0.3 Alpha. This blog entry will talk some more about EmberIO's strengths, more about the vision behind it, and try to show where EmberIO succeeds where other similar attempts have failed in the past.

To review, the main high-level features of EmberIO include:
  • Support for many different threading models, and for blocking and non-blocking I/O.
  • Automated NIO buffer management, including automated support for partial reads and writes in a non-blocking environment.
  • Management of one or more NIO Selectors for performing multiplexed I/O. This management is invisible to the application code. This is new to EmberIO as of 0.3 Alpha.
  • Ultimately, to provide support for both NIO-based I/O using NIO channels, and non-NIO I/O resources such as multicast sockets. Differences between NIO and non-NIO resources will be completely hidden from application code by default. Note the use of the future tense here ;-). EmberIO doesn't do this yet, but the stage is now set for me to provide such support easily.
  • Recognition that different I/O models give different performance characteristics, with each model representing a tradeoff between resource utilization, latency, throughput, and the physical size of data being read or written.
  • Working extra hard to be both high-performance and correct - so you don't have to :-)
The biggest bang for the buck you'll get out of EmberIO comes from the buffer management, the built-in support for a variety of threading models, and from generally getting rid of the pain usually associated with NIO. Also keep in mind that while one of the initial motivations behind EmberIO was to make using NIO easier, it is being expanded to handle non-NIO I/O resources as well. EmberIO is moving towards being a general low-level I/O sub-system that puts a single consistent interface on top of disparate I/O resources and threading models.

EmberIO Threading Models

One of the most interesting EmberIO features is the threading model support - it offers a boatload of them, each of which is targeted towards different I/O scenarios. Almost any scenario may be configured to be used in either a blocking or non-blocking manner (with a few exceptions where the model constrains you to one or the other). On top of this, EmberIO supports both traditional blocking Java I/O semantics (which I'll call BIO here - thanks to James Strachan for keying me in on this term!) and the newer NIO asynchronous/multiplexed/non-blocking semantics (which I'll just call NIO).

A non-exhaustive list of threading models includes:
  • BLOCKING_ACCEPTOR
    Creates a dedicated thread for handling accept() calls for a socket server in a blocking manner.

  • NONBLOCKING_ACCEPTOR
    Handles socket server accept() calls in the traditional NIO manner, i.e. by using the OP_ACCEPT interest bit in a Selector. For most people this is more of a pain in the ass than anything else, and BLOCKING_ACCEPTOR would be more appropriate.

  • USE_RWP_POOLS
    Uses separate thread pools for handling READ, WRITE, and PROCESSING events. Each of these will be handled independently (at least from a threading perspective). Most useful if you anticipate doing large READs or WRITEs or both. Any of the *_POOL variants aggressively use PROCESS and WRITE FIFO queues to decouple read/write events from actually performing the physical I/O (i.e. READs and WRITEs happen completely asynchronously).

  • USE_ONE_POOL
    Uses a unified thread pool for all events. Often useful if you have small READs and WRITEs and want to reduce latency significantly.

  • DEDICATED_READER
    Sets up EmberIO threading and I/O to be performed like old-style BIO. Individual sockets (or what have you) are set up with dedicated blocking reader threads, and writes are performed in a blocking manner directly to the I/O resource. The DEDICATED_READER model is interesting because it opens the door to supporting non-NIO resources. With DEDICATED_READER in place and the NIO Selector bypassed, EmberIO is in a position to deal with any old BIO-style socket-like component - and to make such a resource look the same as NIO resources from an EmberIO user's perspective.

  • SELECTOR_READER
    Performs READ operations within the context of a Selector's main event loop thread. This forces all reads to be done in a non-blocking manner (after all, we don't want to block our selector loop!) and can only be used for NIO-enabled I/O resources (e.g. right now, only Sockets).
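To make the BLOCKING_ACCEPTOR idea concrete, here's a minimal sketch of a dedicated accept thread - note this is illustrative only, and the class and method names are made up, not EmberIO's actual API:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.function.Consumer;

// Hypothetical sketch of the BLOCKING_ACCEPTOR idea: one dedicated thread
// blocks in accept() instead of registering OP_ACCEPT with a Selector.
public class BlockingAcceptor implements Runnable {
    private final ServerSocketChannel server;
    private final Consumer<SocketChannel> onAccept;

    public BlockingAcceptor(int port, Consumer<SocketChannel> onAccept) throws IOException {
        this.server = ServerSocketChannel.open();
        // The channel stays in its default *blocking* mode on purpose.
        this.server.bind(new InetSocketAddress("127.0.0.1", port));
        this.onAccept = onAccept;
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        try {
            while (server.isOpen()) {
                SocketChannel client = server.accept(); // blocks; no Selector involved
                onAccept.accept(client);
            }
        } catch (IOException e) {
            // channel closed; acceptor thread exits
        }
    }

    public void close() throws IOException { server.close(); }
}
```

Because accept() simply blocks in its own thread, there's no Selector round-trip per connection - new connections are handed to the callback as fast as the OS delivers them.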
The above models are configurable per connection type. So, for example, you could configure one type of connection to use the DEDICATED_READER strategy for the lowest possible latency, configure another type for USE_RWP_POOLS where many connections are expected with variable I/O sizes, some of them potentially large, and meanwhile set up all your server sockets using the BLOCKING_ACCEPTOR strategy.

The models tie into the default I/O events that EmberIO supports: READ, WRITE, PROCESS, and ACCEPT. Leaving ACCEPT out of the picture for the moment, EmberIO is set up so that data flows in the following manner:
  • READ events trigger physical reads into ByteBuffers (which are under control of the app). When a READ gets a full object, it puts the object on the PROCESS FIFO queue. These READ events are generally triggered by an NIO Selector.
  • PROCESS events get triggered when something gets put in the PROCESS queue. This is independent of the Selector.
  • WRITE events get triggered when a user calls write() on a ReadWriteEndpoint. What happens here is that we stick the object to write on a WRITE FIFO queue, and then fire the WRITE event. In practice, if we're in non-blocking WRITE mode we try the write first, and if that doesn't work (or is incomplete) we then add OP_WRITE interest to the endpoint and wake up the Selector to deal with it.
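The non-blocking WRITE path just described - try the write eagerly first, fall back to OP_WRITE only if it's incomplete - might be sketched like this (the names here are hypothetical, not EmberIO's real interfaces):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.WritableByteChannel;
import java.util.Queue;

// Hypothetical sketch of the WRITE path: attempt the write inline first,
// and only involve the Selector (via OP_WRITE) when the channel couldn't
// take the whole buffer.
public final class EagerWriter {
    private EagerWriter() {}

    /**
     * Returns true if the buffer was fully written inline; otherwise the
     * remainder is parked on the WRITE FIFO queue and OP_WRITE interest is
     * added so the Selector loop can finish the job later.
     */
    public static boolean writeOrEnqueue(WritableByteChannel ch,
                                         ByteBuffer buf,
                                         Queue<ByteBuffer> writeQueue,
                                         SelectionKey key) throws IOException {
        ch.write(buf);                 // optimistic, non-blocking attempt
        if (!buf.hasRemaining()) {
            return true;               // common fast path: no Selector involved
        }
        writeQueue.add(buf);           // park the leftover bytes on the FIFO
        if (key != null) {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
            key.selector().wakeup();   // make the select() loop notice the new interest
        }
        return false;
    }
}
```

The fast path is the point: when the socket's send buffer has room (the usual case), the write completes without ever touching the Selector.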
EmberIO takes advantage of the fact that I/O and PROCESS often flow in this sort of manner:
  • Server reads a request
  • Server processes request
  • Server writes a response to the request
Since this is so common, EmberIO is set up to try to accommodate this directly. Handlers for various events are ordered in READ-PROCESS-WRITE order (for use when a single common thread pool is used), and EmberIO tries to execute events directly before getting the Selector involved. So it's common for EmberIO to get triggered for a READ, and then perform the READ, the PROCESS, and the WRITE in one chunk - with the Selector involved only in the initial READ triggering. Adding in greedy operations (described below) means we can burstily do "N" such operations without involving the Selector.
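The READ-PROCESS-WRITE chaining can be sketched in miniature like so - this deliberately collapses the PROCESS and WRITE queues into a single pass on one thread to show the ordering, and all the names are illustrative, not EmberIO's API:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch of READ-PROCESS-WRITE chaining: once a READ produces
// a complete request object, the same thread runs the PROCESS and WRITE
// stages immediately instead of bouncing back through the Selector.
public class ChainedHandler<Req, Resp> {
    private final Queue<Req> processQueue = new ArrayDeque<>();
    private final Queue<Resp> writeQueue = new ArrayDeque<>();
    private final Function<Req, Resp> processor;

    public ChainedHandler(Function<Req, Resp> processor) {
        this.processor = processor;
    }

    // Called when a READ completes with a full request object. The request
    // flows through both FIFO queues, but on this one thread, in one chunk;
    // returns the response that was "written" in the same pass.
    public Resp onRead(Req request) {
        processQueue.add(request);                        // READ stage: enqueue
        Resp resp = processor.apply(processQueue.poll()); // PROCESS stage, same thread
        writeQueue.add(resp);                             // WRITE stage: enqueue
        return writeQueue.poll();                         // ...and drain immediately
    }
}
```

In a real multi-pool configuration the queues decouple the stages onto different threads; the point of the chained form is that when one pool handles everything, the queues can be drained inline.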

Of course, the above does not always hold true - some of the BIO-like models circumvent things like the WRITE FIFO queues entirely. Different needs, different strategies, different trade-offs.

Related Features

In addition to the various threading models, EmberIO includes a few other features to help the developer tune the model towards the best balance of resource usage/throughput/latency that matches their needs. Most of these are configurable, but a few are hard-coded into the library to work around known NIO (or plain BIO) issues or to boost performance.

One thing to keep in mind about EmberIO is that all features are to some extent intertwined, and some features negate the use of others. As one example, use of DEDICATED_READER effectively negates the greedy operations feature, since DEDICATED_READER channels bypass the Selector entirely. Keep this in mind when you're trying out the various EmberIO configurations, and above all use your head!
  • Greedy operations.
    EmberIO inherently is pretty aggressive in its use of threads, but at the same time naive approaches to NIO tend to lead to excessive thread switching. To make matters worse, Selector performance isn't always what it could be. Because of this, greedy operations were born. Greedy operations basically enable a thread to do "N" units of work once it has control. For example, a READER could be configured with greedy ops set to "20", which would mean that the READER would try to read up to 20 objects before relinquishing control. Judicious use of greedy ops can significantly boost your server's throughput by doing more work outside of the Selector, and minimizing use and interruption of the Selector.


  • Throttles.
    It's fairly common for I/O to run away out of control in Java applications. This is true in particular when you're using asynchronous READ and WRITE operations. You can end up in a situation where either your I/O threads are eating all of your CPU time, or just as bad, your PROCESS or WRITE queues start filling up monstrously and you start running out of memory. The initial implementations of EmberIO didn't address this problem at all, and as a result it was common under my stress tests to end up with a WRITE queue containing several hundred thousand objects waiting to be written, the network saturated with a storm of packets, and the PROCESS workers starved (or at least seriously undernourished). The end result was a seriously unbalanced utilization of available resources, out-of-memory errors under long-lasting heavy loads, and latencies in the tens of seconds. Yuck.

    As of 0.3 Alpha, EmberIO supports throttles to address this problem. You can configure EmberIO to throttle READs or WRITEs so that they shut off if either the PROCESS or WRITE queues fill up past a certain threshold.

    On the READ side, for most models we turn off READ interest if the PROCESS queue passes this throttle threshold, which effectively means we stop reading. This state persists until a restart threshold is reached - basically, the PROCESS queue is drained sufficiently, at which point we re-inject READ interest. If the DEDICATED_READER model is used, the job is much simpler - we just block the READ thread until the restart threshold is reached.

    On the WRITE side, in most models we physically block a thread trying to WRITE if the WRITE queue threshold is exceeded. One gotcha here is that we can't do this if the writer is an EIO thread, since we'd potentially be deadlocking ourselves. To get around this, right now EIO will only block non-EIO threads (this is done by checking the ThreadGroup of the writing thread). A more sophisticated model may be implemented in the future, since the current one is a bit crude and naive. For the DEDICATED_WRITE scenario, we don't do anything special right now - we assume the I/O resource itself will throttle us, and in that case no WRITE queue is used, so there's nothing to fill up. This should probably be changed in the future, to avoid flooding the network.

  • Thread Priority
    EmberIO lets you set worker threads' priorities along with the pooling strategy you're using. Changing such priorities can have very dramatic results in certain situations - and can also lead to massive thread malnutrition if you're not careful! First-class support for thread priorities is provided in recognition that sometimes not all threads are created equal.

  • Read and Write Optimizations
    EmberIO has a bunch of optimizations to try to efficiently get data in from your I/O resources and back out again. In particular, we try reads and writes a few times if they are incomplete, even in non-blocking mode. This works extraordinarily well because most JDKs and TCP/IP stacks will often return/write just one byte, even if all the data (or buffer space) you need is there. Quite often a subsequent read/write immediately after will snarf/write the rest of the data.

  • Auto Management of Blocking Semantics
    EmberIO auto-manages the blocking configuration of your channels so you don't have to. More importantly, it does this in a safe manner that avoids the dreaded IllegalBlockingModeException. While you should generally try to use non-blocking I/O for everything, this isn't always a feasible option, and EmberIO makes sure you can write blocking code within it just as easily as you could in the old BIO model.
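The READ-side throttle described above is essentially a high/low watermark with hysteresis, and can be sketched like this (a minimal illustration with made-up names, not EmberIO's actual implementation):

```java
// Hypothetical sketch of the READ-side throttle: when the PROCESS queue
// grows past a high-water mark we drop READ interest (stop reading), and
// re-add it only once the queue has drained below a restart threshold.
// The gap between the two thresholds prevents rapid on/off flapping.
public class ReadThrottle {
    private final int pauseAbove;   // throttle threshold (high-water mark)
    private final int resumeBelow;  // restart threshold (low-water mark)
    private boolean reading = true;

    public ReadThrottle(int pauseAbove, int resumeBelow) {
        if (resumeBelow >= pauseAbove) {
            throw new IllegalArgumentException("restart threshold must be below throttle threshold");
        }
        this.pauseAbove = pauseAbove;
        this.resumeBelow = resumeBelow;
    }

    // Call with the current PROCESS queue depth after every enqueue/dequeue.
    // Returns whether READ interest should currently be set.
    public boolean onQueueDepth(int depth) {
        if (reading && depth > pauseAbove) {
            reading = false;  // in real code: key.interestOps(ops & ~OP_READ)
        } else if (!reading && depth < resumeBelow) {
            reading = true;   // in real code: re-add OP_READ and wake the Selector
        }
        return reading;
    }
}
```

The same shape works for the WRITE side, with blocking the writing thread (when it's safe to do so) taking the place of dropping interest ops.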
The above features, in combination with the various threading models, are designed to get the most out of the Java I/O system, and to allow users to easily change the runtime model to match their needs. And while doing so, dispelling some myths about NIO along the way.

Using NIO Intelligently

NIO has gotten a pretty bad rap in a number of circles. A number of people have gone out, written some NIO code, and concluded: "NIO sucks donkey wood, man!". Invariably the critiques have boiled down to a few observations:
  • Latency is too high in NIO - the cost of the Selector and other threads are so much higher than with BIO that it's not worth it.
  • Threads are free! Who cares if BIO requires a thread per connection?
  • I tried to use NIO and started getting all sorts of exceptions - IllegalBlockingModeException, CancelledKeyException, EOFException, YoMamaWearsArmyBootsException. NIO sucks!
  • I switched to NIO and my server throughput plummeted by X%
To some extent these critiques are true - programming NIO directly is more difficult than BIO, and the "obvious" way to code NIO is highly inefficient. But the real problem is that most people who have tried NIO have only tried it the "obvious" way, and never delved very deeply into tuning their code to how NIO really works under the covers.

I've looked at several people's NIO solutions, and almost universally they make the following mistakes:
  • The OP_ACCEPT interest op is there, so they assume they must use it. So each new connection requires you to pop out of the Selector, find the right thread for your ServerSocket, and then call accept() on it. And yeah, this is slow. The solution to this slowness is simple: don't use OP_ACCEPT, just stick your ServerSocket accept() in a dedicated thread, and watch accept latency disappear. EmberIO supports this directly and transparently with the BLOCKING_ACCEPTOR strategy.
  • They lock themselves into one threading model. They either always do non-blocking reads in the Selector thread, or they always do reads from a separate thread gotten from a thread pool. They tend to always do writes directly to the channel. Then they step back and look at the results - and they see that sometimes their hard-coded model works well, and in other cases it's not so good. They dither back and forth for awhile and conclude that it's not worth it. EmberIO doesn't lock you into any specific thread model - you can go thread-crazy with separate thread pools for everything, or you can do non-blocking READs from the Selector, or you can piggyback items onto a single thread pool, or you can go BIO and have a dedicated thread per connection. And you can configure each type of connection you're using differently, so you can have a mix at runtime.
  • They use only one Selector. Why not make your code configurable and use "N" Selectors? Why assume that every Selector implementation will scale from 1 connection up to thousands? EmberIO auto-manages Selectors for you so that you never even have to see them, and you can use just 1 or as many as you think are appropriate.
  • They get whacked by Socket read and write realities. Most people don't seem to realize that Java sockets really love to deal with just one byte at first, and then open the flood gates immediately afterwards. For example, quite often if you do a read your socket will give you just one byte, or just a few - but a read immediately afterwards will give you a buttload of data. Likewise, writes often will write only a byte or two - and another write right afterwards will blam out a couple of K. The problem that many people run into is that they code their non-blocking I/O rather literally. They try the read/write, and if it didn't complete they chuck the socket back onto the Selector with the appropriate interest ops set. The end result looks like this for reads in many implementations:
    • OK, you're sitting in select().
    • You pop out of select(), with OP_READ set to ready.
    • You delegate to some thread.
    • The reader thread sets up their ByteBuffer with N bytes, does channel.read(). Gets 1 byte - nuts, that's not all my data!
    • Throw the thing back on the Selector since we didn't get all the data. This requires you to wakeup the Selector with wakeup().
    • Selector goes back into select(), immediately pops out with OP_READ ready again!
    • You delegate to some thread - again.
    • You do channel.read() again on the pre-setup ByteBuffer. _Now_ you get all the rest of your bytes!
    So each READ, even for small amounts of data, ends up requiring two selector wakeups, two thread delegations, and two actual read operations. Yuck!

    Writes follow a similar pattern. Except that they first add WRITE-interest, then wakeup the selector to get that to "take", then end up going through the cycle twice just like for reads. EmberIO was coded with full awareness of this odd quirk of sockets, and if a non-blocking read or write does not complete on the first try it tries it again immediately. Knowing what you know now, you won't be surprised to hear that this tiny optimization boosts throughput by more than 50%.
  • At best, they do one operation at a time. Once a read or write is successful - they bolt right back into the Selector. Argh! Why do that? EmberIO supports "greedy operations", so it'll try "N" operations in a thread before popping you back into the Selector (with "N" configurable, and of course it gives up if it just can't read/write/process/whatever). This little optimization again boosts throughput by about 25%.
  • They fight against NIO instead of bowing to reality. The threading rules for Selectors and blocking modes and the like drive them crazy, and they howl at the moon in frustration as they keep getting various Exceptions. I'll be the first to admit that threading rules for NIO are a pain in the ass, but I'm not going to pull out my hair and rage around the room because of it. NIO is what it is, and EmberIO is coded with NIO realities in mind. It deals with all the rules so you don't have to.
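The "retry a short read immediately" trick from the list above can be sketched as a small helper - again illustrative only, with hypothetical names:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical sketch of the read-retry optimization: when a non-blocking
// read returns fewer bytes than we want, try again right away a bounded
// number of times before handing the channel back to the Selector. This
// exploits the "one byte first, rest immediately after" quirk of sockets.
public final class RetryingReader {
    private RetryingReader() {}

    // Returns total bytes read into buf; attempts up to maxTries reads,
    // stopping early on EOF (-1) or a full buffer.
    public static int readWithRetries(ReadableByteChannel ch, ByteBuffer buf, int maxTries)
            throws IOException {
        int total = 0;
        for (int i = 0; i < maxTries && buf.hasRemaining(); i++) {
            int n = ch.read(buf);
            if (n < 0) break;  // EOF: let the caller tear the channel down
            total += n;
            // Even when n == 0 we loop again (up to maxTries): the next
            // read often delivers the data that wasn't ready a moment ago.
        }
        return total;
    }
}
```

With a small bound on maxTries this stays polite - a genuinely empty socket costs a few wasted read() calls, while the common short-read case avoids an entire Selector round-trip.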
As I said, lots of people have tried out NIO and gave it a big stinky thumbs down - but that thumbs down was typically based on writing a couple of hundred lines of code (or even tens of lines!) and giving up early when the results weren't encouraging. EmberIO approaches all of this from a pragmatic perspective - I want to do some I/O. I want it fast, I want it correct, and I want it configurable to varying I/O needs. Increasingly, EmberIO is actually becoming NIO-agnostic - if you want Selectors and lots of threads, it'll give you that. But if you want something closer to the old BIO model, it will give you that, too. And most interestingly (to me at least), it can also give you compromises somewhere in the middle between BIO and super-aggressive NIO.

I'm sure a group of people would say that EmberIO "cheats" - it does little more than encode a number of tricks to work around NIO limitations. To which I'd say "Yeah - and so what?!". The important point is that it gets the job done. And in many scenarios EmberIO can give you the benefits of NIO - direct buffers, easy non-blocking I/O, configurable thread pools - with almost no sacrifices in throughput and latency. And it scales - you can do servers on little machines with limited resources using a pooled non-blocking strategy, and scale up to bigger loads and hardware by configuring more Selectors, changing the I/O strategy, changing the threading model, and tweaking parameters like the throttles and greedy operations. And if the situation calls for it, you can even set it up to do plain old BIO.

In the end, with EmberIO the point isn't a religious argument of NIO vs. BIO, or blocking vs. non-blocking, or wars over different threading models - or even about bitching at Sun. The point is to give users options and let them decide what works best for their situation.
NOTE: Mike's entry lives on JRoller, the free, Java-powered weblogs brought to you by Javalobby.org
