Java Development News:
EmberIO - Dispelling NIO Myths
By Mike Spille
28 Apr 2004 | TheServerSide.com
To review, the main high-level features of EmberIO include:
- Support for many different threading models, and for blocking and non-blocking I/O.
- Automated NIO buffer management, including automated support for partial reads and writes in a non-blocking environment.
- Management of one or more NIO Selectors for performing multiplexed I/O. This management is invisible to the application code. This is new to EmberIO as of 0.3 Alpha.
- Ultimately, to provide support for both NIO-based I/O using NIO channels, and non-NIO I/O resources such as multicast sockets. Differences between NIO and non-NIO resources will be completely hidden from application code by default. Note the use of the future tense here ;-). EmberIO doesn't do this yet, but the stage is now set for me to provide such support easily.
- Different I/O models give different performance characteristics, with each model representing a tradeoff between resource utilization, latency, throughput, and the physical size of data being read or written.
- Working extra hard to be both high-performance and correct - so you don't have to :-)
EmberIO Threading Models
One of the most interesting EmberIO features is its threading model support: it offers a boatload of models, each targeted towards different I/O scenarios. Almost any model may be configured to be used in either a blocking or non-blocking manner (with a few exceptions where the model constrains you to one or the other). On top of this, EmberIO supports both traditional blocking Java I/O semantics (which I'll call BIO here - thanks to James Strachan for clueing me in on this term!) and newer NIO asynchronous/multiplexed/non-blocking semantics (which I'll just call NIO).
A non-exhaustive list of threading models includes:
- BLOCKING_ACCEPTOR: Creates a dedicated thread for handling accept() calls for a socket server in a blocking manner.
- Handles socket server accept() calls in the traditional NIO manner, i.e. by using the OP_ACCEPT interest bit in a Selector. For most people this is more of a pain in the ass than anything else, and BLOCKING_ACCEPTOR would be more appropriate.
- Uses separate thread pools for handling READ, WRITE, and PROCESS events. Each of these is handled independently (at least from a threading perspective). Most useful if you anticipate doing large READs or WRITEs or both. All of the *_POOL variants make aggressive use of PROCESS and WRITE FIFO queues to decouple the read/write events from the physical I/O itself (i.e. READs and WRITEs happen completely asynchronously).
- Uses a unified thread pool for all events. Often useful if you have small READs and WRITEs and want to reduce latency significantly.
- DEDICATED_READER: Sets up EmberIO threading and I/O to be performed like old-style BIO. Individual sockets (or what have you) are set up with dedicated blocking reader threads, and writes are performed in a blocking manner directly to the I/O resource. The DEDICATED_READER model is interesting because it opens the door for supporting non-NIO resources. With DEDICATED_READER in place and bypassing the NIO Selector, EmberIO is in a position to deal with any old BIO-style socket-like component - and to make such a resource look the same as NIO resources from an EmberIO user's perspective.
- Performs READ operations within the context of a Selector's main event loop thread. This forces all reads to be done in a non-blocking manner (after all, we don't want to block our selector loop!) and can only be used for NIO-enabled I/O resources (i.e. right now, only sockets).
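To make the BLOCKING_ACCEPTOR idea concrete, here is a minimal sketch in plain java.nio of a dedicated thread that sits in a blocking accept(). The `BlockingAcceptor` class and its hand-off queue are hypothetical names for illustration, not EmberIO's API: each accepted connection is switched to non-blocking mode and queued for whatever thread owns the Selector.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a dedicated, blocking accept thread (not EmberIO's API).
class BlockingAcceptor implements Runnable {
    private final ServerSocketChannel server;
    // Accepted connections are handed off here; a Selector-owning thread (not shown)
    // would drain this queue and register each channel for OP_READ.
    final BlockingQueue<SocketChannel> accepted = new LinkedBlockingQueue<>();

    BlockingAcceptor(InetSocketAddress bindAddress) throws IOException {
        server = ServerSocketChannel.open();
        server.configureBlocking(true);   // plain blocking accept(): no OP_ACCEPT, no Selector hop
        server.bind(bindAddress);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        while (server.isOpen()) {
            try {
                SocketChannel client = server.accept();   // blocks until a client connects
                client.configureBlocking(false);         // from here on it is multiplexed
                accepted.add(client);
            } catch (IOException e) {
                // Closing the server makes accept() throw; any other failure is
                // ignored in this sketch and the loop simply tries again.
                if (!server.isOpen()) {
                    return;
                }
            }
        }
    }

    void close() throws IOException {
        server.close();
    }
}
```

Closing the server channel from another thread makes the blocked accept() throw an AsynchronousCloseException, which is how the loop exits.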
The models tie into the default I/O events that EmberIO supports: READ, WRITE, PROCESS, and ACCEPT. Leaving ACCEPT out of the picture for the moment, EmberIO is set up so that data flows in the following manner:
- READ events trigger physical reads into ByteBuffers (which are under control of the app). When a READ gets a full object, it puts the object on the PROCESS FIFO queue. These READ events are generally triggered by an NIO Selector.
- PROCESS events get triggered when something gets put in the PROCESS queue. This is independent of the Selector.
- WRITE events get triggered when a user calls write() on a ReadWriteEndpoint. What happens here is that we stick the object to write on a WRITE FIFO queue, and then fire the WRITE event. In practice, if we're in non-blocking WRITE mode we try the write first, and if that doesn't work (or is incomplete) we then add OP_WRITE interest to the endpoint and wake up the Selector to deal with it.
This maps directly onto the classic server cycle:
- Server reads a request
- Server processes the request
- Server writes a response to the request
Of course, the above does not always hold true - some of the BIO-like models circumvent things like the WRITE FIFO queues entirely. Different needs, different strategies, different trade-offs.
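The queue-based flow can be sketched with two FIFO queues. This is a minimal illustration assuming a hypothetical `Pipeline` class, with `String` standing in for a fully decoded request object; it leaves out the Selector and the physical writes and only shows how the PROCESS and WRITE queues decouple the stages.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch of the READ -> PROCESS -> WRITE hand-off (not EmberIO's classes).
class Pipeline {
    final BlockingQueue<String> processQueue = new LinkedBlockingQueue<>(); // PROCESS FIFO
    final BlockingQueue<String> writeQueue = new LinkedBlockingQueue<>();   // WRITE FIFO
    private final Function<String, String> handler;

    Pipeline(Function<String, String> handler) {
        this.handler = handler;
    }

    // READ: called once a complete object has been decoded from the ByteBuffer.
    void onCompleteRead(String request) {
        processQueue.add(request);
    }

    // PROCESS: runs on its own worker thread, independent of the Selector.
    Thread startProcessWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String request = processQueue.take();
                    // The response goes on the WRITE queue rather than straight to the socket;
                    // a writer thread (not shown) would drain it to the channel.
                    writeQueue.add(handler.apply(request));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // exit quietly when shut down
            }
        }, "process-worker");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```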
Related Features
In addition to the various threading models, EmberIO includes a few other features to help the developer tune the model towards the balance of resource usage, throughput, and latency that best matches their needs. Most of these are configurable, but a few are hard-coded into the library to work around known NIO (or plain BIO) issues or to boost performance.
One thing to keep in mind about EmberIO is that all features are to some extent intertwined, and some features negate the use of others. As one example, use of DEDICATED_READER effectively negates the greedy operations feature, since DEDICATED_READER channels bypass the Selector entirely. Keep this in mind when you're trying out the various EmberIO configurations, and above all use your head!
- Greedy Operations
EmberIO is inherently pretty aggressive in its use of threads, but at the same time naive approaches to NIO tend to lead to excessive thread switching. To make matters worse, Selector performance isn't always what it could be. Because of this, greedy operations were born. Greedy operations basically enable a thread to do "N" units of work once it has control. For example, a READER could be configured with greedy ops set to "20", which would mean that the READER would try to read up to 20 objects before relinquishing control. Judicious use of greedy ops can significantly boost your server's throughput by doing more work outside of the Selector, and minimizing use and interruption of the Selector.
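A minimal sketch of the greedy idea, assuming a hypothetical `GreedyWorker` that drains a queue of `Runnable` work units: once it has control it does up to `greedyOps` units (the "N" above), stops early if the queue runs dry, and then hands control back to its caller, such as a Selector loop.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of "greedy operations" (not EmberIO's actual class).
class GreedyWorker {
    private final Queue<Runnable> work;
    private final int greedyOps;   // "N": maximum units of work per turn

    GreedyWorker(Queue<Runnable> work, int greedyOps) {
        if (greedyOps < 1) {
            throw new IllegalArgumentException("greedyOps must be >= 1");
        }
        this.work = work;
        this.greedyOps = greedyOps;
    }

    /** Runs up to greedyOps units; returns how many were done before giving up control. */
    int runTurn() {
        int done = 0;
        while (done < greedyOps) {
            Runnable unit = work.poll();
            if (unit == null) {
                break;             // nothing left: give control back instead of spinning
            }
            unit.run();
            done++;
        }
        return done;
    }
}
```

With `greedyOps` set to 20 and 25 queued units, the first turn does 20 units and the second does the remaining 5, instead of 25 separate trips through the Selector.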
- Throttling
It's fairly common for I/O to run out of control in Java applications. This is particularly true when you're using asynchronous READ and WRITE operations. You can end up in a situation where either your I/O threads are eating all of your CPU time, or, just as bad, your PROCESS or WRITE queues start filling up monstrously and you start running out of memory. The initial implementations of EmberIO didn't address this problem at all, and as a result it was common under my stress tests to end up with a WRITE queue containing several hundred thousand objects waiting to be written, the network saturated with a storm of packets, and PROCESS workers starved (or at least seriously undernourished). The end result was seriously unbalanced utilization of available resources, out-of-memory errors under long-lasting heavy loads, and latencies in the tens of seconds. Yuck.
As of 0.3 Alpha, EmberIO supports throttles to address this problem. You can configure EmberIO to throttle READs or WRITEs so that they shut off if either the PROCESS or WRITE queues fill up past a certain threshold.
On the READ side, for most models we turn off READ interest if the PROCESS queue passes the throttle threshold, which effectively means we stop reading. This state persists until a restart threshold is reached - basically, once the PROCESS queue is drained sufficiently, we re-inject READ interest. If the DEDICATED_READER model is used, the job is much simpler - we just block the READ thread until the restart threshold is reached.
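The READ-side throttle is basically a high/low watermark with hysteresis. Here is a sketch assuming a hypothetical `ReadThrottle` helper, with `stopAt` and `restartAt` as illustrative names for the throttle and restart thresholds; the only real NIO API it touches is `SelectionKey.interestOps`.

```java
import java.nio.channels.SelectionKey;

// Hypothetical READ throttle with hysteresis (not EmberIO's actual class).
class ReadThrottle {
    private final int stopAt;     // PROCESS queue depth at which reading is switched off
    private final int restartAt;  // depth the queue must drain to before reading resumes
    private boolean paused;

    ReadThrottle(int stopAt, int restartAt) {
        if (restartAt >= stopAt) {
            throw new IllegalArgumentException("restartAt must be below stopAt");
        }
        this.stopAt = stopAt;
        this.restartAt = restartAt;
    }

    /** Updates state from the current PROCESS queue depth; true means "keep reading". */
    synchronized boolean update(int processQueueDepth) {
        if (!paused && processQueueDepth >= stopAt) {
            paused = true;                 // crossed the throttle threshold
        } else if (paused && processQueueDepth <= restartAt) {
            paused = false;                // drained enough: re-inject READ interest
        }
        return !paused;
    }

    /** Applies the decision to a key's interest set; intended to run on the Selector's thread. */
    void apply(SelectionKey key, int processQueueDepth) {
        int ops = key.interestOps();
        int newOps = update(processQueueDepth)
                ? ops | SelectionKey.OP_READ
                : ops & ~SelectionKey.OP_READ;
        if (newOps != ops) {
            key.interestOps(newOps);
        }
    }
}
```

The gap between the two thresholds is what keeps READ interest from flapping on and off every time a single item enters or leaves the queue.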
On the WRITE side, in most models we physically block a thread trying to WRITE if the WRITE queue threshold is exceeded. One gotcha here is that we can't do this if the writer is an EmberIO (EIO) thread, since we could end up deadlocking ourselves. To get around this, right now EIO will only block non-EIO threads (this is done by checking the ThreadGroup of the writing thread). A more sophisticated model may be implemented in the future, since the current one is a bit crude and naive. For the DEDICATED_WRITE scenario, we don't do anything special right now - we assume the I/O resource itself will throttle us, and in that case no WRITE queue is used, so there's nothing to fill up. This should probably be changed in the future, to avoid flooding the network.
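The WRITE-side rule can be sketched the same way. `ThrottledWriteQueue` and the `ioGroup` ThreadGroup are hypothetical names; the ThreadGroup test mirrors the check described above, and plain `wait()`/`notifyAll()` stand in for whatever signalling a real implementation would use. Threads in the I/O group are never blocked, so the queue can overshoot the threshold.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical WRITE throttle that only blocks threads outside the I/O ThreadGroup.
class ThrottledWriteQueue<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final int threshold;         // depth at which outside writers must wait
    private final ThreadGroup ioGroup;   // threads owned by the I/O library itself

    ThrottledWriteQueue(int threshold, ThreadGroup ioGroup) {
        this.threshold = threshold;
        this.ioGroup = ioGroup;
    }

    private boolean isIoThread() {
        // parentOf() is true for ioGroup itself and any of its subgroups.
        ThreadGroup group = Thread.currentThread().getThreadGroup();
        return group != null && ioGroup.parentOf(group);
    }

    synchronized void enqueue(T item) throws InterruptedException {
        // Blocking an I/O thread could deadlock: it may be the one that drains the queue.
        while (queue.size() >= threshold && !isIoThread()) {
            wait();                       // outside caller: wait until the writer drains
        }
        queue.addLast(item);
    }

    synchronized T poll() {
        T item = queue.pollFirst();
        if (item != null) {
            notifyAll();                  // wake any blocked outside writers
        }
        return item;
    }

    synchronized int size() {
        return queue.size();
    }
}
```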
- Thread Priority
EmberIO lets you set worker threads' priorities along with the pooling strategy you're using. Changing such priorities can have very dramatic results in certain situations - and can also lead to massive thread malnutrition if you're not careful! First-class support for thread priorities is provided in recognition that sometimes not all threads are created equal.
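Tying a priority to a pool fits naturally into a `ThreadFactory`. A minimal sketch, assuming a hypothetical `PriorityThreadFactory` plugged into a standard `ExecutorService`; EmberIO's own configuration may look quite different. Java thread priorities are only hints to the scheduler, and how they map to OS priorities is platform-dependent, which is one more reason to measure before and after.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory that gives every thread in a pool the same priority.
class PriorityThreadFactory implements ThreadFactory {
    private final String prefix;
    private final int priority;
    private final AtomicInteger count = new AtomicInteger();

    PriorityThreadFactory(String prefix, int priority) {
        if (priority < Thread.MIN_PRIORITY || priority > Thread.MAX_PRIORITY) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
        this.prefix = prefix;
        this.priority = priority;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + count.incrementAndGet());
        t.setPriority(priority);   // a hint to the scheduler, not a guarantee
        t.setDaemon(true);
        return t;
    }

    // Example: a fixed-size pool (e.g. READ workers) running at the given priority.
    static ExecutorService pool(String name, int threads, int priority) {
        return Executors.newFixedThreadPool(threads, new PriorityThreadFactory(name, priority));
    }
}
```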
- Read and Write Optimizations
EmberIO has a bunch of optimizations to try to efficiently get data in from your I/O resources and back out again. In particular, we try reads and writes a few times if they are incomplete, even in non-blocking mode. This works extraordinarily well because most JDKs and TCP/IP stacks will often return/write just one byte, even if all the data (or buffer space) you need is there. Quite often a subsequent read/write immediately after will snarf/write the rest of the data.
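Here is a sketch of the retry-then-defer approach for writes, assuming a hypothetical `EagerWriter` helper and an arbitrary retry count of 3 (not a value taken from EmberIO). It retries an incomplete non-blocking write immediately and only falls back to registering `OP_WRITE` and waking the Selector when data is still left over; reads can be retried the same way around `channel.read()`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper: retry incomplete non-blocking writes before involving the Selector.
class EagerWriter {
    static final int RETRIES = 3;   // illustrative value only

    /**
     * Tries to write the whole buffer now. Returns true if it all went out; otherwise
     * adds OP_WRITE interest so the Selector can finish the job later.
     * The channel must already be in non-blocking mode.
     */
    static <C extends SelectableChannel & WritableByteChannel> boolean writeOrDefer(
            C channel, ByteBuffer buf, Selector selector) throws IOException {
        // One initial attempt plus up to RETRIES immediate retries.
        for (int attempt = 0; attempt <= RETRIES && buf.hasRemaining(); attempt++) {
            channel.write(buf);     // may write everything, part, or nothing
        }
        if (!buf.hasRemaining()) {
            return true;            // common case: no Selector round trip at all
        }
        SelectionKey key = channel.keyFor(selector);
        if (key == null) {
            channel.register(selector, SelectionKey.OP_WRITE);
        } else {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        }
        selector.wakeup();          // make the new interest set take effect promptly
        return false;
    }
}
```

The caller is assumed to keep the unfinished buffer (for example at the head of its WRITE queue) so the Selector thread can resume it when OP_WRITE fires. On older JDKs, register() can block while another thread is inside select() on the same Selector, so a production version would usually hand that registration to the Selector's own thread.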
- Auto Management of Blocking Semantics
EmberIO auto-manages the blocking configuration of your channels so you don't have to. More importantly, it does this in a safe manner that avoids the dreaded IllegalBlockingModeException. While you should generally try to use non-blocking I/O for everything, this isn't always a feasible option, and EmberIO makes sure you can write blocking code within it just as easily as you could in the old BIO model.
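For context, the exception comes from a rule in `SelectableChannel`: `configureBlocking(true)` throws `IllegalBlockingModeException` while the channel is still registered with any Selector, and cancelling a key does not deregister the channel until that Selector's next selection operation. One way to switch safely, sketched below with a hypothetical `BlockingModes` helper (not EmberIO's code), is to cancel the key, flush it with `selectNow()`, and only then flip the mode. It assumes the caller may run a selection operation on that Selector, for example because it is the Selector's own loop thread.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical helper for switching a registered channel back to blocking mode.
final class BlockingModes {
    private BlockingModes() {}

    static void makeBlocking(SelectableChannel channel, Selector selector) throws IOException {
        SelectionKey key = channel.keyFor(selector);
        if (key != null) {
            key.cancel();           // only marks the key; the channel is still registered
            selector.selectNow();   // the next selection operation actually deregisters it
        }
        // configureBlocking(true) is only legal once no Selector holds the channel.
        if (channel.isRegistered()) {
            throw new IllegalStateException("channel is still registered with another Selector");
        }
        channel.configureBlocking(true);
    }
}
```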
Using NIO Intelligently
NIO has gotten a pretty bad rap in a number of circles. A number of people have gone out, written some NIO code, and concluded: "NIO sucks donkey wood, man!". Invariably the critiques have boiled down to a few observations:
- Latency is too high in NIO - the cost of the Selector and the extra threads is so much higher than with BIO that it's not worth it.
- Threads are free! Who cares if BIO requires a thread per connection?
- I tried to use NIO and started getting all sorts of exceptions - IllegalBlockingModeException, CancelledKeyException, EOFException, YoMamaWearsArmyBootsException. NIO sucks!
- I switched to NIO and my server throughput plummeted by X%
I've looked at several people's NIO solutions, and almost universally they make the following mistakes:
- The OP_ACCEPT interest op is there, so they assume they must use it. So each new connection requires you to pop out of the Selector, find the right thread for your ServerSocket, and then call accept() on it. And yeah, this is slow. The solution to this slowness is simple: don't use OP_ACCEPT, just stick your ServerSocket accept() in a dedicated thread, and watch accept latency disappear. EmberIO supports this directly and transparently with the BLOCKING_ACCEPTOR strategy.
- They lock themselves into one threading model. They either always do non-blocking reads in the Selector thread, or they always do reads from a separate thread taken from a thread pool. They tend to always do writes directly to the channel. Then they step back and look at the results - and they see that sometimes their hard-coded model works well, and in other cases it's not so good. They dither back and forth for a while and conclude that it's not worth it. EmberIO doesn't lock you into any specific thread model - you can go thread-crazy with separate thread pools for everything, or you can do non-blocking READs from the Selector, or you can piggyback items onto a single thread pool, or you can go BIO and have a dedicated thread per connection. And you can configure each type of connection you're using differently, so you can have a mix at runtime.
- They use only one Selector. Why not make your code configurable and use "N" Selectors? Why assume that every Selector implementation will scale from 1 connection up to thousands? EmberIO automanages Selectors for you so that you never even have to see them, and can use just 1 or as many as you think is appropriate.
- They get whacked by socket read and write realities. Most people don't seem to realize that Java sockets really love to deal with just one byte at first, and then open the floodgates immediately afterwards. For example, if you do a read, quite often your socket will give you just one byte, or just a few - but a read immediately afterwards will give you a buttload of data. Likewise, writes often will write only a byte or two - and another write right afterwards will blam out a couple of K. The problem that many people run into is that they code their non-blocking I/O rather literally. They try the read/write, and if it didn't complete they chuck the socket back onto the Selector with the appropriate interest ops set. The end result looks like this for reads in many implementations:
  - OK, you're sitting in select().
  - You pop out of select(), with OP_READ set to ready.
  - You delegate to some thread.
  - The reader thread sets up its ByteBuffer with N bytes and does channel.read(). Gets 1 byte - nuts, that's not all my data!
  - Throw the thing back on the Selector since we didn't get all the data. This requires you to wake up the Selector with wakeup().
  - Selector goes back into select(), immediately pops out with OP_READ ready again!
  - You delegate to some thread - again.
  - You do channel.read() again on the pre-setup ByteBuffer. _Now_ you get all the rest of your bytes!
Writes follow a similar pattern, except that they first add WRITE interest, then wake up the Selector to get that to "take", and then end up going through the cycle twice just like reads. EmberIO was coded with full awareness of this odd quirk of sockets, and if a non-blocking read or write does not complete on the first try, it immediately tries again. Knowing what you know now, you won't be surprised to hear that this tiny optimization boosts throughput by more than 50%.
- At best, they do one operation at a time. Once a read or write succeeds, they bolt right back into the Selector. Argh! Why do that? EmberIO supports "greedy operations", so it'll try "N" operations in a thread before popping you back into the Selector (with "N" configurable, and of course it gives up if it just can't read/write/process/whatever). This little optimization again boosts throughput by about 25%.
- They fight against NIO instead of bowing to reality. The threading rules for Selectors and blocking modes and the like drive them crazy, and they howl at the moon in frustration as they keep getting various Exceptions. I'll be the first to admit that threading rules for NIO are a pain in the ass, but I'm not going to pull out my hair and rage around the room because of it. NIO is what it is, and EmberIO is coded with NIO realities in mind. It deals with all the rules so you don't have to.
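On the "only one Selector" point: spreading channels across N Selectors is mostly a matter of picking one per registration. A minimal round-robin sketch, assuming a hypothetical `SelectorGroup` class; each Selector would normally get its own loop thread (not shown) that calls `runPending(i)` before each `select()`, and `wakeup()` nudges that loop so a new registration isn't stuck behind a blocked `select()`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin assignment of channels to N Selectors.
class SelectorGroup {
    private final List<Selector> selectors = new ArrayList<>();
    private final List<Queue<Runnable>> pending = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    SelectorGroup(int n) throws IOException {
        for (int i = 0; i < n; i++) {
            selectors.add(Selector.open());
            pending.add(new ConcurrentLinkedQueue<>());
        }
    }

    /** Picks a Selector round-robin and queues the registration for its loop thread. */
    int assign(SelectableChannel channel, int ops) {
        int i = Math.floorMod(next.getAndIncrement(), selectors.size());
        Selector selector = selectors.get(i);
        pending.get(i).add(() -> {
            try {
                channel.register(selector, ops);   // channel must be non-blocking
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        selector.wakeup();   // let a blocked select() return and pick up the registration
        return i;
    }

    /** Called by selector i's loop thread before each select(). */
    void runPending(int i) {
        Runnable task;
        while ((task = pending.get(i).poll()) != null) {
            task.run();
        }
    }

    Selector selector(int i) {
        return selectors.get(i);
    }
}
```

Doing the actual register() call on the Selector's own thread sidesteps the case where register() blocks behind an in-progress select(), which older JDKs are prone to.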
NOTE: Mike's entry lives on JRoller, the free, Java-powered weblogs brought to you by Javalobby.org