Ever wonder what your code looks like once it has been digested by javac or the JIT? In his blog, Mark Lam shows you all of this in the process of explaining how adaptive code can be faster than Java-specific processors.
The article starts with a description of hardware acceleration and how it has been helpful. For example, co-processors and dedicated processors for sound and graphics have proven to be quite useful in accelerating applications. Mark asserts that the success experienced in these specific cases has led to the misconception that hardware acceleration can match the performance provided by a JIT.
To demonstrate just how a JIT can outperform hardware acceleration, Mark analyzes what happens when one executes the statement a = a + b;. First the bytecode and then the assembler are listed to show exactly what takes place. With these listings in place, Mark simply counts the number of instructions that the accelerator and the JIT will each execute. The assumption is that fewer instructions will execute in less time, although there are mitigating factors such as cache misses.
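As a rough illustration of the kind of listing described above (not Mark's own listing), here is a minimal sketch of the statement wrapped in a static method. The class name AddDemo and the method add are hypothetical, and the bytecode in the comments assumes both operands are int parameters of a static method, so they occupy local variable slots 0 and 1. Fields or differently placed locals would produce different load and store instructions.

```java
public class AddDemo {
    // Hypothetical wrapper: a and b are int locals in slots 0 and 1.
    public static int add(int a, int b) {
        a = a + b;   // iload_0, iload_1, iadd, istore_0
        return a;    // iload_0, ireturn
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

After compiling with javac, running javap -c AddDemo prints the bytecode, so the sequence in the comments can be checked directly. The machine code a JIT emits for the same method depends on the VM and the target processor and is not shown here.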
Based on the StrongARM 1110 processor, under optimal conditions where all accessed memory is already in cache, each of the instructions will take at least one machine cycle (if I remember correctly). If there is a cache miss, memory accesses can take up to 4 machine cycles assuming no virtual memory paging stalls. Otherwise, it'll take a lot more time.
Using a number of assumptions, Mark proceeds to calculate the number of machine cycles needed to complete the addition.
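The arithmetic behind such an estimate can be sketched as follows. This is a simplified model, not Mark's calculation. It assumes that every instruction costs one cycle in the best case and that an instruction which misses the cache costs four cycles in total, using the two figures quoted above. The class name CycleEstimate and the instruction counts in main are hypothetical and chosen only to exercise the arithmetic.

```java
public class CycleEstimate {
    static final int CYCLES_PER_INSTRUCTION = 1; // best case, everything in cache
    static final int CYCLES_ON_CACHE_MISS = 4;   // assumed total cost of a missing access, no paging

    // Best case: every instruction costs one cycle.
    public static int bestCase(int instructions) {
        return instructions * CYCLES_PER_INSTRUCTION;
    }

    // Worst case short of paging: every memory-accessing instruction misses the cache.
    public static int worstCase(int instructions, int memoryAccesses) {
        int nonMemory = instructions - memoryAccesses;
        return nonMemory * CYCLES_PER_INSTRUCTION + memoryAccesses * CYCLES_ON_CACHE_MISS;
    }

    // Ratio of cycle counts: how many times faster the second path is.
    public static double speedup(int slowerCycles, int fasterCycles) {
        return (double) slowerCycles / fasterCycles;
    }

    public static void main(String[] args) {
        // Hypothetical counts for illustration only; these are not figures from the article.
        int instructions = 10;
        int memoryAccesses = 4;
        System.out.println(bestCase(instructions));                  // prints 10
        System.out.println(worstCase(instructions, memoryAccesses)); // prints 22
        System.out.println(speedup(22, 11));                         // prints 2.0
    }
}
```

Comparing two execution paths then comes down to feeding each path's instruction and memory-access counts into the same model and taking the ratio of the results.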
The results, summarized in a table, clearly demonstrate that the general-purpose CPU with a JIT consistently outperformed hardware acceleration. Even though Mark recognizes that one may never realize the calculated speedups (one as high as 83x), there is a recognizable performance boost.
Mark moves on to demonstrate how the situation is even better for JITs when one considers complex bytecodes, inlining, and other tricks employed to optimize code. With all of these advantages, one begins to wonder whether hardware accelerators for Java are worth considering at all.
Regardless, the JPU does have value. While it cannot compete with a JIT in terms of performance, it will perform much better than a software interpreter. Another reason to use the JPU is in extremely memory-constrained environments where there is absolutely no memory budget to spare. The JIT adds tens to hundreds of kilobytes of memory consumption. If this is not a cost that can be tolerated, a JPU may be the best solution for performance.
There are reasons to be hopeful that Java processors may become useful outside of a few special cases. Mark ends by postulating that a more advanced Java processor could change the situation. While it may not help in every case, it would help in a number of common ones.