It turns out that running application servers on the latest 64-bit Itanium II systems may not offer a sufficient performance boost to justify the cost over existing 32-bit Xeon hardware. Just-in-time (JIT) compilers for both Java and .NET do not yet produce machine code that can take advantage of the unique features of Itanium processors.
Read Itanium stumbles on software code
I found this article in Computer Weekly and thought I'd post it here to see what people think. Does anyone know whether Sun, BEA, etc. are addressing this?
-
64-bit Itanium II not much better than Xeons for serverside apps (18 messages)
- Posted by: Nadeem Shabir
- Posted on: July 08 2003 10:45 EDT
-
Itanium II rules
- Posted by: Mileta Cekovic
- Posted on: July 10 2003 12:03 EDT
- in response to Nadeem Shabir
Hi all,
Itanium II just needs an optimized JVM. BUT, nobody has enough interest in making it. Sun obviously has no interest, nor does Microsoft.
HP could make one.
MC -
Itanium II rules?
- Posted by: Cameron Purdy
- Posted on: July 10 2003 12:30 EDT
- in response to Mileta Cekovic
The only benchmarks that really show the Itanium family "ruling" are SSL connection handling benchmarks. Since (for high-connection sites) that is mostly done by the DSP accelerator cards that plug into the hardware load balancers, I'm not sure where Itanium is going to shine.
There was an interesting article a while back on Ace's about Java performance on various processors, including the new 64-bit AMD processor. Unfortunately, it did not include benchmarks on Itanium. The only Java benchmarks I've seen on Itanium were very poor, but you're right, that could be the early state of the JVMs.
Peace,
Cameron Purdy
Tangosol, Inc.
Coherence: Easily share live data across a cluster! -
JRockit 8.1 Supports Itanium II
- Posted by: Nadeem Shabir
- Posted on: July 11 2003 02:53 EDT
- in response to Cameron Purdy
JRockit 8.1 supposedly now supports Itanium II. I'm not sure to what degree, but at least they seem to be moving in the right direction. -
Itanium II rules
- Posted by: Tom Gardner
- Posted on: July 10 2003 16:34 EDT
- in response to Mileta Cekovic
> Itanium II just needs an optimized JVM. BUT,
> nobody has enough interest in making it.
It isn't that simple. The IA64 concept is based on using extreme compile-time optimisation, plus a number of techniques that avoid the worst of the problems that arise when trying to optimise C/C++ and similar languages. The compile-time optimisation is based on computationally-expensive static code analysis, whereas Hotspot presumes computationally-cheap run-time dynamic analysis.
For a number of years I have presumed, without evidence, that IA64 would turn out to be good for "number-crunching" applications (possibly where the inner loops are hand-coded), and possibly for hand-optimised conventional JIT JVM operations. I have been unconvinced about its benefits for general-purpose "server type" codes.
Those presumptions are weakly supported by the remarkably few benchmarks that have been released for any of the IA64 family, e.g. SSL and the JVM benchmarks. -
Itanium II rules
- Posted by: Peter English
- Posted on: July 10 2003 16:39 EDT
- in response to Tom Gardner
It would be nice to see how well the AMD Opteron based servers compare. -
EPIC compilers
- Posted by: Artem Kornilov
- Posted on: July 10 2003 17:49 EDT
- in response to Tom Gardner
To put things into perspective with respect to compilers, consider the following.
There was a computer named Elbrus 3, which had an EPIC architecture somewhat similar to the Itanium. Its highly optimizing Fortran compiler made more than 60 passes over the code representation to produce efficient code.
Taking that into consideration, I do not think that dynamic compilation would be that much cheaper. What dynamic analysis might bring is information about the most frequently executed code, memory access patterns and virtual function call substitution. After the dynamic analysis is performed, a JIT would still need to fall back on classical optimization and compilation techniques in order to produce efficient code.
Artem -
64-bit Itanium II not much better than Xeons for serverside apps
- Posted by: Krzysztof Swietlinski
- Posted on: July 10 2003 12:46 EDT
- in response to Nadeem Shabir
One main advantage of using Itanium over Pentium would be the amount of memory the JVM can use. That can enable much bigger caches, or even solutions like object prevalence, which could have a significant impact on overall application performance: much bigger than any raw processing speed gain.
-- Krzysztof -
64-bit Itanium II not much better than Xeons for serverside apps
- Posted by: Brian Miller
- Posted on: July 10 2003 16:05 EDT
- in response to Krzysztof Swietlinski
> One main advantage of using Itanium over Pentium would be the amount of memory the JVM can use. That can enable much bigger caches or even solutions like object prevalence...
I don't understand. What's address space got to do with object prevalence? -
Memory is the most important aspect of 64-bit computing
- Posted by: Krzysztof Swietlinski
- Posted on: July 10 2003 19:22 EDT
- in response to Brian Miller
> I don't understand. What's address space got to do with object prevalence?
The applicability of object prevalence is limited by the amount of memory. You can only put so many objects in the 1.6GB you can address on a Pentium (at least on Windows). If you run JDK 1.4 on Itanium II with any 64-bit OS, you can address (and thus use) much more memory (I'm not sure what the exact limit is, but in theory a 64-bit address space is on the order of 10^10 GB).
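A rough sketch of the arithmetic behind that point, assuming a flat address space where every bit of the pointer is usable (real operating systems and JVMs expose far less than the theoretical figure). The class and method names are made up for illustration.

```java
import java.math.BigInteger;

// Sketch: how many GiB a flat address space of a given pointer width could cover.
// This is a theoretical upper bound, not what any particular OS or JVM grants.
final class AddressSpaceSketch {
    private static final BigInteger BYTES_PER_GIB = BigInteger.ONE.shiftLeft(30);

    /** Theoretical addressable memory, in GiB, for a pointer of the given width. */
    static BigInteger addressableGiB(int pointerBits) {
        return BigInteger.ONE.shiftLeft(pointerBits).divide(BYTES_PER_GIB);
    }

    public static void main(String[] args) {
        System.out.println("32-bit: " + addressableGiB(32) + " GiB");
        System.out.println("64-bit: " + addressableGiB(64) + " GiB");
        // What this JVM may actually use depends on the OS and settings such as -Xmx.
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        System.out.println("This JVM's max heap: " + maxHeapMiB + " MiB");
    }
}
```

A 32-bit pointer covers 4 GiB in total, of which the OS reserves part, while a 64-bit pointer covers 2^34 GiB (roughly 1.7 x 10^10 GiB) in theory.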
I personally believe that not very long from now we are going to buy computers with hundreds of GB of RAM (at least at the enterprise level), and that will significantly change many of the architectural patterns we use now.
-- Krzysztof -
BEA JRockit, Itanium
- Posted by: Sean Sullivan
- Posted on: July 10 2003 14:44 EDT
- in response to Nadeem Shabir
Has anybody tried BEA's JRockit on Intel's Itanium?
http://news.com.com/2110-1001-984878.html
http://edocs.bea.com/wljrockit/docs81/certif.html
http://www.intel.com/ebusiness/affiliates/bea/index.htm -
The comparison is not valid
- Posted by: Ricardo Morin
- Posted on: July 10 2003 19:46 EDT
- in response to Nadeem Shabir
The Computer Weekly article asserts that "Intel's latest 64-bit systems will give little or no advantage over 32-bit systems running application server software based on code from Java or Microsoft," citing SPECjAppServer2002 Java benchmark results for Itanium 2-based and Intel Xeon processor-based systems as the basis for this assessment.
The comparison of a multi-node configuration with a dual-node configuration results in a skewed perspective of the two processors' performance and price/performance (in fact, such comparisons violate the SPEC Fair Use Rules; see http://www.spec.org/jAppServer2002/docs/RunRules.html#S3_5 for more information). There are good reasons why these types of comparisons across categories are not allowed. For one, the scaling characteristics of dual node configurations and multiple node configurations are different (scale out versus scale up). In addition, price/performance comparisons across categories are not really meaningful. One of the reasons why results are categorized is to group systems according to their maintenance and support requirement similarities, thus ensuring that price/performance comparisons are as fair as possible.
Two other aspects make the comparison in the article invalid. First, the Itanium 2-based result the article referenced uses the older Itanium 2 processor 3M @ 1 GHz, not the newest Itanium 2 processor 6M @ 1.5 GHz (introduced last week). Second, the software stacks used by the two publications are completely different: WebLogic versus WebSphere, IBM JVM versus HP JVM and Windows versus HP-UX.
Unfortunately, because SPECjAppServer2002 is a relatively new benchmark, and it is complex and expensive to run, there are a limited number of results posted as of today to use in fair comparisons. We hope this will change as we move forward.
Alternatively, SPECjbb2000 is another Java benchmark that has been in use longer and has more results available for comparison (http://www.spec.org/jbb2000/results/jbb2000.html). Note that the Itanium 2 processor holds the highest four-processor performance, with a result of 116K ops/s (vs. 76K ops/s for an Intel Xeon processor MP-based system).
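For a sense of scale, the two rounded figures quoted above can be turned into a relative speedup. This is only a sketch of the arithmetic on those two numbers; the class and method names are hypothetical, and it says nothing about price/performance or about other configurations.

```java
// Sketch: relative throughput of two benchmark scores (higher is better).
final class ThroughputRatioSketch {
    /** Returns how many times higher the candidate score is than the baseline. */
    static double speedup(double candidateOpsPerSec, double baselineOpsPerSec) {
        if (baselineOpsPerSec <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return candidateOpsPerSec / baselineOpsPerSec;
    }

    public static void main(String[] args) {
        // Rounded figures quoted above: 116K ops/s (Itanium 2) vs. 76K ops/s (Xeon MP).
        double ratio = speedup(116_000, 76_000);
        System.out.printf("speedup: %.2fx%n", ratio);
    }
}
```

With these inputs the ratio is about 1.53, i.e. roughly half again as much throughput on the four-processor Itanium 2 result.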
Finally, the article also incorrectly asserted that "the EPIC architecture does not lend itself well to just-in-time compilers used for Java because these generate a continuous stream of instructions." In fact, just-in-time (JIT) compilers do have the ability to schedule instructions taking full advantage of the EPIC architecture. Moreover, JIT technology has the advantage of being able to dynamically generate code that is optimized for the actual run-time characteristics of the application, as the JIT has the ability to profile and select the hottest methods to optimize. While there is an added burden to the compilation process as compared with non-EPIC processors, there is nothing in the EPIC architecture that would prevent a JIT from scheduling instructions that take advantage of the exposed processor parallelism.
Ricardo Morin
Software and Solutions Group
Intel Corporation -
The comparison is not valid
- Posted by: Tom Gardner
- Posted on: July 11 2003 08:16 EDT
- in response to Ricardo Morin
Isn't benchmark(et)ing fun! There are just _so_ many holes
that "the other side" can claim you have fallen into :)
>In fact, just-in-time (JIT) compilers do have the ability to
>schedule instructions taking full advantage of the EPIC
>architecture. Moreover, JIT technology has the advantage
>of being able to dynamically generate code that is optimized
>for the actual run-time characteristics of the application,
>as the JIT has the ability to profile and select the hottest
>methods to optimize. While there is an added burden to the
>compilation process as compared with non-EPIC processors,
>there is nothing in the EPIC architecture that would prevent
>a JIT from scheduling instructions that take advantage of
>the exposed processor parallelism.
> Ricardo Morin
Are you distinguishing classic JIT technology from
Sun's HotSpot technology in that respect?
How would you characterise and quantify the extent
of the "added burden"? -
The comparison is not valid
- Posted by: Ricardo Morin
- Posted on: July 11 2003 15:53 EDT
- in response to Tom Gardner
Hi Tom:
> Isn't benchmark(et)ing fun! There are just _so_ many holes
> that "the other side" can claim you have fallen into :)
>
We need to make sure that all claims are supported with valid data.
> Are you distinguishing classic JIT technology from
> Sun's HotSpot technology in that respect?
No, I was not specifically referring to Sun's HotSpot technology. Any modern JIT compiler needs to include some form of profiling to selectively optimize code. There is an interesting article here, which illustrates many of the tasks a JIT needs to accomplish (includes some discussion of JITing for Itanium):
http://www.intel.com/technology/itj/2003/volume07issue01/art02_starjit/p01_abstract.htm
(BTW, there are other articles on the same issue that may be of interest to this community: http://www.intel.com/technology/itj/2003/volume07issue01/ )
> How would you characterise and quantify the extent
> of the "added burden"?
One of the key characteristics of the EPIC architecture is the shift from the hardware to the compiler (static or JIT) for exploiting instruction-level parallelism. Itanium 2 is an in-order machine that allows multiple instructions to be issued in parallel, and it includes a number of explicit features that enable low-level compiler optimizations, such as predication, speculation, and a large register set (among others). So there is more work for the compiler to do compared with other architectures, but at the same time, the opportunities for improving performance over time through more advanced optimization techniques are much greater.
This is another reason why it is important for the JIT to rely on profiling, as you do not want to waste time optimizing code that would not yield performance benefits (e.g. rarely executed).
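To make the idea of profile-guided selection concrete, here is a toy sketch of counter-based hot-method detection. It is not how HotSpot, JRockit, or any real JIT is implemented; the class name, the threshold value, and the string-keyed counters are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of counter-based hot-method selection; not a real JVM mechanism.
final class HotMethodProfilerSketch {
    private final int threshold; // hypothetical invocation count that marks a method "hot"
    private final Map<String, Integer> counts = new HashMap<>();

    HotMethodProfilerSketch(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records one invocation and returns true exactly once per method:
     * on the call where its count reaches the threshold.
     */
    boolean recordInvocation(String methodName) {
        int count = counts.merge(methodName, 1, Integer::sum);
        return count == threshold;
    }

    public static void main(String[] args) {
        HotMethodProfilerSketch profiler = new HotMethodProfilerSketch(3);
        String[] trace = {"parse", "render", "parse", "parse", "init", "parse"};
        for (String method : trace) {
            if (profiler.recordInvocation(method)) {
                // Only here would the expensive optimizing compile be worth paying for.
                System.out.println("would optimize: " + method);
            }
        }
    }
}
```

In this trace only `parse` crosses the threshold, so the rarely executed `render` and `init` never incur the cost of heavy optimization.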
Thanks,
Ricardo -
The comparison is not valid
- Posted by: Cameron Purdy
- Posted on: July 11 2003 16:52 EDT
- in response to Ricardo Morin
So why doesn't Intel publish the "hotspot" for the Itanium? The traditional "JIT" isn't going to do the trick when the code can be so much more "hyper optimized" for the Itanium. Since the HotSpot JVM already does some profiling as it runs, that model should be extensible to produce some relatively large bundles for EPIC (i.e. it should be possible to make relatively good use of the processor).
Peace,
Cameron Purdy
Tangosol, Inc.
Coherence: Easily share live data across a cluster! -
The comparison is not valid
- Posted by: Ricardo Morin
- Posted on: July 11 2003 17:32 EDT
- in response to Cameron Purdy
Hi Cameron:
BEA JRockit is optimized for the Itanium Processor Family. Check out this article: http://cedar.intel.com/media/pdf/Java_64bit_final.pdf
Thank you,
Ricardo -
The comparison is not valid
- Posted by: Tom Gardner
- Posted on: July 11 2003 18:41 EDT
- in response to Cameron Purdy
> So why doesn't Intel publish the "hotspot" for the Itanium?
You may find that "has" to come from HP.
> The traditional "JIT" isn't going to do the trick, when
> the code can be so much more "hyper optimized" for the Itanium.
I've seen nothing to demonstrate that conjecture is true (or
false); I would *really* like to see a dependable answer.
I suspect the key issue is how many processor cycles
have to be expended on optimising down to the bundle level.
I also expect that the time rises as the square of the
number of instructions being optimised.
Certainly the optimisations depend critically on the
*details* of the processor's internal micro-organisation
and on that of the memory controller. I've seen somebody
discuss how they spent months hand-coding the inner loops
of vector number-crunching algorithms; they were
distressed when Intel modified the internal pipeline since
they had to recode everything from scratch. Maybe the
optimising compilers have reached the state in which
the optimisation is fully automated, but it is known to be
a seriously hard compiler problem.
> Since the hotspot JVM already does some profiling as
> it runs, that model should be able to be extended
> to produce some relatively large bundles for epic
> (i.e. it should be possible to make relatively good
> use of the processor.)
One would think so, but the *practical* ability to
do such "extension" is not clear in my mind. If it
isn't practical, then I conjecture that hand-optimisation
of a classic JIT would be almost as good. -
The comparison is not valid
- Posted by: Arvind Jain
- Posted on: July 11 2003 20:00 EDT
- in response to Ricardo Morin
> Alternatively, SPECjbb2000 is another Java benchmark that has been in use
> longer and has more results available for comparison
> (http://www.spec.org/jbb2000/results/jbb2000.html). Note that the
> Itanium 2 processor holds the highest four-processor performance, with a
> result of 116K ops/s (vs. 76K ops/s for an Intel Xeon processor MP-based
> system).
We have recently been working with a couple of H/W OEMs to publish new SPECjbb2000 results on the following two system configs, based on the latest Intel processors running the latest version of BEA WebLogic JRockit:
(1) 4-way Itanium 2 (1.5 GHz) system running the 64-bit JRockit JVM (optimized for Itanium 2)
(2) 4-way Xeon (2.8 GHz) system running the 32-bit JRockit JVM (optimized for IA32 Xeon)
Since they are still under review, I cannot disclose the actual results until they are published (sometime within the next couple of weeks). But, for the sake of this discussion, the latest 64-bit JRockit JVM, which is optimized to take advantage of the EPIC architecture of Itanium and does the necessary code scheduling and optimizations, _outperforms_ the latest 32-bit JRockit JVM on the Xeon system.
Given that JRockit has already been the fastest 32-bit JVM on IA32 Xeon systems, one can safely conclude that JRockit will be the fastest 64-bit JVM on Itanium systems as well.
So, aside from realizing the true benefit of 64-bits, which is the ability to address heaps >2GB, Java performance with JRockit on the latest Itanium system is actually better than on the latest Xeon system.
Arvind Jain
BEA Systems -
impressive either way
- Posted by: Cameron Purdy
- Posted on: July 11 2003 23:44 EDT
- in response to Arvind Jain
> (1) 4-way Itanium 2 (1.5 GHz) system running the 64-bit JRockit JVM (optimized for Itanium 2)
> (2) 4-way Xeon (2.8 GHz) system running the 32-bit JRockit JVM (optimized for IA32 Xeon)
I'm very curious to see 1-way, 2-way, 4-way and 8-way numbers. The Itanium scales up much (much!) better than the Xeon for SMP machines. The Xeon "sweet spot" is a 2-CPU configuration, and even there, it is (with the Intel chipsets) severely hobbled compared to other server systems.
However, the price/performance of the Xeon is and has been (in the 1x and 2x CPU servers) simply amazing, and the performance in and of itself is amazing. (There are a lot of applications that will run faster on a 1-CPU Xeon than on a "big name" 2-CPU Unix server.)
So if Itanium can squeeze out more performance than the Xeon, then it's doing pretty well, because that's a hard bar to get over already.
On a side topic, any plans to support the weird >32-bit memory extensions that IA32 supports? So that you could support >2GB heap on a 32-bit JVM? Any plans for x86/64 support?
Peace,
Cameron Purdy
Tangosol, Inc.
Coherence: Easily share live data across a cluster!