News: IBM posts new SPECjAppServer2004 score
IBM has posted a new SPECjAppServer2004 benchmark result with WebSphere, with a score of 2921.48.
- Posted by: Joseph Ottinger
- Posted on: November 14 2005 06:22 EST
The Standard Performance Evaluation Corp.'s (SPEC) SPECjAppServer2004 benchmark reflects the rigors of complex applications and high-volume transaction processing that are typical in today's customer environments. The test spans all major components of the application server including Web serving, Enterprise Java Beans and messaging and includes hardware, application server software, Java Virtual Machine software, database software and a systems network.
IBM's results submission involved more than 22,000 concurrent clients and produced roughly 2,921 complex business transactions per second. The IBM submission represented a complete IBM solution bringing together the latest version of WebSphere Application Server software, DB2 Universal Database and IBM System p5 550 servers running SUSE Linux.
However, as recent discussions of benchmarks show, while having a standard benchmark is useful, it's not always something that translates into meaningful data. In other words, doing well on a benchmark doesn't mean that the software and/or hardware configuration will give the same kind of results in the field without careful tuning, which makes the benchmarks interesting, but not always applicable. They show what's possible more than what is likely.
That said, such a high score (1200 more TPS than the next highest reported score) is impressive.
What do you think?
- Does this kill BEA WebLogic 9.0 ?? by Stephan Pratt on November 14 2005 07:47 EST
- Does this kill BEA WebLogic 9.0 ?? by Joseph Ottinger on November 14 2005 07:51 EST
- Does this kill BEA WebLogic 9.0 ?? by peter lin on November 14 2005 09:34 EST
- Re: Analysis of the IBM Results by Rich Sharples on November 17 2005 01:12 EST
- Re: Analysis of the IBM Results by Will Hartung on November 14 2005 05:02 EST
- Re: Analysis of the IBM Results by Frank Bolander on November 14 2005 03:44 EST
- Re: Analysis of the IBM Results by Craig Blitz on November 14 2005 03:04 EST
- Re: Analysis of the IBM Results by Kirk Pepperdine on November 14 2005 02:29 EST
- Re: Analysis of the IBM Results by Steve Realmuto on November 14 2005 05:02 EST
- Re: Analysis of the IBM Results by Eric Stahl on November 14 2005 01:51 EST
- Eric What is really eating at you? by Michael McCarthy on November 14 2005 07:02 EST
- Eric What is really eating at you? by Eric Stahl on November 14 2005 07:34 EST
- Analysis of the IBM Results : the real figures by bruno chevalier on November 15 2005 03:50 EST
- Analysis of the IBM Results by J Katz on November 15 2005 12:18 EST
- Come on BEA dudes - dont insult our intelligence by David Carew on November 15 2005 04:14 EST
- Re: Analysis of the IBM Results by Will Hartung on November 14 2005 01:36 EST
- Analysis of the IBM Results by Eric Stahl on November 14 2005 01:04 EST
- They're dead, they just don't know it yet by J Katz on November 15 2005 12:11 EST
- IBM posts new SPECjAppServer2004 score by Juozas Baliuka on November 14 2005 07:56 EST
- Go IBM! by Robert Peterson on November 14 2005 18:10 EST
- IBM posts new SPECjAppServer2004 score by Jacques Talbot on November 15 2005 05:27 EST
- general experience by Hank Li on November 15 2005 13:45 EST
... does this mean that big blue shot BEA diablo app server ??
"... does this mean that big blue shot BEA diablo app server ??"
Not by a long shot. I know plenty of shops that use WebLogic and are quite happy with it. Plus, competition is good.
It's important to not just look at the top line throughput from these benchmarks. You need to look under the covers to see what hardware and software were used to achieve them to determine if they are impressive or not.
Here are the prices that IBM pulled together to tout their results:
IBM:
AppServer Hardware: $192,000
AppServer Software: $480,000
Database Hardware: $377,000
Database Software: $181,000
Total = $1,230,000
Total $/JOPS = $1,230,000 / 2921 JOPS = $421/JOP
BEA:
AppServer Hardware: $38,000
AppServer Software: $100,000
Database Hardware: $752,000
Database Software: $566,000
Total = $1,456,000
Total $/JOPS = $1,456,000 / 1781 JOPS = $817/JOP
As you can see, BEA only used $100k in app server software licenses vs. IBM needing $480,000 in application server software licenses. That’s 4.8x the cost for only 64% more throughput.
BEA only consumed $38,000 in hardware on the app server tier to IBM's $192,000. That’s more than 5x the cost for only 64% more throughput.
Further exacerbating the problem, annual support and maintenance fees are usually calculated as a % of software license and hardware cost, so the difference would widen over a 3 or 5 year TCO comparison.
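For anyone who wants to re-run the arithmetic, here is a minimal Python sketch of the $/JOPS comparison using only the prices and JOPS figures quoted above. The `Submission` class and its field names are made up for illustration, the second price block is taken to be BEA's (as the surrounding text implies), and the app-tier-only figure is derived here rather than quoted from either vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One benchmark submission; class and field names are illustrative only."""
    name: str
    app_hw: int   # application server hardware, USD
    app_sw: int   # application server software, USD
    db_hw: int    # database hardware, USD
    db_sw: int    # database software, USD
    jops: float   # reported throughput

    @property
    def total(self) -> int:
        return self.app_hw + self.app_sw + self.db_hw + self.db_sw

    @property
    def app_tier(self) -> int:
        return self.app_hw + self.app_sw

    def dollars_per_jops(self) -> float:
        return self.total / self.jops

    def app_tier_dollars_per_jops(self) -> float:
        # Derived figure, not one quoted in the thread.
        return self.app_tier / self.jops


# Figures as quoted in the post above.
ibm = Submission("IBM", 192_000, 480_000, 377_000, 181_000, 2921)
bea = Submission("BEA", 38_000, 100_000, 752_000, 566_000, 1781)

for s in (ibm, bea):
    print(f"{s.name}: total ${s.total:,}, "
          f"{s.dollars_per_jops():.2f} $/JOPS overall, "
          f"{s.app_tier_dollars_per_jops():.2f} $/JOPS app tier only")

print(f"Throughput ratio IBM/BEA: {ibm.jops / bea.jops:.2f}x")
print(f"App server software cost ratio IBM/BEA: {ibm.app_sw / bea.app_sw:.1f}x")
```

Splitting the total into app tier and database tier makes it easier to see which tier drives the overall $/JOPS number for each submission.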
As you can see from IBM's own pricing information, the difference in the total price/transaction is in the database software cost and the database hardware. That’s an IBM DB2 vs. Oracle issue. You can’t sandbag us with those costs. As far as we are concerned the Oracle database is overpriced. We can get the same numbers with Microsoft SQL Server or MySQL.
WebLogic Server runs on IBM hardware, SuSE Linux and DB2 as well as WebSphere does. We have thousands of customers using that configuration. We have done benchmarks on Intel 32, Intel 64, AMD, SPARC and PA-RISC processors, Windows, Linux, Solaris and HP-UX operating systems, and Oracle, Microsoft SQL Server and MySQL databases. Our hardware and database independence is a big part of our value proposition. The only reason we haven’t done benchmarks on IBM hardware and DB2 is because we are competitors.
If you re-read their press release you see that they push on the fact that WebSphere is the source of these numbers when the reality is that they get their TCO benefit from their hardware and DB2 pricing.
Our current benchmark configurations target IBM’s prior scores to explicitly make this point.
Now we’ll do the same with their new numbers.
P.S. Has anyone noticed that Oracle hasn't submitted a SPECjAppServer2004 number? I recently blogged about Oracle's performance and TCO claims at;
Eric, can you tell us a bit about the SPEC benchmarking process? About how much effort it took to port the SPEC code to run on WLS, and what kinds of tuning decisions you made to get it to run on WLS, and perhaps what parts of the benchmark get in the way of performance?
I ask because from the SPEC website, it seems that the application is a "generic J2EE" application and that folks who wish to run it as a benchmark are limited to mostly deploying it, tuning the container, and tweaking container specific XMLs (bean-weblogic-ejb-jar.xml, etc.). So, I'm curious how much the portability of the benchmark may punish it for performance.
I understand you may not be able to give full details perhaps limited by agreements made when choosing the SPEC code, but I think it would be very enlightening to understand a bit more about this benchmark and what it takes to not only get it running, but to get it to perform on a particular container, as on its surface it represents the "promise" of a container portable J2EE application.
Great questions. The trade off of portability vs. using some app server performance enhancing features is an interesting topic. The same is true of security, management and developer productivity features.
I will see if I can get someone from BEA performance engineering to answer your question about this particular benchmark.
There are many interesting pros and cons of using an industry defined benchmark. At the end of the day Sj2004 is a much more rigorous test of all of the subsystems in an app server but it still leaves it up to the individual user to interpret the output and make conclusions about the applicability to their own application workload, portability and other requirements. In the end there is no generic benchmark that can satisfy all of these types of questions. It is one solid datapoint.
"Will, Great questions. The trade off of portability vs. using some app server performance enhancing features is an interesting topic."
The benchmark is generic enough that I don't really see any trade-offs. However, the implementation of SPEC benchmarks in general is... interesting.
My main problem with these types of benchmarks is with the definition of typical. I have yet to see an application that is "typical". Given this, I don't see how one is to get meaningful results from these extremely granular benchmarks. I have expressed more thoughts on this in the Microsoft benchmarking thread. Just as a microbenchmark often yields lessons that are difficult to apply, this type of benchmarking yields results that are difficult to understand. If you consider the average response time of your system to be the summation of the weighted average of the response time of all of the relevant components in the system, then what happens if your weights are different than the ones used in this benchmark (and most likely they will be)?
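To make the weighting point concrete, here is a small Python sketch. Every number in it is hypothetical: the component names, the per-component response times and both workload mixes are invented purely to illustrate how a ranking can flip when the weights change, and none of them come from any SPEC result.

```python
def weighted_response_time(times_ms, weights):
    """Weighted average of per-component response times (milliseconds)."""
    if set(times_ms) != set(weights):
        raise ValueError("components and weights must cover the same keys")
    total_weight = sum(weights.values())
    return sum(times_ms[c] * weights[c] for c in times_ms) / total_weight


# Hypothetical per-component response times for two imaginary servers.
server_a = {"web": 5.0, "ejb": 20.0, "jms": 40.0}
server_b = {"web": 10.0, "ejb": 25.0, "jms": 15.0}

# Hypothetical workload mixes: the benchmark's versus your application's.
benchmark_mix = {"web": 0.6, "ejb": 0.3, "jms": 0.1}
your_mix = {"web": 0.2, "ejb": 0.3, "jms": 0.5}

for label, mix in (("benchmark mix", benchmark_mix), ("your mix", your_mix)):
    a = weighted_response_time(server_a, mix)
    b = weighted_response_time(server_b, mix)
    winner = "A" if a < b else "B"
    print(f"{label}: A={a:.1f} ms, B={b:.1f} ms, faster: {winner}")
```

With these made-up inputs, server A comes out ahead under the benchmark's mix and server B under the messaging-heavy mix, which is the concern: the published number only ranks servers for the benchmark's own weights.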
"My main problem with these types of benchmarks is with the definition of typical. I have yet to see an application that is "typical". Given this, I don't see how one is to get meaningful results from these extremely granular benchmarks."
Certainly no benchmark is "typical". The original posting states that this benchmark mimics "the rigors of complex applications and high-volume transaction processing that are typical in today's customer environments." In other words, the benchmark is rich enough to be considered more than a micro-benchmark, and tells us something about how app servers perform in one such complex environment. One data point is better than none, and two would be better than one.
As the jAppServer benchmarks get richer (and they have through jAppServer 2004), it gets less likely that a good result on the benchmark is a result of some performance anomaly unique to the benchmark. It is not easy to perform well on jAppServer 2004; to do so, many aspects of the app server must be scalable and performant.
In the end, the benchmark is useful to the degree it matches your environment's needs. But the benchmark does tell us something useful about performance and is much better than no information at all.
I agree with Kirk.
All these benchmarks measure is the time averaged response time at the edge of the configuration -- the whole configuration seems to be considered a black box. It would be more helpful to break down the timeline of any given unit transaction to see where the impedances are down the entire process chain.
Given that these specs seem to be used to promote a given vendor configuration, where are the value add points? Did DB2 advance to dominate increased throughput, did WebSphere, did the Edge Load Balancer, the Fibre Channel, the Multicore CPU advances... you get the idea.
I'll agree that some sort of baseline is better than no baseline, but these benchmarks have little value other than maybe bragging rights.
In fact if you look at the nearest competitor in their list, it is a Sun/Weblogic combination. At first blush, it looks like the IBM configuration dominates. Even Joe posted:
"That said, such a high score (1200 more TPS than the next highest reported score) is impressive."
However, the IBM configuration contained 60% more processing elements and achieved, surprise surprise, around 60% (63 to be exact) higher response rates.
Maybe a better spec would be to dictate the configuration and have the vendors post results. This would provide a more normalized metric to evaluate the vendors.
"However, the IBM configuration contained 60% more processing elements and achieved, surprise surprise, around 60% (63 to be exact) higher response rates.
Maybe a better spec would be to dictate the configuration and have the vendors post results. This would provide a more normalized metric to evaluate the vendors."
As a swag (and an imprecise swag at that), I like to reduce the JOPS to a JOPS/CPU number. As most anyone knows, while the Appservers can be A bottleneck in any design, the heavy lifting pretty much in fact resides on the DBMS tier, as it tends to be the tier that is most expensive and difficult to scale, whereas the app and web tiers tend to scale horizontally with reasonable efficiency.
So, the second assumption for these benchmarks is simply that they were wise enough to provide enough DB bandwidth on the back end so that the app/web tier was, in fact, the bottleneck. Or, specifically, throw enough app servers at the configuration until your DB can't take any more.
You can look at the Sun app server numbers. One is, like, 1200-1300 JOPS, while the other is 2-300 JOPS (I think, I haven't looked recently). But the configurations were dramatically different.
But when you go with a JOPS/CPU, the numbers are quite close (like high 80's for their 4 machine MySQL versus high 90's for their Oracle on a 24 CPU monster version). There is certainly a difference, but the machine choice as well as MySQL vs Oracle all probably have some effect on that.
Obviously, CPU is not a reliable indicator, as CPUs differ in architecture and speed, but then you can compare and contrast the actual app server machines to make a better judgment.
Of course, now with things like Sun's new Niagara (8 core, 4 threads per core), CPU, and even core, is going to be a meaningless measurement soon.
"I like to reduce the JOPS to a JOPS/CPU number"
As I blogged recently :
you can't normalize on CPU - it isn't a universal constant - they all have different characteristics and even change over time (they get cheaper).
The only sensible normalization you can do is based on price - ie. $/JOPS. This is also a pretty useful metric if you're actually interested in buying stuff. So I'm encouraged to see Eric now using that metric - it's a very useful metric that really exposes the costs in the equation - I think that has to benefit customers.
Now that we're looking at the benchmarks in a more sensible way and not just captivated by the biggest number, hopefully we can start looking at how to drive down the total cost of Java EE (aka J2EE) - I think we can all benefit from that - the other server-side camp isn't standing still on price / performance.
To date I believe that Sun has the lowest $/JOPS number (ie. best price / performance) albeit on a lower end configuration - that's our focus - price / performance and we welcome any competition.
"You can look at the Sun app server numbers. One is, like, 1200-1300 JOPS, while the other is 2-300 JOPS (I think, I haven't looked recently). But the configurations were dramatically different.
But when you go with a JOPS/CPU, the numbers are quite close (like high 80's for their 4 machine MySQL versus high 90's for their Oracle on a 24 CPU monster version)."
Our submissions reflect what we see customers deploying on - as well as demonstrating our price / performance lead we actually hope that the different configurations might be representative and therefore serve as a useful guide for customers.
And congratulations to IBM - that's a big number.
Competition is a good thing.
I just cannot understand why it should be impossible to compare appservers on the same hardware.
Just take off-the-shelf high-end Intel hardware and put a database and the most recent JDK onto it. Then keep this system unchanged for the next 5 years (maybe buy several as backup).
Now you can compare appservers and see which one runs fastest. In 5 years they might run dead slow (compared to 2010's standards) but this will affect each of the appservers in the competition in the same way.
Maybe you have to update this system if new appservers won't run anymore on the JDK or database, but then again keep it the same for all participants.
"As I blogged recently: http://blogs.sun.com/roller/page/sharps?entry=performance_at_a_cost
you can't normalize on CPU - it isn't a universal constant - they all have different characteristics and even change over time (they get cheaper).
The only sensible normalization you can do is based on price - ie. $/JOPS. This is also a pretty useful metric if you're actually interested in buying stuff."
That's true, but that doesn't discount the JOPS/CPU mark wholesale. The $$/JOPS is certainly more accurate, but $$/JOPS doesn't take into account the unreported costs of the machines and such. Also, the JOPS/CPU price gives a more definitive "which is faster" result. If WLS is processing twice the JOPS on Opteron than Sun, then WLS is faster. Maybe not twice as fast (hardware plays here -- unless Sun's Opterons are 1/2 the speed of WLS's Opterons. Yeah, I didn't think so either.), but faster. Yeah, it's more expensive, but I can assure you that if/when someone like JBoss publishes a SPEC result for their free server, the JOPS/CPU number is going to be the number that folks are looking at, because that's the number that helps explain the faster server. When both servers are free, then JOPS/CPU will be an important number.
Of course even then that hides things like perhaps SJAS is easier to manage than JBoss, so there's those hidden costs again.
Another example: if the SJAS is cheaper per JOP, but requires twice as many machines to do the job, there's a hidden cost in maintenance and running of those machines (something Sun is quite intimately familiar with and, in fact, is on the forefront of that battle of making efficient servers and improving utilization).
So, while in some contrived example SJAS may be in fact cheaper, WLS may well simply fit the data center ergonomics, and the price of WLS may be cheaper than expanding the data center. YMMV. Run your own numbers, do your own testing, etc.
A lot of this argument goes moot when you look at things like the new Niagara processors you guys are working on, which will give machines a lot of "torque" if not necessarily the absolute best performance for a single process. But if you can replace 32 1U machines or blades with a 4U Niagara running 32 simultaneous threads, that's a nice compact package.
Anyway, while $$/JOPS is a useful statistic, JOPS/CPU is also useful, as you get a little closer to Red Delicious and Golden Delicious. Not the same apples, but close.
"... but $$/JOPS doesn't take into account the unreported costs of the machines and such. Also, the JOPS/CPU price gives a more definitive "which is faster" result."
First, let me start with the caveat - $/JOPS is not a formal measure - there is no formal definition and it's not part of SPEC's language. However, Sun, BEA and now IBM seem to have used it consistently - it is basically the cost of the BOM (for the app tier or the whole system) divided by the JOPS reported in the submission. The BOM (Bill of Materials) is the entire list of stuff *required* by SPEC for the submission - it includes license cost, hardware, software and support over 3 years. That definition means that it is about as representative as you can expect for a benchmark - it is also pretty easy to calculate because the BOM is freely available for anyone to read. It doesn't include cost to develop and maintain - but those things are pretty hard to measure.
Let me address another post; I very much doubt whether you'll ever see a standard (fixed) deployment platform so that we can compare all products. You'll never see IBM submit results on SPARC, or Sun submit results on HP, etc. and I doubt if any third party will step up and take on that burden - for one thing many of the licenses don't allow it without the vendor's consent.
So, for now I think that $/JOPS is the only meaningful measure.
Perhaps I missed something, but it looks to me that Sun's published numbers prove that WebLogic performs better than Sun Java. My favorite metric is JOPS/core (and not JOPS/CPU - Power5, for example, has been dual core since the beginning).
WebLogic on 20 Opteron 2.4GHz (dual core) cores: 1781/20 = 89 JOPS/core
Sun Java on 26 Opteron 2.2GHz (single core) cores: 1201 JOPS
Let's add 10% to compensate for 2.2 vs 2.4GHz: 1321/26 = 51 JOPS/core
Note: in reality, an Opteron dual core does not perform as well as 2 single cores, so WebLogic is at a disadvantage.
So WebLogic wins by +75% (at least)
PS: I do not understand people insisting on $/whatever metric since this is so easy to manipulate.
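The JOPS/core arithmetic in this post can be reproduced with a few lines of Python. This is only a sketch of the poster's back-of-the-envelope method: the function name and the `clock_adjustment` parameter are invented for illustration, and the flat 10% clock-speed uplift is the poster's own rough assumption, not a measured scaling factor.

```python
def jops_per_core(jops, cores, clock_adjustment=1.0):
    """JOPS per application server core, with an optional clock-speed fudge factor."""
    return jops * clock_adjustment / cores


# Figures quoted in the post above.
weblogic = jops_per_core(1781, 20)                    # 2.4GHz dual core Opterons
sun = jops_per_core(1201, 26, clock_adjustment=1.10)  # 2.2GHz, +10% per the post

advantage = weblogic / sun - 1

print(f"WebLogic: {weblogic:.1f} JOPS/core")
print(f"Sun Java (adjusted): {sun:.1f} JOPS/core")
print(f"WebLogic advantage: {advantage:+.0%}")
```

Changing `clock_adjustment` shows how sensitive the comparison is to that one assumption, which is worth keeping in mind before reading too much into the percentage.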
"Eric, can you tell us a bit about the SPEC benchmarking process? About how much effort it took to port the SPEC code to run on WLS, and what kinds of tuning decisions you made to get it to run on WLS, and perhaps what parts of the benchmark get in the way of performance?
I ask because from the SPEC website, it seems that the application is a "generic J2EE" application and that folks who wish to run it as a benchmark are limited to mostly deploying it, tuning the container, and tweaking container specific XMLs (bean-weblogic-ejb-jar.xml, etc.). So, I'm curious how much the portability of the benchmark may punish it for performance.
I understand you may not be able to give full details perhaps limited by agreements made when choosing the SPEC code, but I think it would be very enlightening to understand a bit more about this benchmark and what it takes to not only get it running, but to get it to perform on a particular container, as on its surface it represents the "promise" of a container portable J2EE application."
The SPECjAppServer2004 benchmark *IS* a portable J2EE application that any J2EE 1.3 (or later) compliant application server should be able to deploy with absolutely no source code changes to the benchmark application or driver. In fact, the benchmark license agreement and run rules generally require that the benchmark be run unmodified (with some minor exceptions for application server specific deployment and database compatibility).
Any tuning is limited to the deployment and runtime configuration, which must be fully disclosed in the result's Full Disclosure Report (FDR). The application specific deployment descriptors and any other files necessary to reproduce the results are included in the Full Disclosure Archive (FDA) for the result. Both the FDR and FDA are publicly available for all published SPECjAppServer2004 results on http://www.spec.org/jAppServer2004/results/jAppServer2004.html .
This benchmark is designed to test the performance of a J2EE application server, not to maximize performance. As the design guide says, "SPECjAppServer2004 stresses the ability of J2EE application servers to handle the complexities of dynamic Web page generation, memory management, connection pooling, passivation/activation, object persistence, caching, message queuing, etc." Many of the "operations" included in the "jAppServer Operations Per Second" (JOPS) result are very heavyweight. The "browse" operation, for example, includes browsing forward and backwards a total of 13 pages of vehicle quotes. In other words, it is a "real application." Other applications may get more or less throughput based on their particular workload, but this benchmark is useful for comparing application servers against a standardized, realistic workload. More detailed information about the benchmark is available on http://www.spec.org/jAppServer2004/ .
That said, the total SPECjAppServer2004 JOPS number for any result is also dependent on the underlying hardware, so you need to look at this number in the context of the hardware required to achieve it. The design of this benchmark is such that the application server tier is infinitely scalable. Essentially, you can achieve any desired result by throwing enough hardware at it.
What value, then, are SPECjAppServer2004 benchmark results in comparing J2EE application servers? Ideally, you'd compare results on identical hardware configurations. It would be interesting to see what results IBM could achieve using the same hardware that BEA used for a previous record holder using 5 2-way Intel Xeon DP application servers. See this result and the closest match from IBM using twice the number of CPUs (5 4-way Intel Xeon MP application servers):
In the absence of a reference hardware platform, comparisons of results on different configurations need to be normalized to account for this. Price/performance is a commonly accepted way to do this. Unfortunately, SPEC chose not to include a price/performance metric in SPECjAppServer2004 (although a Bill of Materials that enables a reader to calculate this for themselves is included). Eric's post addresses comparisons from a cost perspective.
Another way to normalize results across different hardware platforms is to look at the number of application server CPU cores required to achieve a given result. For example, how many CPUs per 1000 JOPS were used. This method is far from perfect, but it can provide some useful insights. IBM's latest result achieved 2921.48 JOPS@Standard using 32 Power5 application server CPU cores or 10.95 CPUs per 1000 JOPS. The previous high result from IBM achieved 1343.47 JOPS@Standard using 20 Intel Xeon application server CPU cores or 14.89 CPUs per 1000 JOPS. For comparison, BEA's WebLogic Server achieved 1664.36 JOPS@Standard using 12 Intel Xeon application server CPU cores or 7.21 CPUs per 1000 JOPS.
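The cores-per-1000-JOPS normalization is simple enough to check in a short Python sketch. The JOPS and core counts below are the ones quoted in the paragraph above; the function name and the result labels are made up for readability.

```python
def cores_per_1000_jops(jops, app_server_cores):
    """Application server CPU cores needed per 1000 JOPS (lower is better)."""
    return app_server_cores / jops * 1000


# (label, JOPS@Standard, application server CPU cores) as quoted above.
results = [
    ("IBM WebSphere on Power5 (latest)", 2921.48, 32),
    ("IBM WebSphere on Xeon (previous high)", 1343.47, 20),
    ("BEA WebLogic on Xeon", 1664.36, 12),
]

# Print the results ordered from fewest to most cores per 1000 JOPS.
for label, jops, cores in sorted(results, key=lambda r: cores_per_1000_jops(r[1], r[2])):
    print(f"{label}: {cores_per_1000_jops(jops, cores):.2f} cores per 1000 JOPS")
```

As noted, this treats a Power5 core and a Xeon core as interchangeable units, so it is a rough lens on the results rather than a ranking.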
The total JOPS achieved in this latest result is impressive, but so is the hardware required to achieve it. The application server's performance needs to be judged in this context.
Clearly something has you in a twist! Is it that BEA worked so hard to achieve the numbers and then IBM crushes them? Tell the truth... how long did it take BEA to get to the Spec? How is BEA market share doing in the App server space? I know, and it might explain your comments.
Most people know or should know that this spec is a leap-frog game and that in the end the consumer wins. At IBM we are delighted. Our published results show that we are not standing still and that we have R&D money to spend on advancing our solutions to meet or exceed requirements. Does BEA?
Please feel free to write a response and do it quickly. This market is moving so fast, I am not sure who is going to be bought next (look what happened to SeeBeyond). Also, I often find the material from your company amusing.
"Our published results show that we are not standing still and that we have R&D money to spend on advancing our solutions to meet or exceed requirements. Does BEA?"
As a matter of fact we do. BEA increased the R&D investment from $90 million in fiscal 01 (calendar 00) to $120 million the next year – a 33% increase in R&D investment in the face of the industry wide recession. We have continued to increase our R&D investment every year since that time, with a 60% increase in R&D investment since FY 01. In the first half of this year, we have re-accelerated our R&D investment. For the first half of this year, R&D is up 17%. BEA headcount is also at an all time high.
This year we launched the AquaLogic product line, WebLogic Communications Platform and the WebLogic Real Time Edition, we acquired Plumtree, SolarMetric, M7 and ConnecTerra, and we have revved new versions of WLS, Tuxedo and JRockit.
Your question that implies that BEA is short on R&D resources is clearly misguided.
Congrats on the new benchmark numbers. We look forward to a spirited competition.
The most critical thing about these standardized benchmarks, and what makes them more relevant than all the other ones out there (see the MS benchmarks on TSS recently, or other one-off “I’ll prove mine is faster than yours in this one obscure test case just to get some marketing publicity” benchmarks), is the simple fact that they ARE standardized.
This standardization requires that all vendors play fair. Through a rigorous community process, it ensures they have agreed to execute the benchmark in a specific manner and/or helped build the benchmarks that are run to quantify the performance of the various setups. This process ensures one vendor does not skew the benchmark toward performing better on their own platform. Standardized benchmarks focus on the main features of J2EE in this case, so yes, they don’t apply directly to everyone’s environment, but they give you a good idea of how the system under test performs in the general case. So basically what I am rambling about is that standardized benchmarks are the next best thing to building a benchmark that mirrors your own environment in your IT infrastructure.
First off I want to preface my comments here by saying that standardized benchmarking, whether it is a benchmark from TPC, SPEC, or someone else, is a leap frog game for the vendors actively participating. Everyone is always shooting for the top spot, and it’s much easier to gun for number one, because you have a target to shoot for, than to continue to be number one. For example, IBM was the ONLY vendor to publish on this benchmark for over a year! My question is: was WebLogic v8.1 broken during that time? Or did it not run standard J2EE 1.3 applications? Or did you just not want to publish inferior results? I notice that Eric from BEA calls out Oracle and asks why Oracle has not yet published results on SPECjAppServer2004, so I figure if he expects an answer from them he should be able to answer my question, because I bet the answer could well be the same. Makes me wish I had a forum in May 2004 to ask the same question to all the other J2EE vendors out there.
The folks from BEA also raised some very interesting points and made a lot of leaps of faith in their comments on IBM’s published results. Once again, everyone knows statistics can be twisted in a billion different ways by marketing teams. Bottom line is our result beats the best result by any application server, not just BEA, by 64% on JOPS, the benchmark’s sole supported metric and the one agreed upon by the SPEC organization. Eric from BEA says they will beat this number (which I am sure they will eventually do, as this is a leap frog game as I stated above); my only question is, will it take a year like it took you to beat IBM the first time?
Eric from BEA also makes the comment:
“As you can see from IBM's own pricing information, the difference in the total price/transaction is in the database software cost and the database hardware. That’s an IBM DB2 vs. Oracle issue. You can’t sandbag us with those costs. As far as we are concerned the Oracle database is overpriced. We can get the same numbers with Microsoft SQL Server or MySQL.”
To which I have the following three comments:
1) I say you can and should contribute database cycles to the middleware tiers efficiency. Anyone who has developed applications for or administered an application server can tell you that the middleware affects the databases performance no matter which application server you run or DBMS there is always an impact. The middleware is responsible for generating all the SQL used to access the backend in this benchmark, as well as performing connection management and a plethora of other database resource consuming functions that impact the database overhead. So saying that you can look at just the application server tier of this benchmark in isolation is like saying you can evaluate efficient a car is without actually looking at how much gasoline it consumes.
However contrary to this if I am reading into Eric’s comments correctly he is placing ALL the blame on Oracle saying Oracle is a very poor performing database compared to IBM’s DB2 which I would find to be an interesting statement for a BEA representative to make considering the large amount of customers deployed on Oracle unless he really believes it to be true (is this what you are saying Eric? Are you saying that us needing an 8-core database machine versus BEA’s publishes 32-core database machine is all because of Oracle?). Or else could Eric be wrong and it could possibly be that there are issues with BEA WebLogic that are contributing to the configuration needing a massive database tier?
Eric, or someone else from BEA: it would be really helpful to administrators and developers all over the world to understand whether your statement about Oracle performance is really true, because I am guessing it would affect a lot of project sizing.
2) Second, I agree Oracle is overpriced, but the majority of the cost comes from the 32-core machine it takes to run it in your published configuration.
3) If you can match our number (or even your old number) with SQL server or MySQL I would love to see you do it. I think it would be great to see MySQL running at throughput rates like those being generated when this benchmark is running at these levels.
The last comment I would like to make on this topic, before I have to get back to real work here, is this: why does BEA publish results on this benchmark with configurations that they do not recommend or attempt to sell to real customers? For example, most of BEA’s published results use WebLogic Advantage Edition, which is equivalent to IBM’s WebSphere Base server. By running this low-end version of WLS they are able to keep cost down, but they have to set up an environment that is unrealistic for customers and one that I sure hope they would not recommend to customers. I cannot imagine too many customers use BEA’s method of DNS round robin to load balance all requests (HTTP and RMI) into the cluster instead of using a REAL load balancer that a customer would use in production for various QoS reasons. In this basic mode of operation they also have no management across the multiple machines and no failover configuration of any kind. IBM uses the WebSphere ND product to run this benchmark because it is what we sell to customers who would be running in clusters like the one this benchmark is testing. Why bait and switch the customer, BEA? Why not just be honest up front and run this benchmark in a manner you would want to have your customers running with? We do it, why can’t you?
Whew, that’s a lot of typing, but I feel it was worthwhile. I welcome comments, as those of you that know me already know, and I would really like to have someone from BEA answer my questions posed above, especially those about Oracle, because I think it would really benefit the community in general! Overall, this fun leap frog game benefits all of you folks out there running on our products more than anything else, because it makes every vendor’s products significantly better than they would be without a benchmark to compete with each other on. Competition is what drives innovation, and if you look at the amount of innovation that has gone into application servers improving the functions that the benchmarks use, I bet it benefits almost everyone on TSS today in some manner.
John Stecher – IBM
*My views do not directly reflect those of IBM and are my own opinion.*
This standardization requires that all vendors play fair and ensures they have agreed to execute the benchmark in a specific manner and/or helped build the benchmarks that are being run to quantify the performance of the various setups via a rigorous community process. This process ensures one vendor does not dominate the benchmark into being more apt to perform on their platform.
What I like about this particular benchmark is that, on top of being standard, it's really pretty darn rigid, limiting the amount of tuning you can do to essentially the deployment descriptors and choice of JDBC drivers.
First off I want to preface my comments here by saying that standardized benchmarking no matter if it is a benchmark from TPC, SPEC, or someone else is a leap frog game for the vendors actively participating.
Ah, for the old Informix/Oracle TPC wars of the early '90's.
Everyone is always shooting for the top spot and it’s much easier to gun for number one, because you have a target to shoot for, than to continue to be number one. For example IBM was the ONLY vendor to publish on this benchmark for over a year! My question is: was WebLogic v8.1 broken during that time? Or did it not run standard J2EE 1.3 applications? Or did you just not want to publish inferior results?
Well, you should know as well as anyone that these public standardized benchmarks are marketing in their purest form. Sure they're all shrouded in technical verbiage, numbers, charts, and such, but they're marketing through and through. They're the "soundbites" of the software industry.
I give kudos for Sun publishing a benchmark for a 4 server configuration (their lower end servers to boot) running against MySQL, certainly not shooting for any spot whatsoever on the charts, but at least it's a dart on the board.
I eagerly await IBM sponsoring a similar run of Geronimo on Linux against MySQL. And, of course, where's JBoss at all, for any level of scale. Just to put a dart on the board.
But the key to making these benchmarks have any value whatsoever is to publish something. If for no other reason than to wonder why the other vendors publish nothing at all.
I notice that Eric from BEA calls out Oracle and asks why Oracle has not yet published results on SPECjAppServer2004 so I figure if he expects an answer from them he should be able to answer my question because I bet the answer could well be the same. Makes me wish I had a forum in May 2004 to ask the same question to all the other J2EE vendors out there.
Yup, too late now.
3) If you can match our number (or even your old number) with SQL server or MySQL I would love to see you do it. I think it would be great to see MySQL running at throughput rates like those being generated when this benchmark is running at these levels.
A big Hell Ya on that.
Why not just be honest up front and run this benchmark in a manner you would want to have your customers running with?
I applaud this attitude. There are benchmarks, then there are benchmarks on machinery and in configurations customers actually run. Many of the early TPC benchmarks were performed "to the letter" of the benchmark, but in modes that were simply stupid in the field, so perhaps not in the "spirit" of the benchmark.
Overall this fun leap frog game more than anything benefits all of you folks out there running on our products more than anything else because it makes every vendor products significantly better than they would be without a benchmark to compete with each other on. Competition is what drives innovation and if you look at the amount of innovation that has gone into application servers improving function that the benchmarks use I bet it benefits almost everyone in TSS today in some manner.
What also goes unsaid here is that the particular power of this benchmark is that it is a portable J2EE application. Understanding that pretty much nobody writes a fully portable J2EE application today (they use either container or DB specific code somewhere, and with good reason), but still, the SPEC benchmark is a rather large, reasonably complicated system that IS portable. That CAN run on IBM, BEA, Oracle, etc. on AIX, Linux, Solaris, Windows, etc.
Most people rarely simply give up their DBMS once it is entrenched. They certainly transition over time, but it's a long cycle (5-10 years). Despite the ubiquitous nature of SQL, it's remarkably un-portable. But the appservers today, as the standards and implementations get better and better, while they offer unique value add (mostly in administration), are more and more interchangeable than databases ever were. No doubt IBM and BEA and others are migrating applications from competitors' app servers. I would like to think they rather enjoy getting a J2EE app redeployed on their server vs rewriting a boatload of stored procedures from DB X to DB Y.
In fact, this appears to be IBM's tack on Geronimo. Develop and do small scale deploys with it, but Super Size the application with WebSphere. Stick to the standards, and the process is less painful. Combined with something like the SPEC mark, that means that to compete, appservers need to run standard deployments well without relying as much on server specific changes (save to the server based descriptors, but those aren't portable by definition).
So, with the appservers competing on performance using standard deployments, the app servers are that much more interchangeable, which makes benchmark results potentially painful, or, even worse for vendors, testing against competitive environments less expensive.
If someone sees IBM running N% better than BEA on similar hardware, a user may well consider the possibility of swapping out their app server, and testing that, on their own hardware, may well become less and less involved.
And it's good to know that not only can someone swap out hardware to improve performance, but you may be able to swap out the middleware tier as well.
That makes the appserver market much more competitive, and benchmark marketing that much more important to the space. Since I "know" that I can move my entire software stack running Java from Sun servers to IBM servers (running BEA middleware against Oracle, say), the raw cost/performance numbers of an IBM vs Sun server is a basic commodity choice. IBM and Sun know that too.
Appservers, again, at the performance for standard deployment level, are commoditizing themselves as the standards get better as well, and we get closer to considering appserver benchmarks in the same way we consider server hardware benchmarks. It's not quite "What Appserver should we run?" "Whatever's on sale", but, who knows, someday.
Are there other factors involved? Of course there are, but the playing field is leveling out day by day. And we customers win that game.
Thanx for your views John.
I certainly appreciate your point of view and appreciate the questions.
Your first question was about the timing of Sj2004 numbers from BEA. I can assure you that WebLogic Server 8.1 was not "broken". There is a very logical explanation for our benchmark publication strategy. The introduction of SPECjAppServer2004 overlapped with SPECjAppServer2002. IBM dove right into 2004 while BEA chose to continue to publish 2002 numbers. Why? Three reasons.
First, BEA strongly objected to dropping the price/performance metric in SPECjAppServer2004. We preferred publishing results that included this metric. Second, SPECjAppServer2002 had a large base of submissions from BEA, IBM, Oracle and Sun, which makes all data more useful because there are more points of comparison across the different hardware, O/Ss, database, VMs, processors, etc. All of the SPECjAppServer2002 scores can be seen here;
Note that BEA has the lowest price performance in both categories we participated in and the high water mark in the DualNode category.
Third, once Sj2002 closed we shifted our focus to Sj2004, which coincided with the roll out of WLS 9.0.
The reason I call out Oracle is because they are not only still using Sj2002 data in their performance and TCO claims, they are actually misrepresenting the data. I show one of their slides and explain this step by step in my blog. No emotion, no FUD, no debatable analysis, just the raw data.
Your next point was about the effect of the middleware platform on the database. All I'm trying to do is stimulate some discussion about the different products in the stack that make up these numbers. Like I said before, you can't just look at the throughput number and make intelligent statements about them. I will back off to a certain degree on this point. While we have published Sj2002 configurations using MySQL and MS SQL Server, none have been this scaled, so we'll dig in and see what we find.
Your last point is about the products and configuration used in the BEA configuration. I can assure you that there is no conspiracy going on here. First, WLS Advantage Edition is not a low end offering. For stateless applications, like this benchmark, this is the version of WebLogic that customers should use. Second, since the majority of vendors on the SPEC committee voted to remove price/performance from the benchmark, your point about us using unrealistic configurations to reduce cost is moot since the “sole supported and SPEC organization agreed upon metric” does not include pricing anymore.
To respond to a few other comments about the pricing information, I want to be clear that it was the pricing information provided by IBM. I didn't pull it together or even confirm it.
Finally, the comments about R&D and market share are pure FUD.
I agree that customers win as we duke it out. Congrats again on this result.
Eric, your analysis is somewhat wrong. You cannot rely only on prices to compare hardware!
In the IBM benchmark, the app server hardware is 8 IBM machines with 2 processors each (1.9 GHz) and 4GB RAM on each.
In the BEA benchmark, the app server hardware is 5 HP machines with a 3.6 GHz processor (1 per machine) and 4GB RAM on each (average).
Here are the links to compare the hardware used in those benchmarks :
If BEA wants to prove that they can have better results, the best way is to do the same benchmark with the same hardware, OS and database (DB2).
I think it will be far more honest than comparing the prices of the servers to analyze those results!
List prices? Geez, you don't even buy a car at list. IBM can provide discounts as steep as 90%+ if they want. But the big picture is not even being discussed here. Integration with middleware apps, enterprise applications on the mainframe and on the distributed side is saving companies millions. Nobody does mainframes like IBM and certainly not BEA. It seems to me that people that might have considered BEA at one time are now using JBoss.
... BEA only used $100k in app server software licenses vs. IBM needing $480,000 in application server software licenses. That’s 4.8x the cost for only 64% more throughput. BEA only consumed $38,000 in hardware on the app server tier to IBM's $192,000. That’s more than 5x the cost for only 64% more throughput....
I like others question the validity of these benchmarks but don't insult our intelligence by trying to "spin" the numbers. Bottom line is that you spent more money than they did and you got creamed !!! Saying that IBM spent more on this and that just says to me that they were smarter about the way they allocated their resources. Saying that you could have achieved the same result using a cheaper database says to me that the people running your benchmarks are morons and should be fired for incompetence.
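The cost claims quoted above are simple ratios, so they are easy to check. Here is a rough sketch that takes the dollar figures and the 64% throughput gap exactly as quoted in this thread (none of them independently verified) and works out how much more was spent per unit of throughput. The constant and function names are made up for illustration.

```python
# Figures as quoted in the thread (US dollars); not independently verified.
BEA_LICENSES = 100_000
IBM_LICENSES = 480_000
BEA_APP_HARDWARE = 38_000
IBM_APP_HARDWARE = 192_000

# IBM's result is quoted as 64% higher than BEA's best result.
THROUGHPUT_RATIO = 1.64


def cost_per_throughput_ratio(ibm_cost, bea_cost, throughput_ratio=THROUGHPUT_RATIO):
    """How many times more IBM spent per unit of throughput than BEA."""
    return (ibm_cost / bea_cost) / throughput_ratio


# Raw cost ratios (the "4.8x" and "more than 5x" in the quote).
license_ratio = IBM_LICENSES / BEA_LICENSES
hardware_ratio = IBM_APP_HARDWARE / BEA_APP_HARDWARE

# The same ratios normalized by the throughput difference.
license_per_jops = cost_per_throughput_ratio(IBM_LICENSES, BEA_LICENSES)
hardware_per_jops = cost_per_throughput_ratio(IBM_APP_HARDWARE, BEA_APP_HARDWARE)

print(f"license cost ratio:        {license_ratio:.2f}x")
print(f"hardware cost ratio:       {hardware_ratio:.2f}x")
print(f"license cost per JOPS:     {license_per_jops:.2f}x")
print(f"hardware cost per JOPS:    {hardware_per_jops:.2f}x")
```

On these list-price figures the app server tier comes out at roughly 3x the cost per unit of throughput. That only covers the app server tier, and it says nothing about discounts or the database tier, which is where the rest of this thread disagrees.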
"IBM can provide discounts as steep as 90%+ if they want",
and then: http://nobodygotfired.blogspot.com/2005/09/how-ibm-conned-our-it-execs-out-of.html
I have had a similar experience to the one described in the blog. And the management team's response (in a private conversation) to the failure was:
"If IBM could not complete the project, then none of us can be blamed"
I suggest theserverside.com conduct a benchmark with bea and ibm reps around. easy isn't it?
(i assume theserverside.com hasn't been acquired by ibm yet).
hey, btw, let's have Hani around too, you know. just in case.
Ever see Dawn of the Dead? Zombies walking around... BEA's share of the market will continue to erode as IBM pulls ahead.
I think IBM software performance is IBM's problem, and there are better ways to sell it.
Bah, everyone is always so negative toward IBM. If their big expensive app server performs best then more power to them.
Find below my personal 2 cents assessment comparing WAS and WLS.
This has not been done in the thread, as far as I am able to browse.
I am not with IBM or BEA, just using both.
Reality is vendors are doing everything possible to confuse poor users, always choosing a different HW (even when dealing with Intel chips) to avoid fair J2EE servers comparisons.
On TPC-C, P5 beats Opteron or Xeon by a factor of 2, meaning one P5 core performs twice as much tpmC as a Xeon core.
So there is obviously a chip architecture advantage.
I am talking here about the best chip available at one point in time; chip frequency is irrelevant.
Let's have a look at SpecJBB to eliminate the Java factor:
- P5 at 16 400 BOPS/core
- Opteron at 11 100 BOPS/core
- Xeon at 12 100 BOPS/core
This leaves a 1.5x advantage to P5, less than the 2x on TPC-C.
This can be attributed to (1) the chip architecture or (2) the JVM. I would favor (1). You cannot be optimized on every workload. P5 is designed to win on TPC-style workloads.
Looking now at SpecJAppserver, we have:
- P5 at 91 JOPS/core with WAS
- Opteron at 89 JOPS/core
Equality instead of the P5 1.5x advantage on SpecJBB.
So all things considered, WebLogic is 1.5 times better than WebSphere on equivalent HW, based on these figures.
Less than the 2x factor that BEA marketing claims, but still significant.
Note: as noted, "$ per whatever to performance metric" is usually misleading since it is SO easy to distort this (been there, done it!)
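The per-core comparison above boils down to two ratios. A small sketch of that arithmetic follows, using only the per-core figures quoted in this post. It leans on the post's own assumption that the SPECjbb per-core ratio is a fair stand-in for the hardware/JVM advantage, and the variable names are made up for illustration.

```python
# Per-core figures as quoted in the post above.
SPECJBB_BOPS_PER_CORE = {"P5": 16_400, "Opteron": 11_100, "Xeon": 12_100}
SPECJAPPSERVER_JOPS_PER_CORE = {"P5": 91, "Opteron": 89}  # P5 runs WAS, Opteron runs WLS

# Hardware/JVM advantage of P5 over Opteron, estimated from SPECjbb.
hw_advantage = SPECJBB_BOPS_PER_CORE["P5"] / SPECJBB_BOPS_PER_CORE["Opteron"]

# What actually shows up per core on SPECjAppServer.
observed_advantage = (
    SPECJAPPSERVER_JOPS_PER_CORE["P5"] / SPECJAPPSERVER_JOPS_PER_CORE["Opteron"]
)

# If the hardware should be hw_advantage faster but only observed_advantage
# shows up, the remainder is attributed (by assumption) to the app server.
implied_wls_over_was = hw_advantage / observed_advantage

print(f"P5 vs Opteron on SPECjbb:        {hw_advantage:.2f}x")
print(f"P5 vs Opteron on SPECjAppServer: {observed_advantage:.2f}x")
print(f"implied WLS vs WAS efficiency:   {implied_wls_over_was:.2f}x")
```

Worked through exactly, the quoted figures give a hardware ratio of about 1.48 and an implied software ratio of about 1.45, so the "1.5 times" above is a rounded-up figure. The whole estimate stands or falls on whether SPECjbb scaling carries over to a full J2EE workload with a database behind it.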
Not long ago IBM claimed that DB2 is faster than Oracle, and the IT industry laughed at it in silence.
Now it is WebSphere's turn to claim it is the fastest.
Do you know why Microsoft picked WebSphere instead of WebLogic to compare with their .NET server? Because they know WebSphere is the slowest app server in the J2EE market.
Just ask any developer who has experience with both application servers.
IBM is good at catching the decision makers of the big companies; BEA is good at attracting developers. The final result is that the decision makers help IBM put more WebSphere in the market so that more developers can suffer.
But one thing I agree with IBM is that the R&D budget is important for the product.