Discussions

News: New Microsoft and WebSphere Application Server Benchmarks

  1. Our companion site TheServerSide.net has posted a new set of benchmark results from Microsoft. The benchmark is based on a modified version of an IBM WebSphere sample application, PlantsByWebSphere. The results published in the first benchmark show that .NET enjoys a 183% performance advantage over WebSphere 6.0.
    The second benchmark looks at the effect of 64-bit memory addressability on middle-tier application servers by comparing the performance of a simple web application that pulls image data from a database and caches it on the server. The benchmark shows that at high transaction throughputs, 64-bit Windows Server 2003 with .NET 2.0 can handle about 3% more transactions per second than 64-bit WebSphere.
    In a demonstration of fairness, Microsoft has published all of the source and data needed to run this benchmark. The downloadable MSI includes instructions to configure Oracle as well as SQL Server. However, the claim of full disclosure is weakened by the failure to provide critical information about the technical configurations used during testing.
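    The caching scenario in the second benchmark (pull image bytes from the database once, then serve later requests from middle-tier memory) can be sketched in a few lines of Java. Everything here is illustrative, not code from the benchmark kit; the loader function stands in for the real database query:

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Minimal middle-tier image cache: fetch from the database on the first
    // request for a key, serve every later request from memory.
    public class ImageCache {
        private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
        private final Function<String, byte[]> loader; // stands in for a real DB query

        public ImageCache(Function<String, byte[]> loader) {
            this.loader = loader;
        }

        public byte[] get(String imageId) {
            // computeIfAbsent invokes the loader only on a cache miss
            return cache.computeIfAbsent(imageId, loader);
        }

        public int size() {
            return cache.size();
        }

        public static void main(String[] args) {
            // Hypothetical loader: a real app would run a SELECT against the image table.
            ImageCache cache = new ImageCache(id -> ("image-bytes-for-" + id).getBytes());
            byte[] first = cache.get("rose.jpg");
            byte[] second = cache.get("rose.jpg"); // served from memory, same array
            System.out.println(first == second);   // true: loaded once
        }
    }
    ```

    The 64-bit angle is exactly about this class of cache: the bigger the addressable heap, the more images can stay resident and the fewer requests hit the database.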

    Do you think that this is a fair comparison? What should the reaction be from Java developers, if any?

    Threaded Messages (150)

  2. "Do you think that this is a fair comparison? What should the reaction be from Java developers, if any? "

    I think you must stop to publish this crap.
  3. Do you give up?

    It could be easily seen as an admission of failure...

    It would be a lot better to have some benchmarks with the full settings of both servers available and then we could discuss which server is the fastest, for whatever it means...
  4. If you compare "like" implementations, e.g. .NET 2.0 and JDBC, you find this:

    1) Oracle/WebSphere is faster than Oracle/.NET on Windows
    2) Linux is faster than Windows when using WebSphere/Oracle
    3) WebSphere/Oracle is faster than WebSphere/SQL Server
    4) Linux/SQL Server/WebSphere is faster than Windows/SQL Server/WebSphere
    5) JDBC is faster than EJB access.
    6) EJBs seem to be slow.

    And finally, Microsoft has written a terrific SQL Server driver for .NET 2.0! I only hope that MS will release a terrific driver for JDBC!


    What's wrong with this?
    (I know the above statements are not "truly" correct, but from the data one could draw these conclusions.)

    I applaud MS for expanding this benchmark to include the JDBC version. While I differ from their written conclusion, I think this benchmark is better than their other ones. In typical MS style, their headlines/conclusions compare their implementation to the EJB implementation.
  5. Great compilation of the benchmark combinations. But are we missing something? Performance is one thing which drives the decisions of management; equally important issues are the TCO of the application and its development. I don't like the MS approach to development. I saw the smart client information page. They say on their page many times "DLL Hell"... well, who created that stupid thing? Sun Microsystems, or some aliens? Huh!! The company creates some technology and declares it crap a few days later. They exaggerate some things beyond what is necessary. The Java evolution route is pretty stable; it never gave such spikes (or even tried to). Hype doesn't drive business, and MS should understand that from the partial success of C#. The atmosphere around the .NET launch was as if they would wipe Java off the earth. But what happened? Java is still in business and not much the worse for it. But I never deny the potential of MS either. They have given us excellent, user-appealing things, and I look forward to seeing more. But an honest expectation of Microsoft is that they stop exaggerating simple things and be more down to earth.

    Regarding the original benchmarks, I believe that WebSphere is at full throttle when it's in a clustered environment. I would also love to join Cameron Purdy in rewriting the application.

    ~~Sachin~~
  6. They say on their page many times "DLL Hell"... well, who created that stupid thing?

    They admit their mistakes and provide a solution.
    Sun NEVER admits ANY mistake, and drags those mistakes from version to version for "compatibility".
    EJB, for example. Sun will never say "EJB Hell", even when most Java developers cry about it.
    And Java vendors HAVE to implement EJB 2 in order to be "compliant" with Sun's certification.

    You prefer this approach.
    I clearly prefer Microsoft's.
  7. Nothing To See Here

    So Microsoft's cache API outperformed some cache API provided by IBM? Ummm... why is this pissing contest news? I wouldn't brag too much about these results if I were MS. It shows that switching from Windows to Linux will give any memory-intensive application a 10-15% performance boost.

    Also, .NET apps should outperform Java apps. The .NET ones only have to work on one OS, and it happens to be an OS of which the .NET framework developers have unique and intimate knowledge. For example, in .NET you have direct access to the OS's thread pool. In Java you can create your own thread pool, but that's about it. If a .NET app did not outperform a Java app on the same hardware, then there would have to be a significant flaw in .NET.
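    For what it's worth, since J2SE 5.0 the java.util.concurrent Executor framework does give Java a managed thread pool, even if it is a JVM-level pool rather than the OS pool .NET exposes. A minimal sketch (the pool size and the squaring task are illustrative):

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    // A Java-side thread pool: tasks are queued on a fixed pool of JVM threads,
    // rather than handed to an OS-level pool as with .NET's ThreadPool.
    public class PooledWork {
        public static int sumSquares(int n, int threads) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<Future<Integer>> results = new ArrayList<>();
                for (int i = 1; i <= n; i++) {
                    final int k = i;
                    results.add(pool.submit(() -> k * k)); // queue a task on the pool
                }
                int total = 0;
                for (Future<Integer> f : results) {
                    total += f.get(); // blocks until that task finishes
                }
                return total;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            } finally {
                pool.shutdown();
            }
        }

        public static void main(String[] args) {
            System.out.println(sumSquares(4, 2)); // 1 + 4 + 9 + 16 = 30
        }
    }
    ```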
  8. Am I missing something here

    This benchmark says WebSphere is FASTER than .NET.
    In short it says:

    LINUX is FASTER than WINDOWS
    Websphere is FASTER than .NET

    And that MS must have an OPTIMIZED database driver for SQL Server!!!! That's the only thing this benchmark says.

    What are people complaining about? What am I missing?
  9. Robert,

    Yes, you may be missing something.

    1) Look at the price/perf bars. This says a lot about the relative expense of .NET/Windows Server vs. WebSphere (least expensive edition) on a supported copy of Linux.

    2) Look carefully at the perf differences. Lots of folks in the past have complained that SQL Server has not been a good option as a DB when using J2EE on the middle tier because of the lack of a good JDBC provider. We are working on fixing that with this new driver, and the perf looks basically in line (minor differences in this test) with JDBC perf for WebSphere talking to Oracle. That's good news for folks that want J2EE on the middle tier but like SQL Server as a backend.

    3) In terms of our Oracle perf, it totally depends on whether you are comparing the EJB implementation or the straight JDBC implementation (which we created, not IBM, and this JDBC-only version crushes the EJB implementation in terms of perf for *this* web-based scenario). .NET against Oracle is much, much faster than WebSphere EJBs against Oracle (again, for this scenario, but it's a very common scenario).

    If you go to a straight JDBC scenario with no EJBs (why buy WebSphere in the first place then might be a viable question), then our perf against Oracle is close to the JDBC/WebSphere perf against Oracle (but our price/perf continues to be much better). So you are not sacrificing any real perf, with the potential gain of productivity, lower costs and the like (more subjective, but we find lots of customers switch to .NET and VS because of productivity, once they determine .NET and Windows Server can scale really well and are reliable).

    If you go back 2 years to when we did our first .NET provider for Oracle, our Oracle perf was not this good... it's gotten much better. If you go back 4 years to when we went through OLEDB to get to Oracle from VB6, C++ and the like (before .NET), our Oracle perf was really not very good. We have come a long, long way. By the way, Oracle makes their own .NET provider for Oracle 10g; it may be even better/faster than ours that ships with .NET, we did not test it.

    4) As for the best perf in this scenario, it's .NET against SQL Server 2005. Why? Is this a conspiracy? Well, we can optimize for SQL Server because we own it, we have full access to the API, and we can work with the SQL team on .NET integration. With Oracle, we do not own it, and the level of integration cannot be as tight. For example, we go through OCI. I believe that even so, there are some scenarios now involving large datasets where we are actually faster with ADO.NET/Oracle than JDBC/Oracle. We have done very well here with progress over time, and performance should not be a blocking issue when deciding on .NET 2.0 if Oracle is going to be the DB.

    5) WebSphere does test out here as faster (but not by a lot) on Linux than Windows. This should tell you we did a pretty good job tuning Linux, and we did not falsify results :-). It's not a lot faster, and even given this, .NET wins on absolute best perf (with SQL Server as the backend), and still wins every price/perf comparison since WebSphere is so expensive, even when running on Linux.

    My personal belief as to why it tests out faster is that Apache (IBM HTTP Server) is simply slightly more optimized for Linux than Windows right now. For example, the IBM HTTP Server on Windows is multi-threaded, but not multi-process. It's a completely different codebase on Linux (a multi-threaded and multi-process version), and since most Apache installations (the vast majority) are on Linux right now, not Windows, it makes sense that it's more optimized for Linux right now. In this test it's not a big difference, but the difference is there. It would be nice to see a multi-process version of Apache out there for Windows---maybe it exists but is not currently shipped as part of the WebSphere stack.

    -Greg
  10. 1) Look at the price/perf bars. This says a lot about the relative expense of .NET/Windows Server vs. WebSphere (least expensive edition) on a supported copy of Linux.

    The benchmark is talking about performance and makes the claim about .NET being 183% faster, NOT 183% faster per $.

    As for cost, you and I both know how rubbery these figures can be. Decide on the hardware you want and negotiate is how I buy things. But how about this cheap shot: how much is .NET on Linux going to cost me?
    2) Look carefully at the perf differences. Lots of folks in the past have complained that SQL Server has not been a good option as a DB when using J2EE on middle tier becuase of lack of a good JDBC provider. We are working on fixing that with this new driver, and the perf looks basically in line (minor differences in this test) with JDBC perf for WebSphere talking to Oracle. That's good news for folks that want J2EE on middle tier but like SQL Server as a backend.

    And this is something you should be emphasising in your benchmark.
    3) In terms of our Oracle perf, it totally depends on whether you are comparing EJB implementation or the straight JDBC implmentation (which we created, not IBM, and this JDBC only version crushes the EJB implementation in terms of perf for *this* web-based scenario). .NET against Oracle is much much faster than WebSphere EJBs against Oracle (again, for this scenario, but its a very common scenario).

    Everything I have seen about .NET implementations is that they are architecturally closer to JDBC-type apps than to EJB. Apples and oranges, AFAIK.

    Hmm, common scenario. Maybe 2 years ago. Nowadays it's all POJOs, Spring and Hibernate, don't you know.
    4) As for the best perf in this scenario, its .NET against SQL Server 2005. Why? Is this a conspiracy? Well, we can optimize for SQL Server becuase we own it, we have full access to the API, and we can work with the SQL team on .NET integration.

    Exactly, so where is the IBM WebSphere with IBM DB2 on IBM AIX scenario in this test???????

    Look, congrats on finally writing a JDBC driver for your database, and once it is out of beta it would be good to retest the SQL Server 2005 scenario and see if the team managed to find out why they are behind.

    Cheers
    David
  11. Performance and Price Performance

    David, in response:
    The benchmark is talking about performance and makes the claim about .NET being 183% faster NOT 183% faster per $
    As for cost, you and I both know how rubbery these figures can be. Decide on the hardware you want and negotiate is how I buy things.

    Actually, the benchmark is talking about both; if you read the paper, complete price/perf metrics are presented. The pricing is based on actual prices if you buy WebSphere Express online from IBM---all their published prices (the MS costs come from an actual quote from a reseller). If they cut certain customers better deals, good for those customers. Frankly, most enterprise customers are likely always encouraged to get the more expensive Network Deployment Edition ($15K per CPU, which would have cost $60K for our 4-CPU server as tested). But many large WebSphere customers I have talked to (and I have talked to a few) have no idea what they are paying for WebSphere; it's buried in a large multi-million dollar contract inclusive of a large number of IBM GS consultants.
    And this [jdbc driver for SQL Server] is something you should be emphasising in your benchmark.


    It's called out in the intro and conclusion.
    Everything I have seen about .NET implementations is that it is architecturely closer to JDBC type apps than it is to EJB. Apples and Oranges AFAIK.
    Hmm, common scenario. maybe 2 years ago. Nowdays it's all POJO's, Spring and Hibernate don't you know.

    Yes, it is more architecturally similar to JDBC than EJB.... We stayed away from the stateful EJB model because we didn't feel it would ever scale as well, and it introduced too many complexities as well as reliability issues. The point here is that a large part of what you pay for when you get WebSphere is an EJB container and lots of code that manages EJBs/CMP etc. Yet for a common scenario (and yes, data-driven web apps are still a pretty common scenario in the year 2005!) the EJB approach seems slower, and certainly no simpler/easier than JDBC or ADO.NET. Doing a benchmark of Spring and/or Hibernate would be *very* interesting! But as for customers no longer using EJBs... that's just wrong--IBM, BEA etc. all still heavily encourage use of EJBs, most large enterprises still use them (some have backed them back out). And EJB 3.0 pretty much is a do-over of the technology.
    Exactly so where is the IBM WebSphere with IBM DB2 on IBM AIX scenario in this test???????
    Look, congrats on finally writing a JDBC driver for your database and once it is out of beta it would be good to retest the SQL Server 2005 scenario and see if the team managed to find out why they are behind.
    Cheers
    David

    Anyone can download the code and test on AIX and DB/2 for themselves. That would be a great thing for AIX customers to do: see what perf they actually get, and whether they can significantly reduce their costs by moving to less expensive Intel- or AMD-based machines running .NET/Windows (or even WebSphere/Linux, albeit more expensive than .NET/Windows)... and get just as good or better performance.

    -Greg

    PS: As for the JDBC provider, the beta release performance is pretty much as good (~90% of the perf of Oracle JDBC) for this Plants/JDBC test with WebSphere. So I do not know what you mean by *far* behind...
  12. Performance and Price Performance

    Yes, it is more architecturally similar to JDBC than EJB....we stayed away from the stateful EJB model becuase we didn't feel it would ever scale as well and introduced too many complexities as well as reliability issues. The point here is that a large part of what you pay for when you get WebSphere is an EJB container and lots of code that manages EJBs/CMP etc. Yet for a common scenario (and yes, data driven web apps are still a pretty common scenario in the year 2005!) the EJB approach seems slower, and certainly no simpler/easier than JDBC or ADO.NET. Doing a benchmark of Spring and/or Hibernate would be *very* interesting! But as for customers no longer using EJBs.....that's just wrong--IBM, BEA etc all still heavily encourage use of EJBs, most large enterprises still use them (some have backed them back out). And EJB 3.0 pretty much is a do-over of the technology.

    I think a little perspective is important here. Since I have a bit of experience in the financial sector, there are a couple of real reasons for using EJB.

    1. there are multiple data models
    2. there are multiple databases, ranging from Oracle to old B-tree databases
    3. there are multiple OS platforms, from mainframes to PCs
    4. a simple update from the user perspective actually involves a dozen or more distributed transactions, which must follow strict business rules
    5. fault tolerance is absolutely a requirement
    6. losing a single transaction is not acceptable

    In cases like these, I wouldn't care if one server is 10-80% faster but can't guarantee 100% failover and replication of sessions. The reasoning is this: if there is a chance of losing a transaction, what happens when a large transaction is lost?

    In many cases, there are serious legal issues. It's not as simple as, "oh well, stuff happens." It's not like a simple internet retailer that primarily executes credit card transactions.

    My question is this. Does Microsoft address these issues? What is Microsoft's answer to situations that cannot simply do away with complexity, because the complexity is required by law? Since Microsoft prefers a stateless application layer, what happens when a business process is long-running and requires state transition management? By state transition I mean this.

    Say I buy something: the funds aren't transferred immediately and have to go through a waiting period by law. This means the backend system has to analyze transactions and change their state as events occur. In some cases, the system has to transfer money from an overdraft account. In other cases, it should delay the transfer until a deposit clears.

    Does Microsoft have an answer to these complex cases? Is that solution the new BizTalk Server and BPEL? Clearly, the application server is a tiny piece of the puzzle for large financial institutions. The bigger question is meeting all the business, functional and legal requirements. A platform that is built for this type of application can make development considerably easier than one built for simple data-driven web applications.

    peter
  13. Performance and Price Performance

    Actually, the benchmark is talking about both

    The name of the document is ".NET 2.0 vs IBM WebSphere 6.0 Data-Driven Web Application Server Performance Comparison". Perhaps you should amend your document to be more specific.

    Not that it matters; nothing like having a monopoly to help undercut your competitors' pricing.

    It is also a lot of fun to look at price inflection points. "We supply you with a piece of software that runs on 1, 2 or 4 CPUs. If you run it on 1 CPU it costs $x, on 2 CPUs it costs $x*10, on 4 CPUs $x*1000000000"....... Oh yes, it is the same software, why do you ask?
    Yes, it is more architecturally similar to JDBC than EJB

    So apples and oranges. It is always best to benchmark like for like.
    Anyone can download the code and test on AIX and DB/2 for themselves

    It's your benchmark; you're the company trying to tell us how much better you are.
    PS: As for JDBC provider, the beta release performance is pretty much as good (~90%) perf of Oracle JDBC for this Plants/JDBC test with WebSphere. So do not know what you mean by *far* behind...

    .NET went from 1804 TPS (3rd place) to 2915 TPS by using SQL Server 2005. WebSphere on Linux went from 2105 down to 2053. Perhaps the .NET team should talk to the JDBC driver team and tell them what they did. The bottleneck looks to be the JDBC driver.
    And EJB 3.0 pretty much is a do-over of the technology

    Ooh! Them's fighting words. You're going to get those JBoss people mad at you :-)

    Cheers
    David
  14. Performance and Price Performance

    In response, respectfully:
    So Apples and Oranges. It is always best to benchmark like for like.

    Actually, then we should only benchmark Java against Java, and maybe even only WebSphere against WebSphere. Forget about comparing Linux to Windows, or Intel to RS/6000. Come on, benchmarks are all about comparing *different* things. That's the point. A customer can choose to use EJBs for a data-driven web app or not; performance testing vs. JDBC and/or .NET can help them make a decision, even if perf is not the *only* factor. The point is, the app in all cases is functionally precisely the same, and benchmarking can help you make decisions about architecture choices and/or platform choices. It's completely fair to compare this, although I agree (see Peter's comments) that you need to constrain what conclusions you draw, since they are limited to the functional scenario tested.

    We didn't, by the way, have to create an optimized JDBC version--we could have just tested the EJB version alone, since this is what IBM ships (to your point on no one using EJBs anymore---better tell IBM)... but it would not have painted as complete a picture. And based on PetStore, the use of EJBs was always contentious and people wondered what JDBC-only would look like. In this scenario, and we are not claiming all scenarios, clearly JDBC is simply a lot faster than EJBs.
    It's your benchmark, your the company trying to tell us how much better you are.

    That's the American way... build a better product, promote it, let the customer decide, and may the best product be chosen for the job. No different from IBM, Oracle, Sun etc., who have all done benchmarks of .NET. Customers get to decide what's BS and what is not. Hence the downloadable kit and full disclosure. Any customer can, with about a day's effort, run this benchmark against whatever platform they are running WebSphere on, and compare to .NET on whatever Intel-based (or AMD-based) box they think makes for a good comparison. I bet the results would surprise many, once they compare the cost of the AIX box to the x86 (or x64) Intel/AMD box.
    .NET went from 1804 TPS (3rd place) to 2915 TPS by using SQL Server 2005. WebSphere on Linux went from 2105 down to 2053. Perhaps the .NET team should talk to the JDBC driver team and tell them what they did. The bottleneck looks to be the JDBC driver.

    As I think I mentioned earlier, with SQL Server we can optimize the best because SQL Server is an MS product. We can make our Oracle perf as good as Oracle's client (OCI) allows. We will continue to make it even better over time, but as of today it's pretty close to Oracle/JDBC perf for this scenario at a fraction of the cost. And customers can use Oracle's .NET Data Provider and .NET tools if they want--it may be faster than ours. As for the .NET team talking to the JDBC team because the JDBC driver is the bottleneck---of course it's slower, it's based on Java! Ok, just kidding, and a cheap shot.... In reality, there are a couple of things in play in this benchmark:

    1) data access speed--a large part of the test since every page involves a query.

    2) backend Servlet/JSP/ASPX processing speed---almost every page involves the JSP/ASPX engine generating the displayed content.

    3) The backend web/network stack of .NET vs. WebSphere. We have a great integrated stack in IIS/ASPX... it's fully process-isolated yet very, very fast. IBM uses a plug-in model when deploying to their recommended IBM HTTP Server to get process isolation equivalent to what ASP.NET/IIS provide. I think the web/network stack of WebSphere is much slower than ours, but the test does not isolate this, so it's just an observation after doing many web-based benchmarks of WebSphere.

    Put 1, 2 and 3 together, and what do you have? An actual end-to-end working application! Which is pretty interesting to benchmark, because it's what customers actually deploy.

    -Greg
  15. Performance and Price Performance

    Actually, then we should only benchmark Java against Java and maybe even only WebSphere against WebSphere. Forget about comparing Linux to Windows, or Intel to RS6000. Come on, benchmarks are all about comparing *different* things.

    I use benchmarks to find things out, and if I want to look at performance I want as many things to be equal as possible, because I know that the more things are different, the less useful the result is.

    I also know that comparing applications that are architected differently is extremely difficult.
    although agree (see Peter's comments) that you need to constrain what conclusions you draw since they are limited to the functional scenario tested

    Then your document should be more careful about its conclusions. 183% indeed!
    That's the American way....build a better product, promote it, let the customer decide and the best product be chosen for the job.

    The only thing Americans have to teach is chutzpah. But considering how Microsoft treats its competitors, perhaps you are being ironic.
    1) data access speed--a large part of the test since every page involves a query.

    Considering the large performance boost .NET got by going from an indirect connection to a direct connection I would say the app was limited by the database connection. The WebSphere app may also be limited but since we have no way to test a direct connection I guess we will never know.

    But don't worry, Marketing will have no problems telling people how a data-bound, directly connected application beat a differently architected, indirectly connected application. Except they might just take a few words out to make it simpler for all those customers.
    becuase its what customers actual deploy

    My customers want to deploy on linux, what have you got for them?

    Cheers
    David
  16. Performance and Price Performance

    Never trust statistics you didn't fake yourself.

    René
  17. Performance and Price Performance

    I use benchmarks to find things out and if I want to look at performance I want as many things to be equal as possible. Because I know that the more things are different the less usefull the result is. I also know that comparing applications that are architectured differently is extremely difficult.

    Then great, you ought to like the Plants benchmark comparing JDBC to EJB access on WebSphere... everything is *exactly* the same, except one uses EJBs and one uses JDBC. Couldn't be any cleaner. And you ought to like the WebSphere/Windows vs. WebSphere/Linux comparison... everything is exactly the same, except one is on Windows and one is on Linux... etc., etc. down the line. Agreed, benchmarking different architectures can be difficult; you need to make sure you are *precisely* matching functional behavior (inclusive of caching/data staleness, tx behavior, and the like).

    And you need to understand that a benchmark does not have to *isolate* the difference in perf at a technical level to be useful. If app X built to vendor X's best practice is 1/3 the perf of app Y built to vendor Y's best practice, that tells you a lot. If vendor X is twice as expensive, this tells you even more (for the scenario tested). Now if you were using some funky DB driver with app server X, that could be a big issue. In our case, we are using all the mainstream, recommended stuff, and all that info is fully disclosed.

    Then your document should be more carefull about it's conclusions. 183% indeed!

    This is what the paper says, exactly:

    "The results show that the .NET 2.0/Windows Server 2003 implementation of the benchmark application running against SQL Server 2005 outperforms the Java EJB-based WebSphere 6.0.2.3/RedHat Linux implementation by up to 183%"

    This is 100% accurate, and we stand by it. It constrains the conclusions to "the benchmark application." The paper goes on to show how JDBC and using no EJBs improve perf "for the benchmark application", and draws those conclusions as well. Again, accurate. Comparing .NET perf to the EJB/CMP implementation is important, since that is how IBM coded the app and it is their recommended best practice.

    Considering the large performance boost .NET got by going from an indirect connection to a direct connection I would say the app was limited by the database connection. The WebSphere app may also be limited but since we have no way to test a direct connection I guess we will never know.
    But don't worry Marketing will have no problems telling people about how a data bound direct connected appplication beat a different architectured indirect connected application. Except they might just take a few words out to make it simpler for all those customers.

    You misunderstand the benchmark. What do you mean by a "direct connection"? All implementations use the same connection model---a pool of database connections maintained by the app server. It is no different for any implementation: they grab a connection from the pool, do their thing with the DB, and release the connection back to the pool. Nothing fancy or tricky here, and the same across all implementations, including .NET against SQL Server. Both .NET/ADO.NET and WebSphere/JDBC have similar connection pooling capabilities. In no case was a lack of database connections a bottleneck, nor was the database itself a bottleneck, as the paper clearly points out. If that were the case, it would quickly be evidenced by an inability to push the app server to full CPU utilization---the first sign there may be an external bottleneck in a middle-tier load test. As the paper shows, in all cases the test scripts were able to push the app server to full CPU utilization (by increasing the number of concurrent clients making requests, which increases TPS until the app server is saturated... this is all just standard testing procedure).
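    The grab/use/release cycle Greg describes can be sketched without any real database. Everything in this toy pool is illustrative; "Conn" is a plain placeholder object, not java.sql.Connection, and real app-server pools add timeouts, validation and sizing policies on top of this basic shape:

    ```java
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Toy connection pool illustrating the borrow/use/return cycle that every
    // implementation in the benchmark shares.
    public class ToyPool {
        public static class Conn {
            final int id;
            Conn(int id) { this.id = id; }
        }

        private final BlockingQueue<Conn> idle;

        public ToyPool(int size) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(new Conn(i)); // pre-create connections, as app-server pools do
            }
        }

        public Conn borrow() {
            try {
                return idle.take(); // blocks if the pool is exhausted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted waiting for a connection", e);
            }
        }

        public void release(Conn c) {
            idle.add(c); // hand the connection back for the next request
        }

        public int available() {
            return idle.size();
        }

        public static void main(String[] args) {
            ToyPool pool = new ToyPool(2);
            Conn c = pool.borrow();   // "do their thing with the DB" happens here
            pool.release(c);          // then release back to the pool
            System.out.println(pool.available()); // 2
        }
    }
    ```

    The point of the blocking take() is the bottleneck symptom Greg mentions: if the pool were exhausted, request threads would stall here and the app server could never reach full CPU utilization.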

    But again, anyone can download the code and easily perform their own tests, change the code, tune differently, whatever.
    My customers want to deploy on linux, what have you got for them?
    Cheers
    David

    The following link:

    http://www.microsoft.com/windowsserversystem/facts/default.mspx

    Sorry, kind of a cheap shot again :-).

    Good night!

    Greg
  18. Performance and Price Performance

    Then great, you ought to like the Plants benchmark comparing JDBC to EJB access on WebSphere...everything is *exactly* the same except one uses EJBs, one uses JDBC. Couldn't be any cleaner.

    Greg,

    I am shocked that Microsoft staff have such a low level of knowledge about data access in modern enterprise applications. It seems that highly paid Microsoft specialists do not read even one good design book a year. And as if that were not enough, Microsoft must publish this crap, benchmark this crap and even argue about this crap on TSS.

    Please spend some time and money, buy the Rod Johnson book "J2EE Design and Development" and try to understand it. It was published in 2003, 2 years ago, but this news still has not reached Microsoft.

    There you will (hopefully) find:

    It is not about EJB against JDBC!!!

    It is about using DAOs and carefully planning which DAO is implemented by EJB (or JDO, Hibernate, TopLink or any other object-relational mapping), which one by JDBC SQL and which one by JDBC stored procedures.

    IT IS NOT ABOUT ONLY EJB OR ONLY JDBC, BUT ABOUT A CAREFULLY PLANNED AND TUNED COMBINATION OF EJB AND JDBC !!!!!!!!!!!!!!!

    Hopefully even Microsoft will understand this small but crucial DIFFERENCE!!!
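    The DAO split being argued for, in code. This is a minimal sketch: the Plant type and in-memory implementation are illustrative stand-ins, and in a real app JDBC-, ORM- or EJB-backed DAOs would implement the same interface, chosen per DAO:

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // DAO pattern: callers depend only on the PlantDao interface. Whether a
    // given DAO is backed by EJB, an O/R mapper, plain JDBC SQL, or stored
    // procedures is a per-DAO decision hidden behind the interface.
    public class DaoSketch {
        public static class Plant {
            final String id;
            final String name;
            Plant(String id, String name) { this.id = id; this.name = name; }
        }

        public interface PlantDao {
            Plant findById(String id);
        }

        // Illustrative in-memory implementation; a JdbcPlantDao or an
        // EjbPlantDao would implement the very same interface.
        public static class InMemoryPlantDao implements PlantDao {
            private final Map<String, Plant> table = new HashMap<>();
            public void save(Plant p) { table.put(p.id, p); }
            public Plant findById(String id) { return table.get(id); }
        }

        public static void main(String[] args) {
            InMemoryPlantDao dao = new InMemoryPlantDao();
            dao.save(new Plant("p1", "Fern"));
            PlantDao view = dao; // callers see only the interface
            System.out.println(view.findById("p1").name); // Fern
        }
    }
    ```

    Swapping one DAO from EJB/CMP to straight JDBC then touches a single class, which is exactly the tuning flexibility the poster is describing.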
  19. Performance and Price Performance

    "I am shocked, how Microsoft staff has so low level of knowledge..."

    I am not shocked.
    I've seen a small army of MS consultants in action, trying to tune the W2K3 TCP/IP stack.
  20. Performance and Price Performance

    Yes, it is more architecturally similar to JDBC than EJB....we stayed away from the stateful EJB model because we didn't feel it would ever scale as well and introduced too many complexities as well as reliability issues.
    So why do you use EJB for your benchmark if you do not know how to use it? Or are you just looking for idiots to trust your benchmark? Compare MTS to WebSphere if you want to compare distributed transaction processing performance.
  21. General note on the benchmark[ Go to top ]

    I understand the reason for optimizing the Plants sample to store just the URL and serve the image from the webserver, but I think there's something important being missed here.

    The example does this because many financial institutions are required by law to keep a copy of every check and make it available to either the merchants or the account holder. Because of security concerns, it's not acceptable to store the images in plain view where someone can easily read the account information. Storing the images and serving them from the database is done specifically to show how an application might perform under "heavy" requirements. Optimizing the image serving makes sense in the context of benchmarking, but it fails to address the real issue of "how does one meet heavy requirements and still have acceptable performance."

    I would be interested to see the results of the benchmark without the image optimization. I say this because WebSphere is clearly designed for a specific market and type of application. In doing so, it probably becomes less efficient for other types of applications. Just some food for thought.

    peter
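For the banking scenario described above, serving a check image fetched from the database ultimately boils down to streaming bytes to the client. A minimal sketch, with the servlet and database plumbing deliberately stubbed out (the class and method names here are hypothetical, not from either benchmark kit):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ImageStreamer {
    // Copies image bytes (e.g. a BLOB just read from the database) to the
    // response stream and returns the byte count. In a real servlet you
    // would first call response.setContentType("image/jpeg") and obtain
    // the stream from response.getOutputStream().
    public static int stream(byte[] imageBytes, OutputStream out) {
        try {
            out.write(imageBytes);
            out.flush();
            return imageBytes.length;
        } catch (IOException e) {
            return -1; // a servlet would log this and abort the response
        }
    }

    public static void main(String[] args) {
        byte[] fakeCheckImage = new byte[15 * 1024]; // ~15K, as in the thread
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(stream(fakeCheckImage, out) + " bytes written");
    }
}
```

The interesting benchmark question is then everything around this loop: the BLOB query cost, SSL overhead, and whether caching is even permitted.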
  22. Re: General note on the benchmark[ Go to top ]

    Peter,

    Agree, testing with the image servlet and equivalent image-serving .aspx web form would be interesting to do as well. While not the mainstream approach for an app like Plants, certainly the banking scenario you mention and others would do this. I logged onto my Bank of America account and found the image size for a check to be about 15K. The 64-bit benchmark paper also published at http://msdn.microsoft.com/vstudio/java/compare uses images in the db as the test scenario.....it could easily be run simply by using the provided loading page to load the DB with a 15K image and see if there is a big perf difference between .NET/SQL, .NET/Oracle, WebSphere/Oracle and WebSphere/SQL. I'll try this, I am always curious by nature.

    One question here is, would you use object or servlet caching (and the ASPX equivalents, object caching/page output caching) or not? For Plants obviously you would (but I think an app like Plants would not typically store images in the db); but for banking, where an image must be in a backend DB, it seems to me the cache hit ratio would be very low, considering the volume of checks out there and how unlikely it is for a customer or teller to repeatedly ask to see the same check (??). Hence likely better to run w/o any caching of the queried images?

    -Greg
  23. caching may not be allowed[ Go to top ]

    Agree, testing with the image servlet and equivalent image-serving aspx web form would be interesting to do as well. ... Hence likely better to run w/o any caching of the queried images? -Greg

    There's no single answer to this. Depending on the bank's policy, the application might use Pragma: no-cache and prevent the browser from caching the scans of the checks. So in that case, should the app server cache the image? Honestly, I don't know, and each bank has its own policies.

    When one considers that the connection is already using SSL, there's already a huge overhead. At that point, it doesn't matter how fast the webserver can serve an image. The overhead from the legal and security requirements is so big that reliability and clustering become far more important than all-out speed. That's my limited experience.

    peter
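The `Pragma: no-cache` policy peter mentions is usually set together with its HTTP/1.1 counterparts. A small sketch of the header set such an app might apply; the helper class is hypothetical, and in a servlet each entry would be applied via `response.setHeader(name, value)`:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CachePolicy {
    // Headers an app might set on a check-image response when the bank's
    // policy forbids caching anywhere between server and screen.
    public static Map<String, String> noCacheHeaders() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Pragma", "no-cache");                  // HTTP/1.0 clients and proxies
        h.put("Cache-Control", "no-store, no-cache"); // HTTP/1.1 clients and proxies
        h.put("Expires", "0");                        // belt and braces for old caches
        return h;
    }

    public static void main(String[] args) {
        System.out.println(noCacheHeaders());
    }
}
```

Note these headers only govern browser and proxy caching; whether the app server itself may cache the image on the way out of the database is a separate policy question, as the post says.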
  24. Re: General note on the benchmark[ Go to top ]

    but for banking where an image must be in a backend DB it seems to me the cache hit ratio would be very low, considering the volume of checks out there and how unlikely it is for a customer or teller to repeatedly ask to see the same check (??). Hence likely better to run w/o any caching of the queried images? -Greg

    For some bits of data, a cache hit rate > 0 can give you significant savings. It will take millions of "if not in cache" checks to outweigh the cost of a single query and image pull over the network.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
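Kirk's point can be made concrete with a back-of-the-envelope expected-cost model. The numbers below are purely illustrative, not measurements from either benchmark:

```java
public class CacheBreakEven {
    // Expected cost per request: the cache probe always happens; on a
    // miss we additionally pay the DB query plus the image pull.
    public static double expectedCost(double hitRate, double probeCost,
                                      double queryAndPullCost) {
        return probeCost + (1.0 - hitRate) * queryAndPullCost;
    }

    public static void main(String[] args) {
        double probe = 0.01; // ms, in-memory "if not in cache" check (assumed)
        double fetch = 50.0; // ms, DB query + image pull over the network (assumed)
        // Even a 1% hit rate shaves ~0.5 ms off the average request,
        // which dwarfs the cost of the probe itself.
        System.out.println(expectedCost(0.00, probe, fetch));
        System.out.println(expectedCost(0.01, probe, fetch));
    }
}
```

With these assumed costs the probe pays for itself at a vanishingly small hit rate, which is exactly why "hit rate > 0" can still be worth it.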
  25. I don't think I'm missing anything ?[ Go to top ]

    Greg, I think you're missing something here: among the J2EE community, a comparison with EJB is a NON-STARTER. I think that you (Microsoft) don't seem to get this! I will say it again: it is USELESS, and it DAMAGES ***YOUR*** CREDIBILITY when you do comparisons with EJB. It is not a "like" comparison, and I will say it again: IT DAMAGES YOUR CREDIBILITY in the J2EE community among informed readers.

    On the flip side, it's good hype for the uninformed community.

    Because you don't see this "GREAT" performance of .NET with Oracle, and because you see the below-average performance of WebSphere with SQL Server, this tells you that .NET is not the reason for the performance gain, and that SQL Server is not the reason either. So neither of these products alone is buying you much. It does "appear" to tell you that it is a DRIVER issue. So the article should read "MS HAS A SUPER FAST DRIVER FOR SQL SERVER."

    Now as for productivity, it's clear that you don't have any personal experience with J2EE development. Your vague description of your customers says that you are not in the trenches. Here is what I've found from personal experience.

    I work with WebSphere/WSAD.

    WSAD is infinitely more productive than the .NET environment (1.0-1.1) for SERVER-SIDE DEVELOPMENT. I have not used 2.0, which I hear is better.

    For front-end development, the .NET environment is infinitely more productive than the WSAD/JSF combination. Sun has dropped the ball on Web GUIs....

    At my company, we like the .NET web GUI / server-side Java combination. And I am seeing a lot of ads in the NYC area for this type of architecture.

    The rest of your stack has very little credibility on the server side. For example MSMQ..... (I think that was a joke people used to tell a few years back)

    But you are correct about one thing: I applaud Microsoft for how far it has come from a performance standpoint with .NET.

    As for why use EJBs, and why use WebSphere: that's a GOOD question. The answer is this: EJBs have appropriate usage when you are trying to support transactional clients, particularly in XA situations. EJBs are good when you need to REMOTE an application.

    ** The PlantsByWebSphere example DOES NOT NEED EJBs. **

    PlantsByWebSphere is an example to show you all the features of WebSphere. It is NOT an EXAMPLE of "how to do things correctly!" You guys don't seem to get this! Some in the Java community don't get this.

    If you are really interested in understanding this, take a look at any article written by Rod Johnson.

    Note: pick an example where EJBs are REQUIRED, and you would find that the .NET code you wrote would NOT BE APPROPRIATE! Instead you would have to use .NET REMOTING and whatever naming/load-balancing/queuing systems you would use.


    And as for SQL Server: I understand your intentions here, but in the NYC financial community, databases run on really BIG BOXES. So if you really want to draw customers, you have to show them why they should get rid of their big boxes (AS/400, OS/390, Sun) and use SQL Server. So you need to do benchmarks against those boxes. I think the trend IS toward commodity hardware and cheap software (Linux is FREE), but I'm not sure that database usage in the NYC financial community is heading that way.

    You also have to show reliability in a high-transaction environment.

    I consulted at a place where the motto was "you can use MS solutions on departmental-level things that don't involve 'real money', e.g. < 100,000 dollar transactions."
    But when you start talking about BILLIONS of dollars a day in transactions.... MS is nowhere in the picture.


    Finally, I work with WebSphere every day. Everyone tells me it is one of the slowest app servers. But I stick with it mainly because WSAD is one of the BEST and most productive development environments around. But I do understand the frustration the rest of the community has, because they don't want WebSphere to be the representative of the J2EE community.

    And as for your price/performance.... Since there are many J2EE solutions (FREE ONES) that have as good a stability record as MS does... I think you're heading for a tough road.

    My thinking goes as follows: I'm using WebSphere; MS wants me to switch to MS on price/performance. My next thought is: why not go for a Linux/JBoss/Eclipse (or other) solution?

    So good luck.
  26. I don't think I'm missing anything ?[ Go to top ]

    Greg, I think you're missing something here, among the J2EE community, a comparison with EJB is a NON-STARTER. I think that You (Microsoft) don't seem to get this !

    I think we agree here. EJBs should not be used in an app like PlantsByWebSphere. We do get it; that's why we built the JDBC versions in Java in the first place. EJBs may, as you point out, have a good place in other scenarios, but they continue to be pushed by the main J2EE app server vendors for all scenarios and, quite frankly, they just slow the app down and add complexity. The problem is, most apps have different areas of functionality, and to build an optimal J2EE app you would need to use a mixed architecture...sometimes using EJBs, sometimes not. IBM, not MS, chose to publish the PlantsByWebSphere web app using EJBs. MS, not IBM, chose to create a faster JDBC-only version to add to the comparison.

    As for the other comments about MSMQ, etc.: just off base. Dell.com (order system), the London Stock Exchange, and many other very large .NET apps use MSMQ. But if you like MQSeries, it can also be used in .NET apps, since IBM created .NET classes that sit on top of it.


    -Greg
  27. I don't think I'm missing anything ?[ Go to top ]

    .. the London Stock Exchange, and many other very large .NET apps use MSMQ.

    LSE uses MSMQ? For what? You're not talking about one of the _external_ retail pricing gateways, are you? The one that supposedly peaks at a few hundred messages a second?

    Just because there's some app that happens to use data from the LSE doesn't mean that the LSE uses MSMQ (it doesn't). By your measure, the LSE uses VB too, because I've seen VB apps running on computers there. And I know some of the guys in the building that have Windows phones, so the LSE runs on Windows phones. I'm sure Microsoft Bob used to run the LSE, but I don't have any proof .. ;-)

    Peace,

    Cameron Purdy
    Tangosol Coherence: The Java Data Grid
  28. Picking on IBM. Smart!
    Why not take on Resin + iBatis Petstore? It would never touch the DB.
    w/ a 64-bit JRockit VM (parallel GC), like I deployed for my last client.

    We know the J2EE issues, and avoid them. .NET 2.0 and C# 3.0 are great. So are Drupal, vBulletin, Plone, etc.
    But generals fight the last war.
    This is the next war:
    http://msdn.microsoft.com/smartclient/understanding/definition/default.aspx

    .V
    Why not take on Resin + iBatis Petstore? It would never touch the DB. ... This is the next war: http://msdn.microsoft.com/smartclient/understanding/definition/default.aspx .V

    Stop the press :)

    C# 3 is already out? I thought they were just getting ready to release VS2005 with .NET 2.0. I must have fallen asleep for a few years. Joking aside, the platform choice is largely the result of executive decisions and staff resources. Switching from one platform to a completely new one is hard to do and often isn't cost-effective.

    peter
  30. C# 3.0[ Go to top ]

    C# 3 is already out? I thought they were just getting ready to release VS2005 with .NET 2.0.
    .NET 2.0 and VS 2005 are out. However C# 3.0 is the name of the future version, some bits of which are available as previews at this stage (e.g. LINQ).
  31. C# 3.0[ Go to top ]

    C# 3 is already out? I thought they were just getting ready to release VS2005 with .NET 2.0.
    .NET 2.0 and VS 2005 are out. However C# 3.0 is the name of the future version, some bits of which are available as previews at this stage (e.g. LINQ).

    Right, the previews are out, but it's not officially a production release. I'm fine with "preview code". But to be nit-picky, C# 3.0 isn't really out. That's one big advantage of open source to me. It's much easier to release often and get new features out the door faster. There are some interesting things in C# 2.0 and 3.0, but I don't consider them critical. They're nice things to have, but the value proposition is going to take a while to prove in the real world. that's true for every product, be it open source or commercial.

    peter
  32. New JDBC driver for SQL Server[ Go to top ]

    One reader asks whether MS will provide a great JDBC driver for SQL Server. The answer is yes. As tested in the Plants benchmark, we have been working on a brand-new JDBC driver for SQL Server that is in beta now on MSDN and will likely be released in the January timeframe (a couple of months). It's based on a 100% new codebase; past drivers we released were developed by and licensed from other vendors. This one is new code, owned, shipped, and supported by MS. It gets very good performance, as the tests show.

    -Greg Leake
    Microsoft Corporation
  33. Just Curious[ Go to top ]

    Can we get a comparison on how this Microsoft PlantsByWebSphere performs on AIX, HP-UX, Sparc Solaris and Linux?

    Just curious....
  34. Just Curious[ Go to top ]

    Can we get a comparison on how this Microsoft PlantsByWebSphere performs on AIX, HP-UX, Sparc Solaris and Linux? Just curious....
    And JBoss, Jonas, WebLogic, OAS, etc.. ;)
  35. New JDBC driver for SQL Server[ Go to top ]

    One reader asks whether MS will provide a great JDBC driver for SQL Server. The answer is yes. As tested in the Plants benchmark, we have been working on a brand-new JDBC driver for SQL Server that is in beta now on MSDN and will likely be released in the January timeframe (a couple of months). It's based on a 100% new codebase; past drivers we released were developed by and licensed from other vendors. This one is new code, owned, shipped, and supported by MS. It gets very good performance, as the tests show.

    This is great news, Greg! Will the JDBC performance and throughput for Java applications with SQL Server be as high as the performance and throughput for .NET applications?

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  36. New JDBC driver for SQL Server[ Go to top ]

    This is great news, Greg! Will the JDBC performance and throughput for Java applications with SQL Server be as high as the performance and throughput for .NET applications?
    Exactly what I had been thinking.

    Pretty much what this study shows - if one is going to use part of the Microsoft stack, you might as well go ahead and use the whole stack.
  37. C# 3.0[ Go to top ]

    Right, the previews are out, but it's not officially a production release. I'm fine with "preview code". But to be nit-picky, C# 3.0 isn't really out. That's one big advantage of open source to me. It's much easier to release often and get new features out the door faster.
    You're not being nit-picky; it's an obvious fact that C# 3.0 is still in a somewhat distant future. But I don't see why you make this comparison with open source. A better comparison would be with J2SE itself.
  38. Isn't it obvious[ Go to top ]

    Right, the previews are out, but it's not officially a production release. I'm fine with "preview code". But to be nit-picky, C# 3.0 isn't really out. That's one big advantage of open source to me. It's much easier to release often and get new features out the door faster.

    You're not nit-picky, it's an obvious fact that C# 3.0 is still in a somewhat distant future. But I don't see why you make this comparison with open source. A better comparison would be with J2SE itself.

    You're right, it would be more accurate to compare .NET to J2EE. Being an open source bigot :), I tend to compare things to open source. But that's my own bias.

    peter
  39. The Next War[ Go to top ]

    I agree. With Microsoft's apparent move towards online "services", the next "war" will in fact be over how we can use those services to improve the overall computing experience. And an improved computing experience is what smart clients are all about (AJAX doesn't cut it).

    Paul
  40. The Next War[ Go to top ]

    I agree. With Microsoft's apparent move towards online "services", the next "war" will in fact be over how we can use those services to improve the overall computing experience. And an improved computing experience is what smart clients are all about (AJAX doesn't cut it). -Paul

    Hmm... Isn't that similar to Eclipse RCP?

    Stanly
  41. The Next War[ Go to top ]

    I agree too. This is not the topic of this thread, but I think it is an important one: new client software overcoming all the limitations of web technology. AJAX is only a patch on the current situation. Client software based on XForms seems a valid alternative to me.
    But generals fight the last war. This is the next war: http://msdn.microsoft.com/smartclient/understanding/definition/default.aspx .V

    Next war, huh? Sounds like a Java app I worked on a few years ago.
  43. There we go again...
  44. What should the reaction be from Java developers, if any?
    Two words: Deja vu! :)
  45. What should the reaction be from Java developers, if any?
    Two words: Deja vu! :)
    Two words: Deja mooo! (We have heard this bull before) ;-)
  46. It's hardly surprising that Microsoft can put together a benchmark that shows that Microsoft employees can use Microsoft software on Microsoft platforms with Microsoft tools to build something that is faster than the same Microsoft employees can get an IBM WebSphere app to run.

    Nonetheless, I've changed my mind about how bad these benchmarks are. I saw how the .NET PetStore helped to improve Java and the available libraries significantly. I don't think we'd have iBatis or EJB3 without Microsoft's tactics, so I think we should welcome these benchmarks with open arms. I think we should be encouraging Microsoft to come out with ".NET is 28x faster" claims, because it helps us in the end.

    Of course, on this particular benchmark there was only one db server and one app server. If they had 16 app servers and one db server, I'd be glad to help re-write the test to show what Java can really do ;-)

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
    Haha, but we know MS isn't going to do that, considering the throughput would be seriously hindered at the database layer with 16 app servers. I'm pretty sure MS doesn't recommend doing that. It's all good though. Without flame wars, things get pretty boring :)

    peter
    I would like to reply to Cameron's post, where he points out that it would be better if IBM developed the code and participated in the benchmark; then we would have 'fair' results. I agree! We would welcome the chance to have a head-to-head with IBM, but IBM has declined several such invitations in the past, including those extended by the Middleware Company. They did actually accept one, however, a while back. This was a Web Service performance test with clustering conducted by Network Computing. In this review, the vendors themselves implemented the code and went to the lab independently for tuning/config for the tests. The link is at:

    http://www.networkcomputing.com/showitem.jhtml?docid=1604f3

    In this test, .NET had 1400 TPS to IBM's 600 TPS in one of the main tests; we bested them by a huge margin in all tests, and IBM had negative scalability going from one to two servers. Now, on the other hand, the review has lots of technical mis-statements, and I don't believe IBM's performance on a cluster of two servers could possibly be this bad, so IBM must have made some mistakes when setting up their own systems. The point is, we welcome a head-to-head with IBM writing the code and doing the tuning--as long as pricing information is disclosed on the configuration tested (something IBM conveniently had removed from the SPECjAppServer results at SPEC).

    We believe the tests conducted are very fair: they are based on IBM's own code -- PlantsByWebSphere -- with any modifications we made (like using local vs. remote EJB interfaces, and creating a lightweight JDBC version based on comments on prior benchmarks) fully published, so anyone can see all the code, change the code if desired, and perform the tests on their own. All tuning and config is also published. One of the big conclusions of the test is not just the relative performance of EJB vs. JDBC and .NET vs. WebSphere---it's also the pricing of the configs tested. There is a marked difference between .NET/Windows Server 2003 and IBM WebSphere---even when you price out their Express Edition.

    And in all fairness, Cameron and many others on TheServerSide and in the Java community, including IBM, Oracle and Sun, have all held up 'performance and scalability' as one of the primary reasons not to switch to .NET---I could pull hundreds of posts from TheServerSide that make this claim, and point to many whitepapers from IBM, Oracle and Sun that also make this claim. So it's a little disingenuous to say, when we publish benchmarks with full disclosure and downloadable kits to show otherwise, that 'performance does not matter' and all the tests are 'rigged.'


    -Greg Leake
    Microsoft Corporation
  49. Microsoft vs. IBM[ Go to top ]

    Microsoft has it pretty easy here. Let's face it: even though IBM is the biggest vendor, they also have the slowest app server out there. Remember some time ago when TMC did a comparison? Yep, IBM was far behind the pack, and probably always will be. The big Java vendors have made it easy pickings for the Microsoft crowd with their push for more complex architectures along with poor implementations.

    Greg makes a point of mentioning that they used IBM's own application, but let's face another reality that seems to be lost on most people: .NET != J2EE. Now, take .NET + MSMQ + MTS (does that still exist?) and then you are closer to J2EE as a stack. The biggest problem facing J2EE is that the vendors have pushed their biggest and baddest server onto their customers when, in most cases, all they really needed was a servlet engine. I agree with a previous comment about comparing against Resin + iBatis or another similar combo. The vast majority of Microsoft web apps are two-tier in nature, and they are compared against 3- or 4-tier Java apps.

    To Microsoft's credit, the vast majority of applications could be written nice and simply without a bunch of frameworks or distributed architectures, a fact that gets lost on most of the Java community (the same could be said of PHP apps as well). The vendors come up with the multi-layered apps to show off capabilities, then someone else takes that as "this is the way it should always be done" and runs a benchmark against it. That is what happened with the PetStore app. It was a demo only.
  50. Microsoft vs. IBM[ Go to top ]

    Robert,

    I couldn't agree more. More money is being wasted by customers than I care to imagine. The vast majority of apps can simply be built....more simply. Yet customers get sold on all the high-end features and complex architectures (remember, IBM has a huge services business, so often they are getting paid by the hour to build, deploy and even manage this stuff) that *most* apps do not need.

    With that said, sure, many enterprise-class apps do require messaging, clustering, failover and the like. But I will still maintain that apps with simpler architectures will always be easier to cluster and keep running. As for the Plants app, here are some facts about the implementations we released in the kit.

    All are *logical* three-tier architectures, but deployed in a physical two-tier, single-app-server scenario. You could, for the .NET or J2EE implementations, remote the back-end components to a separate server (via EJBs, .NET Remoting, or Web Services--you could even integrate the .NET front end with the J2EE back end via Web Services, or vice versa, with trivial work). However, I still believe that if you want to add servers, you should simply replicate the entire app stack on each server---don't remote individual components unless you really have to. Then cluster the replicated app servers together via network load balancing---you can have complete failover, add capacity with new servers and the like in such a scenario, with the same very clean/straightforward architecture and much faster performance. You will see linear scaling as servers are added, for both WebSphere and .NET, until the network or database becomes the bottleneck.

    Technically, you can remote the EJBs (if you switch back to remote interfaces), and you can remote the back-end .NET components (business logic and data access layers) via .NET Remoting or (more easily) Web Services. But customers should really think such decisions through---you always pay a perf penalty for remoting---so is there a *real* need?

    As for messaging, yes, .NET has the full System.Messaging classes that sit on top of MSMQ, and they include full support for transacted queues and the like. So all of that is there as part of .NET/Windows Server--no separate installs or extra costs. MSDN has lots of articles on messaging.

    -Greg Leake
    Microsoft Corp.
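The scale-out arithmetic implied above can be sketched as simple capacity planning, assuming the near-linear scaling Greg describes holds until the database or network saturates. All throughput figures here are invented purely for illustration:

```java
public class CapacityPlan {
    // Servers needed to hit a target throughput, assuming near-linear
    // scaling under network load balancing (Greg's stated assumption,
    // not a guarantee) until the DB or network becomes the bottleneck.
    public static int serversNeeded(double targetTps, double perServerTps) {
        return (int) Math.ceil(targetTps / perServerTps);
    }

    public static void main(String[] args) {
        // Illustrative: app server A at 400 TPS/box vs. B at 200 TPS/box,
        // both chasing a 1600 TPS target.
        System.out.println(serversNeeded(1600, 400)); // A: 4 servers
        System.out.println(serversNeeded(1600, 200)); // B: 8 servers
    }
}
```

This is the arithmetic behind the "half as many servers" argument later in the thread: double the per-server throughput and the required cluster size halves, along with its hardware, licensing, and management costs.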
  51. Microsoft vs. IBM[ Go to top ]

    Robert, I couldn't agree more. More money is being wasted by customers than I care to imagine. The vast majority of apps can simply be built....more simply. Yet they get sold on all the high-end features and complex architectures (remember, IBM has a huge services business, so often they are getting paid by the hour to build, deploy and even manage this stuff) that *most* apps do not need.

    There certainly is a class of applications that can be built more simply than they are. However, I have to strongly disagree that these account for *most* of them. Maybe this is your experience, but it is certainly not mine, nor is it the experience of my customers. They are solving very complex problems that require complex logic to solve.

    As for the Plants App.....here are some facts about the implementations we released in the kit.

    Here are some other facts: this benchmark is meaningless because it is not representative of my customers' workload. In fact, I would bet that this benchmark is not representative of most people's applications. Aside from this point, a benchmark is supposed to answer specific questions about performance. The question posed here is too broad to be meaningful. This is like the old question: which language performs better, C, C#, Java, or Ada? It's just not a question you can answer, because the definition of "performs better" is subjective, or selectively objective at best. If this benchmark were to address specific questions, then I would give it a lot more credence. My view on this is not just limited to this particular benchmark; this is a position that I put forward in the benchmarking section of my performance tuning course. But since this is what has been put forward, this is what needs to be reviewed.

    Next fact: I downloaded the benchmark and started looking for evidence that WAS was indeed optimized. Maybe this is where you can help. What (if any) technical tuning was performed on the JVM and WAS during the baselining stage of the benchmark? I ask this because the 183% and 3% numbers published could easily be gobbled up after a round of technical tuning.

    May I suggest that your labs look at the output of the SPEC benchmarks. I'm not going to suggest that SPEC reporting is perfect, but it certainly is a huge step in the right direction.

    Regards,
    Kirk

    kirk[at]javaperformancetuning.com
  52. Microsoft vs. IBM[ Go to top ]

    Kirk,

    *ALL* the benchmark tuning information, a la SPEC, is posted in the appendix called Tuning. This details the WebSphere settings changed from defaults, the IBM HTTP Server tuning (via httpd.conf), and any Windows and Linux tuning done. It's very complete; I do not know what Joseph means when he says information was left out. What information? If something is missing, we will gladly post this info, because we WANT customers to be able to accurately reproduce the results...and/or test on equipment that matches their environment. So please refer to the tuning section of the doc, which says exactly and precisely what was tuned and how. Again, if you deem something missing, be specific and we will post this information and update the doc. There is nothing to hide here!

    As for your comments on benchmarks in general, I too have some level of expertise in this matter, so I will add my two cents.


    1) First, all benchmarks are dangerous and should be viewed with caution. Not all benchmarks are meaningless, however. No benchmark should be believed without full disclosure (code, tuning, versions tested, scripts, data loads, etc.), an ability for public review, and most importantly, the ability for others to reproduce the benchmark tests, verify, and comment on the results. This benchmark and the other 64-bit benchmark published in tandem meet that bar.

    2) Unless you are working on the underlying JVM/CLR goo itself at MS, IBM, Sun, etc., micro-benchmarks are typically not useful for customers. Who cares if .NET can create 2MB objects 15% faster than IBM's latest JVM, or if an empty loop in Java runs 100 times faster than an empty loop in .NET/C#? What *is* useful are solutions-based benchmarks that test an entire end-to-end scenario replicating a common customer scenario. A data-driven web app does that. It tests all the core architectural building blocks a real customer would use to build such an app, even if their app is 100 times more complicated. The same approach to building is taken, the same core elements are used, and the benchmark tests the scenario by simulating actual web users pounding on the system. Why is this useful?

    a) If customers themselves did more of this type of testing (of subsets of their apps) during the architecture/dev phases, they would much more quickly find problem perf areas and be able to make more informed decisions between different architectural approaches (like JDBC vs. EJB, local vs. remote interfaces, or framework X vs. framework Y vs. no framework); the information gleaned is incredibly useful for architectural decisions, both for .NET and Java.

    b) If the code is published, and the tuning is published with the results, then customers can use the approaches in that code as a template for their own apps and architectural decisions, without necessarily having to find out the hard way. With code published, and discussion and public comment, customers glean actual information. For example, they can take our DotNetGardens app and use it as a template for their own data-driven web apps and get a good 3-tier logical architecture, and great performance. And they can understand the perf tradeoff to expect between .NET on Oracle vs. .NET on SQL, for example. This is useful and good, and in fact our PetShop 3.0 application has been downloaded by over 200,000 different customers in the last 2 years precisely because they can use the code to jumpstart their own apps in a high-perf way.

    c) It turns out, performance *does matter.* If an app like PlantsByWebSphere or dell.com or match.com or bankofamerica.com or whatever must support 10,000 concurrent users with avg response times < 1 second, then using a benchmark/load test to determine the required capacity before deployment is critical. Then they can put the right number of servers in place, iron out bottlenecks and the like. But taking this one step further, if app server (A) performs twice as fast as app server (B) for the actual complete app or some core portion, then the customer can use app server A and deploy on half as many servers as app server B. This means half as much in hardware acquisition costs, half as much on middle-tier software licensing, and maybe most important, reduced management costs if you believe managing a cluster of 4 servers is less costly than managing a cluster of 8 servers. Bottom line, performance matters a great deal. Also, imagine the scenario where a customer has a Web-based app running on a mainframe, and it's costing them $x million a year to have it there. Then they take this same app and, using Java or .NET or other, find they can get the same or better performance on 3 Intel-based 2-way servers. Wow. And with clustering they find out they can get the same uptime. Wow.

    -Greg
  53. Microsoft vs. IBM[ Go to top ]

    ... if app server (A) performs twice as fast as app server (B) for the actual complete app or some core portion, then the customer can use app server A and deploy on 1/2 as many servers as app Server B. This means 1/2 as much in hardware acquisition costs, 1/2 as much on middle tier software licensing, and maybe most import, reduced management costs if you believe managing a cluster of 4 servers is less costly than managing a cluster of 8 servers. Bottom line, performance matters a great deal. ....
    Didn’t even read through the benchmark setup and results, but hey – common sense goes like:

    - Results are deliberately not accurate.
    - Results are mistakenly not accurate.
    - Results are accurate. Then something’s fishy with the setup (rather fruity – like comparing apples and oranges).

    Last one is my pick.
  54. Microsoft vs. IBM[ Go to top ]

    Kirk, *ALL* the benchmark tuning information, a la SPEC, is posted in the appendix called tuning.

    I've not been able to locate that information. It is certainly not in the bundle that I've downloaded and installed. Could you provide us all with a URL?
    First, all benchmarks are dangerous and should be viewed with caution. Not all benchmarks are meaningless, however.

    Agreed and I never said that all benchmarks are meaningless.
    No benchmark should be believed without full disclosure (code, tuning, versions tested, scripts, data loads, etc.) and an ability for public review and most importantly, others to be able to reproduce the benchmark tests, verify and comment on the results.
    Hence my suggestion on how Spec reporting is setup.
    This benchmark and the other 64-bit benchmark published in tandem meet that bar.

    2) Unless you are working on the underlying JVM/CLR goo itself at MS, IBM, Sun etc, micro benchmarks are typically not useful for customers. Who cares if .NET can create 2MB objects 15% faster than IBM's latest JVM, or an empty loop in Java runs 100 times faster than an empty loop in .NET/C#?

    I wasn't talking about microbenchmarking (MBM) in particular, nor was I suggesting that one could get big-picture numbers with MBM, and I'm happy to continue to leave MBM out of the discussion if you don't mind.
    What *is* useful are solutions-based benchmarks that test an entire end-to-end scenario that replicates a common customer scenario. A data-driven web app does that.

    Hmm, I think this is where we part company. A data-driven web app benchmarks just that: a data-driven web app that resembles your benchmark. IMHO, these coarse-grained benchmarks are equivalent to MBMs (oh gosh, I broke my own request). If I happen to be doing things that way, then great... the number is useful. If I'm not... and you don't need a very large delta to be different... then these numbers are meaningless to me. However, benchmarking at the component level provides me with much more useful information. In the case of this benchmark, I would not have included a database. The database is not meaningful to .NET performance vs. WAS performance. It is meaningful to my application's performance, but your application is not my application, and consequently I can't tell from these numbers how .NET is going to perform vs. how my database is going to perform. Also, not understanding how my database performs and how .NET or WAS performs means that I also can't understand the effects of the interactions of these components on my performance. In other words, the more things you add to the benchmark, the greater the chance that it deviates from what I need, and the information the benchmark provides will be multiplexed with numerous signals, which will make it difficult for me to understand what will happen when I start mixing components to build my particular application.

    So end to end benchmarks make for good marketing because they are easy to explain and conceptually easy to understand. However from a practical standpoint I'll repeat myself, they are meaningless.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  55. Facts[ Go to top ]

    So end to end benchmarks make for good marketing because they are easy to explain and conceptually easy to understand. However from a practical standpoint I'll repeat myself, they are meaningless.

    How come 'we' are not able to come up with an 'end to end' benchmark that blows away M$? If everybody says THE JAVA PLATFORM is better (BTW, DB2 kicks ass in http://www.tpc.org/tpcc/results/tpcc_perf_results.asp so they have a powerful DB at their disposal), why is nobody (IBM, Oracle, BEA, Sun) able to ramp up something to stand up in front of this marketing aggression? We are talking the talk ("Vorbim vorbe" in Romanian, that is)!?

    Regards,
    Horia Muntean
  56. Facts[ Go to top ]

    Totally agree,

    IBM should come forward and write an app, tune it and then run against the same app written by M$.
  57. Facts[ Go to top ]

    So end to end benchmarks make for good marketing because they are easy to explain and conceptually easy to understand. However from a practical standpoint I'll repeat myself, they are meaningless.
    How come 'we' are not able to come up with an 'end to end' benchmark that blows away M$? If everybody says THE JAVA PLATFORM is better (BTW, DB2 kicks ass in http://www.tpc.org/tpcc/results/tpcc_perf_results.asp so they have a powerful DB at their disposal), why is nobody (IBM, Oracle, BEA, Sun) able to ramp up something to stand up in front of this marketing aggression? We are talking the talk ("Vorbim vorbe" in Romanian, that is)!? Regards, Horia Muntean

    For high-end systems, DB2 has proven itself to scale well. I've been learning DB2 the last few months, and the learning curve is steeper than other databases'. That's first-hand experience. Compared to SqlServer, Oracle, Sybase ASE, mysql and postgresql, DB2 requires more skill and experience. Also, shops that use DB2 tend to have big systems, so it is rather rare to see DB2 on low-end hardware.

    peter
  58. Facts[ Go to top ]

    I've been learning DB2 the last few months and the learning curve is steeper than other databases. That's first-hand experience. Compared to SqlServer, Oracle, Sybase ASE, mysql and postgresql, DB2 requires more skill and experience. Also, shops that use DB2 tend to have big systems, so it is rather rare to see DB2 on low-end hardware. peter
    Not in mine. Installed DB2, set up the db, and let it go. The app ran for years. SQL Server? Lots of fun running out of space, etc.
  59. Microsoft vs. IBM[ Go to top ]

    Kirk,*ALL* the benchmark tuning information, ala SPEC, is posted in the appendix called tuning.
    I've not been able to locate that information. It is certainly not in the bundle that I've downloaded and installed. Could you provide us all with a URL?

    I see that on the webpage at the bottom there are a number of configurations. My comments on them: first, the heap size seems large for this type of application. There is no mention of GC settings, so one has to assume default GC settings were used. The default GC (IIRC) is mark-and-sweep without compaction. The effect of this large a heap with the default IBM GC could be excessive paging of memory as the GC performs its mark and sweep. I see that the benchmarking machines all contain 4 GB of memory or greater. That said, it does not mean that the Java heap stays resident. That question would be answered by the paging and scan rates in Linux (a similar stat is exposed in perfmon). For example, Windows reduces an application's working set by forcing the application to swap. You can get an interesting performance boost in XP by turning off swap altogether, but that does cause an occasional blip while the system tries to reduce the working set by running the application through a (residual) page file that is way too small for the task. But I digress. The point is that too much memory can be as harmful as not enough. Difficult to say without monitoring for this activity.
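    For anyone wanting to run this kind of check themselves, something like the following would do; the heap values are illustrative only (not the benchmark's published settings), and `app.jar` is a hypothetical stand-in for the real server launch:

```shell
# Illustrative only: not the benchmark's actual configuration.
# Watch swap-in/swap-out activity (si/so columns) while the load test runs:
vmstat 5

# Run the JVM with a fixed heap and GC logging so GC pauses can be
# correlated with any paging seen above:
java -verbose:gc -Xms512m -Xmx512m -jar app.jar
```

    If si/so stay near zero during full GCs, the heap is resident and paging is not the explanation for any slowdown.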

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  60. Reply to Kirk on GC[ Go to top ]

    In reply to Kirk's post, quoted here:

    I see that on the webpage at the bottom there are a number of configurations. My comments on them: first, the heap size seems large for this type of application. There is no mention of GC settings, so one has to assume default GC settings were used. The default GC (IIRC) is mark-and-sweep without compaction. The effect of this large a heap with the default IBM GC could be excessive paging of memory as the GC performs its mark and sweep. I see that the benchmarking machines all contain 4 GB of memory or greater. That said, it does not mean that the Java heap stays resident. That question would be answered by the paging and scan rates in Linux (a similar stat is exposed in perfmon). For example, Windows reduces an application's working set by forcing the application to swap. You can get an interesting performance boost in XP by turning off swap altogether, but that does cause an occasional blip while the system tries to reduce the working set by running the application through a (residual) page file that is way too small for the task. But I digress. The point is that too much memory can be as harmful as not enough. Difficult to say without monitoring for this activity. regards, Kirk (kirk[at]javaperformancetuning.com)

    Good observations. In response: yes, you can assume we ran with default GC options with WebSphere. We did, however, test with compaction and the suggested options IBM highlights in their extensive (and good) RedBook on tuning WebSphere. We found no difference with compaction, although varying the heap size did have some impact, as detailed in the report. Agree that you likely do not want to run with too big a heap size; it's more expensive to GC more memory. Our testing was done to find an optimal heap size, and as long as it was large enough (too small really kills perf), in this scenario there was not a large impact in increasing it. One reason, I think (feel free to correct me if I am wrong), is that in a Web app all objects are very, very short-lived (the duration of a page, basically, except for cached EJB instances in the EJB implementation), so they get collected regularly in the less expensive ongoing sweep operations. For an app that is not 100% web-based, GC tuning would likely be more important. I am just reporting what we found in this scenario, and do not hold myself out to be an expert here.

    You will notice in most JAppServer results, however, that IBM runs with the largest heap they can on a 32-bit platform, ~2 GB (I don't think they have results for 64-bit platforms yet, where the heap size is essentially unlimited). At any rate, we did spend time following IBM's best-practice RedBook and testing their different recommended options, and the published results were the best we got.

    One interesting point here is that with .NET you do not need to spend time tuning a heap size... this can be quite a time-consuming thing in J2EE, and the optimal heap size is very often workload-dependent. Change the workload and you may need to re-tune. In .NET, this is not required.

    The one place this had a bigger impact was not the Plants benchmark, but actually the 64-bit caching scenario (the second paper we published in tandem). One reason is that the WebSphere cache works by setting the max number of entries allowed in the cache, while .NET works by setting the max memory the cache (or the .NET CLR process itself) is allowed to use. Setting a cache size based on *number* of entries does not really make sense... if you set it too high, you consume the entire heap and WebSphere not only throws out-of-memory exceptions, it actually crashes/heap dumps. This leaves you guessing at the mix of objects and relative sizes that may be added to the cache (or spending a lot of time monitoring your cache, which is not easy with WebSphere)... and in the end you never really can know what the mix will be or the relative sizes... it's all guesswork. Much better, I think, to set the max size in bytes a cache can grow to before it starts using its eviction/LRU mechanisms to keep the size within that limit. For the 64-bit tests, it was pretty easy to guess for WebSphere what the max entries should be (some trial and error still required) since we were caching a single object always of the same size. In the real world, it seems to me this would be impossible to guess.
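    The entry-count eviction model is easy to sketch in code. The following is a minimal, hypothetical illustration (not WebSphere's or .NET's actual cache implementation) built on a plain `LinkedHashMap`; a byte-based cap would instead track an estimated size on each put and evict until back under the limit:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a count-bounded LRU cache; maxEntries plays the
// role of a "max number of entries" cache setting.
public class CountBoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public CountBoundedCache(int maxEntries) {
        super(16, 0.75f, true);      // access-order iteration gives LRU behavior
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry instead of growing without
        // bound (and eventually exhausting the heap).
        return size() > maxEntries;
    }
}
```

    Note that the count cap says nothing about memory: three cached 100 MB blobs and three cached short strings both count as "3 entries", which is exactly the guessing problem described above.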

    -Greg
  61. Tuning details[ Go to top ]

    All tuning details are in the papers, which are available in HTML and as PDFs on

    http://msdn.microsoft.com/vstudio/java/compare

    and for Plants, specifically at: http://msdn.microsoft.com/vstudio/java/compare/appserver/default.aspx

    See "Appendix 2: Tuning" for all the details in terms of WAS settings, heap sizes, threads, logging, IBM HTTP Server settings, Linux tuning, and Windows tuning.

    In terms of benchmarking, I think we actually agree on a lot, but have a slightly different philosophical take on what is useful and what is not. Any benchmark is a simplification and is testing one specific scenario, and the results can *never* be blindly applied to all scenarios. JAppServer, TPC-C, PlantsByWebSphere, whatever. You can only conclude that for that precise scenario, on that equipment, with that tuning applied, these were the results. And you can only believe those results if there is full disclosure. However, we attempt with "solutions-based" benchmarking to create a tested scenario that re-creates a broader set of real-world customer scenarios. For Plants, that is a data-driven web application, of which customers are building *plenty*.

    The benchmark shows that for typical DB operations in such an application (queries, inserts, updates, etc with browser-based display via a web processing engine and http stack for receipt of requests and delivery of content) what the relative performance is for the exact same *functional* app coded in various ways against commonly deployed "stacks of technologies":

    .NET on SQL
    .NET on Oracle
    WebSphere/Windows EJB on Oracle
    WebSphere/Windows JDBC on Oracle
    WebSphere/Windows JDBC on SQL
    WebSphere/Linux EJB on Oracle
    WebSphere/Linux JDBC on Oracle
    WebSphere/Linux JDBC on SQL

    Interesting data, considering there is nothing funky going on in the app and these same techniques are being used in maybe 99% of all data-driven web apps out there. Now, the benchmark says *nothing* about scenarios that involve messaging, and *nothing* about scenarios that involve Web Services, etc. There are other benchmarks for these scenarios, or customers can download the code and add messaging and/or web services or whatever to the Plants app and re-bench for .NET and WebSphere. But regardless, these database operations will still be a huge portion of any data-driven web app.

    So the baseline it establishes is useful. Interestingly, it's primarily useful in my opinion because of the database access. If you take away the database access in the app, then you are not testing something customers actually do---you are not testing an end-to-end scenario using the complete stack that customers deploy and must work with. In all likelihood, the results without database access would start to all look alike. Yet when a customer hooks their app to an actual database and deploys it, they would get dramatically different performance, as the data shows, depending on their choice of platform (.NET vs. WebSphere), their choice of DB (SQL, Oracle), and choice of architecture (JDBC vs. EJB).

    That's the useful info, because no customer will ever deploy without a backend database. In the end, a lot of the difference in performance is due to the efficiency of the middle tier with respect to the database driver it uses (neither Oracle nor SQL Server was a bottleneck in this test, on purpose; it's not a backend DB benchmark).

    One reason .NET is so fast on SQL Server vs. all other implementations is because we were able to create a highly integrated, optimized .NET driver for SQL Server that ships with .NET. While we have a good .NET provider for Oracle, it is not as fast (although still faster than EJB access from WebSphere for this scenario, and still basically as fast as JDBC access to Oracle in this scenario), because we don't own the API to Oracle databases, and indeed we must go through Oracle's OCI layer/DLL. It's easier to optimize when you control the entire stack. So it would be interesting to see Plants/WebSphere going against DB/2, or Plants on Oracle app server going against Oracle. Who knows?

    On the other hand, the implementations tested represent what we think (based on research) are very common comparative scenarios in terms of customer deployments: .NET on Oracle or SQL vs. WebSphere on Oracle or SQL. They are not the only ones, doing an exhaustive matrix of all app server/db combos is out of scope---but with code published anyone can do this on their own if they want.


    -Greg
  62. Tuning details[ Go to top ]

    If you take away the database access in the app, then you are not testing something customers actually do---you are not testing an end-to-end scenario using the complete stack that customers deploy and must work with. In all likelihood, the results without database access would start to all look alike.

    Maybe they would and maybe they wouldn't. The point is, what is the question that you are trying to answer? In this instance, testing an end-to-end application is not so useful because, as you yourself have said, the results are only valid for the very specific set of circumstances that you run the benchmark under.

    So using your argument here, you would say that there is 0% difference between the performance profiles of .NET and WebSphere. This leaves the difference in numbers between the .NET implementation and the Java implementation of the benchmark down to... the database? Or how .NET or WAS interacts with the database? So are you saying that .NET has more efficient interactions with the database than WAS does? Now I'm confused as to what this benchmark is telling me and how I can apply that information in my decision-making process.

    I liken this to network benchmarking. Do you take into account the time the systems external to the network spend processing the data when running a networking benchmark? Of course not. That lies outside the scope of the question. The network benchmark still provides useful information that I can use for downstream decisions, and it helps me diagnose future performance problems in my overall applications. Of course people don't deploy networks with nothing on either end of them, but the point is not to deploy a full application in order to test a network. I would use the same argument to say that if you are trying to show that .NET outperforms WAS, then IMHO you need to eliminate the database from the picture.


    Yet when a customer hooks their app to an actual database and deploys it, they would get dramatically different performance as the data shows depending on their choice of platform (.NET vs. WebSphere), their choice of DB (SQL, Oracle), and choice of architecture (JDBC vs. EJB).
    If I understand how .NET performs and I understand how my database will hold up, then the only unknown is how .NET interacts with my database. And that is a separate question which is much easier to answer if I know the answers to the first two questions.

    In the end, a lot of the difference in performance is due to the efficiency of the middle tier with respect to the database driver it uses (neither Oracle nor SQL Server was a bottleneck in this test, on purpose)

    I don't want to dispute your assertion that the DB was not the bottleneck; I'm sure you did a good job of making sure that it wasn't. The point is, if the question concerns the driver, then it should be treated as the network was in the benchmark: a simple question with a simple answer in a much more controlled environment. Benchmarks that try to go farther are much less useful, IMO.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  63. jndi lookup caching[ Go to top ]

    A DAL object is created for ANY request...

    Thanx a lot for posting the "Util" class in an unreadable format...
    Anyway... they use a Hashtable to cache a DataSource for the one possible data source in this test.
    So ALL concurrent requests have to pass through this synchronized section... Again, I think the lookup method already uses something like a HashMap internally. The good practice, I think, is to cache the DataSource inside the servlet itself in the init() method (or at least inside the "DAL" class, initialized in init()).
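    The fix looks roughly like this. `DataSourceHolder` is a made-up name, and the `Supplier` stands in for the JNDI lookup (e.g. `(DataSource) ctx.lookup("jdbc/SomeDS")`, name hypothetical) that should run once, not per request:

```java
import java.util.function.Supplier;

// Hypothetical sketch: resolve an expensive resource (the DataSource) once
// and serve all later requests from the cached reference. In a servlet the
// simplest place to do this is init(); the holder below shows the same idea
// with double-checked locking for lazy, thread-safe initialization.
public class DataSourceHolder<T> {
    private final Supplier<T> lookup;  // stands in for the JNDI lookup
    private volatile T cached;

    public DataSourceHolder(Supplier<T> lookup) {
        this.lookup = lookup;
    }

    public T get() {
        T local = cached;
        if (local == null) {              // fast path: no locking after init
            synchronized (this) {
                local = cached;
                if (local == null) {
                    local = lookup.get(); // runs at most once
                    cached = local;
                }
            }
        }
        return local;
    }
}
```

    With this, concurrent requests never contend on a synchronized lookup once the reference is cached.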

    The next thing I'd like to see is using Blob.getBinaryStream() and passing it directly to the servlet output stream (using a buffered stream, of course).
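    A rough sketch of that streaming approach; the copy helper is generic (the servlet would call something like `copy(blob.getBinaryStream(), response.getOutputStream())`), and the 8 KB buffer size is an arbitrary choice:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Copies the blob's stream to the response in chunks, so the image bytes
// are never fully materialized in a byte[] on the heap.
public class BlobStreamer {
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```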

    Sure, there are more things we can find (damn! I just don't get why we should use so much code for a task that needs ~30 lines!).

    Cheers
  64. Tuning details[ Go to top ]

    Kirk,

    I think we just disagree here. End-to-end benchmarks, like JAppServer, TPC-C and many others, are commonplace and useful for a reason. When I buy a sports car and one of my criteria is speed, as a customer I want to know if one car is faster than the other when I drive it; car customers don't typically care if the fuel injector (or low-level part x, y or z) is faster in car one if car two does zero to 60 three times as fast in the end. The makers of the car care, so they can concentrate on improvements. Customers typically care about the full product. It's little comfort to know that garbage collection is better in one app server than another if, in a fully deployed scenario (with web access, data access, transactions, session state, etc.), their apps are going to run slower on that app server vs. the other.

    Both solutions-based benchmarks and micro benchmarks have a place. But customers, who build solutions, are typically more interested in what the performance of their final deployed solution will be.

    As for this benchmark, yes, in part it's about how fast each middle tier interacts with the database, since queries are executed on each page. The other two areas that affect performance in this solution are the JSP/ASPX processing engines, and the network/web stack of each product and how fast it can respond to incoming web requests and deliver content back up to the clients. So it's not just data access, but it would be pretty simple to take one page in the app, say the search page, modify the code/test, and start to test/compare these isolated areas.

    -Greg
  65. From a service perspective[ Go to top ]

    I think we just disagree here. End-to-end benchmarks, like JAppServer, TPC-C and many others are common place and useful for a reason. When I buy a sports car and one of my criteria is speed, as a customer I want to know if one car is faster than the other when I drive it; car customers don't typically care if the fuel injector is faster in one car (or low-level part x,y or z) is faster in car one if car two does zero to 60 3 times as fast at the end. The makers of the car care so they can concentrate on improvements. Customers typically care about the full product. Its little comfort to know that garbage collection is better in one App Server than another if, in a full deployed scenario (with web access, data access, transactions, session state, etc), their apps are going to run slower on that app server vs. the other.

    I'm being nit-picky here, but I think the use of the term "app server" is a bit liberal. An app server for small-to-mid-size businesses has one set of requirements. An app server built for large institutions with complex integration requirements has a completely different set of requirements. To draw a bad analogy, it's like comparing a BUS to a 2-door coupe.

    If all you need to do is get 2 people from point A to point B, then sure, use a 2-door. On the other hand, if you need to get 35 people from point A to B to C to D to E to A, a 2-door isn't going to meet the needs.

    Is the point of the benchmark to prove WAS is a poor fit for a simple e-commerce site like Plants? If that's the case, it's obvious to a decent developer. Is Microsoft saying .NET 2.0 can now handle the same type of applications that WebSphere targets in the large-enterprise space? The benchmarks are useful, but one should be careful extrapolating what those benchmarks mean and how they should be applied to the "real world".

    my bias 2 cents

    peter
  66. Tuning details[ Go to top ]

    I drive it; car customers don't typically care if the fuel injector is faster in one car (or low-level part x,y or z) is faster in car one if car two does zero to 60 3 times as fast at the end.
    The problem with this analogy is that you are equating the end consumer of the product with those who are designing and building it. So while you or I may not care about how efficient the fuel injector is, the person designing the engine sure does. Maybe the car designers don't care about the fuel injector either, and in that case would consider that information more of a microbenchmark. What they are interested in is the specs of the engine components. Although this information may be interesting for end users, it is vital for designers.
    Its little comfort to know that garbage collection is better in one App Server than another
    Again I would consider this more of a microbenchmark.
    I think for customers, who build solutions, typically are more interested in what the performance of their final deployed solution will be.
    True, and the way we as designers ensure that is to use components that support the needs, and in order to make informed decisions we need component-level information.
    As for this benchmark, yes, in part its about how fast each middle tier interacts with the database, since queries are executed on each page. The other two areas that affect performance in this solution are the JSP/ASPX processing engines, and the network/web stack of each product and how fast it can respond to incoming web requests and deliver content back up to the clients. So its not just data access, but it would be pretty simple to take one page in the app, say the search page, modify the code/test and start to test/compare these isolated areas. -Greg
    So how can anyone possibly know which elements of this stack added to or detracted from the overall performance profile of this benchmark? We can start with your suggestion (from previous postings) that the .NET and WAS environments offer the same performance profile. We can guess that the database performance is the same... or is it? You claim that the difference is in the drivers, yet there is no direct evidence offered by this benchmark to back that claim. Sure, people can download the benchmark and run it for themselves, but how is this going to help them make an informed choice without having to eat the entire stack?

    The discussion has been fun but in the end, this is benchmark hubris. Good luck with it.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  67. Tuning details[ Go to top ]

    Kirk,

    Yes, I agree that engineers building products (WebSphere, .NET, etc) are interested in isolating the perf differences so they can focus on where to improve, I said this and I think we agree on this.

    As for customers having to 'eat the entire stack'---that's what products are---a packaged set of technologies put together so customers do not have to assemble and integrate for themselves. Customers largely want to know the performance of the entire stack in a deployed scenario, and care much less about the micro details. Customers could try to use WebSphere for a data-driven Web app without using WebSphere's JSP engine, or WebSphere's data binding technology, or WebSphere's network stack....but I am not even sure how they could accomplish this technically (or why they would want to try). The app, as tested, is a "WebSphere data-driven Web app" and all the technologies in the stack tested are those WebSphere customers use everyday when deploying data-driven web apps on the IBM app server.

    -Greg
  68. Tuning details[ Go to top ]

    Damian,

    The arguments and ongoing debate around which data-access technology to use when in the Java world are pretty well documented. A huge % of the debates on TheServerSide are on this very topic, and as you point out, books have been published on the subject. Some Java developers believe heavily in EJBs, some use them in targeted ways (likely the best approach), some avoid them at all costs. The benchmark is not about telling J2EE architects how to use mixed approaches or which approach is better in every situation.

    The benchmark is simply data, take it or leave it (it is very accurate, however, for the scenario tested), on the relative performance of an all EJB implementation (as IBM publishes the app) vs. an all JDBC implementation for the IBM sample app PlantsByWebSphere. Really, if a Java developer had done this and there were no .NET results in the benchmark, it would not even be controversial, and many developers/architects would likely even find it useful.

    If when to use EJBs vs. JDBC vs. other Java data-access frameworks is obvious to you, congrats. The same is not true of all Java developers, especially those new to the technology.

    As for MS doing benchmarks: it's always fine when people slam MS as M$ (even though we cost a fraction of WebSphere, BEA); it's always fine when folks on this forum or JavaLobby slam MS as not scalable, or IBM/BEA/Sun does a benchmark ripping MS performance. Grow up and stop seeing the world through rose-colored glasses. We spend a huge amount of time on performance, and we have every right and even a responsibility to showcase this. Customers are free to ignore the message, or consider it and maybe even download the fully disclosed code and learn from it. It's their choice. But don't get mad about the message; we are the ones disclosing all the code and testing details, and the app is based on IBM's own guidance/sample app they ship in the box.

    -Greg
  69. Tuning details[ Go to top ]

    Also Kirk,

    Obviously you have a deep understanding of Java, tuning and performance issues in general. If you are interested in understanding where .NET is getting its performance advantage in this scenario, you will find, as I have said in previous posts, three core areas that happen to be core areas for many, many deployed apps:

    1) Data access/connectivity. Understanding that for straightforward DB operations like those in most data-driven web apps, there is a big difference between EJB performance and JDBC performance. And yes, MS has a very scalable data access technology in ADO.NET that works great with Oracle and SQL Server (and there are drivers from MS and IBM for ADO.NET on DB2 as well).

    2) Web processing engine-- JSP vs. ASPX. ASPX is very, very fast. That, I think, is just one advantage; productivity and underlying architecture are two others. That's a different topic.

    3) Efficiency of the ASP.NET/IIS Web stack vs. IBM's plugin model between Apache (IBM HTTP Server) and the backend WebSphere app server. Our stack is very integrated, and handles very heavy loads very efficiently.

    You may not be interested, but others may be more open. If you wanted to isolate the impact of each of these three areas on the overall Plants application, it's easy to do with our published code.

    a) Create a routine that just calls into the Data Access Layer of each implementation and run this on the server. Time 10,000 or so of the various operations (inserts, updates, single-record selects, multi-record selects, or a mix of all). No JSP/ASPX engine involved, no web stack involved. You now have an isolation of the data access speeds for the queries tested.

    b) Take a page in the app and strip out the data access logic call, and instead generate dummy data from the Data Access Layer. Run it through a web test tool like Mercury or whatever, and you now have a test of the JSP/ASPX engines in combination with the web stack of each platform. (It's extremely hard to separate the web stack from the JSP/ASPX engines, since they are used and integrated together for web scenarios.)

    c) Create a blank ASPX or JSP page that does little or nothing, and run this through Mercury or whatever; it will give you a decent idea of the relative perf of the web stacks alone and how efficiently they handle large concurrent user loads, receive HTTP requests, call into the app server layer, and then send results via HTTP back to the clients.
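    Step (a) can be sketched as a small timing harness like the one below. This is illustrative only: `dummySelect` is a stand-in for a call into the real Plants data access layer (not shown here), and the warm-up loop keeps JIT compilation out of the measured interval.

```java
import java.util.concurrent.TimeUnit;

public class DataAccessTimer {

    // Times N executions of an operation and reports operations per second.
    public static double opsPerSecond(Runnable operation, int iterations) {
        // Warm up first so JIT compilation does not distort the measurement.
        for (int i = 0; i < 1_000; i++) {
            operation.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            operation.run();
        }
        long elapsed = Math.max(System.nanoTime() - start, 1); // avoid divide-by-zero
        return iterations * (double) TimeUnit.SECONDS.toNanos(1) / elapsed;
    }

    public static void main(String[] args) {
        // Stand-in for a real data-access call (an insert, update or select).
        Runnable dummySelect = () -> Math.sqrt(42.0);
        System.out.printf("%.0f ops/sec%n", opsPerSecond(dummySelect, 10_000));
    }
}
```

    To test an insert instead of a select, only the `Runnable` body changes; the harness stays the same.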

    -Greg
  70. It's so fast...[ Go to top ]

    It's so fast that in all the JDBC vs. .NET benchmarks you did (no SQL Server), it came out slower than WebSphere, one of the slowest app servers around?
  71. It's so fast...[ Go to top ]

    1) On Oracle the .NET implementation was 75% faster than the fastest WebSphere/EJB implementation running against the same database. IBM created the EJB implementation, not MS.

    2) On Oracle, the .NET implementation was 15% slower than the faster WebSphere JDBC-only (no EJBs) version on Oracle. MS created the JDBC implementation, although it uses the same UI and business logic layers as the IBM/EJB implementation.

    3) The .NET/SQL Server 2005 implementation was 183% faster than the fastest WebSphere/EJB implementation tested (against Oracle).

    4) The .NET/SQL Server 2005 implementation was 38% faster than the fastest WebSphere implementation tested (WebSphere/Linux against an Oracle 10g backend).

    5) All .NET implementations beat WebSphere 6.0 by a wide margin in price/performance for this benchmark (data-driven web app)...even considering we used the lowest priced edition of WebSphere in the pricing calculation (Express). License costs of middle tier (supported OS + app server) were what was compared...not DB license costs (excluded, but those that wish to compare SQL pricing to Oracle can do so on their own).

    Total license costs (including support) of WebSphere + 1 dev seat for Rational App Developer (using Express edition pricing, and current IBM promo pricing for Rational App Developer): ~$12,000.00.

    Total license costs for .NET/Windows Server + 1 dev seat for Visual Studio: ~$6,000.00

    (exact pricing and breakdown/sources in the doc).

    6) As with any benchmark, it's a data point. The data can be interpreted however customers choose. We encourage customers to test for themselves--that's the only way to make the most informed decisions. They can download the kit, modify it, or do their own independent tests on WebSphere or other app servers and compare to .NET 2.0. The data we published, for the scenario tested, is very accurate, and we provided full disclosure (code, test scripts, data load, tuning details, etc.).

    7) As for WebSphere being one of the slowest app servers around, the benchmark makes no claim as to how this looks against other app servers. You may be right. It's a .NET 2.0 vs. WebSphere benchmark, not a .NET 2.0 vs. all-Java-app-servers benchmark.

    -Greg
  72. Tuning details[ Go to top ]

    You may not be interested, but others may be more open.

    You seem to have listened to your critics in the past, so maybe I've had a positive effect on your next benchmarking effort. So, please just take this as a peer review of this effort.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  73. Tuning details[ Go to top ]

    Also Kirk,
    Obviously you have a deep understanding of Java, tuning and performance issues in general. If you are interested in understanding where .NET is getting its performance advantage in this scenario, you will find, as I have said in previous posts, three core areas that happen to be core areas for many, many deployed apps:
    1) Data access/connectivity. Understanding that for straightforward DB operations like those in most data-driven web apps, there is a big difference between EJB performance and JDBC performance. And yes, MS has some a very scalable data access technology in ADO.NET that works great with Oracle and SQL Server (and there are drivers from MS and IBM for ADO.NET on DB/2 as well).
    2) Web processing engine-- JSP vs. ASPX. ASPX is very, very fast. That, I think, is just one advantage, productivity and underlying architecture are two others. That's a different topic.
    3) Efficiency of the ASP.NET/IIS Web stack vs. IBM's plugin model between Apache (IBM HTTP Server) and the backend WebSphere app server. Our stack is very integrated, and handles very heavy loads very efficiently.

    ...................

    -Greg
    I don't know about others, but I would like some further clarification of what "heavy loads" means in specific terms. I'm asking because I've worked on financial applications and tried using .NET 1.0/1.1 for them. My first-hand experience is that I had to build a lot of infrastructure myself, since it was missing from Microsoft's stack. Rather than talk abstractly, I'll give a concrete example. Part of this particular application has data entering through several different channels, ranging from a direct ADO connection to an OMS system, messaging and a compliance system. I wanted to be able to send a message if a particular table had inserts/updates, using triggers. In Java, I could easily do this with Java triggers. Yukon now has these features, but back in 2003-2004 the .NET option was either a polling mechanism or writing the events to a file. We did some testing and both of those techniques were not appropriate. We eventually had to design around these limitations by building our own enterprise message bus and requiring all communications to go through it. That was a huge task, and it ultimately failed because no developer on the project had experience building an enterprise-class message bus. Had the executives chosen Java instead of .NET at the start of the project, we would have saved 2 years of work.

    This application also has some major concurrency needs, so we went about creating 5 dozen benchmarks to measure performance as concurrency increases. My own findings showed that beyond 200 concurrent queries, SqlServer2K running on a quad-CPU system quickly hit a bottleneck. We then tested SqlXml, since one of the developers said "it's what Microsoft recommends." I don't know where he heard that, but we had to measure the performance. What I found is that with 6-8 concurrent SqlXml queries, the CPU was maxed out at 100%. It was 50-100x slower than plain ADO.NET. Once we presented the results, the project manager declared "we are not using SqlXml, so everyone has to change their code."

    Many of these types of things "just work" with existing J2EE products. The same is not true of the .NET 1.1 stack. I completely agree that simple data-driven applications are much faster in .NET, but for heavyweight applications that have large concurrency needs, .NET 1.1 is not well suited. Not only did I have to find workarounds for the limitations we found, but we had to constantly fight developers who didn't want to stray from the Microsoft stack or the prescribed approach. Since what I work on is heavyweight with all sorts of extreme requirements, the benchmark results are not valid for my needs. WebLogic and WebSphere are specifically designed and tuned for heavyweight financial applications.

    I would argue it isn't possible to make a server that is optimal both for the simplest case, like a data-driven site, and for heavy financial applications. I would love to be proven wrong and hope Microsoft can further the state of the art for heavyweight applications.

    peter lin
  74. Tuning details[ Go to top ]

    Peter, re:
    I don't know about others, but I would like some further clarification of what "heavy loads" means in specific terms. I'm asking because I've worked on financial applications and tried using .NET 1.0/1.1 for it. My first hand experience is that I had to build a lot of infrastructure myself, since they were missing in Microsoft's stack. Rather than talk abstractly, I've give a concrete example. Part of this particular application has data entering through several different channels ranging from direct ADO connection, OMS system, messaging and compliance system. I wanted to be able to send a message if a particular table had inserts/updates using triggers. In java, I would easily do this with java triggers. Yukon now has these features, but back in 2003-2004, the .NET option was either a polling mechanism or write the events to file. We did some testing and both of those techniques were not appropriate. We eventually had to design around these limitations by building our own enterprise messaging BUS and required all communications go through it. That was a huge task and ultimately failed because no developer on the project had experience building an enterprise class messaging BUS. Had the executes chosen Java instead of .NET at the start of the project, we would have saved 2 years of work.

    This application also has some major concurrency needs, so we went about creating 5 dozen benchmarks to measure performance as concurrency increases. My own findings showed that beyond 200 concurrent queries, SqlServer2K running on a quad CPU system quickly hit a bottleneck. We then tested SqlXML, since one of the developers said "it's what Microsoft recommends." I don't know where he heard that, but we had to measure the performance. What I found is that with 6-8 concurrent sqlxml queries, the CPU was maxed out at 100%. It was 50-100x slower than plain ADO.net. Once we presented the results, the Project manager declared "we are not using SqlXml, so everyone has to change their code."

    A couple of things here. I'll assume you used the .NET provider with SQL Server, and were not going through the OLEDB/.NET driver (which has always had big perf issues). We have had good success (and growing) in large-scale, transaction-heavy scenarios, but proper architecture is always key, no matter the technology. Over the past 2 years we have spent a great deal of time creating architectural guidance via our Prescriptive Architecture Group (PAG). This group publishes the Enterprise Patterns for .NET on MSDN, tuning/security guides and the like.

    Also, a lot changed between .NET 1.0, 1.1 and now .NET 2.0 (as well as SQL 2K and SQL 2005). I would love to run a benchmark for a scenario you deem high-volume/transacted. Give me a functional specification and I will invest the time in this. As with any benchmark, it will be a starting point, but very good for discussion and learning, and publishing techniques that work better than others.

    -Greg
  75. Tuning details[ Go to top ]

    Peter, re:
    I don't know about others, but I would like some further clarification of what "heavy loads" means in specific terms. I'm asking because I've worked on financial applications and tried using .NET 1.0/1.1 for it. My first hand experience is that I had to build a lot of infrastructure myself, since they were missing in Microsoft's stack. Rather than talk abstractly, I've give a concrete example. Part of this particular application has data entering through several different channels ranging from direct ADO connection, OMS system, messaging and compliance system. I wanted to be able to send a message if a particular table had inserts/updates using triggers. In java, I would easily do this with java triggers. Yukon now has these features, but back in 2003-2004, the .NET option was either a polling mechanism or write the events to file. We did some testing and both of those techniques were not appropriate. We eventually had to design around these limitations by building our own enterprise messaging BUS and required all communications go through it. That was a huge task and ultimately failed because no developer on the project had experience building an enterprise class messaging BUS. Had the executes chosen Java instead of .NET at the start of the project, we would have saved 2 years of work.This application also has some major concurrency needs, so we went about creating 5 dozen benchmarks to measure performance as concurrency increases. My own findings showed that beyond 200 concurrent queries, SqlServer2K running on a quad CPU system quickly hit a bottleneck. We then tested SqlXML, since one of the developers said "it's what Microsoft recommends." I don't know where he heard that, but we had to measure the performance. What I found is that with 6-8 concurrent sqlxml queries, the CPU was maxed out at 100%. It was 50-100x slower than plain ADO.net. 
Once we presented the results, the Project manager declared "we are not using SqlXml, so everyone has to change their code."

    A couple of things here. I'll assume you used the .NET provider with SQL Server, and were not going through the OLEDB/.NET driver (which has always had big perf issues). We have had good success (and growing) in large scale transaction-heavy scenarios, but proper architecture is always key, no matter the technology. Over the past 2 years we have spent a great deal of time on creating architectural guidance via our Prescriptive Architecture Group (PAG). This group publishes the Enterprise Patterns for .NET on MSDN, tuning/security guides and the like.

    Also, a lot changed between .NET 1.0, 1.1 and now .NET 2.0 (as well as SQL 2K and SQL 2005). I would love to run a benchmark for a scenario you deem high-volume/transacted. Give me a functional specification and I will invest the time in this. As with any benchmark, it will be a starting point, but very good for discussion and learning, and publishing techniques that work better than others.

    -Greg

    I had several performance requirements given to me, and most of them were works of fiction not grounded in reality. What I consider moderate to heavy concurrent load would be 500-1000 queries. Given that SqlServer 2K has a limit of 250 connections, it's obvious that one has to use COM+ as a middle layer. Although COM+ could have worked, we needed a way to make sure data changes were propagated across the cluster. That means building our own event notification mechanism on top of COM+ to do this. Again, that led to the need for an enterprise message bus.

    Heavy requirement to me is 1000+ concurrent queries. Clearly, SqlServer 2K can't handle that. I haven't kept up with the latest features in Yukon, so perhaps that has changed and SqlServer 2005 can handle this kind of load. These kinds of loads are very common for mid-sized banks. The larger banks obviously have even bigger requirements. I suppose one could support these kinds of loads on a HP Superdome, assuming SqlServer can handle more than 250 concurrent connections. In terms of transaction rates, it's common for mid-sized banks to handle 5-10K transactions/second during the busy hours and half that on the slow hours. For really big banks, I've heard some crazy loads.

    What would be interesting would be to see the results of the Plants example with a few changes:

    1. store images as blobs
    2. cache the images on the mid-tier
    3. log every page view in an audit log
    4. run it over SSL
    5. if the audit log fails, the application should show an error and not allow the user to do anything
    6. use pragma no-cache for all pages and prevent the browser from caching

    That would be the minimum requirement for someone building a banking application. A system that doesn't perform well with these basic requirements honestly would not be considered viable. I look forward to playing with the new features of VS2005.
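    As a footnote on requirement 6, the no-cache headers themselves are the easy part on either platform. A self-contained Java sketch using the JDK's built-in HTTP server (the page body and path are invented for illustration; in a servlet container the same headers would be set in a filter):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class NoCacheServer {

    // Starts a server whose every response carries no-cache headers.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            // Covers HTTP/1.1 clients (Cache-Control), HTTP/1.0 clients
            // (Pragma) and date-based caches (Expires).
            exchange.getResponseHeaders().set("Cache-Control", "no-cache, no-store, must-revalidate");
            exchange.getResponseHeaders().set("Pragma", "no-cache");
            exchange.getResponseHeaders().set("Expires", "0");
            byte[] body = "<html><body>account page</body></html>".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

    Passing port 0 binds an ephemeral port, which is handy for testing.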

    peter
  76. Tuning details[ Go to top ]

    Peter,

    I like the scenario you highlight, I will try to get this in place for the near future, so stay tuned.

    One note on your comments, however. Actually SQL 2000 has a concurrent user connection limit of 32,767 I believe, not 250.

    see: http://support.microsoft.com/kb/320728
    which I quote here:
    SUMMARY
    In SQL Server 7.0 and SQL Server 2000, administrators can use the sp_configure stored procedure to modify configuration settings. One of the settings that you can modify is the user connections option. When you install SQL Server, the default value for user connections is 0 (32767 concurrent connections). Microsoft recommends that you do not change the default user connections setting.

    Individual connection pools, however, are limited (I believe) to 150 connections per pool. This is typically plenty for most server-based workloads, unless queries are very long running. But you can always create multiple pools. To create multiple ADO.NET connection pools you simply create different connection strings, varying some element in the string (different authentication parameters, for example, or simply reversing the login/password fields so the connection strings are physically different); this creates a second pool.

    Plants, for example, was supporting about 2500 SQL Server 2005 queries per second on our 4-proc SQL Server box for the benchmark, at about 30% CPU load. However, these queries are very simple, and there is no distributed transaction involved (although there are transactions on order inserts). To support a user load of 3,000 concurrent users, with 2500 queries per second (mix of updates, inserts, selects), we used 75 connections in the pool, with just a single pool.
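    As a sanity check, the pool sizing quoted above is consistent with Little's law (busy connections ≈ query rate × time each query holds a connection). The 30 ms hold time below is inferred from the 2,500 qps / 75 connection figures for illustration; it is not stated in the post.

```java
public class PoolSizing {

    // Little's law: average concurrency = arrival rate x time in system.
    // Here: busy connections = queries/sec x seconds each query holds a connection.
    public static double connectionsNeeded(double queriesPerSecond, double avgQuerySeconds) {
        return queriesPerSecond * avgQuerySeconds;
    }

    public static void main(String[] args) {
        // 2,500 queries/sec, each holding its connection ~30 ms
        // => ~75 busy connections, matching the single 75-connection pool above.
        System.out.println(connectionsNeeded(2500, 0.030));
    }
}
```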

    Also, COM+ is not necessary to achieve server-side connection pooling; this is just part of ADO.NET ( including our providers for Oracle, SQL Server, etc). Database connection pooling is not a COM+ feature.

    Basically today, while you can create COM+ components from .NET assemblies (components that inherit from ServicedComponent and have a GUID), this is now typically for interop purposes only with VB6, VC++ and older apps. It definitely slows things down. With .NET 2.0, we have a new System.Transactions namespace which allows you to fully utilize the distributed transaction coordinator (including XA support for heterogeneous distributed transactions) without creating a COM+ component out of your .NET classes. This improves performance, and also makes deployment easier.

    On SQL Server 2005, System.Transactions is smart enough to keep a transaction local rather than promote it to a distributed transaction if the two databases are co-located on the same physical SQL box. But even here, it's best not to mark methods as transacted if they do not need to be---only invoke the tx manager for actual tx work (single queries, for example, do not need to invoke the TX coordinator). Also, for simpler, non-distributed transaction scenarios, System.Transactions is not needed; you can simply use ADO.NET transactions, which work a lot like JDBC transactions in code. Sometimes for architectural reasons, however, it is useful to use System.Transactions, since it flows transaction context through called methods automatically.
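    For Java readers, the "ADO.NET transactions work a lot like JDBC transactions" comparison refers to the standard JDBC local-transaction pattern, sketched below. The helper class is invented for illustration; a real app would get the `Connection` from its pool and use `PreparedStatement`s.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class LocalTransaction {

    // The classic JDBC local-transaction pattern: disable auto-commit,
    // commit on success, roll back on failure, then restore the
    // connection's previous auto-commit mode before returning it to the pool.
    public static void runInTransaction(Connection con, String... statements) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try (Statement stmt = con.createStatement()) {
            for (String sql : statements) {
                stmt.executeUpdate(sql);
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```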

    -Greg
  77. Re: Luis benchmark[ Go to top ]

    I have Resin numbers and ASPX numbers generated from your posted benchmark code using LoadRunner, after playing with the Resin tuning a bit and following their tuning guide, trying different heap sizes, thread settings, keepalives and the like. I used the latest JRockit 5.0.0_3 as you suggested; it does better than Sun JDK 1.5 release 5.

    In these runs, the benchmark tools captures all returned errors, so you know for sure whether you are getting valid content back and how many clients either have errors or connection refused errors.

    I ran first with no think time and 20 simulated users (spread across 10 physical machines) and no connection resetting between requests (each client gets a connection and continues to use it through the duration of the test). This is basically how you ran your first test using the freebie MS ACT load-driver program (I admit I am not very familiar with it).

    Then I ran to simulate more real-world conditions, with 20 physical client machines and a 1-second think time between requests. In this scenario (like Plants), 1 user equates to 1 transaction per second until a bottleneck is reached and the system is saturated. So it takes at least 1,000 concurrent users, sometimes many more, to saturate the systems. In this scenario, each client runs an iteration of 5 pages, then exits, and a new client comes in on a new connection (simulating users that visit 5 pages in a visit, waiting 1 second between each request).

    But I am waiting to post the numbers until I can run with the non-trial version of Resin, since the trial version does not support JNI, which their FAQ says will make a performance difference with respect to handling keepalives. It will take up to 2 days to get the license by email after purchasing a copy for $500.00. This may make a big difference in the numbers; it will be interesting to see.

    So I will publish the data somewhere on a personal public-hosted site on my own time for review and comment in the next couple of days, and post that info here on this thread at that time.

    One key thing to keep in mind, as I pointed out, is that IIS/ASPX is running with full process isolation between the app and the web server (as long as you run IIS 6, this is always the case); Resin is not process-isolated from the HTTP server, it's in one process, which is fast but not as safe/fault-tolerant. Also, Resin on Apache or IIS might be slower/lower tps on no-think-time tests than when running with an in-process HTTP server, but it may be better in the 1-second think time tests, since these configurations may do a better job at handling incoming traffic/connections from large numbers of simulated users, etc. Look for the results in the next couple of days.

    -Greg

    PS: I also ran on 1CPU, 2CPU and 4CPU configs to see what the scaling was like.
  78. Re: Luis benchmark[ Go to top ]

    I have Resin numbers and ASPX numbers generated from your posted benchmark code using LoadRunner, after playing with the Resin tuning a bit and following their tuning guide, trying differnt heap sizes, thread settings, keepalives and the like. I used the latest JRockit 5.0.0_3 as you suggested, it does better than Sun JDK 1.5 release 5.

    That was what I expected; I keep hearing that the x64 JVM from Sun has multi-threading issues. BEA sure has a wonderful product in their hands, and the soon-to-be-released JRockit update will no doubt improve things even further.

    I ran first with no think time and 20 simulated users (spread across 10 physical machines) and no connection resetting between requests (each client gets a connection and continues to use it through duration of test). This is basically how you ran your first test using the freebie MS ACT load-driver program (I admit I am not very familiar it).

    *sigh*... I will say it again: all my tests were run *with* connection resets at the end of each request. Look at the scripts, there is a "connection.close" in every one of them.


    One key things to keep in mind, as I pointed out, is that IIS/ASPX is running with full process isolation between the app and the web server (as long as you run IIS 6, this is always the case); Resin is not process isolated from the http server, its in one process which is fast but not as safe/fault tolerant.

    Not true. There are several ways to achieve this with Resin, read the docs... some solutions need more hardware and some don't.

    Also, Resin on Apache or IIS might be slower/lower tps on no-think-time tests than when running with an in-process http server, but it may be better in the 1 sec think time tests since these may do a better job at handling incoming traffic/connections from large numbers of simulated users, etc. Look for the results in next couple of days.

    Ok, I look forward to the results.

    Regards,
    Luis Neves
  79. Tuning details[ Go to top ]

    Peter, I like the scenario you highlight, I will try to get this in place for the near future, so stay tuned. One note on your comments, however. Actually SQL 2000 has a concurrent user connection limit of 32,767 I believe, not 250. See: http://support.microsoft.com/kb/320728 which I quote here:
    SUMMARY: In SQL Server 7.0 and SQL Server 2000, administrators can use the sp_configure stored procedure to modify configuration settings. One of the settings that you can modify is the user connections option. When you install SQL Server, the default value for user connections is 0 (32767 concurrent connections). Microsoft recommends that you do not change the default user connections setting.

    Individual connection pools, however, are limited (I believe) to 150 connections per pool. This is typically plenty for most server-based workloads, unless queries are very long running. But you can always create multiple pools. To create multitple ADO.NET connection pools you simply create different connection strings, varying some element in the string (different authentication parameters, for etc., or simply reverse the login/password fields so the connection strings are physically different), this creates a second pool.

    Plants, for example, was supporting about 2500 SQL Server 2005 queries per second on our 4-proc SQL Server box for the benchmark, at about 30% CPU load. However, these queries are very simple, and there is no distributed transaction involved (although there are transactions on order inserts). To support a user load of 3,000 concurrent users, with 2500 queries per second (mix of updates, inserts, selects), we used 75 connections in the pool, with just a single pool.

    Also, COM+ is not necessary to achieve server-side connection pooling; this is just part of ADO.NET (including our providers for Oracle, SQL Server, etc). Database connection pooling is not a COM+ feature. Basically today, while you can create COM+ components from .NET assemblies (components that inherit from ServicedComponent and have a GUID), this is now typically for interop purposes only with VB6, VC++ and older apps. It definitely slows things down. With .NET 2.0, we have a new System.Transactions namespace which allows you to fully utilize the distributed transaction coordinator (including XA support for heterogeneous distributed transactions) without creating a COM+ component out of your .NET classes. This improves performance, and also makes deployment easier.

    On SQL Server 2005, System.Transactions is smart enough to keep a transaction local rather than promote it to a distributed transaction if the two databases are co-located on the same physical SQL box. But even here, it's best not to mark methods as transacted if they do not need to be---only invoke the tx manager for actual tx work (single queries, for example, do not need to invoke the TX coordinator). Also, for simpler, non-distributed transaction scenarios, System.Transactions is not needed; you can simply use ADO.NET transactions, which work a lot like JDBC transactions in code. Sometimes for architectural reasons, however, it is useful to use System.Transactions, since it flows transaction context through called methods automatically.

    -Greg

    Thanks for correcting my error. I was thinking of connection pooling, which I thought was 250 max, not 150. I'll take your word on it, since you clearly know more.

    If someone really needed a driver that can handle more than 500 connections, they could always write their own connection manager wrapping ODBC or OLEDB, though I question the value and productivity of doing that. When Oracle 8i first came out, the JDBC driver didn't support connection pooling, so we wrote our own. In my biased opinion, it wasn't time well spent, and we ended up replacing our pool manager when Oracle released new drivers.

    I've only peeked at the new transaction namespace in .NET 2.0, and I think that's definitely going to help.

    One area I think MS could improve on is providing better guidance for .NET developers. I had 3 developers tell me, "we should lock the record when the user clicks the drop-down." Clearly, that is a terrible thing to do, and it isn't Microsoft's fault. It took me several weeks to prove how bad that idea was. Better guidance would reduce the frequency of these kinds of developer issues.
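    For what it's worth, the guidance that usually replaces "lock on drop-down click" is optimistic concurrency: each record carries a version, and a write succeeds only if the version is unchanged since it was read. A toy in-memory sketch of the idea (class and method names are invented; against a database this is the usual `UPDATE ... WHERE id = ? AND version = ?` pattern):

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticRecord {

    // Immutable value-plus-version pair, swapped atomically.
    private static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    private final AtomicReference<Versioned> state = new AtomicReference<>(new Versioned("", 0));

    public long readVersion() { return state.get().version; }
    public String readValue() { return state.get().value; }

    // Succeeds only if nobody else updated since the caller read expectedVersion.
    public boolean update(String newValue, long expectedVersion) {
        Versioned current = state.get();
        if (current.version != expectedVersion) {
            return false; // stale read: caller must re-read and retry
        }
        return state.compareAndSet(current, new Versioned(newValue, current.version + 1));
    }
}
```

    Nothing is ever held locked while a user stares at a screen; conflicts are detected, not prevented.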

    peter
  80. I followed your advice.[ Go to top ]

    2) Web processing engine-- JSP vs. ASPX. ASPX is very, very fast.

    No... ASPX is *not* very, very fast.
    b) Take a page in the app and strip out the data access logic call, and instead generate dummy data from the Data Access Layer. Run it through a web test tool like Mercury or whatever, and you now have a test of the JSP/ASPX engines in combination with the web stack of each platform

    Well I did just that... here are the code and the results.
    I used the "Application Center Test" tool that comes with Visual Studio 2003 as web test tool. You will need to use this application to open the Test project.


    Some notes:
    -The code used was mostly taken from the Wicket web framework source repository but translated to JSP.
    -The machine used is an old Athlon box under the desk, which is far from being an enterprise class server machine.
    -There is no database involved.
    -The servlet engine is Resin, the GPL version.
    -The JVM is JRockit.
    -The OS is Windows Server 2003 SP1
    -I did the same optimizations to Windows and IIS as the ones mentioned in this paper.
    -Of course this test is bogus, but I submit to you that it's not more bogus than the one presented by MS... after all, it's just a data point and the code is available so people can test for themselves.

    The results can also be seen in this pretty picture.

    The upper two lines are for the JRockit and Sun JVMs and the bottom two are for ASP.NET 2.0.
    As you can see the performance of the fastest Java implementation is twice the performance of the slowest ASP.NET implementation.

    So let's play "make believe"... pretend that I'm an IBM customer, and I want to serve web pages really fast.
    Why should I use .NET?... I can have twice as much performance essentially for free by sticking with Java.

    Regards,
    Luis Neves
  81. I followed your advice.[ Go to top ]

    Luis, some notes:
    After looking at the code, the only thing that can be learned from it is "How *not* to do it".

    - Do not use string concatenation in loops (request.jsp, line 220).

    This is how IBM coded it.
    - Do not use the piss poor excuse for a Jdbc driver that MS offers as the official driver for SQL Server, use Jtds instead.

    We want to provide a better JDBC driver than what we provided in the past with SQL2K. That is why we are working on this new one, still in beta though. If you have feedback on why this one sucks for you, be specific; we *really* do want to make it very good for Java developers.

    - If your method signature declares that it throws Exception, please refrain from catching Exception and re-throwing it.
    - If you don't know exactly what to do with caught exceptions and you feel that logging is "The right thing to do", please don't use System.out.println() to do it - Util.java, line 305 - because if for some reason you start to get exceptions you will bring the server to its knees.

    The app throws exceptions to make sure the benchmarking software picks up on all exceptions. With just a catch and a printStackTrace(), all you would get is a blank page and LoadRunner would track it as a completed transaction--leading to bogus results. For non-benchmark scenarios, of course you do not want 500 errors showing up in the browser; it would be handled differently. At any rate, as mentioned earlier, no exceptions are being thrown during the benchmark timing runs, so it has no impact on perf.
    - If you store images as Blobs in the DB, resist the urge to store them in the file system for performance reasons *only*; there might very well be good reasons to store them in the DB: privacy/security concerns, logical/relational integrity reasons, or deployment issues.

    True, in some cases one approach is better than the other depending on requirements. Each is valid...in the benchmark all implementations are doing it the same way so treatment is consistent.
    - In Java, arrays are already bounds-checked; there is little reason to do it yourself, unless you *really* want to be sure - Util.java, line 164

    Again, this is how IBM coded it.

    - Do not use floats when dealing with currency - Util.java, line 175

    Again, how IBM coded it.
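    For reference, the two fixes under discussion (string concatenation in loops, floats for currency) might look like this in Java; the class and method names below are invented for illustration, not taken from PlantsByWebSphere:

```java
import java.math.BigDecimal;

public class CodeIssues {
    // Repeated String "+=" in a loop copies the whole buffer on every pass;
    // StringBuilder appends in place instead.
    static String buildRows(String[] items) {
        StringBuilder html = new StringBuilder();
        for (String item : items) {
            html.append("<tr><td>").append(item).append("</td></tr>");
        }
        return html.toString();
    }

    // Currency in float accumulates binary rounding error; BigDecimal with an
    // explicit scale stays exact for decimal amounts.
    static BigDecimal total(BigDecimal unitPrice, int qty) {
        return unitPrice.multiply(BigDecimal.valueOf(qty)).setScale(2);
    }

    public static void main(String[] args) {
        System.out.println(buildRows(new String[] { "rose", "fern" }));
        float f = 0f;
        for (int i = 0; i < 10; i++) f += 0.1f;
        System.out.println(f);  // not exactly 1.0 -- binary rounding error
        System.out.println(total(new BigDecimal("19.99"), 3));
    }
}
```

    Neither fix changes the app's behavior at the scale tested, which is why the "mundane issues" framing above is not unreasonable, but both are the idiomatic form.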



    -Greg
  82. I followed your advice.[ Go to top ]

    - Do not use the piss poor excuse for a Jdbc driver that MS offers as the official driver for SQL Server, use Jtds instead.
    We want to provide a better JDBC driver than what we provided in the past with SQL2K. That is why we are working on this new one, still in beta though. If you have feedback on why this one sucks for you, be specific; we *really* do want to make it very good for Java developers.

    Thank you for working on the JDBC driver. I'm glad you're looking at things like IBM's DB2 folks do -- they like WebSphere or PHP, but they're perfectly fine with .Net as long as you're using DB2.

    I've tested it a bit and it's leaps and bounds better than the old one. However, this benchmark shows that you still have some work to do. The JDBC driver will have a hard time gaining traction if performance isn't comparable to the .Net provider.
  83. In your response to Luis:

    "
    After looking at the code, the only thing that can be learned from it is "How *not* to do it".

    - Do not use string concatenation in loops (request.jsp, line 220).

    This is how IBM coded it.

    ........ This is how IBM code... This is how IBM coded it...
    "

    Please ANSWER THIS SINGLE QUESTION :
       DID IBM TELL YOU THIS IS BENCHMARK-WORTHY CODE?

    If so, please supply the document where IBM expressly says "this is how we think apps should be implemented, and this is efficient code."

    - YOU (MS) chose this code, and chose to use it for a benchmark.

    Therefore, you can't point the finger at IBM. This ruins your credibility! You just don't get it!

    If you want to make a REAL POINT, do your homework. Once you do this, you can score real points. You know, it's funny - I bet you could do an excellent marketing job for MS in the J2EE community if you would just listen. If you really have a quality product, I think you could pick off customers, because canonical J2EE usage is overly complex. But it's really sad: you guys keep putting out these really bad benchmarks and articles, and it just makes you look dishonest. It's very unfortunate, because I believe the better .NET is... the better J2EE will be. Also, as an Application Architect, I am more interested in getting my job done than arguing about technology. I will use the best tool for the job.

    thanx
  84. I followed your advice.[ Go to top ]

    - In Java, arrays are already bound checked, there is little reason to do it yourself, unless you *really* want to be sure - Util.java, line 164
    Again, this is how IBM coded it.
    - Do not use floats when dealing with currency - Util.java, line 175
    Again, how IBM coded it.-Greg
    IBM coded it as a demonstration of different techniques, not as a benchmark.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  85. I followed your advice.[ Go to top ]

    Yet PlantsByWebSphere uses IBM's primary design pattern for data-driven Web apps:

    JSP/servlets activating stateless session beans, front-ending entity beans using CMP 2.x.

    Really, it's a very straightforward application that lends itself well to testing---and I think the JDBC numbers are quite interesting.

    With J2EE PetStore, people raised the same issue even though Sun themselves at the time were publishing technical architecture blueprint books on the app, and many customers were using it as a design pattern. On the other hand, at the time the issues were more complex:

    -Use of stored procedures in .NET app
    -Local vs. remote interfaces in EJBs
    -JNDI caching
    etc.

    Now, with Plants at least we are down to more mundane issues like:

    -string concatenation
    -use of floating point for currency


    -Greg
  86. I followed your advice.[ Go to top ]

    Now, with Plants at least we are down to more mundane issues like:
    -string concatenation
    -use of floating point for currency
    -Greg

    err, you are forgetting scale. The benchmark is too coarse-grained to be meaningful. This was the same problem with PetStore, aside from the fact that you ran it as two-tier whereas the J2EE implementation was three.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  87. I followed your advice.[ Go to top ]

    On the other hand, at the time the issues were more complex:
    -Use of stored procedures in .NET app
    -Local vs. remote interfaces in EJBs
    -JNDI caching
    etc.
    Now, with Plants at least we are down to more mundane issues like:
    -string concatenation
    -use of floating point for currency

    These are not mundane things; the code hurts the eyes, and when I mentioned them it was not because of performance - the .NET code is in many places a direct translation of those silly things.
    I mentioned them in relation to the "you could learn something from it" comment.
    My position is that the code sucks and can only serve as an example of what not to do.
    Some of the things that I would do differently would incur a performance penalty.

    Regards,
    Luis Neves
  88. I followed your advice.[ Go to top ]

    Kirk,

    Actually, as Luis points out, these benchmarks are a good opportunity for learning experiences. We learned a lot with .NET PetShop--some of the most vocal heat we took was actually from within Microsoft--our consulting reps and many of our technical reps on accounts, as well as many in the Java community, pointed out issues with our original PetShop 1.0 implementation. The last benchmark that was published on .NET PetShop vs. WebSphere and one other commercial J2EE app server was with PetShop 3.0. In this implementation, we went to a fully abstracted logical 3-tier architecture, and we ran on both SQL Server and Oracle. The PetShop 3.0 application is a good one to illustrate for enterprise .NET shops a basic design template for their own apps (many have used it thus).

    It is a fully abstracted 3-tier architecture; much like PlantsByWebSphere, it uses model objects to represent data records. These are passed between the UI tier (ASPX/HTML), the business logic tier (separate namespace, separate assembly), and the data tier. The data tier is loaded dynamically at runtime based on whether the config option specifies Oracle or SQL Server; there are no references to the database implementation in either the UI or the business logic tier. And beginning with .NET PetShop 2.0 we also stopped using stored procs completely, to equalize with the J2EE implementation. We still believe stored procs offer advantages in terms of security and manageability in many cases, and in some cases perf. For PetShop though, where all queries were very straightforward, it turned out prepared SQL statements were just as fast.
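    The dynamic data-tier loading described above is typically done with a small reflective factory driven by configuration; a sketch of the idea (class names invented for illustration, not the actual PetShop source):

```java
// Hypothetical factory: the UI and business tiers only ever see this class,
// so no tier references a concrete database implementation directly. The
// class name would come from a config file, e.g. "OracleOrderDao" or
// "SqlServerOrderDao".
public class DaoFactory {
    public static Object create(String implClassName) throws Exception {
        // Load and instantiate the configured implementation at runtime.
        return Class.forName(implClassName).getDeclaredConstructor().newInstance();
    }
}
```

    Swapping databases then means editing configuration, not recompiling the upper tiers - the same property being claimed for both the PetShop and PlantsByWebSphere data layers.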

    In those tests, completed in 2003, PetShop 3.0 with a proper logical 3-tier architecture ran with similar perf results and still tested out fastest, even after the J2EE version was further optimized for local interfaces, better caching, etc. However, Middleware never created an all-JDBC version, which I actually wanted them to do, given the debate/feedback around EJBs and when they are appropriate and when they are not. The Plants benchmark is more interesting because we did create an all-JDBC version with servlets, as many in the Java community wanted to see.

    In terms of tiering, while the architecture of both the .NET implementation and the J2EE implementation was 3-tier, we did run the tests in a two-tier physical environment, with the web app activating local .NET assemblies and, for J2EE, local EJBs with local interfaces. This will always perform fastest. An interesting extension to the Plants benchmark (or others) would be to start remoting the back-end components, so not only would you have a logical 3-tier architecture, but you would also have a physical deployment architecture in 3 tiers. I believe many customers overuse physical 3-tiers (even for intranet apps), but testing the relative performance of various remoting technologies would be very interesting because this is also a mainstream case for both .NET and J2EE:

    -RMI vs. .NET binary remoting
    -RMI and .NET binary remoting vs. XML/Web Services via SOAP.

    One of the things we are beginning to recommend for mainstream scenarios with .NET is to use SOAP whenever possible for remoting, since it gives you interop with J2EE app servers, and we have spent a ton of time optimizing our SOAP serialization engine (see the Sun-created WSTest benchmark for details). It's also easier to do in .NET. At any rate, no benchmark paints the complete picture; you are right. However, I still firmly believe testing coarser-grained "solutions" to be most interesting for customers. Not for the developers of the system software itself (although it's extremely important for them too, since it more closely simulates real-world conditions with product integration), but for customers who deploy end-to-end solutions with a packaged product like WebSphere or .NET.

    Finer grained benchmarks are important too, especially when it comes to making architectural decisions within an app, or for the software vendor to isolate where they need to concentrate on performance (for example, at MS of course we micro-benchmark all the time, looking at our Web stack, our garbage collection, serialization engines and the like).

    As for Plants, I maintain that while you could build it 10 different ways in Java (and different ways in .NET as well), the approach taken is very mainstream, the code issues are minor, and the results are very accurate for the scenario tested. Again, I agree you cannot claim the results apply outside of what was tested; on the other hand, what was tested was a very mainstream type of app that customers, large and small, deploy all the time. And the underlying technologies as tested in the app are used in most more complex apps all the time as well.

    I am willing to admit when I am wrong (I am passionate about this stuff, just as many Java developers are). That is why we publish our code, took the time to create revs of the PetShop code to answer the feedback, and did a second round of testing. And that is why I am willing to run Luis's code for his micro-benchmark on Resin. It's actually fun and you learn a lot in the process.

    -Greg
  89. I followed your advice.[ Go to top ]

    Luis, re:
    These are not mundane things; the code hurts the eyes, and when I mentioned them it was not because of performance - the .NET code is in many places a direct translation of those silly things.
    I mentioned them in relation to the "you could learn something from it" comment.
    My position is that the code sucks and can only serve as an example of what not to do.
    Some of the things that I would do differently would incur a performance penalty.
    Regards,
    Luis Neves

    Make a comprehensive list of the changes you want to see in the WebSphere implementation, and I will have these changes made and re-publish new results. I don't think the results will be any different, but I am perfectly willing (even eager) to implement your changes and find out. So far I have:

    -Exception handling: just show me how you want exceptions handled; as long as LoadRunner can catch 500 errors, that's fine.

    -Changing all floating point to decimal types. What type specifically do you want used?

    -No string concats in loops. We will have this changed in all implementations.

    This is all good feedback, it will lead to a better codebase/sample app for both WebSphere and .NET.

    -Greg

    PS: Still running new Resin numbers for your benchmark with JRockit. Look for update later.
  90. I followed your advice.[ Go to top ]

    Still running new Resin numbers for your benchmark with JRockit. Look for update later.


    http://www.theserverside.com/news/thread.tss?thread_id=37516#190974


    .V
  91. I followed your advice.[ Go to top ]

    Hi Luis,

    This is actually a simple (as you point out) but interesting test in terms of isolating the JSP/ASPX engines + the web stack--no data access! First, I really do not want to get into a personal battle; you have taken the time to look at the code, comment on it, point out potential flaws, and even create your own benchmark. This is very cool and more than 99% of people ever bother to do--thank you.

    So, I have downloaded your code, and installed Resin on a 2 x 1.8 GHz AMD Opteron machine. It's pretty cool.

    A couple of questions on stuff not in your disclosure:

    a) What web server did you run on? The built-in http server Resin ships with (port 8080 by default), or another? Did you try with the plugin for Apache? Is the app server process isolated from the web server in the config you ran, for proper fault tolerance?

    b) When you ran WebCat, what settings did you use? My guess (I may be wrong) is that you ran with no think time and no connection resets between requests so you could run with just a few client threads to keep it simple. Can you verify?

    On point A, you could get very different results using the Apache plugin for Resin and running with a production quality web server. I do not know technically whether their plugin approach with Apache maintains process isolation between the web server and the app server; I did verify their provided http server does not. This is important for high volume sites because most do not want to run their web server in a single process with their apps; an app can quickly bring down the whole web server. It's equivalent to running a web site with IBM WebSphere going against port 9080 (the built-in http listener w/ WebSphere) vs. running WAS with a full production server like IBM HTTP Server (Apache). .NET 2.0/IIS runs process isolated by default (you can even create multiple app pools for isolating several different apps from each other....but apps always run process-isolated from IIS/InetInfo.exe).

    IBM runs process isolated when running with IBM HTTP Server. Once you go to a process isolated model w/ a plugin you will likely find very different perf metrics. It tends to slow a lot of app servers down quite a bit because of the extra process hop----but it is typically necessary/desired for high-volume availability. This is one reason I state our .NET stack is very fast: because of the integration we do between IIS/ASPX worker processes.

    On Point B)

    i. It's important to simulate a realistic-as-possible scenario in the lab. One key element here is making sure to simulate *new* users on each new request from the client drivers. Otherwise the client driver is just always reusing the same open connection, and you are not testing the ability of the backend app server/web server to handle keepalives, new connections, closing connections, etc., as happens on a production site.

    ii. It's also important to run with a client think time. Too many benchmarks, including some industry standard benchmarks, do not require this. Hence, it takes only 20 or so clients (threads from the benchmark driver) to saturate a server and achieve peak throughput. The server is not running under high concurrent user loads, and most app servers are much more efficient and fast with just a few clients than when they have to deal with several hundred concurrent users or so. By using a one second think time, like we did with Plants, you at least approach a more realistic test, with the app server having to handle many, many more concurrent clients to achieve peak throughput. For an example of how this impacts a test, read

    http://www.msdn.microsoft.com/vstudio/java/compare/webserviceperf/default.aspx


    (don't yell at me so much for this one; Sun wrote this benchmark originally and published their results--they even wrote the C# code, which was fine; we responded to the methodology they used and got **very** different results with the right methodology for a web-based test)
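    The arithmetic behind point (ii) is Little's law: concurrent users ≈ throughput × (response time + think time). A small sketch with illustrative numbers (not taken from the benchmark results) shows why think time changes the load shape so drastically:

```java
public class LittlesLaw {
    // Little's law: users = throughput (tps) * (response time + think time), in seconds.
    static double usersNeeded(double tps, double respSec, double thinkSec) {
        return tps * (respSec + thinkSec);
    }

    public static void main(String[] args) {
        // With no think time and a 10 ms response, ~2000 TPS needs only ~20 clients,
        // so a handful of driver threads saturate the server.
        System.out.println(usersNeeded(2000, 0.010, 0.0));
        // With a 1-second think time, the same 2000 TPS forces the server to hold
        // roughly 2020 concurrent users -- a very different workload.
        System.out.println(usersNeeded(2000, 0.010, 1.0));
    }
}
```

    This is exactly the "20 or so clients saturate the server" versus "several hundred concurrent users" distinction described above.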

    At any rate, I do not know:

    -what web server you used

    -the process-isolation level between the web server and the app server

    -whether you ran with connection resets to simulate new users establishing connections during the test run

    -whether you ran with a think time or not which would force the server to handle many more concurrent users (vs. 20 or so) before peaking tps.

    ------------------------------------------
    But I did run your tests after installing Resin with their http server. Note again, in this config it provides no process isolation, although maybe their Apache plugin with Resin does. In-process (no process isolation) is always faster, but much less safe. In the tests I quickly ran, I ran with

    a) no connection resets and no think time;

    b) with connection resets, no think time; and

    c) with connection resets and a 1-second think time. This is the only test that comes close to simulating real-world conditions in a lab.

    The only tuning I did to Resin was turning off access logging and upping the number of KeepAlives. Tuning threads beyond the default of 128 seemed to have no impact, but more tuning could be done.

    The results (I can publish the bitmaps and LoadRunner test scripts if anyone wants them):

    Scenario A)
    No think time, no connection resets between requests (20 client threads basically saturate both servers):

    -JAVA-RESIN: Customer-List.JSP: ~2600 TPS
    -ASPX-IIS: Customer-List.ASPX: ~2100 TPS

    Scenario B)
    No think time, but with connection resets between requests:
    (again, 20 client threads both hit peak TPS)

    -JAVA_RESIN: Customer-List.JSP: ~1900 TPS
    -ASPX_IIS: Customer-List.ASPX: ~1987 TPS

    Scenario C)
    1 second think time (max tps in this case roughly equates to max concurrent user loads supported):

    -JAVA-RESIN: Customer-List.JSP: ~280 TPS
    -ASPX-IIS: Customer-List.ASPX: ~1940 TPS

    It's quite possible that the in-process scenario (if that's what you ran) is actually *faster* than running with a real production web server...but it likely has issues handling large concurrent user loads (a la scenario C and the Plants scenario).

    And Resin with Apache which I did not run might well do much better.

    So, you made your point! At the same time, it's your (almost) full disclosure and publishing your code that allowed me to see what was going on, or at least try your test out on my own--it took about 2 hours.

    -Greg
  92. I followed your advice.[ Go to top ]

    C) 1 second think time (max tps in this case roughly equates to max concurrent user loads supported):
    -JAVA-RESIN: Customer-List.JSP: ~280 TPS
    -ASPX-IIS: Customer-List.ASPX: ~1940 TPS

    Great for A/B Greg

    Now... let's use the jTDS JDBC driver instead of MS's, use an iBatis SQL map for your "Select * from xyz where abc=#ARG#" and set the SQL map to cache the result to soft memory. Make sure your 64-bit JVM of choice is set to address all memory (JRockit won't have huge GC pauses). It should take less than 2 hours again.
    (Soon enough a large % will be a cache hit even w/ timeout... and then it's just memory access speed :-)
    Now... set up a round-robin cluster of Resin boxes to get any scale you want :-)
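    The "cache to soft memory" suggestion can be sketched with java.lang.ref.SoftReference; this is an illustrative stand-in for the idea, not iBatis's actual cache implementation:

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal soft-memory cache sketch: entries are held through SoftReference,
// so under memory pressure the JVM may reclaim them instead of the cache
// driving the heap into OutOfMemoryError. A miss (null) just means
// "go back to the database".
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<K, SoftReference<V>>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        return ref == null ? null : ref.get();  // null if never cached or reclaimed
    }
}
```

    The appeal for a read-mostly benchmark like this one is that once the cache warms up, most requests become a memory lookup rather than a JDBC round trip.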

    .V
  93. I followed your advice.[ Go to top ]

    b) When you ran WebCat, what settings did you use? My guess (I may be wrong) is that you ran with no think time and no connection resets between requests so you could run with just a few client threads to keep it simple. Can you verify?
    I didn't use WebCat... I used MS Application Center Test, and you don't have to guess; see below.
    On point A, you could get very different results using the Apache plugin for Resin and running with a production quality web server.

    Resin *is* a production quality web server.

    ii. Its also important to run with a client think time.

    I re-ran the tests with a 1-second think time... see results below.

    At anyrate, I do not know:
    -what web server you used
    -the process-isolation level between the web server and the app server
    -whether you ran with a think time or not which would force the server to handle many more concurrent users (vs. 20 or so) before peaking tps.
    -Only Resin.
    -There is only one process.
    -Look at the test scripts; I used no think time and the connection is closed at the end of the request, so your guess is half-right.

    Scenario C)
    1 second think time (max tps in this case roughly equates to max concurrent user loads supported):

    -JAVA-RESIN: Customer-List.JSP: ~280 TPS
    -ASPX-IIS: Customer-List.ASPX: ~1940 TPS

    You don't mention which JVM you're using... are you sure it's JRockit?
    I re-ran the tests with a 1s think time using the Resin-JRockit combo and I can't reproduce these results.
    With 450 users (more than that and I saturate the client machine, which may lead to unpredictable results) these are the results I get:


    -JAVA-RESIN: Customer-List.JSP: ~438 TPS
    -ASPX-IIS: Customer-List.ASPX: ~357 TPS

    Picture

    Note that IIS pegs the CPU at 100%, but Resin still has some CPU to spare; these results are in line with my experience with Resin and IIS.
    Greg, I'm also a .NET developer, and let me tell you one thing:
    I have never seen IIS outperform Resin on dynamic content... *never*.

    So what could explain the different results... I have no clue.

    Regards,
    Luis Neves
  94. I followed your advice.[ Go to top ]

    Luis,

    Thanks for the info. I will re-run with JRockit; I used the latest Sun JDK 1.5 release 5. What other specific tuning did you do with Resin, in terms of thread settings or other things, that I can make sure to replicate?

    The big difference technically is process isolation. Anything that runs in one (monolithic) process has a big perf advantage, at the expense of fault tolerance. I did not mean to imply their HTTP server is not production quality; however, in their docs they don't really position it as such, and they have gone to the trouble to integrate with IIS and Apache, which they spend some time on in their readme.

    Another difference is the use of a two-proc box (which I used) vs. a single-proc box. Both are valid tests of course, but that might explain some differences as well.

    You mention that the ASPX box is pegged at peak TPS... This is what would be expected if there are no other bottlenecks in the system beyond the efficiency of the web/ASPX processing engine. With IIS/ASPX running fully process isolated vs. Resin running in one process, I am not surprised you got this result, and I will note that when you went to a 1-second think time the results got a lot closer.

    However, why can't you push the Resin server to peak CPU utilization by adding more users? The differences you show between Resin and ASPX/IIS might actually grow in Resin's favor if you did.

    Also, I like your .NET code; I tested the one with data binding and the repeater control (the slightly slower ASPX implementation in your test).

    -Greg
  95. I followed your advice.[ Go to top ]

    What other specific tuning did you do with Resin, in terms of thread settings or other that I can make sure to replicate?
    The only thing I've done to alter the default Resin setup was to disable access logging and increase the dependency-check-interval to 5 minutes.
    However, why can't you push the Resin server to peak CPU utilization by adding more users? The differences you show between Resin and ASPX/IIS might actually grow in Resin's favor if you did.

    I don't have your resources; in addition to having a piece of trash as a server, I also have a piece of trash as the client machine doing the stress testing. Adding more users would push the client machine to a point where the results would not be reliable.

    Regards,
    Luis Neves
  96. You get bonus point for trying[ Go to top ]

    Really, if a Java developer had done this and there were no .NET results in the benchmark, it would not even be controversial, and many developers/architects would likely even find it useful.

    It could have been useful, had the marketing team not taken over; but since you guys chose to put so many variables in the equation (as you have already been told), the only possible useful conclusion to derive from it is that MS seems to have a really optimised data driver for SQL Server 2005.
    You are ignoring/dismissing that the .NET magic pixie dust didn't work so well when dealing with Oracle... the database that has better performance than SQL Server in the TPC-C non-clustered results on a machine with half the processors ;-)


    As for MS doing benchmarks, it's always fine when people slam MS as M$ (even though we cost a fraction of WebSphere, BEA); it's always fine when folks on this forum or JavaLobby slam MS as not scalable, or when IBM/BEA/Sun does a benchmark ripping MS performance.

    Unbelievable!... You have done it again! You're positioning yourself as the victim.
    <sobbing>Poor MS, always the "whipping boy" on those nasty Java forums, where all those bad and uninformed people keep saying rude and untruthful things</sobbing>
    Grow up and stop seeing the world through rose-colored glasses.

    Give me a break... I just spent the last 3 days watching presentations at the "Ready to Launch Tour 2005" and seeing the world through MS-coloured glasses.
    We spend a huge amount of time on performance and we have every right and even a responsibility to showcase this.

    I hate to be the one to tell you this, but if you are looking to attract WebSphere customers you are spending a huge amount of time solving the wrong problem. Serving web pages really fast is the least of their worries.

    Customers are free to ignore the message, or consider it and maybe even download the fully disclosed code and learn from it.

    Surely you jest!
    After looking at the code, the only thing that can be learned from it is "How *not* to do it".
    - Do not use string concatenation in loops (request.jsp, line 220).
    - Do not use the piss poor excuse for a Jdbc driver that MS offers as the official driver for SQL Server, use Jtds instead.
    - If your method signature declares that it throws Exception, please refrain from catching Exception and re-throwing it.
    - If you don't know exactly what to do with caught exceptions and you feel that logging is "The right thing to do", please don't use System.out.println() to do it - Util.java, line 305 - because if for some reason you start to get exceptions you will bring the server to its knees.
    - If you store images as Blobs in the DB, resist the urge to store them in the file system for performance reasons *only*; there might very well be good reasons to store them in the DB: privacy/security concerns, logical/relational integrity reasons, or deployment issues.
    - In Java, arrays are already bounds-checked; there is little reason to do it yourself, unless you *really* want to be sure - Util.java, line 164
    - Do not use floats when dealing with currency - Util.java, line 175

    I could go on and on ... the code is a case study of Anti-Patterns.

    But don't get mad about the message, we are the ones disclosing all the code and testing details, and the app is based on IBM's own guidance/sample app they ship in the box.

    "MS does it faster and cheaper"... is this the message?
    The message is a lie.
    You guys are advertising dubious conclusions from a poorly done "study".

    Regards,
    Luis Neves
  97. You get bonus point for trying[ Go to top ]

    You are ignoring/dismissing that the .NET magic pixie dust didn't work so well when dealing with Oracle... the database that has better performance than SQL server in the TPC-C non-clustered results in a machine with half the processors ;-)

    With "IBM Power 5" processors, not Intel Itanium, so this looks like a processor issue to me..... be honest.

    Regards,
    Marlon Smith
  98. You get bonus point for trying[ Go to top ]

    You are ignoring/dismissing that the .NET magic pixie dust didn't work so well when dealing with Oracle... the database that has better performance than SQL server in the TPC-C non-clustered results in a machine with half the processors ;-)
    With "IBM Power 5" processors, not Intel Itanium, so this looks like a processor issue to me..... be honest.

    I know... I was teasing him :-)
    TPC-C is the ultimate meaningless benchmark; I'm surprised that people still take the results seriously.

    Regards,
    Luis Neves
  99. Tuning details[ Go to top ]

    Kirk,
    Yes, I agree that engineers building products (WebSphere, .NET, etc.) are interested in isolating the perf differences so they can focus on where to improve; I said this and I think we agree on it.
    As for customers having to 'eat the entire stack'

    This is absolutely not true.
    Customers largely want to know the performance of the entire stack in a deployed scenario

    End users yes, system designers no.
    The app, as tested, is a "WebSphere data-driven Web app" and all the technologies in the stack tested are those WebSphere customers use every day when deploying data-driven web apps on the IBM app server.
    -Greg

    Remember, we do agree that one cannot extrapolate results from this type of benchmark to other applications. Yet this is what you keep insisting on doing. And you keep saying that I'm talking about micro-benchmarking, which I'm not. What I am talking about is deriving useful information from the benchmark, and the only way to do that is to demux the exercise into useful chunk sizes. In this regard, the "benchmark" fails.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  100. Fixing the message[ Go to top ]

    Greg,
    b) If the code is published, and tuning is published with the results, then customers can use the approaches in that code as a template for their own apps and architectural decisions, without necessarily having to find out the hard way. [...] This is useful and good, and in fact our PetShop 3.0 application has been downloaded by over 200,000 different customers in the last 2 years precisely because they can use the code to jumpstart their own apps in a high-perf way.

    That would be nice if these samples were written in a clean way, with clean code rather than a complete mess; otherwise this attitude is criminal for the software industry. This is not targeted at Microsoft only but at all samples from Microsoft, IBM, WebLogic, Oracle, ...

    Otherwise I have people giving as an excuse for bad code: "IBM/Microsoft/BEA/Oracle is doing this in its examples, who are you to say it is not good?"

    As has been mentioned already, the exception handling is atrocious; it looks like the developer does not know what the finally clause is for. This is a recipe for resource leaks, and I'm surprised that in a sustained load-testing environment you did not have any issues.

    This:
    <code>
    try {
        ....
        rs.close();
        ps.close();
        conn.close();
    } catch (Exception e) {
        conn.close();
        throw new Exception(e.getMessage());
    }
    </code>

    Should be for example:

    <code>
    try {
        ...
    } finally {
        Util.close(conn, ps, rs);
    }
    </code>
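A minimal sketch of what such a Util.close helper might look like (the name comes from the snippet above; the body and the varargs signature are my assumptions, and AutoCloseable postdates this thread's Java version but keeps the sketch short):

```java
// Hypothetical sketch of the Util.close helper suggested above.
// When called as Util.close(conn, ps, rs) it closes resources in reverse
// order of acquisition (ResultSet, then Statement, then Connection),
// skipping nulls and swallowing close-time exceptions so that any
// exception thrown from the try block itself is the one that propagates.
public final class Util {
    private Util() {}

    public static void close(AutoCloseable... resources) {
        // Iterate backwards: the last-acquired resource is closed first.
        for (int i = resources.length - 1; i >= 0; i--) {
            AutoCloseable r = resources[i];
            if (r == null) {
                continue; // resource was never acquired
            }
            try {
                r.close();
            } catch (Exception ignored) {
                // Best effort: a failure to close must not mask the
                // original exception from the business logic.
            }
        }
    }
}
```

Connection, Statement, and ResultSet all implement AutoCloseable, so the single varargs method covers the call shown in the snippet.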


    What's the point of catching every exception in every class just to wrap it in an Exception, when the signature already declares throws Exception? Not only do you lose precious information, the whole thing also adds hundreds of useless lines of code.

    The context path is hardcoded in the Java and JSP code. As it is, you forgot to fix a few of these when renaming from PlantsByWebSphere to PlantsByWebSphereJDBC, and you also rely on the servlet invoker.

    There is also a gem in Order.java which is my favorite so far:

    <code>
    private void createBackOrder(Connection conn, int itemid, int currentQuantity,
            int quantityNotFilled, int minThreshold, int maxThreshold, Timestamp time)
            throws Exception {
        try {
            try {
                ... sql update ...
            } catch (Exception ee) {
                throw new Exception("Could Not Create BackOrder!!");
            }
        } catch (Exception e) {
            if (e.getMessage() != "Could Not Create BackOrder!!") {
                try {
                    ... sql insert ...
                } catch (Exception eee) {
                    throw new Exception(eee.getMessage());
                }
            } else {
                throw new Exception(e.getMessage());
            }
        }
    }
    </code>
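Part of what makes this a gem: the `!=` comparison on the message tests object identity, not content, so dispatching on an exception message is doubly fragile. A small standalone demonstration of why `equals()` is what was presumably meant:

```java
// Demonstrates why comparing exception messages with != (as in the
// Order.java snippet above) is broken: != tests reference identity.
public class StringIdentityDemo {
    public static void main(String[] args) {
        String msg = "Could Not Create BackOrder!!";

        // Two uses of the same literal are interned to one object,
        // so != happens to be false here...
        System.out.println(msg != "Could Not Create BackOrder!!");          // false

        // ...but a message built at runtime is a distinct object,
        // and != is true even though the content is identical.
        String runtime = new String("Could Not Create BackOrder!!");
        System.out.println(runtime != "Could Not Create BackOrder!!");      // true

        // equals() compares content, which is what the code needed,
        // though matching on exception *types* would be better still.
        System.out.println(runtime.equals("Could Not Create BackOrder!!")); // true
    }
}
```

The code only works by accident today because both literals live in the same class and are interned; the moment the message comes from a driver or another class compiled differently, the branch flips.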

    Those little things are not a performance issue for response time, but they are a human performance issue: less code = fewer bugs = easier to read = easier to maintain = easier to tune = easier to extend.

    So please, IBM, Microsoft, BEA, Oracle, etc... don't publicly say that this is a template developers should use. You're not the ones dealing with the consequences. We are.

    Put a big warning disclaimer:

    "Disclaimer: This is bad code developed in a hurry as a throw-away prototype to demonstrate specific points. It does not follow professional development practices. DO NOT try to reproduce this in your company's code"
    c) It turns out, performance *does matter.* [...] But taking this one step further, if app server (A) performs twice as fast as app server (B) for the actual complete app or some core portion, then the customer can use app server A and deploy on 1/2 as many servers as app server B. This means 1/2 as much in hardware acquisition costs, 1/2 as much on middle-tier software licensing, and maybe most important, reduced management costs if you believe managing a cluster of 4 servers is less costly than managing a cluster of 8 servers. Bottom line, performance matters a great deal. Also, imagine the scenario where a customer has a Web-based app running on a mainframe, and it's costing them $x million a year to have it there. Then they take this same app, and using Java or .NET or other they find they can get the same or better performance on 3 Intel-based 2-way servers. Wow. And with clustering they find out they can get the same uptime.

    No matter whether you are able to change the app server, badly written applications will NOT scale.

    Let me explain what I'm facing EVERY DAY when auditing code or helping customers solve performance problems. Each time, I think I have seen the highest level of incompetence, but each time someone proves to me there is another level.

    Most of the time, a project is staffed with inexperienced developers from an IT company with no knowledge of even the slightest basic development practices, nor of J2EE or .NET development, but just basically able to lay out keywords. The architect in charge (if any) is great at drawing boxes in PowerPoint too.

    These people produce a *massive* amount of spaghetti code, with amazingly inefficient algorithms that each time drain the entire content of a database with a completely weird data model. All this with a queuing network and an architecture that do not make any sense.

    Everybody makes mistakes, but these are not mistakes. It's incompetence.

    The result of this kind of development can put a 128-way HP SuperDome on its knees handling 100 users who display their weekly timesheet on a webpage.

    If a poorly written app is responsible for such performance (and this is the case 90% of the time), it does not matter whether it's .NET or WebSphere or WebLogic or JBoss or Geronimo or Tomcat: you won't scrape out a single % of improvement. Because the IT company refuses to see its mistakes, you will have to throw so many computers at the problem to get decent performance that it becomes ridiculous. Your only way to save a bit of money in the short term is to use open-source app servers.

    In the long term I would recommend my customers stop working with IT companies that have such a disastrous track record of developing crap and of not admitting and fixing their mistakes.

    But there's also some education needed on the customer's side: if something is cheap upfront, that may sound attractive, but be aware of what the shortcuts taken by the IT company will cost you later.
  101. Microsoft vs. IBM[ Go to top ]

    There certainly is a class of applications that can be built more simply than they are. However, I have to strongly disagree that these account for *most* of them. Maybe this is your experience, but it is certainly not mine, nor is it the experience of my customers. They are solving very complex problems that require complex logic to solve.

    I think that depends on how you define "most." If you count applications, I'm sure there are more "simple" ones. Kind of like there are more ants than human beings.

    However, if you define "most" in terms of money, I'd tend to agree that "most" money is spent on large, complex applications.

    Of course, then you have to define "simple" and "complex" logic. A piece of software can have incredibly complex logic but run on a workstation in single process for hours and hours. It's logically complex, but architecturally simple.

    Likewise, Peter's banking applications may be logically simple (I'm not saying they are), just check some conditions, debit one account, credit another; but they are architecturally very complex due to load, fault-tolerance, and security requirements.

    Or you might have an application that is both logically and architecturally complex, like an air traffic management system.

    Ok, so maybe I'm being nit-picky about words, but they really matter.
  102. Microsoft vs. IBM[ Go to top ]

    There certainly is a class of applications that can be built more simply than they are. However, I have to strongly disagree that these account for *most* of them. Maybe this is your experience, but it is certainly not mine, nor is it the experience of my customers. They are solving very complex problems that require complex logic to solve.

    I think that depends on how you define "most." If you count applications, I'm sure there are more "simple" ones. Kind of like there are more ants than human beings.

    However, if you define "most" in terms of money, I'd tend to agree that "most" money is spent on large, complex applications.

    Of course, then you have to define "simple" and "complex" logic. A piece of software can have incredibly complex logic but run on a workstation in single process for hours and hours. It's logically complex, but architecturally simple.

    Likewise, Peter's banking applications may be logically simple (I'm not saying they are), just check some conditions, debit one account, credit another; but they are architecturally very complex due to load, fault-tolerance, and security requirements.

    Or you might have an application that is both logically and architecturally complex, like an air traffic management system. Ok, so maybe I'm being nit-picky about words, but they really matter.

    I'd like to clarify a bit. The requirements I stated in previous posts were the bare minimum and in no way represent what happens in a real banking application. This is especially true of larger banks that grew through acquisition and have a half dozen separate systems with different architectures.

    In most cases, what the customer sees is just a snapshot and not a reflection of the real activity in a given account. In the back office, there's real-time and batch processing happening throughout the day. That's another reason for stateful containers. Not only are these processes distributed, but they are complex from a business-requirement perspective. A simple performance comparison is a nice baseline that measures "theoretical max throughput". Actual throughput is a completely different ball game.

    peter
  103. Microsoft vs. IBM[ Go to top ]

    Microsoft has it pretty easy here. Let's face it: even though IBM is the biggest vendor, they are also the slowest app server out there.

    I don't mean to defend IBM, as they are no doubt big enough to defend themselves, but properly configured they are certainly not the slowest, and in some cases (from tests that we've done internally) they are actually the fastest.
    The vast majority of Microsoft web apps are two-tier in nature and they are compared against 3 or 4 tier java apps.

    Yup, but who wants to compare to PHP, Python and Perl (free, free and free) when you can compare to IBM, IBM and IBM (not free, expensive and "enterprise")? It makes good business sense for Microsoft, because (sorry Greg!) it makes Microsoft customers feel good that their locked-in choice is somehow better than that "big blue" choice that all those "big inefficient" companies waste money on.

    Like I said, though, the result for Java will hopefully be better JVMs and more optimized middleware implementations.

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  104. Microsoft vs. IBM[ Go to top ]

    Like I said, though, the result for Java will hopefully be better JVMs and more optimized middleware implementations

    http://weblogs.java.net/blog/opinali/archive/2005/11/mustangs_hotspo_1.html

    Still, how come the Java community is not able to provide a benchmark (something solid, far better than the WS-based one published by Sun some months ago that was shattered by M$ later)? Fireworks are coming, with higher-abstraction mechanics from RoR and with unbelievable performance boosts from M$. What's going on?

    Are we dreaming the wrong dream?

    Regards,
    Horia Muntean
  105. Microsoft vs. IBM[ Go to top ]

    Yup, but who wants to compare to PHP, Python and Perl (free, free and free) when you can compare to IBM, IBM and IBM (not free, expensive and "enterprise")? It makes good business sense for Microsoft, because (sorry Greg!) it makes Microsoft customers feel good that their locked-in choice is somehow better than that "big blue" choice that all those "big inefficient" companies waste money on.

    Weblogic has more mindshare in the Enterprise Java community. If these comparisons were targeted for developers instead of IT managers, MS would have picked Weblogic.
  106. Microsoft vs. IBM[ Go to top ]

    i.e. the only reason I can think of MS not picking Weblogic is that this comparison is primarily targeting IT managers.
  107. Microsoft vs. IBM[ Go to top ]

    Or BEA does not have its own backend DB system?
  108. Why focus on IBM WebSphere, not BEA?[ Go to top ]

    The answer is straightforward (and I am not saying one is better than the other):

    1) WebSphere currently is the market leader in terms of market penetration for J2EE app servers, at least for larger companies. Lots of independent studies seem to show this lately. Hence we get more customer requests for comparative data vs. WebSphere than for BEA lately.

    2) WebSphere has a more liberal benchmarking clause now, where you are allowed to disclose results if you publish full disclosure and code/tuning tested. BEA restricts publication of any benchmark results without their approval. FYI, with .NET 2.0 we also went to a more liberal disclosure policy, basically the same as IBM's. We, like IBM, just want full disclosure so we can review/comment/verify and customers themselves can verify results.

    Finally, on this topic IDC recently did a study for us [yes, we sponsored it, so Beware, Beware, Beware ;-)] using completely random selection of large companies across 8 different countries, using their own scientific research methodology and results in no way changed/influenced by MS. In fact Gartner Group analysts originally constructed this survey with us back in 2004 and the new IDC data backs up the original Gartner Research Group findings. We have published the entire IDC research deck (with their mandatory caveats) at

    http://download.microsoft.com/download/1/8/a/18a10d4f-deec-4d5e-8b24-87c29c2ec9af/IDC-MS-MissionCritical-WW-261005.pdf

    -Greg
  109. IBM acknowledges that...[ Go to top ]

    I don't mean to defend IBM, as they are no doubt big enough to defend themselves, but properly configured they are certainly not the slowest, and in some cases (from tests that we've done internally) they are actually the fastest.

    Their results are more sensitive to L2/L3 cache than other app server vendors. If you're getting a server to run WebSphere, one of the cardinal rules is to plunk down the money to max out the L2/L3 caches.
  110. I've used Websphere 6.0. We didn't get around to doing performance tests with it, but judging by the large number of functional bugs we discovered, I wouldn't be too surprised if it also had significant performance and scale issues.

    The fact that Websphere seems to be the default application server for many large organisations because it is owned by IBM is a great shame.

    There are many superior alternatives to Websphere in the market.
  111. I saw how the .NET Petstore helped to improve Java and the available libraries significantly.
    Competition helps a lot, but misinformation a.k.a "Benchmarks" is the dirty competition.
  112. Smart idea to try to put a Java theme on a TheServerSide.net story. That site might get its first traffic in months.

    Of course it may be more humane just to go Old Yeller on the unfortunate 'companion site'.
  113. Now before you go declaring TSS.NET dead, you need to understand something that we ourselves are learning here at TSS. Our two communities, while sharing both a common codebase and the mission to lead the communities we support, do support two very different communities.

    Here on the Java side of the world, things are a bit more "religious". You have many warring factions, with this framework and that ORM doing battle. TSS.COM is often the battlefield on which this takes place, and that's all great. It's kind of like the Middle East, actually, with several diametrically opposed forces working for supremacy. It makes for great debates and the occasional outburst of violence.

    Over on the .NET side, we only have one real higher power and so there's little to debate. We occasionally think that Microsoft is doing something stupid, but for the most part our readers are content to get their news and content and be on their merry way. Our readership and pageviews have been steadily increasing, but more importantly what we've seen is that we are getting huge RSS numbers. In fact, we made Feedster's Top 500 blogs.

    So TSS.COM's readers should realize, just as we have over the past year, that a lack of comments on our news posts doesn't mean that nobody is reading or that the site is "dead".

    Paul Ballard, Editor for TheServerSide.NET
  114. but more importantly what we've seen is that we are getting huge RSS numbers.

    All that's missing is dialog and community. Sharing ideas, albeit violently at times, is still sharing ideas. Posting marketing fluff with no dialog says a lot about the MS culture.
    I agree that dialog and community are important; it's just that this community seeks its dialog in other forms, primarily blogging. That's not to say that we've given up by any stretch, why, we even welcome comments from Java programmers! :-)
    Here on the Java side of the world, things are a bit more "religious". You have many warring factions with this framework and that ORM doing battle. TSS.COM is often the battlefield on which this takes place, and that's all great. It's kind of like the Middle East actually with several diametrically opposed forces working for supremacy. It makes for great debates and the occasional outburst of violence.

    Over on the .NET side, we only have one real higher power and so there's little to debate. We occasionally think that Microsoft is doing something stupid but for the most part our readers are content to get their news and content and be on their merry way.
    So who is religious then? (in the high-pitched voice of the gingerbread man) God bless, all of you!
  117. It's specific to the application...[ Go to top ]

    Two things seem to exist in our business that everybody from senior management to junior analyst feels experienced enough to discuss: first is web design, and second is performance. If I have learned a single thing, it is this: relevant performance benchmarking is the hardest thing to do! A single undiscovered error can have huge effects, eventually rendering the whole results completely useless. This, in my experience, happens in about 90% of all performance benchmarks.

    The important thing is that performance is only minimally related to the "performance" of an application server. It is rather up to the application to be fast in a given context!

    In that sense it is not a "Microsoft-trying-to-screw-IBM" issue. It is just complete nonsense to perform these kinds of tests unless both parties have the opportunity to build an optimal application for their platform and for the given context. Unfortunately this rarely ever happens...

    Finally, I wonder what it means that one thing can handle "about 3% more transactions per second" than the other....

    Sven
  118. This wouldn't happen with .Net ![ Go to top ]

    Ironically enough, TSS is apparently overwhelmed by the number of hits caused by the benchmark announcement:

    java.rmi.ServerException: RuntimeException; nested exception is: kodo.util.DataStoreException: java.util.NoSuchElementException: Timeout waiting for idle object
    Great, now imagine I'm in a small company bought by a bigger one that has standardised on Solaris or Linux as the enterprise platform. With WebSphere or any Java app server I could probably deploy in the new environment.

    Also if I am selling a product I don't want it to depend on just windows or Mono.

    Not flame-baiting, but what is the degree of vendor lock-in with the .NET offering?
    Have you tested the performance of the web server on a Unix machine? Please try this, it's awesome.
    First of all, the comparison code is not optimized in Java. The author just uses catch (Exception e) blindly, without even noticing what the methods in the try block are throwing. Secondly, the author used e.printStackTrace(), which would never be used in any production code. The author doesn't even know the performance cost of catching a general exception compared to catching the specific exception, nor the cost of printStackTrace().

    I suggest the author go back to school and learn core Java before writing such nonsense code for benchmarking.

    I also don't agree with the author's choice of WebSphere for the code that is benchmarked, where he could have better utilized Tomcat instead...
  122. Exceptions[ Go to top ]

    Good feedback. The benchmark is run under conditions where:

    1) All exceptions should be sent back up the chain so they result in a 500 error in the browser, such that the performance test tool (Mercury) can catch them and hence alert us to the fact that exceptions are being thrown. This is very important for benchmarking, lest you start testing stuff that is actually just throwing errors without knowing it. The stack-trace print is there so that once exceptions are thrown, it's easy to go back in the logs and figure out what is not working (database down, JNDI naming exceptions, no connections left in the pool, null references, etc).

    2) During all final benchmark runs for all published data, *zero* exceptions are being thrown, so there would be no perf penalty here.

    3) We will gladly change the code, however, and publish the updates to use a different technique for catching exceptions. Just paste some sample code here for consideration and comment by the community.

    -Greg
  123. Exceptions[ Go to top ]

    Good feedback.
    ....
    3) We will gladly change the code, however, and publish the updates to use a different technique for catching exceptions. Just paste some sample code here for consideration and comment by the community. -Greg

    I think it's safe to say that one can suggest all the code changes one wants. I see that you guys made quite a few code changes for the betterment of the application. That said, it doesn't address Cameron's question about architecture.

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
    First of all the comparison code is not optimized in Java. The author just uses catch (Exception e) blindly without even noticing what the methods in the try block are throwing. Secondly the author used e.printStackTrace()

    Unless the code is throwing exceptions, this should not be a problem. A try-catch has no cost unless actually used. If the code is throwing exceptions, then there are bigger problems with it than the numbers that are being produced.
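    A tiny illustration of the point (the parsing helper here is hypothetical, not from the benchmark): entering a try block costs nothing on the non-throwing path; the price is paid only when an exception object is actually constructed and thrown.

```java
// Sketch: try/catch is free on the success path; the exception cost
// (object construction, stack-trace fill) appears only on failure.
public class TryCostDemo {
    static int parseOrDefault(String s, int dflt) {
        try {
            // Fast path: no exception, no overhead beyond the parse itself.
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // Slow path: exception construction and stack-trace capture
            // happen only when parsing actually fails.
            return dflt;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", -1));   // 42
        System.out.println(parseOrDefault("oops", -1)); // -1
    }
}
```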

    regards,
    Kirk

    kirk[at]javaperformancetuning.com
  125. I suggest Author to go back to school and learn core Java
    >before writing such non sense code for bench marking.

    there is no performance penalty unless you've actually got the exception

    >Secondly the Author used e.printStackTrace() which will >never be used in any production code.
    Why? :-)


    Dmitry
    http://www.servletsuite.com
    Hmm, my reading of the benchmark "App Server Perf Comparison" shows the .NET app losing to the WebSphere JDBC app when the Oracle 10g database is used, on both Windows and Linux.

    The only time .NET is faster is with the SQLServer 2005 database.

    Interestingly though, the WebSphere app is slightly slower when run against SQL Server 2005, while the .NET app gets a massive performance improvement.

    Is it really a surprise that a Microsoft product can be made to work better with a Microsoft database than a 3rd-party product can? Perhaps the Microsoft team developing the beta JDBC driver used in the benchmark can look at the results and realise they still have some improvements to make.

    Cheers
    David
  127. TSS is back to same old ***** [edited for offensive content Kirk Pepperdine].
  128. .NET 2 is not an app server[ Go to top ]

    The thing that gets to me is that Microsoft believes you can compare one .NET "vm" running 3 tiers of plain objects with a full-blown app server. If you want a fair comparison, you must create 3 tiers of POJOs, build them into a jar and reference that jar from a couple of pages in Tomcat.
    The .NET version doesn't have half the functionality of the J2EE app server: object pooling, caching, distributed objects, distributed transactions, etc.
  129. .NET 2 is not an app server[ Go to top ]

    In response, respectfully,
    The thing that gets to me is that Microsoft believes that you can compare one .NET "vm" running 3 tiers of plain objects with a full blown appserver. If you want a fair comparison, you must create 3 tiers of pojos, build them into a jar and reference that jar from a couple of pages in tomcat. The .NET version doesn't have half the functionality of the J2EE app server, object pooling, caching, distributed objects, distributed transactions etc. etc.

    Actually, .NET has all of these features including:

    -object pooling

    -caching (a very complete caching mechanism inclusive of object and page-based caching, cache invalidation callbacks, cache dependencies, etc.)

    -distributed objects (via .NET remoting which is the equivalent to RMI in the java world, or, via Web Services with SOAP serialization which in my opinion is easier, and offers the benefit of integration with J2EE components and vice/versa)

    -distributed transactions. Yes, we have this too; in fact we had this before J2EE came out. Our transaction model is a bit different today with .NET 2.0 in that we allow you to directly integrate with our Distributed Transaction Coordinator in .NET without creating a COM component (COM to us is a legacy technology now) via the System.Transactions classes. We speak XA from the DTC, and support interaction with any XA-compliant resource, so yes, you can do distributed transactions from .NET across Oracle, SQL, DB2, etc., no matter what platforms those DBs are running on.

    Other technologies many Java developers do not know .NET/Windows Server supports in the box:

    -Clustering/load balancing
    -Failover
    -Shared session state via a dedicated state machine or backend database (a simple config option, no code change)
    -Great interop with J2EE via Web Services (it takes about 1 second to create a web service out of any method in .NET)
    -Multiple worker processes that can be throttled by CPU usage or memory consumption
    -Worker process recycle----we automatically will create and failover to a second worker process in ASP.NET should the first one die or consume too much memory (a runaway app, for example)
    -Messaging ( a whole set of classes here that sit on top of MSMQ, and IBM has also released .NET classes that work on top of IBM MQSeries)
    -lots more.

    Point is, it *is* a full blown app server!

    -Greg
  130. .NET 2 is not an app server[ Go to top ]

    PS: Lots of people get confused on that topic because we integrate full-blown app server functionality into Windows Server with .NET. We don't charge extra for it, hence the huge cost differential when compared to IBM or BEA. As for functionality, both Forrester and Gartner now put .NET/Windows Server in their leadership group (along with BEA and WebSphere) for fully functional, enterprise-class app servers. The app server workload now accounts for a large chunk of Windows Server deployments, with directory services, file/print, database/RDBMS hosting, and packaged apps (SAP, PeopleSoft, etc) being the other core workloads.

    -Greg
  131. .NET 2 is not an app server[ Go to top ]

    Other technologies many Java developers do not know .NET/Windows Server supports in the box:
    -Clustering/load balancing
    -Failover
    -Shared session state via a dedicated state machine or backend database (a simple config option, no code change)
    -Great interop with J2EE via Web Services (it takes about 1 second to create a web service out of any method in .NET)
    -Multiple worker processes that can be throttled by CPU usage or memory consumption
    -Worker process recycle----we automatically will create and failover to a second worker process in ASP.NET should the first one die or consume too much memory (a runaway app, for example)
    -Messaging (a whole set of classes here that sit on top of MSMQ, and IBM has also released .NET classes that work on top of IBM MQSeries)
    -lots more.

    Point is, it *is* a full blown app server! -Greg

    I've used database-driven shared sessions in the past. From my experience it quickly hits a scalability wall. To phrase it in simple terms: if I have to build a consumer banking application that has to support a max load of 5K concurrent users, it would be harder to achieve the same level of fault tolerance with .NET than it is with a proven J2EE product.

    Not that it can't be done, but I'd probably have to set up 20 databases for the shared sessions and have 10-20 app servers for each database. For many shops, that is not an acceptable solution for fault tolerance, because it creates a completely different set of problems. Of course, one could combine MSMQ with SQL Server to do session replication, but I haven't tried that and don't know the limitations first hand.

    A big mainframe can easily handle several thousand concurrent database connections, whereas a low- or mid-end PC running SQL Server definitely can't. I would love to see Microsoft tackle these types of cases and really give J2EE some hard competition.

    I think once MS sees first hand how hard it is to build these types of applications, there will be less talk about "smaller, simpler and lighter is better." Simple data-driven web apps are completely different from heavy financial applications. I think many people still don't realize just how different it is.

    peter
  132. .NET 2 is not an app server[ Go to top ]

    -distributed transactions. Yes we, have this too, in fact we had this before J2EE came out.
    Yes, of course. What about transactional reading of a message from a non-local MSMQ queue?
  133. .NET 2 is not an app server[ Go to top ]

    Actually, .NET has all of these features including:
    -object pooling
    -caching (a very complete caching mechanism inclusive of object and page-based caching, cache invalidation callbacks, cache dependencies, etc.)
    -distributed objects (via .NET remoting which is the equivalent to RMI in the java world, or via Web Services with SOAP serialization which in my opinion is easier, and offers the benefit of integration with J2EE components and vice versa)
    -distributed transactions. Yes, we have this too, in fact we had this before J2EE came out. Our transaction model is a bit different today with .NET 2.0 in that we allow you to directly integrate with our Distributed Transaction Coordinator in .NET without creating a COM component (COM to us is a legacy technology now) via the System.Transactions classes. We speak XA from the DTC, and support interaction with any XA compliant resource, so yes you can do distributed transactions from .NET across Oracle, SQL, DB2, etc no matter what platforms those DBs are running on.

    Other technologies many Java developers do not know .NET/Windows Server supports in the box:
    -Clustering/load balancing
    -Failover
    -Shared session state via a dedicated state machine or backend database (a simple config option, no code change)
    -Great interop with J2EE via Web Services (it takes about 1 second to create a web service out of any method in .NET)
    -Multiple worker processes that can be throttled by CPU usage or memory consumption
    -Worker process recycle----we automatically will create and failover to a second worker process in ASP.NET should the first one die or consume too much memory (a runaway app, for example)
    -Messaging (a whole set of classes here that sit on top of MSMQ, and IBM has also released .NET classes that work on top of IBM MQSeries)
    -lots more.

    Point is, it *is* a full blown app server! -Greg
    Greg,


    We all believe .NET programmers could (potentially by buying additional
    products) use the techniques you described above. The very same statement is
    valid for every POJO application without a container. The problem is not the
    availability of these mechanisms separately. The problem is how easy it is to
    achieve the goals every J2EE container has to implement in order to be certified. For
    instance, having remoting doesn't automatically mean that you can do what
    every J2EE container can do: what about clustering, distributed beans, HA,
    failover, stateful object pooling and caching, distributed security,
    manageability (JMX), distributed deployment, etc.? I worked with a J2EE
    application in a quite big financial institution, and when the institution
    grew, they realized that they wanted more power and availability. We just
    reconfigured the container, added new nodes, and that was all. Try the same with
    pure .NET or POJOs. The biggest mistake is to think that a J2EE container is
    just a bunch of loosely coupled, already available mechanisms...

    Artur
  134. Well Said[ Go to top ]

    Greg, I think this is a valuable point. Here is an example of market reality. I work in an HA environment. We have Java and .NET developers on site, and since I used to be a consultant I also have a number of Java and .NET resources (people) to contact.

    Ask the Java/WebSphere or Java/WebLogic person how to set up a High Availability site (with or without remoting). They will have an answer.

    Ask a .NET developer to do the same; very few of them have an answer.
  135. How dull. I'm sure if they ported it to a Cray it would run even faster. Microsoft's code only runs on their machines. If Cray did the same, they could quite legitimately sit in the same benchmark, but their stuff only runs on Crays.

    The beauty of the Java version is that it runs on almost anything, can use almost any database, has a choice of over a dozen application servers, many of them free, and you can get better Java programmers at lower rates than C# ones.

    This is a total waste of time and energy; Microsoft has lost this race, and we've moved on from JEE anyway, so it's all academic.

    :-)

    -John-
  136. I see this in all the Java EARs; where are the same settings for the .NET stuff?

    <connectionPool
    xmi:id="ConnectionPool_1119393544385"
    connectionTimeout="180"
    maxConnections="75"
    minConnections="75"
    reapTime="180"
    unusedTimeout="1800"
    agedTimeout="0"
    purgePolicy="EntirePool"/>
  137. Not so fast, sparky...[ Go to top ]

    Actually read the report before you put in the summary that .Net is 183% faster than WebSphere 6.0.2.3.

    Here are a couple of interesting things:

    .Net is only significantly faster when talking to a SQL Server 2005 database and only 183% faster when comparing this result to WebSphere's EJB result. Note that the JDBC driver used is the SQL Server 2005 beta driver, which isn't released yet.

    The JDBC/SQL version of PlantsByWebSphere performs comparably to .Net for Oracle on Windows, and better than .Net for Oracle on Linux.

    The primary conclusion of this result should be that Microsoft has highly optimized the data paths between .Net and SQL Server 2005, and that they don't have a JDBC driver that can measure up.
  138. I only want to underline that the application used for these benchmarks was not built with performance in mind (like any other application in the 'sample' area of WAS) but to show some technical aspects of J2EE/WAS.
    If you, or MS, want to test a performance-oriented WAS application you can use Trade 6:

    IBM Trade Performance Benchmark Sample for WebSphere Application Server
    (otherwise known as Trade 6) is the fourth generation of the WebSphere end-to-end
    benchmark and performance sample application. The Trade benchmark is designed and
    developed to cover the significantly expanding programming model and performance
    technologies associated with WebSphere Application Server. This application provides a
    real-world workload, enabling performance research and verification testing of the
    Java 2 Platform, Enterprise Edition (J2EE) 1.4 implementation in WebSphere
    Application Server, including key performance components and features.

    Overall, the Trade application is primarily used for performance research on a wide range
    of software components and platforms. This latest revision of Trade builds off of Trade 3
    by moving from the J2EE 1.3 programming model to the J2EE 1.4 model that is
    supported by WebSphere Application Server V6.0. Trade 6 adds DistributedMap-based
    data caching in addition to the command bean caching that is used in Trade 3.
    Otherwise, the implementation and workflow of the Trade application remain
    unchanged.

    Trade 6 also supports the recent DB2 V8.2 and Oracle 10g databases. Trade's new
    design enables performance research on J2EE 1.4, including the new Enterprise
    JavaBeans (EJB) 2.1 component architecture, message-driven beans, transactions
    (1-phase and 2-phase commit), and Web services (SOAP, WSDL, JAX-RPC, enterprise
    Web services). Trade 6 also drives key WebSphere Application Server performance
    components such as dynamic caching, WebSphere Edge Server, and EJB caching.
  139. Apple vs IBM PC[ Go to top ]

    .NET vs J2EE
    Apple vs IBM PC
    close vs open
  140. ok, summary time...[ Go to top ]

    So to summarize:

    This is a pretty real comparison of 2 technology stacks; unfortunately it's Microsoft's soon-to-be-released next-gen stack against a stack that is clearly inappropriate for the task at hand. Shame on IBM for putting this out there to be used against the Java community; I would have thought we'd learned our lesson with the PetShop fiascos.

    Greg has done an admirable job being professional and informed on behalf of Microsoft, and has remained factual even in the face of some bitter and non-professional responses. Way to go.

    Unfortunately, you're testing the wrong things, as has been pointed out. I would hope no one would build an app the way IBM did to meet the set of requirements for this application. Even your JDBC version, while better, is incurring the penalty of running in an EJB server with the services implemented as SLSBs, and has to run in WebSphere, which is a burden in and of itself :-)

    Also, the impression I got (without looking through the materials, admittedly) is that this is running on a single app server machine? What kind of deployment is that? The problem is that this doesn't prove ANYTHING about the scalability of an application. What do you do when your 8x box can't handle the load?

    Scalability in a cluster is a whole other beast, and designing and building to be scalable in a cluster often has detrimental effects on the single-box performance of your application. As a major example of this, think about caching.

    Caching on a single box is simple: just save some data in memory and use that instead of going to the database. If the app changes the data, then clear the cache when you write to the database and load it next time. Very simple, and very fast.
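The single-box pattern described here can be sketched in a few lines of Java. This is a minimal, illustrative sketch (class and method names are my own invention, not any product's API): read through the cache, and explicitly invalidate an entry whenever the application writes that row.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of single-box, invalidate-on-write caching.
// Names are illustrative, not any vendor's API.
class LocalCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    // On a miss, load the value (e.g. from the database) and remember it.
    V get(K key, Function<K, V> loader) {
        return cache.computeIfAbsent(key, loader);
    }

    // Call this after writing to the database so the next read reloads.
    void invalidate(K key) {
        cache.remove(key);
    }
}
```

As the post goes on to explain, exactly this pattern is what breaks in a cluster: `invalidate` only clears the copy on the local box.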

    Unfortunately this is completely useless when you go to a cluster. Adding the second machine to the cluster makes this method of caching dangerously incorrect, as you will very quickly notice as you get dirty data in your database. Caching complexity goes from zero to 100 WITH THE SECOND BOX ADDED TO THE CLUSTER. From there it's only increments of how much stale data will corrupt your database; the damage was done as soon as you went into a cluster.

    So what's to be done? Well, you've basically got a couple of options:

    1 - Don't cache anything (or at least anything transactional). Load it from the database every time. This performs poorly per box, but it scales ok for the first couple of boxes, until your DB starts to fall down. Then you start thinking about upping your DB hardware or really complex and expensive solutions like Oracle RAC.

    2 - Use a clustered cache (like Tangosol Coherence) to allow you to have transactional caches which maintain correct data in-memory across the cluster. This adds a little bit of overhead over the simple local cache, but is much faster than not caching and maintains correctness of your data. It does add to your per-CPU cost over the single box, local cache design, but it probably saves you over the cache-nothing solution as you scale.

    I'd like to see this as a real test of an enterprise solution. How about a 10 machine cluster running against a DB? Cameron's offered some engineering time to get the Java version going. I'll offer up some design patterns for non-EJB declaratively transactional services... We can make it run in just a Servlet container, then you can test it in a cluster of WebSphere, Resin, and Tomcat servers for comparison.

    How does the .NET solution handle scaling out across a cluster like this? How do you maintain a transactional cache, or do you fall back to the database for that?

    Anyway, Greg, good job on this thread. .NET is definitely advancing, and I'm glad for the innovations it's driving on the Java side.
  141. ok, summary time...[ Go to top ]

    So what's to be done? ...
    Not only that, but I'd like to see the impact of going from a single machine to a cluster for each of the 3 solutions (.NET, JDBC and EJB), from the code, configuration, design and architecture viewpoints. This is a very important component of the scalability issue, but it is left behind in almost every discussion, which focuses instead on just how fast a system is or how much memory it uses.

    Regards,
    Henrique Steckelberg
  142. Back to Resin[ Go to top ]

    I am still working on the "LuisBench" benchmark, using Resin. I am going to make some statements in this thread that may well be viewed as highly controversial by some, so this is my warning! Please (try to) avoid flaming me; I am perfectly willing to be educated.

    My observations are as follows:

    1) Resin is screamingly fast, supporting very, very high TPS rates when running a test with no think time and a few client threads.

    2) When trying to run the same test in a scenario using many clients with a think time (even just a 1-second think time), however, it's difficult to get it to run consistently without errors. The issues surrounding handling many TCP/IP connections and keepalives are complex. This makes publishing benchmark numbers for this test difficult without lots of tuning time spent. But in my mind this is the realistic way to test the relative perf in terms of handling large concurrent user loads. What are the errors, you ask? They are connection-refused errors, as well as, many times, ArrayIndexOutOfBounds errors in the JSP (an HTTP 500 app error).

    First, the basic tuning knobs seem to be max threads and keepalives/keepalive timeouts (similar to Apache tuning that often needs to be adjusted). Having now read up on Resin, we have already adjusted obvious things like access logging, servlet refresh rates, etc. (and switched to the purchased copy that supports JNI for network operations). But as soon as you take a simple page like this, which does not do much, you need to run many concurrent users to max out a 1-, 2- or 4-CPU reasonably fast box and get the max TPS rate for the LuisBench. And despite lots of tuning, we still get lots of errors beyond 1,000 users. So if there is a magic pixie-dust resin.conf file that someone wants to offer up for this, great, we can use it.

    3) So this leads to my (perhaps to some) controversial statement #1: IIS/ASPX needs no such thread tuning/keepalive tuning and the like. I can easily push to 1,000, 2,000, and even 4,000 users, in fact over-saturating the box, with absolutely no IIS tuning and immediately am able to run the ASPX test with zero errors indefinitely under these user loads.

    4) Also, my (perhaps to some) controversial statement #2, which I made earlier, is that Resin as Luis tested it and as we are currently testing it runs in a single monolithic process---there is no process isolation between the http server and the app server. In this config it has no concept of the application pools which IIS/ASPX offers... hence once the JSP page crashes (which it has on several occasions under high loads), the entire server is gone and has to be stopped/restarted. With IIS/ASPX, an ASPX app will not bring down IIS since they run in separate processes. As I said, this will always add overhead, but it also adds much safety/fault tolerance. (For those interested, ASP.NET also supports automatic worker process recycling and restarts for failed worker processes.) So while it's fine to compare IIS/ASPX (process isolated) to Resin with Resin's built-in http server running in a single process, folks need to understand the difference. Also, we could configure Resin to run with Apache or IIS using their plugin, which may (or may not) give it process isolation a la the combos of IIS/ASPX and IBM HTTP Server/WebSphere, etc. This would no doubt slow it down, but it's necessary to paint the better, more complete picture. Thoughts?

    -Greg
  143. Back to Resin Part II[ Go to top ]

    Since I made a few potentially controversial statements above, I'll give fair warning and make a couple more:

    #3: I think Luis posted his code/results to illustrate that anyone can post anything and call it "a benchmark with full disclosure" even if it's bogus/worthless. I am taking the time to actually respond to this post to illustrate that full disclosure in the dev community, especially on a topic as hotly debated as alternative Java solutions or Java vs. .NET, does mean something, because people, like myself, Luis, and undoubtedly IBM with the Plants benchmark, do actually look at what's posted, and it has to withstand public scrutiny and further testing by others.

    Hence, for a benchmark like Plants, 64-bit CachePerf, WSTest, XMLMark, or even the original PetStore benchmark, we actually take a very large amount of time on tuning, rechecking, re-implementing code, re-tuning, etc. This stuff is not just thrown out there. How much time? Many months. And when it is published, there is invariably feedback, some of it on point, some of it off point, and some of it debatable either way. Full disclosure matters a great deal, and there is no perfect benchmark.

    But I will stand by Plants as being a useful comparison across a variety of fronts, even if some view the code originally published by IBM as being 'ugly', or the DotNetGarden functionally equivalent app as having some warts. With public scrutiny, these warts get quickly pointed out and we (and IBM) can address them, as we did through 3 iterations of PetShop. In this case, though, I am very confident that the warts are not very major and will contribute (close to) zippo to increased performance for WebSphere. I may be proven wrong, but on the other hand I have probably done more perf testing of this app than all members of theserverside.com combined, so I have a very good feel for what is contributing to the relative perf differences (hey, I warned you my statements might be controversial to some!).

    #4: Back to Resin---another reason I am responding relates to Kirk's point that a benchmark such as Plants, or other solutions-based benchmarks, is simply too coarse-grained to be useful for customers. We disagree on this point; in fact I think the opposite. By being coarse-grained and testing an end-to-end solution, they are actually much more interesting to most customers, and in general micro-benchmarks are often meaningless to customers, although system engineers at ISVs and architects weighing alternative design patterns/API usage will find micro-benchmarks the only way to isolate perf choices and help them improve their product or make the right design pattern/API choice. So (sorry for taking a while to get to my point #4), another reason I am doing the Resin benchmark is that I made the statement that there are three factors contributing to .NET's overall win in the Plants benchmark: a) data access speed (which has to do with both the driver used and the data access technology and architecture used); b) the relative perf of the JSP/ASPX engines between WebSphere and .NET; and c) the efficiency of the network/web stack in the products. b and c are very difficult to isolate, I think, but Luis's benchmark at least isolates b and c from a. So in the spirit of Kirk's test, why not try it to see 1) how fast ASPX/IIS really is for straight processing/server request speeds and 2) compare this as a micro-benchmark to WebSphere.

    #5) So why, then, am I comparing it to Resin, if the point is to satisfy Kirk's request to isolate perf differences between .NET and WebSphere in the Plants benchmark? I am doing so at the request of Luis and others, because they feel WebSphere is just too heavyweight a solution for something like Plants. That's fine, but wouldn't benchmarking this on WebSphere also be on point, since that's what the original Plants comparison was about? With that said, I will gladly continue to work with Resin to publish some data, but the code does not work in WebSphere. I assume this is because Luis used generics (supported in .NET 2.0 and JDK 1.5), but WebSphere only runs on JDK 1.4.x, and hence the code is not even runnable in WebSphere. Is this true?

    -Greg
  144. Back to Resin[ Go to top ]

    I am still working on the "LuisBench" benchmark, using Resin.
    This is a test from the Wicket Framework source repository; I don't deserve the honour of having the test named after me.

    2) When trying to run the same test in a scenario using many clients with a think time (even just a 1-second think time), however, it's difficult to get it to run consistently without errors. The issues surrounding handling many TCP/IP connections and keepalives are complex. This makes publishing benchmark numbers for this test difficult without lots of tuning time spent. But in my mind this is the realistic way to test the relative perf in terms of handling large concurrent user loads. What are the errors, you ask? They are connection-refused errors, as well as, many times, ArrayIndexOutOfBounds errors in the JSP (an HTTP 500 app error).

    I can think of two reasons for this:
    1) - I made something dumb in the code and it has a race condition somewhere. Nothing obvious jumps out at me, but I will look at it in more detail.
    2) - Resin has some kind of deadlock bug; you could try the latest Resin snapshot to see if this still happens.
    If you keep getting errors you could try some of the tips in the troubleshooting guide and follow the advice on how to get a thread dump; you should most definitely raise this issue with the Caucho folks.

    First, the basic tuning knobs seem to be max threads and keepalives/keepalive timeouts (similar to Apache tuning that often needs to be adjusted). Having now read up on Resin, we have already adjusted obvious things like access logging, servlet refresh rates, etc. (and switched to the purchased copy that supports JNI for network operations). But as soon as you take a simple page like this, which does not do much, you need to run many concurrent users to max out a 1-, 2- or 4-CPU reasonably fast box and get the max TPS rate for the LuisBench. And despite lots of tuning, we still get lots of errors beyond 1,000 users. So if there is a magic pixie-dust resin.conf file that someone wants to offer up for this, great, we can use it.
    Not being in a position to see what is happening, it's difficult to offer advice... are you using JRockit?
    I noticed yesterday that BEA still doesn't have an x64 JVM for Windows; are you using Windows?
    Did you fall back to using the Sun JVM?
    What are the JVM options you have?


    3) So this leads to my (perhaps to some) controversial statement #1: IIS/ASPX needs no such thread tuning/keepalive tuning and the like. I can easily push to 1,000, 2,000, and even 4,000 users, in fact over-saturating the box, with absolutely no IIS tuning and immediately am able to run the ASPX test with zero errors indefinitely under these user loads.

    But it obviously needs tuning to get higher throughput than Resin for fewer than 1,000 users... so what's your point?
    No server on the market is best for *ALL* situations... is that so difficult for you to accept?
    E.g.: for static files neither Resin nor IIS holds a candle to Tux or lighttpd.
    Please accept the fact that IIS is not God's gift to the server market.


    4) Also, my (perhaps to some) controversial statement #2, which I made earlier, is that Resin as Luis tested it and as we are currently testing it runs in a single monolithic process---there is no process isolation between the http server and the app server. In this config it has no concept of the application pools which IIS/ASPX offers... hence once the JSP page crashes (which it has on several occasions under high loads), the entire server is gone and has to be stopped/restarted. With IIS/ASPX, an ASPX app will not bring down IIS since they run in separate processes. As I said, this will always add overhead, but it also adds much safety/fault tolerance. (For those interested, ASP.NET also supports automatic worker process recycling and restarts for failed worker processes.) So while it's fine to compare IIS/ASPX (process isolated) to Resin with Resin's built-in http server running in a single process, folks need to understand the difference. Also, we could configure Resin to run with Apache or IIS using their plugin, which may (or may not) give it process isolation a la the combos of IIS/ASPX and IBM HTTP Server/WebSphere, etc. This would no doubt slow it down, but it's necessary to paint the better, more complete picture. Thoughts?

    Like I said above, you can have all of this with Resin (I'm beginning to sound like a sales guy from Caucho - I'm not).
    See the ping parameter in resin.conf and how to run with a backup JVM.
    In one of my setups I have a Resin instance for each webapp, each instance running with a backup JVM... zero problems so far.
    If for some reason this doesn't work for you, please contact Caucho and let them know what the problems are.

    Regards,
    Luis Neves
  145. ok, summary time...[ Go to top ]

    Hi Jason,

    Thanks for the positive words. A couple of notes and my thoughts:
    So to summarize: This is a pretty real comparison of 2 technology stacks; unfortunately it's Microsoft's soon-to-be-released next-gen stack against a stack that is clearly inappropriate for the task at hand.
    Actually, all of the .NET code tested is officially released and available for free download on MSDN--nothing tested on the .NET side is pre-release as of 11/7/2005, the date of publication.

    The only thing not released yet is the new MS SQL Server JDBC driver (Jan timeframe for release, beta2 available on MSDN now). The only implementation therefore that used any beta code was the Java/JDBC version running against SQL 2005.
    Shame on IBM for putting this out there to be used against the Java community, I would have thought we'd learned our lesson with the PetShop fiascos.

    Greg has done an admirable job being professional and informed on behalf of Microsoft, and has remained factual even in the face of some bitter and non-professional responses. Way to go.

    Unfortunately, you're testing the wrong things, as has been pointed out. I would hope no-one would build an app the way IBM did to meet the set of requirements for this application. Even your JDBC version, while better, is incurring the penalty of running in an EJB server with the services implemented as SLSB's and has to run in WebSphere, which is a burden in and of itself :-)

    Couple of thoughts here. I would not be too harsh on IBM for PlantsByWebSphere. It's much cleaner than PetStore was, first of all. Also, it's a simple app but still meant to demonstrate concepts to be used in larger apps. That makes it neither inappropriate as a sample nor inappropriate as a benchmark, in my mind. A more complex app may have many more pages, and more complex backend business logic to be sure, but the basic data access might look exactly the same for 80-90% of the app. So it illustrates common building blocks using a simple "building"---but those same building blocks would/could be used for much bigger "buildings". So although I agree the solution they present is too heavyweight for the problem it solves, it still represents a very common scenario for both simple and larger customer apps: a data-driven Web app. Their choice to use EJBs is understandable; that's the technology they want to highlight, since so many WebSphere features are based on it. On the other hand, WebSphere Express (which is what we priced for the price comparisons) is likely used by a lot of customers for single-server, simpler apps. The basic architecture of PlantsByWebSphere is meant to illustrate concepts that are encouraged to be applied in larger Web-based apps: a servlet architecture activating stateless session beans that front CMP EJBs.

    I know there is tons of disagreement as to when and when not to use EJBs for a Web-based scenario, as this thread points out, and this is the reason we created a light-weight JDBC version with no EJBs to add to the comparison matrix and paint a more complete picture. I guess whether this benchmark is useful/interesting depends on the customer. If they are currently a WebSphere customer using WebSphere (any edition) for basic data-driven Web apps, even if much more complex/bigger app than Plants, they may find the comparison interesting, inclusive of the pricing data. Even without the .NET data the comparison for such customers may be useful considering it does highlight that EJBs may not be the best solution for all cases.

    Also, in terms of the benchmark, some very basic data-driven Web apps need to handle very large concurrent loads, and the comparison highlights relative perf in such a scenario since we are driving very large loads against the servers. Complexity of the app and the concurrent loads they need to support are at least somewhat independent....some complex apps may only have small concurrent loads, and some simple apps may need to handle very large concurrent user loads.

    As for WebSphere being too heavy weight a solution for this type of app, I would tend to agree the customer could get more bang for the buck with a different java solution, or, of course, I will add: .NET! :-).
    Also, the impression I got (without looking through the materials, admittedly) is that this is running on a single app server machine? What kind of deployment is that? The problem is that this doesn't prove ANYTHING about the scalability of an application.

    Yes, a single server. I still think this is a useful showcase of single-server scalability. And actually, if we were to cluster this type of app with .NET, we would use the built-in Windows Load Balancing, which is a simple network load balancing solution with failover. There is really no need for replicated caching in this app (even if we introduced caching, in this app it would likely not need replication; a simple time-based dependency for product info would suffice as long as product data is refreshed once before checkout).

    We have tested clustered scenarios in the past (as we did at Network Computing in the shootout I referenced many messages earlier); it scales linearly for a stateless app like this. One difference in a clustered config would be the treatment of session state. With .NET, you would either go to a dedicated state server outside the cluster, or to a dedicated database (preferred) storing state. This is just a config option. IBM offers a dedicated database or replicated session state (which we don't do, because in general we believe trying to constantly replicate between servers adds to brittleness/data integrity issues, and there is lots of overhead involved in state replication anyway). Even IBM recommends replicating state only every 30 seconds or so for perf reasons (I believe I read this in their Redbook), and this leads to a pretty big window when state is not in sync across servers, doesn't it?
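    For readers unfamiliar with the ASP.NET side: the "just a config option" claim refers to the sessionState element in web.config. A hedged, illustrative fragment (server name and credentials are placeholders, not from the benchmark materials):

```xml
<!-- Illustrative web.config fragment: pointing ASP.NET session state at
     a dedicated SQL Server store instead of in-process memory.
     mode="StateServer" would instead use the out-of-process state service.
     Connection details below are placeholders. -->
<configuration>
  <system.web>
    <sessionState mode="SQLServer"
                  sqlConnectionString="data source=STATE_DB_SERVER;Integrated Security=SSPI" />
  </system.web>
</configuration>
```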

    So I agree, I think a clustered test for an app like Plants would add a great deal to the comparison, but this does not per se make the single server comparison we did completely un-interesting, since the same code would be running largely independently across servers in the clusters with simple load balancing.
    What do you do when your 8x box can't handle the load?

    Scalability in a cluster is a whole other beast, and designing and building to be scalable in a cluster often has detrimental effects on the single-box performance of your application. As a major example of this, think about caching.

    Caching on a single box is simple: just save some data in memory and use that instead of going to the database. If the app changes the data, then clear the cache when you write to the database and load it next time. Very simple, and very fast.

    Unfortunately this is completely useless when you go to a cluster. Adding the second machine to the cluster makes this method of caching dangerously incorrect, as you will very quickly notice as you get dirty data in your database. Caching complexity goes from zero to 100 WITH THE SECOND BOX ADDED TO THE CLUSTER. From there it's only increments of how much stale data will corrupt your database; the damage was done as soon as you went into a cluster.

    So what's to be done? Well, you've basically got a couple of options:

    1 - Don't cache anything (or at least anything transactional). Load it from the database every time. This performs poorly per box, but it scales ok for the first couple of boxes, until your DB starts to fall down. Then you start thinking about upping your DB hardware or really complex and expensive solutions like Oracle RAC.

    2 - Use a clustered cache (like Tangosol Coherence) to allow you to have transactional caches which maintain correct data in-memory across the cluster. This adds a little bit of overhead over the simple local cache, but is much faster than not caching and maintains correctness of your data. It does add to your per-CPU cost over the single box, local cache design, but it probably saves you over the cache-nothing solution as you scale.

    It depends on what you are caching and whether the cached data can be out of sync or stale. I suppose cache policy is a huge architectural decision, very important. In many cases, as with Plants, having out-of-sync caches for limited durations may be perfectly tolerable. Take the case of Match.com, for example, the largest online dating service in the world, which happens to support very, very large concurrent user loads running on .NET in a clustered environment. For most users reading data on other users, they likely (I don't know for sure) use the .NET Cache API on a timeout basis... I don't know how long they cache stuff for, however. I would imagine for an app like this, or for caching other infrequently updated data, it really does not matter if there is a timeout-based cache expiration or a fixed time at which the cache is invalidated across servers (as long as some trigger event can invalidate the entire cache if necessary, which is easy to do in .NET). Hence, they don't in these scenarios need any cache replication whatsoever. This makes the architecture and solution much easier, and this certainly applies to many large-scale apps or portions of those apps.
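    The timeout-based expiration policy described here is language-neutral; since this is a Java forum, a minimal Java sketch of the idea (class names are illustrative, not the .NET Cache API or any product's): each entry carries the time it was stored and simply becomes a miss once it is older than the TTL, so no cross-server replication is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of timeout-based (TTL) cache expiration. Names are illustrative.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long storedAt;
        Entry(V value, long storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void put(K key, V value) {
        cache.put(key, new Entry<>(value, System.currentTimeMillis()));
    }

    // Returns null on a miss or once the entry has expired.
    V get(K key) {
        Entry<V> e = cache.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() - e.storedAt > ttlMillis) {
            cache.remove(key); // expired: drop it so the caller reloads
            return null;
        }
        return e.value;
    }
}
```

    The trade-off, as the post notes, is a bounded window of staleness in exchange for needing no coordination between servers.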

    For your more complex scenario where you need to make sure the cache across servers is always in complete sync, there are a couple of solutions on the .NET side. Beginning with .NET 2.0 and SQL Server 2005, we support something called query notifications, so every cache will automatically invalidate if an individual row in the database changes. I have not used this yet. We do table-based invalidations with SQL Server 2000. For this type of solution on Oracle, there is trigger-based code available that could be used, but it would need to be ported to work with Oracle:

    http://msdn.microsoft.com/msdnmag/issues/03/04/WickedCode/

    Or, there are products such as Tangosol (which Cameron can comment on in terms of the .NET support its Web site mentions), or a pure .NET solution called NCache:

    http://www.alachisoft.com/ncache/features.html

    I'd like to see this as a real test of an enterprise solution. How about a 10 machine cluster running against a DB? Cameron's offered some engineering time to get the Java version going. I'll offer up some design patterns for non-EJB declaratively transactional services... We can make it run in just a Servlet container, then you can test it in a cluster of WebSphere, Resin, and Tomcat servers for comparison. How does the .NET solution handle scaling out across a cluster like this? How do you maintain a transactional cache, or do you fall back to the database for that?

    See above for some of the possible approaches for handling this in .NET. It's a tempting offer. If Cameron offers up a solution, will it be Tangosol-based? That would be cool, but maybe it would be better to compare with something like .NET running with NCache, I am not sure. Or would you be OK with us using SQL Query Notifications?

    Anyway, Greg, good job on this thread. .NET is definitely advancing, and I'm glad for the innovations it's driving on the Java side.

    Sure. Competition always tends to drive innovation in both directions. When Sun did their first XML mark benchmark (they wrote the .NET/C# code), we got focused on our XML parsing (DOM and pull) performance and got, in many cases, a 3x perf gain with .NET 2.0. In the end this is good for customers. We learn a lot about our own products as well in comparisons/reviews/benchmarks, and are always looking for the key areas we need to improve. Suffice it to say, when it comes to software, there is still plenty of room for improvement by just about every vendor in existence.
  146. ok, summary time...

    I guess whether this benchmark is useful/interesting depends on the customer. If they are currently a WebSphere customer using WebSphere (any edition) for basic data-driven Web apps, even with a much more complex/bigger app than Plants, they may find the comparison interesting, inclusive of the pricing data. Even without the .NET data, the comparison may be useful for such customers, considering it does highlight that EJBs may not be the best solution for all cases.

    I'll venture a guess that the poor souls stuck on the projects where the architecture involves CMP on WebSphere have little hope of changing course now, as much as they might like to. This benchmark isn't for them, though; it's for development managers and IT executives who push architecture decisions based on whitepapers and benchmarks :-) For that, at least, I think this is good, because it may counteract some of IBM's sales guys' buzzword bingo around EJBs. Hopefully they'll figure out from your benchmark that they can still have the job safety and platform independence of Java without the pain of CMP.
    Yes, a single server. I still think this is a useful showcase of single-server scalability. And actually, if we were to cluster this type of app with .NET, we would use the built-in Windows Load Balancing, which is a simple network load balancing solution with failover. There is really no need for replicated caching in this app (even if we introduced caching, in this app it would likely not need replication; a simple time-based dependency for product info would suffice, as long as product data is refreshed once before checkout). We have tested clustered scenarios in the past (as we did at Network Computing in the shootout I referenced many messages earlier); it scales linearly for a stateless app like this.

    Sorry, I should have clarified. I don't build apps like this. I build large-scale financial applications and stale data is... frowned upon... Clustered caching is somewhat important to me, as I've seen the impact of the "cache-nothing" scenario.

    I think a lot of JEE developers build apps where data can't be stale and clustering is required for both fault tolerance and scalability. For us, you'll need a better answer.
    One difference in a clustered config, though, would be the treatment of session state. With .NET, you would either go to a dedicated state server outside the cluster, or to a dedicated database (preferred) storing state. This is just a config option. IBM offers a dedicated database or replicated session state (which we don't do, because in general we believe trying to constantly replicate between servers adds to brittleness/data integrity issues, and there is lots of overhead involved in state replication anyway). Even IBM recommends, for their state replication, replicating only every 30 seconds or so (I believe I read this in their Redbook; it's for perf reasons), and this leads to a pretty big window when state is not in sync across servers, doesn't it?

    A DB for session state is unacceptable, IMO. The time to push things to the DB and pull them out would quickly add up, especially since you can't cache anything transactional and every cluster node is pounding the DB. This is another area where a clustered cache can help out, though. Replication doesn't have to be heavy or brittle if it's done correctly. If you've got a cache you're trusting with your transactional data, surely session data is safe too?
    So I agree, I think a clustered test for an app like Plants would add a great deal to the comparison, but this does not per se make the single server comparison we did completely un-interesting, since the same code would be running largely independently across servers in the clusters with simple load balancing.

    Well, let's say un-interesting to me, then. I can't run with simple load balancing and cache-invalidation. Stale data isn't an option.
    It depends on what you are caching and whether the cached data can be out of sync or stale. I suppose cache policy is a huge architectural decision, very important. In many cases, like with Plants, having out-of-sync caches for limited durations may be perfectly tolerable. Take the case of Match.com, for example, the largest online dating service in the world, which happens to support very, very large concurrent user loads running on .NET in a clustered environment. For most users reading data on other users, they likely (I don't know for sure) use the .NET Cache API on a timeout basis. I don't know how long they cache stuff for, however. I would imagine for an app like this, or for caching other infrequently updated data, it really does not matter if there is a timeout-based cache expiration or a fixed time at which the cache is invalidated across servers (as long as some trigger event can invalidate the entire cache if necessary, which is easy to do in .NET). Hence, in these scenarios they don't need any cache replication whatsoever. This makes the architecture and solution much easier, and this certainly applies to many large-scale apps or portions of those apps.

    Right, I can point you to 3 or 4 good open-source Java caches that can do this plus cache invalidation across a cluster. These are pretty good, and good enough for non-transactional data that can afford to be a few seconds out of date. I've used them for non-transactional data before, and they work. Unfortunately, their usefulness is much less when you need a transactional cache.
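    The timeout-based expiration both sides describe can be sketched quickly. This is a hypothetical minimal version (names are mine, not from any of the caches mentioned): each entry carries a time-to-live, a stale entry is simply reloaded from the database on the next read, and a "trigger event" can flush everything. Every node converges on fresh data within one TTL window without any cross-server replication traffic, which is exactly why it only suits data that may be briefly stale.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a timeout-based (TTL) read-through cache.
public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the database read
    private final long ttlMillis;

    public TtlCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null || System.currentTimeMillis() >= e.expiresAtMillis) {
            // Expired or missing: reload and stamp a fresh deadline.
            V v = loader.apply(key);
            map.put(key, new Entry<>(v, System.currentTimeMillis() + ttlMillis));
            return v;
        }
        return e.value;
    }

    // The "trigger event" mentioned above: drop everything so the next
    // reads go straight back to the database.
    public void invalidateAll() {
        map.clear();
    }
}
```

    Within the TTL, reads never touch the database; after expiry or an explicit invalidateAll(), the next read reloads. Nothing here makes the data transactional, which is the limitation being discussed.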
    For your more complex scenario where you need to make sure the cache across servers is always in complete sync, a couple of solutions on the .NET side. Beginning with .NET 2.0 and SQL Server 2005, we support something called query notifications, so every cache will automatically invalidate if an individual row in the database changes. I have not used this yet.

    This invalidates caches on the application servers? That's nifty, and all, but is it transactional? Will the data be committed to the database before the caches are invalidated? Are the caches XA transaction resources and managed by a transaction manager?
      We do table-based invalidations with SQL Server 2000. For this type of solution on Oracle, there is trigger-based code available that could be used, but it would need to be ported to work with Oracle: http://msdn.microsoft.com/msdnmag/issues/03/04/WickedCode/ Or, there are products such as Tangosol (which Cameron can comment on in terms of the .NET support its Web site mentions), or a pure .NET solution called NCache: http://www.alachisoft.com/ncache/features.html
    I'd like to see this as a real test of an enterprise solution. How about a 10 machine cluster running against a DB? Cameron's offered some engineering time to get the Java version going. I'll offer up some design patterns for non-EJB declaratively transactional services... We can make it run in just a Servlet container, then you can test it in a cluster of WebSphere, Resin, and Tomcat servers for comparison. How does the .NET solution handle scaling out across a cluster like this? How do you maintain a transactional cache, or do you fall back to the database for that?
    See above for some of the possible approaches for handling this in .NET. It's a tempting offer. If Cameron offers up a solution, will it be Tangosol-based? That would be cool, but maybe it would be better to compare with something like .NET running with NCache, I am not sure. Or would you be OK with us using SQL Query Notifications?

    I'm fine with you using whatever you would recommend to developers building on .NET 2.0, as long as it meets the requirements of not having stale data for those tables which are identified as transactional data which cannot be out of date. It's all about delivering a solution. Of course, I'll feel free to point at you and laugh if you build a solution which can't be maintained over time because it's got so many hacks thrown in for performance :-)
  147. probably, REMOTENESS is a key ...

    I've suspected from the very beginning that an extra serialization was occasionally (heh!) left in or introduced into the Java code.

    I haven't deeply investigated the code; actually, I looked only at what was included in the article itself ... But, folks, see - the data cache is a remote object and (I suppose) is not even an EJB, whose interactions WAS could possibly optimize with local interfaces.

    ...
    dm1 = (DistributedObjectCache) ic.lookup("services/cache/instance_one");
    ...

    while the Cache in .NET seems to be a trivial singleton

    ...
    Object cacheddata = Cache.Get(selectid.ToString());
    ...

    Needless to say what side effects you could expect from this 'oversight' :-))
  148. Distributed caches that are transacted are not for the vast majority of apps (even most large-scale web apps that need massive scale). The vast majority of data-driven web apps do not need this capability, and using such an approach needs to be thought through. If all the issues have been solved in the J2EE community with products like Coherence, great. A big one is whether the database is shared or not: in most cases it will be, and hence the invalidations would need to be driven by the DB itself if updates could be occurring outside the app using the cache. I will quote one observation from an interesting thread on distributed transacted caches on TheServerSide:
    DB Phobia Syndrome?
    Posted by: Harold Russell on October 29, 2003, in response to Message #99890 (6 replies in this thread)
    As an architect who is deeply involved with both database and J2EE technologies, I found it perplexing that a large part of the J2EE community seems to be suffering from a "DB Phobia syndrome", i.e. choosing to ignore or plainly refusing to consider some good and mature database technologies as part of a solution.

    Caching is a case in point. I'm sure most architects know that data caching is available in most if not all commercial databases. However, most J2EE-oriented architects choose not to consider it at all. I quote from the article: "If possible, I try to stay away from having to cluster the database machines". My question is: why not? I sure hope it's not because this solution is not in the J2EE domain.

    Another DB phobia syndrome phenomenon I observed in the community is the use of SQL. Many in the community seem convinced that smart developers who are well versed in complicated OO technologies cannot master the skill of writing good, efficient SQL.

    I'm excited at how J2EE has come along in the last 2-3 years as a standardized middle-tier platform, but with regard to data access and databases, I cannot help wondering if the technology as a whole has regressed..

    -Greg
  149. Distributed caches that are transacted are not for the vast majority of apps (even most large-scale web apps that need massive scale). The vast majority of data-driven web apps do not need this capability, and using such an approach needs to be thought through. If all the issues have been solved in the J2EE community with products like Coherence, great. A big one is whether the database is shared or not: in most cases it will be, and hence the invalidations would need to be driven by the DB itself if updates could be occurring outside the app using the cache. I will quote one observation from an interesting thread on distributed transacted caches on TheServerSide:
    DB Phobia Syndrome? Posted by: Harold Russell on October 29, 2003, in response to Message #99890 (6 replies in this thread). As an architect who is deeply involved with both database and J2EE technologies, I found it perplexing that a large part of the J2EE community seems to be suffering from a "DB Phobia syndrome", i.e. choosing to ignore or plainly refusing to consider some good and mature database technologies as part of a solution.

    Caching is a case in point. I'm sure most architects know that data caching is available in most if not all commercial databases. However, most J2EE-oriented architects choose not to consider it at all. I quote from the article: "If possible, I try to stay away from having to cluster the database machines". My question is: why not? I sure hope it's not because this solution is not in the J2EE domain.

    Another DB phobia syndrome phenomenon I observed in the community is the use of SQL. Many in the community seem convinced that smart developers who are well versed in complicated OO technologies cannot master the skill of writing good, efficient SQL.

    I'm excited at how J2EE has come along in the last 2-3 years as a standardized middle-tier platform, but with regard to data access and databases, I cannot help wondering if the technology as a whole has regressed..

    -Greg

    You're right. There are cases where an "architect" has some odd database phobia, but not a good architect. I categorize that as "architects who don't know jack, but know how to play politics" syndrome. I like databases and enjoy using them, but not everything should be a trigger/stored proc/materialized view. Clearly, there are cases where invalidating from the DB makes sense, but there are plenty of cases where it doesn't. It would be great to see Microsoft finally address those areas.

    Plenty of developers abuse technology and use it inappropriately. I know I've made those mistakes in the past. A while back I saw a developer attempt to use the DB for session replication, with horrendous results. This goes back to a trading application. When the developer tried to demo the framework he had built using .NET Remoting, the entire system ground to a halt, and he was told it was unusable. In this specific case, there's a constant stream of changes, and those changes must propagate across the cluster so that all active sessions get "instantaneous" updates. If Microsoft provided a solution for this problem, it would have saved me a year's worth of headaches. Having to explain to a developer why that approach was flawed for the given requirements caused a ton of grief. In this area, I think MS could do a better job of explaining when a DB-centric approach is inappropriate. I know I'm whining here, but when I challenged the developer to prove the DB approach worked, his response was "that's what Microsoft recommends."

    peter
  150. Distributed caches that are transacted are not for the vast majority of apps (even most large scale web apps that need massive scale). The vast majority of data-driven web apps do not need this capability ..

    It totally depends on the application. While some of our customers' applications make use of the transactionality, most simply rely on the coherency. In other words, the main thing that people need is a guarantee that regardless of what server a request lands on, it's going to get the same result. That's what our Coherence software provides, and it does it in a way that 10 or 100 or 1000 servers can run the application with linear scale-out (i.e. no database bottleneck).

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Shared Memory for Java
  151. Price performance comparison

    I just noticed the header of the chart "Price performance comparison" in the "study". It actually says, "Lower bar is better". Goes to show what audience MS is targeting with this study.