How would you make the PetStore application high performing? A new article by Dion Almaer, "Making a Real World PetStore", talks about the history of the pet wars, and puts forth a few ideas on enhancing the PetStore.
What would you change? Would you use stored procedures, denormalize the database schema, use caching tools, or adopt a high-performing CMP implementation?
If performance is the only goal, then the techniques are pretty obvious. However, in the real world maintainability is far more important than performance, except where poor performance seriously gets in the way. Many of the patterns that Sun has put together help a great deal with this, but some implementations of these patterns suffer from both performance and maintainability issues.
I've been working on the Open For Business Project for nearly a year now, and we have run into many of these issues. We spent some time looking at the Pet Store and various pattern documents and eventually came up with some variations on the theme: basically pattern implementations, but with a few twists.
Our Control Servlet and the rest of our webapp framework are somewhat similar to the ones found in the Pet Store, with a few additions and greater flexibility.
For persistence we started with CMP Entity Beans, but with around 400 entities in the project at the time, that approach became not only slow but also very difficult to maintain. Our first attempt at solving this was a code generator driven by an XML file describing the structure of the entities.
That helped, but having a million lines of code, generated or not, is a bit of a pain, especially when you want to modify the code and keep everything synchronized.
So, our current persistence layer is handled by a tool we call the Entity Engine. We use nearly the same XML file that the code generator used, but instead of generating CMP entity beans and deployment descriptors, it now simply drives a dynamic API and GenericValue objects, providing the basic persistence functionality needed without any entity-specific persistence code.
This performs nearly as well as straight JDBC calls (the Entity Engine just generates SQL and hits the database through JDBC), but provides nice insulation between the application and the database. It also offers other features, such as keeping the database definitions in sync with the entity model XML file, and it provides an API to interact with these definitions. To play well with the other J2EE components, it uses JTA for transaction management and supports various methods (including JNDI) to obtain Connection, UserTransaction, and TransactionManager objects.
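A minimal sketch of the metadata-driven idea, assuming a simplified model: one generic value class plus an entity definition (which the Entity Engine reads from its XML file) is enough to generate SQL for any entity. EntityDef, GenericRecord, insertSql, and parameters are hypothetical names for illustration, not the actual OFBiz Entity Engine or GenericValue API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity definition; a real engine would parse this
// from the XML entity model file instead of building it in code.
final class EntityDef {
    final String name;
    final String table;
    final List<String> fields;

    EntityDef(String name, String table, List<String> fields) {
        this.name = name;
        this.table = table;
        this.fields = List.copyOf(fields);
    }
}

// Hypothetical generic value: one class serves every entity, so there is
// no entity-specific persistence code to write or regenerate.
final class GenericRecord {
    private final EntityDef def;
    private final Map<String, Object> values = new LinkedHashMap<>();

    GenericRecord(EntityDef def) { this.def = def; }

    // Field names are checked against the model, not against Java getters/setters.
    GenericRecord set(String field, Object value) {
        if (!def.fields.contains(field)) {
            throw new IllegalArgumentException("Unknown field " + field + " on " + def.name);
        }
        values.put(field, value);
        return this;
    }

    Object get(String field) { return values.get(field); }

    // Builds a parameterized INSERT covering only the fields that were set.
    String insertSql() {
        String columns = String.join(", ", values.keySet());
        String placeholders = String.join(", ", Collections.nCopies(values.size(), "?"));
        return "INSERT INTO " + def.table + " (" + columns + ") VALUES (" + placeholders + ")";
    }

    // Values to bind to the '?' placeholders, in the same order as the columns.
    List<Object> parameters() { return new ArrayList<>(values.values()); }
}
```

In a JDBC setting, the string from insertSql() would be passed to Connection.prepareStatement and each entry of parameters() bound with setObject; adding a new entity then means adding a definition, not new Java classes.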
There are a number of other tools and components to make life more convenient. You can check them out at www.ofbiz.org, or the SourceForge site at sf.net/projects/ofbiz.
So, to answer the original question, for real world use, the OFBiz infrastructure is what I would do to the Java Pet Store.
We faced this question during the architecture definition phase of our banking project (still under way): whether to use PL/SQL (Oracle) or EJB, since PL/SQL can be highly optimized with its FORALL commands, among others. To test the EJB approach, I wrote code to read an import file with 1,300 lines and used MDBs (message-driven beans) to process it in parallel (importantly, the order didn't matter). The PL/SQL version took two and a half seconds to read, validate, and process the whole file. We thought that even if it took half a minute in Java, it would be worth it if we kept all the logic in Java components, independent of the database (in this case, Oracle). To our surprise, the same work took just 6 seconds!! And the client was not on the same machine as the DB, unlike in the PL/SQL case.
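Emerson's test used message-driven beans; as a rough stand-in, the same order-independent parallelism can be sketched with a plain ExecutorService. This swaps JMS/MDBs for java.util.concurrent, and validateAndProcess with its three-column, semicolon-separated format is a hypothetical placeholder for the real per-line validation and processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelImport {
    // Hypothetical per-line work: returns true if the line is valid and was processed.
    static boolean validateAndProcess(String line) {
        String[] cols = line.split(";");
        return cols.length == 3 && !cols[0].isBlank();
    }

    // Submits every line to a fixed pool. Because lines are independent,
    // the order in which they complete does not matter.
    static int importLines(List<String> lines, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String line : lines) {
                Callable<Boolean> task = () -> validateAndProcess(line);
                results.add(pool.submit(task));
            }
            int ok = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) ok++; // waits for each task; counts successful lines
            }
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Import interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Line processing failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```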
If you want more details, post on this thread or email me.
Emerson Cargin, Brazil
echofloripa at bol dot com dot br
BTW, Oracle finally released the source code for their "28 times faster than .NET" J2EE petstore: http://otn.oracle.com/tech/java/oc4j/content.html
Here are some of the changes they made to Sun's implementation:
* In some cases, they do lazy loading, only loading detail information if it is really needed. They did this by changing the DAO implementations. This is probably the most controversial change if the .NET version does not do lazy loading, but I assume it will have the same run-time behavior.
* I didn't look _extremely_ closely, but it seems they did _NOT_ remove the leading or trailing '%' from the queries using the LIKE operator, as Microsoft suggested. They did rewrite that query, though.
* They changed most (or all) of the EJBs to use isModified() to avoid unnecessary database updates. A simple change that probably made a big difference.
* It seems that they rewrote all the database connection code to keep DB connections very short-lived, and also replaced most of the prepared statements with hard-coded statements (since the connection is now only open for a single execution of the function, it doesn't make sense to use a prepared statement). I don't know if that is an Oracle JDBC-specific optimization or what. I don't know much about the JPS implementation, but I would have expected it to use some kind of connection pooling mechanism. Apparently it does not.
* They must have paid a high-school student a few bucks to rewrite all of the string-handling code to use StringBuffer instead of String, and to remove as many extraneous string operations as possible (e.g. "asdf" + "jkl;" -> "asdfjkl;"). In other words, they just cleaned up some slop in the original code.
* No stored procedure usage. I'm a little surprised they didn't consider stored procedures fair game after seeing the .NET implementation use them. I think they were trying really hard to keep the changes "trivial" and/or portable.
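The lazy-loading and isModified() changes are easy to picture in isolation. The sketch below is hypothetical (Item, its loader, and the writes counter are invented names, not code from either PetStore): the detail is fetched only on first access, and store() skips the database write when nothing has changed, which is the same idea as an isModified() check before the container stores an entity.

```java
import java.util.function.Supplier;

// Hypothetical item whose description is expensive to load (e.g. a DAO query).
final class Item {
    private final String id;
    private final Supplier<String> detailLoader;
    private String description; // null until first needed
    private boolean modified;
    int writes;                 // counts simulated database UPDATEs, for illustration

    Item(String id, Supplier<String> detailLoader) {
        this.id = id;
        this.detailLoader = detailLoader;
    }

    String getId() { return id; }

    // Lazy loading: the detail query runs only the first time it is needed.
    String getDescription() {
        if (description == null) {
            description = detailLoader.get();
        }
        return description;
    }

    // Marks the item dirty only when the value actually changes.
    void setDescription(String newDescription) {
        if (!newDescription.equals(getDescription())) {
            description = newDescription;
            modified = true;
        }
    }

    boolean isModified() { return modified; }

    // Dirty check: skip the UPDATE entirely when nothing changed.
    void store() {
        if (!isModified()) return;
        writes++; // a real implementation would issue the UPDATE here
        modified = false;
    }
}
```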
Actually, I think that javac optimizes "blah" + "blah" to "blahblah" on its own (in the same way as 1+1 gets optimized to 2) as it's a simple Compilers 101 type optimization.
In the same way, I think the chained + operator is now optimized so that it doesn't create an intermediate instance for every operand (again, a reasonably simple compile-time optimization, maybe late Compilers 101 :) ). No idea why it wasn't in the original implementation, though, especially since it's the only place where the + operator is allowed on objects.
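Both points can be checked directly. A concatenation of literals is a compile-time constant expression, so javac folds it into a single interned literal; a concatenation involving a runtime value is not folded and produces a new String. Within a single expression, javac has long compiled a + b + c to one StringBuffer/StringBuilder chain (and, since Java 9, to an invokedynamic call), so hand-rewriting mostly pays off across loop iterations, where an explicit StringBuilder helps. ConcatDemo is a hypothetical class; the == comparisons only demonstrate object identity and are not how strings should be compared in real code.

```java
final class ConcatDemo {
    // Constant expression: javac folds this to the literal "blahblah".
    static final String FOLDED = "blah" + "blah";

    // Compile-time constants are interned, so this is the same object as the literal.
    static boolean foldedIsSameLiteral() {
        return FOLDED == "blahblah";
    }

    // A runtime value is not a constant, so the result is a newly created String.
    static boolean runtimeConcatIsSameLiteral(String prefix) {
        return (prefix + "blah") == "blahblah";
    }

    // Inside a loop, one builder replaces a new String per iteration.
    static String joinIds(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) sb.append(',');
            sb.append("EST-").append(i);
        }
        return sb.toString();
    }
}
```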
We were actually discussing this a few months ago - not making a FASTER PetStore but rather making it a BETTER example application.
Call me stupid - and you might anyway - but does anyone actually follow the PetStore when building J2EE applications? It always strikes me as an application that was built by a first timer - not someone who has a lot of experience building J2EE apps.
What we did was rewrite the PetStore (renamed PetSoar - not PetSore!) to use various Open Source projects and improve the overall quality of the code.
Our changes included:
- Using WebWork for the MVC instead of the god-awful Sun screens crap
- Rewriting the presentation layer using SiteMesh to handle all navigation and decoration of in-page elements
- Changing the BMP beans to CMP (although right now I would potentially also use the OFBiz entity engine, which I'm quite enamoured with, for portability reasons)
- Using XDoclet to generate all the EJBs, deployment descriptors, etc. (so our PetSoar was much more portable)
- Cleaning up a lot of the internal code so that it all operated more cleanly than the existing PetStore (I can't remember the exact changes there)
Basically, my biggest complaint was not speed. PetStore was never built for speed - no one cared until Microsoft decided to pick a fight over it - it was built as an example app. Even for that purpose, though, I don't think it works very well (IMHO).
Our overall aim was to rewrite the PetStore using various successful Open Source projects to show how real applications can be built rapidly using pre-existing (and free!) enterprise components and tools.
Food for thought.
(disclaimer: I do participate in most of the projects used - and there are some substitutes available - although of course none as good as the projects chosen! ;)).
Mike, where can I find a copy of PetSoar, if it is available?
I'd also be interested in seeing the code for PetSoar; it could be interesting.
Me too - where is petsoar?
Yes where is the PetSOAR!...no man - just kidding!
We kindly ask you to show us the PetSOAR. Thanks!
Brian, I have a question about the following:
"It seems that they are rewrote all the database connection code to keep the DB connections very short lived, and also removed most of the uses of prepared statements in favor of hard-coded statements (since the connection is now only open for the one execution of the function, it doesn't make sense to use a prepared statement). I don't know if that is an Oracle JDBC-specific optimization or what. I don't know a lot of about the JPS implementation but I would have expected it to use some kind of connection pooling mechanism. Apparently it does not."
I thought prepared statements make the database server compile and cache the query so it can be reused by replacing the parameters so it should make it faster. What did you mean by hard coded statements?
"I thought prepared statements make the database server compile and cache the query so it can be reused by replacing the parameters so it should make it faster. What did you mean by hard coded statements?"
Yeah, I'm not sure why they did what they did. Searching through the source code, there is now only one place where they are using a prepared statement.
It seems that Oracle removed all knowledge of connection pooling from JPS. Presumably the connection pooling happens transparently at the DataSource level, so the application opens, uses, and then closes the connection for every query/statement. This means that you basically can't use a prepared statement with the connection pool unless you have a map (connection, query) -> PreparedStatement.
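A (connection, query) -> PreparedStatement map might look something like the following minimal sketch. StatementCache and StatementPreparer are hypothetical names; the connection and statement types are left generic so the example runs without a database (with JDBC, C would be java.sql.Connection, S would be java.sql.PreparedStatement, and the preparer would be Connection::prepareStatement). Many connection pools and drivers offer this kind of statement caching built in.

```java
import java.sql.SQLException;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical factory; with JDBC this would be Connection::prepareStatement.
@FunctionalInterface
interface StatementPreparer<C, S> {
    S prepare(C connection, String sql) throws SQLException;
}

// Keeps one prepared statement per (connection, SQL text) pair, so a pooled
// connection that is handed out again can reuse what it prepared earlier.
final class StatementCache<C, S> {
    private final Map<C, Map<String, S>> byConnection = new IdentityHashMap<>();
    private final StatementPreparer<C, S> preparer;

    StatementCache(StatementPreparer<C, S> preparer) { this.preparer = preparer; }

    synchronized S get(C connection, String sql) throws SQLException {
        Map<String, S> statements = byConnection.computeIfAbsent(connection, c -> new HashMap<>());
        S stmt = statements.get(sql);
        if (stmt == null) {
            stmt = preparer.prepare(connection, sql); // only prepared on a cache miss
            statements.put(sql, stmt);
        }
        return stmt;
    }

    // Call when the pool physically closes a connection: its statements die with it.
    synchronized void evict(C connection) {
        byConnection.remove(connection);
    }
}
```

The evict hook matters because statements are only valid on the physical connection that created them, so the cache has to forget them when the pool really closes that connection.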
Also, it could be that the Oracle database is doing the pre-parsing itself. In Oracle9i there is something like "non-exact statement matching", where the database can do a "quick parse" of a statement and match it to one it has already created an execution plan for, as if you had used bind variables instead of constants.
If either or both of the above are true, then I suppose we should be impressed since they certainly made connection pooling simple.
If somebody is interested, I have created a "cvs diff -u" patch of the differences between Oracle's version and Sun's original version. You can get it by emailing me: [email protected]
"* It seems that they are rewrote all the database connection code to keep the DB connections very short lived, and also removed most of the uses of prepared statements in favor of hard-coded statements (since the connection is now only open for the one execution of the function, it doesn't make sense to use a prepared statement). I don't know if that is an Oracle JDBC-specific optimization or what. I don't know a lot of about the JPS implementation but I would have expected it to use some kind of connection pooling mechanism. Apparently it does not."
If the connections are pooled, they are not actually closed and reopened each time: opening and closing connections is expensive, so pooled connections stay open across calls. The prepared statement therefore remains with the connection that created it.
Hi, I recently worked on caching dynamic pages (JSP/Servlet) for a customer, and it resulted in some dramatic speed-ups. Caching suits store applications well, where product list and product detail pages are often not unique per user. We built a simple request matcher that would recognise when a request could be satisfied from the cache and then just blurted out the pre-generated page. To cache a page, the JspWriter or ServletOutputStream was overridden with a class that wrote the output as normal while also making a copy as a byte array and putting it in the cache. Dynamic JSP includes and forwards were difficult to manage because we lost control of the output stream; the new decorators would probably help with this.
We did not have the opportunity to get into the servlet API code within the application server; if we had, we could have cleaned up a lot of the problems.
Whether self-written or part of the application server, page caching makes sense.
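The capture step described above (write the output as normal and keep a byte-array copy for the cache) can be sketched with plain java.io streams. TeeOutputStream, PageCache, and PageRenderer are hypothetical names, and the cache key (for example, URI plus query string) is an assumption; in a servlet container the tee would sit inside a ServletOutputStream returned from an HttpServletResponseWrapper, which is left out so the sketch runs without the servlet API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Writes every byte to the real response stream and keeps a copy for the cache.
final class TeeOutputStream extends OutputStream {
    private final OutputStream target;
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    TeeOutputStream(OutputStream target) { this.target = target; }

    @Override public void write(int b) throws IOException {
        target.write(b);
        copy.write(b);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        target.write(b, off, len);
        copy.write(b, off, len);
    }

    @Override public void flush() throws IOException { target.flush(); }

    byte[] captured() { return copy.toByteArray(); }
}

// Hypothetical page cache keyed by a normalized request key.
final class PageCache {
    interface PageRenderer { void render(OutputStream out) throws IOException; }

    private final Map<String, byte[]> pages = new ConcurrentHashMap<>();

    // Serves the cached bytes on a hit; otherwise renders through the tee and caches the copy.
    void serve(String key, OutputStream response, PageRenderer renderer) throws IOException {
        byte[] cached = pages.get(key);
        if (cached != null) {
            response.write(cached);
            return;
        }
        TeeOutputStream tee = new TeeOutputStream(response);
        renderer.render(tee);
        tee.flush();
        pages.put(key, tee.captured());
    }

    void invalidate(String key) { pages.remove(key); }
}
```

A production version would also need to capture the character writer and response headers, and to invalidate entries when product data changes.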
Did you write this from scratch, or did you use one of the stable open source caching alternatives such as OSCache?
I would like to see this PetSOAR app as well. Would you mind sharing?
From scratch. When we started, we were just playing around looking for more performance, and we demonstrated very quickly that caching could save a lot of CPU. In fact, as the app is running on WebSphere, we will probably use IBM's built-in caching, "Dynacache", for production. Dynacache is very nice but underused; it also integrates with their WebSphere Edge Server, serving the cached servlets and JSPs from the front-end reverse proxy. The application server creates the cached page and sends it to the Edge server, which then serves it; when the cache is invalidated, the Edge server is told about it and the application server sends it the new page.
A few points.
What is the Pet Store? It's really an object-based application (rather than a service-based one), as, in my experience, 90% of applications are. In other words, it requires a data-centric solution.
The natural solution for EJB is to use Entity Beans; unfortunately, this exposes the issue that EJB is not an out-of-the-box 'Real World' solution to 90% of problems.
At www.javelinsoft.com we have code-generated the Petstore using Jgenerator and have used the following optimizations to get Entity Beans up to speed. All of these optimizations would be economically unfeasible without a code generator.
(1) Client Side Object Stubs to encapsulate/cache Remote and Local EJB Objects, to save loading them by PK.
(2) Client Side Session and Home Stubs to encapsulate/cache Remote and Local EJB Homes and Session, to save discovering them.
(3) Lazy Loading of relationships on the Client Side.
(4) Delaying CRUD actions by passing the Client Side Object stubs to Client Side Session stub CRUD functions, and getting them to optimize the behaviour.
(5) Serializing Entity Beans/Value Objects back and forth rather than setting/getting every value.
(6) Client Side (on save) and Server Side (on load) Validation to reduce the overheads.
(7) Modified flags on the Client and Server Objects to stop loading and saving when it's not needed.
(8) Optimistic Locking on the database to save loading the complete object when it's not needed.
(9) Having the Client Side stubs handle Read-Only Tables and Views (i.e. take the UD out of CRUD).
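Item (8) works because a version column lets an update detect a concurrent change without locking the row or reloading the whole object first. The sketch below simulates such a table in memory; VersionedTable and its price/version columns are hypothetical, the comment shows the kind of SQL a real implementation would issue, and it illustrates the general technique rather than Jgenerator's generated code.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a table with a VERSION column. A real implementation
// would issue something like:
//   UPDATE item SET price = ?, version = version + 1 WHERE itemid = ? AND version = ?
// and treat an update count of 0 as a conflict.
final class VersionedTable {
    static final class Row {
        final String price;
        final int version;
        Row(String price, int version) { this.price = price; this.version = version; }
    }

    private final Map<String, Row> rows = new HashMap<>();

    synchronized void insert(String id, String price) {
        rows.put(id, new Row(price, 0));
    }

    synchronized Row read(String id) {
        return rows.get(id);
    }

    // Applies the update only if nobody changed the row since it was read.
    synchronized boolean update(String id, String newPrice, int expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // conflict: caller must re-read and retry, or report it
        }
        rows.put(id, new Row(newPrice, expectedVersion + 1));
        return true;
    }
}
```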
After all this exhaustive optimization, we've come to realize that it's not the EJB internals that really matter, it's how usable they are.
Pet-store also exposes a common problem with Sun: they spend too much time on design and too little on real-world use cases. External requirements should constrain internal design, NOT internal design constrain external requirements.
We have to remember that EJB is the first widely adopted distributed object model, and it won't be the last.
We (Javelin Software) have put up an online (2-tier) version of the Petstore:
This can be compared with an online (2-tier) demo of the Northwind database:
Northwind is larger in terms of data and tables. Northwind also includes more types, indexes and complex keys as well as views. Northwind forced us to upgrade several aspects of our code generator (e.g. views). Petstore as a use case was really a waste of time.
An unoptimized EJB server struggles with even a tiny dataset in Petstore. MS Access ran Northwind so quickly on a 486 we thought we'd all died and gone to heaven - those were the days.
- Robin :)
Some of my comments in the article have turned out to be wrong, so I wanted to clarify some information on the PetShop.
1: Stored Procedure Use in PetShop
If you look at the SP code in the .NET PetShop, it is actually just a simple wrapper around data access; no business logic is placed in there. The only advantage here is that the SQL query plan is compiled only once.
The ASP.NET layer talks to .NET Assemblies, which in turn talk through to the backend tier.
.NET Assemblies are the new way of doing things, and can be accessed via SOAP (from external sources), .NET Remoting (from internal sources), and DCOM is still supported of course.
If you wanted to deploy an app in a clustered environment, the recommended way of doing so would be:
Use Network Load Balancing (NLB): turn on this feature on each box, configure the cluster IP/domain name, and have the clients access the cluster.
IMHO, the biggest problem is that Microsoft continues to use the Petstore example as the reason why managers should invest in .NET ... until that can be stopped, they will continue to profit from what is simply a lie.
Hello, what amazes me is that all of these "real world" examples have plenty of patterns and strategies for executing their respective business logic, but no "real world" security, starting with the user ID and password not being sent as clear text. There is also no provision for running secured-path servlets/JSPs/EJBs under SSL. If you want to run "real world", you have to start with security. Thanks, David Brown
We are doing a project for one of the major insurance companies. We have an architecture that is pretty standard: JSP->Command class->Session Bean->Business Objects->Entity Beans (or database using JDBC).
The performance optimization techniques we used are:
1. We didn't use SPs, for portability reasons. We use a mixture of BMP and CMP. Wherever we fetch multiple records from the database, we use BMP and create business objects directly from these records, returning a collection of business objects instead of a collection of entity beans. We use CMP for inserts, updates, and where we know the query returns a single record.
2. We optimize BMP by doing as many joins as possible and minimizing the number of database calls we make.
3. Our services (on session beans) are coarse-grained, which gives some performance improvement.
4. The services take in Data Transport Objects as parameters and return Data Transport Objects. These are purely data objects (as opposed to business objects or Entity Beans) which are smaller and quicker to transport over the wire.
Mainly, we try to optimize by reducing network traffic (back-and-forth database calls or back-and-forth service calls).
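A Data Transport Object in the sense of item 4 is just a small, serializable bag of fields. The sketch uses a hypothetical PolicySummaryDTO with invented field names, and a roundTrip helper that imitates the serialize/deserialize step a remote call performs on its arguments and return values, which is why keeping these objects flat and small reduces the cost of each call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.math.BigDecimal;
import java.util.List;

// Pure data: no business logic and no EJB references, only serializable fields,
// so one object can carry a whole policy summary across a single remote call.
final class PolicySummaryDTO implements Serializable {
    private static final long serialVersionUID = 1L;

    final String policyNumber;
    final String holderName;
    final BigDecimal premium;
    final List<String> coverageCodes;

    PolicySummaryDTO(String policyNumber, String holderName, BigDecimal premium, List<String> coverageCodes) {
        this.policyNumber = policyNumber;
        this.holderName = holderName;
        this.premium = premium;
        this.coverageCodes = List.copyOf(coverageCodes);
    }

    // Serializes and deserializes the DTO, as marshalling by value would.
    static PolicySummaryDTO roundTrip(PolicySummaryDTO dto) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(dto);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PolicySummaryDTO) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }
}
```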
There are too many EJB files in Java's PetStore.
Is it possible to package them together? Also, PetStore's deep directory structure makes it difficult to manage.
At present, .NET's solution is clean.