TMC Announces TORPEDO O-R Benchmarking Initiative

  1. The Middleware Company today announced the Testbed for Object Relational Products for Enterprise Distributed Objects (TORPEDO). TORPEDO is a TMC-funded initiative to help developers and architects understand the effectiveness of the SQL generated by object-relational technologies.

    TORPEDO is not a performance analysis. It attempts to create a level approach to analyzing the effectiveness of SQL generation of an Object-Relational solution. This release of TORPEDO measures only the generated SQL and the resulting number of hits, which can indicate the quality of a tool's query analysis, caching ability, and SQL generation technique. TORPEDO does not measure throughput, resiliency, or response times.

    TORPEDO includes an implementation of the specification with plug-ins for execution on different OR technologies including EJB CMP, Oracle TopLink, JDO, and Hibernate. Running the TORPEDO implementation on different OR technologies and analyzing the generated SQL (and the number of hits) gives a relative comparison of the capability of an OR generation engine.
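
    The announcement does not say how the reference implementation captures the generated SQL, so the following is purely illustrative: a rough sketch (class name and approach invented, not TORPEDO's actual framework) of how a JDBC dynamic proxy could count "hits", i.e. statements actually sent to the database.

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    // Illustration only: not part of TORPEDO. Counts every execute*() call made
    // through a wrapped JDBC Connection, which is roughly what a "hit" means here.
    public class HitCountingConnection {

        private int hits;

        public int getHits() {
            return hits;
        }

        public Connection wrap(final Connection target) {
            return (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class[] { Connection.class },
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
                        Object result = invokeTarget(target, m, args);
                        if (result instanceof Statement) {
                            // Every Statement handed to the O/R layer gets its own counting proxy.
                            result = wrapStatement((Statement) result);
                        }
                        return result;
                    }
                });
        }

        Statement wrapStatement(final Statement target) {
            // Preserve the most specific JDBC interface so casts in the O/R layer still succeed.
            Class iface = Statement.class;
            if (target instanceof CallableStatement) {
                iface = CallableStatement.class;
            } else if (target instanceof PreparedStatement) {
                iface = PreparedStatement.class;
            }
            return (Statement) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class[] { iface },
                new InvocationHandler() {
                    public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
                        if (m.getName().startsWith("execute")) {
                            hits++; // one more statement sent to the database
                        }
                        return invokeTarget(target, m, args);
                    }
                });
        }

        static Object invokeTarget(Object target, Method m, Object[] args) throws Throwable {
            try {
                return m.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getTargetException(); // rethrow the driver's own exception (e.g. SQLException)
            }
        }
    }

    Handing an O/R tool such a wrapped Connection tallies every execution regardless of which framework generated the SQL; note that executeBatch() counts as a single hit in this sketch.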

    TORPEDO results can be submitted by anyone. Results can be either Verified or Non-Verified. Verified results are double-checked by a review committee to ensure that the configuration yields the results submitted, that concurrency is upheld, that the specification is adhered to, and that the code used to generate the results has not been altered.

    Early Results

    There are some initial results posted. You will see a significant range in the results of some products as different submitters use TORPEDO to demonstrate the range of possibilities involved with tuning (or running something out of the box).

    A word of note for interpreting the results: TORPEDO shows the total aggregate hits for a number of different tests run. While the aggregate number is a score used for comparison, it is simply the sum of hit-counts from multiple specialized tests that should be compared individually.

    The numbers from some products show significant ranges across their submissions. As a sample, TMC provided two BEA WebLogic Server SP2 CMP submissions (one out-of-the-box and untuned, another tuned by our architects) to demonstrate the range in possible results (although having a submission from BEA would be even better).

    See the TORPEDO results page for an explanation of the configurations and structures used to achieve these results.
    SolarMetric Kodo 3.2 RC1: 22 hits
    Hibernate 2.1.6 (two submissions): 22 hits, 34 hits
    BEA WebLogic Server (CMP) 8.1 (two submissions): 26 hits, 113 hits
    Oracle TopLink (POJO) 9.0.4 (two submissions): 30 hits, 97 hits
    We'd like to have some more results to share with the community. As such, anyone (individuals or vendors) can submit results for free that will be posted as Non-Verified.

    Additionally, input on the specification and current implementation is invited on the TORPEDO site at MiddlewareRESEARCH.com/torpedo.

    Threaded Messages (46)

  2. Congrats TMC!

    Congratulations to the guys at TMC for getting this out there -- it's always a big risk to publish a comparison tool. Hopefully, this will become something that will help educate the community about O/R mapping and drive some discussion about what is relevant in an O/R mapping product.

    As a vendor of an O/R mapping product, we're thrilled to see independent initiatives like this that help combat the biggest O/R mapping misconception: the classic "but-can-it-produce-good-SQL" question. I hear this question time and again from O/R mapping neophytes. It'll be fantastic to be able to lean on TORPEDO to demonstrate that yes, in fact, an O/R mapping tool can generate beautiful SQL.

    -Patrick

    --
    Patrick Linskey
    Kodo JDO
    http://solarmetric.com
  3. Not too surprising that Hibernate comes out on top (tied with SolarMetric Kodo). Anyone who has really spent time tuning a Hibernate-based data model knows the amount of flexibility you have in what you do and do not retrieve on any given request. Between join fetching, lazy loading, and outer join defaults, you can generally get the result you want without much trouble. I wonder what the minimum hit count possible is?
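
    To make those tuning knobs concrete, here is a rough Hibernate 2.1-style sketch (the Auction/bids model is invented, not the actual TORPEDO schema) showing how a fetch join trades one wide query for the 1+N pattern you get with lazily loaded collections.

    import java.util.List;
    import net.sf.hibernate.HibernateException;
    import net.sf.hibernate.Session;

    public class AuctionQueries {

        // With a lazily mapped bids collection: one SELECT for the auctions, plus one
        // more SELECT per auction the first time its bids are touched (1 + N hits).
        public List auctionsLazy(Session session) throws HibernateException {
            return session.find("from Auction a");
        }

        // With an explicit fetch join: auctions and their bids come back in a single
        // outer-joined SELECT (1 hit), at the price of a wider, duplicated result set.
        public List auctionsWithBids(Session session) throws HibernateException {
            return session.find("from Auction a left join fetch a.bids");
        }
    }

    A Set (or a distinct projection) is the usual way to collapse the duplicate Auction rows a collection fetch join produces.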
  4. minimum hit count

    I wonder what the minimum hit count possible is?
    Let me guess. One ? :))
  5. why no benchmark?

    I welcome the initiative because it is TMC-funded. No Sponsor!

    But,

    "TORPEDO is not a performance benchmark".
    "TORPEDO does not measure throughput, resiliency, or response times".

    I, and I am sure many others, are dying to see a performance benchmark between Hibernate, Kodo, iBATIS, and plain JDBC (dynamic SQL).

    "Give your readers what they want!"

    Regards
    Rolf Tollerud
  6. why no benchmark?

    I, and I am sure many others, are dying to see a performance benchmark between Hibernate, Kodo, iBATIS, and plain JDBC (dynamic SQL).
    iBATIS and plain JDBC would probably be faster for this kind of benchmark than Hibernate, as iBATIS is a much simpler framework (see below). A benchmark between Kodo (or JDO in general) and Hibernate would definitely be interesting.

    http://www.javaperformancetuning.com/news/interview041.shtml (04/28/2004, Gavin King) :
    You should use Hibernate if you have a nontrivial application (the definition of nontrivial varies, but I usually think of Hibernate being less applicable to applications with only ten tables or so) that uses an object-oriented domain model. Not every application needs a domain model, so not every application needs ORM. But if your application does a lot of business logic - rather than just displaying tabular data on a webpage - then a domain model is usually a good thing.

    Hibernate really starts to shine in applications with very complex data models, with hundreds of tables and complex interrelationships. For this kind of application, Hibernate will take away a huge amount of coding effort (perhaps up to 25%, for some applications) and will result in an application that performs better than the alternative handcrafted JDBC. This is possible because some kinds of performance optimizations are very difficult to handcode: caching, outer-join fetching, transactional write-behind, etc.

    [...]

    We have some standard performance tests that I run regularly, all of which compare Hibernate against handcrafted SQL/JDBC. But again, they turn out to be quite unhelpful in practice. The scalability tests I have done have so far been quite informal, and always confirmed my expectation that the database falls over before Hibernate does. Now that we have access to a real stress testing environment through JBoss, Christian Bauer is putting together some more formal benchmarks. These will include tests for the nontrivial use cases I talked about.

    We are considering releasing these benchmarks to the public for the purpose of comparative testing of different ORM solutions (especially since it doesn't seem right to criticize existing benchmarks without providing some alternative). But I'm not sure about that; I don't see how we could stop other groups cheating - and I don't really want to deal with all the fuss that always accompanies benchmark results. Benchmarks are used to mislead, far more often than they are used honestly.
  7. why no benchmark?

    Gavin's comments seem very reasonable, more anti-hype than hype. But they make me think of another useful piece of information I would like to have:

    "Hibernate really starts to shine in applications with very complex data models, with hundreds of tables, complex interrelationships + a lot of business logic"

    How common is this kind of application in the day of the typical TSS worker-ant?

    Jean-Pol:
    I don't really want to deal with all the fuss that always accompanies benchmark results.
    I can understand you. Maybe it is time to don the outfit that the Russian police use when they tax-raid some rogue mafia company. Black masks and anonymous aliases. ;)

    Regards
    Rolf Tollerud
  8. why no benchmark?

    But it makes me think of another useful piece of information I would like to have: "Hibernate really starts to shine in applications with very complex data models, with hundreds of tables, complex interrelationships + a lot of business logic"
    How common is this kind of application in the day of the typical TSS worker-ant?
    To me, that's most of the web applications I've worked on. They've often started quite small and grown later to a much larger DB. Of course, if you just write an electronic commerce application with items, customers and shopping carts, it's certainly not what you need ... unless you're building another Amazon. The difficulty is always to anticipate the requirements and know whether it is what you need. That's the role and responsibility of the J2EE architects.
  9. Strangely, my experiences are different. IMO the relational database is a concept that is quite easy to explain, even to LOL (little old ladies). People grasp it immediately, and start enthusiastically to add tables and primary keys.

    To exemplify, take a CRM system, which everybody can understand. The core tables are always Companies, Contacts and Activities in these types of systems, no matter how large the system is (many hundreds of tables). In a project some time ago, we decided not to add any more tables until we could not think of any more functionality to add to the application. At that time we only had 5-6 tables: besides the three main ones we had one for categories, one for system-wide settings and the obligatory audit-trail table.

    We finished up with 150,000 lines of code before adding another table. Carefully handwritten code, not generated.

    The point is that people add tables too generously, before they are finished with what they have got. The result is just a lot of pages that look like they were made by a tool, one table on every page where you can edit, delete and add.

    Which app do you think was most valuable? The one with five tables and 150,000 lines of functionality, or the one with hundreds of tables and pages?

    Regards
    Rolf Tollerud
  10. That's all good for new apps

    Strangely, my experiences are different. IMO the relational database is a concept that is quite easy to explain, even to LOL (little old ladies). People grasp it immediately, and start enthusiastically to add tables and primary keys. To exemplify, take a CRM system, which everybody can understand. The core tables are always Companies, Contacts and Activities in these types of systems, no matter how large the system is (many hundreds of tables). In a project some time ago, we decided not to add any more tables until we could not think of any more functionality to add to the application. At that time we only had 5-6 tables: besides the three main ones we had one for categories, one for system-wide settings and the obligatory audit-trail table. We finished up with 150,000 lines of code before adding another table. Carefully handwritten code, not generated. The point is that people add tables too generously, before they are finished with what they have got. The result is just a lot of pages that look like they were made by a tool, one table on every page where you can edit, delete and add. Which app do you think was most valuable? The one with five tables and 150,000 lines of functionality, or the one with hundreds of tables and pages? Regards, Rolf Tollerud
    What happens if you work on an existing application that has been running for a decade and absolutely must keep the existing tables? How does the approach you describe fit with really badly designed database models? I'm sure everyone has seen at least one really bad database model. Since most of my work is consulting, I have to be able to gracefully add/extend and minimize any potential problems. Some people refuse to deal with bad database models and choose to only work on new projects. For the rest of the world, dealing with hundreds of tables is a hard requirement.
    I'm also going to guess the database model you described doesn't need a lot of many-to-many relationships, if 5 tables are sufficient. Some domain problems simply are complex and no amount of head scratching will make it simple or easy.
    In those situations, ORM is critical to building a manageable application.
  11. That's all good for new apps

    How does the approach you describe fit with really badly designed database models?
    In the long term, I doubt ORM will be the solution to your problems. If the DB model is badly designed, no tool will miraculously resolve this problem. It can just hide it for a certain time, but at some point it will blow up in your face. At that moment, it will be even harder to fix it.
    Some domain problems simply are complex and no amount of head scratching will make it simple or easy. In those situations, ORM is critical to building a manageable application.
    Totally agree.
  12. Mapping flexibility

    It's not all black and white. Mapping flexibility says a whole lot about how easy it will be to map to a poorly designed schema. If the schema has some natural inheritance relationships, then you're going to want to go beyond CMP. If the schema is overnormalized, you're going to want the ability to map one class to multiple tables. Some frameworks can; others can't.

    Other times, your mapping solution will directly impact the amount of work that you have to do, or the performance that you'll get. When you have blobs that may not be frequently used, you'd like to be able to map them lazily. Some can; some can't; others require custom types.

    Inheritance mappings are similarly problematic. Do you want one table per concrete class, one big, flat table, or one table per class? Do you want to be able to mix approaches? JDO 2 seems to have a pretty powerful model here.

    My experience has shown that even when I've got a model that leaves a lot to be desired, Kodo seems to be able to handle it better than most. If I'm able to control the schema pretty well, there's not much that I can't do with Hibernate. You'll always be able to paint an extreme scenario where ORM simply won't fit, but unless you're talking about the edge cases, you're talking about a continuum of tradeoffs.
  13. Which app do you think was most valuable? The one with five tables and 150,000 lines of functionality, or the one with hundreds of tables and pages?
    No idea. While I can understand your point of view about people adding too many tables, the answer to this question depends only on how the functionality matches the user requirements. And this is not reflected by the number of lines or the number of tables. If you've developed many more functionalities with your 5 tables than just what the users need, then your application is not better, because users will be lost among the many unnecessary functionalities.
    In our case, the only thing that drives the relational model (and thus the number of tables) is the number of different kinds of entities detected by the analysts. We normalize it and then only denormalize some tables when performance is not good enough for a specific functionality or, even better because it clearly proves the performance problem is in the DB layer, for a set of related functionalities (and always after having tried to improve performance by other means, for example by checking that we have the proper indexes on the tables that are joined ...). Denormalization happens in fact very infrequently. Having a relatively normalized data model allows us to easily add new functionalities, at least on the DB side. But, of course, adding new functionalities doesn't always mean adding tables. No data is redundant in our data models (except when it's the result of the denormalization process).
    Also, only one category of people has the right to create tables: the DB developers (and not the Java developers or even the analysts). And those people on the DB side don't look at all like little old ladies; they look more like Unix admins (you know, the kind of people who are always unpleasant because they're always very busy). If you need a new table, you have to prove to them that you really need it and, as they know the data model very well, you have to prove to them that the data doesn't already exist in another table.
  14. only at gunpoint

    Well, I mainly agree with you. Willingly a new table, but only at gunpoint and only when approved by the right person! Also, you sometimes (regretfully) have to denormalize... I see you are no beginner. :) Nevertheless, it is definitely my feeling (I may be wrong) that the J2EE consultants out there are not as mature and experienced as you. And with Hibernate and similar systems they are encouraged in the wrong direction...

    Regards
    Rolf Tollerud
  15. only at gunpoint

    And with Hibernate and similar systems they are encouraged in the wrong direction...
    If so, they should re-read the Hibernate documentation, because Hibernate doesn't hide the DB at all. Using it, you must still be able to understand the SQL it generates ... at least if you want to use it the right way. And it won't work around a badly designed DB either. Hibernate doesn't solve problems in the DB layer and will also suffer from a bad data model.
  16. In our case, the only thing that drives the relational model (and thus the number of tables) is the number of different kind of entities detected by the analysts.
    There are many things that drive the relational model. "Entities" belong to the conceptual model; that is not necessarily the same as the physical model (the data model), and the two models solve different problems. A good model is normalized and abstract (which means "small" in most cases). It is not rocket science; any book about modeling explains this stuff.
  17. Yes, this is a good optimization. If you know how to work with "small" projects then you know how to work with "large" ones too: make it "small" and work the same way as in a "small" project. But you need to code less, or generate more, to make projects "small". It is one of the reasons why the OO domain model is "evil" for me.
  18. Hibernate On Weblogic. Is JBoss no good then?

    Why were the Hibernate results submitted by JBoss performed on a BEA WebLogic Server and not their own server?

    Tim..
  19. Why were the Hibernate results submitted by JBoss performed on a BEA WebLogic Server and not their own server? Tim..
    That's a good question. Is there a good answer out there?
  20. Why were the Hibernate results submitted by JBoss performed on a BEA WebLogic Server and not their own server? Tim..
    That's a good question. Is there a good answer out there?
    Probably because Weblogic has an optimized JVM (JRockit) and JBoss uses the classic Sun JVM. It's probably faster with Weblogic than with JBoss, but I don't see this as a problem: we can compare Hibernate to CMP and to Kodo on the same application server. This doesn't mean JBoss is not good. Weblogic is faster, but JBoss is more flexible.
  21. This doesn't mean JBoss is not good. Weblogic is faster, but JBoss is more flexible.
    I was under the impression that speed of app server or JVM had nothing to do with the TORPEDO result.

    When does the speed affect the number of database hits that Hibernate might need to make? It doesn't!! I suspect that the result would be identical on a JBoss server. It just seems strange that a JBoss employee should submit results tested on a rival app server.

    Conspiracy theory:
    Maybe the esteemed developer of Hibernate & JBoss employee (who probably did the tests) doesn't actually use JBoss. I wonder why?

    Tim..

    PS: I'm not knocking JBoss by the way... I quite like using it :-)
  22. Hibernate On Weblogic. Is JBoss no good then?

    This is quite amusing, but I'd better cut you guys off here, before it gets even sillier :-)

    Actually, I never ran the tests in *any* appserver. I had absolutely no time to either port Torpedo from WebLogic to JBoss, or to install and learn WebLogic. (And I deinstalled WebSphere - with glee - a year ago.) Really, I spent maybe an hour and a half on this stuff....

    As has already been observed in this thread, the appserver and database are utterly irrelevant in the results of this benchmark.

    Bruce actually ran the Hibernate tests on WebLogic, after I sent him a working main() method. I've never ever used WebLogic myself, but I hear it's quite good. :-)
  23. I've never ever used WebLogic myself, but I hear it's quite good.
    I've used both, and yes, they're both good competitors in the J2EE market. Personally, I prefer JBoss, but that's based more on personal feelings from a developer point of view (source code availability, a bit closer to the specs, more customisable, ...) than on real evidence. On the other hand, I'm sure an administrator would prefer Weblogic, and it is also faster (that's the price to pay for the added flexibility in JBoss).
  24. This doesn't mean JBoss is not good. Weblogic is faster, but JBoss is more flexible.
    I was under the impression that speed of app server or JVM had nothing to do with the TORPEDO result.
    [ashamed] Yes, you're right as it just counts the DB hits. Sorry for my mistake.
  25. Why were the Hibernate results submitted by JBoss performed on a BEA WebLogic Server and not their own server? Tim..
    That's a good question. Is there a good answer out there?
    Probably because Weblogic has an optimized JVM (JRockit) and JBoss uses the classic Sun JVM. It's probably faster with Weblogic than with JBoss, but I don't see this as a problem: we can compare Hibernate to CMP and to Kodo on the same application server. This doesn't mean JBoss is not good. Weblogic is faster, but JBoss is more flexible.
    But this wasn't a performance benchmark, or am I mistaken?
  26. This is definitely welcome. Given the often emotive debate between supporters of different ORM tools--and those who think that ORM is evil--it's good to have some empirical data.

    Seems like a good start, but I would like to see the benchmark expanded to examine throughput and effect on the database under heavy load.
  27. Nice work!

    Good work. I think by reading this report, those who are on the fence with regard to O/R mapping will have something concrete to take a look at.

    However, in order to really understand the full benefit of O/R mapping software, it helps to have a more sophisticated Object model. The object model for TORPEDO was pretty flat (I didn't see any inheritance), but nevertheless I think it makes an interesting starting point.

    -geoff
  28. A couple questions

    I understand from the TORPEDO documentation that all transactions are supposed to be serializable. It also appears that TORPEDO specifies HSQLDB by default. From the HSQLDB javadoc:

    "Up to and including 1.7.1, HSQLDB supports only Connection.TRANSACTION_READ_UNCOMMITTED."

    So, given that HSQLDB does not support serializable transactions, I am presuming that the TORPEDO code does not actually run concurrent transactions on the same row. Obviously at READ_UNCOMMITTED there would be a loss of data integrity. So please fill me in on how that works.
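
    As a side note, what a driver claims to support can be checked directly through JDBC metadata. A small stand-alone sketch (the driver class is the usual HSQLDB one; the in-memory URL form may differ between HSQLDB versions and is not TORPEDO-specific):

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;

    public class IsolationCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("org.hsqldb.jdbcDriver");
            Connection con = DriverManager.getConnection("jdbc:hsqldb:mem:torpedo", "sa", "");
            DatabaseMetaData meta = con.getMetaData();
            // What the driver claims to support, and what the connection actually defaults to.
            System.out.println("SERIALIZABLE supported: "
                + meta.supportsTransactionIsolationLevel(Connection.TRANSACTION_SERIALIZABLE));
            System.out.println("Default isolation level: " + con.getTransactionIsolation());
            con.close();
        }
    }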

    My second question concerns the choice of SSB, and the EJB part in general. Although I certainly understand that this is not a performance benchmark, I also understand that the use of SSB in this benchmark doesn't seem to make it any easier to collect the data you need (SQL logging). So why the use of SSB, which would seem only to obfuscate the testing? How does the presence of the application server (as opposed to a simple web server) illuminate the results or ease the collection of the results?

    Finally, understanding the implications of RUBiS is something any developer working with EJB and/or ORM should try to do. Could the TORPEDO designers comment on what lessons learned from RUBiS (if any) contributed to the architecture of this study?

    -geoff
  29. A couple questions

    My second question concerns the choice of SSB, and the EJB part in general. Although I certainly understand that this is not a performance benchmark, I also understand that the use of SSB in this benchmark doesn't seem to make it any easier to collect the data you need (SQL logging). So why the use of SSB, which would seem only to obfuscate the testing? How does the presence of the application server (as opposed to a simple web server) illuminate the results or ease the collection of the results?
    Good question. It would be a good idea to focus on the actual persistence code, so that it could be run in different environments.
  30. suggestions

    Number of hits to the database doesn't say much about the quality of the queries, but of course you avoid network issues. Sometimes 2 or 3 queries may be faster than 1 query with a lot of joins.

    suggestion:

    1 - profile the queries sent to the database server and collect them so they can later be replayed over JDBC to check the performance of the queries themselves (see the sketch after this list). This would not check the O/R product, but the queries issued to the server.

    2 - move to Apache Derby (Cloudscape), which is a much better embedded database.

    3 - use an inheritance model (datastore and application identity in JDO). This may change the current results. To avoid the limitation of some CMP engines that do not support inheritance, why not have two TORPEDOs: one with a flat model and the other using inheritance.
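
    A rough sketch of what suggestion 1 might look like in plain JDBC (class and parameter names invented): the captured statements are simply replayed and timed, independent of the O/R tool that produced them. This assumes the captured SQL contains literal values rather than '?' placeholders.

    import java.sql.Connection;
    import java.sql.Statement;
    import java.util.Iterator;
    import java.util.List;

    // Illustration only: replays SQL text captured from an O/R run and reports
    // the elapsed time, so the queries can be judged independently of the tool.
    public class QueryReplay {

        public static long replay(Connection con, List capturedSql) throws Exception {
            Statement stmt = con.createStatement();
            long start = System.currentTimeMillis();
            for (Iterator it = capturedSql.iterator(); it.hasNext();) {
                String sql = (String) it.next();
                if (sql.trim().toLowerCase().startsWith("select")) {
                    stmt.executeQuery(sql).close(); // discard the rows; we only time the statement
                } else {
                    stmt.executeUpdate(sql);
                }
            }
            long elapsed = System.currentTimeMillis() - start;
            stmt.close();
            return elapsed;
        }
    }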
  31. A couple questions

    Geoff, great questions. Let me try to answer them. Many of the answers have to do with the difference between TORPEDO, the specification and the TORPEDO reference implementation.

    The specification is behavioral. It says things like: TORPEDO results need to be obtained from a multi-tier application environment. As you point out, it also says that the TORPEDO operations must be atomic. So, for example, listAuctionTwiceWithTransaction must be serialized with respect to concurrent placeBid operations.

    That being said, we wanted to get folks started with a reference implementation for a common scenario -- J2EE application server, Java, Java O/R mapping standards (CMP and JDO) and a common framework for capturing the SQL. All of the results posted at MiddlewareRESEARCH are based on the reference implementation. But again, the specification is behavioral. Other implementations of TORPEDO are possible.

    The TORPEDO reference implementation is distributed with the Hypersonic database. We included it because of its simple administration and because it is open source. However, there is nothing that requires its use. For example, if you look at the Oracle Toplink (POJO) submission, they used the Oracle9i database 9.2.0.2. As others adapt the reference implementation for other databases and other J2EE application servers, we are including their scripts and configuration files in the reference implementation.

    The TORPEDO reference implementation uses a Stateless Session Bean. It depends on Container Managed Transactions and the one-thread-per-request model of Stateless Session Beans. Certainly, an implementation of TORPEDO that does not depend on Stateless Session Beans is possible. The reference implementation, however, was implemented to take advantage of the EJB container.

    Finally, we did look at RUBiS. However, we wanted to do something more focused. In particular, we wanted to study how well O/R mapping software plays in the multi-tiered application environment.
  32. Other results besides hit count

    It's really a good idea to start benchmarking O/R mapping tools; many people were expecting this.

    However, I think that hit counts are far from enough to measure the performance of a tool, particularly regarding caching or possible query complexity.

    I am far from being a database expert (that's why I love ORM tools so much ;-) ), but I guess that sometimes three simple queries may prove more efficient than one huge one. If you have to compute a cross product over several tables, it might be more DB-friendly to issue several queries against one table at a time, though I'm not 100% confident about this.
  33. Other results besides hit count

    If you have to compute a cross product over several tables, it might be more DB-friendly to issue several queries against one table at a time, though I'm not 100% confident about this.
    It should be better to execute a single query, but the database can fail to find the best plan, and then you need to tune it (statistics collection and indexes help in most cases). If you issue several queries and that performs better, it means you found a good plan manually; that is not a very good approach as the database grows, because it can become a "bad" plan later and you will need to change the plan in code manually.
  34. Good Initiative

    I think this is an excellent initiative from TMC, and it has a lot of potential. But the benchmark (since it is labelled as such) MUST include more metrics to have any meaningful value. I'm pretty sure we're all aware of this, so I'm just pointing it out and underlining the fact.

    For example, the Kodo JDO submission implements the "high bids" operation by getting bids, looping on the result set to find the maximum bid, then looping on the result set again to find rows with that value. That can hardly be considered fair/satisfying -- yet it scores highest on the "SQL hits" scale. A degenerate submission might just "select * from all_tables" and implement all subsequent operations in Java, generating zero hits.

    My point is: SQL hits is next to useless as a metric. It's very much like page hits on a HTTP server. It's a starting point at best. I therefore look forward to more in-depth work by TMC on this benchmark.
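
    For reference, the "high bids" operation mentioned above can be answered in a single hit by letting the database do the max() comparison. A plain-JDBC sketch with an invented schema (not the actual TORPEDO tables):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class HighBids {

        // One round trip: the nested SELECT finds the maximum, the outer SELECT the matching rows.
        public static void printHighBids(Connection con, long auctionId) throws Exception {
            PreparedStatement ps = con.prepareStatement(
                "select b.bidder, b.amount from bid b "
                + "where b.auction_id = ? "
                + "and b.amount = (select max(b2.amount) from bid b2 where b2.auction_id = ?)");
            ps.setLong(1, auctionId);
            ps.setLong(2, auctionId);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                System.out.println(rs.getString("bidder") + " bid " + rs.getBigDecimal("amount"));
            }
            rs.close();
            ps.close();
        }
    }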
  35. Good Initiative

    I think this is an excellent initiative from TMC, and it has a lot of potential. But the benchmark (since it is labelled as such) MUST include more metrics to have any meaningful value. I'm pretty sure we're all aware of this, so I'm just pointing it out and underlining the fact.
    I agree 100%. In particular, I'm interested in seeing more scalability tests, more complex data models, and more tests of transactions that do lots of work (i.e., that perform a number of interdependent updates and inserts).
    For example, the Kodo JDO submission implements the "high bids" operation by getting bids, looping on the result set to find the maximum bid, then looping on the result set again to find rows with that value. That can hardly be considered fair/satisfying -- yet it scores highest on the "SQL hits" scale.
    Yeah, this is a bit of an issue with the benchmark, and with the JDO1 specification. We're working with the JDO expert group on subquery support, and, as a matter of fact, the final release of Kodo 3.2 will include subquery support.

    We were considering re-writing that query to use a direct SQL query or a Kodo query extension, but we decided not to deviate from the JDO spec at all when working on our submission, especially since the numbers tested for in the benchmark itself would have been the same.

    In the JDO1 spec as it stands, it is possible to choose between minimizing the amount of data transferred or the efficiency of the query processing in this type of situation -- we could have changed the query as written by the TMC folks to order by the amount field (in fact, we probably should have made that suggestion to them anyways, even for the query as they wrote it). By doing that in conjunction with scrolling result support, we could have minimized the data transfer to the minimal amount (assuming the JDBC driver and database support scrolling cursors). But that would still not be an ideal solution.
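
    In plain JDO 1.0 terms, the order-by-amount variant described above looks roughly like the sketch below. The Bid persistent class (with a numeric amount property) is assumed, filtering by auction is omitted for brevity, and Kodo's scrolling-result configuration is a vendor extension that is not shown.

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;
    import java.util.List;
    import javax.jdo.PersistenceManager;
    import javax.jdo.Query;

    public class HighBidsJdo {

        // Sort on the server, then stop iterating as soon as the amount drops below the top one.
        // With a scrolling/lazy result list, rows past that point need never be fetched.
        public static List highBids(PersistenceManager pm) {
            Query query = pm.newQuery(Bid.class);
            query.setOrdering("amount descending");
            Collection bids = (Collection) query.execute();
            List result = new ArrayList();
            Iterator it = bids.iterator();
            if (it.hasNext()) {
                Bid top = (Bid) it.next();
                result.add(top);
                while (it.hasNext()) {
                    Bid next = (Bid) it.next();
                    if (next.getAmount() < top.getAmount()) {
                        break; // ordering is descending, so nothing later can tie the maximum
                    }
                    result.add(next);
                }
            }
            query.closeAll();
            return result;
        }
    }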

    When we release the final build of 3.2, we'll probably submit new results that include a subquery for that test.
    A degenerate submission might just "select * from all_tables" and implement all subsequent operations in Java, generating zero hits.
    Well, actually, the benchmark says that you have to restart the appserver between each test. So, for many of the tests, you'd actually get the same number of hits.

    -Patrick

    --
    Patrick Linskey
    Kodo JDO
    http://solarmetric.com
  36. I commend TMC for coming up with the TORPEDO benchmark. Having created the STORM™ Benchmark for measuring performance of OR-Mapping engines, we appreciate how difficult it is to develop a portable benchmark that is also easy to understand, configure, execute, and interpret. Good work! A couple of comments:

    Although number of database trips sure is an important performance parameter, I hope TORPEDO will grow to include other key performance metrics like response time and throughput, which additionally reflect on the internal efficiency of the OR-Mapping engines.

    I agree with a couple of other posts that suggest removing the dependence on an application server for the benchmark. A standalone benchmark that is focused on just the OR-Mapping aspects should be much easier to configure and run.

    Regards,

    -- Damodar Periwal
    Software Tree, Inc.
    Simplify Data Integration
  37. Questions (continued)

    OK, so I have read the spec for the TORPEDO test, and after doing so, I still do not understand why a "middle" tier is required to accurately measure the number of SQL statements the O/R mapping software performs.

    JDO is not (yet) a J2EE spec, and examples in the JDO spec make it clear that JDO can operate in a standalone environment. For example, a Swing application that wanted to provide persistence might ship with HSQLDB in embedded mode, and use a JDO driver to persist data entered by the user.

    So, if I want to run this benchmark by creating a simple Swing client, with embedded HSQLDB, I am certain that this will be easier to deploy than an application designed for an App Server.

    TMC, please tell me if this technique would be acceptable.

    -geoff
  38. Questions (continued)

    TORPEDO is focused on understanding how well O/R mapping software executes in servers. Servers are where high quality O/R mapping software can especially add value to application code. Servers are long-lived programs, shared by multiple users and can have demanding concurrent work loads. The caching O/R mapping software provides can be a big win in a server environment. An even more challenging environment is a cluster of servers where requests are load balanced across multiple servers. O/R mapping software that can operate transparently in a cluster of servers and efficiently produce correct results is indeed impressive software.

    There are certainly servers that are not based on EJB containers. We selected Stateless Session Beans for the TORPEDO reference implementation because we wanted to be able to test CMP with the TORPEDO reference implementation. A version of TORPEDO that utilized some other kind of server framework could certainly exist. It would obviously not be a testbed for CMP.
  39. Questions (continued)

    Bruce
    We selected Stateless Session Beans for the TORPEDO reference implementation because we wanted to be able to test CMP with the TORPEDO reference implementation.
    CMP has been around a few years now, I think it's been well tested :-)
    A version of TORPEDO that utilized some other kind of server framework could certainly exist. It would obviously not be a testbed for CMP.
    Is the point that TORPEDO assumes declarative transaction management, so as to keep transaction management out of the code in the test itself? If so, why not just say that in the spec, identify which methods should be transactional, keep the reference implementation using CMP, and let people do submissions using whatever approach to declarative tx mgt they like.

    Rgds
    Rod
  40. Questions (continued)

    Is the point that TORPEDO assumes declarative transaction management, so as to keep transaction management out of the code in the test itself? If so, why not just say that in the spec, identify which methods should be transactional, keep the reference implementation using CMP, and let people do submissions using whatever approach to declarative tx mgt they like? Rgds Rod
    Everything you say is fine, although the specification does not actually require declarative transaction management. It says:
    Each of the TORPEDO operations is executed as an individual transaction. Results returned from each operation must reflect serialized transactions. Any results that do not are considered incorrect and invalid.
    (The ListAuctionTwiceWithoutTransaction executes as two ListAuction transactions, not one.)
  41. Questions (continued)

    Is the point that TORPEDO assumes declarative transaction management, so as to keep transaction management out of the code in the test itself? If so, why not just say that in the spec, identify which methods should be transactional, keep the reference implementation using CMP, and let people do submissions using whatever approach to declarative tx mgt they like? Rgds Rod
    EXACTLY. I want to run this benchmark without declarative transaction management, outside an EJB container, and that is perfectly in keeping with the stated goals of the benchmark.
  42. ...and another thing :-)

    Also, I am not clear on why it is interesting to compare how many SQL statements the ORM tool performs. Let me give you a 'for example'. Did you know that it can be faster to insert 100 rows into MySQL using 100 separate Statements than to execute a single prepared statement 100 times? Surprising huh?

    Therefore, I am concerned that simply counting the number of SQL statements is overly simplistic.
  43. ...and another thing :-)

    Also, I am not clear on why it is interesting to compare how many SQL statements the ORM tool performs. Let me give you a 'for example'. Did you know that it can be faster to insert 100 rows into MySQL using 100 separate Statements than to execute a single prepared statement 100 times? Surprising huh? Therefore, I am concerned that simply counting the number of SQL statements is overly simplistic.
    I think you could be much more surprised if you got to know that it is possible to insert those 100 rows with 1 query :)
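
    Presumably that refers to a multi-row INSERT, which some databases (MySQL among them) accept. A tiny JDBC sketch with an invented table, shortened to three rows for brevity:

    import java.sql.Connection;
    import java.sql.Statement;

    // Illustration only: one statement, one round trip, several rows -- dialect permitting.
    public class MultiRowInsert {

        public static void insertWidgets(Connection con) throws Exception {
            Statement stmt = con.createStatement();
            stmt.executeUpdate(
                "insert into widget (id, name) values "
                + "(1, 'hammer'), (2, 'wrench'), (3, 'pliers')");
            stmt.close();
        }
    }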
  44. ...and another thing :-)

    Actually I wouldn't be surprised by that at all ;) executeBatch is great, but in some DBs getGeneratedKeys doesn't work with batch updates, so sometimes batch updates are not an option. My point was that counting SQL statements doesn't give a complete picture of what is going on.

    Consider an application that needs to present the user with a list of widgets. One implementation might do a 'select count ...' first to determine how many widgets there are, and how best to show the results to the user. If there are ten widgets, they could all be listed on the page. If there are 10,000 widgets, the user's query might need to be refined. So in the case that there are 10 widgets, the smart app does 2 queries (a 'select count', then the select). Another implementation might skip the 'select count'. Sure it will work when there are 10 widgets, but obviously the app is going to fail on 10,000 widgets. But if all you are doing is counting SQL statements on a benchmark with 10 widgets, the app using 1 SQL statement wins.

    -geoff
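
    Geoff's two-step pattern, spelled out in plain JDBC for concreteness (table name and cut-off are invented): it costs one extra hit, but protects the client from an unexpectedly huge result set.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class WidgetListing {

        private static final int MAX_ROWS_TO_SHOW = 100; // arbitrary cut-off for this sketch

        public static void listWidgets(Connection con) throws Exception {
            Statement stmt = con.createStatement();
            ResultSet count = stmt.executeQuery("select count(*) from widget");
            count.next();
            int total = count.getInt(1);
            count.close();

            if (total > MAX_ROWS_TO_SHOW) {
                // Two hits, but the user is asked to refine instead of receiving 10,000 rows.
                System.out.println(total + " widgets found; please narrow your search.");
            } else {
                ResultSet rs = stmt.executeQuery("select id, name from widget");
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + ": " + rs.getString("name"));
                }
                rs.close();
            }
            stmt.close();
        }
    }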
  45. ...and another thing :-)

    Also, I am not clear on why it is interesting to compare how many SQL statements the ORM tool performs.
    No, you are not. And it's clear that 'this' isn't interesting at all. I could write you, for example, a codegen-based O/R tool that grabs the whole database into memory with only one request (am I a winner then? surely not). Whose idea was this in the first place? The only use for this, I think, is to view the statements that get executed on the DB, but there are already several JDBC wrappers that can do the same (with performance statistics integrated).

    This test doesn't tell anything about the quality of the O/R mapper. Complex multi-table queries (e.g. more than 5 tables) can usually be optimized into a few simpler queries. O/R tools usually don't optimize the queries. They normally just use lazy loading and caching, but they cannot measure what the best way to run one particular query is. Maybe it's possible if an O/R mapper maintains performance statistics and tries to tune itself, but I do not know of any O/R mapper that does this.
  46. Hardware

    "TORPEDO is not a performance analysis. It attempts to create a level approach to analyzing the effectiveness of SQL generation of an Object-Relational solution."

    I wonder how you could compare the "effectiveness of SQL generation" without knowing the underlying hardware. I don't understand how you manage to check the "effectiveness of SQL generation" without doing a performance analysis. If you don't want to do a performance analysis, why do you name it "Benchmarking Initiative"?

    The resulting numbers are everything but comparable. As long as the results do not include a rough description of the hardware the tests were executed on, they are worthless numbers. What do you really want to prove? Which company has the biggest hardware budget to show good benchmarks?

    If you want to compare O/R mapping products, add a hardware description to give the numbers at least some value.
  47. One observation I have is that although they are using the WebLogic Server 8.1, their EJB deployment descriptors are still 7.0 deployment descriptors. They are basically testing the BEA WebLogic CMP 7.0 beans running on the 8.1 server.

    Michael Chen
    BEA Systems