Article: Avoiding J2EE data layer bottlenecks

  1. Article: Avoiding J2EE data layer bottlenecks (23 messages)

    Christopher Keene has released an article that defines three common causes of application data bottlenecks and suggests approaches for eliminating them. He discusses model-intensive requirements, transaction-intensive requirements, and data-intensive requirements.

    Read Christopher in An ounce of prevention: Avoid J2EE data layer bottlenecks

    Threaded Messages (23)

  2. Ummm, did I read this wrong or is he actually suggesting the use of Entity EJBs to AVOID data access bottlenecks? Seems a bit odd, no?
  3. Naw, he's just warning against taking automated mappings lightly because they can have severe performance consequences.
  4. Apply Pareto's law

    Most code falls under Pareto's law - the 80-20 rule - in which the vast majority of code can be generated consistently by code generators and mapping tools without problems, applying MDA.
    A few classes will require substantial work to achieve good data access performance. The best way to get good performance at those points is a good SQL programmer.
  5. Good article. Very clear.

    He mentions Toplink & EdgeXtend as data services layer candidates. Are there any others, or maybe a comparison chart?

    Regards
    Kit
  6. Good article, but it leaves out JDO, Hibernate and such, which are not as mature as EdgeXtend and TopLink but provide an alternative to the (I apologize if I am mistaken, but I think these are close numbers) $10K and $15K per CPU prices of EdgeXtend and TopLink. Most people have probably noticed that Christopher Keene is CEO of Persistence Software, maker of EdgeXtend.
  7. Toplink is included with Oracle 9iAS (depending on the edition) so if you're an all-Oracle shop, you probably already have it. OTOH, I'm not sure if you can even buy it separately any more.

    Most people have probably noticed that Christopher Keene is CEO of Persistence Software, maker of EdgeXtend.

    They have now ..

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  8. Toplink is included with Oracle 9iAS (depending on the edition) so if you're an all-Oracle shop, you probably already have it. OTOH, I'm not sure if you can even buy it separately any more.
    Yup, you've always been able to buy it separately, and still can. It still supports any app server, any database, any IDE.

     - Don
  9. For the kinds of data access performance problems described in this article, it is not clear that a temporary caching API like JCache offers additional value.

    Many data mapping products, from TopLink to EdgeXtend to some JDO flavors to Hibernate, offer not just caching but clustered caching.

    The applications where JCache would be more appropriate are those with no persistent data, or with very simple data access such as straight JDBC.

    - chris
  10. Chris: Many data mapping products, from TopLink to EdgeXtend to some JDO flavors to Hibernate, offer not just caching but clustered caching.

    Five JDO vendors and Hibernate support clustered caching by using Tangosol Coherence, which is (or will be, once the spec is published) JCache compliant.

    Chris: The applications where JCache would be more appropriate are those with no persistent data, or with very simple data access such as straight JDBC.

    You're joking, right? ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!
  11. For what it's worth, we are evaluating a JDO solution right now that has its own caching built in. I suppose we could use JCache instead of the built-in caching, but what would be the point?
  12. Why JDO + JCache makes sense

    For what it's worth, we are evaluating a JDO solution right now that has its own caching built in. I suppose we could use JCache instead of the built-in caching, but what would be the point?
    Most of the JDO implementations and O/R mapping products that I've used have built-in caching, but their distributed caching is either non-existent or relatively naive. At best, you'll see some sort of variant on a Seppuku pattern, in which a commit in one VM will trigger other VMs to drop the modified data from their caches (if present). So, each VM will have its own partial, semi-autonomous data cache. That's how Kodo's data cache behaves by default, for example.
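
    To make the pattern concrete, here is a minimal sketch of Seppuku-style invalidation. All of the class and transport names below are hypothetical, not Kodo's or any other vendor's API:

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical per-VM cache that evicts entries when a peer commits.
    public class InvalidatingCache {
        private final Map cache = new HashMap();   // oid -> cached state
        private final EvictBroadcaster peers;      // hypothetical transport

        public InvalidatingCache(EvictBroadcaster peers) {
            this.peers = peers;
        }

        // Called after a successful local commit: keep our own fresh
        // copies, but tell every other VM to drop theirs.
        public synchronized void afterCommit(Map committedState) {
            cache.putAll(committedState);
            peers.broadcastEvict(committedState.keySet());
        }

        // Called when an evict message arrives from another VM: the
        // modified objects are simply dropped, so the next read in this
        // VM has to go back to the database.
        public synchronized void onRemoteEvict(Set oids) {
            for (Iterator i = oids.iterator(); i.hasNext(); ) {
                cache.remove(i.next());
            }
        }

        public synchronized Object get(Object oid) {
            return cache.get(oid);                 // null means cache miss
        }
    }

    // Hypothetical peer-to-peer messaging abstraction.
    interface EvictBroadcaster {
        void broadcastEvict(Set oids);
    }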

    This is acceptable for many applications, but when you start really hitting your database hard (i.e., load == 1 on the db machine), this becomes a good area for improvement. It's in this type of situation that a more aggressive caching algorithm, such as clustered caching that manages updates coherently (instead of evicting modified data), will be particularly valuable. That's why we added support for Coherence to Kodo JDO. It means that data that is being used and modified a lot is always available from the clustered cache, and that means that the VMs in the cluster are not forced to keep going back to the db for fresh values.

    There are even more aggressive options available, such as aggregating the caching resources of all the VMs into one giant cache, and grouping and coalescing database updates together using write-behind caching. In large-scale clusters, these options make a significant difference in how far the application can scale up without completely soaking the database (i.e., load == 1).
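
    As a rough illustration of the write-behind idea (again with made-up names, not Kodo's or Coherence's actual API): updates are coalesced per key, so several changes to the same row collapse into a single deferred database write.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical write-behind queue: coalesces updates per key and
    // flushes them to the database in one batch on a timer thread.
    public class WriteBehindQueue {
        private final Map pending = new HashMap(); // oid -> latest state
        private final RowWriter writer;            // hypothetical JDBC-backed writer

        public WriteBehindQueue(RowWriter writer, final long flushIntervalMillis) {
            this.writer = writer;
            Thread flusher = new Thread(new Runnable() {
                public void run() {
                    while (true) {
                        try {
                            Thread.sleep(flushIntervalMillis);
                        } catch (InterruptedException e) {
                            return;
                        }
                        flush();
                    }
                }
            });
            flusher.setDaemon(true);
            flusher.start();
        }

        // Repeated updates to the same oid overwrite each other here, so
        // only the newest state ever reaches the database.
        public synchronized void enqueue(Object oid, Object state) {
            pending.put(oid, state);
        }

        void flush() {
            Map batch;
            synchronized (this) {
                if (pending.isEmpty()) {
                    return;
                }
                batch = new HashMap(pending);
                pending.clear();
            }
            writer.writeBatch(batch);              // one grouped database round trip
        }
    }

    // Hypothetical interface over a batched JDBC update.
    interface RowWriter {
        void writeBatch(Map oidToState);
    }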

    -Patrick

    --
    Patrick Linskey
    Kodo JDO
    http://solarmetric.com
  13. Most of the JDO implementations and O/R mapping products that I've used have built-in caching, but their distributed caching is either non-existent or relatively naive. At best, you'll see some sort of variant on a Seppuku pattern, in which a commit in one VM will trigger other VMs to drop the modified data from their caches (if present).
    <vendor>
    Not *ALL* O/R mapping products with integrated caching use the "naive" Seppuku pattern. Persistence, for example, sends a single image of modified data to other servers upon commit, which is automatically applied to the remote cache. Thus, remote caches neither drop the modified data nor do they require a trip to the database on the next request for the modified object.

    Incidentally, this works not only across JVMs but transparently across platforms: Java, C++, and .NET.
    </vendor>
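
    For contrast with the eviction sketch earlier in the thread, here is a minimal sketch of that update-propagation approach, again with hypothetical names rather than EdgeXtend's actual API:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical cache that applies a committed image from a peer
    // instead of evicting, so no database round trip is needed later.
    public class PropagatingCache {
        private final Map cache = new HashMap();   // oid -> cached state
        private final UpdateBroadcaster peers;     // hypothetical transport

        public PropagatingCache(UpdateBroadcaster peers) {
            this.peers = peers;
        }

        // After a local commit, ship the new state image to every peer.
        public synchronized void afterCommit(Map committedState) {
            cache.putAll(committedState);
            peers.broadcastUpdate(committedState);
        }

        // Remote caches stay warm: modified objects are refreshed in
        // place rather than dropped.
        public synchronized void onRemoteUpdate(Map committedState) {
            cache.putAll(committedState);
        }

        public synchronized Object get(Object oid) {
            return cache.get(oid);
        }
    }

    // Hypothetical peer-to-peer messaging abstraction.
    interface UpdateBroadcaster {
        void broadcastUpdate(Map committedState);
    }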

    Jim Barton
    Persistence Software
  14. (I apologize if I am mistaken, but I think these are close numbers) $10K and $15K per CPU for EdgeXtend and TopLink.
    Nah, TopLink is at most $5K per CPU. Named-user pricing can make it even much less. Moreover, it's royalty-free for ISVs and embedded apps.

     - Don
  15. Caution modifier + more OR info

    The "true" author of the article was actually Richard Jensen, who is a Senior Architect at Persistence Software (JavaWorld mixed up the authors - maybe just because they had my bio handy). So although the article is evilly twisted with subliminal messages to make you want to buy only our OR mapping, it at least has *some* basis in technical fact.

    For more info on this subject, try the following links:
    1. Scott Ambler's OR mapping white papers http://www.ambysoft.com/onlineWritings.html
    2. Doug Barry's OR mapping white papers
    http://www.service-architecture.com/object-relational-mapping/articles/index.html
    3. A very complete but somewhat unreadable comparison of OR tools is here
    http://c2.com/cgi/wiki?ObjectRelationalToolComparison
    4. Persistence Software's (evilly slanted) OR mapping white papers
    http://www.persistence.com/technology/index.html
    5. Download our Eclipse OR mapping plug-in for Java at:
    http://www.persistence.com/download/index.html

    - chris keene
    The Putative but not Actual Author of "Deep O/R Mapping"
    www.persistence.com
  16. Journalists for you.

    Apart from being an evil entity subliminally influencing the unsuspecting public (BTW, it made me want to buy a bottle o' vanilla Coke), I still see a chance of redemption for you, Chris. Giving credit where it is due (Richard's article) is a rare occurrence in today's power- and fame-hungry world.

    You must forgive my suspiciousness on account of my growing up in a communist country, where one had to suspect even one's brother, for those were strenuous times.

    Regards.
  17. Journalists for you.

    Oh, I'm just as power and fame hungry as the next capitalist running dog. Just craftier and hence more dangerous ;-)
  18. An approach I never find discussed is to work with SQL user-defined types (UDTs). For example, you can define an SQL type ADDRESS to be used in a table CONTACT:

    CREATE TYPE ADDRESS AS (
        STREET VARCHAR(40),
        CITY VARCHAR(40),
        ...
    );

    CREATE TABLE CONTACT (
        FIRSTNAME VARCHAR(40),
        LASTNAME VARCHAR(40),
        LOCATION ADDRESS,
        ...
    );

    The ADDRESS type can then be mapped one-to-one to a Java Address class implementing the java.sql.SQLData interface, part of JDBC 2.0. No impedance mismatch, and you get all the usual benefits of a database, including caching.
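
    For reference, a minimal sketch of the Java side using the standard java.sql.SQLData contract. Note that attributes must be read and written in the same order they are declared in the CREATE TYPE statement:

    import java.sql.SQLData;
    import java.sql.SQLException;
    import java.sql.SQLInput;
    import java.sql.SQLOutput;

    // Maps the SQL ADDRESS type one-to-one onto a Java class (JDBC 2.0).
    public class Address implements SQLData {
        private String sqlTypeName;   // e.g. "ADDRESS"
        public String street;
        public String city;

        public String getSQLTypeName() throws SQLException {
            return sqlTypeName;
        }

        // Attribute order must match the CREATE TYPE declaration.
        public void readSQL(SQLInput stream, String typeName) throws SQLException {
            this.sqlTypeName = typeName;
            this.street = stream.readString();
            this.city = stream.readString();
        }

        public void writeSQL(SQLOutput stream) throws SQLException {
            stream.writeString(street);
            stream.writeString(city);
        }
    }

    The class is then registered in the connection's type map (Connection.getTypeMap()/setTypeMap()), mapping the name "ADDRESS" to Address.class, so the driver materializes LOCATION columns as Address instances.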

    Anyone have real-life experience using that approach? What are the pros and cons? After all, all leading database vendors support UDTs, and JDBC 2.0 has been around for quite a few years now...
  19. You can't define new operators for your new types, so UDTs in SQL have been (and probably will remain) useless. They're also pretty much non-portable and very weak in features. BTW, a good book you can read in one afternoon about databases, SQL and its flaws: http://www.amazon.com/exec/obidos/tg/detail/-/0201485559

    So far, I've only heard from one person using UDTs (with PostgreSQL) and have never seen any database that uses them extensively. There are many reasons for that, so we can still dream that someone will implement a real relational type system.
  20. One interesting alternative for simplifying the object-relational mapping is to use stored procedures. This enables a considerable amount of massaging/mapping to be done within the database, with the objective of making the actual object-relational mapping itself easier.

    In companies with a strong DBA group, stored procedures can sometimes be the only way "unwashed" programmers are allowed to get access to the pure inner sanctum of corporate data.

    Although OR mapping to stored procedures can be an elegant approach, it may not be supported by your CMP or JDO or whatever flavor of data layer you're using. Self-servingly, this is something supported by "best-of-breed" data layers like Oracle Toplink or Persistence Edgextend.
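
    As a rough sketch of what such a data layer does under the covers, a hand-rolled read through a procedure (the procedure and column names here are made up, and whether a procedure can return a result set directly varies by database):

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical hand-rolled version of what a stored-proc-aware data
    // layer generates: load a Contact through a procedure instead of SQL.
    public class ContactLoader {
        public Contact findById(Connection conn, int id) throws SQLException {
            CallableStatement call = conn.prepareCall("{call GET_CONTACT(?)}");
            try {
                call.setInt(1, id);
                ResultSet rs = call.executeQuery();  // proc returns one row
                if (!rs.next()) {
                    return null;
                }
                Contact c = new Contact();
                c.firstName = rs.getString("FIRSTNAME");
                c.lastName = rs.getString("LASTNAME");
                return c;
            } finally {
                call.close();
            }
        }
    }

    // Plain value object populated from the procedure's result set.
    class Contact {
        String firstName;
        String lastName;
    }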

    See this link for an example of how this can work:

    http://support.persistence.com/Discussion/viewmsg.asp?Mid=89

    - chris
  21. Mapping to stored procs

    One interesting alternative for simplifying the object-relational mapping is to use stored procedures.
    One downside of this approach in the context of this thread is that pushing additional work onto the database server (as opposed to the presumably clustered application tier) will actually increase load on the database, making it more of a point of failure, not less. So, while stored procs can help with dumbing down your mappings, this can be bad for overall system scalability.
    Although OR mapping to stored procedures can be an elegant approach, it may not be supported by your CMP or JDO or whatever flavor of data layer you're using. Self-servingly, this is something supported by "best-of-breed" data layers like Oracle Toplink or Persistence Edgextend.
    Well, I guess that by that definition, Kodo is also "best-of-breed", with a couple of differences -- your procedure just needs to return the primary keys at a minimum and we'll work out the rest from there, and you can also use stored procs for inserts/updates/deletes as well as selects. See http://docs.solarmetric.com/ref_guide_enterprise_sql.html for details about selects via stored procs.

    -Patrick
  22. You are right that tying the OR mapping to stored procedures can make more work for the database. However, there are several situations where the OR mapping needs to map to a set of database stored procedures, not just for selects, but also for deletes, updates and (especially) inserts. Examples include:
    1. Some companies have standards for data access that require all data access to go through stored procedures to ensure data integrity (DBAs never trust those pesky programmers to get all the data semantics right ;-))
    2. As mentioned before, stored procedures may be used to create a virtual table by integrating data from several other tables
    3. Inserts are often done through stored procedures to ensure that the right rules are followed in creating a primary key (see trust issue above, and the sketch below)

    In all of these situations, the object-relational mapping needs to map all object operations (create, read, update, delete) to the appropriate stored procedure.
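
    For instance, the insert case typically hands the database-generated key back through an OUT parameter. A minimal JDBC sketch, with a made-up procedure name:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Types;

    // Hypothetical sketch: the insert maps to a stored procedure that
    // owns primary key generation and returns the new key.
    public class ContactInserter {
        public int insert(Connection conn, String first, String last)
                throws SQLException {
            CallableStatement call =
                conn.prepareCall("{call INSERT_CONTACT(?, ?, ?)}");
            try {
                call.setString(1, first);
                call.setString(2, last);
                call.registerOutParameter(3, Types.INTEGER); // generated key
                call.execute();
                return call.getInt(3);  // the OR layer caches this against the new object
            } finally {
                call.close();
            }
        }
    }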

    For more information, see [url=http://www.persistence.com/technology/mapping.html]object-relational mapping[/url]

    - chris
    The technology leader for object-relational mapping and distributed caching
    www.persistence.com
  23. Mapping to stored procs

    One interesting alternative for simplifying the object-relational mapping is to use stored procedures.
    One downside of this approach in the context of this thread is that pushing additional work onto the database server (as opposed to the presumably clustered application tier) will actually increase load on the database, making it more of a point of failure, not less. So, while stored procs can help with dumbing down your mappings, this can be bad for overall system scalability.
    Personally, I don't think so. The overall performance will be better since you get rid of network latency (unless you try to do some heavy FP calculations in the stored proc :-))). In some engines, SPs will guarantee you better access plan reuse. The packages in Oracle can simulate (emulate?) OO pretty well (unfortunately, you'll become an Oracle shop).
    And the author of one OR tool mentioned in his blog that he doesn't really understand what a hash join is. And THAT is scary and not scalable.
  24. re: stored procedures

    DODO DODO: The overall performance will be better since you get rid of network latency (unless you try to do some heavy FP calculations in the stored proc :-))). In some engines, SPs will guarantee you better access plan reuse. The packages in Oracle can simulate (emulate?) OO pretty well (unfortunately, you'll become an Oracle shop).

    You're absolutely right, and if you're already an Oracle shop and plan to remain that way, it's not such a terrible thing. ;-)

    If the application is well architected up front, you can put the "functionality" into Stateless Session Beans, with a "least common denominator" CMP/CMR entity EJB implementation behind them, for example. Then, you can start to optimize specific parts of the application by moving the logic into stored procedures, called directly from the session EJBs. The downside is that it can cause some issues for applications that mix entity EJB-based data access with stored procedures, since the entity cache can end up with stale data. It's a very doable architecture, but it does require careful planning and some detailed understanding of how the application server manages entity EJBs, particularly caches that cross transactional boundaries (e.g., some of the WebLogic advanced features).
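
    As a sketch of that optimization step (EJB 2.x style, with hypothetical bean, procedure, and JNDI names throughout): the session bean bypasses the entity layer and calls the procedure directly, which is exactly where the stale entity cache caveat comes from.

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.ejb.SessionBean;
    import javax.ejb.SessionContext;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.sql.DataSource;

    // Hypothetical stateless session bean: one business method optimized
    // to call a stored procedure directly instead of going through the
    // entity EJB layer.
    public class OrderManagerBean implements SessionBean {
        private SessionContext ctx;

        public void applyDiscount(int customerId, double pct) throws SQLException {
            Connection conn = null;
            try {
                DataSource ds = (DataSource)
                    new InitialContext().lookup("java:comp/env/jdbc/OrdersDS");
                conn = ds.getConnection();
                CallableStatement call =
                    conn.prepareCall("{call APPLY_DISCOUNT(?, ?)}");
                call.setInt(1, customerId);
                call.setDouble(2, pct);
                // Any entity EJB cache of these rows is now stale; this is
                // the caveat described above.
                call.execute();
                call.close();
            } catch (NamingException e) {
                throw new SQLException("DataSource lookup failed: " + e);
            } finally {
                if (conn != null) {
                    conn.close();
                }
            }
        }

        // Standard EJB 2.x lifecycle plumbing.
        public void setSessionContext(SessionContext ctx) { this.ctx = ctx; }
        public void ejbCreate() {}
        public void ejbRemove() {}
        public void ejbActivate() {}
        public void ejbPassivate() {}
    }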

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Clustered JCache for Grid Computing!