Java Development News:
Optimizing CMP Entity Beans with Databases
By Debu Panda
28 Jul 2004 | TheServerSide.com
Every body shuns CMP entity beans be either they try to blame J2EE or get some political mileage for their unveiled agenda. This happens almost every thread at least in community forums like TheServerSide.com or JavaLobby. However I was surprised to see a houseful when I presented “Optimizing Container Managed Persistence with Relational Databases” with Mike Keith in JavaOne this year. I was expecting very few attendees with ongoing slaughter of CMP entity beans in the community forums and the fact that EJB is having a quantum loop in the EJB 3.0 spec. And almost everyone stayed in the room. After the presentation, I spoke to few attendees and almost every one of them use CMP entity beans in their production application servers such as Weblogic, OC4J or Websphere. Most of them are happy with their the scalability requirements of their applications.
Before I took over the responsibilities of EJB container for OC4J about one and half years back I always thought EJBs and especially CMP entity beans are useless and no body uses them at all. But soon realized, there are thousands of Oracle customers who use EJBs and CMPs in production. After working with a lot of developers those complained about performance of CMP entity beans do not understand the impact of entity beans on the databases. In this article, I will provide a guideline how you can tune optimize the CMP entity beans with relational databases like Oracle.
First advice is to examine that you really need CMP entity beans for your applications.
Following are guidelines that you can use to optimize the performance of your CMP entity beans:
- Reduce Database operation
- Tune Database/SQL Access
- Caching in the middle-tier
- Use Right Locking/Isolation Mode
Let’s examine each of these sections and see how you can effectively use to improve the performance of CMP entity beans.
Reduce Database operation
Reducing database operation or number of SQL generated by CMP entity beans is easier said than done! You have to look for vendor specific parameters how your container optimizes the SQL generation.
Enforce primary key constraint at database level
Several EJB containers cannot rely on the existence of primary key constraints in the underlying database table and executes a SELECT statement to make sure the insert does not lead to a duplicate entry.
For example, you can disable this extra constraint checking by OC4J by setting do-select-before-insert=”false” for your CMP entity bean.
Defer EJB creation to end of transaction
Creating an entity bean with a relationship is a two-step procedure, by default this involves two SQL statements: an INSERT statement in ejbCreate() to persist the entity bean, and an UPDATE to establish the relationship. Some containers defer persistence of the entity until the end of the transaction. This means that the row is inserted into the database after the ejbPostCreate() method instead of ejbCreate(), avoiding an extra UPDATE.
If you are using OC4J, make sure that you have this setting for your entity bean data-synchronization-option="ejbPostCreate"
Use DELETE CASCADE at database level
When an entity with relationships is deleted with multiplicity ONE, the container is responsible for deleting all dependent entities. The container generates and executes the required SQL code to remove all of the related child records from the database. The native cascade delete facility in the Oracle database is much faster as this avoids multiple delete statements when removing the dependent records. The database removes all related records when the CASCADE DELETE option is specified for the table.
Batch update operations
Calling a set method on a bean field modifies the bean value and requires the new state to be stored in the database. The EJB container does this by calling the ejbStore method and issuing an UPDATE to the database. Good EJB container implementations provide a way to batch bean field updates into a single UPDATE operation when multiple set methods are called within a transaction. Reducing the number of calls to the database improves application performance.
Avoid Association Tables for maintaining relationship
Association tables are maintenance nightmare and require unnecessary SQL statements and JOIN between the entity table and association tables to maintain relationship
Set pre-fetch size to reduce roundtrips to the database
Oracle Database JDBC drivers have extensions that let you set the number of rows to pre-fetch to the client while a result set is being populated during a query. This reduces the number of roundtrips to the server, and can drastically improve the performance of finder methods that return a large number of rows.
You can set the pre-fetch size for your finder method in OC4J as follows in the vendor specific deployment descriptor:
<finder-method query="" prefetch-size="15" > <!-- Generated SQL: "select * from EMP where EMP.ENAME = ?" --> <method> <ejb-name>EmployeeBean</ejb-name> <method-name>findByName</method-name> <method-params> <method-param>java.lang.String</method-param> </method-params> </method> </finder-method>
Exploit eager loading of relationships
Some EJB containers allow the eager loading, or pre-loading, of related entity beans when an entity bean is loaded. You can enable this feature by setting a parameter for your EJB in the vendor-specific deployment descriptor. This may improve performance if the related entity bean is used in the same transaction. In this case, the EJB container uses a single SQL statement to load the related entities. As a rule of thumb, use this feature only if your application needs to use the related entity bean in the same transaction. Otherwise, you’ll cause extra data operations that aren’t required.
Use Lazy Loading with caution
Use lazy-loading only when retrieving large number of objects or entity contains a large field such as BLOB.
Tune Database/SQL Access
Make sure your finder methods use indexes
Queries should always use indexes for optimal performance. Avoid full-table scans when possible. For CMP entity beans, finding the SQL statement that does a full-table scan is a challenge, since the container generates the statements that are executed. To ensure that the EJB container is generating efficient queries when executing finder or select methods, make sure the tables being accessed are indexed correctly. To review the SQL statements generated by the EJB container, you can use open-source tools like P6spy (http://www.p6spy.com) to intercept and log the generated SQL statements. With these statements, you can use the Oracle database “explain plan” facility to determine how the query is executed, and whether any indexes can be created for the tables to improve database access.
Avoid findAll on entities based on large tables
When the findAll method is used, the container tries to retrieve all rows of the table. Avoid using this method on entities based on large tables, as it will slow down your database.
Exploit statement caching
The Oracle database provides a feature that caches SQL statements. This reduces the parse time when the SQL statement is issued to the database multiple times. For example in OC4J you can enable statement parsing for your data-source by adding the statement-cache-size attribute:
<data-source> ... stmt-cache-size="100" </data-source>
Define your own SQL Finder method
If you think you are really good in SQL, you can make a difference. Several containers like OC4J and TopLink CMP engine provide a way to define your own SQL for the finder method or an ability to override the SQL statement generated by your container. You have to use this in caution as this may have adverse effect on your application. You can write your highly optimized SQL statements for your finder method with Appropriate JOINs and SQL Hints
Use the Right Concurrency Mode and Locking Strategy
Choose the right isolation level for entity beans
EJB containers support the specification of isolation modes for entity beans. In general, it’s better to avoid performing this work at the EJB container level and leave isolation management to the database. Oracle databases support COMMITTED and SERIALIZABLE [u1] isolation levels. DBAs will probably lean toward maintaining assured data consistency; thus, they’ll want to use the SERIALIZABLE option for mission-critical systems. However, COMMITTED performs better.
Choose the right locking mode
In addition to EJB isolation modes, containers usually provide a locking mode for concurrency control of CMP entity beans. For better performance, choose either an OPTIMISTIC or READ-ONLY locking mode. Using a PESSIMISTIC locking mode, which locks a database row for the duration of a transaction, can adversely affect performance.
Avoids database calls by using read-only patterns
For entity beans that are rarely modified, most application servers support READ-ONLY and READ-MOSTLY patterns to avoid repetitive loading of the same beans from the database. By identifying where you can use this approach and configuring the specific beans accordingly, you’ll reduce database accesses and improve performance.
Cache in the middle-tier
You have to check whether your container provides support for caching of entity beans and whether these are applicable to your applications. You can use flexible caching options to ensure you get maximum performance. Following are few advantages of caching:
- Performance is boosted because frequently accessed data does not have to be constantly built and rebuilt.
- Memory footprint and garbage collection kept low because multiple readers share objects
About the author
Debu Panda firstname.lastname@example.org
I'm Debu Panda (email@example.com) and I'm the Product Manager responsible for EJB Container in the Oracle Application Server development team. I've around 13 years of experience in IT mainly in the Database and Application Server technologies. I've authored articles in several journals and presented in several conferences.