Developing performance-oriented JDBC or Hibernate applications is not easy. JDBC and ODBC drivers do not throw exceptions to tell you when your code is running too slow.
This series of articles presents some general guidelines for improving JDBC application performance that have been compiled by examining the JDBC implementations of numerous shipping JDBC applications. These guidelines include:
- Using Database MetaData Methods Appropriately
- Retrieving only Required Data
- Selecting Functions that Optimize Performance
- Managing Connections and Updates
- Designing and Running Benchmarks for Performance
In the last article, we discussed selecting functions that optimize performance. These general rules about managing connections and updates should help you solve some additional common JDBC system performance problems.
Managing Connections and Updates
The guidelines in this section will help you to manage connections and updates to improve system performance for your JDBC applications.
Connection management is important to application performance. Optimize your application by connecting once and using multiple statement objects, instead of performing multiple connections. Avoid connecting to a data source after establishing an initial connection.
Although gathering driver information at connect time is a good practice, it is often more efficient to gather it in one step rather than two steps. For example, some applications establish a connection and then call a method in a separate component that reattaches and gathers information about the driver. Applications that are designed as separate entities should pass the established connection object to the data collection routine instead of establishing a second connection.
Another bad practice is to connect and disconnect several times throughout your application to perform SQL statements. Connection objects can have multiple statement objects associated with them. Statement objects, which are defined to be memory storage for information about SQL statements, can manage multiple SQL statements.
You can improve performance significantly with connection pooling, especially for applications that connect over a network or through the World Wide Web. Connection pooling lets you reuse connections. Closing connections does not close the physical connection to the database. When an application requests a connection, an active connection is reused, thus avoiding the network I/O needed to create a new connection.
In addition to connection pooling tuning options, JDBC 3.0 also specifies semantics for providing a statement pool. Similar to connection pooling, a statement pool caches PreparedStatement objects so that they can be re-used from a cache without application intervention. For example, an application may create a PreparedStatement object similar to the following SQL statement:
select name, address, dept, salary from personnel
where empid = ? or name like ? or address = ?"
When the PreparedStatement object is created, the SQL query is parsed for semantic validation and a query optimization plan is produced. The process of creating a prepared statement is extremely expensive in terms of performance with some database systems such as DB2. Once the prepared statement is closed, a JDBC 3.0-compliant driver places the prepared statement into a local cache instead of discarding it. If the application later attempts to create a prepared statement with the same SQL query, a common occurrence in many applications, the driver can simply retrieve the associated statement from the local cache instead of performing a network roundtrip to the server and an expensive database validation.
Connection and statement handling should be addressed before implementation. Spending time and thoughtfully handling connection management improves application performance and maintainability.
Managing Commits in Transactions
Committing transactions is slow due to the result of disk I/O and potentially network I/O. Always turn Autocommit off by using the WSConnection.setAutoCommit(false) setting.
What does a commit actually involve? The database server must flush back to disk every data page that contains updated or new data. This is usually a sequential write to a journal file, but nonetheless, is a disk I/O. By default, Autocommit is on when connecting to a data source, and Autocommit mode usually impairs performance because of the significant amount of disk I/O needed to commit every operation.
Furthermore, most database servers do not provide an Autocommit mode natively. For this type of server, the JDBC driver must explicitly issue a COMMIT statement and a BEGIN TRANSACTION for every operation sent to the server. In addition to the large amount of disk input/output required to support Autocommit mode, a performance penalty is paid for up to three network requests for every statement issued by an application.
Although using transactions can help application performance, do not take this tip too far. Leaving transactions active can reduce throughput by holding locks on rows for long times, preventing other users from accessing the rows. Commit transactions in intervals that allow maximum concurrency.
Choosing the Right Transaction Model
Many systems support distributed transactions; that is, transactions that span multiple connections. Distributed transactions are at least four times slower than normal transactions due to the logging and network I/O necessary to communicate between all the components involved in the distributed transaction (the JDBC driver, the transaction monitor, the Hibernate vs JPA framework and the database system). Unless distributed transactions are required, avoid using them. Instead, use local transactions when possible. It should be noted that many Java application servers typically provide a default transaction behavior that utilizes distributed transactions.
For the best system performance, design the application to run under a single Connection object.
Using updateXXX Methods
Although programmatic updates do not apply to all types of applications, developers should attempt to use programmatic updates and deletes. Using the updateXXX () methods of the ResultSet object allows the developer to update data without building a complex SQL statement. Instead, the developer simply supplies the column in the result set that is to be updated and the data that is to be changed. Then, before moving the cursor from the row in the result set, the updateRow() method must be called to update the database as well.
In the following code fragment, the value of the Age column of the ResultSet object rs is retrieved using the method getInt(), and the method updateInt() is used to update the column with an int value of 25. The method updateRow() is called to update the row in the database that contains the modified value.
int n = rs.getInt("Age"); // n contains value of Age column in the resultset rs ... rs.updateInt("Age", 25); rs.updateRow();
In addition to making the application more easily maintainable, programmatic updates usually result in improved performance. Because the database server is already positioned on the row for the Select statement in process, performance-expensive operations to locate the row to be changed are not needed. If the row must be located, the server usually has an internal pointer to the row available (for example, ROWID).
Use getBestRowIdentifier() to determine the optimal set of columns to use in the Where clause for updating data. Pseudo-columns often provide the fastest access to the data, and these columns can only be determined by using getBestRowIdentifier().
Some applications cannot be designed to take advantage of positional updates and deletes. Some applications might formulate the Where clause by using all searchable result columns, by calling getPrimaryKeys(), or by calling getIndexInfo() to find columns that might be part of a unique index. These methods usually work, but might result in fairly complex queries.
Consider the following example:
ResultSet WSrs = WSs.executeQuery ("SELECT first_name, last_name, ssn, address, city, state, zip FROM emp"); // fetch data ... WSs.executeUpdate ("UPDATE EMP SET ADDRESS = ? WHERE first_name = ? and last_name = ? and ssn = ? and address = ? and city = ? and state = ? and zip = ?"); // fairly complex query
Applications should call getBestRowIdentifier() to retrieve the optimal set of columns (possibly a pseudo-column) that identifies a specific record. Many databases support special columns that are not explicitly defined by the user in the table definition but are "hidden" columns of every table (for example, ROWID and TID). These pseudo-columns generally provide the fastest access to the data because they typically are pointers to the exact location of the record. Because pseudo-columns are not part of the explicit table definition, they are not returned from getColumns. To determine if pseudo-columns exist, call getBestRowIdentifier().
Consider the previous example again:
... ResultSet WSrowid = getBestRowIdentifier() (... "emp", ...); ... WSs.executeUpdate ("UPDATE EMP SET ADDRESS = ? WHERE ROWID = ?"; // fastest access to the data!
If your data source does not contain special pseudo-columns, then the result set of getBestRowIdentifier() consists of the columns of the most optimal unique index on the specified table (if a unique index exists). Therefore, your application does not need to call getIndexInfo to find the smallest unique index.