July 2004
Introduction
Developing performance-oriented JDBC applications is not easy. JDBC drivers
do not throw exceptions to tell you when your code is running too slow.
This series of articles presents some general guidelines for improving JDBC
application performance that have been compiled by examining the JDBC implementations
of numerous shipping JDBC applications. These guidelines include:
- Using Database MetaData Methods Appropriately
- Retrieving only Required Data
- Selecting Functions that Optimize Performance
- Managing Connections and Updates
- Designing and Running Benchmarks for Performance
In the last article, we discussed selecting functions that optimize performance.
These general rules about managing connections and updates should help you solve
some additional common JDBC system performance problems.
Managing Connections and Updates
The guidelines in this section will help you to manage connections and updates
to improve system performance for your JDBC applications.
Managing Connections
Connection management is important to application performance. Optimize your
application by connecting once and using multiple statement objects, instead
of performing multiple connections. Avoid connecting to a data source after
establishing an initial connection.
Although gathering driver information at connect time is a good practice,
it is often more efficient to gather it in one step rather than two steps. For
example, some applications establish a connection and then call a method in
a separate component that reattaches and gathers information about the driver.
Applications that are designed as separate entities should pass the established
connection object to the data collection routine instead of establishing a second
connection.
Another bad practice is to connect and disconnect several times throughout
your application to perform SQL statements. Connection objects can have multiple
statement objects associated with them. Statement objects, which are defined
to be memory storage for information about SQL statements, can manage multiple
SQL statements.
You can improve performance significantly with connection pooling, especially
for applications that connect over a network or through the World Wide Web.
Connection pooling lets you reuse connections. Closing connections does not
close the physical connection to the database. When an application requests
a connection, an active connection is reused, thus avoiding the network I/O
needed to create a new connection.
In addition to connection pooling tuning options, JDBC 3.0 also specifies
semantics for providing a statement pool. Similar to connection pooling, a statement
pool caches PreparedStatement objects so that they can be re-used from a cache
without application intervention. For example, an application may create a PreparedStatement
object similar to the following SQL statement:
select name, address, dept, salary from personnel
where empid = ? or name like ? or address = ?"
When the PreparedStatement object is created, the SQL query is parsed for
semantic validation and a query optimization plan is produced. The process of
creating a prepared statement is extremely expensive in terms of performance
with some database systems such as DB2. Once the prepared statement is closed,
a JDBC 3.0-compliant driver places the prepared statement into a local cache
instead of discarding it. If the application later attempts to create a prepared
statement with the same SQL query, a common occurrence in many applications,
the driver can simply retrieve the associated statement from the local cache
instead of performing a network roundtrip to the server and an expensive database
validation.
Connection and statement handling should be addressed before implementation.
Spending time and thoughtfully handling connection management improves application
performance and maintainability.
Managing Commits in Transactions
Committing transactions is slow due to the result of disk I/O and potentially
network I/O. Always turn Autocommit off by using the WSConnection.setAutoCommit(false)
setting.
What does a commit actually involve? The database server must flush back to
disk every data page that contains updated or new data. This is usually a sequential
write to a journal file, but nonetheless, is a disk I/O. By default, Autocommit
is on when connecting to a data source, and Autocommit mode usually impairs
performance because of the significant amount of disk I/O needed to commit every
operation.
Furthermore, most database servers do not provide an Autocommit mode natively.
For this type of server, the JDBC driver must explicitly issue a COMMIT statement
and a BEGIN TRANSACTION for every operation sent to the server. In addition
to the large amount of disk input/output required to support Autocommit mode,
a performance penalty is paid for up to three network requests for every statement
issued by an application.
Although using transactions can help application performance, do not take this
tip too far. Leaving transactions active can reduce throughput by holding locks
on rows for long times, preventing other users from accessing the rows. Commit
transactions in intervals that allow maximum concurrency.
Choosing the Right Transaction Model
Many systems support distributed transactions; that is, transactions that span
multiple connections. Distributed transactions are at least four times slower
than normal transactions due to the logging and network I/O necessary to communicate
between all the components involved in the distributed transaction (the JDBC
driver, the transaction monitor, and the database system). Unless distributed
transactions are required, avoid using them. Instead, use local transactions
when possible. It should be noted that many Java application servers typically
provide a default transaction behavior that utilizes distributed transactions.
For the best system performance, design the application to run under a single
Connection object.
Using updateXXX Methods
Although programmatic updates do not apply to all types of applications, developers
should attempt to use programmatic updates and deletes. Using the updateXXX
() methods of the ResultSet object allows the developer to update data without
building a complex SQL statement. Instead, the developer simply supplies the
column in the result set that is to be updated and the data that is to be changed.
Then, before moving the cursor from the row in the result set, the updateRow()
method must be called to update the database as well.
In the following code fragment, the value of the Age column of the ResultSet
object rs is retrieved using the method getInt(), and the method updateInt()
is used to update the column with an int value of 25. The method updateRow()
is called to update the row in the database that contains the modified value.
int n = rs.getInt("Age");
// n contains value of Age column in the resultset rs
...
rs.updateInt("Age", 25);
rs.updateRow();
In addition to making the application more easily maintainable, programmatic
updates usually result in improved performance. Because the database server
is already positioned on the row for the Select statement in process, performance-expensive
operations to locate the row to be changed are not needed. If the row must be
located, the server usually has an internal pointer to the row available (for
example, ROWID).
Using getBestRowIdentifier()
Use getBestRowIdentifier() to determine the optimal set of columns to use in
the Where clause for updating data. Pseudo-columns often provide the fastest
access to the data, and these columns can only be determined by using getBestRowIdentifier().
Some applications cannot be designed to take advantage of positional updates
and deletes. Some applications might formulate the Where clause by using all
searchable result columns, by calling getPrimaryKeys(), or by calling getIndexInfo()
to find columns that might be part of a unique index. These methods usually
work, but might result in fairly complex queries.
Consider the following example:
ResultSet WSrs = WSs.executeQuery
("SELECT first_name, last_name, ssn, address, city, state, zip
FROM emp");
// fetch data
...
WSs.executeUpdate ("UPDATE EMP SET ADDRESS = ?
WHERE first_name = ? and last_name = ? and ssn = ?
and address = ? and city = ? and state = ?
and zip = ?");
// fairly complex query
Applications should call getBestRowIdentifier() to retrieve the optimal set
of columns (possibly a pseudo-column) that identifies a specific record. Many
databases support special columns that are not explicitly defined by the user
in the table definition but are "hidden" columns of every table (for
example, ROWID and TID). These pseudo-columns generally provide the fastest
access to the data because they typically are pointers to the exact location
of the record. Because pseudo-columns are not part of the explicit table definition,
they are not returned from getColumns. To determine if pseudo-columns exist,
call getBestRowIdentifier().
Consider the previous example again:
...
ResultSet WSrowid = getBestRowIdentifier()
(... "emp", ...);
...
WSs.executeUpdate ("UPDATE EMP SET ADDRESS = ?
WHERE ROWID = ?";
// fastest access to the data!
If your data source does not contain special pseudo-columns, then the result
set of getBestRowIdentifier() consists of the columns of the most optimal unique
index on the specified table (if a unique index exists). Therefore, your application
does not need to call getIndexInfo to find the smallest unique index.
PRINTER FRIENDLY VERSION
|