Selecting Functions that Optimize Performance

Java Development News:

Selecting Functions that Optimize Performance

By John Goodson

01 Jun 2004 | TheServerSide.com

Part I - Using Database MetaData methods appropriately

Part II - Retrieving Only Required Data

Part III - Selecting Functions that Optimize Performance

Part IV - Managing Connections and Updates


Introduction

John Goodson is a JDBC Expert Panel Member and VP of Research and Development at DataDirect Technologies. Check out his Tech Talk on TheServerSide.com, where he discusses Database Access Technologies.

Developing performance-oriented JDBC applications is not easy. JDBC drivers do not throw exceptions to tell you when your code is running too slow.

This series of tip presents some general guidelines for improving JDBC application performance that have been compiled by examining the JDBC implementations of numerous shipping JDBC applications. These guidelines include:

  • Using Database MetaData Methods Appropriately
  • Retrieving only Required Data
  • Selecting Functions that Optimize Performance
  • Managing Connections and Updates
  • Designing and Running Benchmarks for Performance

In the last performance tip, we discussed some tips of how to retrieve only the data that you require. These general rules about selecting functions that optimize performance should help you solve some common JDBC system performance problems.

Selecting JDBC Objects and Methods

The guidelines in this section will help you select which JDBC objects and methods will give you the best performance.

Using Parameter Markers as Arguments to Stored Procedures

When calling stored procedures, always use parameter markers for the argument markers instead of using literal arguments. JDBC drivers can call stored procedures on the database server either by executing the procedure as any other SQL query, or by optimizing the execution by invoking a Remote Procedure Call (RPC) directly into the database server. When you execute the stored procedure as a SQL query, the database server parses the statement, validates the argument types, and converts the arguments into the correct data types.

Remember that SQL is always sent to the database server as a character string, for example, “{call getCustName (12345)}”. In this case, even though the application programmer might assume that the only argument to getCustName is an integer, the argument is actually passed inside a character string to the server. The database server would parse the SQL query, consult database metadata to determine the parameter contract of the procedure, isolate the single argument value 12345, then convert the string ‘12345’ into an integer value before finally executing the procedure as a SQL language event.

By invoking an RPC inside the database server, the overhead of using a SQL character string is avoided. Instead, a JDBC driver will construct a network packet that contains the parameters in their native data type formats and execute the procedure remotely.

Case 1

In this example, the Stored Procedure cannot be optimized to use a server-side RPC. The database server must treat the SQL request as a normal language event which includes parsing the statement, validating the argument types, and converting the arguments into the correct data types before executing the procedure.

CallableStatement cstmt = conn.prepareCall (
   "{call getCustName (12345)}"); 
ResultSet rs = cstmt.executeQuery ();  

Case 2

In this example, the stored procedure can be optimized to use a server-side RPC. Because the application avoids literal arguments and calls the procedure by specifying all arguments as parameters, the JDBC driver can optimize the execution by invoking the stored procedure directly inside the database as an RPC. The SQL language processing on the database server is avoided and the execution time is greatly improved.

CallableStatement cstmt - conn.prepareCall (
   "{call getCustName (?)}");
cstmt.setLong (1,12345);
ResultSet rs = cstmt.executeQuery();

Using the Statement Object Instead of the PreparedStatement Object

JDBC drivers are optimized based on the perceived use of the functions that are being executed. Choose between the PreparedStatement object and the Statement object depending on the planned use. The Statement object is optimized for a single execution of a SQL statement. In contrast, the PreparedStatement object is optimized for SQL statements that will be executed two or more times.

The overhead for the initial execution of a PreparedStatement object is high. The benefit comes with subsequent executions of the SQL statement. For example, suppose I am preparing and then executing a query that returns employee information based on an ID. A JDBC driver will most likely process the prepare request by making a network request to the database server to parse and optimize the query. The execute then results in another network request. Inside JDBC drivers, reducing network communication is the ultimate goal. If my application will only make a request like this once during the life of the application, then use the Statement object. With a Statement object, the same query execution will result in only a single network roundtrip to the database server.

This guideline is complicated by the use of prepared statement pooling because the scope of execution is longer. When using prepared statement pooling, if a query is ad-hoc and will likely never be executed again, then use the Statement object. If a query will be executed infrequently, but may be executed again during the life of a statement pool inside a connection pool, then use a PreparedStatement. Under the same circumstances without statement pooling, use the Statement object.

Using Batches Instead of Prepared Statements

Updating large amounts of data typically is done by preparing an INSERT statement and executing that statement multiple times, resulting in numerous network roundtrips. To reduce the number of JDBC calls and improve performance, you can send multiple queries to the database at a time using the addBatch() method of the PreparedStatement object. For example, let us compare the following examples, Case 1 and Case 2.

Case 1: Executing Prepared Statement Multiple Times

PreparedStatement ps = conn.prepareStatement(
   "INSERT into employees values (?, ?, ?)");

for (n = 0; n < 100; n++) {

  ps.setString(name[n]);
  ps.setLong(id[n]);
  ps.setInt(salary[n]);
  ps.executeUpdate();
}

Case 2: Using a Batch

PreparedStatement ps = conn.prepareStatement(
   "INSERT into employees values (?, ?, ?)");

for (n = 0; n < 100; n++) {

  ps.setString(name[n]);
  ps.setLong(id[n]);
  ps.setInt(salary[n]);
  ps.addBatch();
}
ps.executeBatch();

In Case 1, a prepared statement is used to execute an INSERT statement multiple times. In this case, 101 network roundtrips are required to perform 100 insert operations?1 roundtrip to prepare the statement and 100 additional roundtrips to execute each iteration. When the addBatch() method is used to consolidate 100 INSERT operations, as demonstrated in Case 2, only two network roundtrips are required?one to prepare the statement and another to execute the batch. Although more database CPU cycles are involved by using batches, performance is gained through the reduction of network roundtrips. Remember that the biggest gains in performance with JDBC drivers is found by reducing network communication between the JDBC driver and the database server.

Choosing the Right Cursor

Choosing the appropriate type of cursor allows maximum flexibility for your application. This section summarizes the performance issues of three types of cursors.
A forward-only cursor provides excellent performance for sequential reads of all of the rows in a table. For retrieving table data there is no faster way to retrieve result rows than with a forward only cursor. However, it cannot be used when the application must process rows in a non-sequential manner.

Insensitive cursors used by JDBC drivers are ideal for applications that require high levels of concurrency on the database server and require the ability to scroll forwards and backwards through result sets. The first request to an insensitive cursor fetches all (or many but not all rows when a JDBC driver using “lazy” fetching) of the rows and stores them on the client. Thus, the first request is very slow, especially when long data is retrieved. Subsequent requests do not require any network traffic (or limited network traffic when a driver uses lazy fetching) and are processed quickly. Because the first request is processed slowly, insensitive cursors should not be used for a single request of one row. Developers should also avoid using insensitive cursors when long data is returned, because memory can be exhausted. Some insensitive cursor implementations cache the data in a temporary table on the database server and avoid the performance issue, but most cache the information local to the application.

Sensitive cursors, sometimes called keyset-driven cursors, use identifiers, such as a ROWID, that already exist in your database. When you scroll through the result set, the data for the identifiers is retrieved. Because each request generates network traffic, performance can be very slow. However, returning non-sequential rows does not further affect performance.

To illustrate this further, consider an application that would normally return 1000 rows to an application. At execute time or when the first row is requested, a JDBC driver would not execute the SELECT statement that was provided by the application. Instead, the JDBC driver would replace the SELECT list of the query with the key identifier, for example, ROWID. This modified query would then be executed by the driver and all 1000 key values would be retrieved from the database server and cached for use by the driver. Each request from the application for a result row directs the JDBC driver to look up the key value for the appropriate row in it’s local cache, construct an optimized query that contains a WHERE clause similar to ‘WHERE ROWID = ?’, execute the modified query, and then retrieve the single result row from the server.

Sensitive cursors are the preferred scrollable cursor model for dynamic situations when the application cannot afford to buffer the data from an insensitive cursor.

Using get Methods Effectively

JDBC provides a variety of methods to retrieve data from a result set, such as getInt(), getString(), and getObject(). The getObject() method is the most generic and provides the worst performance when the non-default mappings are specified. This is because the JDBC driver must do extra processing to determine the type of the value being retrieved and generate the appropriate mapping. Always use the specific method for the data type.

To further improve performance, provide the column number of the column being retrieved, for example, getString(1), getLong(2), and getInt(3), instead of the column name. If column numbers are not specified, network traffic is unaffected, but costly conversions and lookups increase. For example, suppose you use getString("foo") ... A driver might have to convert the column delimeter foo to uppercase (if necessary), and then compare "foo" with all the column names in the column list. If instead, the driver went directly to result column 23, a significant amount of processing would be saved.

For example, suppose you have a result set that has 15 columns and 100 rows, and the column names are not included in the result set. You are interested in three columns, EMPLOYEENAME (a string), EMPLOYEENUMBER (a long integer), and SALARY (an integer). If you specify getString(“EmployeeName”), getLong(“EmployeeNumber”), and getInt(“Salary”), each column name must be converted to the appropriate case of the columns in the database metadata, and lookups would increase considerably. Performance would improve significantly if you specify getString(1), getLong(2), and getInt(15).

Retrieving Auto-Generated Keys

Many databases have hidden columns (called pseudo-columns) that represent a unique key over every row in a table. Typically, using these types of columns in a query is the fastest way to access a row because the pseudo-columns usually represent the physical disk address of the data. Prior to JDBC 3.0, an application could only retrieve the value of the pseudo-columns by executing a SELECT statement immediately after inserting the data.

For example:
//insert row
int rowcount = stmt.executeUpdate (
   "insert into LocalGeniusList (name) values ('Karen')"); 
// now get the disk address - rowid - for the newly inserted row
ResultSet rs = stmt.executeQuery (
   "select rowid from LocalGeniusList where name = 'Karen'");

Retrieving pseudo-columns this way has two major flaws. First, retrieving the pseudo-column requires a separate query to be sent over the network and executed on the server. Second, because there may not be a primary key over the table, the search condition of the query may be unable to uniquely identify the row. In the latter case, multiple pseudo-column values can be returned, and the application may not be able to determine which value is actually the value for the most recently inserted row.

An optional feature of the JDBC 3.0 specification is the ability to retrieve auto-generated key information for a row when the row is inserted into a table.

For example:
int rowcount = stmt.executeUpdate (
   "insert into LocalGeniusList (name) values ('Karen')",
// insert row AND return key
Statement.RETURN_GENERATED_KEYS); 
ResultSet rs = stmt.getGeneratedKeys (); 
// key is automatically available

Now, the application contains a value that can be used in a search condition to provide the fastest access to the row and a value that uniquely identifies the row, even when a primary key doesn't exist on the table.

The ability to retrieve auto-generated keys provides flexibility to the JDBC developer and creates performance boosts when accessing data.

Choosing the Right Data Type

Retrieving and sending certain data types can be expensive. When you design a schema, select the data type that can be processed most efficiently. For example, integer data is processed faster than floating-point data or decimal data. Floating-point data is defined according to internal database-specific formats, usually in a compressed format. The data must be decompressed and converted into a different format so that it can be processed by the database wire protocol.

Retrieving Result Sets

Most JDBC drivers cannot implement scrollable cursors because of limited support for scrollable cursors in the database system. Unless you are certain that the database supports using a scrollable result set, (for example, rs), do not call rs.last() and rs.getRow() methods to find out how many rows the result set has. For JDBC drivers that emulate scrollable cursors, calling rs.last() results in the driver retrieving all results across the network to reach the last row. Instead, you can either count the rows by iterating through the result set or get the number of rows by submitting a query with a COUNT column in the SELECT clause.

In general, do not write code that relies on the number of result rows from a query because drivers must fetch all rows in a result set to know how many rows the query will return.

PRINTER FRIENDLY VERSION