Java Development News:
EJB 2.1 adds new functionality to EJB QL: Monson-Haefel's Guide to Enterprise JavaBeans
By Richard Monson-Haefel
01 Nov 2002 | TheServerSide.com
Richard Monson-Haefel is the author of Enterprise
JavaBeans, 3rd Edition, Java
Message Service and one of the world's leading experts on Enterprise
Java. He is co-founder of OpenEJB, an open source EJB container used in
Apple Computer's WebObjects, and an independent consultant.
If you read the most recent edition of my book Enterprise JavaBeans, 3rd Edition, you may remember that, at the end of the chapter on EJB QL, I grumbled about several problems with the query language. My biggest complaint was the lack of an ORDER BY clause. I also pointed out other deficiencies like the limited number of functions available.
I'm happy to report that EJB 2.1 has largely addressed these shortcomings with the addition of several new functions and an ORDER BY clause. The purpose of this article is to describe these new features, not to provide an overview of EJB QL in general. At the end of the article I'll also present a short list of features and failings that the next version of the specification should address.
EJB 2.1 has added five new aggregate functions for the SELECT clause as well as a new MOD function for the WHERE clause.
EJB 2.0 defined six functions you can use in the WHERE clause of an EJB QL statement: CONCAT(), LENGTH(), LOCATE(), SUBSTRING(), ABS(), and SQRT(). EJB 2.1 introduces a new modulus functional expression; MOD() takes two integer values as parameters, divides the first by the second, and returns the remainder. For example:
MOD( 7, 3 ) = 1.
A far more important change, however, is the definition of five new aggregate functions to be used in the SELECT clause of a EJB QL statement including: COUNT(), MAX( ), MIN( ), AVG( ) and SUM( ). Aggregate functions are used with queries that return a collection of values. They are fairly simple to understand and can be handy, especially the COUNT() function. It's important to understand that aggregate functions can be used only with select methods, not with find methods. The find methods may return only a entity (local or remote) EJB Object references (either a single reference or a Collection of them).
Let's look at each of the new aggregate functions in detail. The examples use a Customer EJB that has a one-to-one relationship with an Address EJB and a one-to-many relationship with a Job EJB. A Job is a service which is bought for a lump sum, such as landscaping. In other words, there are no line items.
Entity relationship diagram.
COUNT(identifier or path expression)
This function returns the count of items in the final result set of the query. For example, the following query provides a count of all the customers who live in Wisconsin.
SELECT COUNT( c ) FROM Customers AS c WHERE c.address.state = 'WI'
The COUNT() function can be used either with identifiers, in which case it always counts entities, or with path expressions, in which case it counts either CMR fields or CMP fields. For example, the following provides a count of all the zip codes that start with the characters "554."
SELECT COUNT(c.address.zip) FROM Customers AS c WHERE c.address.zip LIKE '554%'
MAX(path expression), MIN(path expression)
MAX() and MIN() can be used to find the largest or smallest value in a collection of any type of CMP field. It cannot be used with identifiers or paths that terminate in a CMR field. The result type will be the type of CMP field that is being evaluated. For example, the following query returns the highest amount paid for a Job.
SELECT MAX( job.totalSale ) FROM Customer AS c, IN (c.jobs) AS job
The MAX() and MIN() functions can be applied to any valid CMP value including primitive types, strings, and even serializable objects. As you may recall, a CMP field can be a serializable application object that is converted to a binary field and stored in a database. The result of applying the MAX() and MIN() functions to serializable objects is not specified, however, because there is no standardized way of determining which serializable object is greater than another.
The result of applying the MAX() and MIN() functions to a String is type depends on the underlying data store used. This uncertainty results from problems inherent in String comparisons, which are addressed fully in the section "The Problems with Strings."
AVG(numeric ), SUM(numeric)
The AVG() and SUM() functions can be applied only to path expressions that terminate in a numeric primitive type (byte, short, int, long, double, float) or one of the corresponding wrappers (Byte, Short, Integer, etc.)
For example, the following query uses the SUM() function to get the total paid for all jobs by a customer (specified by input parameter).
SELECT SUM( job.totalSale ) FROM Customer AS c, IN( c.jobs ) AS job WHERE c = ?1
DISTINCT, nulls, and empty arguments
You can use the DISTINCT operator to eliminate duplicate values before the query applies any aggregate function. The following query modifies one shown earlier. Employing the DISTINCT operator changes the outcome of the query. Instead of counting all the zip code entries that match that pattern, it counts the number of different zip codes that match the string pattern.
SELECT DISTINCT COUNT(c.address.zip) FROM Customers AS c WHERE c.address.zip LIKE '554%'
In this case, the DISTINCT operator first eliminates duplicate zip codes, so if multiple customers live in an area with the same zip code, that zip code is counted only once. Then the COUNT() function counts the number of unique zip codes. The results of this query are dramatically different from those of the earlier query. Without the DISTINCT operator, 100 customers living in the same zip code increases the total count by 100, rather than by 1.
AVG(), SUM(), MAX(), and MIN() return null when the argument they evaluate is an empty collection. For example, the following query attempts to obtain the average price paid by a specific Customer for all jobs.
SELECT AVG( job.totalSale ) FROM Customer AS c, IN( c.jobs ) AS job WHERE c = ?1
If the Customer specified by the input parameter has no jobs, then the collection that the AVG() function is supposed to operate on will be empty (no jobs, no total sales), and the query will return null. Unlike the rest of the aggrate funtions, COUNT() returns 0 (zero) when the argument it evaluates is an empty collection.
The exact return type of the aggregate funtions is still being discussed by the EJB 2.1 Expert Group, but it likely that COUNT() will return an integral type (i.e. long or Long), SUM() and AVG() will return an integral or floating-point type, and MAX() and MIN() will return the same type that they operate on.
The new ORDER BY clause
The ORDER BY clause allows you to order the resulting collection of a query the way you want it; without it, output order is unpredictable. The reason EJB 2.0 didn't include an ORDER BY clause has a lot to do with the problem of comparing string types. In this section I'll first explain this problem and then explain how to use the ORDER BY clause.
The problem with strings
The following excerpt from my book Enterprise JavaBeans, 3rd Edition, (O'Reilly 2001) summarizes the problem.
According to Sun, the ORDER BY clause was not included in [ EJB 2.0 ] because of problems dealing with the mismatch in ordering behavior between the Java language and databases. The example they gave had to do with string values. The semantics of ordering strings in a database may be different [from] those of the Java language. For example, Java orders String types according to character sequence and case (uppercase vs. lowercase). Different databases may or may not consider case while ordering, or they may not consider leading or trailing whitespace. In light of these possible differences, it seems as though Sun has a reasonable argument, but only for limiting the portability of ORDER BY, not for eliminating its use all together. EJB developers can live with less than perfect portability of the ORDER BY clause, but they cannot live without the ORDER BY clause altogether.
The problems with string comparisons have not changed since then, and they probably never will. The difference is that in EJB 2.1 Sun has finally decided to accept less-than-perfect portability of string comparisons in order to support the desperately needed ORDER BY clause. In fact, EJB 2.1 also allows string comparisons using the greater-than and less-than symbols ( > , < ) in WHERE statements.
For EJB QL to maintain its status as an abstraction of native query languages (SQL-92, JDOQL, OQL, etc.) it cannot dictate String ordering, because native query languages may have very different ordering rules. In fact, relational database vendors differ on the question of String ordering, which makes it all but impossible to standardize ordering even for SQL-compliant databases.
Of course this is all academic if you plan on using the same database well into the future. In this case, it's best to examine the documentation for the database. Finding out how it orders strings will tell you exactly how your EJB QL comparisons will work.
Using the ORDER BY clause
Now that you have an appreciation for the problems associated with string comparisons, you are better prepared to use the ORDER BY clause.
The ORDER BY clause will be familiar to people with SQL experience because the semantics are basically the same. As an example, you can construct a simple query that uses the ORDER BY clause to return an alphabetical list of all Customers.
SELECT OBJECT( c ) FROM Customers AS c ORDER BY c.lastName
This might return a Collection of Customer EJB's in the following order (assume their last and first names are printed to output).
Aares, John Astro, Linda Brooks, Hank . . Xerces, Karen Zastro, William
You can use the ORDER BY clause with or without the WHERE clause. For example, you can refine the previous query so it lists only those customers who reside in Boston, Massachusetts.
SELECT OBJECT( c ) FROM Customers AS c WHERE c.address.city = 'Boston' AND c.address.state = 'MA' ORDER BY c.lastName
You can specify the order of the listing as ascending or descending by using the keywords ASC and DESC. The default order is always ascending, which means that lesser values are listed before greater values. You can list the Customer EJBs in descending order by specifying DESC, as in this example:
SELECT OBJECT( c ) FROM Customers AS c ORDER BY c.lastName DESC
Zastro, William Xerces, Karen . . Brooks, Hank Astro, Linda Aares, John
You can specify multiple order-by fields. For example, you can sort the Customer EJB by lastName in ascending order and firstName in descending order as follows:
SELECT OBJECT( c ) FROM Customers AS c ORDER BY c.lastName ASC, c.firstName DESC
If you have five Customer EJB's with the lastName equal to "Brooks", this query will sort the results as follows:
... Bond, James Brooks, William Brooks, Henry Brooks, Hank Brooks, Ben Brooks, Andy Brundage, Avery ...
The fields used in the ORDER BY clause must be CMP fields, they cannot be entity identifiers or CMR fields. In addition, you must be careful which CMP fields you specify in the ORDER BY clause. If the query selects a collection of entities, the ORDER BY clause can be used only with CMP fields of the entity type selected. For example the following query is illegal, because the CMP field used in the ORDER BY clause is not a field of the entity type selected.
SELECT OBJECT( c ) FROM Customer AS c ORDER BY c.address.city
Because the city CMP field is not a direct CMP field of the Customer EJB, you cannot use it in the ORDER BY clause. The only CMP fields you can use in the ORDER BY clause are those that are direct CMP fields of the entity type being selected. This is an unreasonable restriction.
A similar restriction applies to CMP results. The CMP field used in the ORDER BY clause must be the same as the CMP field identified in the SELECT clause. For example, the following query is illegal, because the CMP identified in the SELECT clause differs from the one used in the ORDER BY clause.
SELECT c.address.city FROM Customer AS c ORDER BY c.address.state
In the above query, we wanted a list of all the cities, ordered by state. Unfortunately, this is illegal. You can't order by the state CMP field if you are selecting using the city CMP field.
I've heard that this second restriction may have been forced on us by the limitations of one of the major RDBS systems, which cannot order by any columns not present in the select clause - remember that EJB QL is compiled into a native query language, which in RDBS is SQL. The limitations of major vendors will be a factor in creating any abstraction - the least common denominator will prevail.
Problems with EJB QL
Although the improvements made to EJB QL (specifically the addition of the ORDER BY clause and new functions) deserve our appreciation, there remain several features that should still be added to the query language and at least one that should be deprecated:
- Deprecate the use of the OBJECT( ) operator. It's not necessary and simply muddles the syntax.
- Add support for java.util.Date or java.util.Calendar or both as types for identifiers and parameters. There is also a need to support date literals.
- Add support for more functions like UPPER( ), LOWER( ), and CAST( ), and for date functions (DOW(), MONTH(), etc.).
- Add support for multiple expressions in the SELECT statement, so that an EJB QL query can return more than one column of data in the result set.
- Add support for GROUP BY and HAVING clauses.
- Add support for subqueries.
The purpose of this article was to explain new features that EJB 2.1 adds to EJB QL. The article is not comprehensive and doesn't touch on some other enhancments. For example, EJB 2.1 provides a much cleaner explanation of how null values should be handled and pins EJB QL down to "inner join" semantics.
That said, this article should provide you with a clear vision of what is to come. Obviously, EJB QL is a lot better off with the ORDER BY clause and the new aggregate functions, but the query language has a way to go before its functionality is complete - if that is even possible.
An important thing to remember is that EJB QL, as an abstraction, will never give you the level of control that you get with native query languages. Even relational database vendors that use the same basic schema offer unique native query extensions. It's been the practice of EJB vendors to offer extensions to EJB QL, too. Some vendors, for example, have been supporting the ORDER BY clause all along. These extensions are good, but should be approached with caution. For most projects the added power of custom extensions to EJB QL is welcome, but the disinction between vendor-specific EJB QL features and standard features should be prominently documented.
The power of EJB QL is portablity, and not just portablity of code. Most organizations choose a database vendor and stick with it - if they need to change to a new vendor, updating EJB QL statements will be the least of their worries. For many the real value of EJB QL will be the portablity of skill. As a developer you should become adept with basic EJB QL syntax and semantics, and be able to apply those skills to any EJB product. Of course, you'll also want to know the proprietary extensions offered by different vendors - this is also true of SQL - but EJB QL can serve as an important baseline for portablity of your skills.