recently I've implemented a consecutive integer primary key generator.
I used a stateless session bean which makes direct JDBC calls to the database, specifically two statement for every key request, a SELECT and a UPDATE. The sql commands are looked up in JNDI so that tables can be changed without recompiling.
At first I've made the mistake to use the following sql sentences:
SELECT VALUE FROM SEQUENCES WHERE SEQNAME=?
UPDATE SEQUENCES SET VALUE=? WHERE SEQNAME=?
but obviously this caused duplicated keys with cuncurrent requests.
Then I changed the second sql statement to
UPDATE SEQUENCES SET VALUE=? WHERE SEQNAME=? AND VALUE=?
This way I update the value only when no other request has hit the record in the time between the select and the update (transaction isolation of JDBC connection is READ_UNCOMMITED).
In the session bean I repeat the two statements until the update succeeds (executeUpdate returns 1 row updated).
I've tested the code generating keys from different threads (the only operation they do is getting the key) and with 20 concurrent threads most operations is completed in one loop, while others are completed in two loops.
If I use 50 concurrent threads, requests loop at most ten times. Session pool and connection pool size was set to 100.
I would like to hear some hints on how to improve this method and if it has some hidden problems with concurrency/clustering I have not considered.
Specifically I would like to know if it is best to use direct JDBC access like I've done or use an entity bean and catch the rollback exceptions.
You can improve performance by caching groups of IDs in memory rather than looking up the ID in the database every time. By retrieving ids values from the database once every 1000 or so primary keys, you significantly reduce calls to the database, which also reduces concurrency problems.
The only question is where to cache the id values. The simplest solution is to put them in instance variables your Session EJB. Ordinarily caching data values in Session EJB instance variables between requests is a bad idea. Since it does not really matter which id value the client gets, though, instance variables can be used safely in this particular case.
More sophisticated options exist. You might take a look at the discussions on primary key generation on p. 106 of Floyd Marinescu's EJB Design Patterns book, available in the Books and Articles section of this site.
Thank you for your comments.
I've read the section on primary key generation in Floyd's book (i've bought the book a couple of weeks ago and i've read almost all of it, very good reading!).
I understand caching some IDs in memory is very efficient but I wanted to create an EJB that worked for sequential keys. Maybe I can use another session bean in front of it that caches keys for use when I don't need sequential values (this requires that the key generation bean method accept an increment parameter).
But I'm particularly interested in the performance and robustness comparison between using an Entity Bean or my approach to access the sequences table in highly concurrent environiments.
Hmm. You are right. If the sequence of keys is important, than key caching won't work. Your algorithm is probably close to what you want, then. There are going to be some nasty concurrency issues, though, since every INSERT operation in the system will also perform a SELECT against your SEQUENCE table.
If you are not worried about portability, you might look at your EJB server to see if it supports some kind of autogenerated keys. I know that Weblogic has good support for various key generation algorithms. You might find that what you need is already built into your EJB Server. Alternately, you could look into your Database's sequence generator.
Failing that, I think your approach is probably best. To avoid deadlock, you may want to rewrite your algorithm so that is gives up looking for new keys after a certain number of tries (say 10). You should make this configurable, so you can expirement with Load vs. Max-Retry to get a good balance.
Finally, you DEFINITELY only want to use the algorithm for tables where precise, sequentially generated keys are necessary. For other tables, use Floyd's caching technique instead.