Proposal for a Portable, High Performance Primary Key Generator

  1. Hi all,

    Like most developers working in the EJB world, one beautiful day I came across the problem of generating primary keys for my entity beans. As I wandered around the web looking for information about the problem, I found lots of interesting discussions, especially the Entity Bean Primary Key Generator thread on theserverside.com, but no comprehensive, complete solution.

    So I decided to try to synthesize what has been said so far in the thread. Since the problem is a fairly basic and general one I decided to write code that would completely implement the solution that I thought was the one emerging from this discussion thread.

    I made the code freely available here. I wanted to put it on SourceForge, hoping that it would benefit from the open source community and become a stable, quality solution for everyone. But either I am stupid or it is not a simple task: after trying for 4 hours to set up a project, I decided to give it a miss. If anybody can help me with this, please contact me at ehsforward at yahoo dot com. It would be great if the code were in a central repository, so that everyone could use it, improve it, and test it on their respective platforms.

    I think/hope the initial quality is up to it. I tested the code on WebSphere 3.5 and it is working great.

    Please read Scott Ambler's article (here) BEFORE making any comments/contributions to this posting. References to postings from this thread will be made, with links directing you to them.

    The code implements the following aspects discussed in the Entity Bean Primary Key Generator thread:

    - Uses a key composed of a 112-bit HIGH key, a 16-bit LOW key and a unique enterprise identifier (as per Scott Ambler's article).

    - Stores the key as a String in the database (as per Scott Ambler's article).

    - Supposes that all classes will use the same generic generation mechanism to obtain UIDs for their objects (as per Scott Ambler's article).

    - Creates the HIGH key automatically in the database if it is not found.

    - Uses the singleton/factory pattern as discussed by Serge Monkewitz (August 8, 2000). This means that there will be one factory per JVM.
    [ NOTE: As mentioned in various postings, some EJB servers, such as SilverStream and Gemstone, create and destroy JVMs dynamically, so the unused part of a HIGH key's range is lost each time a JVM goes away. I took the same position as Scott Ambler when he says in his article: "Yes, this is wasteful, but when you are dealing with a 112-bit HIGHs, who cares?" If this is really an issue, decisions might need to be taken at the level of such servers' configuration, but this is a separate discussion altogether.]

    - Uses a byte array to represent the HIGH and LOW keys. The byte arrays are encapsulated in a Key class that implements various functions such as incrementing the key, converting to and from a String, etc. I think this solution is acceptable performance-wise, although I did not study the problem thoroughly.

    - As a first try (i.e. just an idea out of the blue) at solving the "hotspot" problem in index page updates mentioned by Eddie Fung (July 30, 2000), I used the strategy of incrementing the keys (byte arrays) in inverse order.

    This means that instead of incrementing like this:

    00000000 00000000 ... 00000000 00000000 -> 00000000 00000000 ... 00000000 00000001
    and:
    00000000 00000000 ... 00000000 11111111 -> 00000000 00000000 ... 00000001 00000000

    the ByteArray class would increment the following way:

    00000000 00000000 ... 00000000 00000000 -> 00000001 00000000 ... 00000000 00000000
    and:
    11111111 00000000 ... 00000000 00000000 -> 00000000 00000001 ... 00000000 00000000

    This means that instead of waiting for the end of the byte array to see any difference between HIGH keys, the difference will appear at the beginning (especially at early stages of the life of the UID generator).

    - Uses a Session bean with a "select for update" to get the HIGH key as discussed by Gal Binyamini (September 3, 2000). According to Gal (please, anybody, shout if there is some disagreement here), the resulting bean does not require the use of a TX_SERIALIZEABLE transaction, BUT you should use TX_REQUIRES_NEW to make the bean highly available.

    - Makes sure AUTOCOMMIT is turned off and then restored to its initial setting, following the comment from Weicong Wang (September 28, 2000).
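
    The inverse-order increment described in the list above can be sketched as follows. This is only an illustration: ReverseIncrement and its methods are hypothetical names, not the Key/ByteArray class from the posted code, and the hex rendering is an assumption.

```java
// Hypothetical helper illustrating the idea; not the class from the posted code.
final class ReverseIncrement {

    /** Increments the array in place, treating index 0 as the least significant byte. */
    static void increment(byte[] key) {
        for (int i = 0; i < key.length; i++) {
            key[i]++;              // byte arithmetic wraps from 0xFF to 0x00
            if (key[i] != 0) {
                return;            // no wrap, so no carry: done
            }
            // this byte wrapped to zero: carry into the next byte to the right
        }
        // every byte wrapped: the key space is exhausted and the value is all zeros again
    }

    /** Renders two hex characters per byte, in array order. */
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder(key.length * 2);
        for (byte b : key) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

    With this ordering, consecutive keys differ in their leading characters, which is the property the bullet above relies on to spread index updates.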

    Cheers,

    Emmanuel

  2. Right,

    There seem to be some problems trying to download the code. Try here instead.

    Floyd, I would be grateful if you could update the link in the initial posting to avoid confusion...

    Emmanuel
  3. Emmanuel,

       Wow, thanks for taking the initiative to put together the source code! Awesome! To get this pattern more permanent attention, I would suggest posting it into our patterns repository. You could call it the "HIGH/LOW Session Bean Primary Key Generator". I could place the code for download locally onto TheServerSide if you wish. Email me (webmaster@theserverside) for more details.

    Floyd
  4. Hi,
       I have a doubt regarding the singleton pattern that you are using to get a unique key. In a clustered environment, you will be running the same application on different app servers on different machines. Hence each instance of this clustered application will be in its own JVM. In such a case, there will be multiple instances of your singleton object and there is a chance that the key being generated is not unique.

       Is the above argument correct? If not, why?

    Regards
    Anuj
  5. Anuj,

    No, you are not correct.

    Each singleton, in its own JVM, will have its own HIGH key, which it gets from the database. Since no two singletons ever hold the same HIGH key, all keys generated by all singletons will be unique.
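
    A minimal sketch of why this holds, under simplifying assumptions: SharedHighTable stands in for the database row, two Dispenser instances stand in for the singletons of two JVMs, and a long HIGH is used instead of the 112-bit byte array. All class and method names here are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the HIGH key row; the real code reads it with "select for update".
final class SharedHighTable {
    private final AtomicLong next = new AtomicLong();

    long reserveHigh() {
        return next.getAndIncrement();   // every caller receives a different HIGH
    }
}

// One instance per JVM in the real design; here two instances simulate two cluster nodes.
final class Dispenser {
    private static final int LOW_LIMIT = 1 << 16;   // 16-bit LOW key
    private final SharedHighTable table;
    private long high;
    private int low = LOW_LIMIT;                    // forces a HIGH fetch on first use

    Dispenser(SharedHighTable table) {
        this.table = table;
    }

    synchronized String nextKey() {
        if (low == LOW_LIMIT) {                     // LOW exhausted: reserve a fresh HIGH
            high = table.reserveHigh();
            low = 0;
        }
        return String.format("%016X%04X", high, low++);
    }
}

final class ClusterDemo {
    /** Returns true if two dispensers sharing one table never produce the same key. */
    static boolean keysAreDisjoint(int perNode) {
        SharedHighTable table = new SharedHighTable();
        Dispenser nodeA = new Dispenser(table);
        Dispenser nodeB = new Dispenser(table);
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < perNode; i++) {
            if (!seen.add(nodeA.nextKey()) || !seen.add(nodeB.nextKey())) {
                return false;
            }
        }
        return true;
    }
}
```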

    Study the thread mentioned above more carefully. I think that pretty much all problems have been ironed out.

    Emmanuel
  6. Emmanuel,

    Thanks for this piece of code and the well-organised summary of the pattern's issues. One thing missing in the documentation is what kind of DDL is necessary for the UID high table.

    Could you give a recommendation please?

    thanks,

    Bradley
  7. Hi Bradley,

    Yes, you are right, I should add it to the package.

    The table I use is called "HIGH_KEY" and has a "NAME" column, which is a CHAR(50) (I think it should be big enough), and a "VALUE" column, which is a CHAR(28) (112 bits = 14 bytes = 28 chars).

    The resulting SQL should be:

    CREATE TABLE HIGH_KEY
    (NAME CHAR(50) NOT NULL,
    VALUE CHAR(28) NOT NULL);

    Emmanuel
  8. I'm thinking the key field for the Entity EJB that is using this generated ID needs to be at least 78 chars in size?

    By the way, I implemented this in our solution package and it is a very ingenious solution to a fairly wide problem.
  9. Hi Tae,

    Glad you are using this stuff.

    To answer your question, a uoid is a combination of a 112-bit (or 14-byte) HIGH value, a 16-bit (or 2-byte) LOW value and a unique identifier. The string representation of a byte used here is made of 2 chars, making the uoid a string of length 32 + <length of the unique identifier>.

    I would use 20 chars for the unique identifier, making the key field 52 chars long.
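
    The arithmetic above, written out as a small sketch. The class and constant names are hypothetical; the sizes come from the post.

```java
// Hypothetical helper that only restates the sizing arithmetic from the post.
final class KeyLength {
    static final int HIGH_BYTES = 14;      // 112 bits
    static final int LOW_BYTES = 2;        // 16 bits
    static final int CHARS_PER_BYTE = 2;   // each byte is stored as 2 chars

    /** Length of the key column for a unique identifier of the given length. */
    static int columnLength(int identifierLength) {
        return (HIGH_BYTES + LOW_BYTES) * CHARS_PER_BYTE + identifierLength;
    }
}
```

    For a 20-char identifier, columnLength(20) gives (14 + 2) * 2 + 20 = 52.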

    Emmanuel
  10. SQL Server 7

    The code as displayed does not work with a SQL Server database. SQL Server does not support the "FOR UPDATE" outside of a DECLARE CURSOR statement.

    I found this in the SQL Server Developers Resource Kit:

    "Locking Requested Rows
    Oracle uses the FOR UPDATE clause to lock rows specified in the SELECT statement. Usually you don't need to use the equivalent clause in SQL Server. Don't confuse the FOR BROWSE and the FOR UPDATE clause. The FOR BROWSE clause is a specialized facility for use in client application programs that need additional metadata at run time."

    Removing the FOR UPDATE does allow it to work in SQL Server 7, but I'm not sure if that is the best approach to a portable solution.

    Jason
  11. SQL Server 7

    Jason,

    Hmmm. You need the FOR UPDATE clause. Otherwise you do not have exclusive access to the row, and the whole thing is no longer safe under concurrent access.

    If you find out how to introduce a DECLARE CURSOR in the code and make the whole thing work with it, I am happy to include it in the release.
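
    One possible alternative to DECLARE CURSOR, offered as an untested sketch rather than a fix: SQL Server has an UPDLOCK table hint that takes an update lock on the rows read until the transaction ends. The class and method below are hypothetical, the table and column names are the ones from the DDL earlier in this thread, and whether the hint behaves as needed on SQL Server 7 is an assumption that would have to be verified.

```java
import java.util.Locale;

// Hypothetical helper: chooses the locking SELECT from the database product name,
// e.g. the value returned by java.sql.DatabaseMetaData.getDatabaseProductName().
final class HighKeySql {

    static String selectForUpdate(String databaseProductName) {
        String product = databaseProductName == null
                ? ""
                : databaseProductName.toLowerCase(Locale.ROOT);
        if (product.contains("sql server")) {
            // Assumption: the UPDLOCK table hint replaces FOR UPDATE on SQL Server.
            return "SELECT VALUE FROM HIGH_KEY WITH (UPDLOCK) WHERE NAME = ?";
        }
        // Default: the "select for update" form used by the posted code.
        return "SELECT VALUE FROM HIGH_KEY WHERE NAME = ? FOR UPDATE";
    }
}
```

    Either statement only gives exclusive access when it runs inside a transaction, that is, with AUTOCOMMIT turned off.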

    Emmanuel
  12. SQL Server 7 and More

    I ran into a small bug (again with SQL Server). In the Key class, the method asBytes can throw a NumberFormatException when trailing spaces are included in the String argument. The simple solution is to add the line:
    str = str.trim();
    as the first line of the asBytes method.
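
    To show where the trim() goes, here is a hypothetical reconstruction of such a method. It assumes the String holds two hex characters per byte; the real asBytes in the Key class may be written differently.

```java
// Hypothetical reconstruction; only the trim() line comes from the post.
final class KeyParsing {

    static byte[] asBytes(String str) {
        str = str.trim();   // CHAR columns can come back padded with trailing spaces
        if (str.length() % 2 != 0) {
            throw new NumberFormatException("odd number of characters: " + str.length());
        }
        byte[] bytes = new byte[str.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // parse each pair of characters as one byte (assumed hex encoding)
            bytes[i] = (byte) Integer.parseInt(str.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }
}
```

    Without the trim(), the trailing spaces end up in one of the substrings handed to Integer.parseInt, which is what raises the NumberFormatException.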

    When I get some time I can look into the Cursor stuff in SQL Server and try to come up with something to make it work correctly.

    Jason
  13. SQL Server 7 and More

    Thanks for the note, Jason. I will add that to my to-do list.

    I managed to set up a SourceForge project. So if anybody has any other bugs to report, use the bug tracker available there.

    Emmanuel
  14. SQL Server 7 and More

    Hi Jason,

    Have you figured out how to use DECLARE CURSOR for EJBUtils?

    If so, could you forward the code to kyoung at fullerene dot com

    Thanks,

    Kenneth
  15. For a ready-to-use solution that avoids or solves the issues mentioned in this (and the previous) thread, you can try the UIDGenerator provided by X-Steps. It is also portable because it does not access the database directly; a free evaluation version is available.
    Take a look at:
    http://www.xsteps.com/english/products/ejbs/uidGenerator.shtml

    enjoy,

    Messi
  16. Thanks Messi,

    I am sure we will enjoy this one too :)

    Emmanuel
  17. Thanks for this posting... exactly what I had been
    looking for.

    One problem I found... using IAS 4.1.1 and Oracle 8i on
    NT: The session bean fails to find an existing row in the
    database because the database padded the NAME field with
    blanks. In my case I defined a NAME column as CHAR(50),
    but the SELECT fails to find it because I have a shorter
    name than that. Updates fail for the same reason.

    Why does this fail for me but (apparently) not for others
    that have used this...? If I change the string returned
    by provideUniqueIdentifier() to be padded out to 50 chars
    in the Dispenser then it works fine...
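
    The padding workaround described above could look like the sketch below. CharPadding and padToColumnWidth are hypothetical names, and the width of 50 matches the CHAR(50) NAME column mentioned in this post.

```java
// Hypothetical helper: pads a name with blanks to the width of the CHAR(50) NAME column.
final class CharPadding {
    static final int NAME_COLUMN_WIDTH = 50;

    static String padToColumnWidth(String name) {
        if (name.length() > NAME_COLUMN_WIDTH) {
            throw new IllegalArgumentException("name longer than " + NAME_COLUMN_WIDTH + " chars");
        }
        StringBuilder sb = new StringBuilder(name);
        while (sb.length() < NAME_COLUMN_WIDTH) {
            sb.append(' ');   // same blank padding the database applies to CHAR values
        }
        return sb.toString();
    }
}
```

    The padded value would be used wherever the name is compared with the NAME column, so the comparison sees the same 50 characters the database stored.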

    -Mark
  18. From UIDDispenser.java:

    public static final int LOW_KEY_BYTES = 1; // equals 16 bits


    New math? :-)
  19. yep Mark,

    School days are far, far away ;)

    For the latest release (which fixes this), go to http://sourceforge.net/projects/ejbutils .

    You can also monitor the releases there (it will send you an email as soon as one comes out that is worth it), participate, submit bugs, submit proposals for improvements and stuff!

    I also set up a mailing list (it should be available as of tomorrow).

    I am concentrating a bit on the deployment side of it (deployment file, etc.).

    Emmanuel
  20. Mark,

    I can think of one thing for your problem:

    Maybe NAME should be declared VARCHAR (or VARCHAR2 for Oracle). Using VARCHAR is not so good though, because I seem to remember that performance-wise it has quite an impact compared to using CHAR. I must admit I haven't been able to test all that, but now my environment is starting to take shape.

    Is there a way to force Oracle not to add spaces? Is this the case for other databases?

    If you find a solution, send me the code and I will see if I can introduce the changes in a release.

    Also, please submit any bugs to SourceForge at http://sourceforge.net/projects/ejbutils !

    Emmanuel
  21. One really simple solution I'm thinking of uses the database to generate the id:

    create table keygen ( nextkey integer );

    then to generate the next key,

    1) executeQuery("select nextkey from keygen")
    2) executeUpdate("update keygen set nextkey = nextkey + 1 where nextkey = $")
    where $ is the nextkey you selected previously.
    if the update count from the executeUpdate is 0, then go back to step 1.

    This method wouldn't scale very well, but should be extremely portable across all databases. There's a variant of this that does the update first to acquire a write lock before doing the read, but I found that it doesn't work at all on Hypersonic SQL (the database doesn't seem to have write locks).
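
    The two steps above amount to an optimistic retry loop. A minimal sketch follows; the KeyTable interface and all class names are hypothetical. InMemoryKeyTable exists only so the loop can be tried without a database, and JdbcKeyTable assumes each statement is committed on its own (for example with AUTOCOMMIT on).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction over the two statements in the post.
interface KeyTable {
    long selectNextKey();                   // step 1: select nextkey from keygen
    int updateIfUnchanged(long expected);   // step 2: returns the update count
}

// JDBC version of the two steps; assumes the keygen table holds exactly one row.
final class JdbcKeyTable implements KeyTable {
    private final Connection con;

    JdbcKeyTable(Connection con) {
        this.con = con;
    }

    public long selectNextKey() {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("select nextkey from keygen")) {
            if (!rs.next()) {
                throw new IllegalStateException("keygen table is empty");
            }
            return rs.getLong(1);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public int updateIfUnchanged(long expected) {
        try (PreparedStatement ps = con.prepareStatement(
                "update keygen set nextkey = nextkey + 1 where nextkey = ?")) {
            ps.setLong(1, expected);
            return ps.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}

// In-memory stand-in with the same compare-then-update semantics.
final class InMemoryKeyTable implements KeyTable {
    private final AtomicLong nextKey = new AtomicLong();

    public long selectNextKey() {
        return nextKey.get();
    }

    public int updateIfUnchanged(long expected) {
        return nextKey.compareAndSet(expected, expected + 1) ? 1 : 0;
    }
}

final class OptimisticKeyGenerator {
    /** Repeats steps 1 and 2 until the update succeeds, then returns the claimed key. */
    static long nextKey(KeyTable table) {
        while (true) {
            long candidate = table.selectNextKey();
            if (table.updateIfUnchanged(candidate) == 1) {
                return candidate;   // nobody else claimed this value in between
            }
            // update count was 0: another caller got there first, so go back to step 1
        }
    }
}
```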