GUIDs without Singletons and Databases

Discussions

J2EE patterns: GUIDs without Singletons and Databases

  1. GUIDs without Singletons and Databases (44 messages)

    EJB GUID generation techniques rely on a database to provide a sequencing mechanism or singleton objects, or a combination of both. This is a GUID generator that does not need a singleton or a database - can be safely pooled on a single machine and deployed in a cluster to provide completely scalable performance.

    The problem of generating unique IDs can essentially be broken down as uniqueness over space and uniqueness over time which, when combined, produces a globally unique sequence.

    Taking the UUID and GUID Internet standards draft for the Network Working Group by Paul J. Leach, Microsoft, and Rich Salz, Certco, as a starting point we assume that the GUID be represented as a 36-digit alphanumeric (including hyphens) of the format
    xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.

    The first 20 characters represent the uniqueness over time (the low 32-bits of the time stamp as the first 8, the next 4 as the mid 16 bits of the time stamp, the next 4 as the high value of the time stamp multiplexed with the version, and the next 4 a combination of the clock sequence high -multiplexed with the variant field - and the low bits) Note: The internet guidelines suggest the timestamp as a 60-bit value to a precision of 100ns since 00:00:00.00, 15 October 1582 (the date of Gregorian reform to the Christian calendar)

    The last 12 characters are a 48-bit node identifier usually implemented as the IEEE 802 address, which gives uniqueness over space.

    These are combined to produce a unique sequence.
     
    Some of the main problems in an EJB implementation of this technique are: -

    1) there is no access to the traditional IEEE 802 address.
    2) more than one object can exist at the same time and so generate the same time stamp at a granularity of a millisecond (Java's timestamp granularity).
    3) a clock sequence or equivalent is used as a way of reading/writing a value to storage (e.g. a database or system file) that can be used in case the clock is set backwards and as a seed for the number sequence. This is even more of a problem when a cluster of machines do not use the same database.
    4) Singletons are not portable and not recommended in the EJB specs.


    Previous solutions have required a database or a singleton object technique to try and overcome these limitations.


    However, by taking the following format we can address these issues in a simple manner.

    1) (1-8 hex characters) use the low 32 bits of System.currentTimeMillis(). We could use the recommended date format by adding 12219292800000L to the system time long before grabbing the low 32 bits but there doesn't seem much point.This gives us a uniqueness of a millisecond - therefore any clashing object will have to be generated within the same millisecond.

    But this does not give us a uniqueness in space or complete uniqueness in time. -

    2) (9-16 hex characters) the IP address of the machine as a hex representation of the 32 bit integer underlying IP - gives us a spatial uniqueness to a single machine - guarantees that these characters will be different for machines in a cluster or on a LAN.

    We now have machine uniqueness per millisecond - but this does not account for multiple objects on a machine, clock sequence being reset or the same object producing multiple GUIDs per millisecond.

    3) (17-24 hex characters) the hex value of the Stateless Session bean object's hashCode (a 32 bit int) - in the Java language spec - the hashcode value of Object is defined as -
         ** As much as is reasonably practical, the hashCode method defined by class object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)**

    This removes the need for a singleton object as the Stateless bean uses its own hashCode as part of the GUID - so even if two objects create a value at the same millisecond on the same machine these characters will be different. Remember the hashCode is not over-ridden here so the implementation will be the one for Object. Note - on machines with low RAM the most significant characters will always be 00 in the hashcode sequence.

    So we now have a number that is unique to the millisecond, the IP address and the object creating it.

    There is still the possibility that the same object can produce many GUIDs within the same millisecond or the clock may be reset.

    4) (25-32 hex characters) a random 32 bit integer generated for each method invocation from the SecureRandom java class using SecureRandom.nextInt(). This method produces a cryptographically strong pseudo-random integer. The Java lang defines this as -
       ** Returns a pseudo-random, uniformly distributed int value drawn from this random number generator's sequence. The general contract of nextInt is that one int value in the specified range is pseudorandomly generated and returned. All n possible int values are produced with (approximately) equal probability.**

    By adding a random int to the GUID for each new request we now have a GUID that has a combination of the following four properties

    1) is unique to the millisecond
    2) is unique to the machine
    3) is unique to the object creating it
    4) is unique to the method call for the same object

    The only possible theoretical conflicts are:
    1) that two objects on the same machine are assigned the exact same hashCode (I do not know of any implementations that do this but there may be some out there) and at the same millisecond must also get the same integer value from the SecureRandom implementation.
    2) The same int value is returned from the SecureRandom object in subsequent method calls for the same object.
    3) A reset clock (which would require a redeployment of the bean) will produce an identical hashcode for the new deployment as a previous one AND the random values will have to be the same in the same repeated millisecond value as in a previous sequence.


    There is no problem using this approach to create a pool of GUID generators or using such pools in a cluster. Also performance is excellent as the SecureRandom seed can be initialised in the ejbCreate method and the subsequent nextInt() numbers are quick to produce.

    The basic implementation code can be found below - there are some utility methods etc omitted. A demo of this code is available at www.activescript.co.uk.



    --------------- Code follows -----------------

    // class variables used to cache some of the data
      private int timeLow; //32 bits
      private int node; // 32 bits
      private String hexInetAddress; //32 bits
      private String thisHashCode; // 32 bits
      private String midValue;

    //In order to speed up the method calls we can cache some infomation in the create method


      public void ejbCreate() throws CreateException {
            try {
    StringBuffer tmpBuffer = new StringBuffer();

    // initalise the secure random instance
                seeder = new SecureRandom();

    // get the inet address
                InetAddress inet = InetAddress.getLocalHost();
                byte [] bytes = inet.getAddress();
                hexInetAddress = hexFormat(getInt(bytes),8);

    // get the hashcode
                thisHashCode = hexFormat(this.hashCode(),8);

               /* set up a cached midValue as this is the same per method
    / call as is object specific and is the
    / ...-xxxx-xxxx-xxxx-xxxx.. mid part of the sequence
    */
                tmpBuffer.append("-");
                tmpBuffer.append(hexInetAddress.substring(0,4));
                tmpBuffer.append("-");
                tmpBuffer.append(hexInetAddress.substring(4));
                tmpBuffer.append("-");
                tmpBuffer.append(thisHashCode.substring(0,4));
                tmpBuffer.append("-");
                tmpBuffer.append(thisHashCode.substring(4));
                midValue = tmpBuffer.toString();
                
                // load up the randomizer first value
                int node = seeder.nextInt();
            } catch (Exception e) {
                throw new CreateException ("error - failure to create bean " + e);
            }
        }

    //The actual method call to get the unique numbers is then very simple

    public String getUUID() throws RemoteException
        {
         
            long timeNow = System.currentTimeMillis();
            timeLow = (int) timeNow & 0xFFFFFFFF;
          
            int node = seeder.nextInt();
         
           return (hexFormat(timeLow, 8) + midValue + hexFormat(node, 8));
      }

    ----------------Code Ends --------------------

    Any comments would be welcome regarding the implemetation or anything I might have missed that is obvious.

    Cheers
    Steve Woodcock

    Threaded Messages (44)

  2. Great technique, Steve!
    A while ago I needed a similar GUID generator and used similar fields. However, I didn't know about the GUID standard, and so I didn't comply.
    I have a couple of minor notes:
    1. Instead of using this.hashCode(), I used System.identityHashCode(). This method returns whatever Object.hashCode() would return, regardless of whether or not it was overriden. On all VMs I know, this is indeed an internal "pointer", so there's no risk.
    2. I had a counter inside the session bean instance, so two invocation on the same instance were bound to return different GUIDs.

    I also used a different encoding mechanism, but yours is just as strong (for all reasonable cases) and more efficient.
    Thanks for the contribution.

    Gal
  3. Why not just use the RMI Uuid class and call toString? Was there some reason you didn't want to use this?

    Billy
  4. Using RMI Uid instead[ Go to top ]

    Hi Billy,
    I had a look at this class before writing this bean- however there are a couple of issues for me with it.

    1) the class is for use on a single machine only - so would have problems on a cluster - you would still have to add the inet address
    2) it does not conform to the internet draft standard so you would have to take the returned string apart every method call and recombine it to fit the format - which could be expensive (also to fit the standard what would you do with the "-" sign as part of the negative increment vlaues? - removing it would decrease the counter's range to only positive short values - a very small range if there were lots of beans pooled and used simultaneously)
    3) it uses a Thread.sleep call so I am not sure how this would affect its use inside all EJB containers - when you are not supposed to manage threads manually
    4)the values are static and are incremented uniquely through use of a static mutex object which seemed to me could potentially cause problems with the container managing concurrent access
    5) the count value used to provide the last 4 digits is always initialised from the short.MIN_VALUE (0X-8000) and counted down from there - this seemed too much of a chance of collision if the clock is reset


    However, in saying that - the above solution is only one I have come up with - I am sure there are other approaches and the UID class may well serve very well as part of a different solution.

    cheers
    Steve
  5. Hi Billy,
    As an added point which I forgot to mention - I also ran a couple of tests for app servers that use more than one VM on the same machine - using rmi UID in more than one VM simultaneously produces sequnces where the first 6 characters in both is the same - the last set are always the same sequence -8000 increasing by one each time and the only difference is the millisecond value - therefore there seems too much liklehood that the method could be called in the same millisecond in both VMs which would clash and produce the same result. You would have to add other information such as a SecureRandom to prevent this happening.

    cheers
    steve
  6. Well if it doesn't work so be it.

    Billy
  7. We use a similar class in our systems, but in order to get our unique seed we scan until we find a port on which we can open a datagram socket, hold it open for a millisecond, then close it. This port number is concatenated with the current time and the IP address. So long as the system clock isn't rolled back, there is no chance of collision (even with multiple JVMs or multiple ClassLoaders in a single JVM) because the UDP port space is global. (We actually overlap the upper portion of the IP address with the time, so it is conceivable that if somebody in a different domain than ours ran the code 49 days later they could generate the same seed.) We then concatenate the seed with a 64 bit number incremented by 1 each request, for a total ID size of 136 bits.

    It is my understanding that the rule against the use of synchronized objects is to try to improve the reliability and maintainability of business code. It is very hard to track down intermittent problems in complicated multithreaded code. It does not seems that they wanted the following code to be written as an EJB:

    public class UniqueID() {
      static long next = 0;
      static final String base = ...;

      public static String get() {
        long n;
        synchronized (UniqueId.class) {
          n = next++;
        }
        return base+Long.toHexString(n);
      }
    }

    What do you think? Given that there is no chance of an ID collision, does the speed advantage (huge) of violating the spec make it OK?

    - Nathan
  8. Hi Nathan.
    This is a nice solution, but it cannot be used in EJB if you want to go "per spec".
    The EJB spec does not allow EJBs to listen on a port. Unlike some parts of the spec, this is acutally implemented by many EJB servers. They simply only put "connect" in the socket permissions section of their policy file.

    Gal
  9. Gal,

    True, it violates the spec in several ways, but it might work for some people.

    The globally unique IDs were originally used in our system to implement a memory-caching hack (O(N^3) algorithm needed to build in-memory structures from the data available in the DB!). We have since replaced it with a faster and much cleaner design using JTS synchronization objects (not quite vanilla EJB, but a more promising future).

    Nathan
  10. Steve,

    There is a hidden problem, not specific to this pattern but related to all patterns presented here (3 in all).

    And all started from Scott's Ambler article.
    Scott is a very nice guy, but he's an OO purists so relational database considerations are always secondary to nice OO design.


    Let's compare your pattern with the common old-fashioned (client-server) pattern of using a database generated values, and assess the performance factor.

    In the case of your pattern you store the Primary Key as 36 bytes. By default all databases use B-Trees to organize primary keys, and that will lead us to a simple calculation.

    If we consider the generated ID solution to held on average 4 bytes that means a database index page will store 9 times less values than one would when using other solutions.

    You won't feel that when you're developing , because databases and hardware are so fast these days. But be sure that if you let it go to production it really has a big impact.

    Possible remedies:
           a) Use better encoding.
              What's the point of having readable database keys by using hexadecimal notation ?
           b) Warn your DBA, so he would know who's dealing with and take appropriate measures :)

    Cheers,
    Costin
  11. Costin-

    I didn't quite follow all of the points you made in your last post. Could you clarify a little?

      - Steve's solution creates a 36 byte PK that is a string
      - Databases store PKs in a B-tree
      - What is the correlation that you are making between auto-generated PKs, their length in bytes, and those generated by a database (which are less than 36 bytes)?

    Thanks
  12. I'm saying that is very inefficient for primary key comparing with a 4 bytes or a 8 bytes integer.
  13. Hi Costin,
    Sorry - I don't follow the inefficiency part - maybe I am not as up the database part as I should be :-). Are you saying that it is much more inefficient to compare in a db lookup a 36-byte string with a 4/8-byte integer that a db would produce as a primary key?

    If that is the point then yes I agree that is self-evident - but I don't see how assigning a 4/8-byte integer derived from a db auto-increment or similar method can provide us with a key that complies with the need for uniqueness in both space and time in a cluster/server farm that also is unaffected by database rebuilds/restructures etc.

    Maybe I have got the wrong end of the stick with this - if that is so then perhaps you could expand your points a bit further.

    cheers
    steve



  14. Steve,

    I'm not saying that your idea is not good, but it could be finished better to get the slam dunk.

    The reason for inefficiency is not the comparison itself (strings vs integers), but the dimension of the keys.
    Because most database operations under heavy load are IO bound rather than CPU bound, in your case the database will have to read on average much more index pages.

    Even your key solution can be a 8 bytes integer, of course not all databases support eight byte integers, but if yours does, than you should try to reach this goal: eight byte integer.
    This is a good compromise between efficiency, and the need for galactic uniqueness :)
    In case you don't really need more that one billion of unique keys (and for many database tables this is usually the case), than you can even shoot for a 4 bytes integers.

    I saw, that you represented the key in hex notation, and even poadding with the '-' character to make it more readable. This is a bad goal.
    Artificial keys SHOULD NOT be seen by the user, not even touched by the DBA but with very rare exceptions, just manipulated by the database and the programs.

    Then what do you gain by making the key readable ?

    Cheers,
    Costin
  15. Hi Costin,
    I agree that the smaller keys would be far better for performance and if uniqueness could be achieved in a smaller number it would make for a far more efficient solution- (as a note - the hex notation and the hyphens are so the key conforms to the draft internet spec format - in fact it can be represented as an unsinged 128-bit integer).

    However, to my mind a smaller key - e.g 64 bits would not give enough of a spread to give the keys a uniqueness - it is not really about about having a vast number of keys >1 billion per table - it more about having a wide enough dimension to ensure that clashes do not occur when producing keys simultaneously across lots of machines and VMs that may or may not use the same DB tables.

    So the real problem domain is that if we have n servers talking to n databases, even if only 10,000 keys are produced per machine - which could easily be represented in a 32-bit integer -, these keys must have no chance of clashing given that the machines may not be able to communicate with each other as they produce the keys or be able to talk to a central db server to see what numbers have been used.

    As I see it this is not the same problem as generating IDs for a traditional client-server app that always talks to the same table and which you can easily use an auto-increment method to get keys for each entity - in that case a 4-byte or 8-byte database driven solution is probably a better solution.

    cheers
    steve
  16. In general, I would say you can easily produce 32 bit unique integers (and if not, well, we have 64 bit integers also), as long as you do not "lose" IDs (i.e. by inserting a timestamp... you'll lose many IDs because there may be minutes or even hours when no ID is needed).
    However, the only way to not lose IDs (unless there is some algorithm I don't know of ;-) is to count up in a sequence... which would require database accesses.
    My point is now that (considering our implementation) the amount of time wasted with database accesses is neglectible if you are using the proposed high-low approach (although the original approach violates J2EE specs and can lead to some serious problems)... if I need a database access every 2^16 IDs (for a 64 bit ID)... well, I guess I can live with this. And I also guess that upon read/seek (which you will probably need more often than write) the sequence number approach is faster, because it generates smaller IDs (what about write/building the index/balancing the tree? any ideas?).
    Thus I prefer the "use store for sequence" approach because it can be implemented so it does not violate J2EE specs at all, it doesn't lose IDs and has no problem with clock resets etc. and (I didn't test the proposed generator) I guess it is not slower than what is proposed here because it simply adds a database access now and then.

    Just let me know what you think,

    regards

    Messi
  17. Messi,

    Sounds good, but you will "lose" IDs when you allocate a block of 2^16 IDs to a singleton and then that process only uses 1 ID. This means that the new limit is 2^16 processes before the 2^32 space is used, which is not enough.

    It's easy enough to work around this problem in a variety of ways, but all of the solutions involve a tradeoff in space, time, or complexity.

     - Nathan
  18. Nathan,

    Well, you have the possibility of using 64 bit integers (still lots faster than strings or something similar), this would lead to 2^48 pools.
    But it does not seem reasonable to ignore that you could lose 2^16 IDs so easy... besides, after losing the pool you'll have to do a new database query.
    Fortunately, J2EE/EJB gives as a nice solution for this which seems perfect for such a case: a stateless session bean.
    There is a pool of these beans, every bean having its ID pool. A client can now ask for an instance and will get one, query for an ID and thats it.
    The risk of losing many IDs is only if the appserver is stupid and removes SBs immediately (I know of none that does this) or, and we explicitely state this in the documentation, if you generate a big bunch of IDs concurrently (so many session beans are instantiated) and then do not need an ID for a long time. But this seems to occur very seldom often in applications.
    Regarding the performance tradeoff: According to our tests there is nearly none because the appserver will do "local" calls, the tx-flag is "required". But of course you are invited to try it (www.xsteps.com).
    This solution also offers better possibilities of avoiding index hotspots because the key you'll get is more predictable, you just have to rotate it for some bits.
    In general, regarding the hotspots: Not all databases offer Oracle's "reverse index", and with artificial keys there is no need that it must be in any specific order, so we should give it to the database as it needs them, because this will always be faster than the database doing a "reverse index".
    So alltogether you'll have 2^64-some IDs, lets say you lose _many_ IDs this is still 2^63, I guess this should be enough for _any_ system, and if not: Well, partition your system into smaller parts and use a generator for every part. or for every EB. and if that is not enough I want to get my hands on the database and the hardware which can efficiently store 2^63 rows in a table ;-)

    regards

    Messi
  19. Messi,

    You are right that it would take a while to go through 2^16 stateless session beans in most environments, but I don't feel safe that a misconfigured or thrashing app server couldn't eat thousands of ID blocks over a weekend. 2^48 makes me feel safe :)

     - Nathan
  20. A previous post raised the issue of whether high/low-style IDs were efficient from a database index perspective. Anybody have an answer to this ? We're using two schemes based on the high-low approach ... one of them involves reordering the bits of the generated high/low key so that uniqueness is still guaranteed but the key values are spread across the full keyspace range. This was based on the suspicion that it might be more efficient for inserts/reads on an RDBMS ... but I don't have data to back this up.
  21. I found this paper very helpful, even if it is specific for Oracle and that the greater part is quite general. However, there is a subchapter about reversed key indexes which helped me to understand what a hotspot problem is.

    http://technet.oracle.com/products/oracle8/htdocs/xo8i4twp.htm.

    /Theis.



  22. As I see it the major problem with this approach is that IP addresses are not unique. Just think how many 192.168.0.1 boxes there are out there. IDs generated with this scheme on machines that are not directly connected to the Internet may clash.

    The webDAV RFC talks of the security concerns of encoding a MAC address within a UUID. This is likely much higher with an IP address.

    Why not use the random host identifier approach mentioned by Leach and Salz - though this still requires a singleton for the sequence part

    --
    Peter Sumner
  23. hi Peter,
    Thanks for the comments - Yes I agree that there could be a multitude of servers addressed 192.168.0.1 etc. The GUID solution here is intended to be a high perfomance way of generating UniqueIDs for EJBs for a company - and that is almost certainly going to have individual IPs for its servers- not as a global ID scheme for all java objects in the world - which is something say the Microsoft GUID representation seeks to address.

    Also, I do not see that using the alternative of a 47-bit random number as a unique host identifier mentioned in Leech/Salz is going to be any better than IP addresses. What is to stop the same number being generated on two separate machines?
    The only realistic way to ensure that there are no clashes is to use the MAC - which is the only guaranteed unique identifier in hardware that is accessable without refering to a central point continually (unfortunately in EJB you can't get that).

    The security risk you mentioned is not really relevant as the text from the WebDAV says :
    --------------------start of text---------
    Risks Connected with Lock Tokens
    This specification, in section 6.4, requires the use of Universal Unique Identifiers (UUIDs) for lock tokens, in order to guarantee their uniqueness across space and time. UUIDs, as defined in [ISO- 11578], contain a "node" field which "consists of the IEEE address, usually the host address. For systems with multiple IEEE 802 nodes, any available node address can be used." Since a WebDAV server will issue many locks over its lifetime, the implication is that it will also be publicly exposing its IEEE 802 address.

    There are several risks associated with exposure of IEEE 802 addresses. Using the IEEE 802 address:


       * It is possible to track the movement of hardware from subnet to
       subnet.

       * It may be possible to identify the manufacturer of the hardware
       running a WebDAV server.

       * It may be possible to determine the number of each type of computer
       running WebDAV.

    --------- end of WebDAV --------------
    As you can see the risk is about identifying the hardware and exploiting that info - I do not see how the IP address presents similar problems.

    cheers
    Steve
  24. Someone talk about Scot Ambler. His has written many interresting papers/books on OO programming, modeling and mapping.
    Concerning UID, I recommends:
    <http://www.ambysoft.com/mappingObjects.html>
  25. Hi all,
    I have read Scott's excellent paper on this and I agree with his comments on producing a UUID. For a single database and single app server then there are a few approaches you can take and the high-low pattern is as good as any. However, for a clustered or distributed environment there are problems with this - as scott says
    -------------quote-------------
    As mentioned previously, OIDs must be unique. But how do you guarantee uniqueness in a distributed
    environment where a client may obtain the HIGH value for its OID from a number of servers? The answer is
    simple: the server machines must also have a strategy for obtaining unique HIGH values between
    themselves. There are several strategies to doing so. First, the servers could use their own internal
    HIGH/LOW approach, obtaining a HIGH value from a single source. Second, the servers could obtain a
    block of HIGH values from a centralized source, obtaining a new block when they run out.
    -------/quote-------------------------

    So in a clustered or distributed system - rather than try and pin the responsibility on a centralised source (a bottleneck) or having a high communication overhead as lots of instances try and contend to see who should allocate what numbers to whom - it seemed more appropriate to use a non-database driven solution as suggested in this pattern.

    Also as mentioned before - IMHO (and also in Scott's paper) 64 bits is just not enough to produce uniqueness in a large scale system and 128-bits seems to be about the minimum.

    The performance overhead of using 128 bit strings - either from this or a high-low sequence pattern is something that may be acceptable in your project or not - in my own projects I have not found it to be an issue - especially when we start to scale into cluster deployments and the need for unique sequences outweighs the lookup penalty of using strings as IDs.

    cheers

    steve
  26. Ok, so I downloaded your "demo" bean from your Web-site, and it takes 5 secs to get a GUID. Did I miss something or is this the expected performance?

  27. Hello Michael,

    This can be expected for the first ID you "draw" because the AppServer will need to instantiate the Entity Bean and the Session Bean (if you are using our recommended approach, the SB won't be instantiated if you are using the Singleton). For subsequent UID generations the retrieve process should be very fast.
    You should also consider running the example and its client, since it generates quite a bunch of IDs (although it is also quite slow because of the screen output, but still lots faster than you describe).
    However, in general we are always here to give support, could you please tell us which AppServer/Database you are using and how you configured the UIDGenerator? (please mail this to support at xsteps dot com)

    kind regards

    Messi
  28. Hi Michael,
    I assume this is from the first request after deployment. There are two delays here - one is the time the container needs to instantiate the bean for the first time -
    the other is the lookup for the bean from the client and the time it takes to do this

    In order to use the bean effectively it is best to do the lookup and store the home prior to calling the any methods on it - e.g. in an init method for a servlet. (This is the same for most EJBs).

    You should see a performance of 10ms or less for each subsequent method call - even the first one if you have already performed the lookup.

    If you are not seeing this behaviour then can you let me know how you are calling the bean.

    cheers
    Steve
  29. The biggest problem that I see with this approach is that the keys generated are "too un-intelligent". Long keys make it hard for developers and DBAs to do their work. Just imagine a situation where a developer needs to communicate to a DBA that there is row in development database that needs to be updated (for whatever reason). About the only way you can communicate the row's primary key in this case is via email. That's a hassle.

    Long references are easy to handle for computers. They are very hard to handle for us, human beings.
  30. How about using PI as your unique key generator, grabbing 19 digits at a time (long), shifiting over by one each time?
  31. The hashcode appears to be unnecessary. The hashcode distinguishes generator instances in the same VM:

    The random number distinguishes ids generated in the same millisecond. The probability that two ids generated on the same machine in the same millisecond will conflict is then 1/2^32.

    However, the random number itself goes a long way to disambiguate generator instances in the same VM. If the hashcode is removed and one assumes evenly distributed RNG seed initialization, then the probability that two ids generated on the same machine in the same millisecond will conflict becomes n/2^32, where n = the average number of generator instances active in the same millisecond.

    Unless there are many highly concurrent generators--on the order of 1000--then the difference is neglible in practice. It is difficult to imagine a deployment/load scenario where n exceeds 10.

    A variation that reduces the probability of conflict of the base implementation is to replace the hashcode by a counter. The counter is initialized to a random number and incremented with each id, wrapping to 0 if necessary. This has the effect of removing all possibility of conflict in the same generator instance, but introduces a small probability of conflict between instances. Specifically, the probability that two ids generated on the same machine in the same millisecond will conflict is p*q, where

      p = the probability that the incremented counter
          conflicts for two distinct instances = 1/2^32

      q = the probablity that distinct generator instances
          will generate the two ids

    q decreases with server speed and increases with server load, but never exceeds 1. Hence the counter variation is no worse, and is typically better, than the base implementation at distributing unique ids.

    The access profile comes into play here. If a use case calls for creating two ids in a single Session Bean call, then the same generator might well be used within the same millisecond, favoring the counter variation.

    The counter subsumes the function of both the hashcode and the random number, so a lagniappe is that the random number could be eliminated. However, the resulting clustered sequential id distribution could adversely impact database indexing.
  32. I supposed the day light saving clock reset will have an undesirable impact on this pattern.

    3) A reset clock (which would require a redeployment of the bean) will produce an identical hashcode for the new deployment as a previous one AND the random values will have to be the same in the same repeated millisecond value as in a previous sequence.

  33. Steve since size does matter for the index keys (longer they are less performance you get from RDBMS) I suggest you replace the HEX encoding of your UUID with 64 based encoding like Base64 Alphabet (0=A, 25=Z, 26=a, 51=z,52=0,61=9,62=+ and 63=/) and lose the &#8216;-&#8216; delimiter.

    This will make a lot of Db Admins much happier :)

    Adrian
  34. Actually the way I would do it would be:

    IP address + System.identityHashCode + System.currentTimemillis() + Counter

    Where counter is a private long (synchronized) incremented during the lifetime of the singleton implementing the UUID.

    The end result will be a 8 bytes (chars) long UUID if you represent the concatenated number in base 64.

    Adrian
  35. "IP address + System.identityHashCode + System.currentTimemillis() + Counter"

    "The end result will be a 8 bytes (chars) long UUID if you represent the concatenated number in base 64."

    FWIW -
    4 + 4 + 8 + 4 = 20 bytes = 40 hex chars or 28 base-64 chars

    Peace,

    Cameron Purdy
    Tangosol, Inc.
  36. Of course is not 8 bytes! Just checking if this pattern topic is still alive and read with care.

    You need to compromise between absolute uniqueness, readability and database primary-key (index) performance.

    Here is a quick implementation resulting in a 16 bytes long unique ID.


    ---------------------------------



    /**
     * <p>
     * This class is a unique ID generator. To improve databases performance the
     * length was reduced to a minimum necessary.
     * </p><p>
     * The 16 characters long printable unique ID is a base64 representation of a
     * 12 bytes long ID obtained by concatenating:
     * </p><p>
     * 1) the 2 less significant bytes of the machine IP address (in a cluster or
     * in a subnetwork the most significant 2 bytes will probably be the same)
     * </p><p>
     * 2) the 4 bytes memory address of the generator class instance
     * </p><p>
     * 3) the less significant 4 bytes of the current time in milliseconds (we
     * don't expect the generator instance to run for more than a century)
     * </p><p>
     * 4) 2 byte counter to refine possible conflicts within the same millisecond
     * </p><p>
     * The uniqueness of the ID has two dimensions: space and time.
     * </p><p>
     * Items 1) and 2) will assures the uniqueness in the network/machine space.
     * Since it does not change during the lifetime of the generator it will be
     * computed only once at the construction time.
     * </p><p>
     * Items 2) and 3) assures the uniqueness in time. In order to avoid collisions
     * within the same millisecond a synchronized counter will refine the time at
     * 15 nanoseconds intervals.
     * </p><p>
     * The class can easily be converted to a singleton or stateless session EJB
     * @author Adrian
     * @version 1
     */
    public class UniqueID {

        // Data
        private String addressID;
        private String timeID;
        private short loopCounter = 0;
        
        
        /**
         * Creates new UniqueID
         */
        public UniqueID() {
            try {
                // Prepare a buffer to hold the 6 bytes for the addressID
                byte[] bytes6 = new byte[6];
                
                // Get the local IP address as a byte array
                byte[] bytesIP = InetAddress.getLocalHost().getAddress();
                
                // Copy the 2 less significant bytes of the IP address
                bytes6[0] = bytesIP[2];
                bytes6[1] = bytesIP[3];
                
                // Get the memory address for this object
                int memAdr = System.identityHashCode(this);
                
                // Prepare a buffer to hold the 4 less significant bytes of the addr
                byte[] bytes4 = new byte[4];
                
                // Convert the memory address into a byte array
                toFixSizeByteArray (new BigInteger(String.valueOf(memAdr)), bytes4);
                bytes6[2] = bytes4[0];
                bytes6[3] = bytes4[1];
                bytes6[4] = bytes4[2];
                bytes6[5] = bytes4[3];
                
                // Encode the information in base64
                addressID = ECBase64.encodeBytes(bytes6);
            }
            catch(Exception ignore) {}
        }

        
        /**
         * Returns a 16 printable characters unique ID.
         * The ID is base64 encoded
         */
        public String getNewID() {
            // Prepare a buffer to hold the 6 bytes for the timeID
            byte[] bytes6 = new byte[6];
            
            // Get the current time
            long timeNow = System.currentTimeMillis();
            
            // Ignore the most 4 significant bytes
            timeNow = (int) timeNow & 0xFFFFFFFF;
            
            // Prepare a buffer to hold the 4 less significant bytes of the time
            byte[] bytes4 = new byte[4];
            
            // Convert the memory address into a byte array
            toFixSizeByteArray(new BigInteger(String.valueOf(timeNow)), bytes4);
            bytes6[0] = bytes4[0];
            bytes6[1] = bytes4[1];
            bytes6[2] = bytes4[2];
            bytes6[3] = bytes4[3];
            
            // Get the current counter reading
            short counter = getLoopCounter();
            
            // Prepare a buffer to hold the 2 bytes of the counter
            byte[] bytes2 = new byte[2];
            
            // Convert the counter into a byte array
            toFixSizeByteArray(new BigInteger(String.valueOf(counter)), bytes2);
            bytes6[4] = bytes2[0];
            bytes6[5] = bytes2[1];

            // Encode the information in base64
            timeID = ECBase64.encodeBytes(bytes6);
            
            // Return the unique ID
            return addressID + timeID;
        }
        
        
        /**
         * Get the counter value as a signed short
         */
        private synchronized short getLoopCounter() {
            return loopCounter++;
        }
        
        
        /**
         * This method transforms Java BigInteger type into a fix size byte array
         * containing the two's-complement representation of the integer.
         * The byte array will be in big-endian byte-order: the most significant
         * byte is in the zeroth element.
         * If the destination array is shorter then the BigInteger.toByteArray(),
         * the the less significant bytes will be copy only.
         * If the destination array is longer then the BigInteger.toByteArray(),
         * destination will be left padded with zeros.
         */
        private static void toFixSizeByteArray (BigInteger bigInt, byte[] destination){
            // Prepare the destination
            for (int i = 0; i < destination.length; i++) {
                destination[i] = 0x00;
            }
            
            // Convert the BigInt to a byte array
            byte[] source = bigInt.toByteArray();
            
            // Copy only the fix size length
            if (source.length <= destination.length) {
                for (int i = 0; i < source.length; i++) {
                    destination[destination.length - source.length + i] = source[i];
                }
            }
            else {
                for (int i = 0; i < destination.length; i++) {
                    destination[i] = source[source.length - destination.length + i];
                }
            }
        }
    }

    ---------------------------------
  37. By the way, the autoincrement feature of DB2 for example use a 13 bytes integer so you would pay a 3 byte penalty only (16-13=3) and you don't need to read back the generated number (INSERT followed by SELECT - unless JDBC 3.0) and the key is unique across tables.

    Adrian
  38. If you are using the first 8 characters of the UUID for millisecond time, (16^8 milliseconds) which is only 3.2years!!! What happens when your application runs for longer???

    Another question... If you use IP address as the next 8 characters, what happens when IPv6 comes along???

    Dont you think this is a HUGE problem???

    dav
  39. You either have to coordinate across a cluster (e.g. Coherence or a database) or you have to make the GUID sufficiently large to use arbitrary rules and pseudo randomness to avoid collisions.

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  40. Valid Period For Pattern[ Go to top ]

    If you are using the first 8 characters of the UUID for millisecond time, (16^8 milliseconds) which is only 3.2years!!! What happens when your application runs for longer???

    >
    Hey There.

    How do you figure on 3.2 years for the pattern?
    16^8 ms = 0xFFFFFFFF ms
    0xFFFFFFFF ms = 0d4294967295 ms
    4294967295 ms = 4294967.295 seconds
    4294967.295 seconds = 71582.78825 minutes
    71582.78825 minutes = 1193.0464708333 hours
    1193.0464708333 hours = 49.7102696181 days

    Is this the reality, or am I missing something. 50 days doesnt really cut if for me, but I want to save on space.

    As well, anyone have any good ideas on unique object hash codes for singletons running under multiple vm's on the same machine? I'm going with the singleton approach, but the System identity hash code of the object is not buying me much here.
  41. /**
     *
     * This class is a unique ID generator. To improve databases performance the
     * length was reduced to a minimum necessary.
     * </p>
     * The 16 characters long printable unique ID is a base64 representation of a
     * 12 bytes long ID obtained by concatenating:
     * </p>
     * 1) The 1 less significant byte of the machine IP address. This limits
     * the size of the cluster (or subnet) to 256 machines.
     * </p>
     * 2) The 4 bytes memory address of the generator class instance. This limits
     * the machine memory space to 4GB.
     * </p>
     * 3) The less significant 5 bytes of the current time in milliseconds. This
     * limits the lifetime of the ID to 35 years.
     * </p>
     * 4) A 2 byte synchronized counter to refine possible conflicts within the
     * same millisecond. This will refine the time precision at 15 nanoseconds.
     * </p>
     * The uniqueness of the ID has two dimensions: space and time.
     * </p>
     * Items 1) and 2) will assures the uniqueness in the network/machine space.
     * Since it does not change during the lifetime of the generator it will be
     * computed only once at the construction time.
     * </p>
     * Items 3) and 4) assures the uniqueness in time.
     * </p>
     * The class can easily be converted to a singleton or stateless session EJB
     * @author Adrian
     * @version 1
     */
    public class UniqueID {

        // Binary ID
        private byte[] bytes12;
        // Counter
        private short loopCounter = 0;
        
        
        /**
         * Creates new UniqueID
         */
        public UniqueID() {
            try {
                // Prepare a buffer to hold the 12 bytes for the unique binary ID
                bytes12 = new byte[12];
                
                // Get the local IP address as a byte array and copy the less significant byte.
                byte[] bytesIP = InetAddress.getLocalHost().getAddress();
                bytes12[0] = bytesIP[3];
                
                // Get the memory address for this object and copy the 4 bytes.
                int memAdr = System.identityHashCode(this);
                byte[] bytes4 = new byte[4];
                toFixSizeByteArray (new BigInteger(String.valueOf(memAdr)), bytes4);
    bytes12[1] = bytes4[0];
                bytes12[2] = bytes4[1];
                bytes12[3] = bytes4[2];
                bytes12[4] = bytes4[3];
                
                // At this point the first 5 bytes are set for the liftime of this object
            }
            catch(Exception ignore) {}
        }

        
        /**
         * Returns a 16 printable characters unique ID.
         * The ID is base64 encoded
         */
        public String generateBase64ID() {
            // Get the current time and ignore the most 3 significant bytes. Copy the remaining 5 bytes.
            long timeNow = System.currentTimeMillis() & 0xFFFFFFFFFFL;
            byte[] bytes5 = new byte[5];
            toFixSizeByteArray(new BigInteger(String.valueOf(timeNow)), bytes5);
            bytes12[5] = bytes5[0];
            bytes12[6] = bytes5[1];
            bytes12[7] = bytes5[2];
            bytes12[8] = bytes5[3];
            bytes12[9] = bytes5[4];
            
            // Get the current counter reading and copy the 2 bytes
            short counter = getLoopCounter();
            byte[] bytes2 = new byte[2];
            toFixSizeByteArray(new BigInteger(String.valueOf(counter)), bytes2);
            bytes12[10] = bytes2[0];
            bytes12[11] = bytes2[1];
            
            // Encode the information in base64 and return the unique ID
            return Base64.encodeBytes(bytes12);
        }
        
        
        /**
         * Returns a 16 printable characters unique ID.
         * Takes the base 64 encoding and replace '+' with '$'
         * and '/' with '_' so the resulted string can be used to
         * generate database table names and LDAP names.
         */
        public String generateID() {
            // Return the unique ID
            return generateBase64ID().replace('+', '$').replace('/', '_');
        }
        
        
        /**
         * Get the counter value as a signed short
         */
        private synchronized short getLoopCounter() {
            return loopCounter++;
        }
        
        
        /**
         * This method transforms Java BigInteger type into a fix size byte array
         * containing the two's-complement representation of the integer.
         * The byte array will be in big-endian byte-order: the most significant
         * byte is in the zeroth element.
         * If the destination array is shorter then the BigInteger.toByteArray(),
         * the the less significant bytes will be copy only.
         * If the destination array is longer then the BigInteger.toByteArray(),
         * destination will be left padded with zeros.
         */
        private static void toFixSizeByteArray (BigInteger bigInt, byte[] destination){
            // Prepare the destination
            for (int i = 0; i < destination.length; i++) {
                destination[i] = 0x00;
            }
            
            // Convert the BigInt to a byte array
            byte[] source = bigInt.toByteArray();
            
            // Copy only the fix size length
            if (source.length <= destination.length) {
                for (int i = 0; i < source.length; i++) {
                    destination[destination.length - source.length + i] = source[i];
                }
            }
            else {
                for (int i = 0; i < destination.length; i++) {
                    destination[i] = source[source.length - destination.length + i];
                }
            }
        }
        
    }
  42. You got the idea.
     
    Adjust based on your application requirements. In my case the generated ID lifetime was relatively short so I could optimize for speed (less bytes).

    In regards of the IP address even now I am not using all of it because I just need uniqueness within a cluster (private subnet).

    If you want to generalize you might end up with a 30 char or 60 char long unique ID. The question is: do you realy need it. If you do, go ahead if not give your DB engine a break.
  43. Do I have to use EJB pattern because it uses EJB?

    Why not use singleton? Is there any performance matrix for comparing both solution?

    thx. deepak
  44. Also, for clustered singleton, see:

    http://www.tangosol.net/forums/thread.jspa?forumID=6&threadID=82

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Easily share live data across a cluster!
  45. This mechanism works just fine. we use something very simillar to it in our system and have not had an issues with it for over 1 year now.