Discussions

J2EE patterns: Long-Lived Optimistic PseudoTransactions (Version Numbering)

  1. Motivational Scenario:

        A user of the website goes to edit his user profile. Naturally, the user profile is represented as an entity bean in the back end. So the web tier looks up the user profile, obtains a value object copy of the entity bean, and populates the edit-profile webpage with it. Meanwhile, some other part of the system changes that user profile, so the copy used to populate the webpage is now stale. Then the user finishes entering his changes and clicks submit. It would be desirable to detect that there is a transactional collision (T1 reads, T2 reads, T1 writes, T2 writes over T1's changes... hence we have a race condition). Once we detect the collision, we can allow our application, and in turn the user, to handle the case appropriately. No problem, this is what transactions are designed for. Only, there is a problem... the "transaction" representing the user's update through the webpage could potentially live 15 minutes or more, since there is no guarantee how long the user will take to enter his changes. Therefore, JTA is not appropriate for this. We need a way to detect the transactional collision without requiring the transaction to live longer than a few milliseconds.

    General Idea of the Pattern:

        To trap the collision we simply need to recognize that the version of the data loaded by the user during the page display is not the same version as the one the user is overwriting during the submit operation. So we add a new integer field, version, to both the entity bean and the database to do just this. Every time a change is made to the entity bean, the version number is incremented. Now in our scenario, the user loads version 4 in a transaction that lasts just a few milliseconds. Then, elsewhere in the system, someone updates the database so it now contains version 5. When the application goes to enter an update on behalf of the user based on version 4 in another transaction (which also lasts just a few milliseconds), we can discover that version 5 is already in the database, trap the error, and handle it appropriately, perhaps by asking the user "The data in the database has changed since you last loaded it. Are you sure you want to overwrite?" This is very similar to when two text editors are editing the same text file, and one notices that the other has saved and asks the user if he wants to reload.

    How it works:

    + In the entity bean (and in the database) add a new integer field called version.
      This number starts at zero and increases monotonically (we'll talk about when this happens later).


    + This field also shows up in the value object copy of the entity bean. For simplicity's sake, we'll make getValueObject() and setValueObject() the only two business methods on the bean (although this isn't required).


    + Whenever getValueObject() is called on the entity bean, all fields are copied over to the value object, including the version number.


    + Whenever setValueObject() is called, several things can happen:
      1) if (valueObject.getVersion() != this.getVersion()) an exception is thrown indicating the state of the entity bean has changed since the value object was obtained from the entity bean.
      2) if (valueObject.getVersion() == this.getVersion()), the state of the entity bean is updated and this.version is incremented by one.


    + Both getValueObject() and setValueObject() require the transaction attribute TX_REQUIRED.
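
    The steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not the bean itself: no EJB container is involved, the names ProfileValue, ProfileEntity and StaleDataException are hypothetical, the exception is unchecked just to keep the sketch short, and `synchronized` stands in for the lock the container and database would hold during each short transaction.

```java
// Plain-Java sketch of the version check; no EJB container is involved.
// ProfileValue, ProfileEntity and StaleDataException are hypothetical names.

final class ProfileValue implements java.io.Serializable {
    final String name;
    final int version;

    ProfileValue(String name, int version) {
        this.name = name;
        this.version = version;
    }
}

// Unchecked here only to keep the sketch short.
class StaleDataException extends RuntimeException {
    StaleDataException(String message) {
        super(message);
    }
}

class ProfileEntity {
    private String name;
    private int version = 0; // starts at zero and only ever increases

    ProfileEntity(String name) {
        this.name = name;
    }

    // Copies every field, including the version, into the value object.
    synchronized ProfileValue getValueObject() {
        return new ProfileValue(name, version);
    }

    // Rejects the update if the caller's copy was taken from an older version.
    synchronized void setValueObject(ProfileValue value) {
        if (value.version != this.version) {
            throw new StaleDataException(
                "bean is at version " + this.version + ", update was based on version " + value.version);
        }
        this.name = value.name;
        this.version++;
    }
}

class VersionCheckDemo {
    public static void main(String[] args) {
        ProfileEntity entity = new ProfileEntity("alice");
        ProfileValue webCopy = entity.getValueObject();   // user opens the edit page
        ProfileValue otherCopy = entity.getValueObject(); // another part of the system reads too

        // The other part of the system writes first: version 0 becomes 1.
        entity.setValueObject(new ProfileValue("alice-b", otherCopy.version));

        try {
            // The web update is still based on version 0, so it is rejected.
            entity.setValueObject(new ProfileValue("alice-a", webCopy.version));
        } catch (StaleDataException e) {
            System.out.println("Collision detected: " + e.getMessage());
        }
    }
}
```

    The caller that catches the exception decides what to do next, for example reloading the value object and asking the user whether to overwrite.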

    Why it works:

    + If getValueObject() and setValueObject() occur within the same transaction, then the app server already guarantees us the appearance of an isolated view of the entity bean, so we don't have to worry about collisions.


    + If getValueObject() and setValueObject() occur in separate transactions, and the version counter in the database is incremented from 4 to 5 by another process between the get and the set, then during the setValueObject() method we detect that valueObject.getVersion() is 4 and entitybean.getVersion() is 5, so we know the collision occurred and can handle it properly.

    Other possible implementations:

    + We could use a conditional update in the ejbStore() method, using something to the effect of "UPDATE EntityBeanTable e .... WHERE e.id='myPrimaryKey' AND e.version=4". However, this requires bean-managed persistence. I prefer the implementation above because it works for all types of entity beans.
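
    A minimal JDBC sketch of such a conditional update follows. It is an illustration only: the class name ConditionalUpdate, the `name` column, and the choice to increment the version inside the same statement are assumptions, not part of the original pattern. The JDBC calls themselves (prepareStatement, setString, setInt, executeUpdate) are standard.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper a BMP bean could call from ejbStore().
class ConditionalUpdate {
    // Column names (name, id, version) are assumed for illustration.
    static final String SQL =
        "UPDATE EntityBeanTable SET name = ?, version = version + 1 "
      + "WHERE id = ? AND version = ?";

    // Returns true if exactly one row matched, i.e. nobody changed the row
    // since expectedVersion was read. Zero rows means a collision (or a delete).
    static boolean storeIfUnchanged(Connection connection, String id,
                                    String name, int expectedVersion) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            statement.setString(1, name);
            statement.setString(2, id);
            statement.setInt(3, expectedVersion);
            return statement.executeUpdate() == 1;
        }
    }
}
```

    Because the check and the write happen in one statement, the database itself decides whether the update still applies; the caller only has to look at the row count.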

    Assumptions:

    This pattern makes the assumption that during the entity.setValueObject() transaction, ejbLoad(), setValueObject(), and ejbStore() will all occur while there is a lock on the items in the database (Pessimistic Concurrency). If our app server/database doesn't provide this guarantee, then things break about 1% of the time (when two separate transactions are trying to write at the same time; see Optimistic and Pessimistic Concurrency in TheServerSide newsletter #2).

    Threaded Messages (21)

  2. We have successfully used this pattern in our EJB application. I have one simple question regarding this article, however. One of the 'How it Works' bullets states that the transaction level on both the getValueObject() and setValueObject() must be TX_REQUIRED. We have been using TX_REQUIRED on the setValueObject() and just TX_SUPPORTS on the getValueObject(). We haven't run into any problems, but I'm curious to know whether we are missing some subtle chance of failure due to our more relaxed transaction settings.

    Thank you,

    James Harris
  3. James,

    You're quite right in pointing out that getValueObject() need not be transactional as it is a read-only operation. Therefore, TX_SUPPORTS seems quite appropriate.

    However, section 11.6.3 of the EJB 1.1 spec warns that the exact behavior of the app server is unspecified in the case where no transaction context is required. It is possible that the app server will still call ejbStore() and save to the DB after invoking getValueObject(). And because the ejbLoad() and ejbStore() wouldn't necessarily take place within the same transaction, it's possible that concurrent access by two clients to the same data in the database could lead to a loss of information.

    Consider the following scenario where Clients 1 and 2 are concurrently accessing the same entity bean, but in a distributed fashion where the app server defers concurrency control to the database. I denote operations by client1 with a plus (+) sign and operations by client2 with a minus (-) sign. The scenario is as follows:

    + Client1 calls getValueObject() with no current TX.
      + App server sees TX_SUPPORTS and thus doesn't start a TX.
      + App server loads from DB and calls ejbLoad().
      + App server calls getValueObject().
      + App server calls ejbStore() and saves to DB.
    - Client2 calls getValueObject() with no current TX.
      - App server sees TX_SUPPORTS and thus doesn't start a TX.
      - App server loads from DB and calls ejbLoad().
      - App server calls getValueObject().
      - App server has not yet called ejbStore().
    + Meanwhile, Client1 calls setValueObject() with no current TX.
      + App server sees TX_REQUIRED and starts a TX.
      + App server loads from DB and calls ejbLoad().
      + App server calls setValueObject().
      + App server calls ejbStore() and saves to DB.
      + App server commits the TX.
    - Client2's call continues...
      - App server calls ejbStore() and saves to DB, overwriting the changes made by Client1. Information is lost.

    What we have is a bug in our program that has allowed the change made by Client1's call to setValueObject() to be completely lost. Worse yet, it's lost without any error message being logged. This is the worst type of bug (the aptly named Heisenbug) because it's highly unpredictable, occurs only part of the time, and in general is very difficult to trace or debug.

    The bug's behavior is also very dependent on the way the app server chooses to manage the non-transactional getValueObject() call. For example, in WebLogic, we have the option of providing an isModified() method to prevent the DB store from ever taking place after the getValueObject() operation. Inprise, on the other hand, simply doesn't allow the programmer to use TX_SUPPORTS at all, to avoid this type of bug altogether.

    In any case, the behavior of the application is non-deterministic and non-portable across app servers. Using the TX_REQUIRED attribute instead helps to avoid this problem altogether.

    Doug

  4. Can you explain further using the example below?
    (I always thought a timestamp/version on the table would be needed, although it has overhead in terms of reading the timestamp from the table and comparing it with the earlier one prior to the update to ensure they match. If they don't, it's obvious this information has been updated.)

    What would happen if

    1.Teller John read account data

    2.Teller Jane read account data

    3.Teller Jane updated account data

    4.Teller John attempts to update.

    Assuming each of them is in a separate session, how would version numbering solve this situation?
  5. I want to add to my question above. I am looking for information on a situation where the data is from multiple tables and updates are done in multiple tables (e.g. Customer, Account).

    Assume a Customer can have multiple Accounts and tellers are viewing an aggregate balance from the Customer table.

    Both tellers initiated an update on a customer's checking account, as mentioned above.

    The version/timestamp pertains to the Account table and probably resides in it.

    I am trying to get familiar with Java. Based on what I have read, we need to use BMP beans, not Entity beans, in this scenario.

    How do we ensure concurrency in this situation?

    Thanks for help.
  6. Quick note -- I think the hash idea I proposed below might help you here, since you are dealing with multiple tables. The "race condition" testing with a hash happens in EJB land, not database land, so you are just looking at hashes in the entities, not versions in the table(s).

    Dan
  7. Dear All,

    Wouldn't the best way to solve this be to use timestamps (milliseconds since whenever)? Within the user's profile, the "transaction" field could store the time of the last update, and when the data is being displayed that time could be stamped on the form. Then, when you submit, the timestamp of the form could be compared to the timestamp of the entity bean. If the timestamp of the form being displayed is before the last update timestamp, an exception could be thrown; otherwise the transaction goes through and the entity timestamp is updated. This would work because all timestamps could come from the EJB server.

    Todd Lynch
  8. This is essentially the method we are using. Version or timestamp, it doesn't matter, except that the timestamp tells you when, which is a nice piece of information. Keeping the userid of the last person who changed it is also good. The only issue is whether you want to store the info as a matter of course.

    The EJB spec only deals with concurrent updates. It doesn't deal (nor should it have to) with situations where users have both viewed the data and are not in a current transactional state. In this case both users view the data and are in 'think time'. Whether the second client updated the database 1 second or 1 hour later, the issue would be the same: updating based on a stale view of the data.

    This has been a 'problem' since the days of the mainframe, and unless someone wants to build a framework to handle it (there is a WebSphere Redbook which recounts one; I think the San Francisco project may have done this), this is the easiest way to ensure that updates are not overwritten by a client with 'stale' data.
  9. Version numbers are always more reliable than timestamping because you cannot guarantee, regardless of the time quantum, that you'll never get the same timestamp twice.
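
    A small sketch of that point, under the assumption that two updates land within the same clock tick. The class StampedRecord and its methods are hypothetical; the time is passed in as a parameter so the collision is explicit rather than left to chance.

```java
// Hypothetical record that tracks both a last-modified time and a version.
class StampedRecord {
    long lastModifiedMillis;
    int version;

    // nowMillis is supplied by the caller so the example is deterministic.
    void update(long nowMillis) {
        lastModifiedMillis = nowMillis;
        version++;
    }

    // Stale-copy check based on the timestamp the reader saw.
    boolean changedSinceTimestamp(long seenMillis) {
        return lastModifiedMillis != seenMillis;
    }

    // Stale-copy check based on the version the reader saw.
    boolean changedSinceVersion(int seenVersion) {
        return version != seenVersion;
    }
}
```

    If a reader takes its copy after an update at time 1000 and a second update also happens at time 1000, changedSinceTimestamp reports no change while changedSinceVersion does.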
  10. There was a conversation above about using TX_SUPPORTS instead of TX_REQUIRED for getValueObject():
    <--snip-->
    + Client1 calls getValueObject() with no current TX.
      + App server sees TX_SUPPORTS and thus doesn't start a TX.
      + App server loads from DB and calls ejbLoad().
      + App server calls getValueObject().
      + App server calls ejbStore() and saves to DB.
    - Client2 calls getValueObject() with no current TX.
      - App server sees TX_SUPPORTS and thus doesn't start a TX.
      - App server loads from DB and calls ejbLoad().
      - App server calls getValueObject().
      - App server has not yet called ejbStore().
    + Meanwhile, Client1 calls setValueObject() with no current TX.
      + App server sees TX_REQUIRED and starts a TX.
      + App server loads from DB and calls ejbLoad().
      + App server calls setValueObject().
      + App server calls ejbStore() and saves to DB.
      + App server commits the TX.
    - Client2's call continues...
      - App server calls ejbStore() and saves to DB, overwriting the changes made by Client1. Information is lost.
    <--snip-->

    I think when you say TX_REQUIRED you actually mean TX_MANDATORY.

    The objective is to make sure the client calls get and set in the same transaction.

    The client _could_ use JTA to cause two separate transactions; however, it is much easier with TX_REQUIRED. REQUIRED starts a new transaction when the caller has none, whereas MANDATORY says the invocation must already be in a transaction.

    Best Regards.


  11. Doug,

    Isn't it strange how worries about concurrency keep coming back to haunt us? We buy application servers to kick these concerns out of the front door, but the patterns we adopt mean they come in again by the back door!

    I think the pattern you describe is entirely valid, but we may be able to avoid some of this by integrating data and behaviour more closely at the entity level.

    I came to your pattern from the link on the home page where the description reads "imagine two bank tellers looking at an account information web page..". This example talks about both tellers updating the balance (based on the values at the time they read the data) and the potential for one to overwrite the other's changes.

    I would say: why let a teller update the balance at all? Why not hide all balance alterations behind debit and credit methods? Then it doesn't matter when the teller read the balance; it doesn't matter either what has happened in the meantime. All changes will be based on the current situation.

    Of course, the balance example might be contrived. But couldn't it be valid to partition the changes that can be made to an entity into those that depend on previous state and the rest? Where there is a dependency on previous state we should encapsulate the state change in a method where the existing state is taken into account. In the remainder we may not need to worry about concurrency at all - the last change is the one we keep.
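
    A sketch of what that partition might look like, in plain Java with no container involved. The class TellerAccount and its methods are hypothetical; `synchronized` stands in for whatever locking the container provides during each short transaction.

```java
// Hypothetical account that never accepts a balance from the client.
class TellerAccount {
    private long balanceCents;
    private String nickname = "";

    TellerAccount(long openingBalanceCents) {
        this.balanceCents = openingBalanceCents;
    }

    // State-dependent change: applied to the current balance, whatever it is now.
    synchronized void credit(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balanceCents += amountCents;
    }

    // State-dependent change with a rule checked against the current state.
    synchronized void debit(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (amountCents > balanceCents) {
            throw new IllegalStateException("insufficient funds");
        }
        balanceCents -= amountCents;
    }

    // State-independent change: the last write simply wins.
    synchronized void setNickname(String nickname) {
        this.nickname = nickname;
    }

    synchronized long getBalanceCents() {
        return balanceCents;
    }

    synchronized String getNickname() {
        return nickname;
    }
}
```

    Two tellers who both read a balance of 10000 and then call credit(2500) and debit(4000) in either order leave the account at 8500, because neither call depends on the value they read.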

    Do you think this is practical? Or does its incompatibility with value object copies make it a non-starter?

    Kind Regards,

    Matt
  12. Hash rather than version number

    Doug,

    One minor suggestion -- use a hash rather than a version number. You could create the hash in a number of ways and with different algorithms (the native hash should be the fastest). The hash ensures that you won't have version number collisions (perhaps from sloppy programming), and that you will not throw an exception when another transaction has occurred but the entity is the same. Also, your data model doesn't have to be changed (and all that entails). The load and access requirements are about the same. You would probably want to set things up so this specific hashing support (checking on load/update) is implicit.

    Dan
  13. I am wary of using hash codes. As I understand it, two String objects representing the same string should return the same hash number from hashCode(). However, this is not true of other objects. The default implementation of hashCode() for Object tries to return a distinct hash number for each distinct object no matter what their attribute values, and so the same bean loaded into a different JVM would get a different hash number. In the other direction, hash numbers are not unique: two objects with different values can return the same hash number, so a change could go undetected.

    You really need to build your own algorithm, which (unless it is automatic in some way) may be error-prone.
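
    One way such an algorithm might look is a hash computed from the field values rather than from object identity. This is a sketch only: the class ProfileFields and its fields are hypothetical, and java.util.Objects.hash (Java 7 and later) still produces a 32-bit number, so two different states can collide.

```java
import java.util.Objects;

// Hypothetical value holder whose hash depends only on its field values.
final class ProfileFields {
    final String name;
    final String title;
    final int salary;

    ProfileFields(String name, String title, int salary) {
        this.name = name;
        this.title = title;
        this.salary = salary;
    }

    // Same field values give the same number in any JVM, because String and
    // Integer hash codes are specified in terms of their values.
    int contentHash() {
        return Objects.hash(name, title, salary);
    }
}
```

    The web tier would keep contentHash() from the copy it displayed and compare it with the hash of the bean's current fields at update time. Every field that matters has to be listed by hand, which is where the error-prone part comes in.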
  14. Hi Doug,

    This is a very good pattern for solving the age-old problems of concurrency. We are trying to use this pattern in our system to handle concurrency. We are using WebLogic 6.0 with EJB 1.1 as the application server and Oracle 8 as the database.

    Using pessimistic (exclusive) locking in WebLogic prevents other clients from instantiating the entity object, even for reading.

    Oracle 8 does optimistic locking.

    How do I make sure that pessimistic locking is used and yet the system allows more than one user to read the same row?

    Thanks
  15. I don't know why you guys are discussing using this pattern. You can achieve the desired level of optimism or pessimism by setting the transaction isolation level at the method level.

    So if the business method is prone to concurrency problems, then set the highest isolation level for the transaction! (There are 4 different isolation levels available.)
  16. Why this pattern is needed....

    Ritesh:

    Isolation levels are not the silver bullet you might expect if you simply read an EJB book. JTS/EJB transactions are meant to span only milliseconds. In real-world systems (especially with web front ends), a user can sit on a data entry form page for minutes or longer. You can't keep the data locked that entire time; in a large-scale system with thousands of concurrent users you would drain system resources with long transactions. Also, I think it's still true that some DB vendors implement the highest isolation levels by locking multiple rows of data instead of just one row (page-level locking vs. row-level locking). You are much more likely to see complications due to this if the transaction is long-lived.

    This pattern solves these problems.
  17. Hey guys ...

    Good pattern... but I just had one question here.

    We have a mixture of the "valueObject" pattern and individual get/set methods, in case we need to get/set certain individual attributes.

    In case I do a setValueObject(), I will have:

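    if (valueObject.version != this.version)
    {
        // The caller's copy is older than the bean's current state.
        throw new EntityStaleDataException("Stale data is being used for update. The original data has been updated by another process");
    }
    // Versions match: copy the fields and advance the version.
    this.name = valueObject.name;
    this.title = valueObject.title;
    this.salary = valueObject.salary;
    this.version = valueObject.version + 1;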

    But how do I do the compare and the increment if I am setting a particular attribute?

    E.g., in the case of

    setName(String name)
    how would I check the version, since I am not holding the version value anywhere?

    Am I missing anything?

    Thanx,
    Krish.

    Krishnan.Venkataraman

  18. The version property works fine until the entity bean is released.

    The application server can treat the entity as idle ("latency time") and unload it from memory, based on that idle time or on cache overflow.

    When you need the entity again, the container goes back to the database and loads a new entity instance with version 1.

    This strategy works only if the field is persisted in a table column.
  19. Sandbox.

    I think that when two or more components work on the same piece of information or data, they should follow a sandbox approach.
    Here is an example solution.
    These components talk to a centralized manager, in our case the EJB app server. In the sandbox scenario, each component queries its own set of information/data from the server, caches it on its own side, and registers itself with the server as a listener for data changes.
    Every time one of these components needs to change data, it starts a transaction with the server, tells it what to change, and commits. If the transaction is successfully committed, the server tells all change listeners (including the component which started the transaction) that the data has changed, so that they in turn can update their sandboxes if necessary.
    In this approach, if a component is in a dirty state (e.g. the user is modifying something), it can pop up a message asking the user whether to resynchronize its state/cache or stay out of sync until he/she commits the changes.
    As you can see, the component can stay out of sync as long as it wants to. If it starts a transaction and tries to save some data which would violate the server-side state (like updating data which has already been deleted by another transaction), the server will deny that transaction and report what was wrong, so that the component in turn can tell the user why it failed and ask whether to resync with the server.
  20. The way I've seen this pattern applied in, say, Enterprise Objects Framework with WOF is that the update statement generated at the time of the update simply qualifies the record by the primary key _and_ the locking attribute value that was previously read in. If the update affects no rows, it means that the update didn't qualify any records because somebody either removed the record or upped the version number.
  21. Does the version scenario work in the case of a transaction diamond? It looks like it will work only if you call the set method on this bean only once in your entire transaction. Let's say you access the same entity through two different session beans within a transaction, which results in a diamond. A set value in the first bean causes the version to change. The container calls a store on the first bean and a load on the second bean. On the set method on the second bean you will have a version mismatch, and your transaction will fail even though you incremented the version in the first bean. Any comments? Thanks,

    Ravi
  22. Version Numbering doesn't work

    Hasn't anybody realized that in the event of two transactions updating the data at the same time, the last commit wins? Assume the database isolation level is repeatable read. The update of the first transaction will lock the data and block the second transaction. The version number is incremented by the first transaction. After the commit, the block is released. Since the isolation level is repeatable read, the second transaction finds the version number unaltered. It updates the data a second time and commits. The updates of the first transaction are overwritten and no exception is raised.