Bluddy Big Transactions: How to deal with deadlocks

Discussions

J2EE patterns: Bluddy Big Transactions: How to deal with deadlocks

  1. Bluddy Big Transactions
    =======================

    With ejb declarative transactions, it is very easy to declare transactions that span across many ejbs, and as a result, database writes that span across many ejbs/ tables.

    What is wrong with this?
    ------------------------
    You probably have too many ejbs (don't we all?), but putting that aside:

    1. transactions should be kept as small as possible.. big transactions means other processes will wait often for access to any locked resources.

    2. deadlock is much more likely to occur as more tables are locked within a single transactions.

    Item 1. is very important, but for the purpose of this discussion I will concentrate on 2, deadlock.

    What is deadlock?
    -----------------
    Deadlock occurs when two or more processes get into a situation where they are each holding locks the other is waiting to acquire.

    For example:-
    process 1 (P1):
    update A (get a lock on A)

    process 2 (P2):
    update B (get a lock on B)

    P1:
    update B (blocks because P2 has a lock on B)

    P2:
    update A (deadlock... both processes are blocked
    and cannot proceed).


    Solutions to Avoid Deadlock
    ===========================

    Don't Use Transactions, or Make them Very Small
    ------------------------------------------------
    Don't use transactions unless you _really_ have to. Transactions get complicated and problematic.

    If you _must_ have transactions - make them as small as possible. Small transactions make for reduced resource contention: higher performance, and less chance of deadlock.

    Problems:
    -if more then once resource is locked in a (small) transaction, deadlock can still occur

    Access Resources in a Specified Order
    --------------------------------------
    If all transactions access resources in a specified order, then deadlock is avoided. For example if there if there are three resources, A, B, C, then all processes should agree on an order of access. If P1 accesses A, then C, and P2 accesses B then C, deadlock cannot occur.

    Problems:
    -this option might not be available if you are sharing a database with another application/ user group. You need to be able to control or specify guidelines for all transactions from all applications sharing the database.
    -foreign keys complicate this - if table A has a foreign key to table B, inserting a new row into table A will cause table B to get locked as well.
    -low level locking considerations: page versus row level locking further complicates this picture
    -each _row_ is a resource. Most people consider only tables in the "order of resource access" guidlines. Two transactions may update multiple rows in a single table in a different order (I've seen deadlock from this)

    Points:
    -I've considered the idea of writing a "resource order" register. In this case, a config file would specify the valid order of resource access. Each piece of code that accessed a resource should first register this with the resource order manager, which would throw an exception if access out of order was attempted. This would be particularly useful during unit and regression testing.
    -Doesn't the container handle this for you? - No, I've seen deadlocks thrown as SQLExceptions from several app server/ database combinations

    Simply Catch the Exception, Sleep, and Try Again
    ------------------------------------------------
    When a transaction fails due to deadlock, catch the exception, possibly sleep or use some back-off algorithm, and then try again.

    I've received this as advice for the solution from many sources, including database vendors.

    I've always felt this was a "dirty" solution; but heck, if it works *shrug*

    If this is the best solution, however, what is the best way to handle this ?

    Is there a non database dependant way of determing an exception is due to a deadlock ?

    Do people simply look for the word deadlock in the SQLException error message ?

    Is this really the best solution we have ? / What other solutions do people have?


    ==

    I'm sure everyone has experienced this kind of problem. I've never found an ejb book which gives more then a cursory glance at ejb transactions - most of them (cowardly ;) ignore the deadlock issue.

    Is there a book I'm yet to discover?

    Thanks in advance to all ye who contribute,

    Hugh

    Threaded Messages (20)

  2. You are touching a sensitive, but very common issue in most business environments. My personal solution to the problem is to avoid the deadlocks well before the database or other mechanism kicks in. I can't say I always succeed, but ...

    Deadlocks occur if you update more than you insert. You know, you can live with a system without updates, but you can't with one without inserts ;-)
    Deadlocks occur when you concentrate on specific portions of your data, which quickly become the points of contention.
    Briefly, I would just update data I know it's very rare two or more would access at the same time (like the personal information of an end-user) and insert, in form of history or detailed data I would have to aggregate, for those parts that are frequently accessed - updated.
  3. This issue is very sensitive , but an integral part of any enterprise development. But I am not sure that , deadlocks can avoided or drastically reduced , if there is and business process(BP) that may lead to deadlock. Because nither we can change BP nor we can control user interaction on that resoureces.

    But how can we avoid updates on a table that requires update is not clear to me.Because if we insert a new entity every time , then all other dependent entity have to be updated to be in synch .

    I am describing a pattern that i once thought of but never implemented it .

    Say two BP P1 and P2 where P1 first acuires a lock on Resource R1 then on R2 and P2 first acuires a lock on Resource R2 then on R1.
    So this may lead to a deadlock secnario.

    So we create a shared resource for this deadlock scenario ( Say Database or a thread safe singleton) .
    Now P1/P2 reads this flag and if it is free then enter into the process after update the flag . At the end of P1 , it dupdated the flag into free . Similarly P2 if it does not find that flag is free , then either is wait for infiniet time or fixed time or throws Exception ( depending on requirement ) . So deadlock scenario can be managed .

    Waiting for your feedback and comments .

    Ashim chakraborty
  4. Hai

    As amy friend Mr.Ashim ,told i think its a very nice idea. But think abt the problem related to instance variables.Will it work in this case???



  5. Hugh,

    Deadlocks occur because we are optimists... I mean, generally we adopt an optimistic view of concurrent access to the databases. So, the only way to eliminate deadlocks is to colapse the whole system into a single resource and access the database single-threaded... Obviously this is not an option for most applications.

    So, let's start from the principle that locks, timeouts and deadlocks are indeed unavoidable. But their frequence can, and should, be reduced. A few points were already mentioned:
    - Try to reduce the time a resource is locked, but transactions must be atomic.
    - Try to reduce the number of resources locked.
    - Try to grab the resources in a predefined order, but not always easy to do.

    Let's look make up a simple scenario for a banking system.

    To transfer money from account A to account B we are competing with other transactions getting balances, statements, making deposits, withdraws and transfers from the same accounts. If we have 1000 transactions per minute and only a few accounts, we are garanteed to get lots of deadlocks.

    Generally, we can't break a "transfer" transaction into 2 transactions by separating the withdraw and deposit because of the risk of data inconsistencies.

    But if we're dealing with the same volume of transactions over millions of accounts, we'd expect to see only a small number of deadlocks.

    Now, let's say that we might also need to keep a overall balance of all accounts per branch. We've just complicated our scenario a little further again with more locks, deadlocks and poor response times, because now the branch table is the bottleneck.

    So one of the key things is to design the application taking in consideration the possible bottlenecks and avoiding them beforehand. Those bottlenecks are not always obvious and sometimes they're found later during system or stress test... Just before the deadline.

    Second. Because the deadlocks are unavoidable we must take take them into consideration. The easy way is to fail the transaction and ask the user to retry. The hard way is to automatically detect that the EJB failure is "transient" (i.e. there is nothing intrinsically wrong with the data) and it could be retried.

    Anyone to suggest a proper pattern to handle the automatic retry of a transaction? Which approach is easier: command, facade or using a message-driven bean?

    Ismael
  6. It seems like we always need to be aware of the possibility of deadlocks, and prepared to deal with them. "that data was busy, please try again". Like getting a busy signal on the phone.

    I always access my EJBs through Proxy/Adapter-type classes, similar to Business Delegates. I experimented with making a generic superclass for these guys, that wraps each call to the EJB in a loop that detects transaction rollbacks and retries a specified number of times. I had to wrap each call in an anonymous Command object to get it to work. I think aspect-oriented programming would help a lot with the coding here.

    Your other solution is interesting, send Command objects to the server, and have the Command Processor put them back on the queue if they generate a deadlock.
  7. A somewhat different approach which I've used in the past is to create a transaction queuing infrastructure. My implementation was done using plain Java classes, though I suspect that you'd want to do it using JMS to handle these situations across a cluster.

    The idea is simple - instead of trying to do a piece of work immediately, the thread executing the work (the Requestor) submits it as a job lock request to the transaction queueing system. A Lock Manager sits on the other end of the queue, and ensures that one and only one requestor can process any given job type at a given time.

    As and when the lock for the requested job type frees up, the Lock Manager send back a message to the next Requestor in queue indicating that it has now acquired the lock.

    Requestors must be well-behaved, in the sense that they must send a message back to the Lock Manager to free up a lock when they finish processing.

    An additional benefit is that you will also be able to easily parallelize job execution for threads which don't require that the (typically) multiple jobs they execute would have to execute serially.
  8. Hi all,
    This problem of deadlock is not new at all, it was even worse in the past with 4GLs when the database records where locked during user-thinking time.
    The solutions are still the same:
    - avoid use of transactions whenever not necessary(auto-commit)
    - if necessary, try to group all the updates for one transaction in one shot using batching updates to reduce the total locking time.

    The use of these techniques will not remove the fundamental problem that a record could be updated by another process in the middle of a transaction. In this case, optimistic locking may help. The idea is that the database record is analysed and locked just before the update to see if its content has been changed by another process. Then the update is made within an independent transaction.

    Some interesting articles about this:
    JDBC transaction optimization
    Optimistic locking pattern

    Best regards.
    Robin.
  9. I agree with the sentiment.

    I've worked on large tp systems (with non-trivial transactions) in the past, in C++.

    These systems where written in an OO way so there was lots of SQL and data flying around between the C++ and the RDBMS (Sybase).

    Deadlock is a huge issue which is worsened by OO programming techniques, where data access ordering would violate the encapsulation OO techniques provide.

    To get round the deadlock issue, we wrote a transaction management framework - think of it as an App Server for C++. This framework provided 2 basic mechanisms to recover deadlocks and reduce their frequency (you can't eliminate them).

    a) Automatic retry. The system had the equivalent of message driven beans (but better :-), so retrying a transaction was an option provided the message was popped back onto the queue prior to starting the TX again.
    We ensured that the transaction stack (nested transactions) from C++'s point of view was correctly unwound and that the object state was returned to the same as at the start of the TX, then we just ran it again. There was some ethernet style backoff logic in there to to increment the delay between each try.
    The reason you have to reset the program state is because the RDBMS has already dropped the rolled back your DB transaction when the app gets told about the deadlock.
    This is obviously harder to do in Java because you can't rely on destructors running predictably, in order and on demand :-(
    [as an aside, I've seen some write thru caches been badly damaged in the event of a deadlock, precisely because they can't 'rollback' cache state]

    b) Defer and batch all updates and inserts to the point where the app has to commit, then blast the whole lot down to the DB in one big batch. This holds locks for the shortest time possible, reduces contention, increases performance (significantly) and reduces deadlocking.

    Add optimistic locking into all of the above to complete the recipe.

    Other things to do are to ensure good index spreading etc. As others in this thread have hinted, deadlock can come from inside the DB internals - eg. index maintenace conflicts. But now you're into DB tuning territory (which is extremely important - a friendly DBA involved in physical DB design is worth loads. Just make sure you get him in before you go live!).


    When I came to Java app servers, I was amazed that deadlock wasn't mentioned anywhere (products, books, specs), as you say.

    Also, the J2EE spec is so tight, its very hard to build compensating mechanisms to help out, like I did for C++.

    App Servers need to take this more seriously. You need transactions, and sometimes the business process demands that they're long. Mechanisms to deal with this should be part of the app server JTA/JDBC/CMP/MDB spec.


    John

  10.  John,

     How does row level locking fit into the equation
    in relation to OO design? Many older designs reduce contention by limiting the majority of the transaction
    operations to the record level.

     Fred
  11. Hi

    Row level locking will help.
    The resource you are requesting is just the one row in the DB that you want to change, not a page containing multiple rows. Rowlocks are expencive - talk to your DBA.

    You can also (at least on Informix 7.x) get a deadlock when INSERTing lots of new records where the spread of the unique keys are narrow.

    Some years ago we had this problem on an Informix system. The solution was to find all the error codes ( ~10) that deals with locks and deadlocks and if one occurs - sleep one second and retry the db updates. Retry 10 times and if it doesn't work return a deadlock error.

    Remember that when you have your deadlock you have also been rolledback and NO changes are in the DB (thanks !) so perhaps you must keep your own copy of the data to make it possible to try again automaticly. This is the approach for the current project.

    Sorry that this is not so much a comment of the pattern but my share of knowledge.

    Jan Nilsson
  12. Hi everyone,

    I can definitely vouch that row level locking (when available) is a good thing. But you still have to be careful, since it's possible for a single write to generate both row and page level locks, thereby increasing the chances of deadlocks.

    A few years back I was involved in implementing a TP system on top of MS SQL Server 6.5 (I can hear the groans already! :) At that stage it supported row level locking for inserts only, and page level locks for other types of activity (updates, deletes and selects).

    In our case we had multiple copies of a single process (so resource allocation was guaranteed to be the same) that inserted lots of records into a nested set of parent / child tables. The first couple of times we ran more than one copy of it, it immediately deadlocked itself (announced by Squirrel Server's reassuring "Your connection has been selected as deadlock victim" message :)

    What was happening was:
    1. P1 inserts into T1 (fine - takes out RLL)
    2. P2 inserts into T1 (also fine - takes out RLL)
    3. P1 inserts into T2 and blocks (since it implicitly issues a select on T1 to ensure the foreign key - selects only have PLL so it blocks on P2s insert RLL)
    4. P2 inserts into T2 and deadlocks (since it implicitly issues a select on T1 to ensure the foreign key, and is blocked by P1s insert, thereby creating a circular wait condition)

    The (seemingly dodgy, but actually quite clever) solution was to create a new column on each of the tables, move our clustered indices onto the new column and populate it with a random number when we inserted new records. This spread out the "hotspots" in the table and significantly reduced (but didn't eliminate) the chances of the process deadlocking with itself.

    Now you may be tempted to dismiss this as an MS SQL Server 6.5 oddity (MSS has had full RLL since v7.0), but this may also occur in databases with full RLL implementations, since you cannot guarantee that lock escalation won't occur.

    The situation gets even worse if you have lots of indices on your tables. In this case your writes can deadlock on index updates, particularly since these are generally much more likely to cause contention (if your insert causes an index page split, for example). The way to avoid this problem is to pad out your index pages with a large (~50% works for me) amount of slack space. This helps to prevent a single index page split from causing a cascade of splits down through your index pages (which has a pretty disastrous impact on concurrency on that table).

    The real lesson (as has already been mentioned) is to ensure you have a knowledgeable and committed DBA on hand. Tracking down these problems can be a royal pain, and unless you've got someone who's both skilled and dedicated, you can be up the proverbial water course without a means of locomotion.

    Great thread BTW! Having come from a C/C++ TP background, transaction issues (deadlock detection, 2 phase commit etc.) is definitely an area where I feel J2EE has some maturing to do.

    Cheers,
    Peter
  13. This is a very interesting topic. For me the natural solution is to retry the transaction, as you mentioned, but I do agree that it doesn't feel quite right. Often, rolling back is an unduly drastic action, but it's unavoidable because we regard all data the same way (rdbms's do) and therefore we cannot arbitrate between transactions.

    Perhaps a systematic way of attacking the problem might to find stereotypes for various types of data, and then try to come up with transaction management based on those stereotypes. Perhaps the stereotypes can become interfaces and be realized as entity beans or java beans, and then we can probably find some meaningful design patterns.

    Guglielmo
  14. Guglielmo,

    That's an intriguing idea, design-time constructs for transaction management with an eye towards deadlocks. Let's explore that. Deadlock handling definitely needs more attention in the J2EE world.

    Unless someone out there is hiding a silver bullet, the only things you can do about dreadlock are reduce its frequency and handle it when it does occur. (I suppose you might stand a chance at prevention if an entire application was a monolith from backend data resource to user interface, but that's neither reasonable nor desirable.)

    The approaches mentioned in this thread (small tx's, ordered access, fail-and-retry) have been around forever, but I've never seen a reusable design approach to their application. So I'm going to look for a light formalization of ordered access and fail-and-retry. Keeping the tx boundaries small (both temporally and number of resources) is just good sense and the idea of "small" will vary with the application, so I'll exclude that one.

    Have you taken this idea any further?

    ok,
       jp
  15. How to deal with deadlocks[ Go to top ]

    I realize I'm coming in late into this discussion, but wanted to share my recent experience with deadlocks. I also want to agree with lack of patterns and coverage around this topic. All the J2EE books I've seen about isolation levels, concurrency, etc. don't really address this issue. Ultimately, I believe it comes down to your database implementation.

    We implemented a modified version of the Sequence Blocks pattern found in EJB Patterns using WLS 6.1 and Sybase 11. We ended up using just a Stateless session EJB (SequenceBean) with JDBC and CMT to ensure sequence uniqueness. We used a serializable isolation level.

    Our SequenceBean performed the following logic:
    begin TX
    select the most recent sequence
    update the sequence value incrementing by 1.
    commit TX

    When we ran a multithreaded test against the SequenceBean and ran into deadlocks.

    We flipped the order of the SQL to:
    begin TX
    update the sequence value incrementing by 1.
    select the most recent sequence
    commit TX

    The deadlocks stopped occurring. Why? Because of the way Sybase handles locking. This is something that should be understood when tackling this type of issue. "Understand the locking mechanism of your database." Sybase uses shared locks for selects and eventually exclusive locks for updates. An update cannot get an exclusive lock until all shared locks are released. You can see with our first approach that if two threads are trying to get exclusive locks there will be deadlock because they are waiting on each other's shared locks. Switching the order of the statements allowed the update statement to get an exclusive lock and allowing other transactions to block (not deadlock) until the transaction had completed.

    Let me know what you think.

    Jim
  16. How to deal with deadlocks[ Go to top ]

    Heya Jim,
    I agree that knowing your databasing locking is very important, however it's certainly not going to be a silver bullet :)

    I assume you also discovered that sybase lets you do select with holdlock?
    (which ensures the select gets and _holds_ it's lock until transaction complete, _or_ the read is upgraded to a write).

    I believe the deadlock you were referring to was occuring because the read lock is acquired with the select, but then the transaction lets go of the read lock even before the transaction has been committed, or the read has been escalated to a write.

    cheers,
    Hugh
  17. Another approach to handle database updates is to use ROW Level locking at database level if database in question supports the same. In this case, resource locked is a single ROW and not Entire Table. Thus probability of deadlock is reduced.

    One more way to handle transactions when they are must, is to split transactions if possible. Many of big transactions are a result of some carelessness on part of designer or programmer.
    Suppose a txn(T) requires database resources A, B and C in the same order. Also assume that result of operations on B and C are not going to impact operation on resource A.
    I such a case, it is better to fork a new transaction (Ta) for Resource A. Ta will be in picture till operation on resource A is going on.Till this time, outer transaction(T) will be suspended and once resource A txn (Ta) is over, outer transaction (T) will continue. And resource A will be available for other transaction.
  18. The best way to avoid Deadlock is try to avoid cyclic dependency between the processes and that can be done with carefull programming.
  19. Template methods might help !
  20. Or stored procedures ?
  21. There is oen very simple remedy to all this...
    (Its not the cure to end all problems but its a goot start)

    Chances are when you have a BBT, you have quite some response time.. And therefore it is quite possible that an immediate response is not necessare/possible.
    Threrefore use MDB, these will be put in a message queue, aquiring and releasing resources sequencially. If this is implemented correctly there should not be any locked resources when the next transaction is initialised.

    If a response is necessary (and the transaction is quick), a reply message can be sent to the user.

    /Johan