Discussions

News: Automatically Detecting Thread Deadlocks in JDK 1.5

  1. A newsletter subscriber brought up the issue of OutOfMemoryError (OOME) occurring when too many threads are created. The problem with too many threads is that each one needs stack space. We had both experienced cases where increasing the maximum heap size actually decreased the number of threads that we could create before getting an OOME. Unfortunately, this maximum number of threads seems to be somewhat magical, and depends on the operating system and on the initial stack size per thread.

    This brought me to this new warning system, which notifies me if we have too many threads. To avoid too many notifications, you get one warning when the thread count passes the threshold. If the count slips below the threshold and then rises above it again, you get another warning. This is the same approach taken by the memory bean. A better approach would probably be to have high- and low-water marks.
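
    The one-warning-per-crossing behaviour, together with the high- and low-water variant mentioned here, can be sketched roughly as follows. This is not the newsletter's code: `ThreadCountWatcher` and its thresholds are hypothetical names chosen for illustration. Setting both marks to the same value gives the single-threshold behaviour.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: warns once per upward crossing of a thread-count threshold. */
class ThreadCountWatcher {
    private final int highWater;  // warn when the count rises above this
    private final int lowWater;   // re-arm once the count falls to this or below
    private boolean armed = true;

    ThreadCountWatcher(int highWater, int lowWater) {
        if (lowWater > highWater) {
            throw new IllegalArgumentException("lowWater must not exceed highWater");
        }
        this.highWater = highWater;
        this.lowWater = lowWater;
    }

    /** Returns true exactly when a new warning should be raised for this sample. */
    synchronized boolean sample(int threadCount) {
        if (armed && threadCount > highWater) {
            armed = false;          // stay quiet until the count drops far enough
            return true;
        }
        if (!armed && threadCount <= lowWater) {
            armed = true;           // the next upward crossing warns again
        }
        return false;
    }

    /** Samples the live thread count of this JVM via the standard thread MXBean. */
    boolean sampleJvm() {
        return sample(ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```

    The gap between the two marks keeps a count that hovers around the threshold from producing a stream of warnings.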

    In addition, it can also tell if there are deadlocked threads. My very first Java newsletter, sent in 2000 to some friends and colleagues gleaned from my contact list, demonstrated how you could find thread deadlocks by hand. This system finds them automatically for you. Seems we have progressed in the last 4 years! Looking for deadlocked threads is potentially slow, so this code could affect your performance. However, I would rather have a slower correct program than a lightning fast incorrect program.
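
    For readers who cannot reach the newsletter, here is a minimal sketch of automatic detection, assuming the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call introduced in JDK 1.5. The class `DeadlockProbe`, its methods, and the thread names are hypothetical, and the code is written in current Java syntax rather than copied from the newsletter.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class DeadlockProbe {
    /** Names of threads currently deadlocked on object monitors (empty if none). */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findMonitorDeadlockedThreads(); // null when there is no deadlock
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : bean.getThreadInfo(ids)) {
            if (info != null) {                            // null if the thread has died
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    static void createDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startLocker("locker-1", a, b, bothHoldFirstLock);
        startLocker("locker-2", b, a, bothHoldFirstLock);
    }

    private static void startLocker(String name, Object first, Object second,
                                    CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();                         // wait until both hold their first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this monitor
                }
            }
        }, name);
        t.setDaemon(true);                                 // let the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        createDeadlock();
        List<String> names = deadlockedThreadNames();
        while (names.isEmpty()) {                          // poll until the deadlock has formed
            Thread.sleep(50);
            names = deadlockedThreadNames();
        }
        System.out.println("Deadlocked threads: " + names);
    }
}
```

    A warning system would call something like `deadlockedThreadNames()` periodically, for example from a timer, and notify a listener when the result is non-empty.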

    There is absolutely nothing you can do with a deadlocked thread. You cannot stop it, you cannot interrupt it, you cannot tell it to stop trying to get a lock, and you also cannot tell it to let go of the locks that it owns. This is one of the criticisms in Doug Lea's book of the primitive monitor-based locking mechanisms. Once you try to get a lock, you will forever try and never give up. The concurrency handling mechanisms of Doug's book are now in the java.util.concurrent package of JDK 1.5.
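
    By contrast, the explicit locks in java.util.concurrent do let a thread give up. A minimal sketch, assuming `ReentrantLock.tryLock(long, TimeUnit)`; the helper `TimedLocking.withBothLocks` is a hypothetical name for illustration only:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TimedLocking {
    /**
     * Runs the action while holding both locks. Returns false, holding nothing,
     * if either lock cannot be obtained within the timeout.
     */
    static boolean withBothLocks(ReentrantLock first, ReentrantLock second,
                                 long timeoutMillis, Runnable action)
            throws InterruptedException {
        if (!first.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;                       // gave up on the first lock
        }
        try {
            if (!second.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;                   // gave up; finally releases the first lock
            }
            try {
                action.run();
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }
}
```

    A caller that gets `false` back can retry later or report the failure, which a thread blocked on a `synchronized` monitor cannot do.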

    This brought me to the question: what is the definition of a deadlock? Webopedia.com describes it nicely:

    A condition that occurs when two processes are each waiting for the other to complete before proceeding. The result is that both processes hang. Deadlocks occur most commonly in multitasking and client/server environments. Ideally, the programs that are deadlocked, or the operating system, should resolve the deadlock, but this doesn't always happen. A deadlock is also called a deadly embrace. (Source Webopedia.com)

    Enough theory. Here is the code for the ThreadWarningSystem, which detects when there are too many threads and finds thread deadlocks:

    Complete newsletter on www.javaspecialists.co.za

    Heinz

    Threaded Messages (18)

  2. link doesn't work

    the link to the article doesn't work
  3. Correct link inside

    http://www.javaspecialists.co.za/archive/Issue093.html
  4. Excellent

    The Javaspecialist newsletter is such a fresh breeze of actual content. So nice that there are a few "content providers" that actually say something. Really interesting articles.

    Perhaps this is the NewDirection(tm)(r) that Java is to take: sites that contain actual content, and articles that aren't rehashes of the Java tutorial and/or some dim blog.
  5. Excellent

    The Javaspecialist newsletter is such a fresh breeze of actual content. So nice that there are a few "content providers" that actually say something.
    I agree, after reading this article I began trolling through some of the older articles and found some pretty interesting stuff in there. Best thing is that the articles are a convenient length to fit inside a compile cycle for the ghastly C++ code I am having to hack about in.

    Great work Heinz
  6. Excellent

    The Javaspecialist newsletter is such a fresh breeze of actual content. So nice that there are a few "content providers" that actually say something.
    I agree, after reading this article I began trolling through some of the older articles and found some pretty interesting stuff in there. Best thing is that the articles are a convenient length to fit inside a compile cycle for the ghastly C++ code I am having to hack about in. Great work Heinz
    Thanks, Carl :-)

    You will notice that with the latest few newsletters, I have added the version of Java that I used for that particular newsletter. I've been publishing this newsletter for the last four years, and frequently I am asked what version of Java I was using for Issue 0x. My memory is not that fantastic that I can remember exactly which version of Java I was using at the time, and there are so many differences, it can be rather frustrating!
  7. Well well

    All this because Thread.stop() is considered unsafe and has been deprecated.


    Say I'm writing a 'java shell', which can spawn arbitrary apps/classes with main methods, how do I terminate a badly behaved app?

    What are projects like http://www.javagroup.org/echidna/ supposed to do?
  8. Well well

    All this because Thread.stop() is considered unsafe and has been deprecated. Say I'm writing a 'java shell', which can spawn arbitrary apps/classes with main methods, how do I terminate a badly behaved app? What are projects like http://www.javagroup.org/echidna/ supposed to do?
    Thread.stop() is particularly nasty since it generates an asynchronous exception. The idea of an asynchronous exception makes shivers run down my spine. Fortunately Sun threw them out when they moved from Oak to Java (check out my newsletter about Oak), and just left behind the ThreadDeath exception.

    Most deadlocks I have seen were due to my own silly mistakes, rather than Thread.stop().

    Would it not work to load each app in a separate ClassLoader and then just throw the whole ClassLoader away? Or does that not terminate threads? Hmmm. No, I guess not. *ouch* difficult one to solve.
  9. Hi,

    With the current model, to avoid deadlocks, you can always check for deadlocks before acquiring a lock.

    If deadlock recovery is very critical, Semaphores can help.

    Regards,
    Hemant
    www.pramati.com
  10. Hi, With the current model, to avoid deadlocks, you can always check for deadlocks before acquiring a lock. If deadlock recovery is very critical, Semaphores can help.
    How would that work?
  11. Hi, With the current model, to avoid deadlocks, you can always check for deadlocks before acquiring a lock.
    Hemant,

    Are you saying you can detect deadlocks before they happen?

    Carl
  12. Hi Carl,

    It is possible to detect deadlocks in your application. In our container we provide this facility to handle situations as follows:

    Account1 -> Account2 (transfer)
    Account2 -> Account1 (transfer)

    If these two transactions are happening concurrently and both Account1 and Account2 are locked in their respective transactions, then you can have a deadlock situation.

    The solution is simple.

    1) Each thread maintains a list of all the resources it has locked.
    2) Whenever a thread requests a resource that is already locked by another thread, instead of waiting for that resource, the requesting thread first checks, using a deadlock detection algorithm, whether that wait can cause a deadlock.
    3) If the deadlock detection algo says that this wait can cause a deadlock, the thread does not wait and instead tries to roll back the transaction.

    T - A Thread
    R - A Resource

    1) T1 --> request lock --> R1 (granted)
    2) T1 --> R1 (locked resources)
    3) T2 --> request lock --> R2 (granted)
    4) T2 --> R2 (locked resources)
    5) T1 --> request lock --> R2 (already locked, to wait, since cannot deadlock)
    6) T2 --> request lock --> R1 (already locked, rollback, since wait can cause deadlock)
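
    The six steps above can be traced with a small wait-for check. This is only a sketch of the general idea, not the container's actual algorithm: the class `WaitForGraph`, its string identifiers, and the `Outcome` values are hypothetical, and it records decisions without doing any real blocking.

```java
import java.util.HashMap;
import java.util.Map;

class WaitForGraph {
    enum Outcome { GRANTED, WAIT, ROLLBACK }

    private final Map<String, String> ownerOf = new HashMap<>();    // resource -> owning thread
    private final Map<String, String> waitingFor = new HashMap<>(); // thread -> resource awaited

    /** Decides what a thread should do when it asks for a resource. */
    synchronized Outcome request(String thread, String resource) {
        String owner = ownerOf.get(resource);
        if (owner == null || owner.equals(thread)) {
            ownerOf.put(resource, thread);
            waitingFor.remove(thread);
            return Outcome.GRANTED;
        }
        if (wouldDeadlock(thread, resource)) {
            return Outcome.ROLLBACK;            // waiting would close a cycle
        }
        waitingFor.put(thread, resource);
        return Outcome.WAIT;
    }

    /** Releases everything a thread holds, for example after a rollback. */
    synchronized void releaseAll(String thread) {
        ownerOf.values().removeIf(thread::equals);
        waitingFor.remove(thread);
    }

    /**
     * Follows the chain "resource -> its owner -> resource that owner waits for".
     * Reaching the requester means the new wait would create a cycle. The walk
     * terminates because a wait that closes a cycle is never recorded.
     */
    private boolean wouldDeadlock(String requester, String resource) {
        String current = resource;
        while (current != null) {
            String owner = ownerOf.get(current);
            if (owner == null) {
                return false;
            }
            if (owner.equals(requester)) {
                return true;
            }
            current = waitingFor.get(owner);
        }
        return false;
    }
}
```

    Replaying the scenario: T1 and T2 are granted R1 and R2, T1's request for R2 yields `WAIT`, and T2's request for R1 yields `ROLLBACK` because T1, the owner of R1, is already waiting for a resource that T2 holds.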

    I am trying an example using JDK 1.5 Semaphores for deadlock recovery. I think if you keep one Semaphore instance per resource to lock, instead of using the conventional Java monitors (via synchronized), it should be possible to recover. Since Semaphores work on permits, they can be released by other threads, which makes deadlock recovery possible.
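
    The property this relies on is that `java.util.concurrent.Semaphore` does not require the releasing thread to be the one that acquired the permit. A minimal sketch of that property alone follows; the class `SemaphoreRecovery` and the thread names are hypothetical, and a real recovery scheme would still have to decide when a forced release is safe.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class SemaphoreRecovery {
    /** Returns true if the waiter obtained the permit after a third party released it. */
    static boolean demo() throws InterruptedException {
        Semaphore resource = new Semaphore(1);   // one permit guards one resource

        // The holder takes the permit and exits without ever releasing it.
        Thread holder = new Thread(resource::acquireUninterruptibly, "holder");
        holder.start();
        holder.join();

        boolean[] acquired = {false};
        Thread waiter = new Thread(() -> {
            try {
                acquired[0] = resource.tryAcquire(5, TimeUnit.SECONDS);
            } catch (InterruptedException ignored) {
                // leave acquired[0] as false
            }
        }, "waiter");
        waiter.start();

        // This supervising thread never acquired the permit, yet it may release one.
        resource.release();
        waiter.join();                           // join makes acquired[0] visible here
        return acquired[0];
    }
}
```

    Note that a forced release removes the mutual exclusion guarantee if the original holder is still working on the resource, so this is a recovery tool of last resort.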

    Let me try out this one :-)

    Thanks,
    Hemant
    www.pramati.com
  13. Hi Carl, It is possible to detect deadlocks in your application. In our container we provide this facility [...]
    This is amazing - do you do that automatically, or do you have to program it that way everywhere that you use synchronized?
  14. Hi Heinz,

    We do this inside our container for CMP and BMP entity beans when the container is locking. There is a deadlock detection algo that maintains the status of all threads and the resources they have acquired or requested, and detects deadlocks. There is some overhead when it is switched on.

    Hemant
  15. [Sorry, if this gets posted twice. It says, it has been posted but doesn't show up.]
    The concurrency handling mechanisms of Doug's book are now in the java.util.concurrent package of JDK 1.5.
    That's good and bad news in a way: while you can interrupt a thread that is blocked on a java.util.concurrent.ReentrantLock, your scheme doesn't work with them: the VM cannot detect deadlocks with ReentrantLocks involved. These are entirely implemented in the user level based on blocking primitives and atomic operations. When a thread blocks, the VM does not know "on what". So it can't help you.
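
    Both halves of this point can be illustrated with a short sketch. It assumes `ThreadMXBean.findDeadlockedThreads()`, which was added in Java 6 (after this thread was written) and also covers java.util.concurrent locks on VMs that support synchronizer monitoring. The class `LockDeadlockDemo` and the thread names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class LockDeadlockDemo {
    /** Starts a daemon thread that takes 'first', then blocks interruptibly on 'second'. */
    static Thread locker(String name, ReentrantLock first, ReentrantLock second,
                         CountDownLatch latch) {
        Thread t = new Thread(() -> {
            try {
                first.lockInterruptibly();
                try {
                    latch.countDown();
                    latch.await();               // both threads now hold their first lock
                    second.lockInterruptibly();  // blocks until acquired or interrupted
                    second.unlock();
                } finally {
                    first.unlock();
                }
            } catch (InterruptedException e) {
                // Interrupted while blocked; 'first' was released by the finally block.
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Returns true if the deadlock was detected and then broken by one interrupt. */
    static boolean detectAndBreak() throws InterruptedException {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = locker("rl-1", a, b, bothHoldFirst);
        Thread t2 = locker("rl-2", b, a, bothHoldFirst);

        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 100 && ids == null; i++) {
            Thread.sleep(50);
            ids = bean.findDeadlockedThreads();  // Java 6+; null when nothing is deadlocked
        }
        if (ids == null) {
            return false;
        }

        t1.interrupt();                          // rl-1 gives up and releases lock a
        t1.join(5000);
        t2.join(5000);                           // rl-2 can now take lock a and finish
        return !t1.isAlive() && !t2.isAlive();
    }
}
```

    On JDK 1.5 itself only `findMonitorDeadlockedThreads()` is available, which is why a deadlock of this kind would go unreported there.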

    I'll raise this issue again on concurrency-interest.

    Matthias
  16. [Sorry, if this gets posted twice. It says, it has been posted but doesn't show up.]
    The concurrency handling mechanisms of Doug's book are now in the java.util.concurrent package of JDK 1.5.
    That's good and bad news in a way: while you can interrupt a thread that is blocked on a java.util.concurrent.ReentrantLock, your scheme doesn't work with them: the VM cannot detect deadlocks with ReentrantLocks involved. These are entirely implemented in the user level based on blocking primitives and atomic operations. When a thread blocks, the VM does not know "on what". So it can't help you. I'll raise this issue again on concurrency-interest. Matthias
    Yes, as soon as life becomes easier with JDK 1.5, it becomes more difficult to detect problems with deadlocks, livelocks, starvation, etc. :-(
  17. Heinz,

    Did you know that some of the later 1.4.1 variants from Sun also contain an
    automatic deadlock detection utility? Generating a thread dump from a running
    process (Ctrl+Break under Windows or "kill -QUIT pid" under Solaris) dumps
    information on threads including deadlocked threads.

    On 1.4.2_04-b05, the sample output looks like (on a trivial example I wrote):

    Full thread dump Java HotSpot(TM) Client VM (1.4.2_04-b05 mixed mode):

    "Krish2" prio=5 tid=0x00a10828 nid=0xc3c waiting for monitor entry [2cdf000..2cdfd8c]
            at krish.Threader.two(Threader.java:26)
            - waiting to lock <0x10033f58> (a [I)
            - locked <0x10033f68> (a [I)
            at krish.MainApp$2.run(MainApp.java:16)

    "Krish1" prio=5 tid=0x00a106c8 nid=0xa58 waiting for monitor entry [2c9f000..2c9fd8c]
            at krish.Threader.one(Threader.java:18)
            - waiting to lock <0x10033f68> (a [I)
            - locked <0x10033f58> (a [I)
            at krish.MainApp$1.run(MainApp.java:9)

    "Signal Dispatcher" daemon prio=10 tid=0x009f9198 nid=0x348 waiting on condition [0..0]

    "Finalizer" daemon prio=9 tid=0x009c1090 nid=0xdc0 in Object.wait() [2b5f000..2b5fd8c]
            at java.lang.Object.wait(Native Method)
            - waiting on <0x10010498> (a java.lang.ref.ReferenceQueue$Lock)
            at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
            - locked <0x10010498> (a java.lang.ref.ReferenceQueue$Lock)
            at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
            at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x009bfc60 nid=0x31c in Object.wait() [2b1f000..2b1fd8c]
            at java.lang.Object.wait(Native Method)
            - waiting on <0x10010388> (a java.lang.ref.Reference$Lock)
            at java.lang.Object.wait(Object.java:429)
            at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
            - locked <0x10010388> (a java.lang.ref.Reference$Lock)

    "main" prio=5 tid=0x00034c90 nid=0xaac waiting on condition [0..7fc3c]

    "VM Thread" prio=5 tid=0x009f76f8 nid=0x964 runnable

    "VM Periodic Task Thread" prio=10 tid=0x009fb9b8 nid=0xcac waiting on condition
    "Suspend Checker Thread" prio=10 tid=0x009f8850 nid=0xc20 runnable

    Found one Java-level deadlock:
    =============================
    "Krish2":
      waiting to lock monitor 0x009c07cc (object 0x10033f58, a [I),
      which is held by "Krish1"
    "Krish1":
      waiting to lock monitor 0x009c07ac (object 0x10033f68, a [I),
      which is held by "Krish2"

    Java stack information for the threads listed above:
    ===================================================
    "Krish2":
            at krish.Threader.two(Threader.java:26)
            - waiting to lock <0x10033f58> (a [I)
            - locked <0x10033f68> (a [I)
            at krish.MainApp$2.run(MainApp.java:16)
    "Krish1":
            at krish.Threader.one(Threader.java:18)
            - waiting to lock <0x10033f68> (a [I)
            - locked <0x10033f58> (a [I)
            at krish.MainApp$1.run(MainApp.java:9)

    Found 1 deadlock.


    -krish
  18. Heinz, Did you know that some of the later 1.4.1 variants from Sun also contain an automatic deadlock detection utility?
    Hi Krish,

    Yes, I did know about that :-) However, it is not so convenient when you have to tell your customer: "If the application seems to stop responding, please go to the command prompt (assuming you have one) and press CTRL+Break. Then please select the output and email it to me."

    Firstly, customers get nervous when I tell them to do the above steps. Secondly, they would not necessarily know when to look for a deadlock. With this approach, as soon as a deadlock happens, you can get a notification.

    Kind regards

    Heinz
  19. Heinz, Did you know that some of the later 1.4.1 variants from Sun also contain an automatic deadlock detection utility? Generating a thread dump from a running process (Ctrl+Break under Windows or "kill -QUIT pid" under Solaris) dumps information on threads including deadlocked threads. [...]
    Heinz talked about that in his article:
    In the past, I would say to customers: "If the application stops responding, please go to the console and press CTRL+Break and then email me the threads that are printed on the screen." Now we can get notified automatically. Ohhhh, what joy!