Discussions

Performance and scalability: Performance trouble with a WebLogic 8.1 cluster

  1. I have a J2EE app running on WebLogic 8.1. On my own machine, with a single server, the app works fine and performance is no problem. But when we take it to the cluster environment (1 managed server, 2 nodes and a proxy), performance drops sharply.

    We create jobs in the application and save them. When we do this about 10 times in a row, at some point it just freezes for about 5-8 minutes. Normally a save takes only about 3 seconds.
    We tried looking at the GC output, but there is no garbage collection happening during that long freeze, or at least the server does not show any GC output.

    We have tried heap sizes of 1 GB and 512 MB, with no difference.
    The app has many CMP entity beans and stateless session beans, but no stateful session beans.

    Any tips on where I should start looking to fix this? Are there settings that should be made specifically for a cluster that we are missing?

    Thanks,
    Arun

    Threaded Messages (6)

  2. ... When we do this about 10 times consecutively, at one point it just freezes for about 5-8 minutes. Usually it would take only 3 seconds to complete. We tried looking at the GC output, but during that long freeze, there is no garbage collection happening...

    The way to find out is simple: while you are in the middle of one of these freezes, send the SIGQUIT signal to both of the JVMs you are using (on UNIX, type: kill -3 <pid>).

    If you then look at the application server console, you can see what is happening inside: the dump shows the activity of every thread in the JVM.
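
    A single dump only shows one instant, so it is usually worth taking a few dumps some seconds apart and comparing them: threads sitting in the same stack frame in every dump are the ones to look at. A minimal sketch, assuming a UNIX shell. The function name take_dumps, the DRY_RUN switch and the defaults (3 dumps, 10 seconds apart) are illustrative choices, not something from this thread:

```shell
# take_dumps PID [COUNT] [INTERVAL_SECONDS]
# Sends SIGQUIT (kill -3) to a JVM COUNT times, INTERVAL_SECONDS apart.
# The JVM keeps running and writes each thread dump to its own output.
# With DRY_RUN=1 the commands are only printed (hypothetical switch).
take_dumps() {
  pid=$1
  count=${2:-3}
  interval=${3:-10}
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kill -3 $pid"
    else
      kill -3 "$pid" || return 1
    fi
    # No need to wait after the last dump.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}
```

    During a freeze you would run take_dumps <pid> once for each managed server JVM.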


    Jose Ramon Huerga
    http://www.terra.es/personal/jrhuega
  3. The kill -3 will definitely help: a good thread dump shows what you are hanging on at that moment (make sure to look in the node manager logs for the output). It will not kill the server. A 5-8 minute freeze sounds far out of the norm, except in cases of high memory utilization, a huge heap and a full GC.
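
    To locate the dumps in a log afterwards, it helps that a Sun HotSpot JVM starts each one with a line beginning "Full thread dump". A small sketch. The function name find_dumps is made up, and the log file is passed as an argument because the node manager log location depends on your installation:

```shell
# find_dumps LOGFILE
# Prints "line-number:text" for the first line of every HotSpot thread dump
# found in LOGFILE (other JVMs may use a different header line).
find_dumps() {
  grep -n "Full thread dump" "$1"
}
```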

    You can use the following JVM options to determine the maximum length of any GC, rather than relying on the WLS console for this measure:

    -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -Xloggc:/LOGDIR/LOGNAME.out -verbose:gc
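
    One way to pass these options, sketched under an assumption: that the server is launched from a start script which hands a JAVA_OPTIONS variable to the java command. Check your own script for the variable it really uses. For a server started by the node manager, the same options would go into that server's start arguments instead. GC_LOG and GC_OPTS are placeholder names:

```shell
# Assumption: the start script appends $JAVA_OPTIONS to the java command line.
# /LOGDIR/LOGNAME.out is the placeholder path from the post; use a real path.
GC_LOG=/LOGDIR/LOGNAME.out
GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
GC_OPTS="$GC_OPTS -XX:+PrintTenuringDistribution -Xloggc:$GC_LOG"
# Keep whatever options were already set and add the GC logging ones.
JAVA_OPTIONS="${JAVA_OPTIONS:-} $GC_OPTS"
export JAVA_OPTIONS
```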

    Then download the Sun PrintGCStats shell script and run it against your GC output as follows (assuming a 4-CPU box):

    PrintGCStats -v ncpu=4 LOGNAME.out

    You will want to look for the GC line that covers the stop-the-world full GCs. If its max matches the length of your freeze, then you know a GC is the culprit:

    what     count     total      mean      max    stddev
    GC(s)     2559   210.014   0.08207    5.897    0.3179
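
    If the PrintGCStats script is not at hand, a rough cross-check is to pull the longest full GC pause straight out of the log. This is only a sketch: it assumes each "Full GC" entry sits on one line and ends with "<seconds> secs]", which you should verify against your own log. The function name longest_full_gc is made up:

```shell
# longest_full_gc GCLOG
# Prints the longest "Full GC" pause in seconds, or "none" if the log
# contains no Full GC line. The assumed line layout is described above.
longest_full_gc() {
  awk '
    /Full GC/ {
      t = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^secs/) {
          v = $(i - 1)
          sub(/^[^0-9]*/, "", v)   # drop any prefix such as "real="
          t = v + 0                # the last match on the line wins
        }
      }
      if (t > max) max = t
      found = 1
    }
    END {
      if (found) printf "%.3f\n", max
      else print "none"
    }
  ' "$1"
}
```

    Run as longest_full_gc LOGNAME.out. A result of a few seconds, as in the sample line above, would not explain a freeze of several minutes.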

    This should help.

    Note that it will not work with JRockit.

    Terry Trippany
    Chicago
  4. Will try

    Thanks for the tips ... will give it a try.
  5. Will try

    If the thread dump does not help, try a profiler (Optimizeit, JProbe, ...).
  6. Will try

    I think you are far better off trying PerformaSure. I guess JProbe and Optimizeit are single-instance profiling tools...
    It worked for us: we had a somewhat similar problem, and PerformaSure told us the cause of the issue.
  7. Will try

    I think you are far better off trying PerformaSure. I guess JProbe and Optimizeit are single-instance profiling tools... It worked for us: we had a somewhat similar problem, and PerformaSure told us the cause of the issue.

    Yeah, this is true: JProbe is designed to be used on just one instance, and under a very low load. To see what is happening inside the application under a heavy load test, you need PerformaSure.


    Jose R. Huerga

    http://www.terra.es/personal/jrhuerga