How We Solved our Garbage Collection Pausing Problem


  1. Greg Luck was suffering a nine second GC pause every fifty seconds, when Sun suggested some garbage collection tunings that solved the problem (down to two major GCs every day, he says, in "How We Solved our Garbage Collection Pausing Problem.") The tunings were pretty simple:
    -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewSize=1200m -XX:SurvivorRatio=16
    As Greg says:
    -XX:+DisableExplicitGC - some libs call System.gc(). This is usually a bad idea and could explain some of what we saw.
    -XX:+UseConcMarkSweepGC - use the low pause collector
    -XX:NewSize=1200m -XX:SurvivorRatio=16 - the black magic part. Tuning these requires empirical observation of your GC log, either from verbose gc or jstat (a JDK 1.5 tool). In particular the 1200m new size is 1/4 of our heap size of 4800MB.
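    To make the "black magic" numbers a little more concrete, here is a minimal sketch of the arithmetic behind -XX:NewSize and -XX:SurvivorRatio. It assumes the usual HotSpot convention that SurvivorRatio=N makes eden N times the size of one survivor space, with two survivor spaces in the young generation. The class and method names are hypothetical, and a real JVM aligns and rounds these sizes, so treat the results as estimates only.

```java
/** Hypothetical helper (not part of any JDK API): estimates young-generation layout. */
final class YoungGenSizing {

    /**
     * Approximate size of ONE survivor space. With SurvivorRatio=N, eden is N times
     * the size of a survivor space and there are two survivor spaces, so each
     * survivor space is roughly young / (N + 2).
     */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Approximate eden size: whatever the two survivor spaces leave over. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        long heap = 4800 * mb;   // total heap size from the quoted post
        long young = 1200 * mb;  // -XX:NewSize=1200m
        int ratio = 16;          // -XX:SurvivorRatio=16

        System.out.println("young / heap  = " + (double) young / heap);
        System.out.println("survivor (MB) = " + survivorBytes(young, ratio) / mb);
        System.out.println("eden (MB)     = " + edenBytes(young, ratio) / mb);
    }
}
```

    Under these assumptions, 1200 MB split 16:1:1 leaves each survivor space at roughly 1200/18 MB (about 67 MB) and eden at roughly 1067 MB.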
    This is all documented in "Tuning Garbage Collection with the 1.4.2 Java Virtual Machine," which has a 1.3 version and 1.5 version. Mr. Luck didn't mention which JDK he was running, although his reference to jstat suggests it was 1.5. If it wasn't 1.5, his choice of -server or -client mode might also have been a factor, because -server changes how memory is allocated to Eden and the survivor spaces.
    It's very common to need tuning of this sort in long-running Java processes, particularly in J2EE applications. It's so common that it should be part of the tuning checklist of every application, along with a few other gems like "don't use reflection unless you really have to, and if you have to, cache the method references!"
    That said, do you know how to measure GC times and tune them? Mr. Luck refers to Sun engineers looking at the verbose garbage collection logs and offering the settings based on those. Is there a deterministic (or mostly deterministic) way to determine good settings for these? Could a lack of garbage collection tuning be part of J2EE's undeserved reputation for slow runtime performance?
    Message was edited by: joeo at enigmastation dot com to add the 1.5 GC tuning page, pointed out by Hack Kampbjørn
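    On the question of measuring GC times: besides the verbose GC log and jstat mentioned above, a JVM from 1.5 onward exposes cumulative collector counters through java.lang.management. The following is a minimal sketch, not a tuning recipe; the class name GcStats and its helper methods are made up for illustration, and the collector names printed depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: sums the cumulative GC counters the JVM exposes. */
final class GcStats {

    /** Total collections across all collectors since JVM start. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector, typically one for the young and one for the old generation.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections = " + totalCollections());
        System.out.println("total GC time ms  = " + totalCollectionMillis());
    }
}
```

    Sampling these totals at intervals and taking the difference gives a rough GC-time-per-interval figure that can be compared before and after changing a flag.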

    Threaded Messages (21)

  2. These settings are of course specific to his workload. We had a problem with the JVM occasionally taking all the CPU for minutes at a time, days or weeks apart. The hardest part of troubleshooting was establishing that it was garbage collection. I started reading the tuning document for 1.5, but I should have started with the ergonomics document. Our solution was as simple as adding:
    -XX:MaxGCPauseMillis=5000
    http://java.sun.com/docs/hotspot/gc5.0/ergo5.html
    http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
  3. Ah, thank you for the 1.5 GC link! I looked but couldn't find it - so much for omniscience, eh. :)
  4. Essential reference

    In addition to the 'tuning' page, check out this for GC tuning. http://blogs.sun.com/roller/resources/watt/jvm-options-list.html
  5. Could a lack of garbage collection tuning be part of J2EE's undeserved reputation for slow runtime performance?
    It's rarely the problem. Usually it's bad code and bad design ideas that have been pervasive in the Java world.
  6. I don't know that I'd say it's "rarely" the problem - because in my experience, tuning memory is the lowest-hanging fruit in almost every performance-tuning exercise. To be sure, fixing awful code (e.g., references, often reading system resources that change rarely, poor algorithms) is also a huge win, but saying GC is rarely the problem is not my experience.
  7. I don't know that I'd say it's "rarely" the problem - because in my experience, tuning memory is the lowest-hanging fruit in almost every performance-tuning exercise. To be sure, fixing awful code (e.g., references, often reading system resources that change rarely, poor algorithms) is also a huge win, but saying GC is rarely the problem is not my experience.
    What I am saying is that a lot of Java programs produce way more long-lived garbage than is reasonable. Looking at the article, I have several questions. Why was a major GC running every 50 seconds? That's not normal. It was probably caused either by something calling System.gc() or by not giving the VM enough 'breathing room' to grow without running a GC. Actually, in the latter case you still need to be careful with the concurrent GC because it can give false positives for OOM errors (there's another GC setting to cope with that, but it's probably just easier to make the max heap a good bit larger than what is strictly needed). Did the author try out the settings separately? Perhaps the DisableExplicitGC setting really fixed the issue. I can't tell from the article.
    Having said that, I'm a big fan of the concurrent GC for anything that needs consistent response time (I use it with Eclipse, for example), and sometimes these issues are out of your hands because they are in the app server, not your code. But the point is that GC problems are *usually* a symptom, not the root cause. That's my experience anyway.
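    If you do want to try the settings separately, it helps to record which flags each run actually used. The sketch below, with a hypothetical class name and a deliberately loose filter, reads the JVM's own startup arguments through RuntimeMXBean.getInputArguments() so each test run can log its configuration next to its results. Note that the "-XX:" prefix matches all HotSpot options, not only GC ones.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports the tuning flags a JVM was started with. */
final class GcFlagReport {

    /** Keeps -XX: options, heap sizing options and -verbose:gc; drops everything else. */
    static List<String> gcRelatedFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<String>();
        for (String arg : jvmArgs) {
            boolean heapSizing = arg.startsWith("-Xmx")
                    || arg.startsWith("-Xms")
                    || arg.startsWith("-Xmn");
            if (arg.startsWith("-XX:") || heapSizing || arg.equals("-verbose:gc")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Tuning flags for this run: " + gcRelatedFlags(jvmArgs));
    }
}
```

    Running the same workload once per flag, and logging this line each time, makes it easier to tell whether -XX:+DisableExplicitGC alone accounts for the improvement.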
  8. But the point is that GC problems are *usually* a symptom, not the root cause. That's my experience anyway.
    Agile developers are supposed to take the shortest path to meet today's requirements. How often does finding and correcting the root cause meet that criterion? While the engineer in me despises the idea of not correcting the root cause, there is a subtle beauty to the agile logic.
  9. But the point is that GC problems are *usually* a symptom, not the root cause. That's my experience anyway.


    Agile developers are supposed to take the shortest path to meet today's requirements.

    How often does finding and correcting the root cause meet that criterion?

    While the engineer in me despises the idea of not correcting the root cause, there is a subtle beauty to the agile logic.
    That's really neither here nor there. The question was "Could a lack of garbage collection tuning be part of J2EE's undeserved reputation for slow runtime performance?" and I say no, because the lack of tuning is not usually the cause of the poor performance. Tuning is a way to resolve the symptoms. Saying that the lack of tuning is the cause is like saying that my car's axle is damaged because the mechanics have not fixed it. The mechanics' lack of action didn't damage the axle. It was my running over a dead deer that did it.
  10. While the engineer in me despises the idea of not correcting the root cause, there is a subtle beauty to the agile logic.
    That usually comes in the form of a contract extension to fix the hidden bugs, right? :)
  11. But the point is that GC problems are *usually* a symptom, not the root cause. That's my experience anyway.


    Agile developers are supposed to take the shortest path to meet today's requirements.

    How often does finding and correcting the root cause meet that criterion?

    While the engineer in me despises the idea of not correcting the root cause, there is a subtle beauty to the agile logic.
    I'm going to disagree that tuning the GC to alleviate the symptoms of some other problem would qualify as part of the Agile development process. In the case of a product going live, if this is necessary, then so be it. This is purely a pragmatic solution however, and does not address the problem in an acceptable manner. Sometimes pragmatic solutions are necessary, but I'd expect any decent engineering effort to eventually address issues that cause the VM to hang due to long periods of garbage collection. Revisiting code in order to fix this issue would, in my opinion, qualify as agile.
  12. OT: reflection

    It's so common that it should be part of the tuning checklist of every application, along with a few other gems like "don't use reflection unless you really have to, and if you have to, cache the method references!"
    Sorry, completely OT, but my belief is that reflection is a lot less of a problem in JRE 1.4+ than it was up to JRE 1.3. OK, for serious tuning, yes I agree. But I'm not sure it is the performance issue it once was. I am prepared to be proved wrong though.
    Kit
  13. Re: OT: reflection

    Well, compared to 1.3, yes, 1.4+ is far improved. However, looking at the context switch involved in using native code - which reflection does - it's very expensive, no matter HOW improved it is, compared to caching method references. Even those can go through the security manager, which isn't a good idea, performance-wise.
  14. Re: OT: reflection

    Re: reflection speed: depends on what "very expensive" means to you. Reflection method calls are slow compared to direct method calls. If I do a method lookup and call I can do only about a million calls per second on my laptop. If I cache the "method pointer" I can do 10,000,000 reflection calls per second. Test program somewhere in here: http://forum.java.sun.com/thread.jspa?threadID=669560&messageID=3915259 -- be sure to use "java -server", there is a 6x difference between -server and -client.
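    For readers who cannot reach the linked test program, the comparison described above can be sketched roughly as follows. This is not the program from that thread; the class name, method names, and the choice of String.length() as the target are assumptions made for illustration, and the timings are only indicative because this is not a rigorous benchmark.

```java
import java.lang.reflect.Method;

/** Hypothetical micro-comparison: reflective lookup on every call vs. a cached Method. */
final class ReflectionCallDemo {

    /** Looks up String.length() on every iteration, then invokes it. */
    static long lookupEveryCall(String target, int iterations) {
        try {
            long sum = 0;
            for (int i = 0; i < iterations; i++) {
                Method length = String.class.getMethod("length");
                sum += ((Integer) length.invoke(target)).intValue();
            }
            return sum;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Looks up String.length() once and reuses the cached Method reference. */
    static long cachedMethod(String target, int iterations) {
        try {
            Method length = String.class.getMethod("length");
            long sum = 0;
            for (int i = 0; i < iterations; i++) {
                sum += ((Integer) length.invoke(target)).intValue();
            }
            return sum;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int n = 1000000;
        // Warm-up pass so the JIT has a chance to compile both paths before timing.
        lookupEveryCall("hello", n);
        cachedMethod("hello", n);

        long t0 = System.nanoTime();
        lookupEveryCall("hello", n);
        long t1 = System.nanoTime();
        cachedMethod("hello", n);
        long t2 = System.nanoTime();

        System.out.println("lookup every call: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("cached Method:     " + (t2 - t1) / 1000000 + " ms");
    }
}
```

    As the post notes, results vary a great deal between "java -server" and "java -client", so run it under the mode your application actually uses.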
  15. is that how YOU do reflection?

    I guess if you are committed to Apache's bean.getProperty you won't be caching. Any model seriously dependent on reflection would cache the method references. But honestly, your reference to a "context switch" and "native code" is highly suspect. Why does Java need to call native code to look up a Java method, even if it's not cached in its internal object representation of the instance? "Native code" just to bypass type checking? I think not. A context switch is the term for when a CPU has to switch threads or processes and save the registers off to memory in order to switch tasks. Since the VM does no native IPC, I can't see that that would ever really happen in a Java app. Do you know the VM well enough to say that's actually true? Given the amount you post on here, I hope you know what you're talking about, man. Cuz that sounds like FUD to me.
  16. how about profiling your app

    Any app that is GCing that much is poorly conceived, written and deployed. Given that most Java apps are written by people who have mediocre concepts of object lifetime, session mgmt and threading, and the effect of newbies "newing" objects all the time, I really think it's disingenuous to post an article about your GC probs without discussing the actual code. It sounds like you are asking for everyone else's help to make your app not suck. Is that what TSS is about now? The articles on TSS seem to be getting worse and worse in favor of Java newbies on a daily basis.
  17. Re: how about profiling your app

    Any app that is GCing that much is poorly conceived, written and deployed. Given that most Java apps are written by people who have mediocre concepts of object lifetime, session mgmt and threading, and the effect of newbies "newing" objects all the time, I really think it's disingenuous to post an article about your GC probs without discussing the actual code.
    I've been repeatedly surprised by what shows up in tenured space in the Sun JVM .. some of our shortest lived objects were showing up there, for example. Understanding why (in this case because exactly one tenured object had a short term reference to the short-term object) allowed us to make minor code changes that increased throughput on some of our benchmarks by 10%. Just to be clear, I'm not bragging .. quite the opposite in fact: It's [unfortunately] very easy to end up with GC behavior that is explainable with an effort of research, optimizable [fortunately] with little effort, but generally the behavior is unknown because of the lack of such methodical inquisition. This particular "GC issue" of ours had been in the software from day one, and that despite the fact that we do regular code reviews, profiling, GC testing and tuning, etc. I guess what I'm saying is that I wouldn't be so harsh, having seen how easy it is to have such issues ;-) Peace, Cameron Purdy Tangosol Coherence: Clustered Shared Memory for Java
  18. I guess what I'm saying is that I wouldn't be so harsh, having seen how easy it is to have such issues ;-)
    I just want to clarify that I agree with the two previous posts and that my point was not that GC tuning isn't ever strictly necessary, but that for most Java applications the default setup is more than sufficient, and the well-known 'perceived Java performance' problem is related to the performance of old JVM technology and some particularly poor design philosophies that have been (less so lately) popular in the Java world, along with some individual algorithmic ignorance.
  19. I guess what I'm saying is that I wouldn't be so harsh, having seen how easy it is to have such issues ;-)
    I just want to clarify that I agree with the two previous posts and that my point was not that GC tuning isn't ever strictly necessary, but that for most Java applications the default setup is more than sufficient, and the well-known 'perceived Java performance' problem is related to the performance of old JVM technology and some particularly poor design philosophies that have been (less so lately) popular in the Java world, along with some individual algorithmic ignorance.
    Yeah, no kidding! I still get to read (on slashdot) how Java is slow because it is interpreted ;-) Peace, Cameron Purdy Tangosol Coherence: Clustered Shared Memory for Java
  20. Re: how about profiling your app

    I agree it's easy to make mistakes, and that's all they are: simple mistakes. I don't agree that this is the forum for people who make those easy mistakes to post FUD about the garbage collector without posting the code that caused the problem once they figure it out. I'm not a JVM snob. But I know that when you find a problem in your code, it's your business. When you find a problem in Sun's code, then post it AND the code, so we can actually appreciate solving it and move on.
  21. Re: how about profiling your app

    Any app that is GCing that much is poorly conceived, written and deployed
    I find this to be a rather disingenuous statement. For example, the aforementioned System.gc() could be happening, and a simple code change together with proper GC tuning can eliminate the problem forever. Does this speak to the rest of the app's overall quality? No way. Good GC tuning is, at any rate, a process that must be managed by advanced developers with fairly broad knowledge of the many factors involved. I suspect this thread is in fact quite useful to a large number of our readers. Thank you, TSS! Cheers, Gideon
  22. Re: how about profiling your app

    System.gc() should never be used in an app. Everyone knows that. System.gc() == figure out your problem and stop micromanaging the VM. This many years into Java you're still calling System.gc()? C'mon man!