November 2007
Kirk Pepperdine: I'm at JavaZone in Oslo, Norway, speaking with Holly Cummins from IBM. Holly, can you introduce yourself to the TSS readership?
Holly Cummins: Sure, I work in the Java Technology Center at the Hursley site, and that's where we do quite a lot of the work on our Java Virtual Machine, and it's also where we're increasingly doing a lot of work on the tooling for the Java Virtual Machine. I've been in that department for about three years and I do garbage collection research and performance research and also, increasingly, tooling development.
Kirk Pepperdine: There really haven't been a lot of tools available to the general Java population for dealing with the Java Virtual Machine, the IBM version. Is that changing now?
Holly Cummins: It is, yes. We are bringing out quite a few new tools and we're increasing the size of our tooling team. I think, in fact, there had been quite a lot of tools available in the past for IBM Java Virtual Machines, it's just not always been obvious where to find them because sometimes we've had multiple tools doing quite similar things.
Kirk Pepperdine: So, how are they addressing this now? How are you letting people know where these things are?
Holly Cummins: Well, we've got a new one-stop shop for support which is called the IBM Support Assistant. And the idea is that instead of having to go to multiple sources and trying to work out what you really need, you download the IBM Support Assistant, and that's a single download, and then you tell it what products you've got by installing the product plug-ins, and then it gives you a list of tools which might be applicable in your situation. It also gives you all the relevant documentation, so you don't have to try and search on the web for the documentation; it just comes right up.
Kirk Pepperdine: So with this service, if I download and install the core JDK, then what kind of tools will the Support Assistant provide?
Holly Cummins: It will give you a few. The one that I'm the lead developer for is EVTK. It's a verbose GC analysis tool, so it will do the visualization of the verbose GC output, obviously, but it will also do an increasing amount of analysis around it and give you recommendations. It will say you should maybe consider increasing your heap size, or that something doesn't look quite right in your system. One of the problems people often have is that they inherit code that calls System.gc() a lot, and it's just destroying performance. So we look for that kind of situation, because that's what our support people often see: we get called in because the performance is terrible, then they look at the code and realize that it's calling System.gc(), but it can be quite time consuming to discover that. So the idea is that EVTK will instead flag it up on the first pass of the verbose GC log and say, 'Hey, this might be a problem that you should investigate.'
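The check described above, spotting frequent explicit collections in a verbose GC log, can be sketched roughly as follows. This is a minimal illustration, not how EVTK is implemented: the log lines and the `trigger=explicit` marker are hypothetical stand-ins for whatever a real verbose GC log records for a `System.gc()` call, and the 25% threshold is an arbitrary assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Counts explicit collections in a verbose GC log (illustrative sketch only). */
final class ExplicitGcCheck {

    /** Counts the lines that contain the marker used for an explicit collection. */
    static long countExplicitGcs(List<String> logLines, String marker) {
        return logLines.stream().filter(line -> line.contains(marker)).count();
    }

    /** Flags the log when more than maxShare of all collections were explicit. */
    static boolean looksSuspicious(long explicitGcs, long totalGcs, double maxShare) {
        return totalGcs > 0 && (double) explicitGcs / totalGcs > maxShare;
    }

    public static void main(String[] args) {
        // Hypothetical log lines; a real verbose GC log has a richer format.
        List<String> log = Arrays.asList(
                "gc trigger=allocation-failure pause=12ms",
                "gc trigger=explicit pause=310ms",
                "gc trigger=explicit pause=295ms",
                "gc trigger=allocation-failure pause=11ms");
        long explicit = countExplicitGcs(log, "trigger=explicit");
        System.out.println("explicit collections: " + explicit);
        System.out.println("investigate System.gc() calls: "
                + looksSuspicious(explicit, log.size(), 0.25));
    }
}
```

The marker is passed in as a parameter so the same sketch could be pointed at whatever string a particular log format actually uses.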
Kirk Pepperdine: In terms of the garbage collection models, memory models within the IBM virtual machine, up until 1.4 they were quite divergent from the Sun generational model and the JRockit generational model. Can you describe to us what's changed and maybe some of the reasoning behind those changes?
Holly Cummins: Sure, as of 1.5 we've got three or four policies, depending on how you count it. We've got the standard mark-sweep, concurrent. We've got the standard mark-sweep, not concurrent. And we do find that in most situations that does actually give better throughput than the generational policy, which is why it is the default.
But in 1.5, we do introduce the generational model, and it's just got two generations, because we found when we did our performance tuning that you got the greatest benefit from generations by having just two. And some of the downsides of a generational policy you avoided by having just two generations. So in the younger generation you've got a copying collector, which is extremely fast, and gives you very fast allocation and very fast application access. And then in the older generation it's a concurrent collector. So we found with our gen-con policy the pause times tend to be quite short, because the nursery is pretty small, so it collects pretty quickly, and then if you do a full collection, it's still a pretty quick collection because it's concurrent, and so it's running in the background for most of the duration of the collection.
Kirk Pepperdine: Right, so you're talking about a two generational space now, right? So the first generation you're calling a ‘nursery'?
Holly Cummins: Yeah, we call it a nursery.
Kirk Pepperdine: …and the older generation you're calling…
Holly Cummins: the tenured area.
Kirk Pepperdine: …the tenured area. So how long do your objects actually stay in the nursery before they get moved to the tenured area?
Holly Cummins: It's configurable. I think the default is 14 collections, although the JVM will do a bit of tuning about it as well, and so if the nursery is quite small then there may be a bit more pressure on the nursery, and a few objects may get promoted earlier. And again, if you've got quite a large nursery, then it's going to try to avoid promoting objects if there is space within the nursery.
Kirk Pepperdine: Is this a slightly different scenario from the standard Sun model, where you have a young generation with Eden, which is the equivalent of the nursery, and the survivor spaces, which is sort of a hemisphere type arrangement? This has led to some strategies in the Sun environment. Do you find that these strategies of making the young generation large and resizing the survivor spaces translate over to the IBM approach, or what's the consequence of copying and compaction in the nursery, as opposed to moving?
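The promotion rule described here, where an object moves to the tenured area once it has survived a set number of nursery collections, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: every tracked object survives every collection, the threshold is fixed rather than tuned by the JVM, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of age-based promotion from a nursery to a tenured area. */
final class TenureAgeSketch {
    private final int tenureAge;                                  // collections to survive before promotion
    private final List<Integer> nurseryAges = new ArrayList<>();  // age of each live nursery object
    private int promoted = 0;                                     // objects moved to the tenured area

    TenureAgeSketch(int tenureAge) {
        this.tenureAge = tenureAge;
    }

    /** New objects start in the nursery with age 0. */
    void allocate() {
        nurseryAges.add(0);
    }

    /** One nursery collection; for simplicity every tracked object survives it. */
    void collectNursery() {
        List<Integer> stillInNursery = new ArrayList<>();
        for (int age : nurseryAges) {
            int newAge = age + 1;
            if (newAge >= tenureAge) {
                promoted++;                  // old enough: promote to tenured
            } else {
                stillInNursery.add(newAge);  // copied within the nursery again
            }
        }
        nurseryAges.clear();
        nurseryAges.addAll(stillInNursery);
    }

    int nurseryCount() {
        return nurseryAges.size();
    }

    int tenuredCount() {
        return promoted;
    }

    public static void main(String[] args) {
        TenureAgeSketch heap = new TenureAgeSketch(14);
        heap.allocate();
        for (int i = 0; i < 13; i++) {
            heap.collectNursery();
        }
        System.out.println("after 13 collections, in nursery: " + heap.nurseryCount());
        heap.collectNursery();
        System.out.println("after 14 collections, tenured: " + heap.tenuredCount());
    }
}
```

With a threshold of 14, the single object is copied within the nursery 13 times and promoted on the 14th collection, which is also why long-lived objects are expensive for a copying nursery.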
Holly Cummins: Yeah, it's really interesting, it depends so much on the application that it makes it so hard to make general statements, which is quite sad. I think if there was one answer for a tuning strategy, then we'd have it in the JVM, and nobody would ever have to worry about it again. But depending on the application, either the standard mark-sweep will go faster, or the generational will go faster, so it's about fifty-fifty which goes faster.
And then within the generational policy, if we take one application and we do an experiment where we change the size of the nursery, so we start with a really small nursery and we go to, you know, a huge nursery, which is most of the heap, then sometimes you'll find there's a performance dip with a midsize nursery. So with a really small nursery you're doing really well, and with a large nursery you're doing really well, for different reasons. With a small nursery you get extremely rapid collections, but then you tend to end up with a lot of objects in the old area having links to the nursery area, so that can slow things down. If you've got quite a large nursery area, the cost of the write barrier is less, but you get a different problem: you end up with quite slow collections in the nursery. On the other hand, you get very fast allocation and infrequent collections. And so, for one application, the middle size nursery is terrible.
Small nurseries are great and big nurseries are great. But then, with another application, you'll find that a middle size nursery is great. If you go really small, your performance drops off, and if you go really large, your performance drops off, so it's easy to articulate the factors which affect the performance, but which ones are going to be the most relevant for your application depends on how rapidly it allocates objects, how long they survive, and the patterns of object connections as well.
Kirk Pepperdine: Do you find that if an application is holding on to objects for a longer period of time, on average, than another application, does a smaller tenured space work better, or a bigger tenured space?
Holly Cummins: Probably a bigger nursery with a smaller tenured space would be better in that case. Or it might be that switching away from generational entirely, to a more standard mark-sweep model, would be best, because the generational policies tend to work best if you've got a high rate of object churn: you create a flurry of objects all at once, and then they live for a while and then they die. In that case the cost of collecting them is free, because it is a copying collector. That kind of situation is going to work really well with generational.
If you get objects that tend to live for a long time, that can be quite unfortunate in generational collection because, first of all, while they're still in the nursery you have to copy these objects back and forth, and that's just wasteful, and when they get to the tenured area they're going to hang around, and then they're just going to clutter up your object map and make your collections slower.
Kirk Pepperdine: So you'd want to move away from a generational model in that case?
Holly Cummins: Yeah, it would be a good candidate.
Kirk Pepperdine: Would IBM actually support moving away from a generational memory model to just a flat memory model, like what existed prior to 1.5?
Holly Cummins: Yeah, you'd have to restart the JVM. You couldn't do it on the fly.
Kirk Pepperdine: Right. So in this case, all of the things you'd do before, in terms of having the spaces and wilderness and large object space would apply.
Holly Cummins: Yeah.
Kirk Pepperdine: Ok. So you really haven't dropped the old model, you've just augmented by saying ok, we can use this different model on top of the older model.
Holly Cummins: Yeah, we've got the existing policies, then we've got the new policy which is the gen-con and then we've also got a real time policy as well and that doesn't ship with the standard JVM, it's a separate product called WebSphere Real Time and it's only on limited architectures in that it needs kernel support and that kind of thing. But that's also another option.
Kirk Pepperdine: So what does the real time give you then, say, over the standard shipment?
Holly Cummins: It gives you guaranteed short pauses, and a guaranteed utilization as well. Traditionally, what some real time strategies have tried to do is go for the shortest pauses possible, and they thought, great, we've got short pauses, mission accomplished. But what they haven't really been aware of is that you have a short pause and then you have another short pause and then another short pause and another short pause, and you may end up with short pauses that are so tightly clustered that it is as if you had a long pause, and your application response times are not going to meet the targets, and you know, your…
Kirk Pepperdine: You get a stuttering effect in that case, which is…
Holly Cummins: Yeah, and, so what the IBM Real Time does, it doesn't necessarily… people sometimes think of Real Time as being really fast, of course, it's not, it's…
Kirk Pepperdine: Predictability.
Holly Cummins: Exactly. And so it's a guaranteed utilization. So you take any time slice, down to, say, a lower limit, and you say, well, OK, within these hundred milliseconds, I want to be sure that my application has 50% of the time. And then it says, yeah, fine, guaranteed. And then you take it smaller, if your real time constraints are even tighter: OK, within these 50 milliseconds I want 50% of the time for the application, and again, it will meet that target.
Kirk Pepperdine: So if it's collecting, it will stop after 50 milliseconds and your application runs again?
Holly Cummins: Yeah, the pause time won't necessarily be 50 milliseconds in that scenario. It might do a 5 millisecond pause, and then run your application for 5 milliseconds, and then another 5 millisecond pause. And so the idea is that the pauses are extremely short. And within any particular time period, the utilization is guaranteed as well.
Kirk Pepperdine: What type of customers do you find are using the real time solutions?
Holly Cummins: A lot of the finance industry are quite keen on it. It's quite good for things where you interface with hardware, and so something that maybe traditionally would have been C, there's now Java instead.
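The difference between short pauses and guaranteed utilization can be made concrete by measuring the worst-case application share over every window of a given length. The sketch below assumes a timeline with one entry per millisecond, `true` meaning the collector has the processor; the class, method names, and timelines are invented for illustration and say nothing about how WebSphere Real Time schedules its collector.

```java
/** Worst-case application utilization over sliding windows (illustrative sketch). */
final class UtilizationSketch {

    /** Returns the lowest application share found in any window of windowMs milliseconds. */
    static double minUtilization(boolean[] pausedAtMs, int windowMs) {
        if (windowMs <= 0 || windowMs > pausedAtMs.length) {
            throw new IllegalArgumentException("window must fit inside the timeline");
        }
        int pausedInWindow = 0;
        for (int i = 0; i < windowMs; i++) {
            if (pausedAtMs[i]) pausedInWindow++;
        }
        int worst = pausedInWindow;
        // Slide the window one millisecond at a time, updating the paused count.
        for (int end = windowMs; end < pausedAtMs.length; end++) {
            if (pausedAtMs[end]) pausedInWindow++;
            if (pausedAtMs[end - windowMs]) pausedInWindow--;
            worst = Math.max(worst, pausedInWindow);
        }
        return 1.0 - (double) worst / windowMs;
    }

    /** Builds a timeline that alternates pauseMs of collection with runMs of application time. */
    static boolean[] alternating(int totalMs, int pauseMs, int runMs) {
        boolean[] timeline = new boolean[totalMs];
        for (int i = 0; i < totalMs; i++) {
            timeline[i] = (i % (pauseMs + runMs)) < pauseMs;
        }
        return timeline;
    }

    public static void main(String[] args) {
        boolean[] interleaved = alternating(100, 5, 5);   // 5 ms pause, 5 ms run, repeated
        boolean[] clustered = alternating(100, 50, 50);   // one 50 ms pause, then 50 ms run
        System.out.println("interleaved, 50 ms window: " + minUtilization(interleaved, 50));
        System.out.println("clustered,   50 ms window: " + minUtilization(clustered, 50));
    }
}
```

Both timelines spend half of the 100 milliseconds collecting, but the interleaved one keeps 50% utilization in every 50 millisecond window, while the clustered one has a window in which the application gets nothing. Over a 5 millisecond window even the interleaved timeline drops to zero, which is why a utilization guarantee only holds down to some lower limit on the window size.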
I'm not sure how many real deployments there are yet, I think it's still at the testing stage, because it's quite a new technology and it tends to be in the sort of real time environments that tend to be high stakes environments, so you want to test it out pretty thoroughly for a while yet.
Kirk Pepperdine: Right.
Holly Cummins: We're looking at it for things like music synthesis as well, because if you're doing music synthesis it's so audible if you get a stutter, and so with the real time we can show that you can actually get this going smoothly, which is quite nice.
Kirk Pepperdine: Yeah, it's the stutters that really kill the, the user experience in this case.
Holly Cummins: Yeah.
Kirk Pepperdine: Do you find any interest from the gaming community then, because they sort of have these really hard rendering problems.
Holly Cummins: Yeah, I expect there is. I don't know of any personally, but I expect there is.
Kirk Pepperdine: Is there much more performance hit for using the real time extensions for the Virtual Machine?
Holly Cummins: The performance is definitely not as good. I don't know the actual figures, but there's a performance hit which is why real time is not for everybody, and you tend to have two kinds of real time constraints as well.
For some applications, the consequences of failure are so catastrophic that you've absolutely got to meet the target. For a lot of other applications, well, the consequences of failure are pretty bad, but they're livable with, and so in those cases we tend to have soft real time. A lot of people who want real time are actually more interested in the soft real time, and in those cases you're pretty likely to make your target. But you get a big throughput gain in exchange for being a bit more flexible in the way you meet the target.
Kirk Pepperdine: So it's an interesting alternative for people who have like soft, real time targets and because traditionally they would probably be stuck on the existing technology.
Holly Cummins: Yeah, so we've got this continuum. At one end we've got WebSphere Real Time, which is hard real time. And then we're working on getting more things into that intervening space, where it's a lot better than a normal concurrent pause policy, and it's a lot faster than hard real time. And then you can take the continuum up a bit further and you get the gen-con policy, which tends to have quite short pauses, and then at the far end of the continuum would be the throughput policy, which has fairly long pauses sometimes, depending on the heap size, but really quite good throughput.
Kirk Pepperdine: Garbage collection has been such a thorn in Java's side for such a long time. You hear horror stories now, you know, people saying ‘oh we suffer like 30 second or 3 minute pause times like once a day.'
Holly Cummins: The problem with garbage collection is that it's invisible until it causes a problem for you, so under the covers it can be doing a lot of really great things in speeding your application up, but you never notice that.
Kirk Pepperdine: That's a good point.
Holly Cummins: You only notice it when it causes problems. I tend to cycle and I find it's the same with the wind. If there's a headwind, I always think, isn't this a disaster that I have a headwind, and if there's a tailwind, I don't notice it, I just think, ooh, I'm really fit today.
Kirk Pepperdine: Yeah.
Holly Cummins: And I think it's a bit the same with garbage collection, because quite often you've got a policy and it's got long pauses, but your overall throughput is actually better than with a different policy because in a time when the application is not paused it's going really blindingly fast and that is because of the intervention of the garbage collection and how it's laid out the objects in the heap and that kind of thing, but in that kind of situation you're often not aware that it's the garbage collection that causes your application to go faster, you just think, ‘Oh, a really good coder I am, look, how I coded this up!'
Kirk Pepperdine: Does your work extend into other environments beyond the AS400 environments or are these systems any different than running on like a Linux based system that IBM would normally support.
Holly Cummins: We tend to support all those platforms with the activities in the iSeries and the pSeries and the fundamental algorithms tend to be the same for most of those. And then the diagnostic characteristics that you look for tend to be the same as well.
No matter what your platform it's pretty bad if you've got a high garbage collection overhead and it's pretty bad if you've got long pause times. And you know, it's pretty bad if your heap size is thrashing and that kind of thing. Some of the implementation details are different, but the principles are the same across all of them.
Kirk Pepperdine: What do you recommend then for GC pause times before you start taking aggressive action to actually improve GC efficiency, or should I say, before you start taking aggressive action and tuning your virtual machine?
Holly Cummins: It depends usually on the policy, because some policies can tolerate a much greater overhead than others. For example, if you use the gen-con policy, you may see that you're spending a lot of time paused, but in fact your application throughput is still pretty good. On the other hand, if you're using a concurrent policy and you see overheads above something like 5%, then you know that something's wrong and you need to look into your system, because something is destroying your garbage collection and it's not behaving the way you want it to. But beyond that, it depends so much on what your quality of service requirements are, and so, if you've got a big heap, then what you're going to expect is infrequent long collections. And as long as that's what you're happy with, then it's fine.
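The overhead figure mentioned here is simply the share of elapsed time spent paused for collection. A minimal sketch of that arithmetic follows; the pause durations are made-up numbers, and treating 5% as the limit for a concurrent policy follows the figure quoted above rather than any documented rule.

```java
/** Computes GC overhead as the percentage of elapsed time spent paused (sketch). */
final class GcOverheadSketch {

    /** Sums the pause durations and expresses them as a percentage of elapsed time. */
    static double overheadPercent(long[] pausesMs, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        long totalPausedMs = 0;
        for (long pause : pausesMs) {
            totalPausedMs += pause;
        }
        return 100.0 * totalPausedMs / elapsedMs;
    }

    /** True when the measured overhead is above the limit chosen for the policy in use. */
    static boolean exceedsLimit(double overheadPercent, double limitPercent) {
        return overheadPercent > limitPercent;
    }

    public static void main(String[] args) {
        long[] pausesMs = {20, 30, 50};   // hypothetical pauses observed in one second
        double overhead = overheadPercent(pausesMs, 1000);
        System.out.println("overhead: " + overhead + "%");
        System.out.println("above a 5% limit: " + exceedsLimit(overhead, 5.0));
    }
}
```

The limit is a parameter because, as the answer points out, the tolerable overhead differs from one policy to another.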
Kirk Pepperdine: Do you tend to recommend against large heaps then? And what's large in your estimation?
Holly Cummins: Again, it all depends on what you can tolerate, and so if you've got some sort of batch processing system, then a long pause time really isn't going to be a problem.
Kirk Pepperdine: Right.
Holly Cummins: And if you collect too frequently, then what will happen is that every time you resume after a garbage collection, the application's going to go more slowly because the cache has been polluted. And so, you need to, no matter what your quality of service requirement is, you kind of want to avoid collecting too frequently. But, if you've got a very large heap, then you tend to find that the performance of your application is improved, but there are diminishing returns.
So, about 50% occupancy is probably ideal to aim for. So that means after you've done your collection, half of what was there in the beginning of the collection is still alive. But there's a lot of flexibility in that; you can go up to about 70 or 80% without seeing too severe a degradation. So, if you've got a lot of data, then you'll have to have a large heap in order to accommodate all your live data.
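The occupancy guideline can be turned into a rough heap sizing calculation. The sketch below assumes occupancy means live data after a collection divided by total heap size, which is one reading of the description above; the 512 MB live set is an invented figure.

```java
/** Heap sizing from a target post-collection occupancy (illustrative sketch). */
final class OccupancySketch {

    /** Occupancy after a collection: live data as a fraction of the heap. */
    static double occupancy(double liveMb, double heapMb) {
        return liveMb / heapMb;
    }

    /** Heap size needed so that the live data fills only targetOccupancy of it. */
    static double requiredHeapMb(double liveMb, double targetOccupancy) {
        if (targetOccupancy <= 0.0 || targetOccupancy > 1.0) {
            throw new IllegalArgumentException("target occupancy must be in (0, 1]");
        }
        return liveMb / targetOccupancy;
    }

    public static void main(String[] args) {
        double liveMb = 512.0;   // hypothetical amount of live data after a collection
        System.out.println("heap for 50% occupancy: " + requiredHeapMb(liveMb, 0.5) + " MB");
        System.out.println("heap for 80% occupancy: " + requiredHeapMb(liveMb, 0.8) + " MB");
    }
}
```

Under this reading, aiming for 50% occupancy means a heap about twice the size of the live data, and tolerating 80% shrinks that to 1.25 times the live data at the cost of more frequent collections.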
Kirk Pepperdine: Holly, thank you for taking the time to speak with us today.
Holly Cummins: Thank you.