Discussions

News: Object pooling is now a serious performance loss

  1. Brian Goetz continues to lift the lid and peek into the inner workings of Java in Java Urban Performance Legends. In this article he exposes the fallacy behind some of the more common performance myths found in the annals of the JVM.

    The article takes a deep look at how memory management has improved over the years and how this improvement has invalidated many of the techniques that developers generally believe help performance. The article winds up with an explanation of escape analysis and how this technology will further improve the JVM's ability to manage memory.

    Is the JVM technology fast enough for you, or does your JVM cause you performance grief?

    Threaded Messages (92)

  2. grief?[ Go to top ]

    ...only memory usage grief.

    But that is more of a runtime library problem than a JVM problem.
  3. Object Pooling[ Go to top ]

    Quote: "Public service announcement: Object pooling is now a serious performance loss for all but the most heavyweight of objects, and even then it is tricky to get right without introducing concurrency bottlenecks"

    When you say 'most heavyweight objects', can you put a size on it? Probably not; it's more of a relative measure that changes with the application context or transaction context. Is it just a question of saying the transaction would be X% faster if the object was pooled, and if X% is significant then it's worth creating a pool service?

    Is a heavyweight object something that is some ratio (e.g. 1/1000th) of the overall heap size? Or is it something that has multiple internal allocations (e.g. child objects)?
  4. Re: Object Pooling[ Go to top ]

    I wouldn't use memory consumption as the guide for what makes a heavy object. Instead I would classify a heavy object as one that takes a long time to create. That creation work could be intensive initialization calculations or just acquiring external resources. For instance, an object that loads and parses an XML document at creation time could easily be considered "heavy", and you wouldn't want to create one on a whim.
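    As a sketch of that kind of "heavy" object (the class name and file handling here are invented for illustration, not taken from the article), consider something that parses an XML document in its constructor; the creation cost is dominated by I/O and parsing rather than by the allocation itself:

```java
import java.io.File;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;

// Illustrative "heavy" object: construction does real work (file I/O
// plus DOM parsing), so reusing an instance can be worthwhile in a way
// it never is for a plain value object.
public class ParsedConfig {
    private final Document document;

    public ParsedConfig(File xmlFile) throws Exception {
        // Expensive step: build a parser and read the whole document.
        this.document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xmlFile);
    }

    public Document getDocument() {
        return document;
    }
}
```

    Whether pooling such objects beats simply re-creating them still depends on measurement, as the article argues.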
  5. JVM performance is OK for most cases; for me it shines on the server side.

    But Swing-based apps, especially large ones, make you feel that the statements in the article are just a theory :(
  6. Yes[ Go to top ]

    I have seen some very large Swing apps too (more than 2048x2048 pixels!) and they are very fat :-P

    The question should be: Why are client applications so fat ?

    Thunderbird Windows => 20 MB
    Thunderbird Linux => 100 MB ?

    SCM application, fat client (Java/Swing), ~80 forms => 20-30 MB

    I think what makes GUI applications so fat is too many stupid icons ;-)
  7. you don't measure the same thing[ Go to top ]

    Where do you get your 100 MB figure?
    From top?

    top doesn't measure the same thing that your Windows task manager does. Hard to compare.

    J
  8. GUI applications fat?[ Go to top ]

    Hello dudes, I'd like to answer a question someone raised. Thunderbird on Linux is really fat because some libraries are statically linked. With statically linked libraries your binary gets very fat, because they are linked together with your application; you can use dynamically linked libraries to solve that problem.
  9. There are exceptions as always...[ Go to top ]

    Simple object pooling (of objects which hold no external resources, but only occupy memory) is surely a waste, but there are exceptions as always, when objects hold external OS resources:
    - database connection pooling,
    - socket connection pooling (including HTTP, RMI, CORBA, WS, etc...)
    - thread pooling,
    - bitmaps, fonts, other graphics objects...
  10. Simple object pooling (of objects which hold no external resources, but only occupy memory) is surely a waste, but there are exceptions as always, when objects hold external OS resources:
    - database connection pooling,
    - socket connection pooling (including HTTP, RMI, CORBA, WS, etc...)
    - thread pooling,
    - bitmaps, fonts, other graphics objects...

    I believe this is pointed out in the article. It is a cost-benefit analysis. What is the cost of pooling (synchronization, management, GC, etc.) vs. the benefit (quick access to a connection or socket)? What Brian is pointing out is that the performance improvements in the JVM have tilted this equation, so you'd better re-evaluate what you think you know ;)
  11. I believe this is pointed out in the article. It is a cost-benefit analysis. What is the cost of pooling (synchronization, management, GC, etc.) vs. the benefit (quick access to a connection or socket)? What Brian is pointing out is that the performance improvements in the JVM have tilted this equation, so you'd better re-evaluate what you think you know ;)

    The point is about external resources. Whether objects holding external resources (outside the JVM, either OS resources on the JVM host machine or on another machine) should be pooled or not does not depend on JVM efficiency, but on their availability, weight, and other costs of holding these resources.
  12. Could anyone explain, then, what the use of pooling is for stateless session beans?
  13. The original idea was that developers are not that "smart" and cannot write thread-safe code. And each object in the stateless session bean pool was used by only one thread at a time.
  14. There are exceptions as always...[ Go to top ]

    The original idea was that developers are not that "smart" and cannot write thread-safe code. And each object in the stateless session bean pool was used by only one thread at a time.

    Huh??? If it has no state (we're talking about stateless session beans here), it does not need to be thread-safe, does it?

    Either I am missing something here, or your argument is fundamentally flawed...

    Confused,
    Lars
  15. Huh??? If it has no state (we're talking about stateless session beans here), it does not need to be thread-safe, does it? Either I am missing something here, or your argument is fundamentally flawed.

    The EJB standard does not require that stateless session EJBs have no internal state (i.e. fields). It simply requires that no conversational state be maintained between the bean and its clients. A pool of these that are doled out to client threads on a per-call basis is valuable if the bean manages some expensive-to-allocate-or-create resource.

    Chuck McCorvey
  16. As far as I can remember, the EJB spec states that only one thread at a time can execute method code in a stateless session bean instance. So you need to pool them to provide concurrency.

    The question is whether this requirement is meaningful.
    The only reason for it I can see is that this way you can have local instance state (as instance fields) in a stateless session bean that is valid during a single method execution. Note that your method can call other (possibly private) methods of the same bean, so in this scenario this makes sense.

    However, you can get the same behavior by holding state and logic in a separate POJO and having your session bean act as a facade to that POJO.
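    A minimal sketch of that facade idea (the class and method names are invented): the facade itself keeps no fields, and a fresh worker POJO carries the per-call state, so the facade is trivially safe under concurrent calls:

```java
// Hypothetical sketch: the facade holds no instance state, so it needs
// neither pooling nor synchronization; per-call state lives in a
// short-lived POJO that the young-generation collector reclaims cheaply.
public class OrderFacade {

    // Worker created fresh for every call; holds the transient state.
    private static class OrderWorker {
        private double total;

        void addLine(double price, int quantity) {
            total += price * quantity;
        }

        double total() {
            return total;
        }
    }

    public double priceOrder(double[] prices, int[] quantities) {
        OrderWorker worker = new OrderWorker();  // cheap allocation
        for (int i = 0; i < prices.length; i++) {
            worker.addLine(prices[i], quantities[i]);
        }
        return worker.total();
    }
}
```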

    Probably this requirement exists to simplify the programming model for stateless session EJBs, making them completely thread-safe.
  17. There are exceptions as always...[ Go to top ]

    Anyone could explain then what is the use of pooling for Stateless Session Beans?

    It is not useful in < EJB 2.1, but I think it may actually be useful in EJB 3.0 because of injection and interceptors (interceptors will be allocated per bean instance). Somebody will have to do a microbenchmark of this, though.
  18. Anyone could explain then what is the use of pooling for Stateless Session Beans?
    It is not useful in < EJB 2.1, but I think it may actually be useful in EJB 3.0 because of injection and interceptors. (interceptors will be allocated per bean instance). Somebody will have to do a microbench of this though.

    Pooling for stateless session beans is important because the standard does not require that no internal state be maintained, only that no conversational state be maintained between the bean and its clients. A pool of stateless session beans that hold onto some expensive-to-allocate resource, or some other form of useful but non-client-specific state, is useful; handing out each member of the pool in a serially reusable pattern works wonders.

    Now, it would be nice if the container could be told to not pool the beans if the bean implementations have no internal state to maintain.

    A serious abuse of pooling (and caching) comes in WLS (at least through 8.1). Here, despite the fact that I've indicated that my entity EJBs are not to retain values across transactions, it still goes through the bother of putting the bean in a global cache, making the threads contend with each other for no purpose. A simple local cache with read-through to a global cache, which becomes collectable at the end of the transaction, would work wonders for the performance of their entity beans.

    Of course, who uses Entities anyway?

    Chuck McCorvey
  19. First, stop pulling phrases out of context to title articles; the original statement was "Object pooling is now a serious performance loss for all but the most heavyweight of objects."
    Second, what the author is describing is known as object banks, not object pools. Object pools work fine at managing limited external resources, such as database connections. They have nothing to do with JVM internals.
  20. Good article.

    It does confirm my suspicions about the performance of JVM allocation and helps with the decision of whether to design a program for ease of maintenance or for performance.

    For example, I had an IO application that would read a file and create up to 10000 Records, each containing an array of byte arrays.
    One performance-savvy approach would be to read and return byte[][][]; a more OO way is to have Record[] -> Field[] -> byte[], probably with some type of Record/Field caching.

    I went with the latter approach, without caching, because it is easy to use (no guessing what is in byte[][][]) and the JVM would do a better job of maintaining the Records/Fields for me since they were small and short-lived.
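    The Record/Field shape described above might look roughly like this (hypothetical names; the original application is not shown):

```java
// Sketch of the OO approach: small, short-lived wrapper objects around
// the raw bytes instead of an opaque byte[][][]. The generational
// collector handles such objects cheaply.
public class Records {

    public static class Field {
        private final byte[] bytes;
        public Field(byte[] bytes) { this.bytes = bytes; }
        public byte[] bytes() { return bytes; }
    }

    public static class Record {
        private final Field[] fields;
        public Record(Field[] fields) { this.fields = fields; }
        public Field field(int i) { return fields[i]; }
        public int size() { return fields.length; }
    }

    // One Record per parsed row; compare with returning byte[][][].
    public static Record fromBytes(byte[][] rawFields) {
        Field[] fields = new Field[rawFields.length];
        for (int i = 0; i < rawFields.length; i++) {
            fields[i] = new Field(rawFields[i]);
        }
        return new Record(fields);
    }
}
```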
  21. I still find it quite hard to understand why "object pooling" would ever be a serious performance loss. I sure hope it is not due to the speed of reference counting. So I would assume it would be due to synchronization issues mainly?

    Oh, and one more thing: I might be slightly confused here, but if I think about any decent real-world GUI component (the part of the world where lack of performance really hurts), it will more often than not interact with more than one thread anyway...
  22. I still find it quite hard to understand why "object pooling" would ever be a serious performance loss. I sure hope it is not due to the speed of reference counting.

    Java GC doesn't use reference counting, and never did. Reference counting doesn't work: there are too many cases where the references will not go to 0 and the memory will not be collected.
     So I would assume it would be due to synchronization issues mainly?

    Synchronization is but one issue. The real issue is that the cost of maintaining an object in a pool in heap space far outweighs the cost of creating and collecting it in the young generation.
  23. Java GC[ Go to top ]

    I still find it quite hard to understand why "object pooling" would ever be a serious performance loss. I sure hope it is not due to the speed of reference counting.

    Java GC doesn't use reference counting, and never did. Reference counting doesn't work: there are too many cases where the references will not go to 0 and the memory will not be collected.
      If you were speaking of CORBA you'd be correct; however, we are talking about Java and EJBs, and reference counting is most certainly used. This was one of the design goals of RMI: to provide a seamless experience for the programmer. From the link below:

    "RMI uses a reference-counting garbage collection algorithm similar to Modula-3's Network Objects. (See "Network Objects" by Birrell, Nelson, and Owicki, Digital Equipment Corporation Systems Research Center Technical Report 115, 1994.)"
    http://java.sun.com/j2se/1.4.2/docs/guide/rmi/spec/rmi-arch4.html

    Now, that's not to say there aren't issues with reference counting, and in fact that's why it was not part of CORBA; Michi Henning calls this "The Pacific Ocean Problem". I love this comment: "It was me. I routinely store my object references in the Pacific Ocean ;-) "
    http://groups.google.com/group/comp.object.corba/browse_thread/thread/21213b646372d8e8/587e80c9ecb7b7d5?hl=en&lnk=gst&q=pacific+ocean+problem#587e80c9ecb7b7d5
  24. I still find it quite hard to understand why "object pooling" would ever be a serious performance loss. I sure hope it is not due to the speed of reference counting. So I would assume it would be due to synchronization issues mainly? Oh and one more thing: I might be slightly confused there, but if I think about any decent real world gui component (the part of the world where lack of performance really hurts), it will more often than not interact with more than one thread anyway.....

    It all depends how you implement your pool. If you use a synchronized block to protect your pool while you push/pop from it, then you have a synchronization bottleneck. You will *definitely* see this show up when you get hundreds of concurrent threads hitting the pool, but you may not see it if you don't.

    Some people use a ThreadLocal variable to "pool" their objects. This alleviates the synchronization bottleneck, but doesn't really help if you need multiple instances in the same thread. Even with a ThreadLocal, IIRC, in a microbenchmark, allocating the object is something like 100 times faster than dealing with a ThreadLocal (a ThreadLocal is a hash lookup in java.lang.Thread).

    Bill
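    The ThreadLocal "pool of one" pattern Bill describes can be sketched as follows (a simplified modern-Java illustration, not a recommendation):

```java
// Sketch of the ThreadLocal "pool": one reusable instance per thread,
// so no synchronization is needed -- but it cannot supply two instances
// to the same thread at once, and every acquire() still pays for the
// per-thread map lookup inside ThreadLocal.get().
public class BufferHolder {

    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(1024));

    public static StringBuilder acquire() {
        StringBuilder sb = BUFFER.get();  // per-thread lookup
        sb.setLength(0);                  // reset state before reuse
        return sb;
    }
}
```

    As the microbenchmark point above suggests, a plain new StringBuilder(1024) is often as fast or faster.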
  25. It all depends how you implement your pool. If you use a synchronized block to protect your pool while you push/pop from it, then you have a synchronization bottleneck. You will *definitely* see this show up when you get hundreds of concurrent threads hitting the pool, but you may not see it if you don't. Some people use a ThreadLocal variable to "pool" their objects; this alleviates the synchronization bottleneck, but doesn't really help if you need multiple instances in the same thread. Even with a ThreadLocal, IIRC, in a microbenchmark, allocating the object is something like 100 times faster than dealing with a ThreadLocal (a ThreadLocal is a hash lookup in java.lang.Thread). - Bill

    As someone else pointed out, the real penalty for object pooling is having lots of objects hanging around filling up the long-lived generational object space. Remember, when the long-lived space fills up, the JVM does a FULL GC, which is what you're seeing when your app just "stops" for 10 seconds (or longer, depending on how big your heap is).
  26. Pooling will certainly cost a lot more when the old generation fills up and a full GC is invoked. Hopefully, by that time, the application will have enjoyed the benefit of pooling. If the JVM allocates 40% of the heap for the old generation, and the application never really uses it to its full extent, then a full GC will never get triggered and pooling is FREE. Of course, it will be serious trouble if the JVM triggers only full GCs.

    I think we can use a larger heap (e.g. 2 GB is a large heap), use all the newer GC algorithms like parallel GC, size the generations properly, and use the old generation (pooling) effectively. All J2EE servers have pooling capabilities and we should be able to use them to a certain extent. I don't think it is the author's intention to refer to pooling as a problem, as the heading of this news item does.
  27. I don't think it is the author's intention to refer to pooling as a problem, as the heading of this news item does.

    The headline is a quote from the article.
  28. Escape Analysis[ Go to top ]

    One interesting thing that the author mentions in this article and others is escape analysis for possible stack allocation.

    However, the examples he gives are a little facile. I assume that the compiler must be able to detect whether the constructor of Point is defined like this:

    import java.util.ArrayList;
    import java.util.List;

    public class Point {
      private static final List points = new ArrayList();

      private int x, y;

      public Point(int x, int y) {
        this.x = x;
        this.y = y;
        // gotta put all Points on the heap.
        points.add(this);
      }

      public Point(Point p) {
        this(p.x, p.y);
      }

      public int getX() { return x; }
      public int getY() { return y; }
    }

    Anyone know where I can find info on how Mustang deals with this?
  29. RE: Escape Analysis[ Go to top ]

    If my understanding of all this is correct, then your example should work out OK, because GC is based on "generations", and depending upon how long your example "Point" object stays referenced from the static ArrayList, the JVM will detect it as a long-lived object and treat it differently.
    Of course, if it were really smart, it would detect that no code anywhere reads from this List, and so factor it out completely. :)
  30. RE: Escape Analysis[ Go to top ]

    If my understanding of all this is correct, then your example should work out OK, because GC is based on "generations", and depending upon how long your "Point" example object has a reference to its new instance in the static ArrayList, the JVM will detect it as a long-lived object and treat it differently. Of course, if it were really smart, it would detect that no code anywhere reads from this List, and so factor it out completely. :)

    And it does! If the array is not reachable, then it is collectable. If it is reachable, then code can touch it. If no code does touch it, then my IDE complains.
  31. RE: Escape Analysis[ Go to top ]

    If my understanding of all this is correct, then your example should work out OK, because GC is based on "generations", and depending upon how long your "Point" example object has a reference to its new instance in the static ArrayList, the JVM will detect it as a long-lived object and treat it differently. Of course, if it were really smart, it would detect that no code anywhere reads from this List, and so factor it out completely. :)

    Did you read the part of the article about stack-based allocations via escape analysis? There is no GC for the stack. The GC only looks at the heap.
  32. RE: Escape Analysis[ Go to top ]

    Did you read the part of the article about stack-based allocations via escape analysis? There is no GC for the stack. The GC only looks at the heap.
    Yes, and because you add a reference to a static List (and assuming that the List is used elsewhere in code), the system would be forced to not allocate the Point on the stack. The system could also be sophisticated enough to detect when the Point is removed from the static List, and if the Point is removed within the same stack frame as when it was added, and no reference is held that can outlive the stack frame, then it could still allocate the Point on the Stack. Even if it is not that intelligent at this point in time, that kind of intelligence could be engineered into a later JVM.
  33. RE: Escape Analysis[ Go to top ]

    Did you read the part of the article about stack-based allocations via escape analysis? There is no GC for the stack. The GC only looks at the heap.
    Yes, and because you add a reference to a static List (and assuming that the List is used elsewhere in code), the system would be forced to not allocate the Point on the stack.

    Obviously. And the examples don't make any mention of having to introspect the constructor. My question was: is there a reference I can look to for information about how this is handled by Mustang? I never asked whether it will work.
    The system could also be sophisticated enough to detect when the Point is removed from the static List, and if the Point is removed within the same stack frame as when it was added, and no reference is held that can outlive the stack frame, then it could still allocate the Point on the Stack. Even if it is not that intelligent at this point in time, that kind of intelligence could be engineered into a later JVM.

    That seems rather pointless.
  34. xalan[ Go to top ]

    Currently Xalan ships (as of JDK 1.5) with a memory leak that can consume gigabytes of memory in a few minutes.
    It is due to a static object pool of byte arrays.

    See . The bug is solved in the database but not in your jdk.

    Try this code on JDK 1.5 with an XML and an XSL of your choice

    -------------------------------------------------------
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.TransformerFactoryConfigurationError;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public static void main(String[] args) throws Exception, TransformerFactoryConfigurationError
    {
        final TransformerFactory newInstance = TransformerFactory.newInstance();
        final Object monitor = new Object();

        for (;;)
        {
            synchronized (monitor)
            {
                new Thread()
                {
                    public void run()
                    {
                        try
                        {
                            synchronized (monitor)
                            {
                                Source xsl = new StreamSource("prova.xsl");
                                Transformer transformer = newInstance.newTransformer(xsl);
                                Source xml = new StreamSource("prova.xml");
                                transformer.transform(xml, new StreamResult("ouput.xml"));
                                monitor.notify();
                            }
                        }
                        catch (Exception e)
                        {
                            e.printStackTrace();
                            System.exit(1);
                        }
                    }
                }.start();
                monitor.wait();
                Thread.yield();
            }
        }
    }

    -------------------------------------------------------
  35. xalan[ Go to top ]

    Currently Xalan ships (as of JDK 1.5) with a memory leak that can consume gigabytes of memory in a few minutes. It is due to a static object pool of byte arrays.
    ...
    The bug is solved in the database but not in your jdk.

    What does this have to do with the article and what database?
  36. xalan[ Go to top ]

    The article title was "Object pooling is now a serious performance loss", wasn't it...

    The database is the Xalan bug database. The bug is at
    http://issues.apache.org/jira/browse/XALANJ-1844
  37. This is resolvable[ Go to top ]

    You can create the thread in a method instead. Then the memory is eventually released.

    Lots of weird stuff occurs when you stuff everything in the main method. So, as a rule, don't do it.
  38. JVM Performance[ Go to top ]

    In most cases I find that JVM performance is more than adequate.

    I can see a definite slowdown in things like image manipulation (through JAI without the native libraries, for example). Memory can be a bit of a problem too; Java tends to use a bunch.

    But all in all, JVM performance is great (especially with 1.5). I tend to do a lot of threaded coding, though.

    I don't think "Swing is slow" is an applicable argument any longer. I am working on a Swing program in excess of 40k lines of code. 99% of wait time is due to remote communications (SOAP & database). The Swing portions of the program are very snappy (and rather pretty, believe it or not).
  39. JVM Performance[ Go to top ]

    Agreed - Swing apps can both perform and look good.

    The single biggest performance gain seems to come from loading resources (heavyweight or otherwise) in a background thread, which has nothing to do with garbage allocation, or even Java for that matter.

    I worked on a Swing application which does a bunch of webservice interaction plus uses FOP for printing.

    Tossing resource initialisation (FOP initialisation in particular is heavyweight: it seems to take a few seconds to parse the XSL and configure itself) into a low-priority thread removed the perceived performance problems.

    Regards,
    Andrew.
  40. JVM Performance[ Go to top ]

    allocation

    *collection.
  41. Object pooling is now a serious performance loss

    This is so funny because it's exactly the opposite (and it has little to do with Brian's article)! Pre-JDK 1.5, going back at least a few versions, you couldn't create an object pool that was faster than "new." That changed with JDK 1.5, as the new concurrency libraries enable pools that best "new" even for the smallest of objects.

    The title of this article should be, "Object pooling is now a serious performance *gain*."

    That's not to say you should pool objects. Pooling isn't that much faster, and you have to worry about things like cleaning up state and memory leaks.
  42. Object pooling is now a serious performance loss
    This is so funny because it's exactly the opposite (and it has little to do with Brian's article)! Pre-JDK 1.5 going back at least a few versions you couldn't create an object pool that was faster than "new." That changed with JDK 1.5 as the new concurrency libraries enable pools that best "new" even for the smallest of objects. The title of this article should be, "Object pooling is now a serious performance *gain*." That's not to say you should pool objects. Pooling isn't that much faster, and you have to worry about things like cleaning up state and memory leaks.

    Really? I don't see how that's possible, when a HashMap lookup would take more than 10 instructions to execute while the new wouldn't. Assuming the HashMap isn't synchronized, wouldn't creating the new object be faster? I believe he says it is in his article. Furthermore, I would think that a concurrent version of HashMap must be slower than the original implementation.

    Of course, missing from this discussion is getting the object into the state that you desire. Sure, I can create a new object in 10 instructions, but then I have to set all of its fields. Creating a new product to sell requires me to set its sku, name, description, etc. Looking it up in a cache allows me to have all of its values set already.

    Then again, I don't think this is the type of object pooling the author is referring to. I think he's referring to caching things such as Strings, Integers, etc. I may be wrong, though.
  43. Really? I don't see how that's possible when a HashMap lookup would take more than 10 instructions to execute, while the new wouldn't. Assuming the HashMap isn't synchronized, wouldn't creating the new object be faster? I believe he says it is in his article. Furthermore, I would think that a concurrent version of HashMap must be slower than the original implementation. Of course, missing from this discussion is getting the object into the state that you desire. Sure I can create a new object in 10 instructions, but then I have to set all of its fields. Creating a new product to sell requires me to set its sku, name, description, etc. Looking it up in a cache allows me to have all of its values set already. Then again, I don't think this is the type of object pooling the author is referring to. I think he's referring to caching things such as Strings, Integers, etc. I may be wrong though.

    I think you're mixing up pooling and caching. The author is talking about pooling, i.e. the JVM takes too long to create an object, so I'm going to hold onto an existing instance and reuse it. There's no need for a HashMap.
  44. I think you're mixing up pooling and caching. The author is talking about pooling, i.e. the JVM takes too long to create an object, so I'm going to hold onto an existing instance and reuse it. There's no need for a HashMap.

    And how is that distinct from holding onto Integer objects for reuse?

    An object pool for Integers doesn't really require any synchronization or concurrency control.

    Have you looked at the code in the concurrent package? There is a lot of overhead in those libraries. I fail to see how using a 'concurrent pool' will be faster than allocating objects on the stack.
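    For what it's worth, a "pool" of small Integers really can avoid synchronization entirely, because the cached values are immutable and the cache is filled once at class-load time (this sketch is illustrative; Integer.valueOf does something similar for small values):

```java
// Lock-free flyweight cache of immutable values: the array is written
// only during class initialization and is read-only afterwards, so
// concurrent readers need no synchronization at all.
public class SmallIntCache {
    private static final Integer[] CACHE = new Integer[256];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = i;  // box values 0..255 once
        }
    }

    public static Integer valueOf(int i) {
        if (i >= 0 && i < CACHE.length) {
            return CACHE[i];  // shared immutable instance
        }
        return i;  // outside the cached range: a fresh box
    }
}
```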
  45. There is a lot of overhead in those libraries.

    I've been surprised by how much faster things like CAS can be compared to synchronization.
    I fail to see how using a 'concurrent pool' will be faster than allocating Objects on th stack.

    Faster than a heap allocation not a stack allocation (well, actually I don't know for sure because I haven't tried JDK 1.6 yet).
  46. There is a lot of overhead in those libraries.
    I've been surprised by how much faster things like CAS can be compared to synchronization.

    The concurrent packages are only faster when you have a lot of concurrency. From a raw, per-call (no contention) perspective, a synchronized Collection is faster than a concurrent one.
  47. The concurrent packages are only faster when you have a lot of concurrency. From a raw, per call (no contention) perspective, a synchronized Collection is faster than a concurrent one.

    Actually, that's wrong. Even when there's no contention, ConcurrentHashMap is still faster than Collections.synchronizedMap(new HashMap()) (by about 15-20% in my test).

    The thing many people don't recognize about a synchronized HashMap is that the hashCode()/equals() calls on the keys are also effectively synchronized for the scope of the map. The performance of hashCode()/equals() can create a bottleneck. This isn't the case with ConcurrentHashMap. For example, in my AOP framework, switching from Method.hashCode()/equals() to identity equality comparisons for the Method objects resulted in a 3-fold performance increase overall.
  48. The concurrent packages are only faster when you have a lot of concurrency. From a raw, per call (no contention) perspective, a synchronized Collection is faster than a concurrent one.
    Actually, that's wrong. Even when there's no contention, ConcurrentHashMap is still faster than Collections.synchronizedMap(new HashMap()) (by about 15-20% in my test).

    Show me the test so I can try it.

    Anyone can claim a test shows anything.

    The rest of your post is irrelevant to my point.
  49. The rest of your post is irrelevant to my point.

    Maybe that's because it wasn't in response to your point? No need to get defensive. ;)
  50. The rest of your post is irrelevant to my point.
    Maybe that's because it wasn't in response to your point? No need to get defensive. ;)

    Sorry. I've had a lot of people responding to my posts in tones that I feel show an assumption that I am less knowledgeable than they are. Of course, I'm completely paranoid about that kind of thing.
  51. Actually, that's wrong. Even when there's no contention, ConcurrentHashMap is still faster than Collections.synchronizedMap(new HashMap()) (by about 15-20% in my test).

    I just ran a test that showed ConcurrentHashMap to be 40% slower for simple inserts and lookups. I can provide the code if you want it.
  52. I can provide the code if you want it.

    Sure. Here's my code.


    package org.crazybob.pool;

    import java.lang.reflect.Method;
    import java.util.HashMap;
    import java.util.Collections;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class MapTest {

      private static final int SIZE = 100000;
      private static final int ITERATIONS = 100;

      static void testMap(Map<Integer, String> map) {
        for (int i = 0; i < SIZE; i++) {
          map.put(i, "foo");
          map.get(i);
        }
      }

      public static void testSynchronizedHashMap() {
        testMap(Collections.synchronizedMap(new HashMap<Integer, String>()));
      }

      public static void testConcurrentHashMap() {
        testMap(new ConcurrentHashMap<Integer, String>());
      }

      static void run(Method method) throws Exception {
        // warm up.
        method.invoke(null);

        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
          method.invoke(null);
        }
        long newTime = System.nanoTime() - start;
        System.out.println(method.getName() + ": " + newTime / 1000000 + "ms");
      }

      public static void main(String[] args) throws Exception {
        run(MapTest.class.getMethod(args[0]));
      }
    }


    Here's the script I used to run it:


    MEM=32m

    RUN="java -classpath classes \
      -Xmx${MEM} -Xms${MEM} -server \
      org.crazybob.pool.MapTest"

    $RUN testConcurrentHashMap
    $RUN testSynchronizedHashMap


    And here are the results:


    testConcurrentHashMap: 9972ms
    testSynchronizedHashMap: 11826ms
  53. I just realized I am running 1.5.0-beta2 so my test results may change if I get a newer version. Anyway, here is the test code I use. I think it's a lot better than running each test in tandem. It tends to spread out anomalies between the different implementations. In any event, it produces extremely stable results. Note that there is a call to System.gc that really slows things down but it's in there to prevent the GC from skewing the results. It's probably not needed in this test (nothing to GC) so you can try commenting it out.

    import java.util.*;
    import java.util.concurrent.*;

    public class SpeedTester{
        public static final long OUTER_ITERATIONS = 10;
        public static final long INNER_ITERATIONS = 1000000;
        public static void main(String[] args) {
            Data[] data = {new RandomIntegers()};
            Test[] tests = {
                new ControlTest(),
                new MapTest(new HashMap(), "Unsynchronized"),
                new MapTest(Collections.synchronizedMap(new HashMap()), "Synchronized"),
                new MapTest(new ConcurrentHashMap(), "Concurrent")};
            long[] times = new long[tests.length];
            for (int j = 0; j < OUTER_ITERATIONS; j++)
            {
                for (int k = 0; k < data.length; k++)
                {
                    data[k].create();
                }
                for (int k = 0; k < tests.length; k++)
                {
                    System.gc();
                    times[k] += test(tests[k]);
                }
            }
            for (int j = 0; j < tests.length; j++)
            {
                System.out.println(tests[j].name() + ": " + times[j] + " - "
                    + ((double) times[j]) / (OUTER_ITERATIONS * INNER_ITERATIONS)
                    + " millis per test");
            }
        }
        public static long test(Test test)
        {
            long time;
            long start = System.currentTimeMillis();
            for (int j = 0; j < INNER_ITERATIONS; j++) test.test();
            
            time = System.currentTimeMillis() - start;
            
            if (time < 10) throw new RuntimeException("too few inner iterations");
            
            return time;
        }
    }

    interface Data
    {
        public void create();
    }

    interface Test
    {
        public void test();
        
        public String name();
    }

    class RandomIntegers implements Data
    {
        static final Random random = new Random();

        static Integer number;

        public void create()
        {
            number = new Integer(random.nextInt());
        }
    }

    class MapTest implements Test
    {
        final Map map;
        final String name;

        MapTest(final Map map, final String name)
        {
            this.map = map;
            this.name = name;
        }

        public void test()
        {
            map.put(RandomIntegers.number, "test");

            map.get(RandomIntegers.number);
        }

        public String name()
        {
            return name;
        }
    }

    class ControlTest implements Test
    {
        public String name()
        {
            return "Control";
        }
        
        public void test()
        {
            
        }
    }
  54. I just realized I am running 1.5.0-beta2 so my test results may change if I get a newer version. Anyway, here is the test code I use.

    Here are the results:

    Control: 368 - 3.68E-6 millis per test
    Unsynchronized: 13596 - 1.3596E-4 millis per test
    Synchronized: 16200 - 1.62E-4 millis per test
    Concurrent: 16155 - 1.6155E-4 millis per test

    java version "1.5.0_04"
  55. I think it's a lot better than running each test in tandem. It tends to spread out anomalies between the different implementations.

    I think it's better to run each test in a separate VM run so you know they're not impacting each other.
  56. I think it's a lot better than running each test in tandem. It tends to spread out anomalies between the different implementations.
    I think it's better to run each test in a separate VM run so you know they're not impacting each other.

    I'm not sure why that would be. We have server problems where VMs starve each other out all the time. Have you tried reversing the order of your tests in your script? I find that often changes the results of speed-testing code.

    Exactly how would running them in the same VM cause the tests to interfere with each other? Otherwise, how can you be sure they are on equal footing?
  57. We have server problems where VMs starve each other out all the time.

    Are you saying that one test VM run will starve out another? I don't think I understand.
    Have you tried reversing the order of your tests in your script? I find that often changes the results of speed-testing code.

    I've found this to happen when I alter the order of tests running in the same JVM, not across different JVM runs.
    Exactly how would running them in the same VM cause the tests to interfere with each other? Otherwise, how can you be sure they are on equal footing?

    Garbage collection, HotSpot, there are a number of ways one test can impact another running in the same VM. You're sure they're on equal footing because you run the test multiple times with the same VM parameters.
  58. Exactly how would running them in the same VM cause the tests to interfere with each other? Otherwise, how can you be sure they are on equal footing?
    Garbage collection, HotSpot, there are a number of ways one test can impact another running in the same VM. But in the tester code I have provided, those effects get spread evenly across the implementations.
    You're sure they're on equal footing because you run the test multiple times with the same VM parameters.

    I ran your tests. I got your results. I got rid of instantiations and that eliminated a good part of the difference. I changed the VM parameter to Xms32M and that eliminated most of the difference. Then I changed testMap to use random numbers instead of the loop counter, and the two produce basically the same results, with the synchronized version coming out slightly ahead in most tests. Of course this is still with a BETA JVM so it's not a great environment.

    A lot of what your test was showing was GC and reflection. Using a loop counter for the key isn't a great way to test HashMaps because it eliminates most of the hash collisions. Apparently, the Concurrent version performs better when there are few collisions.
  59. Apparently, the Concurrent version performs better when there are few collisions.

    Actually, I think that's 'no collisions'. As long as the load factor is less than 1, inserting incremental Integers as keys will produce 0 hash collisions. And this makes sense, I think. You were suggesting that the Concurrent version doesn't lock the whole map on all inserts, right? And Collections.synchronizedMap does.
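    For what it's worth, this follows directly from Integer's hashCode contract: the hash of an Integer is the wrapped int itself, so sequential keys spread perfectly across the table as long as it is bigger than the key range. A tiny sketch (class name is mine):

```java
public class IntegerHashDemo {
    public static void main(String[] args) {
        // Integer.hashCode() returns the primitive value itself, so the keys
        // 0, 1, 2, ... used in the benchmark hash to distinct values and,
        // while the table is larger than the key range, to distinct buckets.
        for (int i = 0; i < 5; i++) {
            System.out.println(i + " -> " + Integer.valueOf(i).hashCode());
        }
    }
}
```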
  60. We have server problems where VMs starve each other out all the time.
    Are you saying that one test VM run will starve out another? I don't think I understand.

    I thought you were trying to run the tests concurrently at first. You're right, it doesn't apply.

    Here are the code changes:

    import java.lang.reflect.Method;
    import java.util.*;
    import java.util.Collections;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class MapTest {

      private static final int SIZE = 100000;
      private static final int ITERATIONS = 100;

      static void testMap(Map<Integer, String> map) {
        Random random = new Random();

        for (int i = 0; i < SIZE; i++) {
          int r = random.nextInt();
          map.put(r, "foo");
          map.get(r);
        }
      }

      public static void testSynchronizedHashMap() {
    Map<Integer, String> map = Collections.synchronizedMap(new HashMap<Integer, String>());
        
        // warm up.
        for (int i = 0; i < 1; i++) {
          testMap(map);
        }

        long start = System.nanoTime();

        for (int i = 0; i < ITERATIONS; i++) {
          testMap(map);
          map.clear();
        }

        long newTime = System.nanoTime() - start;
        System.out.println("Synchronized: " + newTime / 1000000 + "ms");
      }

      public static void testConcurrentHashMap() {
    Map<Integer, String> map = new ConcurrentHashMap<Integer, String>();

        // warm up.
        for (int i = 0; i < 1; i++) {
          testMap(map);
        }

        long start = System.nanoTime();

        for (int i = 0; i < ITERATIONS; i++) {
          testMap(map);
          map.clear();
        }

        long newTime = System.nanoTime() - start;
        System.out.println("Concurrent: " + newTime / 1000000 + "ms");
      }

      public static void main(String[] args) throws Exception {
        if (args[0].startsWith("c")) testConcurrentHashMap();
        if (args[0].startsWith("s")) testSynchronizedHashMap();
      }
    }
  61. From a raw, per call (no contention) perspective, a synchronized Collection is faster than a concurrent one.

    Oops. Just noticed you said "Collection", and maps aren't collections. Still though, I don't buy your statement. Which collections and which operations? It depends.
  62. There is a lot of overhead in those libraries.

    I've been surprised by how much faster things like CAS can be compared to synchronization.

    Bob, that's because the JVM "cheats". It inlines it more aggressively than one would think possible (i.e. it knows what the CAS method does and just "pastes in" the appropriate machine code.)
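    To make the comparison concrete, here is a minimal sketch (class and method names are mine, not from any library discussed here) of the two styles of thread-safe increment. The CAS retry loop is the building block java.util.concurrent uses, and it is the piece the JVM can inline down to a single machine instruction:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical micro-comparison: the CAS-based increment can compile down to
// a single lock-free machine instruction on most platforms, while the
// synchronized version pays monitor enter/exit costs.
public class CasVsSync {

    static final AtomicInteger casCounter = new AtomicInteger();
    static int syncCounter;

    static void casIncrement() {
        int current;
        do {
            current = casCounter.get();
        } while (!casCounter.compareAndSet(current, current + 1)); // retry on contention
    }

    static synchronized void syncIncrement() {
        syncCounter++;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            casIncrement();
            syncIncrement();
        }
        System.out.println(casCounter.get() + " " + syncCounter);
    }
}
```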

    Peace,

    Cameron Purdy
    Tangosol Coherence: Clustered Transactional Caching
  63. First, in response to this:
    I think you're mixing up pooling and caching. The author is talking about pooling, i.e. the JVM takes too long to create an object, so I'm going to hold onto an existing instance and reuse it. There's no need for a HashMap.

    True, pooling doesn't necessarily require a HashMap. But I think my point remains relevant to the argument made in the article. How would you pool things like Integers, Longs, and Strings without a HashMap when they're immutable? You can't. And the author discusses how a HashMap lookup will be slower than allocating a new object.

    Furthermore, why pool these objects when the compiler can inline the call and get rid of the object allocation altogether? Even if the compiler can optimize the call to the pool, you are still accessing objects on the heap rather than the stack. More long-lived objects on the heap mean increased garbage collection work.

    And the micro benchmarks that compare a synchronized HashMap to a ConcurrentHashMap have nothing to do with the author's point: that a HashMap lookup is slower than object allocation. The fact that you have to call hashCode() and equals() to search through the structure to find the desired object means loading objects into registers and caches. All of this has to be slower than <code>new Integer( "5" )</code>.
  64. True, pooling doesn't necessarily require a HashMap. But I think my point remains relevant to the argument made in the article. How would you pool things like Integers, Longs, and Strings without a HashMap when they're immutable? You can't. And the author discusses how a HashMap lookup will be slower than allocating a new object. Furthermore, why pool these objects when the compiler can inline the call and get rid of the object allocation altogether? Even if the compiler can optimize the call to the pool, you are still accessing objects on the heap rather than the stack. More long-lived objects on the heap mean increased garbage collection work. And the micro benchmarks that compare a synchronized HashMap to a ConcurrentHashMap have nothing to do with the author's point: that a HashMap lookup is slower than object allocation. The fact that you have to call hashCode() and equals() to search through the structure to find the desired object means loading objects into registers and caches. All of this has to be slower than <code>new Integer( "5" )</code>.

    1. Brian doesn't make the point that, "a HashMap lookup is slower than object allocation."
    2. You're still talking about caching.
    3. The JVM already caches Strings, Integers, Longs, etc.
    4. Brian's point about object pooling was one small paragraph in a long article comparing Java performance to that of C++.

    The difference between pooling and caching (in this context) is that pooled objects don't have any state or identity. The only purpose of pooling is to prevent memory allocation (which used to be slow).

    Yes, Brian said, "object pooling is now a serious performance loss," but as you've alluded to, the target audience of that statement is developers who would use hash maps or create more objects in their pool implementation.
  65. 3. The JVM already caches Strings, Integers, Longs, etc.

    ? The JVM maintains a "String pool" but I'm not aware of anything similar for Integers, Longs or any of the primitive wrappers. Do you have any documentation to back this up?
  66. 3. The JVM already caches Strings, Integers, Longs, etc.
    ? The JVM maintains a "String pool" but I'm not aware of anything similar for Integers, Longs or any of the primitive wrappers. Do you have any documentation to back this up?

    Try this:

    Object a = 1, b = 1;
    System.out.println(a == b);

    You can also look at the source of Integer.valueOf(int). Notice they refer to it as a "cache" as opposed to a "pool."

    Bob

    P.S. If you're interested in an object pool that outperforms allocation on JDK 1.5 (even for Object), check out http://crazybob.org/pool.zip. I accomplish it by:

    1) extending the pooled class at runtime so the object you're pooling doubles as a node (so you don't cancel out your pool's benefit by creating another object)
    2) using a Treiber stack so there's no locking and it's memory cache-friendly

    I'll try to blog about it soon.
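    The zip isn't reproduced here, but the Treiber-stack idea Bob describes can be sketched roughly like this (names and structure are my guess at the technique, not his actual code). The pooled object carries its own next link, so releasing it never allocates a node:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of a Treiber-stack pool. Pooled objects carry their own
// "next" link (the object doubles as a stack node), so release() never
// allocates. Names here are illustrative, not Bob's actual API.
public class TreiberPool<T extends TreiberPool.Poolable> {

    public static class Poolable {
        Poolable next; // intrusive link: the object doubles as a stack node
    }

    private final AtomicReference<Poolable> head = new AtomicReference<Poolable>();

    @SuppressWarnings("unchecked")
    public T acquire() {
        Poolable h;
        do {
            h = head.get();
            if (h == null) return null; // pool empty: caller allocates fresh
        } while (!head.compareAndSet(h, h.next)); // lock-free pop
        h.next = null;
        return (T) h;
    }

    public void release(T obj) {
        Poolable h;
        do {
            h = head.get();
            obj.next = h;
        } while (!head.compareAndSet(h, obj)); // lock-free push
    }
}
```

    One caveat: because pooled nodes are recycled, a production version has to consider the ABA problem (e.g. via a versioned head with AtomicStampedReference); this sketch glosses over that.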
  67. You can also look at the source of Integer.valueOf(int). Notice they refer to it as a "cache" as opposed to a "pool."

    I didn't realize that was done. It's not really what I think of as a pool. It's a static set of Objects that are created whether they are needed or not.

    As far as calling it a cache vs. a pool, look at the JavaDocs for the String.intern method. Notice they refer to it as a "pool" as opposed to a "cache".
  68. It's not really what I think of as a pool.

    Or a cache, whatever you want to call it.
  69. If you're interested in an object pool that outperforms allocation on JDK 1.5 (even for Object), check out http://crazybob.org/pool.zip. I accomplish it by: 1) extending the pooled class at runtime so the object you're pooling doubles as a node (so you don't cancel out your pool's benefit by creating another object) 2) using a Treiber stack so there's no locking and it's memory cache-friendly. I'll try to blog about it soon.

    How do you eliminate the extra work for the GC?
  70. How do you eliminate the extra work for the GC?

    Extra work for the GC? It has to do work either way. The included performance test creates 1,000,000 objects 100 times so as to encompass garbage collection. The cool thing is as you add fields to the pooled class, the pooled version's run time stays the same, while the version that creates a new object every time takes longer and longer.
  71. How do you eliminate the extra work for the GC?
    Extra work for the GC? It has to do work either way.

    Did you read the topic article of this thread?

    "But allocation is only half of memory management -- deallocation is the other half. It turns out that for most objects, the direct garbage collection cost is -- zero. This is because a copying collector does not need to visit or copy dead objects, only live ones. So objects that become garbage shortly after allocation contribute no workload to the collection cycle."

    Long lived Objects (e.g. pooled Objects) create work for the GC.
  72. Did you read the topic article of this thread? "But allocation is only half of memory management -- deallocation is the other half. It turns out that for most objects, the direct garbage collection cost is -- zero. This is because a copying collector does not need to visit or copy dead objects, only live ones. So objects that become garbage shortly after allocation contribute no workload to the collection cycle." Long lived Objects (e.g. pooled Objects) create work for the GC.

    Ha ha. Yeah, I read it. I actually shared this pool with Brian a month or so ago. Deallocation is only half of garbage collection--the garbage collector still has to determine if the object is live or not. Why don't you run it for yourself and see? Or you could just believe everything you read. ;)
  73. Ha ha. Yeah, I read it. I actually shared this pool with Brian a month or so ago. Deallocation is only half of garbage collection--the garbage collector still has to determine if the object is live or not. Why don't you run it for yourself and see? Or you could just believe everything you read. ;)

    No. I believe what makes sense.
  74. Details, details....[ Go to top ]

    "Deallocation is only half of garbage collection--the garbage collector still has to determine if the object is live or not." Well, I suppose one could write a garbage collector that has to visit every object, live or not. However, the most common implementation of the Java garbage collector only visits live objects. As stated in the post you were replying to: "... a copying collector does not need to visit or copy dead objects, only live ones." [I know this thread is kinda stale, but I hate to see myths pervade.]
  75. The cool thing is as you add fields to the pooled class, the pooled version's run time stays the same, while the version that creates a new object every time takes longer and longer.

    Again, your test is 'cooking the books'. What real scenario would that relate to? Change the code so that the Objects are not assigned to an array, remove the second for loop and the pool loses. I added a counter that used the value of i to prevent the loops from being compiled away.
  76. What real scenario would that relate to?

    Node objects in a shared concurrent collection?
    Change the code so that the Objects are not assigned to an array, remove the second for loop and the pool loses. I added a counter that used the value of i to prevent the loops from being compiled away.

    Yeah, I'm aware of that. That's why I said earlier, "[pooling is] faster than a heap allocation not a stack allocation," even though I haven't tried JDK 1.6 yet.

    Still, it's pretty cool because this wasn't even possible before 1.5.
  77. What real scenario would that relate to?

    Or Point objects in a 3D model! Though you're correct, in the vast majority of real world cases, objects are short lived.
  78. What real scenario would that relate to?

    Or Point objects in a 3D model! Though you're correct, in the vast majority of real world cases, objects are short lived.
    You don't want to pool your Point objects! Or at least not when escape analysis makes it into a future Java version because most likely the entire allocation will be factored out. See other article by Brian: http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html
  79. Just ignore me[ Go to top ]

    Duh, forget I ever wrote that message ;-) Should learn to look at the date of the messages before replying.
  80. Again, your test is 'cooking the books'. What real scenario would that relate to? Change the code so that the Objects are not assigned to an array, remove the second for loop and the pool loses. I added a counter that used the value of i to prevent the loops from being compiled away.

    At the risk of further monopolizing this thread (I think everyone else has stopped listening anyway)...

    We've established that for short-lived empty objects, new allocation beats out pooling. What happens if we add some substance to the object we're pooling? At what point do the scales tip back toward pooling?

    I've added an int array of size M and a couple methods to the test class:
      static class Foo {

        final int[] ints = new int[M];

        /** Does something so we don't get optimized away. */
        void set(int i) {
          ints[i % M] = i;
        }

        /** Makes me like new again. */
        void clear() {
          for (int j = 0; j < M; j++)
            ints[j] = 0;
        }
      }

    Here's the new allocation test:
      for (int i = 0; i < SIZE; i++) {
        Foo foo = new Foo();
        foo.set(i);
      }

    And here's the pool test:
      for (int i = 0; i < SIZE; i++) {
        Foo foo = fooPool.acquire();
        foo.set(i);
        // so we're fair...
        foo.clear();
        fooPool.release(foo);
      }

    On my machine, the tables turn when M ~= 25 (~100 bytes). That's pretty low in my opinion. I definitely would consider it among "the most heavyweight of objects."
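    For anyone wanting to reproduce this, the fragments above assemble into something like the following harness. The pool here is a deliberately naive single-threaded stack (FooPool is my name, not Bob's lock-free implementation), so absolute numbers will differ from his:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Runnable assembly of the fragments above. FooPool is a trivial stand-in
// for the lock-free pool; M controls the per-object weight.
public class PoolVsNew {

    static final int SIZE = 1000000;
    static final int M = 25;

    static class Foo {
        final int[] ints = new int[M];
        void set(int i) { ints[i % M] = i; }                    // so we don't get optimized away
        void clear() { for (int j = 0; j < M; j++) ints[j] = 0; } // makes me like new again
    }

    static class FooPool {
        private final Deque<Foo> stack = new ArrayDeque<Foo>();
        Foo acquire() { Foo f = stack.poll(); return f != null ? f : new Foo(); }
        void release(Foo f) { stack.push(f); }
    }

    public static void main(String[] args) {
        FooPool fooPool = new FooPool();

        long start = System.nanoTime();
        for (int i = 0; i < SIZE; i++) {
            Foo foo = new Foo();
            foo.set(i);
        }
        System.out.println("new:  " + (System.nanoTime() - start) / 1000000 + "ms");

        start = System.nanoTime();
        for (int i = 0; i < SIZE; i++) {
            Foo foo = fooPool.acquire();
            foo.set(i);
            foo.clear(); // so we're fair...
            fooPool.release(foo);
        }
        System.out.println("pool: " + (System.nanoTime() - start) / 1000000 + "ms");
    }
}
```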

    Bob
  81. I definitely would consider it among "the most heavyweight of objects."

    I mean, I definitely wouldn't consider it among "the most heavyweight of objects."
  82. On my machine, the tables turn when M ~= 25 (~100 bytes). That's pretty low in my opinion. I definitely would consider it among "the most heavyweight of objects."

    I've gotta give it to you. I was pretty doubtful, but I get the flip just past 10. When I add multiple threads, I can't get the NEW version to win, even with M = 1. This result surprised me, but the more I think about it, the more it makes sense.

    The results of this throw a good bit of doubt on the claims of the article. It would be interesting to see the effect of stack based allocations on these tests.

    Of course, this is all academic for me now. I'm not looking to squeeze every bit of speed out of my code. But I guess for those that are, the question is whether you want to use pooling and potentially lose out on improvements later (or refactor.)
  83. Of course, this is all academic for me now. I'm not looking to squeeze every bit of speed out of my code. But I guess for those that are, the question is whether you want to use pooling and potentially lose out on improvements later (or refactor.)

    Ditto. I can't think of anywhere I'd actually use this. Maybe if I had some code that created tons of StringBuffers or ArrayLists or something...

    It would be nice if the JVM could detect that it's frequently creating a certain type of object and optimize accordingly (i.e. rather than deallocating and reallocating memory, simply re-initialize an existing object).
  84. difference bet. caching and pooling[ Go to top ]

    so, finally what is the difference between caching and pooling ? When should i still use HashMap?? I am a bit confoosed :)
  85. difference bet. caching and pooling[ Go to top ]

    caching refers to data, where pooling refers to an activity on an object.

    so, finally what is the difference between caching and pooling ? When should i still use HashMap?? I am a bit confoosed :)
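    To put the distinction in code (all names here are my own illustration): a cache answers "give me the value for this key", so identity matters and a HashMap fits; a pool answers "give me any reusable instance", so identity doesn't and no map is needed:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class CacheVsPool {

    // Cache: lookup by key; the *value* associated with a key matters.
    static final Map<String, byte[]> cache = new HashMap<String, byte[]>();

    // Pool: no keys; any available instance is as good as any other.
    static final Deque<StringBuilder> pool = new ArrayDeque<StringBuilder>();

    static StringBuilder acquire() {
        StringBuilder sb = pool.poll();
        return sb != null ? sb : new StringBuilder();
    }

    static void release(StringBuilder sb) {
        sb.setLength(0); // reset state before reuse
        pool.push(sb);
    }

    public static void main(String[] args) {
        cache.put("logo.png", new byte[] { 1, 2, 3 }); // cached data has identity (its key)
        StringBuilder sb = acquire();                  // pooled object has none
        sb.append("hello");
        release(sb);
    }
}
```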
  86. difference bet. caching and pooling[ Go to top ]

    caching refers to data, where pooling refers to an activity on an object.

    www.binaryfrost.com

    so, finally what is the difference between caching and pooling ? When should i still use HashMap?? I am a bit confoosed :)
  87. lol[ Go to top ]

    lol. my post was more like a vent really :)
    apologies
  88. the tables turn when M ~= 25 (~100 bytes). That's pretty low in my opinion.

    Is this an isolated test? You may find things to be different when you have a very large system that needs to create objects constantly and quickly. By keeping pools of objects around, you could be making it harder for other parts of your system to quickly grab new allocations. Also, in an isolated test there is not much garbage collection going on (no real challenges with long-lived objects). Of course, it is hard to benchmark in a real-world application where a lot of "stuff" is going on.

    R.
  89. hop in the pool ;-)[ Go to top ]

    True, pooling doesn't necessarily require a HashMap. But I think my point remains relevant to the argument made in the article. How would you pool things like Integers, Longs, and Strings without a HashMap when they're immutable? You can't.

    Actually, it's even easier if they're immutable. Consider the function:
    public static Integer makeInteger(int i) {
        return new Integer(i);
    }

    Let's assume that most of the values of "i" at runtime are in the range 0..small.
    public static Integer makeInteger(int i) {
        return (i & 0xFFFFFF00) == 0 ? INTEGERS[i] : new Integer(i);
    }
    private static final Integer[] INTEGERS = new Integer[256];
    static {
        for (int i = 0, c = INTEGERS.length; i < c; ++i) {
            INTEGERS[i] = new Integer(i);
        }
    }

    Let's assume that there's some possibility of a repeat hit:
    public static Integer makeInteger(int i) {
        Integer I = INTEGERS[i & 0xFF];
        if (I == null || i != I.intValue()) {
            INTEGERS[i & 0xFF] = I = new Integer(i);
        }
        return I;
    }
    private static final Integer[] INTEGERS = new Integer[256];

    Note that this is thread safe. No synchronization, no CAS, no nuttin'. But thread safe ;-)
    And the author discusses how a HashMap lookup will be slower than allocating a new object.

    It _might_ be, in the future. As of the testing I did on 1.5, it's still faster in most cases to pool (intelligently).

    Peace,

    Cameron Purdy
    Tangosol Coherence: The Java Data Grid
  90. ORA-01555[ Go to top ]

    This is funny too... why do we still have very bad well-known frameworks such as EHCache and OSCache? Who really uses Java 5? http://jira.opensymphony.com/browse/CACHE-233 - this bug is still open! Why can't we use a different concurrency strategy instead of "Object Pooling" and "Problem Solving"? ORA-01555: snapshot too old
    I am not sure whether the results of the tests the author referred to were accurate. I saw some other results (about 2-2.5 years ago) that showed that object instantiation usually takes much more time than extracting an object from the pool (the factor was around 100). The sizes of the object pools can be configured if there are problems with memory, and can be set based on collected statistics on object usage.

    If the application response time is not crucial, you probably can live with object instantiation as necessary. But do not forget that this implies garbage collection, and because of that results in some unpredictable behavior of the application. If you need predictable response time and have strict time constraints, you had better use object pools to avoid delays on object instantiation.

    As for the comparison of malloc/dealloc calls in C/C++ with new in Java: I think the comparison done was not fair to C/C++. new in Java deals with the heap, which is much smaller than the actual computer memory size. Traversing a smaller list to find the necessary chunk of memory for the object is obviously faster than scanning all the memory.
  92. jvm grief ?[ Go to top ]

    As oldies, the server side has been quite excellent so far (although there is not much competition for it nowadays, as non-Java server-side platforms are losing support). The client side is the most troublesome, since Swing apps ("multi-platform") always have competition from native apps. The open question is whether the JVM can now share multiple apps on a single JVM via a parameter; there are one or two JVMs that reportedly do this, but still in the labs :) and not in the standard one yet. Most probably Sun forgot/ignored/deprioritized client apps, so this feature never shipped. Sure, for the server side this feature is important, but it is not a top-notch emergency issue :D, and OK, we have a lot more bugs/features for other Java subjects... which are very, very broad nowadays, from JSE, JEE, JME, JWSDP and now JFX :D. It is a good thing to learn one language for everything, but it might also be interesting to see other languages for specific reasons :D, although I might be lazy too...
  93. Re: jvm grief ?[ Go to top ]

    Look, object pooling is not overkill; it is a very good thing. The issue is that nowadays most frameworks don't have a good implementation of it. The following is what I think should be used.

    We never create any object directly; we always borrow it from the pool. The pool keeps a MAX_ALLOWED_OBJETCS_IN_POOL and MIN_ACTIVE_OBJECTS_IN_POOL watch, and the pool is configurable. If MAX_ALLOWED_OBJETCS_IN_POOL is reached, it creates new objects (using an algo to decide how many) and pushes them into the pool. We borrow objects, use them, and return them to the pool. Simple.

    The good thing is that the pool returns us a weak reference when we borrow objects, not a strong reference. After some specific amount of time, a check thread runs on the pool and cleans up the objects that are stale (again decided by an algo; say, objects last accessed some 30 minutes ago are considered stale). Cleaning up means it nullifies the reference. The point to note is that only the references in the pool are strong references, and once they are nullified the object is marked for GC. All other borrowed references are weak references. This idea can be bettered, but it's just a starter.
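    A minimal sketch of the design described above (all names are mine; real libraries such as Apache commons-pool handle much more): the pool holds the only strong references, borrowers get WeakReferences, and a periodic eviction pass drops stale entries so the GC can reclaim them. This simplified version does not track in-use vs. available instances:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch of the weak-reference pool described above: strong references live
// only inside the pool; borrowers receive WeakReferences, so once
// evictStale() drops an entry, the object becomes eligible for GC even if a
// borrower kept its (weak) handle around.
public class WeakBorrowPool<T> {

    private static class Entry<T> {
        final T object;
        volatile long lastAccess;
        Entry(T object) { this.object = object; touch(); }
        void touch() { lastAccess = System.currentTimeMillis(); }
    }

    private final List<Entry<T>> entries = new ArrayList<Entry<T>>();

    /** Borrows an instance, creating one via the factory if the pool is empty. */
    public synchronized WeakReference<T> borrow(Supplier<T> factory) {
        Entry<T> e;
        if (entries.isEmpty()) {
            e = new Entry<T>(factory.get());
            entries.add(e);
        } else {
            e = entries.get(entries.size() - 1);
            e.touch();
        }
        return new WeakReference<T>(e.object);
    }

    /** Drops entries idle longer than maxIdleMillis; the GC does the rest. */
    public synchronized void evictStale(long maxIdleMillis) {
        long now = System.currentTimeMillis();
        entries.removeIf(en -> now - en.lastAccess > maxIdleMillis);
    }

    public synchronized int size() { return entries.size(); }
}
```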