Java Collections are efficient enough, apparently

News: Java Collections are efficient enough, apparently

  1. Kairi Kangro has written "On Java Collection Waste," an article in which he examines the wasted heap associated with the standard Java collections library. Short form: they work pretty well.

    However, there are a couple of problems with the post. Mr. Kangro abandoned the (implied) hypothesis that the collections were wasteful, partly because Java's garbage collector ran multiple times, cleaning up the heap before actual effects could be accurately measured. That doesn't sound like a validation of the collections framework; it sounds more like proof that the JVM works well enough that the collections don't need to be spectacularly efficient.

    That's not the same thing as "the collections aren't wasteful," or even "the collections aren't wasteful enough to worry about."

    It's an interesting thought. How would you accurately measure the waste associated with collections?

    One consideration is that the collections do different things; you'd want to have a measurement for Sets, Lists, and Maps; then you'd want to have a way to populate the collections both sparsely and abundantly, perhaps changing from one state to the other. You'd also want to do so multiple times, to let HotSpot do its work - and with multiple garbage collectors (and potentially multiple JVMs.)
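
    As a starting point that sidesteps the garbage collector entirely, one could estimate slack analytically rather than from heap snapshots. The sketch below is only that: it assumes an ArrayList-style growth policy (default capacity of 10, growing by half the old capacity each time), which matches recent OpenJDK sources but is an implementation detail, and it counts unused reference slots only, not object headers or the elements themselves. The class and method names are made up for illustration.

```java
// Sketch: estimate unused backing-array slots for an ArrayList-like list
// that is filled one element at a time.
// ASSUMPTION: default capacity 10 and growth of old + old/2; other JVMs or
// JDK versions may size their arrays differently.
final class GrowthWasteEstimate {

    // Capacity of the backing array after `adds` single-element appends.
    static int capacityAfter(int adds) {
        if (adds == 0) {
            return 0; // assumed: the array is allocated lazily on first add
        }
        int capacity = 10;
        while (capacity < adds) {
            capacity += capacity >> 1; // grow by 50%
        }
        return capacity;
    }

    // Slots that are allocated but hold no element.
    static int unusedSlots(int adds) {
        return capacityAfter(adds) - adds;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 11, 100, 1000}) {
            System.out.printf("%d elements -> capacity %d, %d unused slots%n",
                    n, capacityAfter(n), unusedSlots(n));
        }
    }
}
```

    A real measurement would still need to cover Sets and Maps, different fill patterns, and multiple collectors; this only shows the kind of number one would be trying to validate.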

    There's a lot of possibility here, even though it might imply some work; who knows, the JDK may already have such measurements.

    With all that said, it sure would be useful to have an assertion and related validation (or counterproof) publicly available.

    Threaded Messages (12)

  2. Value Types Change Everything

    Can't we just speed *every* collection up by using value types? I believe Azul and IBM have competing strategies for shoehorning them into the language. Java 9 maybe? Doesn't that just end the whole discussion?

    https://blogs.oracle.com/jrose/entry/value_types_in_the_vm 

  3. How would value types change the inner structure of, for example, HashSet? Where it was sparse before, with reference types, it will remain sparse with value types.

  4. Your approach is flawed, though I can well understand how it might look appealing to others less experienced. First, it is not the developer that needs to be informed of waste but the software itself. Getting a developer to act on such a notification and then change the code is SLOW... in fact incredibly SLOW, even in these days of continuous agility and whatnot. Not only is it slow, it presents a risk: it's a change, and one that might have a very short life span once something else changes.

    Take the following code.

        final Path2D path = new Path2D.Double(Path2D.WIND_NON_ZERO, 2);

        path.moveTo(x + 4, y + 16);

        path.lineTo(x + 9, y + 16);

        path.lineTo(x + 16, y + 22);

        path.lineTo(x + 23, y + 16);

        path.lineTo(x + 28, y + 16);

    Now, as a performance engineer looking at the source code, I can see that the 2 is wrong (irrespective of the rendering), because I know that the parameter is a capacity value for the internal double[] array maintained by the path, which is going to be resized once the following statements are executed. Now why can't the software see this? I am not talking about the IDE but the actual Path class itself. Why can't it look ahead in the source code, much like I am doing, and set up its capacity accordingly? Asking a developer to set the capacity was the original sin, followed by another sin, which is someone like yourself proposing that we be notified of this fact thousands (or millions) of times across an entire code base, and I have not even got down to the instance-level issues in all of this.

    What is wrong is that the class (just like many of the collection classes) is not adaptive... not adaptive within an enclosing (and hierarchical) conversational context that would allow it to predict based on past execution flows/paths. For this to be the case we need something like Signals, and we need it baked into the JVM runtime. It is unlikely to ever happen, but that does not mean it is not the right solution... in fact I think it is probably the only right solution up to this point.
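
    For reference, the manual fix being argued against here would look roughly like the sketch below: count the calls by hand and pass 5 (one moveTo plus four lineTo) as the initial capacity, which the Path2D.Double constructor documents as an estimate of the number of path segments. The class name, the helper methods, and the x/y arguments are assumptions added for illustration.

```java
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;

// Sketch of presizing a path by hand. Names here are illustrative only.
final class PresizedPathSketch {

    // Builds the same shape as above, sized for its five segments up front.
    static Path2D.Double arrow(double x, double y) {
        // 5 = one moveTo plus four lineTo calls below
        final Path2D.Double path = new Path2D.Double(Path2D.WIND_NON_ZERO, 5);
        path.moveTo(x + 4, y + 16);
        path.lineTo(x + 9, y + 16);
        path.lineTo(x + 16, y + 22);
        path.lineTo(x + 23, y + 16);
        path.lineTo(x + 28, y + 16);
        return path;
    }

    // Counts segments by walking the path's iterator (no transform applied).
    static int countSegments(Path2D path) {
        int segments = 0;
        for (PathIterator it = path.getPathIterator(null); !it.isDone(); it.next()) {
            segments++;
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println("segments: " + countSegments(arrow(0, 0)));
    }
}
```

    The hand-counted constant is exactly the fragile part: add or remove one lineTo call and the 5 is silently wrong again.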

    http://www.jinspired.com/site/introducing-signals-the-next-big-thing-in-application-management

  5. Figures...

    Leave it to Louth to confuse everybody. ><

  6. Value Types Change Everything

    I hope you are not a teacher... I will drop out on day one. What a confusing explanation (if it can be called an explanation).

  7. Value Types Change Everything

    I assume you did not bother to click the link. I knew the Path2D reference was going to throw some people ;-)

  8. Value Types Change Everything

    allow me to try again...

    Caller: Path2D, create thyself...

    Path2D: Who is it that has called me forth?

    Caller: It is me, the one that will subsequently call moveTo() once and lineTo() four times.

    Path2D: Oh, it is you... master of 5. I have prepared everything to your liking and wait in anticipation to serve your needs.

    Caller: Great. moveTo, now lineTo, then lineTo, ...

    Path2D: I've done all that you have asked of me, and I never made any additional and unnecessary allocations subsequent to my summoning (construction).

    In the above dialog the caller (master) never said up front what he would need or do.

    Now the trick here is how the servant sees the enclosing conversational context and uses it to predict (anticipate) call sequencing.
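
    To make the idea concrete without any JVM support, here is a very rough sketch in which the "conversational context" is reduced to a plain string key supplied by the caller, and the prediction is simply the largest segment count seen so far under that key. This is not how Signals works (I make no claim about its API); AdaptivePathFactory, capacityHint, create and observe are all hypothetical names.

```java
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;
import java.util.HashMap;
import java.util.Map;

// Sketch: a factory that remembers how large paths grew in a given context
// and presizes the next path created in that context. Hypothetical names.
final class AdaptivePathFactory {

    private static final int DEFAULT_CAPACITY = 2; // used until something is observed

    // context key -> largest number of segments seen so far
    private final Map<String, Integer> observed = new HashMap<>();

    // The capacity the next path in this context would be created with.
    int capacityHint(String context) {
        return observed.getOrDefault(context, DEFAULT_CAPACITY);
    }

    Path2D.Double create(String context) {
        return new Path2D.Double(Path2D.WIND_NON_ZERO, capacityHint(context));
    }

    // Called once the caller is done with the path; records its final size.
    void observe(String context, Path2D path) {
        int segments = 0;
        for (PathIterator it = path.getPathIterator(null); !it.isDone(); it.next()) {
            segments++;
        }
        observed.merge(context, segments, Math::max);
    }

    public static void main(String[] args) {
        AdaptivePathFactory factory = new AdaptivePathFactory();
        Path2D.Double first = factory.create("arrow-icon");
        first.moveTo(4, 16);
        first.lineTo(9, 16);
        first.lineTo(16, 22);
        first.lineTo(23, 16);
        first.lineTo(28, 16);
        factory.observe("arrow-icon", first);
        System.out.println("next capacity hint: " + factory.capacityHint("arrow-icon"));
    }
}
```

    The caller still has to name the context and report back, which is precisely the burden the dialog above wants to remove; the sketch only illustrates the predict-from-past-executions half of the argument.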

  9. Nice try

    Much better than your last one :)

  10. Value Types Change Everything

    I think it is pretty clear that without getting the full request from the client we won't be able to fulfill the operation efficiently. In this case the operation is to create a 2D path.

    Unfortunately this is a much bigger problem, and one that is not handled very well by current languages, which don't have built-in support for transactional requests that span multiple methods and objects.

     

     

     

  11. Value Types Change Everything

    Not exactly sure what is meant by "transactional" requests, but could it be closely related to change sets within some tagged (identifiable) conversational scope that consist of the methods called and measurements performed? If so, then this is already available for Java today.

    http://www.jinspired.com/site/going-beyond-state-structure-reflection-in-java-with-behavioral-tracking

    http://www.jinspired.com/site/from-anomaly-detection-to-root-cause-analysis-via-self-observation

    Signals also has a similar savepoint and change set generation support: 

     

  12. Value Types Change Everything

    Transactions come into play when you need to handle errors during execution and notify all involved objects that the request has been dispatched to all parties and that it is now time to execute it.
    Let's take the following request flow:

    client execute(){
        object1.execute1()
        object1.execute2(){
            object11.execute111()
            object11.execute112()
        }
        object2.execute21()
    }


    Now, how does "object11" know, at the time it fulfills "object11.execute111()", that "object2.execute21()" is also part of the request? Stack traces won't help, since "object2.execute21()" is called after "object11.execute111()".
    And how will "object11.execute111()" know not to fulfill its request when "object2.execute21()" fails to fulfill its own and "client.execute()" decides to cancel the whole request?

    To mitigate this we need to add some type of two-phase commit transaction support.
    Here is the execution flow rewritten with transactions in mind.


    client execute(){
        object1.execute1()
        object1.execute2(){
            object11.execute111()
            object11.execute112()
            object11.prepare()
            object11.commit()   // or object11.rollback() in case the whole request fails at any point
        }
        object1.prepare()
        object1.commit()        // or object1.rollback() in case the whole request fails at any point

        object2.execute21()
        object2.prepare()
        object2.commit()        // or object2.rollback() in case the whole request fails at any point
    }


    We need the "prepare" method in order to let all involved objects know that we are about to commit and that they have a last chance to decide whether they can fulfill the request, erroring out if needed. During this method call we can pass the whole conversational context, and the objects involved, to every involved object.
    The "rollback" will happen when any involved object errors out during the "execute*" or "prepare" calls.
    The "commit" will finalize the request and allow each involved object to take the actual actions for all of its "execute*" calls at once (instead of acting on each "execute*" call individually). This method can also take the execution context and so be adaptive, knowing all the objects and methods that participate in the whole request.

    By having transactions we get efficient request handling (no action is taken on each individual "execute*" call), we can pass the whole execution context to individual components for more intelligent handling of the request, and we can handle cancellation or failure at each execution point.
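
    A minimal Java sketch of this scheme follows, under some simplifying assumptions: a single coordinator drives prepare/commit/rollback for everyone at the end (rather than each object doing so inline as in the flow above), operations are plain strings, and a participant "fails" only by voting no in prepare. TxParticipant, BufferingParticipant and RequestCoordinator are made-up names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract for the flow described above.
interface TxParticipant {
    void execute(String operation);              // record the call, take no action yet
    boolean prepare(List<String> wholeRequest);  // vote, having seen every call in the request
    void commit();                               // apply everything recorded
    void rollback();                             // discard everything recorded
}

final class BufferingParticipant implements TxParticipant {
    private final String vetoedOperation;        // operation this participant refuses; may be null
    private final List<String> pending = new ArrayList<>();
    final List<String> applied = new ArrayList<>();

    BufferingParticipant(String vetoedOperation) {
        this.vetoedOperation = vetoedOperation;
    }

    @Override public void execute(String operation) { pending.add(operation); }

    @Override public boolean prepare(List<String> wholeRequest) {
        // wholeRequest is where an adaptive participant could size itself; unused here.
        return vetoedOperation == null || !pending.contains(vetoedOperation);
    }

    @Override public void commit() { applied.addAll(pending); pending.clear(); }

    @Override public void rollback() { pending.clear(); }
}

final class RequestCoordinator {
    private final List<TxParticipant> participants = new ArrayList<>();
    private final List<String> wholeRequest = new ArrayList<>();

    // Phase 0: dispatch a call; the participant only buffers it.
    void call(TxParticipant participant, String operation) {
        if (!participants.contains(participant)) {
            participants.add(participant);
        }
        wholeRequest.add(operation);
        participant.execute(operation);
    }

    // Phase 1 (prepare) then phase 2 (commit or rollback for everyone).
    boolean finish() {
        boolean allPrepared = true;
        for (TxParticipant p : participants) {
            if (!p.prepare(wholeRequest)) {
                allPrepared = false;
                break;
            }
        }
        for (TxParticipant p : participants) {
            if (allPrepared) p.commit(); else p.rollback();
        }
        return allPrepared;
    }

    public static void main(String[] args) {
        BufferingParticipant object11 = new BufferingParticipant(null);
        BufferingParticipant object2 = new BufferingParticipant("execute21");
        RequestCoordinator request = new RequestCoordinator();
        request.call(object11, "execute111");
        request.call(object11, "execute112");
        request.call(object2, "execute21");
        System.out.println("committed: " + request.finish());
        System.out.println("object11 applied: " + object11.applied);
    }
}
```

    In the main method object2 vetoes its own call, so the whole request rolls back and object11 never applies execute111 or execute112, which is the cancellation case asked about above.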

     

     

  13. Value Types Change Everything

    I am sure this will piss off a few people here, but I already solved the awareness problem (did method X happen yet?) in 2010 using the Current, SavePoint and ChangeSet interfaces.

    http://www.jinspired.com/research/self-observing-software

    As for prepare and commit, this is standard (but not pretty) OTS stuff we did decades ago in CORBA using Resource and ResourceManager interfaces. Of course, treating the JVM heap as a coordinated resource is an entirely different and currently impractical endeavor.