Would you dare to change HashMap implementation?


  1. You can take a simple code snippet that creates a HashMap with 10,000,000 elements and run it with a heap smaller than 100 MB. Lo and behold, you will be surprised if you compare the results on JDK 7u25 against the next release, JDK 7u40.

    In u40 the JDK engineers changed the HashMap(initialCapacity, loadFactor) constructor, which now ignores your wish to construct a HashMap with the initial table size equal to initialCapacity. Instead, the underlying array is allocated lazily, only when the first put() is called on the map.
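    The experiment above can be sketched roughly as follows (a minimal, hypothetical snippet; the class name is mine). The point is that the cost of the backing array moves from the constructor in 7u25 to the first put() in 7u40 and later:

```java
import java.util.HashMap;
import java.util.Map;

public class LazyHashMapDemo {
    public static void main(String[] args) {
        // On JDK 7u25, this line alone allocates a multi-million-entry
        // table and can fail with OutOfMemoryError under a small heap;
        // on JDK 7u40 and later, the backing array is not allocated yet.
        Map<String, String> map = new HashMap<>(10_000_000, 0.75f);
        System.out.println("constructed: " + map.size());

        // On the newer JDKs, the allocation cost is deferred to here:
        map.put("key", "value");
        System.out.println("after put: " + map.size());
    }
}
```

    Run with something like -Xmx64m on each JDK to see the difference in where (or whether) the OutOfMemoryError occurs.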

    A seemingly very reasonable change: the JVM is lazy by nature in many respects, so why not postpone the allocation of a large data structure until the need for it becomes imminent? In that sense, a good call.

    For a particular application that was performing tricks via reflection and directly accessing the internal structures of the Map implementation, maybe not. But then again, one should not bypass the API and try to be clever, so perhaps that developer is now a bit more convinced that not every newly discovered technique is applicable everywhere.
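    To illustrate the kind of reflection trick that such a change breaks, here is a hedged sketch (the class name and messages are my own invention) that peeks at HashMap's private table field. On 7u40+ the field is null until the first put(), and on modern JDKs with strong encapsulation the access fails outright unless the module is opened:

```java
import java.lang.reflect.Field;
import java.util.HashMap;

public class InternalsPeek {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>(1024);
        try {
            // Bypassing the API: read HashMap's internal bucket array.
            Field tableField = HashMap.class.getDeclaredField("table");
            tableField.setAccessible(true); // throws on JDK 17+ without --add-opens
            Object table = tableField.get(map);
            // On JDK 7u40+ this prints null until the first put().
            System.out.println("table before put: " + table);
        } catch (ReflectiveOperationException | RuntimeException e) {
            System.out.println("internals are inaccessible: "
                    + e.getClass().getSimpleName());
        }
        System.out.println("done");
    }
}
```

    Code that assumed the table was always non-null after construction would break exactly as described above.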

    Would you have made the change yourself if you were the API developer? I am not convinced I would have had the guts, knowing that there must be a bazillion apps out there depending on all kinds of weird aspects of the implementation.

    If you are interested in the full case study, check out the original post.



  2. In a fairly large-scale study, we realized that a startling number of hash maps never had a single piece of data stored in them, so to reduce application memory footprint (in some cases by close to 20%), we decided to lazily instantiate the "buckets" array that backs the hash map. This was truly a profile-driven optimization.

    Peace,

    Cameron Purdy | Oracle

    For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

  3. Can you publish the study and confirm it was not indeed just the WebLogic code base? ;-).

    I think both approaches (old and new) are wrong... you probably expected me to say that. But I did write about such non-adaptive mechanisms and offered a truly innovative approach to solving this and much more.

    http://www.jinspired.com/site/introducing-signals-the-next-big-thing-in-application-management#software_signals

  4. William -

    Can you publish the study and confirm it was not indeed just the WebLogic code base? ;-).

    * I cannot publish the study (unfortunately, it was internal only).

    * No, it was not the WebLogic code base. However, I know that at least one of the applications was running on top of WebLogic; however, it was not WebLogic that had all those HashMaps ;-).

    Peace,

    Cameron.

  5. Yes, I have seen a JavaOne presentation talking about what I believe is the same study you refer to. My only concern is that the overhead of allocating that inner array now occurs at some other time than developers used to expect. Other than that, great change :) And it is really nice that you put effort into these things.


  7. ArrayList and the other collections have the same issue. Probably all of the default constructors should be lazy, but for this particular use case it would be better to use Collections.emptyMap() :)
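    A small sketch of that suggestion (the class name is hypothetical): Collections.emptyMap() returns a shared immutable instance, so a map that is known to stay empty costs no per-instance allocation at all, while a HashMap is still needed whenever the map may be mutated:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EmptyMapDemo {
    public static void main(String[] args) {
        // Shared immutable instance: no allocation, but also no put().
        Map<String, Integer> never = Collections.emptyMap();
        System.out.println(never.size());

        // A real HashMap is only needed when the map may grow.
        Map<String, Integer> maybe = new HashMap<>();
        maybe.put("a", 1);
        System.out.println(maybe.size());
    }
}
```

    Calling put() on the emptyMap() instance would throw UnsupportedOperationException, which is why it only fits the truly never-populated case.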
     

  8. I didn't know about this change, but I think it's great, especially for JPA-based applications.

    Indeed, most JPA entities hold Sets, initialized with a new HashSet by default (which internally uses a HashMap), in case they are new transient entities that must be populated and then saved.

    But most of the time, these entities are created and populated by the JPA engine, and their collections are immediately replaced by collection proxies using reflection. So a whole lot of unnecessary garbage should be avoided in JPA applications thanks to this change.
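    A minimal sketch of the pattern described above (the Order class and its field are invented for illustration): the eagerly initialized HashSet costs almost nothing under lazy allocation as long as nothing is added to it before the persistence provider swaps in its own proxy collection:

```java
import java.util.HashSet;
import java.util.Set;

public class JpaEntitySketch {
    // A typical JPA-style entity (annotations omitted): the Set is
    // eagerly initialized so a freshly constructed, transient entity
    // can be populated before it is saved.
    static class Order {
        private Set<String> lines = new HashSet<>();
        Set<String> getLines() { return lines; }
        // What a JPA engine effectively does (via reflection) when it
        // loads the entity: the eager HashSet is discarded unused.
        void replaceWithProxy(Set<String> proxy) { lines = proxy; }
    }

    public static void main(String[] args) {
        Order order = new Order();
        // With lazy bucket allocation, the HashSet's backing HashMap
        // has not allocated its array yet at this point.
        System.out.println(order.getLines().size());
    }
}
```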

    It would be interesting to measure if that has a significant impact on performance and GC, though.