Discussions

News: An Introduction to Java Map Collection Classes

  1. Do you have bottlenecks in your application due to Maps? Jack Shirazi, author of O'Reilly's "Java Performance Tuning", discusses the basics of one the most commonly used collection type "Maps" and how to optimize Maps for your application specific data. [15-Sept-2004]
    Which map should you use? Should it be synchronized or not synchronized? These are perhaps the two most important questions to answer to establish optimal performance for your application. Add in sizing the Map and choosing a load factor, and that pretty much covers Map tuning options if you are using one of the available general purpose Maps.
    Read Jack Shirazi's article

    Threaded Messages (12)

  2. The article looks like a great overview of using various Maps.

    Regarding the synchronization issue, we've done some testing on the latest JDK and there was literally zero performance difference between using HashMap and Hashtable, meaning that the JVMs are getting better and better doing synchronization, and knowing when it isn't necessary. (As an aside, we tested on both single- and multi-CPU machines -- same results.)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  3. Comparing HashMap and Hashtable to determine the cost of synchronization is not the best idea as they use differen alorithms. While Hashtable performas better in a few cases like using ascending numbers as keys, HashMap has usually a better overall performance and handles "weak" hashCode() implentations much better. So depending on the test the better performance of the Hashtable's algorithm might make the overhead of synchronization seem smaller than it actually is (of course it might the other way round too)
  4. "zero performance difference between using HashMap and Hashtable"

    Which isn't the case if you were testing concurrent access or update from multiple threads, where the performance is dramatically different.
  5. Which isn't the case if you were testing concurrent access or update from multiple threads, where the performance is dramatically different.
    I think you'd be surprised if you tested it .. ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  6. Fastutil[ Go to top ]

    Stephanio Vigna's Fastutil package is so much better if you need to:

    - save memory: 2..6 times smaller memory consumption
    - and get the faster operation that comes with the previous clause..

    Some results (all run with -server -Xms128m -Xmx128m 1.4.2_05 P4 1,6 laptop wXP)

    Test1:
    Time to count 1783293664 using toArray is 401 (ms) with java.util.HashMap
    Time to count 1783293664 iterating is 411 (ms) with java.util.HashMap
    Time to count 1783293664 using toArray (ex-creation) is 180 (ms) with java.util.HashMap
    Time to count 1783293664 iterating (ex-creation) is 460 (ms) with java.util.HashMap
    Time to count 1783293664 using toArray is 511 (ms) with java.util.LinkedHashMap
    Time to count 1783293664 iterating is 390 (ms) with java.util.LinkedHashMap
    Time to count 1783293664 using toArray (ex-creation) is 200 (ms) with java.util.LinkedHashMap
    Time to count 1783293664 iterating (ex-creation) is 380 (ms) with java.util.LinkedHashMap

    Time to count 1783293664 using toArray is 90 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
    Time to count 1783293664 iterating is 170 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
    Time to count 1783293664 using toArray (ex-creation) is 21 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
    Time to count 1783293664 iterating (ex-creation) is 270 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
    Time to count 1783293664 using toArray is 221 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
    Time to count 1783293664 iterating is 271 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
    Time to count 1783293664 using toArray (ex-creation) is 70 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
    Time to count 1783293664 iterating (ex-creation) is 270 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
    Test2: (changed the containsValue loop to 5x smaller to speed the test a bit)
    704982704 get() in time 30 (ms) for class java.util.LinkedHashMap
    704982704 containsKey() in time 40 (ms) for class java.util.LinkedHashMap
    199990000 containsValue()/100 in time 7020 (ms) for class java.util.LinkedHashMap
    HashMap's containsValue was so slow that my patience wore out.

    704982704 get() in time 40 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
    704982704 containsKey() in time 30 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
    199990000 containsValue()/100 in time 5237 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
    704982704 get() in time 50 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
    704982704 containsKey() in time 60 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
    199990000 containsValue()/100 in time 7631 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
    Test3:
    Time to populate with presize: false is 1492 (ms) with java.util.HashMap
    Time to populate with presize: true is 1011 (ms) with java.util.HashMap
    (presize and loadfactor not available for Int2IntAVLTreeMap)
    Time to populate no presize: is 1212 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
  7. Fastutil[ Go to top ]

    for non-concurrent Map util, check out Trove. It simply rocks!

    for concurrent Map util, just follow the master

    what would be cooler is to have concurrent-primitive-collections API, as fast as Trove and as concurrent as Doug Lea's util package.

    -talip
  8. The article mentions about the internals of how the elements are stored. It mentions that elements are stored in an array, the index at which an element is stored is the [abs(hashcode for the object) % array length]. And in case the index turns out to be the same for multiple values, these multiple values are stored in a linked list at the same index.
    Would it not be efficient to store these multiple values in a B-tree at the given index?

    Also, since we would need a big enough array so that there are least collisions for a given array index.
    Considering both the the space and the time required for searching would it not be a batter idea to implement Maps as an array of balanced binary trees?
  9. While collisions are somthing common the cases when you have more than 2 or 3 elements at the same index are quite rare (or you have a real problem with your hashCode() implementation). So the it is not really worth it trying being smart and probably faster to use "brute force".
  10. Take a look at TreeMap
  11. The article mentions about the internals of how the elements are stored.
    Yup. Also you can see for yourself; the source comes with the JDK. :-)
    It mentions that elements are stored in an array, the index at which an element is stored is the [abs(hashcode for the object) % array length].
    Something like that, depending on the JDK version and which hashed structure you are talking about. It actually _should_ have been:

    (int) ((((long) hashcodefortheobject) & 0xFFFFFFFFL) % ((long) arraylength));
    And in case the index turns out to be the same for multiple values, these multiple values are stored in a linked list at the same index.
    Yes, that's called an "hashing with overflow" algorithm.
    Would it not be efficient to store these multiple values in a B-tree at the given index?
    No, because the only thing that can be b-treed is the hashcode (because the objects can't be assumed to be comparable) and the hashcode might very well be the same hashcode if you have collisions.

    Furthermore, to reduce collisions, it is far better to extend the size of the bucket array, and to use a prime modulo (the size of the array.)
    Also, since we would need a big enough array so that there are least collisions for a given array index.
    Exactly.
    Considering both the the space and the time required for searching would it not be a batter idea to implement Maps as an array of balanced binary trees?
    That is what TreeMap is. It uses a red/black algorithm. However, a hashed algorithm is O(1) for lookups, while a balanced tree is O(logN), so it's better (as in faster) for most cases to use a hashed approach.

    Oleg blogged a bit on this subject:

    http://www.javablogs.com/views/ViewBlog.action?id=12415

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  12. Great article, Great questions, Great replies..Thanks to all :)
  13. Good article[ Go to top ]

    From my experience, this is a good article about maps and the issues involved.

    For anyone looking for more unusual or specialised map implementations, or if you wish there was a mapIterator() method on Map, you might want to look at Jakarta Commons Collections' map package:
    http://jakarta.apache.org/commons/collections/apidocs-COLLECTIONS_3_1/index.html

    Stephen Colebourne
    Committer, Jakarta Commons