Which map should you use? Should it be synchronized or not synchronized? These are perhaps the two most important questions to answer to establish optimal performance for your application. Add in sizing the Map and choosing a load factor, and that pretty much covers Map tuning options if you are using one of the available general purpose Maps.Read Jack Shirazi's article
-
An Introduction to Java Map Collection Classes (12 messages)
- Posted by: Sudhakar Ramakrishnan
- Posted on: September 15 2004 17:26 EDT
Do you have bottlenecks in your application due to Maps? Jack Shirazi, author of O'Reilly's "Java Performance Tuning", discusses the basics of one the most commonly used collection type "Maps" and how to optimize Maps for your application specific data. [15-Sept-2004]Threaded Messages (12)
- An Introduction to Java Map Collection Classes by Cameron Purdy on September 17 2004 11:37 EDT
- An Introduction to Java Map Collection Classes by irenicus irenicus on September 17 2004 14:45 EDT
- An Introduction to Java Map Collection Classes by Albert Johns on September 19 2004 05:30 EDT
-
An Introduction to Java Map Collection Classes by Cameron Purdy on September 19 2004 02:47 EDT
-
Fastutil by Nipsu on September 19 2004 04:18 EDT
- Fastutil by Talip Ozturk on September 20 2004 11:00 EDT
-
Fastutil by Nipsu on September 19 2004 04:18 EDT
-
An Introduction to Java Map Collection Classes by Cameron Purdy on September 19 2004 02:47 EDT
- RE: An Introduction to Java Map Collection Classes by Amit Kulkarni on September 17 2004 15:22 EDT
- RE: An Introduction to Java Map Collection Classes by irenicus irenicus on September 17 2004 16:11 EDT
- RE: An Introduction to Java Map Collection Classes by Raman Raja on September 17 2004 16:47 EDT
- RE: An Introduction to Java Map Collection Classes by Cameron Purdy on September 17 2004 19:41 EDT
- RE: An Introduction to Java Map Collection Classes by Srihari S on September 18 2004 10:29 EDT
- Good article by Stephen Colebourne on September 20 2004 08:43 EDT
-
An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: Cameron Purdy
- Posted on: September 17 2004 11:37 EDT
- in response to Sudhakar Ramakrishnan
The article looks like a great overview of using various Maps.
Regarding the synchronization issue, we've done some testing on the latest JDK and there was literally zero performance difference between using HashMap and Hashtable, meaning that the JVMs are getting better and better doing synchronization, and knowing when it isn't necessary. (As an aside, we tested on both single- and multi-CPU machines -- same results.)
Peace,
Cameron Purdy
Tangosol, Inc.
Coherence: Shared Memories for J2EE Clusters -
An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: irenicus irenicus
- Posted on: September 17 2004 14:45 EDT
- in response to Cameron Purdy
Comparing HashMap and Hashtable to determine the cost of synchronization is not the best idea as they use differen alorithms. While Hashtable performas better in a few cases like using ascending numbers as keys, HashMap has usually a better overall performance and handles "weak" hashCode() implentations much better. So depending on the test the better performance of the Hashtable's algorithm might make the overhead of synchronization seem smaller than it actually is (of course it might the other way round too) -
An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: Albert Johns
- Posted on: September 19 2004 05:30 EDT
- in response to Cameron Purdy
"zero performance difference between using HashMap and Hashtable"
Which isn't the case if you were testing concurrent access or update from multiple threads, where the performance is dramatically different. -
An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: Cameron Purdy
- Posted on: September 19 2004 14:47 EDT
- in response to Albert Johns
Which isn't the case if you were testing concurrent access or update from multiple threads, where the performance is dramatically different.
I think you'd be surprised if you tested it .. ;-)
Peace,
Cameron Purdy
Tangosol, Inc.
Coherence: Shared Memories for J2EE Clusters -
Fastutil[ Go to top ]
- Posted by: Nipsu
- Posted on: September 19 2004 16:18 EDT
- in response to Cameron Purdy
Stephanio Vigna's Fastutil package is so much better if you need to:
- save memory: 2..6 times smaller memory consumption
- and get the faster operation that comes with the previous clause..
Some results (all run with -server -Xms128m -Xmx128m 1.4.2_05 P4 1,6 laptop wXP)
Test1:
Time to count 1783293664 using toArray is 401 (ms) with java.util.HashMap
Time to count 1783293664 iterating is 411 (ms) with java.util.HashMap
Time to count 1783293664 using toArray (ex-creation) is 180 (ms) with java.util.HashMap
Time to count 1783293664 iterating (ex-creation) is 460 (ms) with java.util.HashMap
Time to count 1783293664 using toArray is 511 (ms) with java.util.LinkedHashMap
Time to count 1783293664 iterating is 390 (ms) with java.util.LinkedHashMap
Time to count 1783293664 using toArray (ex-creation) is 200 (ms) with java.util.LinkedHashMap
Time to count 1783293664 iterating (ex-creation) is 380 (ms) with java.util.LinkedHashMap
Time to count 1783293664 using toArray is 90 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
Time to count 1783293664 iterating is 170 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
Time to count 1783293664 using toArray (ex-creation) is 21 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
Time to count 1783293664 iterating (ex-creation) is 270 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
Time to count 1783293664 using toArray is 221 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
Time to count 1783293664 iterating is 271 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
Time to count 1783293664 using toArray (ex-creation) is 70 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
Time to count 1783293664 iterating (ex-creation) is 270 (ms) with it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
Test2: (changed the containsValue loop to 5x smaller to speed the test a bit)
704982704 get() in time 30 (ms) for class java.util.LinkedHashMap
704982704 containsKey() in time 40 (ms) for class java.util.LinkedHashMap
199990000 containsValue()/100 in time 7020 (ms) for class java.util.LinkedHashMap
HashMap's containsValue was so slow that my patience wore out.
704982704 get() in time 40 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
704982704 containsKey() in time 30 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
199990000 containsValue()/100 in time 5237 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap
704982704 get() in time 50 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
704982704 containsKey() in time 60 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
199990000 containsValue()/100 in time 7631 (ms) for class it.unimi.dsi.fastutil.ints.Int2IntRBTreeMap
Test3:
Time to populate with presize: false is 1492 (ms) with java.util.HashMap
Time to populate with presize: true is 1011 (ms) with java.util.HashMap
(presize and loadfactor not available for Int2IntAVLTreeMap)
Time to populate no presize: is 1212 (ms) with it.unimi.dsi.fastutil.ints.Int2IntAVLTreeMap -
Fastutil[ Go to top ]
- Posted by: Talip Ozturk
- Posted on: September 20 2004 11:00 EDT
- in response to Nipsu
for non-concurrent Map util, check out Trove. It simply rocks!
for concurrent Map util, just follow the master
what would be cooler is to have concurrent-primitive-collections API, as fast as Trove and as concurrent as Doug Lea's util package.
-talip -
RE: An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: Amit Kulkarni
- Posted on: September 17 2004 15:22 EDT
- in response to Sudhakar Ramakrishnan
The article mentions about the internals of how the elements are stored. It mentions that elements are stored in an array, the index at which an element is stored is the [abs(hashcode for the object) % array length]. And in case the index turns out to be the same for multiple values, these multiple values are stored in a linked list at the same index.
Would it not be efficient to store these multiple values in a B-tree at the given index?
Also, since we would need a big enough array so that there are least collisions for a given array index.
Considering both the the space and the time required for searching would it not be a batter idea to implement Maps as an array of balanced binary trees? -
RE: An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: irenicus irenicus
- Posted on: September 17 2004 16:11 EDT
- in response to Amit Kulkarni
While collisions are somthing common the cases when you have more than 2 or 3 elements at the same index are quite rare (or you have a real problem with your hashCode() implementation). So the it is not really worth it trying being smart and probably faster to use "brute force". -
RE: An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: Raman Raja
- Posted on: September 17 2004 16:47 EDT
- in response to Amit Kulkarni
Take a look at TreeMap -
RE: An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: Cameron Purdy
- Posted on: September 17 2004 19:41 EDT
- in response to Amit Kulkarni
The article mentions about the internals of how the elements are stored.
Yup. Also you can see for yourself; the source comes with the JDK. :-)It mentions that elements are stored in an array, the index at which an element is stored is the [abs(hashcode for the object) % array length].
Something like that, depending on the JDK version and which hashed structure you are talking about. It actually _should_ have been:
(int) ((((long) hashcodefortheobject) & 0xFFFFFFFFL) % ((long) arraylength));And in case the index turns out to be the same for multiple values, these multiple values are stored in a linked list at the same index.
Yes, that's called an "hashing with overflow" algorithm.Would it not be efficient to store these multiple values in a B-tree at the given index?
No, because the only thing that can be b-treed is the hashcode (because the objects can't be assumed to be comparable) and the hashcode might very well be the same hashcode if you have collisions.
Furthermore, to reduce collisions, it is far better to extend the size of the bucket array, and to use a prime modulo (the size of the array.)Also, since we would need a big enough array so that there are least collisions for a given array index.
Exactly.Considering both the the space and the time required for searching would it not be a batter idea to implement Maps as an array of balanced binary trees?
That is what TreeMap is. It uses a red/black algorithm. However, a hashed algorithm is O(1) for lookups, while a balanced tree is O(logN), so it's better (as in faster) for most cases to use a hashed approach.
Oleg blogged a bit on this subject:
http://www.javablogs.com/views/ViewBlog.action?id=12415
Peace,
Cameron Purdy
Tangosol, Inc.
Coherence: Shared Memories for J2EE Clusters -
RE: An Introduction to Java Map Collection Classes[ Go to top ]
- Posted by: Srihari S
- Posted on: September 18 2004 10:29 EDT
- in response to Cameron Purdy
Great article, Great questions, Great replies..Thanks to all :) -
Good article[ Go to top ]
- Posted by: Stephen Colebourne
- Posted on: September 20 2004 08:43 EDT
- in response to Sudhakar Ramakrishnan
From my experience, this is a good article about maps and the issues involved.
For anyone looking for more unusual or specialised map implementations, or if you wish there was a mapIterator() method on Map, you might want to look at Jakarta Commons Collections' map package:
http://jakarta.apache.org/commons/collections/apidocs-COLLECTIONS_3_1/index.html
Stephen Colebourne
Committer, Jakarta Commons