What is faster? Hashtable/hashmap, others?

Discussions

News: What is faster? Hashtable/hashmap, others?

  1. What is faster? Hashtable/hashmap, others? (15 messages)

    James Sutherland has published a series of microbenchmarks on performance of specific twinned classes - things like Hashtable vs. HashMap, etc. etc. He has some interesting conclusions.

    True performance optimization involves measuring the current performance. Profiling the application and determining the bottlenecks. Investigating and implementing optimizations to avoid the bottlenecks.

    But how should one code on a day to day basis for optimal performance? Are synchronized methods slow? Are final methods fast? What is faster Vector, ArrayList or LinkedList? What is the optimal way to iterate a List? Is refection slow? I will attempt to answer these questions in this post.

    For Hashtable/HashMap: Hashtable wins.

    Between Vector and ArrayList, ArrayList wins.

    For iteration, using the index into an ArrayList/Vector is fastest, with an Enumeration being faster than an Iterator.

    There are more results (mostly focused on reflection and methods being set to final, and stuff like that) but what stands out to me is a hole in methodology: he used an old JVM (1.6.0_07 or so, maybe he was unable to uninstall and update, like that java champion?) and he mostly reports from single-threaded use.

    That last point is the thing that bugs me: everything was single-threaded. On one hand, that makes sense - most code is single-threaded. People don't code for multiple thread-access to a single collection. On the other hand, it's more common than you might think, and the effects can be pretty bad if you're not at least ready to go threadsafe; on the third hand, maybe you shouldn't use Java if you're doing major threading. The JVM, fine, but use Scala or something.

    Threaded Messages (15)

  2. Truly, I don't think I got it right. The fact tha hashtable does 3605235 operations/10 seconds doesn't make it faster than hashmap (2908854), only that the first is more complex than the second one.

    What were the total execution time for each class? Even if the times spent for each class were equal, I'd go with hashmap, because of less resource spent...

  3. This kind of benchmarks is not very interesting in my opinion. First, a LinkedList and an ArrayList do not address the same needs (to pick an example). Second, I highly doubt your choice of container really affects the performance of your typical web app. The bottleneck is simply not there.

    It's way more interesting to consider whether a particular container supports concurrency or preserves the order of inserts.

  4. In my experience often performance issue is I/O. People are building distributed systems that spend most time waiting for database or remote service to respond. Then, when performance is measured per business operation, it turns out that in-vm processing takes fractions of total time, no matter what type of collection is used.

    When optimizing performance, it is always important to know the context. While there are myriads of implementation aspects that are abstractly slower or faster, only a few of them do really matter in specific context. And worth optimization effort.

  5. I/O is the bottleneck[ Go to top ]

    I strongly agree.

  6. Agree[ Go to top ]

    In my experience often performance issue is I/O. People are building distributed systems that spend most time waiting for database or remote service to respond. Then, when performance is measured per business operation, it turns out that in-vm processing takes fractions of total time, no matter what type of collection is used.

    When optimizing performance, it is always important to know the context. While there are myriads of implementation aspects that are abstractly slower or faster, only a few of them do really matter in specific context. And worth optimization effort.

    Very good and true point! 

  7. Although I fully agree that noramlly optimizations are much more coarse, I/O, database etc. It depends on what you are optimizing for. If you need maximum throughput you need to run many threads with little I/O etc. But you may also optimize for minimal latency, i.e. if running only 10 threads each request must take <10ms. So depending on your actual problem all types of optimizations may help you.

  8. Good data. Needs more refinement[ Go to top ]

    Interesting data for single threaded tests. But different java data structure are optimized for specific use cases. It is not correct to compare them. Some facts like slowness of Reflection are well known

    It would have been more interesting, if we had this data on different JDK versions which would have given us some ideas about the improvements done in Java.

     

     

  9. LinkedList[ Go to top ]

    Has anybody found a good reason (real world scenario) to use LinkedList?

    LinkedList uses up to twice the space (or worse really because it is doubly linked), it caches poorly, and is slower for most operations (even the in ones it is supposed to beat ArrayList).

    Has anybody found it useful?

  10. LinkedList has its place[ Go to top ]

    http://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist

    When I am performing add/remove operations on a list frequently, and don't need to access data based on index and only want to iterate over a list of elements, I am using LinkedList.

  11. LinkedList has its place[ Go to top ]

    I always remember this old article:

    http://onjava.com/pub/a/onjava/2001/05/30/optimization.html?page=3&x-order=date

    That shows that ArrayList often used to outperform LinkedList. We can't ignore that ArrayList is much better in terms of the cache and the cost of dereferencing links. Only if you insert at the beginning continuously (a rare scenario or one that can be avoided) will LinkedList be better. ArrayList is even better at adding in the middle.

    We also can't ignore that internal copies can be done fairly efficiently with memcopies on some architectures. So I haven't found a good place to use LinkedList in realworld code.

     

  12. Good pick in interviews.[ Go to top ]

    I do, it is a good pick for intervewer. other than that I don't see any needful.

  13. Java 6 update 7???[ Go to top ]

    This is just silly! Why would anyone want to benchmark such an ancient version?

  14. Java 6 update 7???[ Go to top ]

    Comparing Hashtable and Hashmap may or may not be interesting but whether Java 6 update 7 is ancient or not is very very debatable. There are companies running their production environment on JDK 1.4.

    Besides, there are probably very few changes (if at all) made to the Hashtable/Hashmap since Java 6 update 7. So it probably doesn't matter that the author ran his tests on Java 6 update 7.

  15. Java 6 update 7???[ Go to top ]

    There are companies running their production environment on JDK 1.4.

    As the rumor goes, there's also a company still running stuff on an actual Commodore 64. I'm not sure how useful benchmarks executed on a C64 are though.

    There was a post a few days back here about a survey regarding JDK usage. No results have surfaced though, but I remember surveys from a couple of years ago(!), and the usage of JDK 1.4 was tiny.

    Besides, there are probably very few changes (if at all) made to the Hashtable/Hashmap since Java 6 update 7.

    The JVM is automatically tested as well. There have been HUGE changes in the JVM (hotspot) since Java 6 update 7. Advise being given or conclusions based on such an ancient version just doesn't make any sense. If you go to all this trouble to benchmark, and you just use what 'happens to be on your system'?

    Sloppy...

  16. Survey results[ Go to top ]

    I'm still waiting on more survey results. I think I have enough to draw some conclusions, but I'm expecting another wave of responses.