Discussions

News: Pramati engineer discusses Java performance engineering

  1. Performance engineering always seems to be a skill that only a few really possess. Sachin of Pramati has been doing a lot of performance engineering on the Pramati Web Server. He has discussed his experiences with sockets, data copies, data structures, and more.

    Data Structures
    Set instead of List: A common programming mistake is choosing the wrong data structure for a contains() test. List.contains() is an O(n) operation, whereas HashSet.contains() is O(1) in the average case. So, if order does not matter to you, use a Set.

    Object Pooling: Some objects are expensive to create, e.g. large arrays. Use pools of these objects so they can be reused, thereby avoiding the creation costs and, secondarily, GC lags. Interestingly, PrintWriter, if pooled, shows considerable performance improvement: creating one is expensive because its constructor makes a call to look up the line separator.
    Read more: Experiences In Java Performance Engineering
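    For readers who want to see the Set-vs-List tip in code, here is a minimal sketch of the contains() difference (the class and data names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsDemo {

    // Membership test on a List scans elements one by one: O(n).
    public static boolean inList(List<String> items, String target) {
        return items.contains(target);
    }

    // The same test on a HashSet hashes straight to a bucket: O(1) on average.
    public static boolean inSet(Set<String> items, String target) {
        return items.contains(target);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        Set<String> set = new HashSet<String>();
        for (int i = 0; i < 100000; i++) {
            list.add("item" + i);
            set.add("item" + i);
        }
        // Both answer the same question; only the cost differs.
        System.out.println(inList(list, "item99999")); // scans ~100000 entries
        System.out.println(inSet(set, "item99999"));   // one hash lookup
    }
}
```

    Note that a HashSet gives up ordering, so the tip applies only when order genuinely does not matter.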

    Threaded Messages (19)

  2. A mixed bag[ Go to top ]

    Lots of good, sometimes surprising, often common sense advice, but these two caught my eye:
    Create a class that takes a pointer to this character array, its start and length and you can create lots of instances of this class to point to pieces of the character stream instead of String objects which make copies of the characters.
    Isn't this reinventing StringBuffer and StringBuffer.substring(start, end)? As long as you don't need to modify the StringBuffer, that is.
    Set instead of List: A common programming mistake is the wrong choice of data structure to do a contains(). [...]
    Sounds like a special case of "know your basic data structures, algorithms, and big-O's"? The Java Collections framework isn't that big. I get really annoyed when people who call themselves experienced Java developers turn out not to have a clue about the different implementations and their behaviour. Choosing between an ArrayList and a LinkedList or between a List and a Set should be second nature.
  3. String performance problems[ Go to top ]

    Isn't this reinventing StringBuffer and StringBuffer.substring(start, end)? As long as you don't need to modify the StringBuffer, that is.
    I wonder if the String performance problems were related to the use of String.matches() - as others have pointed out String and StringBuffer.substring() do reuse the original character array.
  4. NOT LinkedList![ Go to top ]

    Choosing between an ArrayList and a LinkedList ...... should be second nature.
    Ironically, I would suggest that there are many programmers who overuse LinkedList. LinkedList is a relatively poor class, and uses MUCH more memory/objects than ArrayList for little or no performance gain.

    It is suited for add/remove at the start, add/remove at the end, and iteration. Accessing a LinkedList by index is very poor. In most normal cases ArrayList will outperform LinkedList, and using LinkedList is actually being too clever.

    ArrayList is slowest when adding/removing at the start or in the middle (because it causes an array copy to occur). To handle this use case, the TreeList class from commons collections is a good choice. It is optimised for large, long-lived lists where a lot of adding/removing occurs at indices other than the end and memory use is not important. See http://jakarta.apache.org/commons/collections/apidocs-COLLECTIONS_3_1/index.html

    The following relative performance statistics were derived when testing the TreeList class (largish lists, repeated test runs):

                  get   addAtEnd   insert   iterate   remove
     TreeList       3          5        1         2        1
     ArrayList      1          1       40         1       40
     LinkedList  5800          1      350         2      325
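
    The huge "get" column for LinkedList comes from indexed access walking the node chain. A tiny harness (illustrative only, not a rigorous benchmark) shows the shape of the problem:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class IndexedAccessDemo {

    // Sums via get(i): O(n) total for ArrayList, but O(n^2) for LinkedList,
    // because each LinkedList.get(i) walks node-by-node from the nearest end.
    public static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i).intValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<Integer>();
        List<Integer> linked = new LinkedList<Integer>();
        for (int i = 0; i < 10000; i++) {
            array.add(Integer.valueOf(i));
            linked.add(Integer.valueOf(i));
        }
        long t0 = System.currentTimeMillis();
        sumByIndex(array);
        long t1 = System.currentTimeMillis();
        sumByIndex(linked);
        long t2 = System.currentTimeMillis();
        // Expect the LinkedList pass to be dramatically slower.
        System.out.println("ArrayList:  " + (t1 - t0) + " ms");
        System.out.println("LinkedList: " + (t2 - t1) + " ms");
    }
}
```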


    But REMEMBER, premature optimisation is the root of all evil ;-)

    Stephen Colebourne
    Committer, Jakarta Commons
  5. NOT LinkedList![ Go to top ]

    Choosing between an ArrayList and a LinkedList ...... should be second nature.
    Ironically, I would suggest that there are many programmers who overuse LinkedList.
    Really? Within my field of vision, hardly anyone ever uses LinkedList even where it is definitely appropriate.
    LinkedList is a relatively poor class, and uses MUCH more memory/objects than ArrayList for little or no performance gain. It is suited for add/remove at the start, add/remove at the end, and iteration. Accessing a LinkedList by index is very poor. In most normal cases ArrayList will outperform LinkedList, and using LinkedList is actually being too clever.
    There's no surprise in any of this, is there? Certainly nothing that makes LinkedList a "poor class". It's a linked list, and has the behaviour you'd expect from a linked list. It's poor for some things, and excellent for others. It's definitely worth knowing that for small n, where the big-O behaviour doesn't dominate, ArrayList is generally the more efficient implementation, but that too should be no surprise to anyone familiar with these data structures.

    The quoted performance statistics seem skewed against the LinkedList class. Correct me if I'm wrong, but it looks like the "insert" and "remove" benchmarks used add(int, Object) and remove(int), incurring the cost of LinkedList random access in addition to the basic insertion/removal cost. Hardly a fair or representative comparison. You would hope that any developer needing random access knows better than to pick a LinkedList.
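    A fairer middle-insert benchmark, as the post suggests, would position a ListIterator once and then insert through it. The sketch below (class and method names are hypothetical) contrasts the two approaches:

```java
import java.util.List;
import java.util.ListIterator;

public class FairInsertDemo {

    // Unfair to LinkedList: every add(int, E) re-walks to the index.
    public static void insertByIndex(List<String> list, int at, String[] items) {
        for (int i = 0; i < items.length; i++) {
            list.add(at + i, items[i]);
        }
    }

    // Fair: walk to the position once, then insert through the iterator,
    // which is O(1) per insertion for a LinkedList.
    public static void insertByIterator(List<String> list, int at, String[] items) {
        ListIterator<String> it = list.listIterator(at);
        for (int i = 0; i < items.length; i++) {
            it.add(items[i]);
        }
    }
}
```

    Both methods leave the list in the same state; only the cost per insertion differs.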
  6. NOT LinkedList![ Go to top ]

    Ironically, I would suggest that there are many programmers who overuse LinkedList. LinkedList is a relatively poor class, and uses MUCH more memory/objects than ArrayList for little or no performance gain. It is suited for add/remove at the start, add/remove at the end, and iteration.
    I did some micro-benchmarking last year comparing LinkedList and ArrayList for iteration only (using an Iterator), and I was surprised to learn that ArrayList was faster, but not by much. This was on the Sun 1.3 JVM/API.
    Obviously iterating using List.get(int) makes LinkedList die.
  7. Set instead of List: A common programming mistake is choosing the wrong data structure for a contains() test. List.contains() is an O(n) operation, whereas HashSet.contains() is O(1) in the average case. So, if order does not matter to you, use a Set.
    A more interesting issue to me is the use of HashSet vs. TreeSet. Most people automatically use HashSet (or HashMap; the sets are just based on the maps),
    but in many cases this ruins your memory performance. Hash tables are backed by arrays, so if left unattended a very large hash table can end up allocating arrays of odd sizes, which fragments memory and requires rehashing and array copying. This can be such a big problem that I would say you should not use a HashSet unless you know that it's bounded in size. It's great for read-only usage, obviously.

    If you need a large set with read-write usage, you are better off with a TreeSet. That's a red-black tree. This results in allocation of lots of 'node' objects, but the performance is a lot more predictable. If you watch it run in QA and the performance is good enough for you, you don't have to worry about the performance degrading later.
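    A small sketch of the trade-off (names illustrative): a HashSet grows by reallocating and rehashing its backing array, while a TreeSet allocates one node per element and keeps them sorted as a side effect:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetChoiceDemo {

    // A TreeSet (red-black tree) allocates one small node per element,
    // so memory growth is smooth and predictable; a HashSet is backed by
    // an array that must be reallocated and rehashed as the set grows.
    public static TreeSet<String> sortedCopy(Set<String> names) {
        return new TreeSet<String>(names);
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<String>();
        names.add("delta");
        names.add("alpha");
        names.add("charlie");
        // HashSet iteration order is unspecified; the TreeSet copy is sorted.
        System.out.println(sortedCopy(names).first()); // prints "alpha"
    }
}
```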
  8. This is pretty basic optimization stuff; I'm surprised that Sachin of Pramati even bothers to call it "performance engineering". He should have learnt this stuff at university. It says a lot about the technical level of the people in this forum who even bother to give this article the time of day.
  9. I don't get it. The guy from Pramati just has a couple of performance tips. They're not really groundbreaking; so what?

    What did you guys expect? I'm always amazed to see that here on these forums everybody knows all this stuff already. How did you learn it? If you were lucky, at university; otherwise you picked it up through forums like this one, websites, articles, books... If you already know something, just don't read the article.

    I for one would be really happy if 50% of my team members knew this stuff. I can tell you it's more like 20-30% on average. And even if they know it, that does not mean they necessarily use it. Only when performance is bad and the profiler shows it is because of List.contains() will the structure be converted to a Set.

    Lots of programmers just don't know when they need an instance variable or a parameter. Programmers cannot refactor; another if-test will fix the bug, so methods of 500+ lines are commonplace. Readability? Sure, no problem: "I" can read it.

    And a really bad programmer? How can you prove he is really bad? A manager cannot judge his work. So he can stay. Or if he must go, he will get a good reference so nobody really feels bad.

    Phew, glad I got this off my chest.
    Natan

    PS I must admit "performance engineering" looks a bit over the top, but look at the average job-title...
  10. Everyone's an expert..[ Go to top ]

    I don't get it. The guy from Pramati just has a couple of performance tips. They're not really groundbreaking; so what? What did you guys expect? I'm always amazed to see that here on these forums everybody knows all this stuff already.
    It's the nature of the beast I'm afraid: everyone likes to think they're a performance expert, and the easiest way to appear to be that is to flame every article or post about performance. He did say in his article "Some you know, perhaps, some you don't."

    I think a large group of programmers have never thought about the implications of simple things like looping and sticking together a big String. Whenever I find myself teaching a Java class, a simple demo of StringBuffer vs. adding Strings usually gives a good example of performance tuning.

    I'd recommend to anyone interested in general Java performance tuning the Sun Press book "Enterprise Java Performance", I think it's called. Full of lots of low-level performance stuff (with actual statistics comparing approaches). Still, quite often the big performance gains are at an architectural or design level rather than in micro-tweaking of Java methods.

    Regards, Nathan Lee

    P.S. Although sometimes the flaming might be appropriate: for anyone attending TSS Symposium, there was a certain keynote speaker talking about J2EE performance who did seem to have some extremely basic tips such as "turn on connection pooling" and "don't use the single thread model of servlets". Given the crowd (500 or so J2EE experts), it was never going to go down well.
  11. just write the code[ Go to top ]

    I for one am tired of hearing the StringBuffer vs. String story. Now I see programmers using StringBuffer, making ugly code, on a single call that will use 10 minutes of processor time. You only need to optimize where it's necessary. Most of your code will not be performance critical. Make it simple, make it readable, and optimize it after a performance test, or never if it's good enough.
    Guessing what is performance critical is not worthwhile and often impossible.
    At least Pramati gave real measurements, not opinions about what costs.

    cheers Rob
  12. just write the code[ Go to top ]

    I for one am tired of hearing the StringBuffer vs. String story. Now I see programmers using StringBuffer, making ugly code, on a single call that will use 10 minutes of processor time. You only need to optimize where it's necessary. Most of your code will not be performance critical. Make it simple, make it readable, and optimize it after a performance test, or never if it's good enough. Guessing what is performance critical is not worthwhile and often impossible. At least Pramati gave real measurements, not opinions about what costs. cheers Rob
    This is the code, "crappy" vs. "clever" string concatenation:

    public class TestStringConcat {

      public String concatinateCrappy(String[] strings) {
        String str = "";
        for (int i = 0; i < strings.length; i++) {
          str += strings[i];
        }
        return str;
      }

      public String concatinateClever(String[] strings) {
        StringBuffer buff = new StringBuffer();
        for (int i = 0; i < strings.length; i++) {
          buff.append(strings[i]);
        }
        return buff.toString();
      }
    }

    This is the compiled code:


    Compiled from "TestStringConcat.java"
    public class TestStringConcat extends java.lang.Object{
    public TestStringConcat();
      Code:
       0: aload_0
       1: invokespecial #1; //Method java/lang/Object."<init>":()V
       4: return

    public java.lang.String concatinateCrappy(java.lang.String[]);
      Code:
       0: ldc #2; //String
       2: astore_2
       3: iconst_0
       4: istore_3
       5: iload_3
       6: aload_1
       7: arraylength
       8: if_icmpge 38
       11: new #3; //class StringBuffer
       14: dup
       15: invokespecial #4; //Method java/lang/StringBuffer."<init>":()V
       18: aload_2
       19: invokevirtual #5; //Method java/lang/StringBuffer.append:(Ljava/lang/String;)Ljava/lang/StringBuffer;
       22: aload_1
       23: iload_3
       24: aaload
       25: invokevirtual #5; //Method java/lang/StringBuffer.append:(Ljava/lang/String;)Ljava/lang/StringBuffer;
       28: invokevirtual #6; //Method java/lang/StringBuffer.toString:()Ljava/lang/String;
       31: astore_2
       32: iinc 3, 1
       35: goto 5
       38: aload_2
       39: areturn

    public java.lang.String concatinateClever(java.lang.String[]);
      Code:
       0: new #3; //class StringBuffer
       3: dup
       4: invokespecial #4; //Method java/lang/StringBuffer."<init>":()V
       7: astore_2
       8: iconst_0
       9: istore_3
       10: iload_3
       11: aload_1
       12: arraylength
       13: if_icmpge 30
       16: aload_2
       17: aload_1
       18: iload_3
       19: aaload
       20: invokevirtual #5; //Method java/lang/StringBuffer.append:(Ljava/lang/String;)Ljava/lang/StringBuffer;
       23: pop
       24: iinc 3, 1
       27: goto 10
       30: aload_2
       31: invokevirtual #6; //Method java/lang/StringBuffer.toString:()Ljava/lang/String;
       34: areturn

    }

    "concatinateCrappy" creates a StringBuffer instance per iteration, but as I understand it there is no major difference for Java application performance. Most optimizations of this kind are useless, and it is better to optimize the "architecture"
    (drop some useless code, layers and contracts).

    BTW, "concatinateCrappy" is faster when strings.length == 0 :)
  13. I don't get it. The guy from Pramati just has a couple of performance tips. They're not really groundbreaking; so what? What did you guys expect? I'm always amazed to see that here on these forums everybody knows all this stuff already. How did you learn it? If you were lucky, at university; otherwise you picked it up through forums like this one, websites, articles, books... If you already know something, just don't read the article.
    This article is a bit lame and most of the comments are right. If you do not know this stuff, then the javadoc is more useful to you; there are a lot of good books too. But be sceptical if you want to learn: some things are wrong in books too.
  14. Set instead of List: A common programming mistake is choosing the wrong data structure for a contains() test. List.contains() is an O(n) operation, whereas HashSet.contains() is O(1) in the average case. So, if order does not matter to you, use a Set.
    True. For some reason, many programmers reflexively use Lists when there is no ordering required, as if List was synonymous with Collection. Probably because "list" is a more intuitive data structure than "set", and because "arrays", which are necessarily ordered, have always offered a convenient way to implement both lists and sets.
  15. True. For some reason, many programmers reflexively use Lists when there is no ordering required, as if List was synonymous with Collection.
    Perhaps the reason is that an ArrayList is the most efficient general-purpose Collection around, which makes it an obvious choice if you're just looking for a bag with no particular uniqueness, performance or other requirements. Nothing wrong with that, if it weren't for two commonly made mistakes: (1) the ArrayList is often used as a List, which is a very misleading expression of programmer intent if all you're interested in is some form of Collection; (2) the collection used is often never rethought even when operations are performed that do ask for specific behaviour or performance characteristics.
  16. Lists are intuitive[ Go to top ]

    Lists are an excellent choice for basic 'collection' data requirements, perhaps supplemented with an extra map or set if lookup/uniqueness access is required.

    The basic reason here is that humans find it much easier to view, read, verify and otherwise interact with system data when it doesn't get arbitrarily reordered.

    Particularly useful combinations for application use are ArrayList with HashMap and ArrayList with IdentityHashMap. Most applications will look up user identifiers to data/internal structures at various points, and this enables sensible & verifiable ordering to be preserved for reporting/diagnosis as well as efficient lookup.
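
    A minimal sketch of the ArrayList-plus-HashMap combination described above (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Keeps records in stable insertion order for reporting (the List)
// while still offering O(1) lookup by identifier (the Map).
public class OrderedLookup {

    private final List<String> inOrder = new ArrayList<String>();
    private final Map<String, String> byId = new HashMap<String, String>();

    public void put(String id, String value) {
        if (!byId.containsKey(id)) {
            inOrder.add(id); // remember first-seen order
        }
        byId.put(id, value);
    }

    public String get(String id) {
        return byId.get(id);
    }

    public List<String> idsInInsertionOrder() {
        return inOrder;
    }
}
```

    Since JDK 1.4, LinkedHashMap packages the same idea — hash lookup with predictable iteration order — in a single class.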


    Thomas Whitmore
    http://www.powermapjdo.com
  17. <quote>
    String Operations : If you are writing extremely performance sensitive code such as a highly benchmarked web server :-) then keep Strings to a minimum. I cannot stress this enough. If you have a large character stream that you need to tokenize, parse, etc, use references into character arrays to reduce String operations. Create a class that takes a pointer to this character array, its start and length and you can create lots of instances of this class to point to pieces of the character stream instead of String objects which make copies of the characters. String concatenations can be quite expensive too.
    </quote>

    My advice would be exactly the opposite. If you have a big char array and you want to tokenize it, you are better off creating one huge String and using String.substring to split/tokenize it. substring does not make a copy of the string's data but holds a reference to the original String's char array, making it very efficient.

    It is very likely that you would want to convert the parts of the char array to Strings anyway. If you are working with the char array directly, you need to construct String objects, each of which internally allocates a char array and copies into it. You avoid this if you are using substring.

    If you are not convinced, take a large char array and try to split it into a number of Strings using the linefeed as separator. Compare this with constructing a String object for the whole array and using substring to split it. The latter is faster by a long way, by cutting down object (char[]) allocation and copying.
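
    A sketch of the substring-based splitting described here (names illustrative). Note that the sharing behaviour it relies on held on the JDKs of the time; much later JDKs changed substring to copy:

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringSplitter {

    // Splits on linefeeds using indexOf/substring. On the JDKs discussed
    // in this thread, each substring shared the parent String's char[]
    // rather than copying it, so no per-token allocation of char data occurs.
    public static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<String>();
        int start = 0;
        int lf;
        while ((lf = text.indexOf('\n', start)) >= 0) {
            lines.add(text.substring(start, lf));
            start = lf + 1;
        }
        if (start < text.length()) {
            lines.add(text.substring(start)); // trailing piece with no linefeed
        }
        return lines;
    }
}
```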

    PS: I have been writing parsers (for SWIFT, FIX etc.) for the last two years and have found this technique very useful

    - Krishnan
  18. Pramati Engineer replies ...[ Go to top ]

    Firstly, thanks to Dion for linking my article here. My primary intention was to chronicle my findings while involved in a performance tuning exercise. The effort was focused at brute throughput improvements and therefore involved more low-level tweaking. I would have put more effort into the article if I had known I was addressing an audience of this magnitude :-). So, being a victim of unexpected attention, I think it would be prudent to cover a couple of findings in more detail since quite a few people have mentioned them.

    I would very strongly agree with people who have mentioned that 'clean' code is much better than 'clever' code. Some of the tunings are potential cases where encapsulation gets broken and code maintainability suffers. Also, if performance is satisfactory, I would prefer not to go overboard tuning it. Sometimes though, when you are competing with other very similar [standards-based] products, you cannot explain to a potential customer that your code is slower than XYZ because our code is much 'cleaner'.

    A lot of people have mentioned String.substring* making a reference to the original char array. Yes, it does that; there are, however, some useful cases for using the character array and pointers to pieces of data inside it [let's call it a CharBuffer] without using String. Let's take the case of the Web Server again. HTTP request headers are read from the socket's input stream into a buffer. There is no way of creating a String from this buffer without a data copy. Subsequently, there is no way to retrieve the character array back from the String without a data copy.
    There are cases where you need to convert the String to a char[] e.g. the request line needs to be written to access logs. CharBuffer saves data copies in these situations. Another situation arises because of the case insensitive nature of HTTP headers. It's cheaper to convert them to lower case or upper case in place and then process the request - again CharBuffer helps.
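
    A minimal sketch of such a slice class, assuming an API along these lines (the name CharSlice and its methods are illustrative, not Pramati's actual code):

```java
// A view (start, length) into a shared char[], so tokens can be compared
// and lower-cased in place without ever copying into String objects.
public class CharSlice {

    private final char[] data;
    private final int start;
    private final int length;

    public CharSlice(char[] data, int start, int length) {
        this.data = data;
        this.start = start;
        this.length = length;
    }

    // Case-insensitive match, as needed for HTTP header names.
    public boolean equalsIgnoreCase(String s) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (Character.toLowerCase(data[start + i])
                    != Character.toLowerCase(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Lower-cases the slice in place, mutating the shared buffer.
    public void toLowerCase() {
        for (int i = 0; i < length; i++) {
            data[start + i] = Character.toLowerCase(data[start + i]);
        }
    }

    public String toString() {
        return new String(data, start, length); // copies only when asked
    }
}
```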

    Object pooling does help considerably, although it could again be considered excessive optimization for most software. Although GC performance has improved significantly version after version, if you really have long-lasting allocations like buffers, pooling does help. PrintWriter, as I mentioned in the article, is a different case, where the cost of executing its constructor makes it a good choice for pooling.
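
    A minimal object-pool sketch, for illustration only (the names are hypothetical; a production pool would bound its size and tune its synchronization):

```java
import java.util.ArrayList;
import java.util.List;

// A simple object pool. A PrintWriter is expensive to construct (its
// constructor looks up the platform line separator), so reusing instances
// avoids both the construction cost and the GC churn. This version is
// synchronized; contention is the cost the pool pays for being shared.
public class SimplePool<T> {

    public interface Factory<T> {
        T create();
    }

    private final List<T> free = new ArrayList<T>();
    private final Factory<T> factory;

    public SimplePool(Factory<T> factory) {
        this.factory = factory;
    }

    // Hands out a pooled instance, constructing one only on a pool miss.
    public synchronized T acquire() {
        int n = free.size();
        return n > 0 ? free.remove(n - 1) : factory.create();
    }

    // Returns an instance to the pool for reuse.
    public synchronized void release(T obj) {
        free.add(obj);
    }
}
```

    The factory runs only on a pool miss, which is exactly where the expensive constructor cost is saved.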

    * for business reasons, our code compiles with JDK 1.3 so I cannot make use of JDK 1.4 features. If some point is redundant as a result of newer classes in JDK 1.4 please bear with me :-)
  19. Pramati Engineer replies ...[ Go to top ]

    A lot of people have mentioned String.substring* making a reference to the original char array. Yes it does that, there are, however, some useful cases of using the character array and pointers to pieces of data inside it [lets call it a CharBuffer] without using String.
    The single reason to use a "CharBuffer" is the useless synchronization in StringBuffer, but JDK 1.4 solves that too.
    Lets take the case of the Web Server again. HTTP request headers are read from the socket's input stream into a buffer. There is no way of creating a String from this buffer without a data copy. Subsequently there is no way to retrieve the character array back from String without a data copy. There are cases where you need to convert the String to a char[] e.g. the request line needs to be written to access logs. CharBuffer saves data copies in these situations.
    You will save more if you use byte arrays: you read bytes from the socket and write bytes to the log, so you are adding encoding-converter overhead twice.
    Object pooling does help considerably although it could again be considered as excessive optimization for most software. Although, GC performance has significantly improved version after version, if you really have long lasting allocations like buffers, pooling does help. PrintWriter, like I mentioned in the article, is a different case, where the cost of executing its constructor makes it a good choice for pooling. * for business reasons, our code compiles with JDK 1.3 so I cannot make use of JDK 1.4 features. If some point is redundant as a result of newer classes in JDK 1.4 please bear with me :-)
    I am not sure about object pooling, but as I understand it, this pool is shared, so you add synchronization overhead. How does it scale vs. "new"?
  20. Just a note[ Go to top ]

    I would recommend everybody who's interested in performance tuning to check out excellent book Java Performance Tuning by Jack Shirazi.

    One more note: don't over-focus on performance tuning. Yes, it's important to keep track of it during project development, but running load/stress tests every week should be enough to be sure that the evolving codebase meets service-level requirements.