
News: Java Best Practices – String performance and Exact String Matching

  1. Continuing our series of articles concerning proposed practices while working with the Java programming language, we are going to talk about String performance tuning. We will focus on how to handle String creation, String alteration and String matching operations efficiently.

    Furthermore, we will provide our own implementations of the most commonly used algorithms for Exact String Matching. Many of these algorithms can achieve far better performance than the naive approach to exact String matching available with the Java Development Kit.

    This article concludes with a performance comparison between the aforementioned Exact String Matching algorithms.

     

    Read more at:

    Java Code Geeks: Java Best Practices – String performance and Exact String Matching

  2. From the article:

    "For example concatenating two strings using the concatenation operator (+) results in the creation of two new objects, a temporary String object used for the actual concatenation and the new String instance pointing to the concatenated result (the String “toString()” operation is utilized to instantiate the resulting String).

    Read more: Java Code Geeks: Java Best Practices – String performance and Exact String Matching http://www.javacodegeeks.com/2010/09/string-performance-exact-string.html#ixzz0yqtTFb26"

    Sorry, I stopped after reading those lines. This is definitely not true (at least not in all situations).  For example:

    "a" + "b" + "c"

    will result in

    new StringBuilder().append("a").append("b").append("c")

  3. Sorry, I stopped after reading those lines. This is definitely not true (at least not in all situations).  For example:

    "a" + "b" + "c"

    will result in

    new StringBuilder().append("a").append("b").append("c")

    Actually, the above compiles to "abc".  You can use javap to verify this.
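A minimal sketch of the point above, not taken from the article. The class and method names are hypothetical. It relies on the language rule that a concatenation of literals is a compile-time constant (and therefore interned), while a concatenation involving a non-constant operand creates a new String at run time.

```java
public class ConstantFoldingDemo {
    // All operands are literals, so the compiler folds this into the constant "abc".
    static String literals() {
        return "a" + "b" + "c";
    }

    // The parameter is not a constant, so the concatenation happens at run time.
    static String withVariable(String s) {
        return "a" + s + "c";
    }

    public static void main(String[] args) {
        // Compile-time constants are interned, so both sides are the same object.
        System.out.println(literals() == "abc");       // true
        String runtime = withVariable("b");
        // Same characters as "abc" ...
        System.out.println(runtime.equals("abc"));     // true
        // ... but a distinct object created at run time.
        System.out.println(runtime == "abc");          // false
    }
}
```

Running `javap -c ConstantFoldingDemo` after compiling should show a single `ldc` of the String "abc" in `literals()`, with no builder involved.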

  4. Sorry, I stopped after reading those lines. This is definitely not true (at least not in all situations).  For example:

    "a" + "b" + "c"

    will result in

    new StringBuilder().append("a").append("b").append("c")

     

    Actually this will result in "abc". Furthermore, it is the ONLY case in which our statement is not valid. We did not mention this specific case because we do not believe that anyone will concatenate literals like that! It is just a naive approach that the compiler handles correctly. Seriously, why write "a" + "b" + "c" instead of "abc" in the first place?!

  5. Actually this will result in "abc". Furthermore, it is the ONLY case in which our statement is not valid. We did not mention this specific case because we do not believe that anyone will concatenate literals like that! It is just a naive approach that the compiler handles correctly. Seriously, why write "a" + "b" + "c" instead of "abc" in the first place?!

    I saw some code that was written for us recently that had maybe a hundred lines of SQL embedded in Java code.  This is bad enough but they used a StringBuilder to do it.

    It's not true that this is the only case where your statement is not valid.  Consider the following method:

    String foo(String s) {
      return "pre" + s + "post";
    }

    Use javap or open the class in Eclipse and you'll see something like this:

     java.lang.String foo(java.lang.String s);
         0  new java.lang.StringBuilder [140]
         3  dup
         4  ldc <String "pre"> [142]
         6  invokespecial java.lang.StringBuilder(java.lang.String) [144]
         9  aload_1 [s]
        10  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [145]
        13  ldc <String "post"> [149]
        15  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [145]
        18  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [151]
        21  areturn

    If you then use StringBuilder explicitly, you will see that the byte codes are virtually identical.
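For comparison, here is a sketch of the two variants side by side. The class and method names are hypothetical. The explicit version mirrors the bytecode listing above (a `StringBuilder(String)` constructor call followed by two `append` calls and `toString()`). Note that the listing reflects compilers of that era; javac from JDK 9 onward emits an `invokedynamic` call for `+` by default, so the bytecode may differ there even though the result is the same.

```java
public class ConcatEquivalence {
    // Concatenation with the + operator.
    static String fooWithPlus(String s) {
        return "pre" + s + "post";
    }

    // Hand-written equivalent of what the listing above shows.
    static String fooWithBuilder(String s) {
        return new StringBuilder("pre").append(s).append("post").toString();
    }

    public static void main(String[] args) {
        System.out.println(fooWithPlus("x"));      // prexpost
        System.out.println(fooWithBuilder("x"));   // prexpost
    }
}
```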

  6. It's not true that this is the only case where your statement is not valid.  Consider the following method:

    String foo(String s) {
      return "pre" + s + "post";
    }

    Use javap or open the class in Eclipse and you'll see something like this:

     java.lang.String foo(java.lang.String s);
         0  new java.lang.StringBuilder [140]
         3  dup
         4  ldc <String "pre"> [142]
         6  invokespecial java.lang.StringBuilder(java.lang.String) [144]
         9  aload_1 [s]
        10  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [145]
        13  ldc <String "post"> [149]
        15  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [145]
        18  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [151]
        21  areturn

    If you then use StringBuilder explicitly, you will see that the byte codes are virtually identical.

    I totally agree with you, but allow me to remind you of the exact statement we are arguing about:

    For example concatenating two strings using the concatenation operator (+) results in the creation of two new objects, a temporary String object used for the actual concatenation and the new String instance pointing to the concatenated result (the String “toString()” operation is utilized to instantiate the resulting String).

    Dirk commented on that exact statement, backing up his point with the "a" + "b" + "c" concatenation example (please see above). What you have shown us with your example is that my exact statement is valid. In fact, it is valid in every case except the one we both mentioned previously, where the compiler eliminates the concatenation entirely!

  7. The statement has a typo: as you can see from the relevant link address, the temporary object mentioned is obviously not a String but a StringBuffer/StringBuilder.

  8. The statement has a typo: as you can see from the relevant link address, the temporary object mentioned is obviously not a String but a StringBuffer/StringBuilder.

    Ah, I see now.  I figured you were talking about the way the + operator worked way back in the day, when it would use the String.concat() method, which would create a new String for each one you concatenated.  That is, if you concatenated 5 Strings, there would be 4 new Strings created in the process.  When that was how it worked, a lot of people advocated using StringBuffer all the time.  I'm guessing Dirk thought the same.

    Actually, if you are just concatenating 2 Strings, the old way (using concat()) is likely to be faster than using a StringBuilder, but overall it's insignificant.  Even creating a lot of short-lived objects isn't that big of a deal anymore now that we have generational collectors.

  9. I'm guessing Dirk thought the same.

    Yes, that is what I thought. To me it wasn't obvious that it was meant another way - likely because of my bad English. As for the example I gave: Actually, I oversimplified. Sorry for that. But I hope most people could transfer the example to a real-world scenario.

    Regards,

        Dirk

  10. So if I understand correctly, the authors, after having implemented tons of exotic Exact String Matching algorithms, determined that the best performer in 99.999% of cases is Java's naive String.indexOf()!

  11. So if I understand correctly, the authors, after having implemented tons of exotic Exact String Matching algorithms, determined that the best performer in 99.999% of cases is Java's naive String.indexOf()!

    We did not implement so many Exact String Matching algorithms only to arrive at the above conclusion. We did it to provide the community with an extremely rich suite of Exact String Matching algorithm implementations in Java. Anyone who wants to manipulate very large documents can benefit from one of the Exact String Matching algorithms in our suite.
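To give a flavour of what such an algorithm looks like, here is a minimal sketch of one classic exact matching technique, Boyer-Moore-Horspool. It is an illustration written for this discussion, not code from the article's suite, and the class name is hypothetical. As an assumption to keep the shift table small, characters are bucketed by their low 8 bits; collisions can only make a shift shorter, never longer, so the search stays correct.

```java
import java.util.Arrays;

public class HorspoolSearch {
    // Returns the index of the first occurrence of pattern in text, or -1 if absent.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        // Bad-character table: how far the window may move when the text
        // character under the last pattern position falls into a given bucket.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i) & 0xFF] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) return pos;
            // Slide the window by the shift for the character at its last position.
            pos += shift[text.charAt(pos + m - 1) & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("hello world", "world"));   // 6
        System.out.println(indexOf("abcabcabd", "abcabd"));    // 3
        System.out.println(indexOf("abc", "xyz"));             // -1
    }
}
```

Whether something like this beats String.indexOf() depends heavily on pattern length, alphabet, and the JVM, so it is worth measuring on your own data before switching.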

  12. For the String concatenation, as pointed out by Dirk, the + operator will actually result in equivalent bytecodes to the StringBuilder approach in many cases.  As I noted, in some cases it actually eliminates the concatenation entirely.

    The only time you need to worry about using StringBuilder directly is when you are concatenating an unknown number of Strings, such as in a loop.  Using StringBuilder in most other cases will result in at best no improvement in performance and at worst a decrease in performance.
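A small sketch of the loop case, with hypothetical names. With `+=`, every iteration builds a fresh String that copies everything accumulated so far, whereas a single StringBuilder appends into one growing buffer and creates the result once at the end.

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    // Each += creates a new String containing all previous characters.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String p : parts) {
            result += p;
        }
        return result;
    }

    // One builder for the whole loop; the String is created once at the end.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(joinWithPlus(parts));      // abc
        System.out.println(joinWithBuilder(parts));   // abc
    }
}
```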

    The one really killer thing that Java developers need to know about String manipulation has to do with the way that substring() works.  If you load a huge String into memory and then substring a small String out of it, the entire underlying char array of the big source String is referenced by the new String.  This can result in serious memory leaks.  This situation is why the String(String) constructor exists.
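A sketch of the defensive copy described above, with a hypothetical helper name. On the JDKs current at the time of this thread, `substring()` shared the parent's char array and `new String(String)` copied only the characters actually needed. From JDK 7u6 onward `substring()` copies by itself, so on those versions the extra constructor call is harmless but no longer required.

```java
public class SubstringCopyDemo {
    // Returns the first len characters as a String that does not depend on
    // the (possibly huge) source String's backing array.
    static String detachedPrefix(String huge, int len) {
        String view = huge.substring(0, len);
        return new String(view);
    }

    public static void main(String[] args) {
        // Build a one-million-character String of 'x'.
        String huge = new String(new char[1000000]).replace('\0', 'x');
        String small = detachedPrefix(huge, 5);
        System.out.println(small);   // xxxxx
    }
}
```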

  13. For the String concatenation, as pointed out by Dirk, the + operator will actually result in equivalent bytecodes to the StringBuilder approach in many cases.  As I noted, in some cases it actually eliminates the concatenation entirely.

    The only time you need to worry about using StringBuilder directly is when you are concatenating an unknown number of Strings, such as in a loop.  Using StringBuilder in most other cases will result in at best no improvement in performance and at worst a decrease in performance.

    The one really killer thing that Java developers need to know about String manipulation has to do with the way that substring() works.  If you load a huge String into memory and then substring a small String out of it, the entire underlying char array of the big source String is referenced by the new String.  This can result in serious memory leaks.  This situation is why the String(String) constructor exists.

    I'll second that! I would have thought it was common knowledge since jdk1.5?

  14. about exact string matching

    Don't forget that modern JDKs use SSE 4.2 intrinsics for String.compareTo() and String.indexOf().