Often, enterprise Java applications contain code similar to the following:String x = "";x is subsequently sent as part of an HTML page to the client's browser. Does it make sense to use ropes to compute the value of x, rather than a StringBuilder, which is what the compiler generates by default? The answer is no, for a variety of reasons. To begin with, the amount of data being concatenated is, presumably, small, so using a rope is unlikely to improve performance, although it could improve robustness and scalability. (Consider how both solutions would behave if getName unexpectedly returned a 50 MB string.) But for the sake of argument, imagine that many more chunks of data are being concatenated. Given that Rope's append performance is generally better than StringBuffer's, would a rope make sense then? Again, no. Whenever input data is being combined to produce formatted output, the cleanest and most efficient approach is to use a template engine such as StringTemplate or FreeMarker. Not only does this approach cleanly separate presentation markup from code, but templates are compiled once (often to JVM bytecode) and then can be reused, giving them excellent performance characteristics. A second benefit of using templates exposes a fundamental flaw common to output-building routines like those in the code above, including one written using ropes. This benefit is that templates can be incrementally evaluated, and output can be written to a Writer as soon as it is produced, rather than unnecessarily first being accumulated in memory. In a Java EE application, where the Writer is actually a buffered connection to the client's browser, this approach to output rendering uses constant memory, O(1), versus the other solutions' O(n) memory usage. This is a huge improvement to application scalability and robustness, even though it's not apparent for smaller inputs or at lower application loads.
-
Theory and Practice of Ropes for Java String Manipulations (14 messages)
- Posted by: Matt Papendorf
- Posted on: February 14 2008 14:02 EST
Systems that manipulate large quantities of string data are poorly served by the Java language's default String and StringBuilder classes. A rope data structure can be a better alternative. This article introduces Ropes for Java, a rope implementation for the Java platform; explores performance issues; and provides pointers for effective use of the library. For more Java resources, and to specifically learn more about Java Testing and Debugging, take a look at this developerWorks Space. For an enterprise Java developer, one of the most interesting considerations for Ropes is that iteration over a flat Rope (a Rope with a depth of 1) is much faster than pulling data character by character from a String. (If you have no need to do this, then Ropes may not mean much of anything to you. It's also worth noting that Java's NIO structure has mmapping capabilities to make iteration over buffers extremely fast, and that Ropes are especially appropriate for editors, not enterprise Java applications - but ignorance isn't bliss, it's ignorance.) An important extract for Java EE developers:Threaded Messages (14)
- Re: Theory and Practice of Ropes for Java String Manipulations by David Rosenstrauch on February 14 2008 16:34 EST
- Ropes for Java by Paul Fremantle on February 15 2008 05:23 EST
- Re: Ropes for Java by Raffaele Guidi on February 15 2008 10:15 EST
- Re: Theory and Practice of Ropes for Java String Manipulations by Sam Ferguson on February 15 2008 08:26 EST
- How it causes memory leak by balanagireddy mudiam on February 15 2008 09:02 EST
-
Re: How it causes memory leak by James Watson on February 15 2008 09:50 EST
- Re: How it causes memory leak by James Watson on February 15 2008 12:56 EST
- Memory Leak by Matthew Inger on February 15 2008 03:24 EST
-
Re: How it causes memory leak by James Watson on February 15 2008 09:50 EST
- Ropes for Java by Paul Fremantle on February 15 2008 05:23 EST
- A small point by Ivaylo Kovatchev on February 15 2008 09:51 EST
- Re: A small point by Joseph Ottinger on February 15 2008 10:06 EST
- Re: A small point by Stefan Zobel on February 15 2008 10:13 EST
- hello?! let's clear up a few things by John Vance on February 16 2008 09:53 EST
- Re: hello?! let's clear up a few things by James Watson on February 16 2008 21:50 EST
- Re: hello?! let's clear up a few things by James Watson on February 16 2008 22:24 EST
-
Re: Theory and Practice of Ropes for Java String Manipulations[ Go to top ]
- Posted by: David Rosenstrauch
- Posted on: February 14 2008 16:34 EST
- in response to Matt Papendorf
It's a very cool approach, and clearly has some good efficiency advantages. But it has some drawbacks too, which he mentioned in a side bar to his article, but not prominently. Namely: ropes can cause major memory leaks. For example: Rope r = new Rope(); r = r.substring(0, 1); The result is a 1-character rope ... but which is carrying around a massive amount of characters along with it! -
Ropes for Java[ Go to top ]
- Posted by: Paul Fremantle
- Posted on: February 15 2008 05:23 EST
- in response to David Rosenstrauch
The other problem with this is that R4J is a GPL licensed project. I know a lot of projects that might look at this but can't because of the GPL license. For example, every single Apache project is unable to use this. -
Re: Ropes for Java[ Go to top ]
- Posted by: Raffaele Guidi
- Posted on: February 15 2008 10:15 EST
- in response to Paul Fremantle
+1 GPL should be forbidden for libraries (just kidding, freedom is freedom) -
Re: Theory and Practice of Ropes for Java String Manipulations[ Go to top ]
- Posted by: Sam Ferguson
- Posted on: February 15 2008 08:26 EST
- in response to David Rosenstrauch
It's a very cool approach, and clearly has some good efficiency advantages. But it has some drawbacks too, which he mentioned in a side bar to his article, but not prominently. Namely: ropes can cause major memory leaks.
The same applies to standard Java Strings, I believe. For example:
For example:
Rope r = new Rope();
r = r.substring(0, 1);
The result is a 1-character rope ... but which is carrying around a massive amount of characters along with it!
String s = new String();
s = s.substring(0, 1);
-
How it causes memory leak[ Go to top ]
- Posted by: balanagireddy mudiam
- Posted on: February 15 2008 09:02 EST
- in response to David Rosenstrauch
I didnt understand how the below code is causing memory leaks. can you tell me... Rope r = new Rope(); r = r.substring(0, 1); -
Re: How it causes memory leak[ Go to top ]
- Posted by: James Watson
- Posted on: February 15 2008 09:50 EST
- in response to balanagireddy mudiam
I didnt understand how the below code is causing memory leaks. can you tell me...
The new string instance holds a reference to the structure backing the rope. If that structure is large, you will be using much more memory than a 1 char string normally would and it will not be released until after the 1 char string is unreachable. Technically this isn't a memory leak but in Java-land that's what we call this. This is true in older versions of Java (including 1.4) for String.subString() and StringBuffer.toString() also but seems to have been done away with in recent versions.
Rope r = new Rope();
r = r.substring(0, 1); -
Re: How it causes memory leak[ Go to top ]
- Posted by: James Watson
- Posted on: February 15 2008 12:56 EST
- in response to James Watson
This is true in older versions of Java (including 1.4) for String.subString() and StringBuffer.toString() also but seems to have been done away with in recent versions.
Correction: String.toString() still shares the underlying array in 1.6. StringBuffer.toString() no longer does. -
Memory Leak[ Go to top ]
- Posted by: Matthew Inger
- Posted on: February 15 2008 15:24 EST
- in response to balanagireddy mudiam
The same thing that causes a "memory leak" in this case can be thought of as an optimization. By re-using the array, and simply using different count and offset values, you're saving memory when dealing with substrings. FYI: The trim() method has the same properties as the substring method in terms of the re-use of the array, as it's implementation actually uses the substring method. If it's that much of an issue, you can instead do: s = new String(s.substring(0,1)); This copy constructor actually creates a new array if the source array has unused space in it, ensuring that the new string is backed by an array of exactly the right size. Is there something similar in a Rope object? -
A small point[ Go to top ]
- Posted by: Ivaylo Kovatchev
- Posted on: February 15 2008 09:51 EST
- in response to Matt Papendorf
Very interesting read. I have one nitpicky question though: Isn't the maximum size of a String 2^16 characters? That's a lot less than 50 megabytes. Or do I have it wrong? -
Re: A small point[ Go to top ]
- Posted by: Joseph Ottinger
- Posted on: February 15 2008 10:06 EST
- in response to Ivaylo Kovatchev
The maximum size of a String in Java is roughly 2GB, theoretically, if your JVM could manage that. -
Re: A small point[ Go to top ]
- Posted by: Stefan Zobel
- Posted on: February 15 2008 10:13 EST
- in response to Ivaylo Kovatchev
Isn't the maximum size of a String 2^16 characters?
It's Integer.MAX_VALUE (= 2^31 -1), the maximum length of a char[] (in theory).
...
Or do I have it wrong? -
hello?! let's clear up a few things[ Go to top ]
- Posted by: John Vance
- Posted on: February 16 2008 09:53 EST
- in response to Matt Papendorf
A memory leak in Java land is the same as a memory leak in any other land and s = new String(); s.substring(0,1); will not produce a memory leak, but a very long string object waiting for garbage collection. What happens if you use Rope instead of String... i don't know. Also: the maximum length of a string is not 2^31 because Java saves Strings in modified UTF-16 where one character in supplemental Unicode planes can take up to 6 bytes but not contain a zero-byte (which is great for C compatibility). So the maximum length of a String depends on two things: "what characters do you use" and "what architecture does your VM run on", because the memory limits of a 64-bit VM are very different (naturally) from a 32 bit VM. The maximum length of a String may very well be 2^31 BYTES. Just for completeness: String.length() will not give you accurate results though, because for backwards-compatibility this function returns the count of UCS-2 characters in your string, so it will NOT give accurate results if your String contains characters that are only representable by 3+ bytes. -
Re: hello?! let's clear up a few things[ Go to top ]
- Posted by: James Watson
- Posted on: February 16 2008 21:50 EST
- in response to John Vance
A memory leak in Java land is the same as a memory leak in any other land
A memory leak is caused by memory that is unreachable but not deallocated. In Java this is the exact situation that makes memory available for collection by the garbage collector. The situation described here, where memory is referenced but no longer needed is not the same thing. The term I've seen used for this is a 'space leak'. But since memory leaks don't technically exist (or a always do, depending on your perspective) calling this a memory leak doesn't create any real confusion.and
The original String object can be collected but the large char[] backing it (the vast majority of the Strings memory) will not be collectable until the second String is unreachable (assuming no other object refer to the array.)
s = new String();
s.substring(0,1);
will not produce a memory leak, but a very long string object waiting for garbage collection. -
Re: hello?! let's clear up a few things[ Go to top ]
- Posted by: James Watson
- Posted on: February 16 2008 22:24 EST
- in response to John Vance
Also: the maximum length of a string is not 2^31 because Java saves Strings in modified UTF-16 where one character in supplemental Unicode planes can take up to 6 bytes but not contain a zero-byte (which is great for C compatibility).
No String can be longer than (2^31)-1, so that makes (2^31)-1 the maximum length of a String in Java.So the maximum length of a String depends on two things:
The limit is (2^31)-1 because arrays are addressed and sized with positive integers. String is not only backed by a single array but all methods that define length (even the unicode code point versions) return an int.
"what characters do you use" and
"what architecture does your VM run on", because the memory limits of a 64-bit VM are very different (naturally) from a 32 bit VM.
The maximum length of a String may very well be 2^31 BYTES.