String#getBytes and OutputStream

  1. String#getBytes and OutputStream (7 messages)

    Consider the following:

        out.write( myString.getBytes() );

    This seems to be suboptimal since the String should be able to convert the bytes as it writes them into the stream. Is there any way to avoid creating this extra byte buffer (the one created by #getBytes)?

    Thanks in advance!

    R.
  2. Since strings are immutable objects, the JVM is going to duplicate the instance data (the string) whenever it gives you a reference to something that could change the contents of the string (in this case the byte array). Have you tried wrapping your output stream with a Writer? This would allow you to write the string directly to the stream without converting it to bytes yourself, letting the lower-level APIs handle the conversion. I haven't tested this myself; my only concern would be that the lower-level APIs would also convert it to a byte array using the same method.
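
    For reference, the Writer suggestion could look roughly like the sketch below. The class and method names (WriterExample, writeText) are made up for illustration, and ASCII is only an assumed target encoding. Whether this actually avoids an intermediate copy depends on the JDK's OutputStreamWriter implementation, which is exactly the concern raised above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WriterExample {
    // Hypothetical helper: encodes text through a Writer instead of calling getBytes().
    static void writeText(OutputStream out, String text, Charset charset) throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write(text);
        // Flush so the encoder's internal buffer reaches the stream.
        // The writer is deliberately not closed: the caller still owns 'out'.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeText(sink, "hello", StandardCharsets.US_ASCII);
        System.out.println(sink.size()); // 5 bytes, one per ASCII character
    }
}
```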
  3. It's clear why #getBytes returns a new byte array. Beyond your explanation, String doesn't even use a byte array internally. And a Writer won't help much: it won't stream the bytes directly to an OutputStream.

    What you would need is something more like this:


     String:
         public void writeBytes( ByteBuffer buffer );
         public void writeBytes( ByteBuffer buffer, String charsetName );

    Or even:
     
         public void writeBytes( OutputStream out );
         public void writeBytes( OutputStream out, String charsetName );

    Either of these would have worked.
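
    Since String has no such method, here is a minimal sketch of how the same effect might be approximated from the outside with java.nio.charset.CharsetEncoder. StringStreamer and its static writeBytes are hypothetical names, the 1024-byte chunk size is an arbitrary assumption, and replacing unmappable characters is just one possible policy. The point is that the only temporary allocation is a small fixed-size buffer, whatever the length of the string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StringStreamer {
    // Hypothetical stand-in for the proposed String#writeBytes(OutputStream, charsetName).
    static void writeBytes(String s, OutputStream out, Charset charset) throws IOException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer in = CharBuffer.wrap(s);            // read-only view, characters are not copied
        ByteBuffer chunk = ByteBuffer.allocate(1024);  // fixed-size scratch buffer, reused per chunk
        CoderResult result;
        do {
            // Encode as much as fits in the chunk, then hand the chunk to the stream.
            result = encoder.encode(in, chunk, true);
            drain(chunk, out);
        } while (result.isOverflow());
        do {
            // Let the encoder emit any trailing bytes it is still holding.
            result = encoder.flush(chunk);
            drain(chunk, out);
        } while (result.isOverflow());
    }

    // Writes the bytes currently in the chunk to the stream and resets the chunk for reuse.
    private static void drain(ByteBuffer chunk, OutputStream out) throws IOException {
        chunk.flip();
        out.write(chunk.array(), chunk.arrayOffset() + chunk.position(), chunk.remaining());
        chunk.clear();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeBytes("hello", sink, StandardCharsets.UTF_8);
        System.out.println(sink.size());
    }
}
```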

    R.
  4. First, you should not use getBytes().

    Second, unless you expect the encoded size to exceed 64K bytes, you should use DataOutput.writeUTF().
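
    A small sketch of what that limit means in practice, using DataOutputStream as the DataOutput implementation. writeUTF emits a two-byte length prefix followed by modified UTF-8, so the limit applies to the encoded byte count (65535), not the character count. WriteUtfLimit, encodedSize and repeat are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

final class WriteUtfLimit {
    // Returns the number of bytes writeUTF emits for s, or -1 if writeUTF rejects it as too long.
    static int encodedSize(String s) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        try {
            out.writeUTF(s);
        } catch (UTFDataFormatException tooLong) {
            return -1;
        }
        return sink.size();
    }

    // Builds a string of n copies of c (avoids String.repeat so it also compiles on older JDKs).
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encodedSize("abc"));                   // 5: length prefix + 3 bytes
        System.out.println(encodedSize(repeat('a', 65535)));      // 65537: largest ASCII string accepted
        System.out.println(encodedSize(repeat('a', 65536)));      // -1: one byte too many
        System.out.println(encodedSize(repeat('\u20ac', 30000))); // -1: 30000 chars, but 90000 bytes
    }
}
```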

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  5. You should use #writeUTF unless you are writing ASCII Strings (as bytes) to a Socket expecting ASCII bytes (such as a telnet host). ;)

    That's kinda why I put #getBytes in the subject line, because that is the behavior I needed -- bytes.

    But if you think about it, from a performance perspective #writeUTF (as it is implemented in the Sun JDK) is just as bad as #getBytes, if not worse. It's fine for small strings, but for large ones in a highly scaled application there will be a lot of large temporary allocations. If you want unencoded bytes, #getBytes will allocate an extra buffer unnecessarily. If you want machine-independent UTF-encoded bytes, then #writeUTF will create TWO extra buffers: one for the char array and one for the byte array.

    You can imagine that if you operated on the string rather than the Stream, you could implement this much more optimally. For example, instead of:

        out.writeUTF( string );

    you would have:

        string.writeUTF( out );

    Or....

        string.writeBytes( out, encoding );

    This approach seems better (the "tell, don't ask" approach) because it allows you to implement these methods so that they are completely streamed, without temporary buffers.

    R.
  6. Hi Robert,

    Going to ASCII from a String? You should be using a Writer then on top of the OutputStream with an encoding of ASCII.

    The point is that String.getBytes() and DataOutput.writeBytes() are just _wrong_ because they lose data.

    (OTOH, if you know your data is in the range 0x00-0x7F, then maybe you know what you are doing and it doesn't matter.)
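
    To make the data loss concrete, here is a rough illustration. It uses an explicit US-ASCII charset for getBytes so the result doesn't depend on the platform default (that choice is an assumption for the demo; the no-argument getBytes() uses whatever the default charset happens to be). LossyBytes and its methods are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class LossyBytes {
    // What DataOutput.writeBytes emits: only the low eight bits of each char.
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        new DataOutputStream(sink).writeBytes(s);
        return sink.toByteArray();
    }

    // What getBytes emits for ASCII: characters the charset cannot represent become '?'.
    static byte[] viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        String euro = "\u20ac"; // EURO SIGN, outside the 0x00-0x7F range
        System.out.println(Integer.toHexString(viaWriteBytes(euro)[0] & 0xFF)); // ac (high byte 0x20 dropped)
        System.out.println((char) viaGetBytes(euro)[0]);                        // ?
    }
}
```

    For a string that stays within 0x00-0x7F, both methods produce the same bytes and nothing is lost.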

    It's obvious that writeUTF isn't what you need for a telnet host, though ;-)

    Peace,

    Cameron Purdy
    Tangosol, Inc.
    Coherence: Shared Memories for J2EE Clusters
  7. "Consider the following: out.write( myString.getBytes() ); This seems to be suboptimal since the String should be able to convert the bytes as it writes them into the stream. Is there any way to avoid creating this extra byte buffer (the one created by #getBytes)? Thanks in advance! R."

    Maybe when the JIT compiler compiles this class to native code, it will inline both calls (out.write() and myString.getBytes()) into a single function, so I don't think you need to worry about performance problems.


    Just my 2 cents,
    Jose Ramon Huerga
    http://www.terra.es/personal/jrhuerga
  8. Uh.....no.