Discussions

General J2EE: Character Set conversion problem on Unix (WebLogic 8.1)

  1. Hi,

    Not sure if this is the right place for this post - apologies if it is not.

    Am facing problems in character set conversion - I convert a String using:

    String iso = new String(utfString.getBytes("ISO-8859-1"));

    This works great on our Windows server but gives (some) unrecognizable characters from the Unix box. Am working on WebLogic Workshop 8.1.

    Any help would be really appreciated!

    Thanks and regards
    Kishlay Baranwal
  2. Try using "UTF-8" instead of "ISO-8859-1". ISO-8859-1 only covers a small subset of the characters that UTF-8 can encode.
  3. Had tried that early on - but did not work for me.

    What I noticed, however, was that the default encoding on Windows was Cp1252, which I was converting to ISO-8859-1, and that worked. On Unix (Solaris 5.8, JDK 1.4.1_05), though, the default was already ISO-8859-1 (so I was unknowingly trying to convert ISO-8859-1 to ISO-8859-1 again). When I print via System.out to the console, I get the correct characters; by the time the same characters reach the browser, however, they become unidentified 'block' characters...
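    The dependence on the platform default can be sketched like this (a minimal standalone example, not the original WebLogic code): `new String(bytes)` without a charset uses the platform default, so the same bytes decode differently on Windows (Cp1252) and Solaris (ISO-8859-1). Passing the charset explicitly on both sides removes the ambiguity:

    ```java
    public class EncodingDemo {
        public static void main(String[] args) throws java.io.UnsupportedEncodingException {
            String original = "caf\u00e9"; // 'é' is U+00E9

            // Explicit charsets on both encode and decode round-trip correctly...
            byte[] latin1 = original.getBytes("ISO-8859-1");
            String ok = new String(latin1, "ISO-8859-1");

            // ...but decoding the same bytes with a different charset corrupts
            // the non-ASCII character, which is what happens implicitly when
            // the platform default encoding differs between machines.
            String bad = new String(latin1, "UTF-8");

            System.out.println(ok.equals(original));  // true
            System.out.println(bad.equals(original)); // false
        }
    }
    ```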

    Any ideas?
  4. Are you setting the text encoding for the browser? If you are sending output from a JSP, try adding this:

    <%@ page contentType="text/html;charset=utf-8" %>

    By default, a JSP tells the browser that the text it is sending is ISO-8859-1, so it could be that the server is correctly emitting UTF-8, but the browser is trying to interpret it as ISO-8859-1.

    You can do similar tricks for servlets.
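    For a servlet, the equivalent is to set the content type on the response before obtaining the writer. A minimal sketch (the servlet name is just an example; this fragment needs a servlet container to run):

    ```java
    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class HelloServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Must be called before getWriter() so the writer encodes as UTF-8
            // and the Content-Type header advertises the charset to the browser.
            resp.setContentType("text/html;charset=UTF-8");
            PrintWriter out = resp.getWriter();
            out.println("<html><body>caf\u00e9</body></html>");
        }
    }
    ```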
  5. Thanks Paul, though I had tried that as well. The thing is that WebLogic does not let this take effect "easily": the page directive does not help because WebLogic Workshop wraps our JSPs in its own (JSP) layer, and a simple HTML meta tag encoding directive does not work either. I can, however, get a reference to the "outer response" using the WebLogic API and set the content type on that. In any case, WebLogic 8.1 already seems to use UTF-8 as its default encoding. So unfortunately, the browser still did not display the characters correctly. I changed it to ISO-8859-1 as well, but that did not help.

    Have put in a fix to convert the faulty characters (working on their unicode values) into readable ones. Hope some better solution comes hopping along though!
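    A fix along those lines might look like the following sketch (this is an illustration, not the poster's actual code; the method name and the mapping table are assumptions). When Cp1252 bytes 0x80-0x9F are mis-decoded as ISO-8859-1, they land in the Unicode C1 control range U+0080-U+009F, so they can be mapped back to the characters Cp1252 intended:

    ```java
    public class Cp1252Fix {
        // Hypothetical repair: map a few common C1 controls produced by
        // mis-decoding Cp1252 as ISO-8859-1 back to readable characters.
        static String fixMojibake(String s) {
            StringBuilder sb = new StringBuilder(s.length());
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '\u0091': sb.append('\u2018'); break; // left single quote
                    case '\u0092': sb.append('\u2019'); break; // right single quote
                    case '\u0093': sb.append('\u201C'); break; // left double quote
                    case '\u0094': sb.append('\u201D'); break; // right double quote
                    case '\u0096': sb.append('\u2013'); break; // en dash
                    case '\u0097': sb.append('\u2014'); break; // em dash
                    default:       sb.append(c);
                }
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            // "It's" with a Cp1252 curly apostrophe mis-decoded as U+0092
            System.out.println(fixMojibake("It\u0092s here"));
        }
    }
    ```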

    Thanks
  6. Weird. You have exhausted all of my ideas. The only thought I have left is that the original text files were Cp1252, and on Windows, the ISO-8859-1 "conversion" left the Cp1252 characters unchanged, so that they were still rendered correctly on Windows.

    Updating the original files is probably the safest bet. I try to ensure that all files are in UTF-8 when I can (though I admit that this sometimes fails).
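    Re-encoding the original files from Cp1252 to UTF-8 can be done with a small utility along these lines (a sketch using modern try-with-resources syntax for brevity; on the thread's JDK 1.4 you would close the streams in a finally block instead):

    ```java
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Reader;
    import java.io.Writer;

    public class ToUtf8 {
        // Decode the input file as Cp1252 and write it back out as UTF-8.
        static void convert(File in, File out) throws IOException {
            try (Reader r = new InputStreamReader(new FileInputStream(in), "Cp1252");
                 Writer w = new OutputStreamWriter(new FileOutputStream(out), "UTF-8")) {
                char[] buf = new char[4096];
                int n;
                while ((n = r.read(buf)) != -1) {
                    w.write(buf, 0, n);
                }
            }
        }

        public static void main(String[] args) throws IOException {
            convert(new File(args[0]), new File(args[1]));
        }
    }
    ```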