Encoding non-English chars with UTF-8 on JSP (critical!!)


  1. Hello friends,
    I'm having trouble with a project that has kept me awake for several nights now...

    A simple form in a JSP page declared as UTF-8 (running on JBoss 4, which embeds Tomcat 5) submits Hebrew characters.
    The data is submitted to a servlet that inserts it into the DB (MySQL).
    ** Up to this point everything is OK: I can see the Hebrew characters correctly in the DB!
    The servlet forwards to another JSP page.
    That JSP page calls an EJB method that returns a record from the DB.
    OK... the problems start here:

    a. When I don't use any encoding declaration (so it defaults to ISO-8859-1),
    the data is gibberish, but when I use the browser's "view source" option, the data is shown correctly in the source.

    b. When I manually set the browser's encoding to UTF-8, the data is shown correctly.

    So that means I need to use UTF-8 pageEncoding... Right? Right! And so I did.

    I have tried every manipulation of UTF-8: <%@ page pageEncoding..., <%@ page contentType..., response.setChar..., request.setChar...

    It was gibberish!! And now even the view source option shows gibberish.


  2. We have had similar problems.
    We added the following directive to each JSP page to show Danish letters from the database:
    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="ISO-8859-1" %>

    No use of response.setChar... request.setChar...

    Yours
    Thomas Aagaard
  3. Save JSP as UTF-8

    Hi,

    Try saving your JSP files in UTF-8 encoding instead of the default ANSI (on Windows).

    tq
  4. Here are some thoughts...

    I know this is an annoying problem. The Internet is a global thing, involving all languages, charsets, encodings, locales and so on, but in my view the matter is underrepresented and there are no clear, well-known strategies. Most books and articles silently assume that the world consists of English-speaking people... at most including a few French, Germans etc...

    Anyway, back to your problem:

    I am working from memory here, so if this advice doesn't work, please let me know and I will check the details of my own solution.

    There are a few "intersections" where data is exchanged and where an implicit encoding conversion might happen:


    1- Your MySQL DB: what version is it? Starting with 4.1 (I think) it can store Unicode.
    Prior to that you have to use binary columns to hold the data and prevent automatic conversion by MySQL.

    2- I assume you fetch your data from the JSP using JDBC. What driver (and version) do you use?

    3- The JDBC driver usually converts the data from the database automatically, depending on the default encoding of the machine it is running on.

    To force JDBC to retrieve the data as Unicode, please use a connection parameter along the lines of charset=utf8 (I don't have the exact syntax at hand; the test page at the end of this post uses useUnicode=true&characterEncoding=UTF-8).

    4- In the JSP page itself:

    You have 2 levels of charset processing to deal with:

    a- The Java processing in the JSP servlet itself, before data is sent back to the browser.

    Here you have to take care that the JSP's Java code is not silently converting your data to the default charset, which might be ISO-8859-1 or anything else that is not UTF-8.

    b- The 2nd level is the encoding of what is sent to the browser, and how to force or help the browser to understand which character encoding to use.

    Here you of course have to use "<META http-equiv="Content-Type" content="text/html; charset=UTF-8">"
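
    The "silent conversion" risk described in point 4a can be illustrated with a minimal, standalone sketch. It is not part of the poster's pages; the class and method names are made up for illustration. It encodes a Hebrew string once with UTF-8 and once with ISO-8859-1 and decodes it again with the same charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
final class LossyEncodingDemo {
    // Encodes the text with the given charset and decodes it again with the same charset.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "shalom", written with Unicode escapes so the source file encoding does not matter
        String hebrew = "\u05E9\u05DC\u05D5\u05DD";

        // UTF-8 can represent Hebrew, so the text survives the round trip.
        System.out.println(roundTrip(hebrew, StandardCharsets.UTF_8).equals(hebrew)); // true

        // ISO-8859-1 has no Hebrew letters; getBytes() substitutes '?' for each one.
        System.out.println(roundTrip(hebrew, StandardCharsets.ISO_8859_1)); // ????
    }
}
```

    Once the letters have been replaced by '?', no browser setting can bring them back, which is why each intersection has to be checked separately.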


    This is what has worked for me:


    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <HTML>
    <HEAD>
    <%@ page
    language="java"
    contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8"
    %>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <META name="GENERATOR" content="IBM WebSphere Studio">
    <TITLE>MY Page Title.....</TITLE>
    </HEAD>
    <BODY>
    <P>Bla bla bla ....</P>
    ....



    PS:
    I found in my case that

    contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8"

    does not create an HTML meta element (<META http-equiv="Content-Type" content="text/html; charset=UTF-8">).

    The only reliable way to have that meta element in the resulting HTML is to put it in the source of your JSP page (in the HTML source code).

    So always include this:
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">

    to force the browser to handle and display the page as UTF-8.

    Most of this code was tested on Tomcat 4, WebSphere 5 and 5.1, and JBoss 3.2.x and 4.
    -----------------------------------------------------------------------------


    So the bottom line is:

    1- Judging from your description, your code is correct, since you are seeing the correct content in the page source.
    So the point is whether you have this in your resulting HTML code:

    "<META http-equiv="Content-Type" content="text/html; charset=UTF-8">"

    Or check whether your HTML code is valid.


    2- If none of that solves your problem, the other strategy is to check at each of those data intersections that your content is not being tampered with (implicit conversions by your system...).


    Hope that helps and that you get back to your well-deserved night's sleep ;)


    Cheers,
    Dilshad

    =========================================================================

    Here are 2 test JSP pages to check whether it works:
    (this was tested on WinXP, JBoss 3.23, MySQL 4.13. Browsers: Internet Explorer 6.0 SP1 and Netscape 7.2)


    1- an entry form with a textarea element where you put the Hebrew text to insert into your database
    2- an update page, Update_Db.jsp, which collects the entered text, does the INSERT on the DB and outputs the results.

    I assume a MySQL database named "unicode" and a table called kurdi (with 3 columns: id, title_text, title_text2).


    a- Entry Form file (Entry_Form.jsp) -----------------------------------

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <HTML>
    <HEAD>
    <%@ page
    language="java"
    contentType="text/html; charset=utf-8"
    pageEncoding="utf-8"
    import="java.sql.*"
    %>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <META name="GENERATOR" content="IBM WebSphere Studio">
    <META http-equiv="Content-Style-Type" content="text/css">

    <TITLE>Entry_Form.jsp</TITLE>
    </HEAD>
    <BODY>
    <BR>
    <FORM action="Update_Db.jsp" enctype="application/x-www-form-urlencoded" name="entryFrm" method="POST">
    <TEXTAREA rows="10" cols="40" name="title_text">

              YOUR_HEBREW_TEXT_HERE

    </TEXTAREA>
    <BR>
    <BR>
    <INPUT type="submit" name="submitBtn" value="Update">
    </FORM>

    </BODY>
    </HTML>




    b- The INSERTing JSP file (Update_Db.jsp) -----------------------------------------


    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <HTML>
    <HEAD>
    <%@ page
    language="java"
    contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8"
    import="java.sql.*"
    %>

    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <META name="GENERATOR" content="IBM WebSphere Studio">
    <META http-equiv="Content-Style-Type" content="text/css">
    <TITLE>update_table.jsp</TITLE>
    </HEAD>
    <BODY>
    <P><H3>What is the Encodings Status etc....?</H3></P>
    <HR>
    <%
    try {
    // Must be called before the first getParameter(); "UTF-8" is the canonical charset name.
    request.setCharacterEncoding("UTF-8");

    out.println("<BR>*** title_text= <BR>" + request.getParameter("title_text") );
    out.println("<BR>*** request.getCharacterEncoding()= " + request.getCharacterEncoding() );
    out.println("<BR>*** request.getContentType()= " + request.getContentType() );
    out.println("<BR>*** request.getLocale()= " + request.getLocale() );
    out.flush();

    Object key = null;
    Object value = null;
    int i = 0;
    // "enum" is a reserved word since Java 5, so the variable needs another name
    java.util.Enumeration attrNames = request.getAttributeNames();
    while ( attrNames.hasMoreElements() ) {
    key = attrNames.nextElement();
    value = request.getAttribute( key.toString() );
    out.println("<BR>-- key= " + key + " --- " + "value= " + value );
    i++;
    if (i > 200 ) break;
    }
    } catch ( Exception ex) {
    out.println("EXception occured: ");
    ex.printStackTrace( new java.io.PrintWriter(out));
    out.flush();

    ex.printStackTrace();
    }

    java.sql.Connection conn = null;
    try {

    Class.forName("com.mysql.jdbc.Driver").newInstance();
    // useUnicode/characterEncoding tell the driver to exchange text with the server as UTF-8
    conn = DriverManager.getConnection( "jdbc:mysql://localhost/unicode?user=blah&password=blah&useUnicode=true&characterEncoding=UTF-8" );
    // Variant without credentials in the URL:
    // conn = DriverManager.getConnection( "jdbc:mysql://localhost/unicode?useUnicode=true&characterEncoding=UTF-8" );

    String sql = "SELECT * FROM kurdi";
    Statement st = conn.createStatement( );

    // Bind the text as parameters instead of concatenating it into the SQL:
    // a quote in the submitted text would otherwise break (or hijack) the statement.
    String sqlInsert = "INSERT INTO kurdi SET title_text = ?, title_text2 = ?";
    PreparedStatement ps = conn.prepareStatement( sqlInsert );
    ps.setString( 1, request.getParameter( "title_text" ) );
    ps.setString( 2, request.getParameter( "title_text" ) );

    out.println("sqlInsert= " + sqlInsert );
    out.flush();
    System.out.println("sqlInsert= " + sqlInsert);

    ps.executeUpdate();
    ps.close();

    ResultSet rs = st.executeQuery(sql);
    out.newLine();

                            // And now output the current content of the table
                            // I need to see whether the INSERT was successful!

    out.println("<TABLE BORDER=\"1\">");
    while(rs.next()) {
    out.println( "<TR>");
    out.print( "<TD>" + rs.getObject(1) + "</TD>");
    out.print( "<TD>" + rs.getObject(2) + "</TD>");
    out.print( "<TD>" + rs.getObject(3) + "</TD>");
    out.println( "</TR>");
    }
    out.println("</TABLE>");
    conn.close();

    System.out.println("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX");
    } catch( Exception ex ) {
    out.println("EXception occured: ");
    ex.printStackTrace( new java.io.PrintWriter(out));
    out.flush();

    ex.printStackTrace();
    try{
    // conn is still null if getConnection() itself failed
    if (conn != null) conn.close();
    } catch( Exception ex2 ) {
    out.println("EXception occured: ");
    ex2.printStackTrace( new java.io.PrintWriter(out));
    out.flush();

    ex2.printStackTrace();
    }
    }

    out.flush();
    %>

    </BODY>
    </HTML>
  5. ok..

    First of all, thank you for your answer, Dilshad.

    I will try to:

    1. start MySQL with --default-character-set=utf8
    2. I'm using the latest MySQL Connector/J, so I don't think that's the issue
    3. I will add the <%@page language.. contentType...pageEncoding... directive and add the meta tag

    But you know, one thing is really bothering me: this is an application that
    was originally written with MySQL--CMP--EJB--Web,
    and I was asked to rewrite it with MySQL--Hibernate--EJB--Web.
    The old version is working, with the same DB configuration and the same driver!

    Thank you again...
    Keep watching the thread.

    Alon.
  6. another night.......

    Yep... it's another night and it's still not working...

    I updated to the latest MySQL 4.1.
    I set all the vars to &%#$%# UTF-8.
    I added the meta tag, the page encoding and the content type, all UTF-8, in the JSP,

    and g i b b e r i s h

    Please... if anyone has any idea...

    Could it be Hibernate?
  7. Have you gotten this working? My case is that it works in WebSphere but not in Tomcat nor Jetty
  8. server configuration

    Not sure if you still have this problem, but it seems to me that the UTF-8 encoding doesn't get applied to the query string. To solve this, add a new flag, URIEncoding="UTF-8", to the server.xml of the Tomcat/JBoss instance that runs your application. This solution will handle any problem like this where the query string of a GET request needs to be UTF-8.
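
    Why the query-string charset matters can be shown outside any container. The following is only a sketch with made-up names; it uses the JDK's URLDecoder to stand in for the container's decoding step and decodes the same percent-encoded bytes with two different charsets:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's query-string decoding.
final class QueryStringDecodingDemo {
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the Hebrew word "shalom" (U+05E9 U+05DC U+05D5 U+05DD), percent-encoded
        String raw = "%D7%A9%D7%9C%D7%95%D7%9D";

        String asUtf8 = decode(raw, "UTF-8");        // four Hebrew letters
        String asLatin1 = decode(raw, "ISO-8859-1"); // eight unrelated characters, one per byte

        System.out.println(asUtf8.equals("\u05E9\u05DC\u05D5\u05DD")); // true
        System.out.println(asLatin1.length());                          // 8
    }
}
```

    The bytes sent by the browser are the same in both cases; only the charset assumed on the server side differs.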
  9. General solution?

    Salim,

    thanks for your excellent explanation. Using
    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
    helped us.

    Of course I would prefer a configuration solution to this problem instead of "fixing" every page. I tried encoding.properties, but this didn't help.

    Do you know of a solution?

    Regards
     Andreas
  10. Hi, I have the same problem as you. I'm using a Tomcat 5.0 server and MySQL version 4.x. In my server.xml I have defined the following connection URL within the context tag:
    jdbc:mysql://localhost:3306/aut?autoReconnect=TRUE&amp;useUnicode=TRUE&amp;characterEncoding=ISO8859_1
    The default charset in MySQL is latin1 (ISO8859_1).
    My locale is "danish".
    My JSP pages are configured like this:
    ...
    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="ISO-8859-1" %>
    ...
    <meta http-equiv=\"Content-Type\" content=\"text/html; charset=\"ISO-8859-1\"/>

    I have followed Thomas Aagaard's advice, but it doesn't really seem to work.

    Is there anything more I can do?
    Is this a Tomcat issue?
  11. This is one way:

    String value = new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8");

    Form data is always decoded as ISO-8859-1 (or so it seems), but the browser encodes the data using the page encoding (the one you specified with the contentType attribute).

    Using request.setCharacterEncoding doesn't help. I'm using Tomcat 5.0.16, but I think this is a general issue.

    The request sent by the browser doesn't contain "charset" information, or rather, you cannot assume this information is always sent. I think Tomcat decides to ignore the issue and carries on assuming an ISO-8859-1 encoding.

    There is a trick: using a hidden field named "_charset_", but I never tried it out. I think the browser fills it with the charset name (Mozilla and IE seem to support it).

    I've read something about a web.xml "directive" for Tomcat to set the default form encoding... do a search... maybe this helps!

    Bye, Stefano.
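
    The getBytes("ISO-8859-1") workaround from this post can be tried without a servlet container. The sketch below uses invented class and method names: it first simulates UTF-8 form bytes being decoded as ISO-8859-1, then applies the same repair step as the one-liner above:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; misdecode() only simulates what the poster observed in Tomcat.
final class MojibakeRepair {
    // Simulates a server decoding UTF-8 bytes as ISO-8859-1 (one char per byte).
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes via ISO-8859-1 and decodes them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // "shalom"
        String garbled = misdecode(hebrew);

        System.out.println(garbled.length());               // 8 (two bytes per Hebrew letter)
        System.out.println(repair(garbled).equals(hebrew)); // true
    }
}
```

    The repair works because ISO-8859-1 maps every byte to exactly one character, so no byte is lost. It should only be applied to text that really was mis-decoded: applied to a correctly decoded Hebrew string it turns the letters into '?'.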
  12. utf-8 problems

    I am also facing the same problem... I have spent a month trying to make UTF-8 display correctly. I am using OC4J and I read UTF-8 characters from files. I've tried everything, all the directives and tags mentioned in these posts, but with no success. Just yesterday I gave up and tried to use JSTL to solve my problem. I've never used JSTL before, but within half an hour I had the code working. My UTF-8 pages displayed perfectly.

    Mike
  13. Hi all,

    I have a different situation here: the Japanese characters are displayed in the text box, but when an English character is added to them and saved, the text gets corrupted.

    Please help me with this.

    Thank you,

    M.
  14. Ignore the above; the issue was something else. I guess I was a bit hasty. Thanks.
  15. Hi,


    We have an application developed on WebSphere 5.0.2.6 and DB2 8.1. The user enters information and submits it. On retrieval, it is shown as junk characters. When the encoding is changed in IE using View --> Encoding --> More --> Thai (Windows), the Thai characters are displayed correctly, but we need to set this each and every time we access the page. We tried using various character sets such as TIS-620, Windows-874 and ISO-8859-1. This did not fix the issue.


    We have enabled the Web extension and have set client.encoding.override=ISO-8859-1 in the Generic JVM parameters for the application server. We have tried specifying various other charsets for this parameter, such as TIS-620 and Windows-874. Can anyone help us fix this issue?

    Somebody said that the JDBC driver encodes based on the machine where it is installed. Any ideas in this regard would be really helpful, as I have been working on this for the past weeks with no success.


    Regards


    Uday
  16. Apache Tomcat UTF-8

    Thanks to Stefano's information :)
    This works for me now:
    <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <title>Testing</title>
    </head>
    <body>
    <%
    //Some JDBC and sql statement query UTF-8 data and then ...
    String str = rs.getString("utf8_data");
    str = new String(str.getBytes("ISO-8859-1"),"UTF-8");
    %>
    <%= str %>
    </body>
    </html>
  17. Same Problem

    I have MySQL 4.1.
    The column that stores Hebrew is a UTF-8 varchar.
    When I read the data from the DB inside a JSP and write it to a UTF-8 file, then open the file in Notepad, I see Hebrew.
    When I run the JSP in Tomcat, I need to set the character encoding manually in the browser to see Hebrew.
    None of the following, in any combination, helped:
    <meta contentType="text/html; charset=UTF-8">
    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
    What is more interesting:
    When I use neither of
    <meta contentType="text/html; charset=UTF-8">
    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
    the JSP is compiled to a servlet with
    response.setContentType("text/html");
    Then I took the source of the servlet, removed
    response.setContentType("text/html");
    and ran it as a servlet, and I SEE HEBREW!
    I thought: great! Now I will remove this HTTP header from the original JSP and everything will work.
    The only way to remove a header from a JSP is to remove all headers:
    <% response.reset(); %>
    But it didn't help! Now, instead of garbage (in place of the Hebrew), the JSP shows me the raw HTML with correct Hebrew.
    I downloaded a tool that shows the HTTP headers of a URL.
    My JSP and my ex-JSP servlet have the same headers. Still, the servlet shows Hebrew and the JSP shows HTML with Hebrew.
    I tried the opposite: I used response.setContentType("text/html;charset=UTF-8"); both in the servlet
    and in the JSP. The servlet shows garbage and the JSP shows garbage.
    What can it be?!
    The DB contains the information in the correct format. It is extracted from the DB and written to a file correctly.
    The servlet (taken from the compiled JSP) sends it to the client correctly when I don't set the content type.
    Exactly the same JSP is shown as garbage on the client.
    I suppose that it is neither the servlet nor the JSP. It is the code that executes them...
    I keep checking, but if someone knows the solution, please help.
  19. RE:Same Problem

    Hi, try to move the line below to the top of your page. It might help. ">
  20. Compiler Issue

    Hi, I had the same problem: no matter what tags I had on the JSP to specify UTF-8 as the encoding, multibyte characters would not be rendered correctly (using JDeveloper / OC4J). I finally discovered that when compiling the pages, JDeveloper was passing -encoding Cp1252 to the JDK, thus instructing it to treat my JSPs as Cp1252 encoded! In JDeveloper 10.1.2 you can specify the compiler encoding under Project Properties > Compiler. Switching the compiler encoding to UTF-8 fixed the problem.
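
    The effect of reading a UTF-8 source file as Cp1252 can be reproduced with plain byte arrays. This is a sketch with invented names, not JDeveloper's actual behavior; the "file" is just the UTF-8 bytes of one accented letter:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decodes the same bytes under two assumed source encodings.
final class SourceEncodingDemo {
    static String readAs(byte[] fileBytes, Charset assumed) {
        return new String(fileBytes, assumed);
    }

    public static void main(String[] args) {
        // A "file" containing the single letter e-acute (U+00E9), saved as UTF-8: bytes C3 A9
        byte[] utf8File = "\u00E9".getBytes(StandardCharsets.UTF_8);

        // Read with the right encoding: one character.
        System.out.println(readAs(utf8File, StandardCharsets.UTF_8).length()); // 1

        // Read as Cp1252 (windows-1252): each byte becomes its own character, U+00C3 U+00A9.
        String wrong = readAs(utf8File, Charset.forName("windows-1252"));
        System.out.println(wrong.equals("\u00C3\u00A9")); // true
    }
}
```

    A compiler given the wrong -encoding makes the same mistake for every string literal in the page, before any page directive has a chance to take effect at run time.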
  21. Here's a solution

    In JSP/HTML, represent the non-English characters as hexadecimal character references, i.e. &#xnnnn;. This worked for me. This is how Java displays Japanese characters without specifying the font: http://java.sun.com/j2se/corejava/intl/index.jsp

    Here is the code that works for me:

    <%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
    <%
    String originalText = ....
    StringBuffer newText = new StringBuffer();
    int ch = -1;
    for(int i=0; i < originalText.length(); i+=1) {
    ch = (int) (originalText.charAt(i));
    newText = newText.append("&#x" + Integer.toHexString(ch) + ";");
    }
    %>
    <%=newText.toString()%>

    Note: There could be performance problems, because the code processes each character in the string, converts it to hex and appends it to another text.
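
    A variant of the same idea, as a standalone sketch with made-up names: it leaves ASCII untouched and only converts non-ASCII text, and it iterates over code points rather than chars so that characters outside the Basic Multilingual Plane become a single reference. This is an assumption about what is usually wanted, not the poster's code:

```java
// Hypothetical helper class, not part of the thread's code.
final class NumericCharRefs {
    // Replaces every non-ASCII code point with a hexadecimal character reference.
    // ASCII is copied through unchanged, so HTML-special characters such as '<' and '&'
    // are NOT escaped here and would need separate handling.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // "abc " followed by the Hebrew word "shalom"
        System.out.println(escapeNonAscii("abc \u05E9\u05DC\u05D5\u05DD"));
        // prints: abc &#x5e9;&#x5dc;&#x5d5;&#x5dd;
    }
}
```

    Because the output is pure ASCII, it displays the same whatever charset the browser assumes. It does not help with form input, which still arrives in the page's encoding.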
  22. You can use a request filter instead of pasting code into EVERY JSP and servlet.

    ==============================
    CharacterEncodingFilter.java
    ==============================
    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.*;

    /* @author Peter Maloney */
    public class CharacterEncodingFilter implements Filter {
        private FilterConfig fc;

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            response.setContentType("text/html; charset=UTF-8");
            request.setCharacterEncoding("UTF8");
            chain.doFilter(request, response);
            //do it again, since JSPs will set it to the default
            response.setContentType("text/html; charset=UTF-8");
            request.setCharacterEncoding("UTF8");
        }

        public void init(FilterConfig filterConfig) {
            this.fc = filterConfig;
        }

        public void destroy() {
            this.fc = null;
        }
    }

    ==============================
    web.xml
    modify this so it matches all required servlets and input forms.
    my servlets are all conveniently located in /servlet/
    ==============================
    ...
    <filter>
        <filter-name>CharacterEncodingFilter</filter-name>
        <filter-class>CharacterEncodingFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <url-pattern>/servlet/*</url-pattern>
    </filter-mapping>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <url-pattern>*.jsp</url-pattern>
    </filter-mapping>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <url-pattern>*.html</url-pattern>
    </filter-mapping>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <url-pattern>*.htm</url-pattern>
    </filter-mapping>
    ...
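  23. I took the same Java file and updated the web.xml as below.

    web.xml
    =======
    <filter>
        <filter-name>CharacterEncodingFilter</filter-name>
        <filter-class>com.its.struts.action.CharacterEncodingFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <servlet-name>action</servlet-name>
    </filter-mapping>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <url-pattern>*.jsp</url-pattern>
    </filter-mapping>
    <filter-mapping>
        <filter-name>CharacterEncodingFilter</filter-name>
        <url-pattern>*.html</url-pattern>
    </filter-mapping>

    It's working fine for me.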
  24. Ultimately it worked

    Thanks a lot, this has really helped me.

  25. Hi, I was calling services from the server side to fetch some content, and setting the request header worked for me to display Nordic languages (Swedish, Danish and Finnish) correctly.

    GetMethod method = new GetMethod(url);
    // Set header.
    method.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
    String response = new String(method.getResponseBodyAsString().getBytes("UTF-8"));

    Hope this will be helpful!
    Faisal Mateen
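
    On the receiving side, note that new String(byte[]) without a charset argument decodes with the platform default, so the result of the last line above can differ between machines. A more explicit alternative is to decode the raw response bytes with the charset the server declares. The sketch below assumes the Content-Type header value and the body bytes are already at hand; the class and method names are invented and no HTTP client API is used:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; it only parses a header value and decodes bytes.
final class ResponseDecoding {
    // Returns the charset named in a Content-Type value such as "text/html; charset=utf-8",
    // or the fallback when the parameter is missing or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // malformed charset name: fall through to the fallback
                    }
                }
            }
        }
        return fallback;
    }

    // Decodes the body with the declared charset, assuming ISO-8859-1 when none is declared.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] body = "\u00E5".getBytes(StandardCharsets.UTF_8); // Danish/Swedish a-ring
        System.out.println(decodeBody(body, "text/html; charset=utf-8").equals("\u00E5")); // true
        System.out.println(decodeBody(body, "text/html").length());                         // 2
    }
}
```

    The ISO-8859-1 fallback is a choice made for this sketch; a real client may prefer UTF-8 or another default.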
  27. Solved URL decoding

    Use:

    String value = new String(request.getParameter("word").getBytes("ISO-8859-1"), "UTF-8");

    Jai ho

    Cheers