Hi,
I have a JSP that uses UTF-8 page encoding (declared in the HTML head). The JSP receives form data from an external website, and that data is encoded in ISO-8859-1. I call request.setCharacterEncoding("ISO-8859-1"); at the start, before reading any parameter. I am trying to convert the ISO-8859-1 data to UTF-8 before displaying it (or storing it in a database).
My configuration and code follow:
Configuration:
Application server: Resin on Windows
JDK (JVM): jre1.5.0_11
JVM default charset: UTF-8
Code:
System.out.println("Default charset:"+Charset.defaultCharset());
Enumeration params = request.getParameterNames();
while(params.hasMoreElements()) {
    String name = (String)params.nextElement();
    String value = request.getParameter(name);
    String value1 = new String(value.getBytes("UTF-8"));
    System.out.println("Name:"+name+"\t value:"+value1);
%> <%=name%>
Server Console Output:
Default charset:UTF-8
Name:subject value: Vincent ‘Sonny’ Pirozzi Jr., Merrimack
Actual encoded text in originating HTML:
Vincent ‘Sonny’ Pirozzi Jr., Merrimack
Do you see anything I am doing wrong? My intention is to convert ISO-8859-1 characters to UTF-8 on a JVM whose default charset is UTF-8.
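
One thing worth checking about the code above is whether the getBytes/new String step changes anything at all. The sketch below is only an illustration: the class and method names are hypothetical, and it spells out "UTF-8" on both sides to mimic a JVM whose default charset is UTF-8. It suggests that encoding a String to UTF-8 and decoding it straight back yields the same String. The string returned by request.getParameter is already decoded, so the charset that matters is the one used when the request bytes are read and when the response is written.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original JSP.
class RoundTripDemo {
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Mimics new String(value.getBytes("UTF-8")) on a JVM whose
    // default charset is UTF-8: encode to bytes, then decode again.
    static String roundTrip(String value) {
        ByteBuffer utf8Bytes = UTF8.encode(value);
        return UTF8.decode(utf8Bytes).toString();
    }

    public static void main(String[] args) {
        String value = "Vincent \u2018Sonny\u2019 Pirozzi Jr., Merrimack";
        // The round trip gives back an equal String.
        System.out.println(roundTrip(value).equals(value)); // prints true
    }
}
```

If the parameter already arrives garbled, the likelier place to look is the charset passed to request.setCharacterEncoding rather than this conversion step.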
8859-1 to UTF-8 conversion and platform default charset (2 messages)
- Posted by: yogesh Gowdra
- Posted on: August 15 2007 19:37 EDT
Re: 8859-1 to UTF-8 conversion and platform default charset
- Posted by: java designer
- Posted on: August 18 2007 14:48 EDT
- in response to yogesh Gowdra
Hi,
ISO-8859-1 is a subset of UTF-8 (meaning UTF-8, when using ONLY ISO-8859-1 characters, is single byte only, so on the disk the bytes look exactly the same whether they are stored as ISO-8859-1 or UTF-8). For this reason, you never have to "convert" arbitrary UTF-8 into ISO-8859-1, because that's impossible. And you never have to convert ISO-8859-1 to UTF-8, because ISO-8859-1 is already UTF-8 (more accurately, a subset of UTF-8, but still UTF-8).
Secondly, I don't know off the top of my head, but are those funky open/close quotes part of ISO-8859-1?
Thirdly, is your terminal/console capable of printing characters like those quotes?
Best regards,
--j
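
The second question in the reply above can be checked directly. The following is a minimal sketch, assuming the JVM provides the windows-1252 charset (an assumption; Charset.forName throws if it is missing). The class and method names are hypothetical. It asks each charset's encoder whether it can represent the left single quotation mark used in the sample text.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
class QuoteCharsetCheck {
    // Returns true if the named charset has a byte representation for c.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char leftQuote = '\u2018'; // left single quotation mark
        System.out.println("ISO-8859-1:   " + canEncode("ISO-8859-1", leftQuote));   // prints false
        System.out.println("windows-1252: " + canEncode("windows-1252", leftQuote)); // prints true
    }
}
```

If the external form really submits those quotes, its bytes may be windows-1252 rather than ISO-8859-1, which is common for pages labelled ISO-8859-1 but is not confirmed anywhere in this thread. In that case, passing "windows-1252" to request.setCharacterEncoding would be the thing to try.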
Re: 8859-1 to UTF-8 conversion and platform default charset
- Posted by: Joe Sørensen
- Posted on: November 10 2008 03:37 EST
- in response to java designer
Sorry j, but you are wrong. If all your text were in English you would be right, but in that case every ISO-8859-X encoding is also valid UTF-8. There are a lot of 'letters' in ISO-8859-1 that take 16 bits (two bytes) in UTF-8. So you will often have to convert.
--jso
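
The point about byte lengths can be seen with a short sketch. The class and method names below are hypothetical, and the sample bytes are chosen for illustration. It decodes ISO-8859-1 bytes into a String and re-encodes that String as UTF-8, which is the conversion being discussed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical demo class for illustration only.
class Latin1ToUtf8 {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Re-encodes ISO-8859-1 bytes as UTF-8 bytes by going through a String.
    static byte[] convert(byte[] latin1Bytes) {
        String text = LATIN1.decode(ByteBuffer.wrap(latin1Bytes)).toString();
        ByteBuffer encoded = UTF8.encode(text);
        byte[] utf8Bytes = new byte[encoded.remaining()];
        encoded.get(utf8Bytes);
        return utf8Bytes;
    }

    public static void main(String[] args) {
        // "S" followed by o-with-stroke (0xF8) in ISO-8859-1: two bytes.
        byte[] latin1 = { (byte) 0x53, (byte) 0xF8 };
        byte[] utf8 = convert(latin1);
        // In UTF-8 the second character becomes 0xC3 0xB8, so three bytes in total.
        System.out.println(latin1.length + " -> " + utf8.length); // prints 2 -> 3
    }
}
```

Only the bytes below 0x80 are identical in the two encodings. Every ISO-8859-1 byte from 0x80 upward becomes a two-byte sequence in UTF-8.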