I have developed a servlet which reads an XML file from a remote server. The XML file contains some special (international) characters. When I try to read that XML file, it throws an exception.
I have tried with DataInputStream and also with a BufferedReader wrapped around an InputStreamReader.
In the case of DataInputStream (readUTF()), it throws UTFDataFormatException.
In the case of the Reader, it throws MalformedInputException.
Can anyone tell me how I can resolve this issue?
Thanks in advance
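A note on the readUTF() failure described above: DataInputStream.readUTF() does not read arbitrary UTF-8 text. It expects a two-byte length prefix followed by "modified UTF-8", which is the format DataOutputStream.writeUTF() produces. The minimal sketch below illustrates this; the class name, method names and sample XML are made up for the example and are not from the original servlet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original servlet.
class ReadUtfDemo {

    // writeUTF() followed by readUTF() round-trips a string correctly.
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(text);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // readUTF() on plain text treats the first two bytes as a length
    // and fails with some IOException subclass.
    static boolean readUtfFails(byte[] plainText) {
        try {
            new DataInputStream(new ByteArrayInputStream(plainText)).readUTF();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    // Plain UTF-8 text should be decoded as text, not with readUTF().
    static String decodeAsText(byte[] plainText) {
        return new String(plainText, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample document invented for the demo; \u00e9 is "e" with an acute accent.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Jos\u00e9</name>";
        byte[] raw = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip ok:  " + roundTrip(xml).equals(xml));
        System.out.println("readUTF fails:  " + readUtfFails(raw));
        System.out.println("text decode ok: " + decodeAsText(raw).equals(xml));
    }
}
```

So for an XML file fetched over HTTP, readUTF() is the wrong tool whatever the file's encoding is; the Reader-based approach is the one worth debugging.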
Quick question: have you tried specifying the character set in the constructor of the InputStreamReader, so that it matches the character set actually used in the XML file?
Thanks for your reply.
Yes, I have tried passing UTF8 to the InputStreamReader.
It is able to read the data until it encounters the special characters, and then it throws MalformedInputException.
When I open the URL for the XML in a browser, it shows the XML content properly.
When I read the XML content in a standalone Java program, it works fine even without specifying any character set. By default it uses the Cp1252 character set.
When I specify this Cp1252 character set in the servlet, it starts working fine.
But this character set works on Windows only, since it comes from MS.
I need to make it work with UTF8.
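The symptoms above (UTF8 fails at the first special character, Cp1252 works) would be consistent with the file's bytes not being UTF-8 at all, although the thread does not confirm this. One way to check is to decode the raw bytes strictly and see which character set accepts them. The following is only a sketch under that assumption; the class name, method name and sample data are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original servlet.
class CharsetCheck {

    // Returns true if the bytes decode cleanly in the given charset.
    // REPORT makes the decoder throw instead of silently substituting
    // a replacement character for bad input.
    static boolean isValid(byte[] data, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample text invented for the demo; \u00e9 is "e" with an acute accent.
        String text = "<name>Jos\u00e9</name>";
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("Latin-1 bytes valid as UTF-8: "
            + isValid(latin1Bytes, StandardCharsets.UTF_8));
        System.out.println("UTF-8 bytes valid as UTF-8:   "
            + isValid(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

If the real bytes turn out to be invalid as UTF-8, the fix belongs on the producing side (write the file as UTF-8 and declare it in the XML prolog) rather than in the reader. Also note that an XML parser given the raw InputStream, instead of a Reader, can pick the encoding up from the XML declaration itself.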
That is strange. Have you tried running the standalone application on, say, Linux and verified that you get the same results?
Also verify how the XML is being constructed and serialized across the platforms.
I know these are not answers to your questions, but just a couple of things I'd typically do to narrow down the cause.
When I run the same standalone Java program on Solaris, it reports ISO8859_1 as the default encoding. On Solaris it also works without specifying any character set in the InputStreamReader.
The XML is stored and retrieved as a String only. From the DB point of view I don't see any problem, as it is displayed properly in the browser.
This is regarding an issue I am facing while sending UTF-8 characters with the GET method to a servlet directly from the browser. I have made the following settings:
1. Created a CharsetFilter, which sets the encoding for each request to UTF-8.
2. Applied this filter in web.xml so that it runs before all requests.
3. In my servlet, while writing the response, I set response.setContentType to text/html;charset=utf-8.
With the above settings, accented characters like ÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ work correctly, but Chinese characters, Arabic characters, etc. do not work.
However, if along with the above settings I change server.xml to have useBodyEncodingForURI="true" and/or URIEncoding="UTF-8" in the Connector tag, the Chinese and Arabic characters work fine but the accented characters no longer work.
I have tried all combinations of the settings mentioned, but somehow only one of the above two situations works at a time.
Has anybody come across this problem? Any pointers would be great. I cannot use a POST request, as my servlet is the entry point to my application.
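One possible explanation, which is an assumption and not something the post confirms, is that the two groups of characters reach the server in different encodings: accented Latin-1 characters as single bytes (for example %E9), and Chinese or Arabic characters as UTF-8 byte sequences. A single decoding setting on the server can then only get one group right. The sketch below reproduces that effect with java.net.URLDecoder; the class name, method name and sample values are made up for the illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the original application.
class QueryDecodingDemo {

    // Decodes a percent-encoded query value using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String eAcuteUtf8 = "%C3%A9";     // "e" with acute accent, as UTF-8 bytes
        String eAcuteLatin1 = "%E9";      // the same character, as one Latin-1 byte
        String chineseUtf8 = "%E4%B8%AD"; // U+4E2D, as UTF-8 bytes

        // Decoding with the charset the sender used gives the right character.
        System.out.println(decode(eAcuteUtf8, "UTF-8"));
        System.out.println(decode(eAcuteLatin1, "ISO-8859-1"));
        System.out.println(decode(chineseUtf8, "UTF-8"));

        // Decoding with the other charset gives garbage.
        System.out.println(decode(eAcuteLatin1, "UTF-8"));
        System.out.println(decode(eAcuteUtf8, "ISO-8859-1"));
    }
}
```

If this is what is happening, a more robust approach is to make sure every link or form that builds the GET URL percent-encodes its parameters as UTF-8, and then keep URIEncoding="UTF-8" on the connector. Logging request.getQueryString() in the servlet shows the raw, still-encoded value and makes it easy to see which byte sequences the browser actually sent.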