Discussions

Web tier: servlets, JSP, Web frameworks: How to convert an HTML page into plain text?

  1. Hi all,
    I need to convert an HTML page, downloaded with
    a URLConnection object, into plain text.
    I wonder whether I can do it in Java, maybe using Servlet Filters
    (filtering the response), or do I have to use complex XML/XSLT transformations?
    Thanks
    Francesco
  2. I'm not aware of any freely available libraries providing HTML-to-plaintext conversion, but that surely doesn't mean that there aren't any out there, so keep on searching...

    Anyway, if/when you decide to implement your own conversion logic, XSLT is probably out of the question unless your HTML is actually XHTML or was generated from XML, because an XSLT processor needs well-formed XML as input and most real-world HTML is not.
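
    If you do end up writing the conversion yourself, one possible starting point is the HTML parser that ships with the JDK in the Swing packages. The sketch below is only an illustration, not a complete converter: the class name HtmlToText and its convert method are made up for this example, it only collects text nodes and ignores layout (no line breaks for paragraphs, no handling of tables or lists), and separating the text chunks with a single space is an arbitrary choice.

```java
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text nodes reported by the JDK's
// built-in (lenient) HTML parser and drops the markup around them.
final class HtmlToText extends HTMLEditorKit.ParserCallback {

    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        if (text.length() > 0) {
            text.append(' '); // assumption: one space between text chunks
        }
        text.append(data);
    }

    // Parses the HTML from the given reader and returns the collected text.
    static String convert(Reader html) throws IOException {
        HtmlToText callback = new HtmlToText();
        // 'true' tells the parser to ignore charset declarations in the page,
        // since the reader has already decoded the bytes.
        new ParserDelegator().parse(html, callback, true);
        return callback.text.toString();
    }
}
```

    With a URLConnection named conn, it could be called roughly as HtmlToText.convert(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)), assuming the page is UTF-8 encoded; in practice the charset should be taken from the response's Content-Type header.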