Web tier: servlets, JSP, Web frameworks: How to retrieve data from an HTML page with Java
- Posted by: stanimir Iakov
- Posted on: September 13 2004 13:01 EDT
I would like to write a Java application that retrieves data from an HTML web page.
For instance, imagine a stock trading web page that displays updated stock quotes every 30 seconds.
I would like to write an engine in Java that pulls the necessary data from the web page and processes it as needed.
Do you have any ideas or hints on how this could be done?
Do you know of a technology that extracts only the necessary data from an HTML web page and makes it available to a Java application?
Thank you very much.
I'm not sure you'd want to do that in a real-life application. Parsing HTML and extracting data from it isn't the best way to retrieve information from third-party applications: the structure of the source HTML may change without notice, and there are obvious questions about the scalability and robustness of such an approach.
However, if you still want to go ahead, have a look here
I have done this a few times using an HTML parser I wrote myself (which I can't share because it is owned by my employer); it is not difficult. You can also try processing the HTML with Tidy (http://tidy.sourceforge.net/) and generating XML from it. There are probably many other HTML parsers out there as well.
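To make the Tidy suggestion a bit more concrete, here is a minimal sketch of the step that comes after the cleanup: once the page is well-formed XML, the JDK's own DOM and XPath classes can pull the values out. Everything specific in it is an assumption for illustration only: the class name QuoteExtractor, the table id "quotes", and the layout of one row per stock with the symbol in the first cell and the price in the second. A real page will need its own XPath expressions, and fetching the page and running it through Tidy are not shown.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: extracts symbol/price pairs from already tidied, well-formed markup. */
public class QuoteExtractor {

    /**
     * Assumes the input is well-formed XML without a DOCTYPE or namespace
     * declaration and contains a table with id "quotes" (an invented id)
     * whose data rows hold the symbol in td[1] and the price in td[2].
     */
    public static Map<String, Double> extractQuotes(String xhtml) throws Exception {
        // Parse the markup into a DOM tree with the JDK's built-in XML parser.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select only rows that contain <td> cells, which skips a <th> header row.
        NodeList rows = (NodeList) xpath.evaluate(
                "//table[@id='quotes']//tr[td]", doc, XPathConstants.NODESET);

        // LinkedHashMap keeps the quotes in the order they appear on the page.
        Map<String, Double> quotes = new LinkedHashMap<>();
        for (int i = 0; i < rows.getLength(); i++) {
            String symbol = xpath.evaluate("td[1]", rows.item(i)).trim();
            String price = xpath.evaluate("td[2]", rows.item(i)).trim();
            quotes.put(symbol, Double.parseDouble(price));
        }
        return quotes;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample standing in for the tidied page.
        String sample =
                "<html><body><table id=\"quotes\">"
              + "<tr><th>Symbol</th><th>Price</th></tr>"
              + "<tr><td>ACME</td><td>12.50</td></tr>"
              + "<tr><td>INIT</td><td>7.25</td></tr>"
              + "</table></body></html>";
        System.out.println(extractQuotes(sample));
    }
}
```

The point of going through XML is that the extraction logic becomes one XPath expression per field, so when the page layout changes you only have to adjust those expressions rather than rewrite a hand-made parser. It will still break silently if the site restructures its table, so it is worth failing loudly when no rows are found.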
The book Java 2 from Scratch covers exactly the sort of thing you're talking about (writing an HTML parser and building a stock tracker app).
Have a look here: http://www.amazon.co.uk/exec/obidos/ASIN/0789721732/qid=1095154629/ref=sr_8_xs_ap_i1_xgl/026-1876434-7394046