I was working on a task to parse the output of some Amazon web services. There are lots of ways to parse it, using DOM, SAX, or StAX, and all of them require a fair amount of coding. I wanted a quick fix and finally landed on jsoup, an open-source HTML parser (another HTML parser I like is HTMLParser). In this article I'm going to explain how to parse DZone HTML links in Java.
I'll be retrieving the descriptions of all the links on DZone using this code.
Note: This is not the best way to read links from DZone (you can use the RSS feeds instead). This tutorial is meant to take you through CSS selectors in Java.
All DZone pagination queries look like this:
http://www.dzone.com/links/?type=html&p=2
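To walk several pages, the query above can be generated in a loop. This is only a minimal sketch: the class name `DzonePageUrls` and the method `pageUrls` are made up for illustration, and it assumes that `p` is a plain page number, which is all the example URL suggests.

```java
import java.util.ArrayList;
import java.util.List;

class DzonePageUrls {
    // Taken from the example URL above; only the value of "p" changes per page.
    static final String BASE = "http://www.dzone.com/links/?type=html&p=";

    // Builds the URLs for pages first..last (inclusive).
    // Assumption: "p" is a simple page counter.
    static List<String> pageUrls(int first, int last) {
        List<String> urls = new ArrayList<>();
        for (int p = first; p <= last; p++) {
            urls.add(BASE + p);
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : pageUrls(1, 3)) {
            System.out.println(url);
        }
    }
}
```

Each generated URL can then be handed to whatever fetches and parses the page.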
I used jsoup, an open-source Java library, to parse this and extract each link's text description.
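As a rough illustration of what that looks like with jsoup's CSS selectors, here is a small sketch. The HTML in `SAMPLE_HTML` and the class names `link`, `title`, and `description` are invented for this example and are not DZone's actual markup; the class `LinkDescriptions` is hypothetical too. It needs the jsoup jar on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class LinkDescriptions {
    // Hypothetical markup, for illustration only (NOT the real DZone response).
    static final String SAMPLE_HTML =
          "<div class=\"link\">"
        + "<a class=\"title\" href=\"http://example.com/a\">First link</a>"
        + "<p class=\"description\">Description of the first link</p>"
        + "</div>"
        + "<div class=\"link\">"
        + "<a class=\"title\" href=\"http://example.com/b\">Second link</a>"
        + "<p class=\"description\">Description of the second link</p>"
        + "</div>";

    // Parses the HTML and collects the text of every element matched by the
    // CSS selector "div.link p.description" (an assumed selector that fits
    // the sample markup above).
    static List<String> descriptions(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element description : doc.select("div.link p.description")) {
            result.add(description.text());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String text : descriptions(SAMPLE_HTML)) {
            System.out.println(text);
        }
    }
}
```

For a live page, `Jsoup.connect(url).get()` returns the same kind of `Document`, so only the selector would need to be adapted to the real tags.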
Here are the sample tags we have in the DZone response:
Full post is on