<?xml version="1.0" encoding="UTF-8"?>











<rss version="2.0" xmlns:jf="http://www.jivesoftware.com/xmlns/jiveforums/rss">



<channel>
    <title>Support Forums: Message List - Web-Harvest, Web extraction tool released</title>
    <link>http://www.theserverside.com</link>
    <description>Most recent forum messages</description>
    <language>en</language>
    
        <generator>Jive Forums Silver 5.5.30 (www.jivesoftware.com)</generator>
    
    <pubDate>Thu, 23 May 2013 14:40:37 -0400</pubDate>


    <item>

        <title>It does not support JavaScript</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[<p>I think it does not support JavaScript. There are many commercial web collection tools; one choice is Fminer: <a href="http://www.fminer.com" target="_blank">web extract tool</a>, and it presents a FREE...]]></description>
        

        <pubDate>Mon, 20 Jun 2011 21:19:44 -0400</pubDate>

        

        <jf:creationDate>Mon, 20 Jun 2011 21:19:44 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 20 Jun 2011 21:19:44 -0400</jf:modificationDate>
        <jf:date>Jun 20, 2011</jf:date>
        <jf:author>lee philips</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>Web scraping software</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[<p>You guys might want to take a look at this <a href="http://www.heliumscraper.com" target="_blank">web scraping software</a>. It is a tool I've been working on for about a year. You might find it useful, as it approaches the web scraping problem from a little...]]></description>
        

        <pubDate>Tue, 19 Apr 2011 12:58:01 -0400</pubDate>

        

        <jf:creationDate>Tue, 19 Apr 2011 12:58:01 -0400</jf:creationDate>
        <jf:modificationDate>Tue, 19 Apr 2011 12:58:01 -0400</jf:modificationDate>
        <jf:date>Apr 19, 2011</jf:date>
        <jf:author>juan soldi</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>good</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[<a class="jive-link-external" href="http://www.dohave.com" target="_newWindow">http://www.dohave.com</a> offers a data extraction service I used one time. It was amazing.]]></description>
        

        <pubDate>Mon, 09 Feb 2009 02:44:07 -0500</pubDate>

        

        <jf:creationDate>Mon, 09 Feb 2009 02:44:07 -0500</jf:creationDate>
        <jf:modificationDate>Mon, 09 Feb 2009 02:44:07 -0500</jf:modificationDate>
        <jf:date>Feb 9, 2009</jf:date>
        <jf:author>ere wer</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>Re: What is it good for?</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[Who is thinking about using such a tool? Actually, quite a few companies that need to collect (scrape) data off the Web. It's a specialised tool that offers efficient ways of doing this....]]></description>
        

        <pubDate>Mon, 11 Sep 2006 17:07:59 -0400</pubDate>

        

        <jf:creationDate>Mon, 11 Sep 2006 17:07:59 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 11 Sep 2006 17:07:59 -0400</jf:modificationDate>
        <jf:date>Sep 11, 2006</jf:date>
        <jf:author>Stephane Vaucher</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>Umm, it seems like an example of abused XML</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[We could consider using RSS/Atom instead, if possible.
Web scraping requires handling tedious, site-specific differences for each website.]]></description>
        

        <pubDate>Thu, 07 Sep 2006 03:47:01 -0400</pubDate>

        

        <jf:creationDate>Thu, 07 Sep 2006 03:47:01 -0400</jf:creationDate>
        <jf:modificationDate>Thu, 07 Sep 2006 03:47:01 -0400</jf:modificationDate>
        <jf:date>Sep 7, 2006</jf:date>
        <jf:author>Sutham Rojanusorn</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>Re: Nice looking project</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[Hi,...]]></description>
        

        <pubDate>Wed, 06 Sep 2006 08:14:58 -0400</pubDate>

        

        <jf:creationDate>Wed, 06 Sep 2006 08:14:58 -0400</jf:creationDate>
        <jf:modificationDate>Wed, 06 Sep 2006 08:14:58 -0400</jf:modificationDate>
        <jf:date>Sep 6, 2006</jf:date>
        <jf:author>Valerio Schiavoni</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>Nice looking project</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[I worked for 2 years developing web scrapers, and it's certainly not a pretty job. The project seems to have the simple things down. From my experience, the hard part of web scraping would be:...]]></description>
        

        <pubDate>Tue, 05 Sep 2006 15:21:48 -0400</pubDate>

        

        <jf:creationDate>Tue, 05 Sep 2006 15:21:48 -0400</jf:creationDate>
        <jf:modificationDate>Tue, 05 Sep 2006 15:21:48 -0400</jf:modificationDate>
        <jf:date>Sep 5, 2006</jf:date>
        <jf:author>Stephane Vaucher</jf:author>
        <jf:replyCount>1</jf:replyCount>
    </item>


    <item>

        <title>Re: What are you talking about?</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[<blockquote>Web-Harvest has a BSD-license, not GPL.</blockquote>...]]></description>
        

        <pubDate>Tue, 05 Sep 2006 10:15:24 -0400</pubDate>

        

        <jf:creationDate>Tue, 05 Sep 2006 10:15:24 -0400</jf:creationDate>
        <jf:modificationDate>Tue, 05 Sep 2006 10:15:24 -0400</jf:modificationDate>
        <jf:date>Sep 5, 2006</jf:date>
        <jf:author>Jens Voss</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>Re: Web-Harvest, Web extraction tool released</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[I have been using a commercial tool from Kapowtech (<a class="jive-link-external" href="http://www.kapowtech.com" target="_newWindow">http://www.kapowtech.com</a>) for web scraping. It works pretty well, and has a GUI to design your scrapers. The GUI...]]></description>
        

        <pubDate>Tue, 05 Sep 2006 09:05:20 -0400</pubDate>

        

        <jf:creationDate>Tue, 05 Sep 2006 09:05:20 -0400</jf:creationDate>
        <jf:modificationDate>Tue, 05 Sep 2006 09:05:20 -0400</jf:modificationDate>
        <jf:date>Sep 5, 2006</jf:date>
        <jf:author>Tero Vaananen</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>What are you talking about?</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[Web-Harvest has a BSD-license, not GPL.]]></description>
        

        <pubDate>Mon, 04 Sep 2006 13:52:33 -0400</pubDate>

        

        <jf:creationDate>Mon, 04 Sep 2006 13:52:33 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 04 Sep 2006 13:52:33 -0400</jf:modificationDate>
        <jf:date>Sep 4, 2006</jf:date>
        <jf:author>Kai Virkki</jf:author>
        <jf:replyCount>1</jf:replyCount>
    </item>


    <item>

        <title>What is it good for?</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[I agree that GPL is not suitable for a tool like this. It might be the right choice for something like JBoss or this kind of software. But I think this is a tool which is meant to be included in some real application. Therefore the GPL or LGPL is a show...]]></description>
        

        <pubDate>Mon, 04 Sep 2006 12:35:17 -0400</pubDate>

        

        <jf:creationDate>Mon, 04 Sep 2006 12:35:17 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 04 Sep 2006 12:35:17 -0400</jf:modificationDate>
        <jf:date>Sep 4, 2006</jf:date>
        <jf:author>Andreas Mecky</jf:author>
        <jf:replyCount>2</jf:replyCount>
    </item>


    <item>

        <title>Re: The second of course</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[Yep!

Too bad I can't use it for my next project...

Seems useful for personal projects anyway.

Christian
<a class="jive-link-external" href="http://www.intelli-core.com" target="_newWindow">http://www.intelli-core.com</a>]]></description>
        

        <pubDate>Mon, 04 Sep 2006 12:31:32 -0400</pubDate>

        

        <jf:creationDate>Mon, 04 Sep 2006 12:31:32 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 04 Sep 2006 12:31:32 -0400</jf:modificationDate>
        <jf:date>Sep 4, 2006</jf:date>
        <jf:author>Albert Albert</jf:author>
        <jf:replyCount>0</jf:replyCount>
    </item>


    <item>

        <title>The second of course</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[<blockquote>are you looking at the tool as a user or as a producer of another product that might re-use it?</blockquote>...]]></description>
        

        <pubDate>Mon, 04 Sep 2006 12:07:22 -0400</pubDate>

        

        <jf:creationDate>Mon, 04 Sep 2006 12:07:22 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 04 Sep 2006 12:07:22 -0400</jf:modificationDate>
        <jf:date>Sep 4, 2006</jf:date>
        <jf:author>Kurt De Grave</jf:author>
        <jf:replyCount>1</jf:replyCount>
    </item>


    <item>

        <title>Re: Web-Harvest, Web extraction tool released</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[Jens, I'm curious about your response - are you looking at the tool as a user or as a producer of another product that might re-use it?  Why is the license so important to your specific case?

Floyd]]></description>
        

        <pubDate>Mon, 04 Sep 2006 11:02:55 -0400</pubDate>

        

        <jf:creationDate>Mon, 04 Sep 2006 11:02:55 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 04 Sep 2006 11:02:55 -0400</jf:modificationDate>
        <jf:date>Sep 4, 2006</jf:date>
        <jf:author>Floyd Marinescu</jf:author>
        <jf:replyCount>2</jf:replyCount>
    </item>


    <item>

        <title>Re: Web-Harvest, Web extraction tool released</title>
        <link>http://www.theserverside.com/discussions/thread.tss?thread_id=42021</link>

        

        
            <description><![CDATA[Looks kind of cool ... oops, it's GPL!

Too bad, have to keep looking!

Regards,
Jens]]></description>
        

        <pubDate>Mon, 04 Sep 2006 10:24:41 -0400</pubDate>

        

        <jf:creationDate>Mon, 04 Sep 2006 10:24:41 -0400</jf:creationDate>
        <jf:modificationDate>Mon, 04 Sep 2006 10:24:41 -0400</jf:modificationDate>
        <jf:date>Sep 4, 2006</jf:date>
        <jf:author>Jens Voss</jf:author>
        <jf:replyCount>6</jf:replyCount>
    </item>



</channel>
</rss>

