SerpScraper 2nd attack!

As you may or may not know, we just released the new, completely revamped version of SerpScraper! SerpScraper is a URL harvesting tool used to scrape data (mainly URLs) from search engines like Yahoo, Google, Bing or Yandex.

What do we need URLs for anyway, you might ask. Mainly you will want to scrape URLs of one type or another, for example WordPress blogs or similar places where you can leave your backlinks: Scuttle sites, Pligg sites for AutoPligg, bulletin boards, etc.

Now, since not everybody is after the same type of URLs or wants to scrape only from the included search engines, we added a nice feature which lets you easily create your own “Spiders” in XML!

This is what the Yahoo spider looks like in XML:

<?xml version="1.0" encoding="utf-8"?>
<SpiderBase>
 <Name>Yahoo</Name>
 <SearchEngineUrl>http://de.search.yahoo.com/search?n=100&amp;p=</SearchEngineUrl>
 <InfoPattern><![CDATA[<h3><a href="http\:\/\/.+?(?<Link>http%3a\/\/.+?)">.+?<div>(?<Description>.+?)</div>]]></InfoPattern>
 <SpaceReplacement>+</SpaceReplacement>
 <ReplaceNewLines>true</ReplaceNewLines>
 <UrlDecode>true</UrlDecode>
</SpiderBase>
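
To make the fields a bit more concrete, here is a minimal Python sketch of how a spider definition like this might be turned into a request URL. It is not SerpScraper's actual code. It assumes that the keyword is simply appended to SearchEngineUrl with spaces swapped for the SpaceReplacement value, and the function name build_search_url is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Yahoo spider above (only the fields used here).
SPIDER_XML = """<?xml version="1.0" encoding="utf-8"?>
<SpiderBase>
 <Name>Yahoo</Name>
 <SearchEngineUrl>http://de.search.yahoo.com/search?n=100&amp;p=</SearchEngineUrl>
 <SpaceReplacement>+</SpaceReplacement>
</SpiderBase>"""


def build_search_url(spider_xml: str, keyword: str) -> str:
    """Hypothetical helper: append the keyword to the spider's base URL.

    Assumption: SerpScraper appends the keyword to SearchEngineUrl and
    replaces spaces with SpaceReplacement. No other escaping is done here.
    """
    root = ET.fromstring(spider_xml.encode("utf-8"))
    base = root.findtext("SearchEngineUrl")  # "&amp;" is unescaped to "&" by the parser
    space = root.findtext("SpaceReplacement", default="+")
    return base + keyword.strip().replace(" ", space)


print(build_search_url(SPIDER_XML, "wordpress blog"))
```

Note that the XML parser turns the escaped &amp;amp; back into a plain &amp;, which is why ampersands in SearchEngineUrl have to be escaped in the spider file.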

The Yahoo spider can easily be adapted to, let’s say, the Amazon product search:

<?xml version="1.0" encoding="utf-8"?>
<SpiderBase>
 <Name>AmazonProductSearch</Name>
 <SearchEngineUrl>http://www.amazon.com/s/ref=nb_ss?url=search-alias%3Daps&amp;x=0&amp;y=0&amp;field-keywords=</SearchEngineUrl>
 <InfoPattern><![CDATA[productTitle"><a href="(?<Link>.+?)">\s(?<Description>.+?)<]]></InfoPattern>
 <SpaceReplacement>+</SpaceReplacement>
 <ReplaceNewLines>true</ReplaceNewLines>
 <UrlDecode>true</UrlDecode>
</SpiderBase>
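
If you want to try out an InfoPattern before dropping it into a spider, something like the following Python sketch may help. It is only an approximation of what the tool does. It assumes that ReplaceNewLines swaps line breaks in the page for spaces before matching and that UrlDecode percent-decodes the captured Link. The sample HTML and the function name extract_results are invented for the example. Python writes named groups as (?P<Name>...), so the sketch converts the (?<Name>...) syntax used in the spider first.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Trimmed copy of the Amazon spider above (raw string so "\s" stays intact).
AMAZON_SPIDER = r"""<SpiderBase>
 <Name>AmazonProductSearch</Name>
 <InfoPattern><![CDATA[productTitle"><a href="(?<Link>.+?)">\s(?<Description>.+?)<]]></InfoPattern>
 <ReplaceNewLines>true</ReplaceNewLines>
 <UrlDecode>true</UrlDecode>
</SpiderBase>"""

# Invented markup, only shaped so that the pattern has something to match.
SAMPLE_HTML = """<div class="productTitle"><a href="http://www.amazon.com/dp/EXAMPLE1?ref%3Dsr_1_1">
Example Product One</a></div>
<div class="productTitle"><a href="http://www.amazon.com/dp/EXAMPLE2"> Example Product Two</a></div>"""


def extract_results(spider_xml: str, html: str) -> list:
    """Hypothetical helper: return (link, description) pairs found in html."""
    root = ET.fromstring(spider_xml)
    # Convert (?<Name>...) groups to Python's (?P<Name>...), leaving
    # lookbehinds such as (?<=...) and (?<!...) untouched.
    pattern = re.sub(r"\(\?<(?![=!])", "(?P<", root.findtext("InfoPattern"))
    if root.findtext("ReplaceNewLines") == "true":
        html = re.sub(r"[\r\n]+", " ", html)  # assumption: newlines become spaces
    decode = root.findtext("UrlDecode") == "true"
    results = []
    for match in re.finditer(pattern, html):
        link = match.group("Link")
        if decode:
            link = unquote(link)  # assumption: UrlDecode applies to the Link group
        results.append((link, match.group("Description").strip()))
    return results


for link, description in extract_results(AMAZON_SPIDER, SAMPLE_HTML):
    print(link, "|", description)
```

The two named groups, Link and Description, are the only parts of the pattern the sketch reads, so a custom spider mostly comes down to getting those two captures right for the target page.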

Enjoy!

Posted: December 15th, 2009
at 12:27pm by neek

Categories: Syndk8 Tools