<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>neek! &#187; xml</title>
	<atom:link href="http://neek.org/tag/xml/feed" rel="self" type="application/rss+xml" />
	<link>http://neek.org</link>
	<description>Like the BBC of the SEO industry and &#34;tired to website&#34;</description>
	<lastBuildDate>Mon, 31 Jan 2011 12:30:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>SerpScraper 2nd attack!</title>
		<link>http://neek.org/serpscraper-2nd-attack-0009.html</link>
		<comments>http://neek.org/serpscraper-2nd-attack-0009.html#comments</comments>
		<pubDate>Tue, 15 Dec 2009 11:27:29 +0000</pubDate>
		<dc:creator>neek</dc:creator>
				<category><![CDATA[Syndk8 Tools]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[harvesting]]></category>
		<category><![CDATA[msn]]></category>
		<category><![CDATA[serpscraper]]></category>
		<category><![CDATA[spider]]></category>
		<category><![CDATA[syndk8]]></category>
		<category><![CDATA[url harvester]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[yahoo]]></category>
		<category><![CDATA[yandex]]></category>

		<guid isPermaLink="false">http://neek.org/?p=9</guid>
		<description><![CDATA[As you may or may not know, we just released the new, completely revamped version of SerpScraper! SerpScraper is a URL harvesting tool used to scrape data (mainly URLs) from search engines like Yahoo, Google, Bing or Yandex.
Why do we need URLs anyway, you might ask. Mainly you want to scrape URLs [...]]]></description>
			<content:encoded><![CDATA[<p>As you may or may not know, we just released the new, completely revamped version of SerpScraper! SerpScraper is a URL harvesting tool used to scrape data (mainly URLs) from search engines like Yahoo, Google, Bing or Yandex.</p>
<p>Why do we need URLs anyway, you might ask. Mainly you want to scrape URLs of one type or another, for example WordPress blogs or similar places where you can leave your backlinks: Scuttle sites, Pligg sites for AutoPligg, bulletin boards and so on.</p>
<p>Since not everybody is after the same type of URLs or wants to scrape only from the included search engines, we added a nice feature that lets you easily create your own &#8220;Spiders&#8221; in XML!</p>
<p>This is what the Yahoo spider looks like in XML:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span> <span style="color: #000066;">encoding</span>=<span style="color: #ff0000;">&quot;utf-8&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SpiderBase<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Yahoo<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SearchEngineUrl<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://de.search.yahoo.com/search?n=100<span style="color: #ddbb00;">&amp;amp;</span>p=<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SearchEngineUrl<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;InfoPattern<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><span style="color: #339933;">&lt;![CDATA[&lt;h3&gt;&lt;a href=&quot;http\:\/\/.+?(?&lt;Link&gt;http%3a\/\/.+?)&quot;&gt;.+?&lt;div&gt;(?&lt;Description&gt;.+?)&lt;/div&gt;]]&gt;</span><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/InfoPattern<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SpaceReplacement<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>+<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SpaceReplacement<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;ReplaceNewLines<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>true<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/ReplaceNewLines<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;UrlDecode<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>true<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/UrlDecode<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SpiderBase<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></div></div>

<p>We can easily adapt this to, say, the Amazon product search:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;?xml</span> <span style="color: #000066;">version</span>=<span style="color: #ff0000;">&quot;1.0&quot;</span> <span style="color: #000066;">encoding</span>=<span style="color: #ff0000;">&quot;utf-8&quot;</span><span style="color: #000000; font-weight: bold;">?&gt;</span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SpiderBase<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;Name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>AmazonProductSearch<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/Name<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SearchEngineUrl<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>http://www.amazon.com/s/ref=nb_ss?url=search-alias%3Daps<span style="color: #ddbb00;">&amp;amp;</span>x=0<span style="color: #ddbb00;">&amp;amp;</span>y=0<span style="color: #ddbb00;">&amp;amp;</span>field-keywords=<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SearchEngineUrl<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;InfoPattern<span style="color: #000000; font-weight: bold;">&gt;</span></span></span><span style="color: #339933;">&lt;![CDATA[productTitle&quot;&gt;&lt;a href=&quot;(?&lt;Link&gt;.+?)&quot;&gt;\s(?&lt;Description&gt;.+?)&lt;]]&gt;</span><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/InfoPattern<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;SpaceReplacement<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>+<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SpaceReplacement<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;ReplaceNewLines<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>true<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/ReplaceNewLines<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
 <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;UrlDecode<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>true<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/UrlDecode<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/SpiderBase<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></div></div>

<p><img title="product-serpscraper" src="http://neek.org/wp-content/uploads/2009/12/product-serpscraper.png" alt="product-serpscraper" width="104" height="170" /></p>
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://neek.org/serpscraper-2nd-attack-0009.html/feed</wfw:commentRss>
		<slash:comments>140</slash:comments>
		</item>
	</channel>
</rss>

