<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Meyer Information Management Blog &#187; extraction</title>
	<atom:link href="http://mimblog.de/tag/extraction/feed/" rel="self" type="application/rss+xml" />
	<link>http://mimblog.de</link>
	<description>Innovations and technologies in information management</description>
	<lastBuildDate>Thu, 25 Mar 2010 20:18:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Mozenda Web Data Extraction: Quick Review</title>
		<link>http://mimblog.de/2008/10/30/mozenda-web-data-extraction-quick-review/</link>
		<comments>http://mimblog.de/2008/10/30/mozenda-web-data-extraction-quick-review/#comments</comments>
		<pubDate>Thu, 30 Oct 2008 15:45:57 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Information Management]]></category>
		<category><![CDATA[extraction]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[saas]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=550</guid>
		<description><![CDATA[
I already mentioned Mozenda in my Breakfast Links last week after BS recommended checking it out. Since web monitoring and data extraction have always been topics of interest for me, I was really excited about it. I couldn&#8217;t wait, and thanks to their 30-day trial I signed up and tested it right away.

Prerequisite
After signing up on [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mimblog.de/wp-content/uploads/2008/10/mozenda-web-agent-builder.jpg"><img class="alignnone size-medium wp-image-552" title="mozenda-web-agent-builder" src="http://mimblog.de/wp-content/uploads/2008/10/mozenda-web-agent-builder-300x240.jpg" alt="" width="180" height="144" /></a></p>
<p>I already mentioned <a title="Mozenda Web Data Extraction" href="http://www.mozenda.com" target="_blank">Mozenda</a> in <a title="Mozenda " href="http://mimblog.de/2008/10/23/lucene-hadoop-zookeeper-katta/" target="_blank">my Breakfast Links</a> last week after <a title="http://arnoldit.com/wordpress/2008/10/23/mozenda-check-it-out/" href="http://arnoldit.com/wordpress/2008/10/23/mozenda-check-it-out/" target="_blank">BS recommended checking it out</a>. Since web monitoring and data extraction have always been topics of interest for me, I was really excited about it. I couldn&#8217;t wait, and thanks to their 30-day trial <strong>I signed up and tested it right away</strong>.</p>
<p><span id="more-550"></span></p>
<p><strong>Prerequisite<br />
</strong>After signing up on Mozenda&#8217;s website you need to download the so-called &#8220;<strong>Mozenda Web Agent Builder</strong>&#8221;, a small .NET application (so it requires the .NET Framework to be installed).</p>
<p><strong>Creating the Agent<br />
</strong>A common screen-scraping task is to automatically <strong>retrieve search results from a search engine (in my example, Google)</strong>, so that is what my example scraping application does. My scraping agent uses Google to search for the phrase &#8220;<strong>enterprise search</strong>&#8221; and grabs each result&#8217;s title and URL.</p>
<p><a href="http://mimblog.de/wp-content/uploads/2008/10/mozenda-google-1.jpg"><img class="alignnone size-medium wp-image-572" title="mozenda-google-1" src="http://mimblog.de/wp-content/uploads/2008/10/mozenda-google-1-300x240.jpg" alt="" width="180" height="144" /></a></p>
<p>To create a new <strong>agent </strong>you start the desktop application, enter the URL of the website you want to extract data from, and click &#8220;Start a new agent on this page&#8221;. Since extracting data from Google requires submitting a form (I know you can do it with URL parameters, but this is just an example), you define the user input and the form submission. <strong>This is all done visually, which is comfortable and pretty straightforward</strong>.</p>
<p><a href="http://mimblog.de/wp-content/uploads/2008/10/mozenda-google-2.jpg"><img class="alignnone size-medium wp-image-573" title="mozenda-google-2" src="http://mimblog.de/wp-content/uploads/2008/10/mozenda-google-2-300x240.jpg" alt="" width="180" height="144" /></a></p>
<p>On the search results page you can start selecting items such as a particular result&#8217;s title and URL, and Mozenda will automatically recognize multiple occurrences of those items (see the corner of my screenshot). Once you are done selecting the data you want, just click save and Mozenda will <strong>save the configuration in your Mozenda Account online</strong> &#8211; from now on you can close the desktop application and log in to your Mozenda Account.</p>
<p><a href="http://mimblog.de/wp-content/uploads/2008/10/mozenda-web-agents.jpg"><img class="alignnone size-medium wp-image-574" title="mozenda-web-agents" src="http://mimblog.de/wp-content/uploads/2008/10/mozenda-web-agents-300x240.jpg" alt="" width="180" height="144" /></a></p>
<p>In the view of a particular agent you can either run it immediately or schedule it to run periodically. It is also possible to publish the collected data, for example via RSS, so that you can use it in your own application.</p>
<p><a href="http://mimblog.de/wp-content/uploads/2008/10/mozenda-schedule.jpg"><img class="alignnone size-medium wp-image-583" title="mozenda-schedule" src="http://mimblog.de/wp-content/uploads/2008/10/mozenda-schedule-300x240.jpg" alt="" width="180" height="144" /></a></p>
<p><strong>Conclusion</strong><br />
The idea of providing web data extraction software as SaaS is great. The separation between the server and the RCP client application used to configure the agent is also well done. Since Mozenda is still in beta, there are still some bugs; for example, debugging a faulty agent is difficult and ends with deleting and recreating it. <em>I wouldn&#8217;t spend my money on it or use it in production during the beta phase, but let&#8217;s see how it works when the beta is over!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/10/30/mozenda-web-data-extraction-quick-review/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

