<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Meyer Information Management Blog &#187; hadoop</title>
	<atom:link href="http://mimblog.de/tag/hadoop/feed/" rel="self" type="application/rss+xml" />
	<link>http://mimblog.de</link>
	<description>Innovations and technologies in information management</description>
	<lastBuildDate>Thu, 25 Mar 2010 20:18:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Berlin Buzzwords 2010 Search Store Scale</title>
		<link>http://mimblog.de/2010/02/12/berlin-buzzwords-2010-search-store-scale/</link>
		<comments>http://mimblog.de/2010/02/12/berlin-buzzwords-2010-search-store-scale/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 13:02:15 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[nosql]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=916</guid>
		<description><![CDATA[Search Store Scale &#8211; these three terms perfectly sum up the events organized by Isabel Drost! In addition to the regular Hadoop Get Togethers (the next one takes place on March 10), the Berlin Buzzwords 2010 scalability conference is now also set to take place in June.
Organized by Jan Lehnardt (CouchDB), Simon Willnauer (Lucene committer) and [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Search Store Scale</strong> &#8211; these three terms perfectly sum up the events organized by <a href="http://blog.isabel-drost.de/index.php/about" target="_blank">Isabel Drost</a>! In addition to the regular Hadoop Get Togethers (<a title="Apache Hadoop Get Together - March 2010 - Update" href="http://blog.isabel-drost.de/index.php/archives/149/apache-hadoop-get-together-march-2010-update" target="_blank">the next one takes place on March 10</a>), the <a title="Berlin Buzzwords 2010" href="http://hadoopberlin.de/~events/index.html" target="_blank">Berlin Buzzwords 2010 scalability conference</a> is now also set to take place in June.</p>
<p>Organized by Jan Lehnardt (CouchDB), Simon Willnauer (Lucene committer) and Isabel Drost (co-founder &amp; committer of Mahout), the conference will revolve around NoSQL, Hadoop, Lucene and other topics from the field of scalability. Incidentally, the names of these three should be familiar to anyone who subscribes to one or another of the mailing lists around Lucene.</p>
<p>News is available via <a href="http://twitter.com/hadoopberlin" target="_blank">@hadoopberlin</a> and on the <a href="http://hadoopberlin.de/~events/index.html">event website</a> &#8211; which, by the way, is also still looking for &#8220;Helping Hands&#8221;!</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2010/02/12/berlin-buzzwords-2010-search-store-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop Summit 2009 &#8211; Remote Review</title>
		<link>http://mimblog.de/2009/06/15/hadoop-summit-2009-remote-review/</link>
		<comments>http://mimblog.de/2009/06/15/hadoop-summit-2009-remote-review/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 19:59:19 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[machine_learning]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=838</guid>
		<description><![CDATA[While I am still waiting for the Hadoop/MapReduce wave to spill over to Germany as well (Berlin already has regular get-togethers, the next one on June 25), I am looking across the Atlantic with interest. Last week the Hadoop Summit 09 (hadoopsummit09) took place in Santa Clara, and I tried, via Twitter, to [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-851" title="3488577947_b0b5c3466f_o" src="http://mimblog.de/wp-content/uploads/2009/06/3488577947_b0b5c3466f_o.gif" alt="3488577947_b0b5c3466f_o" width="250" height="148" />While I am still waiting for the Hadoop/MapReduce wave to spill over to Germany as well (Berlin already has regular get-togethers, the next one on <a title="Apache Hadoop Get Together @ Berlin " href="http://upcoming.yahoo.com/event/2488959/" target="_blank">June 25</a>), I am looking across the Atlantic with interest. Last week the <a title="Hadoop Summit 09" href="http://developer.yahoo.com/events/hadoopsummit09/" target="_blank">Hadoop Summit 09</a> (<strong>hadoopsummit09</strong>) took place in Santa Clara; I tried to follow it via Twitter and report on it below in short, concise Twitter style!</p>
<p><span id="more-838"></span></p>
<ul>
<li><a title="Ganglia Monitoring System" href="http://ganglia.info/" target="_blank">Ganglia</a>, a monitoring system for server infrastructures, uses Hadoop to analyze log files, much like <a title="Chukwa" href="http://wiki.apache.org/hadoop/Chukwa" target="_blank">Chukwa</a></li>
<li><a title="Avro" href="http://hadoop.apache.org/avro/" target="_blank">Avro</a> is a very new project by Doug Cutting and is meant to serve as a data serializer/store for projects such as Pig or Hive</li>
<li><a title="Announcing the Yahoo! Distribution of Hadoop" href="http://developer.yahoo.net/blogs/hadoop/2009/06/yahoo_distribution_of_hadoop.html" target="_blank">Yahoo!</a> will release its internally used Hadoop distribution, which currently runs the largest Hadoop cluster in the world (4,000 nodes, 16 PB)</li>
<li>Facebook uses Hadoop and <a title="Hive" href="http://hadoop.apache.org/hive/" target="_blank">Hive</a> to analyze log files and to train a <a title="Random Forest" href="http://en.wikipedia.org/wiki/Random_forest" target="_blank">Random Forest</a> classifier</li>
<li><a title="EBS Datasets on ec2" href="http://www.cloudera.com/blog/2009/05/11/using-clouderas-hadoop-amis-to-process-ebs-datasets-on-ec2/" target="_blank">Cloudera evaluates</a> running Hadoop on EC2 and using a dataset with EBS, including a tutorial &#8211; <strong>Awesome!</strong></li>
<li>Peter Skomoroch built, in 1 week (!), a system (<a title="Trending Topics" href="http://www.trendingtopics.org/" target="_blank">Trending Topics</a>) for analyzing trends using Cloudera Hadoop and Hive on EC2</li>
</ul>
<p><a title="Hadoop at #hadoopsummit09 on Twitpic" href="http://twitpic.com/763vd"><img src="http://twitpic.com/show/thumb/763vd.jpg" alt="Hadoop at #hadoopsummit09 on Twitpic" width="150" height="150" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2009/06/15/hadoop-summit-2009-remote-review/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop and Cloud Computing</title>
		<link>http://mimblog.de/2009/05/07/hadoop-und-cloud-computing/</link>
		<comments>http://mimblog.de/2009/05/07/hadoop-und-cloud-computing/#comments</comments>
		<pubDate>Thu, 07 May 2009 11:38:59 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[amazon aws]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=836</guid>
		<description><![CDATA[Florian Leibert reported at the latest Hadoop LA meet group on his experiences using Hadoop, in particular on Amazon AWS.

Big Data: On Cloud Computing and Hadoop from Roberto Monge on Vimeo.
]]></description>
			<content:encoded><![CDATA[<p>Florian Leibert reported at the latest <a title="Hadoop LA meet group" href="http://www.meetup.com/hadoopla/" target="_blank">Hadoop LA meet group</a> on his experiences using Hadoop, in particular on Amazon AWS.</p>
<p><object width="400" height="225" data="http://vimeo.com/moogaloop.swf?clip_id=4211288&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=4211288&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /></object></p>
<p><a href="http://vimeo.com/4211288">Big Data: On Cloud Computing and Hadoop</a> from <a href="http://vimeo.com/user1600800">Roberto Monge</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2009/05/07/hadoop-und-cloud-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interesting ApacheCon Europe 2009 Sessions</title>
		<link>http://mimblog.de/2009/01/26/interesting-apachecon-europe-2009-sessions/</link>
		<comments>http://mimblog.de/2009/01/26/interesting-apachecon-europe-2009-sessions/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 05:10:56 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[event]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=733</guid>
		<description><![CDATA[
Registration for ApacheCon Europe 2009 (23-27 March) in Amsterdam opened last week. I have already spent some time looking through the available sessions on Hadoop, Lucene etc.; have a look&#8230;

Hadoop Tools and Tricks for Data Processing Pipelines
Lucene Boot Camp
Solr Boot Camp
Introduction To Hadoop
Introducing Mahout: Apache Machine Learning
Hadoop Map-Reduce: Tuning and Debugging
Lucene Case Studies
Pig &#8211; Making [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-442" title="feather" src="http://mimblog.de/wp-content/uploads/2008/10/feather.jpg" alt="feather" width="285" height="86" /></p>
<p>Registration for ApacheCon Europe 2009 (23-27 March) in Amsterdam <a title="Registration is now open" href="http://us.apachecon.com/c/aceu2009/articles/registration-is-now-open/" target="_blank">opened last week.</a> I have already spent some time looking through the available sessions on Hadoop, Lucene etc.; have a look&#8230;</p>
<ul>
<li><span id="more-733"></span><a title="Hadoop Tools and Tricks for Data Processing Pipelines" href="http://us.apachecon.com/c/aceu2009/sessions/230" target="_blank">Hadoop Tools and Tricks for Data Processing Pipelines</a></li>
<li><a title="Lucene Boot Camp" href="http://us.apachecon.com/c/aceu2009/sessions/197" target="_blank">Lucene Boot Camp</a></li>
<li><a title="Solr Boot Camp" href="http://us.apachecon.com/c/aceu2009/sessions/201" target="_blank">Solr Boot Camp</a></li>
<li><a title="Introduction to Hadoop" href="http://us.apachecon.com/c/aceu2009/sessions/222" target="_blank">Introduction To Hadoop</a></li>
<li><a title="Introducing Mahout" href="http://us.apachecon.com/c/aceu2009/sessions/222" target="_blank">Introducing Mahout: Apache Machine Learning</a></li>
<li><a title="Hadoop Map-Reduce: Tuning and Debugging" href="http://us.apachecon.com/c/aceu2009/sessions/223" target="_blank">Hadoop Map-Reduce: Tuning and Debugging</a></li>
<li><a title="Lucene Case Studies" href="http://us.apachecon.com/c/aceu2009/sessions/137" target="_blank">Lucene Case Studies</a></li>
<li><a title="Pig - Making Hadoop Easy" href="http://us.apachecon.com/c/aceu2009/sessions/224" target="_blank">Pig &#8211; Making Hadoop Easy</a></li>
<li><a title="Advanced Indexing Techniques with Apache Lucene" href="http://us.apachecon.com/c/aceu2009/sessions/138" target="_blank">Advanced Indexing Techniques with Apache Lucene</a></li>
<li><a title="Running Hadoop in the Cloud" href="http://us.apachecon.com/c/aceu2009/sessions/225" target="_blank">Running Hadoop in the Cloud</a></li>
<li><a title="Configuring Hadoop for Grid Services" href="http://us.apachecon.com/c/aceu2009/sessions/226" target="_blank">Configuring Hadoop for Grid Services</a></li>
<li><a title="Dynamic Hadoop Clusters" href="http://us.apachecon.com/c/aceu2009/sessions/227" target="_blank">Dynamic Hadoop Clusters</a></li>
<li><a title="Best of breed - httpd, forrest, solr and droids" href="http://us.apachecon.com/c/aceu2009/sessions/163" target="_blank">Best of breed &#8211; httpd, forrest, solr and droids</a></li>
</ul>
<p>As you can see, Apache Hadoop is playing a big role this year, and cloud computing is starting to take off at Apache too. Unfortunately I didn&#8217;t find any Nutch-related sessions &#8211; what&#8217;s happening there?</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2009/01/26/interesting-apachecon-europe-2009-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Search Team from FAST moving to Information Builders</title>
		<link>http://mimblog.de/2009/01/14/search-team-from-fast-moving-to-information-builders/</link>
		<comments>http://mimblog.de/2009/01/14/search-team-from-fast-moving-to-information-builders/#comments</comments>
		<pubDate>Wed, 14 Jan 2009 15:31:54 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Breakfast Links]]></category>
		<category><![CDATA[business]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=716</guid>
		<description><![CDATA[
Breakfast Links 14.01.2009
* Information Builders (Schweiz) AG übernimmt spezialisiertes Search–Team von fast, A Microsoft Subsidiary
Information Builders (ch) just acquired a two-person team from FAST to build its new search excellence team. Their current product iWay Enterprise Index is an Enterprise Search solution built upon GSA and Lucene. Does anyone have any details [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-719" title="swiss-knife" src="http://mimblog.de/wp-content/uploads/2009/01/swiss-knife-300x201.png" alt="" width="300" height="201" /></p>
<p><strong>Breakfast Links 14.01.2009</strong></p>
<p>* <a title="Information Builders (Schweiz) AG übernimmt spezialisiertes Search–Team von fast, A Microsoft Subsidiary" href="http://moneycab.presscab.com/de/templates/?a=58169&amp;z=79" target="_blank">Information Builders (Schweiz) AG übernimmt spezialisiertes Search–Team von fast, A Microsoft Subsidiary</a></p>
<p>Information Builders (ch) just acquired a two-person team from FAST to build its new <strong>search excellence</strong> team. Their current product <a title="iWay Enterprise Index" href="http://www.iwaysoftware.com/products/poweredbygoogle.html" target="_blank">iWay Enterprise Index</a> is an Enterprise Search solution built upon GSA and Lucene. <em>Does anyone have any details about this combination of GSA and Lucene?</em></p>
<p>* <a title="HDFS Reliability" href="http://www.cloudera.com/resources/hdfs-reliability" target="_blank">HDFS Reliability</a></p>
<p>Tom White from Cloudera (a US company with a business model built around Apache Hadoop) released a paper about the reliability of <a title="HDFS" href="http://en.wikipedia.org/wiki/HDFS" target="_blank">HDFS</a> with some useful recommendations for using it in production.</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2009/01/14/search-team-from-fast-moving-to-information-builders/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Become a Knowledge Hero</title>
		<link>http://mimblog.de/2008/11/28/become-a-knowledge-hero/</link>
		<comments>http://mimblog.de/2008/11/28/become-a-knowledge-hero/#comments</comments>
		<pubDate>Fri, 28 Nov 2008 10:34:11 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Breakfast Links]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[knowledge management]]></category>
		<category><![CDATA[lucene]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=660</guid>
		<description><![CDATA[
Breakfast Links 28.11.2008
* You are a knowledge worker. Become a knowledge hero.
I just discovered the Belgian company Whatever, which has launched a knowledge management product called Knowledge Plaza. Knowledge Plaza focuses on Enterprise Social Search methods, which are roughly described here by a Whatever employee. Under the hood, Knowledge Plaza uses [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mimblog.de/wp-content/uploads/2008/11/hero-flickr-r8r.jpg"><img class="alignnone size-medium wp-image-661" title="hero-flickr-r8r" src="http://mimblog.de/wp-content/uploads/2008/11/hero-flickr-r8r-218x300.jpg" alt="" width="218" height="300" /></a></p>
<p><strong>Breakfast Links 28.11.2008</strong></p>
<p>* <a title="Knowledge Plaza" href="http://www.knowledgeplaza.be/features.html" target="_blank">You are a knowledge worker. Become a knowledge hero.</a></p>
<p>I just discovered the Belgian company <a title="Whatever" href="http://www.whatever-company.com/" target="_blank">Whatever</a>, which has launched a knowledge management product called Knowledge Plaza. Knowledge Plaza focuses on Enterprise Social Search methods, which are roughly <a title="Enterprise Social Search slideshow" href="http://raphael.slinckx.net/blog/2008-04-22/enterprise-social-search" target="_blank">described here</a> by a Whatever employee. Under the hood, Knowledge Plaza uses Apache Lucene core for Search (?) and <a title="Aperture" href="http://aperture.sourceforge.net/" target="_blank">Aperture</a> for text extraction.</p>
<p>* <a title="Hadoop Map-Reduce – Tuning and Debugging" href="http://infram.wordpress.com/2008/11/28/hadoop-map-reduce-%E2%80%93-tuning-and-debugging/" target="_blank">Hadoop Map-Reduce – Tuning and Debugging</a></p>
<p>Just stumbled upon this overview for tuning and debugging Hadoop.</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/11/28/become-a-knowledge-hero/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ApacheCon US 2008 Lucene (And Beyond) Sessions</title>
		<link>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/</link>
		<comments>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/#comments</comments>
		<pubDate>Mon, 17 Nov 2008 20:39:50 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Information Management]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[content extraction]]></category>
		<category><![CDATA[event]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[tika]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=619</guid>
		<description><![CDATA[
Unfortunately I was not able to attend ApacheCon US 2008 in New Orleans this year &#8211; way too far away from good ol&#8217; Germany! But I reviewed the sessions on Lucene (and Solr, Mahout, Tika) after the conference to get some inspiration. Some comments on them:

* Advanced Indexing Techniques with Apache [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mimblog.de/wp-content/uploads/2008/11/acus08basic.jpg"><img class="alignnone size-medium wp-image-620" title="acus08basic" src="http://mimblog.de/wp-content/uploads/2008/11/acus08basic-300x71.jpg" alt="" width="300" height="71" /></a></p>
<p>Unfortunately I was not able to attend <a title="ApacheCon US 2008" href="http://us.apachecon.com/c/acus2008/" target="_blank">ApacheCon US 2008</a> in New Orleans this year &#8211; way too far away from good ol&#8217; Germany! But I reviewed the sessions on <strong>Lucene</strong> (and <strong>Solr</strong>, <strong>Mahout</strong>, <strong>Tika</strong>) after the conference to get some inspiration. Some comments on them:</p>
<p><span id="more-619"></span></p>
<p>* <a title="Advanced Indexing Techniques with Apache Lucene" href="http://us.apachecon.com/c/acus2008/sessions/7" target="_blank">Advanced Indexing Techniques with Apache Lucene</a> (<a title="Michael Busch" href="http://us.apachecon.com/c/acus2008/speakers/44" target="_blank">Michael Busch</a>)</p>
<p>A detailed presentation about the indexing capabilities of Apache Lucene, with a very interesting part on how to use <strong>Token Payloads</strong> and POS tagging with the new <strong>TokenStream API</strong>.</p>
<p>* <a title="Apache Solr: Out of the Box" href="http://us.apachecon.com/c/acus2008/sessions/9" target="_blank">Apache Solr: Out of the Box</a> (<a title="Chris Hostetter" href="http://people.apache.org/~hossman/" target="_blank">Chris Hostetter</a>)</p>
<p>An introduction to Solr covering installation and administration (Admin Console, Luke), querying (Facets, Highlighting) and configuration (Analyzers, Multiple Indexes, Replication).</p>
<p>* <a title="Introducing Mahout: Apache Machine Learning" href="http://us.apachecon.com/c/acus2008/sessions/11" target="_blank">Introducing Mahout: Apache Machine Learning</a> (<a title="Grant Ingersoll" href="http://grantingersoll.com/" target="_blank">Grant Ingersoll</a>)</p>
<p>Already posted this session in my <a title="Intro to Mahout" href="http://mimblog.de/2008/11/10/intro-to-mahout/" target="_blank">recent Breakfast Links</a>. A nice presentation about what Machine Learning stands for and the approach of Mahout.</p>
<p>* <a title="Apache Solr: Beyond the Box" href="http://us.apachecon.com/c/acus2008/sessions/10" target="_blank">Apache Solr: Beyond the Box</a> (<a title="Chris Hostetter" href="http://people.apache.org/~hossman/" target="_blank">Chris Hostetter</a>)</p>
<p>A presentation about Solr&#8217;s history and real-world examples such as geo search.</p>
<p>* <a title="Content analysis for ECM with Apache Tika" href="http://us.apachecon.com/c/acus2008/sessions/12" target="_blank">Content analysis for ECM with Apache Tika</a> (<a title="Paolo Mottadelli" href="http://www.paolomottadelli.com/" target="_blank">Paolo Mottadelli</a>)</p>
<p>Impressive and extensive presentation about Apache Tika and its Alfresco integration for content extraction.</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Another Video Search Engine from Exalead</title>
		<link>http://mimblog.de/2008/11/03/another-video-search-engine-from-exalead/</link>
		<comments>http://mimblog.de/2008/11/03/another-video-search-engine-from-exalead/#comments</comments>
		<pubDate>Mon, 03 Nov 2008 10:43:00 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Breakfast Links]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[search engine]]></category>
		<category><![CDATA[video search]]></category>
		<category><![CDATA[voice to text]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=595</guid>
		<description><![CDATA[
Breakfast Links 03.11.2008
* Exalead-Labs: voxalead
The Parisian search engine company Exalead released its video search engine voxalead. It combines Exalead&#8217;s indexing (Entity Recognition) and search technology with third-party voice recognition (LIMSI). So far they have indexed mostly French and US channels and a small number of German channels. Besides searching the content extracted via voice-to-text, you [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mimblog.de/wp-content/uploads/2008/11/logo_voxalead_big.gif"><img class="alignnone size-medium wp-image-596" title="logo_voxalead_big" src="http://mimblog.de/wp-content/uploads/2008/11/logo_voxalead_big.gif" alt="" width="262" height="105" /></a></p>
<p><strong>Breakfast Links 03.11.2008</strong></p>
<p>* <a title="voxalead" href="http://voxalead.labs.exalead.com" target="_blank">Exalead-Labs: voxalead</a></p>
<p>The Parisian search engine company <a title="Exalead" href="http://www.exalead.com/software/" target="_blank">Exalead</a> released its video search engine <a title="About Voxalead" href="http://labs.exalead.com/index.php?option=com_content&amp;view=article&amp;catid=37:features-demos&amp;id=49:tv-news-search" target="_blank">voxalead</a>. It combines Exalead&#8217;s indexing (Entity Recognition) and search technology with third-party voice recognition (<a title="LIMSI" href="http://www.limsi.fr/" target="_blank">LIMSI</a>). So far they have indexed <strong>mostly French and US channels and a small number of German channels</strong>. Besides searching the content extracted via voice-to-text, you can also view the <strong>original extracted text</strong>, which shows off the quality of voxalead!</p>
<p>* <a title="The German metasearch engine “MetaGer”" href="http://altsearchengines.com/2008/11/02/the-german-metasearch-engine-metager/" target="_blank">Metager at Alt Search Engines</a></p>
<p><a title="Metager" href="http://metager.de/" target="_blank">Metager</a> now has its place at <a title="Alt Search Engines" href="http://www.altsearchengines.com/" target="_blank">AltSearchEngines.com</a> in English and German!</p>
<p>* <a title="Twenty Newsgroups Classification" href="http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups" target="_blank">Mahout: Twenty Newsgroups Classification</a></p>
<p>This weekend I stumbled upon a nice <strong>Mahout Classification example</strong>. It is all about creating a classification model based on twenty clusters of newsgroup messages. All you need is Mahout and Hadoop to get the example running in approx. 20 minutes.</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/11/03/another-video-search-engine-from-exalead/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Amazon AWS is Enterprise Ready</title>
		<link>http://mimblog.de/2008/10/28/amazon-aws-is-enterprise-ready/</link>
		<comments>http://mimblog.de/2008/10/28/amazon-aws-is-enterprise-ready/#comments</comments>
		<pubDate>Tue, 28 Oct 2008 13:37:55 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Breakfast Links]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[mapreduce]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=563</guid>
		<description><![CDATA[
Breakfast Links 28.10.2008
* AWS is Enterprise Ready 
CloudAve summarizes the current state of Amazon&#8217;s Cloud Computing services in terms of prices, support and technologies. Amazon&#8217;s EC2 and S3 are a really exciting alternative (to hardware machines) in the large-scale processing field because of their support for Hadoop MapReduce! For German readers, there is also an [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://aws.amazon.com/"><img class="alignnone size-medium wp-image-566" title="logo_aws" src="http://mimblog.de/wp-content/uploads/2008/10/logo_aws.gif" alt="" width="164" height="60" /></a></p>
<p><strong>Breakfast Links 28.10.2008</strong></p>
<p>* <a title="AWS is Enterprise Ready " href="http://www.cloudave.com/link/aws-is-enterprise-ready" target="_blank">AWS is Enterprise Ready </a></p>
<p><a title="CloudAve" href="http://www.cloudave.com" target="_blank">CloudAve</a> summarizes the current state of Amazon&#8217;s <strong>Cloud Computing</strong> services in terms of prices, support and technologies. Amazon&#8217;s EC2 and S3 are a really exciting alternative (to hardware machines) in the large-scale processing field because of their support for Hadoop MapReduce! For German readers, there is also an <a title="Amazon.com beginnt Hosting von Windows und SQL Server" href="http://www.computerwoche.de/knowledge_center/it_services/1876847/" target="_blank">up-to-date article at Computerwoche</a>. <em>I&#8217;m currently writing an article on <a title="Mozenda" href="http://mimblog.de/2008/10/23/lucene-hadoop-zookeeper-katta/" target="_blank">Mozenda</a> and <a title="GoGrid" href="http://gogrid.com/" target="_blank">GoGrid</a> &#8211; after those I&#8217;m going to check out Amazon EC2!</em></p>
<p>* <a title="Why Defining Distance is Important" href="http://irthoughts.wordpress.com/2008/10/23/why-defining-distance-is-important/" target="_blank">Why Defining Distance Is Important</a></p>
<p>Distance (dissimilarity) of objects explained by a real case (crime) scenario. <em>I really appreciate those practical examples!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/10/28/amazon-aws-is-enterprise-ready/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene + Hadoop + Zookeeper = Katta</title>
		<link>http://mimblog.de/2008/10/23/lucene-hadoop-zookeeper-katta/</link>
		<comments>http://mimblog.de/2008/10/23/lucene-hadoop-zookeeper-katta/#comments</comments>
		<pubDate>Thu, 23 Oct 2008 08:59:24 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Breakfast Links]]></category>
		<category><![CDATA[data extraction]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[saas]]></category>
		<category><![CDATA[zookeeper]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=545</guid>
		<description><![CDATA[
Breakfast Links 23.10.2008
* katta and hadoop survey slides
Katta is a project by 101tec.com, a consulting and software development company specializing in large-scale data processing and information management software. Katta adds grid support to Apache Lucene with a combination of Hadoop and Zookeeper (which is going to be moved into Hadoop).
* Mozenda: SaaS Data Extraction
Mozenda is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://hadoop.apache.org/"><img class="aligncenter size-medium wp-image-544" title="hadoop-logo" src="http://mimblog.de/wp-content/uploads/2008/10/hadoop-logo.jpg" alt="" width="300" height="71" /></a></p>
<p><strong>Breakfast Links 23.10.2008</strong></p>
<p>* <a title="katta and hadoop survey slides" href="http://find23.net/2008/09/23/hadoop-user-group-slides/" target="_blank">katta and hadoop survey slides</a></p>
<p><a title="katta @ sourceforge" href="http://katta.sourceforge.net/" target="_blank">Katta</a> is a project by <a title="101tec" href="http://101tec.com" target="_blank">101tec.com</a>, a consulting and software development company specializing in large-scale data processing and information management software. Katta adds grid support to Apache Lucene with a combination of Hadoop and Zookeeper (which is going to be moved into Hadoop).</p>
<p>* <a title="Mozenda: Data Extraction" href="http://arnoldit.com/wordpress/2008/10/23/mozenda-check-it-out/" target="_blank">Mozenda: SaaS Data Extraction</a></p>
<p>Mozenda is a software product that can extract data from different sources such as websites, databases, RSS feeds and <a title="Mozenda: Get Data" href="http://www.mozenda.com/get-data.php" target="_blank">more</a>. The extraction of <a title="Mozenda Examples" href="http://www.mozenda.com/mozenda-samples.php?id=0" target="_blank">content from forums and blogs</a> sounds really interesting.</p>
<p>Wow, there is really too much stuff to actually test&#8230; <strong>Does anybody have any experience with these projects/products</strong>?</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/10/23/lucene-hadoop-zookeeper-katta/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
