<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Meyer Information Management Blog &#187; content extraction</title>
	<atom:link href="http://mimblog.de/tag/content-extraction/feed/" rel="self" type="application/rss+xml" />
	<link>http://mimblog.de</link>
	<description>Innovations and Technologies in Information Management</description>
	<lastBuildDate>Thu, 25 Mar 2010 20:18:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>ApacheCon US 2008 Lucene (And Beyond) Sessions</title>
		<link>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/</link>
		<comments>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/#comments</comments>
		<pubDate>Mon, 17 Nov 2008 20:39:50 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Information Management]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[content extraction]]></category>
		<category><![CDATA[event]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[tika]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=619</guid>
		<description><![CDATA[
Unfortunately I was not able to attend ApacheCon US 2008 in New Orleans this year &#8211; way too far away from good ol&#8217; Germany! But I reviewed the sessions on Lucene (and Solr, Mahout, Tika) after the conference to get some inspiration. Some comments on them:

* Advanced Indexing Techniques with Apache [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mimblog.de/wp-content/uploads/2008/11/acus08basic.jpg"><img class="alignnone size-medium wp-image-620" title="acus08basic" src="http://mimblog.de/wp-content/uploads/2008/11/acus08basic-300x71.jpg" alt="" width="300" height="71" /></a></p>
<p>Unfortunately I was not able to attend <a title="ApacheCon US 2008" href="http://us.apachecon.com/c/acus2008/" target="_blank">ApacheCon US 2008</a> in New Orleans this year &#8211; way too far away from good ol&#8217; Germany! But I reviewed the sessions on <strong>Lucene</strong> (and <strong>Solr</strong>, <strong>Mahout</strong>, <strong>Tika</strong>) after the conference to get some inspiration. Some comments on them:</p>
<p><span id="more-619"></span></p>
<p>* <a title="Advanced Indexing Techniques with Apache Lucene" href="http://us.apachecon.com/c/acus2008/sessions/7" target="_blank">Advanced Indexing Techniques with Apache Lucene</a> (<a title="Michael Busch" href="http://us.apachecon.com/c/acus2008/speakers/44" target="_blank">Michael Busch</a>)</p>
<p>Detailed presentation about the indexing capabilities of Apache Lucene, with a very interesting part on how to use <strong>Token Payloads</strong> and POS tagging with the new <strong>TokenStream API</strong>.</p>
<p>* <a title="Apache Solr: Out of the Box" href="http://us.apachecon.com/c/acus2008/sessions/9" target="_blank">Apache Solr: Out of the Box</a> (<a title="Chris Hostetter" href="http://people.apache.org/~hossman/" target="_blank">Chris Hostetter</a>)</p>
<p>Introduction to Solr covering installation and administration (Admin Console, Luke), querying (Facets, Highlighting), and configuration (Analyzers, Multiple Indexes, Replication).</p>
<p>* <a title="Introducing Mahout: Apache Machine Learning" href="http://us.apachecon.com/c/acus2008/sessions/11" target="_blank">Introducing Mahout: Apache Machine Learning</a> (<a title="Grant Ingersoll" href="http://grantingersoll.com/" target="_blank">Grant Ingersoll</a>)</p>
<p>I already posted this session in my <a title="Intro to Mahout" href="http://mimblog.de/2008/11/10/intro-to-mahout/" target="_blank">recent Breakfast Links</a>. A nice presentation about what machine learning is and how Mahout approaches it.</p>
<p>* <a title="Apache Solr: Beyond the Box" href="http://us.apachecon.com/c/acus2008/sessions/10" target="_blank">Apache Solr: Beyond the Box</a> (<a title="Chris Hostetter" href="http://people.apache.org/~hossman/" target="_blank">Chris Hostetter</a>)</p>
<p>Presentation about Solr&#8217;s history and real-world examples such as geo search.</p>
<p>* <a title="Content analysis for ECM with Apache Tika" href="http://us.apachecon.com/c/acus2008/sessions/12" target="_blank">Content analysis for ECM with Apache Tika</a> (<a title="Paolo Mottadelli" href="http://www.paolomottadelli.com/" target="_blank">Paolo Mottadelli</a>)</p>
<p>Impressive and extensive presentation about Apache Tika and its Alfresco integration for content extraction.</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

