<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Meyer Information Management Blog &#187; tika</title>
	<atom:link href="http://mimblog.de/tag/tika/feed/" rel="self" type="application/rss+xml" />
	<link>http://mimblog.de</link>
	<description>Innovations and Technologies in Information Management</description>
	<lastBuildDate>Thu, 25 Mar 2010 20:18:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Enterprise Search Study</title>
		<link>http://mimblog.de/2008/12/08/enterprise-search-study/</link>
		<comments>http://mimblog.de/2008/12/08/enterprise-search-study/#comments</comments>
		<pubDate>Mon, 08 Dec 2008 09:42:47 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Breakfast Links]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[study]]></category>
		<category><![CDATA[tika]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=677</guid>
		<description><![CDATA[
Breakfast Links 08.12.2008
* Arnold White Study Published
Galatea published the study &#8220;Successful Enterprise Search Management&#8221; by Stephen E. Arnold (Beyond Search) and Martin White (Interview at ArnoldIT). The study is essentially about how to manage an enterprise search environment, starting with the selection and integration of a vendor and how a project team is set up. There [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.galatea.co.uk/index.php?page=shop.product_details&amp;flypage=shop.flypage&amp;product_id=36&amp;category_id=8&amp;manufacturer_id=0&amp;option=com_virtuemart&amp;Itemid=44"><img class="alignnone size-medium wp-image-679" title="study-enterprise-search" src="http://mimblog.de/wp-content/uploads/2008/12/study-enterprise-search.jpg" alt="" width="126" height="178" /></a></p>
<p><strong>Breakfast Links 08.12.2008</strong></p>
<p>* <a title="Arnold White Study Published" href="http://arnoldit.com/wordpress/2008/12/08/arnold-white-study-published/" target="_blank">Arnold White Study Published</a></p>
<p>Galatea published the study &#8220;<strong>Successful Enterprise Search Management</strong>&#8221; by Stephen E. Arnold (<a title="Beyond Search" href="http://arnoldit.com/wordpress" target="_blank">Beyond Search</a>) and Martin White (<a title="Search Wizards Speak Martin White" href="http://www.arnoldit.com/search-wizards-speak/martin-white.html" target="_blank">Interview</a> at ArnoldIT). The study is essentially about how <strong>to manage an enterprise search environment</strong>, starting with the selection and integration of a vendor and how a project team is set up. There are also some chapters on search technology in general, such as content processing (text mining). <em>I don&#8217;t know yet whether the study is worth getting for businesses in Germany, but I will write a report on it!</em></p>
<p>* <a title="Tika and Solr" href="http://lucene.grantingersoll.com/2008/12/06/tika-and-solr/" target="_blank">Tika and Solr</a></p>
<p>Grant Ingersoll just committed an <a title="ExtractingRequestHandler Solr Wiki" href="http://wiki.apache.org/solr/ExtractingRequestHandler" target="_blank">ExtractingRequestHandler</a> module for Solr which adds <a title="Apache Tika" href="http://lucene.apache.org/tika/" target="_blank">Tika</a> support to Solr. This means it is now possible to <strong>feed Solr a variety of common document formats</strong> such as Office, PDF, HTML and <a title="Tika supported file formats" href="http://lucene.apache.org/tika/formats.html" target="_blank">more</a>. <em>Awesome!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/12/08/enterprise-search-study/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ApacheCon US 2008 Lucene (And Beyond) Sessions</title>
		<link>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/</link>
		<comments>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/#comments</comments>
		<pubDate>Mon, 17 Nov 2008 20:39:50 +0000</pubDate>
		<dc:creator>Hannes Carl Meyer</dc:creator>
				<category><![CDATA[Information Management]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[content extraction]]></category>
		<category><![CDATA[event]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[tika]]></category>

		<guid isPermaLink="false">http://mimblog.de/?p=619</guid>
		<description><![CDATA[
Unfortunately I was not able to attend ApacheCon US 2008 in New Orleans this year &#8211; way too far away from good ol&#8217; Germany! But I reviewed the sessions on Lucene (and Solr, Mahout, Tika) after the conference to get some inspiration. Some comments on them:

* Advanced Indexing Techniques with Apache [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://mimblog.de/wp-content/uploads/2008/11/acus08basic.jpg"><img class="alignnone size-medium wp-image-620" title="acus08basic" src="http://mimblog.de/wp-content/uploads/2008/11/acus08basic-300x71.jpg" alt="" width="300" height="71" /></a></p>
<p>Unfortunately I was not able to attend <a title="ApacheCon US 2008" href="http://us.apachecon.com/c/acus2008/" target="_blank">ApacheCon US 2008</a> in New Orleans this year &#8211; way too far away from good ol&#8217; Germany! But I reviewed the sessions on <strong>Lucene</strong> (and <strong>Solr</strong>, <strong>Mahout</strong>, <strong>Tika</strong>) after the conference to get some inspiration. Some comments on them:</p>
<p><span id="more-619"></span></p>
<p>* <a title="Advanced Indexing Techniques with Apache Lucene" href="http://us.apachecon.com/c/acus2008/sessions/7" target="_blank">Advanced Indexing Techniques with Apache Lucene</a> (<a title="Michael Busch" href="http://us.apachecon.com/c/acus2008/speakers/44" target="_blank">Michael Busch</a>)</p>
<p>Detailed presentation on the indexing capabilities of Apache Lucene, with a very interesting part on how to use <strong>Token Payloads</strong> and POS tagging with the new <strong>TokenStream API</strong>.</p>
<p>* <a title="Apache Solr: Out of the Box" href="http://us.apachecon.com/c/acus2008/sessions/9" target="_blank">Apache Solr: Out of the Box</a> (<a title="Chris Hostetter" href="http://people.apache.org/~hossman/" target="_blank">Chris Hostetter</a>)</p>
<p>Introduction to Solr covering installation and administration (Admin Console, Luke), querying (Facets, Highlighting) and configuration (Analyzers, Multiple Indexes, Replication).</p>
<p>* <a title="Introducing Mahout: Apache Machine Learning" href="http://us.apachecon.com/c/acus2008/sessions/11" target="_blank">Introducing Mahout: Apache Machine Learning</a> (<a title="Grant Ingersoll" href="http://grantingersoll.com/" target="_blank">Grant Ingersoll</a>)</p>
<p>I already posted this session in my <a title="Intro to Mahout" href="http://mimblog.de/2008/11/10/intro-to-mahout/" target="_blank">recent Breakfast Links</a>. A nice presentation about what machine learning is and how Mahout approaches it.</p>
<p>* <a title="Apache Solr: Beyond the Box" href="http://us.apachecon.com/c/acus2008/sessions/10" target="_blank">Apache Solr: Beyond the Box</a> (<a title="Chris Hostetter" href="http://people.apache.org/~hossman/" target="_blank">Chris Hostetter</a>)</p>
<p>Presentation about Solr&#8217;s history and real-world examples such as geo search.</p>
<p>* <a title="Content analysis for ECM with Apache Tika" href="http://us.apachecon.com/c/acus2008/sessions/12" target="_blank">Content analysis for ECM with Apache Tika</a> (<a title="Paolo Mottadelli" href="http://www.paolomottadelli.com/" target="_blank">Paolo Mottadelli</a>)</p>
<p>Impressive and extensive presentation about Apache Tika and its Alfresco integration for content extraction.</p>
]]></content:encoded>
			<wfw:commentRss>http://mimblog.de/2008/11/17/apachecon-us-2008-lucene-and-beyond-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

