<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dario Quintana &#187; Lucene</title>
	<atom:link href="http://darioquintana.com.ar/blogging/category/lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://darioquintana.com.ar/blogging</link>
	<description>at blogging</description>
	<lastBuildDate>Mon, 30 Apr 2012 07:05:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Analyzers at Lucene.Net</title>
		<link>http://darioquintana.com.ar/blogging/2007/12/11/analyzers-en-lucenenet/</link>
		<comments>http://darioquintana.com.ar/blogging/2007/12/11/analyzers-en-lucenenet/#comments</comments>
		<pubDate>Tue, 11 Dec 2007 12:40:28 +0000</pubDate>
		<dc:creator>Dario</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[NHibernate]]></category>

		<guid isPermaLink="false">http://darioquintana.com.ar/blogging/?p=22</guid>
		<description><![CDATA[<p>In the <a href="http://darioquintana.com.ar/blogging/?p=21">last post</a> we looked at the Lucene.Net integration with NHibernate through NHibernate Search. Now let's talk a little more about Lucene.</p> <p>Lucene can index text from many sources: PDF, HTML, Word documents, etc., and this makes it attractive for applications that need to solve text-search problems. When Lucene finishes parsing the document [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://darioquintana.com.ar/blogging/?p=21">last post</a> we looked at the Lucene.Net integration with NHibernate through NHibernate Search. Now let's talk a little more about Lucene.</p>
<p>Lucene can index text from many sources: PDF, HTML, Word documents, etc., and this makes it attractive for applications that need to solve text-search problems. Once the text has been parsed out of a rich-media document, it has to be converted into a stream of plain-text tokens that Lucene can digest and index. The step that precedes indexing is analysis, and that is the job of the Analyzers. Lucene provides several classes that you can use, for example: <em>WhitespaceAnalyzer</em>, which splits the text into tokens at whitespace, and <em>StopAnalyzer</em>, which removes common English stop words (for example <em>the, an, a, that, this</em>, etc.) from the text before it is indexed.</p>
<p align="center"><img src="http://uooopaa.googlepages.com/lucene-parser-analyzer.png"> </p>
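<p>To make the difference between these two strategies concrete, here is a minimal stand-alone sketch. It does not reference Lucene.Net at all: the class <em>AnalyzerSketch</em>, its method names and the abbreviated stop-word list are assumptions made for illustration, and the real analyzers are more elaborate. It only shows the general idea of splitting at whitespace versus lowercasing, splitting at non-letters and dropping stop words.</p>
<pre>
using System;
using System.Collections;

// Stand-alone illustration; this is NOT the Lucene.Net implementation.
public static class AnalyzerSketch
{
    // Hypothetical, abbreviated stop-word list (Lucene's built-in list is longer).
    static readonly string[] StopWords = { "the", "an", "a", "that", "this" };

    // Whitespace-style analysis: split at whitespace, keep every token unchanged.
    public static string[] WhitespaceTokens(string text)
    {
        return text.Split(new char[] { ' ', '\t', '\r', '\n' },
                          StringSplitOptions.RemoveEmptyEntries);
    }

    // Stop-style analysis: lowercase, split at non-letters, drop stop words.
    public static string[] StopTokens(string text)
    {
        ArrayList kept = new ArrayList();
        string current = "";
        // The appended space makes sure the last token is flushed.
        foreach (char c in (text + " ").ToLowerInvariant())
        {
            if (char.IsLetter(c)) { current += c; continue; }
            if (current.Length != 0)
            {
                if (Array.IndexOf(StopWords, current) == -1) kept.Add(current);
                current = "";
            }
        }
        return (string[])kept.ToArray(typeof(string));
    }

    public static void Main()
    {
        string text = "The Hobbit: a Tolkien series";
        // Prints: The|Hobbit:|a|Tolkien|series
        Console.WriteLine(string.Join("|", WhitespaceTokens(text)));
        // Prints: hobbit|tolkien|series
        Console.WriteLine(string.Join("|", StopTokens(text)));
    }
}
</pre>
<p>Note how the whitespace variant keeps the punctuation and the original casing, while the stop variant normalizes the tokens and discards the words that carry little meaning for a search.</p>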
<p>Using <a href="http://blog.darioquintana.com.ar/2007/12/09/nhibernate-search/">NHibernate Search</a>, we can run queries against the index that Lucene maintains, whether it lives in memory or on the file system. This is a query using NHibernate Search:</p>
</p>
<div class="wlWriterSmartContent" id="scid:57F11A72-B0E5-49c7-9094-E3A15BD5B5E6:56d5ee07-0a42-4a84-87e3-1977d3cce389" style="padding-right: 0px; display: inline; padding-left: 0px; float: none; padding-bottom: 0px; margin: 0px; padding-top: 0px">
<pre style="background-color:White;;overflow: auto;">
<div><!--

Code highlighting produced by Actipro CodeHighlighter (freeware)

http://www.CodeHighlighter.com/

--><span style="color: #000000;">QueryParser qp </span><span style="color: #000000;">=</span><span style="color: #000000;"> </span><span style="color: #0000FF;">new</span><span style="color: #000000;"> QueryParser(</span><span style="color: #800000;">&quot;</span><span style="color: #800000;">Summary</span><span style="color: #800000;">&quot;</span><span style="color: #000000;">, </span><span style="color: #0000FF;">new</span><span style="color: #000000;"> StopAnalyzer());
IQuery NHQuery </span><span style="color: #000000;">=</span><span style="color: #000000;"> s.CreateFullTextQuery(qp.Parse(</span><span style="color: #800000;">&quot;</span><span style="color: #800000;">series</span><span style="color: #800000;">&quot;</span><span style="color: #000000;">), </span><span style="color: #0000FF;">typeof</span><span style="color: #000000;">(Book));
IList result </span><span style="color: #000000;">=</span><span style="color: #000000;"> NHQuery.List();</span></div>
</pre>
<p><!-- Code inserted with Steve Dunn's Windows Live Writer Code Formatter Plugin.  http://dunnhq.com --></div>
</p>
<p><em>QueryParser</em> receives an <em>Analyzer</em> as a parameter, in this case <em>StopAnalyzer</em>. This Analyzer is used to extract the search terms from the query string. It is separate from the Analyzer that you configure at Lucene startup, which determines how the tokens are persisted in the index. The Analyzer passed to <em>QueryParser</em> only filters the query string in order to find the search keywords.</p>
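<p>One way to see what the Analyzer does to a query string is to print the parsed query. The following is only a sketch: it assumes a reference to a Lucene.Net assembly that still offers the two-argument <em>QueryParser</em> constructor used in the query above (other versions may require different arguments), and the class name <em>QueryAnalysisDemo</em> and the query text are made up for the example.</p>
<pre>
using System;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

// Hypothetical demo class; requires a reference to the Lucene.Net assembly.
class QueryAnalysisDemo
{
    static void Main()
    {
        // Same constructor as in the NHibernate Search query above.
        QueryParser qp = new QueryParser("Summary", new StopAnalyzer());

        // The query string contains a stop word ("the") and a keyword ("series").
        Query query = qp.Parse("the series");

        // Printing the query shows which terms survived the analysis.
        Console.WriteLine(query.ToString());
    }
}
</pre>
<p>Swapping <em>StopAnalyzer</em> for another Analyzer and comparing the printed queries is a quick way to check how each one treats the same search text.</p>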
<p>To understand Analyzers a little better, I made this console application based on the Lucene in Action code examples. The idea was to see which output tokens are produced by the different Analyzers. Sorry, the example is in Spanish, and the custom Analyzer that I made uses the <a href="http://darioquintana.googlecode.com/svn/trunk/Examples/SpanishStopWordsAnalyzer/SpanishStopAnalyzer.cs">Spanish Stop Words</a>. You can check out the example <a href="http://darioquintana.googlecode.com/svn/trunk/Examples/SpanishStopWordsAnalyzer/">here</a>.</p>
<p><img src="http://uooopaa.googlepages.com/lucene-output.png"></p>
]]></content:encoded>
			<wfw:commentRss>http://darioquintana.com.ar/blogging/2007/12/11/analyzers-en-lucenenet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

