Analyzers, Tokenizers, and Filters in Apache Solr

Full List of Tokenizers

The full list of tokenizers is available at https://cwiki.apache.org/confluence/display/solr/Tokenizers

For a complete explanation of these three key terms, see the Solr documentation page "Analyzers, Tokenizers, and Filters".
You configure the tokenizer for a text field type in schema.xml with a <tokenizer></tokenizer> element, as a child of <analyzer></analyzer>.

#1 Tokenizers

Tokenizers determine how strings are broken up into tokens. One example is solr.WhitespaceTokenizerFactory, which splits text on whitespace.
An analyzer can apply to the indexing phase, the query phase, or both. When the <analyzer> element has no type attribute, it applies to both.

The following analyzer applies to both indexing and querying:

<?xml version='1.0' encoding='UTF-8' ?>
<schema name='simple' version='1.1'>
  <fieldType name="nametext" class="solr.TextField">
    <analyzer>
      <!-- tokenize the string on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- lowercase every token -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- .... other tags .... -->
</schema>
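
As a rough illustration of how this field type might be used, the sketch below declares a field of type nametext. The field name "name" and its indexed/stored settings are assumptions made for illustration and are not part of the original schema. Depending on the Solr version, the declaration may need to sit inside a <fields></fields> element. With the analyzer above, a value such as "John SMITH" would be split on whitespace and lowercased into the tokens john and smith.

```xml
<!-- Hypothetical field declaration, placed among the "other tags" of the schema. -->
<!-- "name" is an assumed field name; "nametext" is the field type defined above. -->
<field name="name" type="nametext" indexed="true" stored="true"/>
```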

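You can also define separate analyzers for indexing and querying, each with its own tokenizer and filters, by setting the type attribute to "index" or "query". In this example both analyzers use the same tokenizer and filter: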

<?xml version='1.0' encoding='UTF-8' ?>
<schema name='simple' version='1.1'>
  <fieldType name="nametext" class="solr.TextField">
    <!-- applies to indexing -->
    <analyzer type="index">
      <!-- tokenize the string on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- applies to the query -->
    <analyzer type="query">
      <!-- tokenize the string on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- .... other tags .... -->
</schema>
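
Separate analyzers become useful when the two chains differ. The following is a minimal sketch, not taken from the original schema: it assumes a synonyms.txt file exists in the configuration directory and uses the hypothetical field type name nametext_syn. Synonyms are expanded only at query time, while the index-time chain stays the same as before. Older Solr releases may need solr.SynonymFilterFactory in place of solr.SynonymGraphFilterFactory.

```xml
<!-- Hypothetical field type: "nametext_syn" and "synonyms.txt" are assumed names. -->
<fieldType name="nametext_syn" class="solr.TextField">
  <!-- index time: split on whitespace, then lowercase -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- query time: same chain, plus synonym expansion -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

Keeping the tokenizer and the lowercase filter identical in both chains means query terms are normalized the same way as indexed terms, so only the synonym step differs between the two phases.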