Analyzers, Tokenizers, and Filters in Apache Solr

Full List of Tokenizers

The full list of tokenizers can be found at https://cwiki.apache.org/confluence/display/solr/Tokenizers.

For a complete explanation of these three key terms, see the Solr documentation: Analyzers, Tokenizers, and Filters.
You configure the tokenizer for a text field type in schema.xml with a <tokenizer> element, as a child of <analyzer>.

#1 Tokenizers

A tokenizer determines how a string is broken up into tokens for indexing. An example is solr.WhitespaceTokenizerFactory, which splits the text on whitespace. For example, "Apache Solr" becomes the two tokens "Apache" and "Solr".
An analyzer can apply to the indexing phase, the query phase, or both.

The analyzer below has no type attribute, so it applies to both indexing and querying.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.1">
  <fieldType name="nametext" class="solr.TextField">
    <analyzer>
      <!-- tokenize the string on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- lowercase every token -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- .... other tags .... -->
</schema>
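
To illustrate how the choice of tokenizer changes the tokens that are indexed, here is a minimal sketch of a second field type that swaps in solr.KeywordTokenizerFactory, which treats the entire input as a single token. The field type name "exactname" is hypothetical and does not come from the original schema; the fragment is meant to sit inside the <schema> element.

```xml
<!-- hypothetical field type: the name "exactname" is an assumption for illustration -->
<fieldType name="exactname" class="solr.TextField">
  <analyzer>
    <!-- emit the whole input as one token instead of splitting on whitespace -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- lowercase that single token -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this chain, "Apache Solr" should be indexed as the single token "apache solr", whereas the whitespace-based "nametext" type yields "apache" and "solr".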

You can also define separate analyzers for indexing and querying, each with its own tokenizer and filters. In the example below the two chains happen to be identical.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.1">
  <fieldType name="nametext" class="solr.TextField">
    <!-- applies to indexing -->
    <analyzer type="index">
      <!-- tokenize the string on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- applies to the query -->
    <analyzer type="query">
      <!-- tokenize the string on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <!-- .... other tags .... -->
</schema>
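
Separate analyzers become useful once the two chains differ. The following is a sketch only, assuming you want synonym expansion at query time but not at index time. The field type name "nametext_syn" and the file name synonyms.txt are assumptions for illustration and do not come from the original schema; the fragment belongs inside the <schema> element.

```xml
<!-- hypothetical field type: "nametext_syn" and synonyms.txt are assumed names -->
<fieldType name="nametext_syn" class="solr.TextField">
  <!-- index time: split on whitespace and lowercase -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- query time: same tokenizer, plus synonym expansion before lowercasing -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Both chains keep the same tokenizer and the final lowercase step, so the tokens produced from a query can still match the tokens stored in the index.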