Class: Ferret::Analysis::Analyzer

Inherits:
Object
Defined in:
lib/ferret/analysis/analyzers.rb

Overview

An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.

Typical implementations first build a Tokenizer, which breaks the stream of characters from the input string into raw Tokens. One or more TokenFilters may then be applied to the output of the Tokenizer.
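The Tokenizer-then-TokenFilter chain described above can be sketched in plain Ruby. This is only an illustration of the pattern, not Ferret's implementation: every class and method name prefixed with Sketch (and the sketch_terms helper) is a hypothetical stand-in invented for this example, and the stop-word list is an assumption.

```ruby
# Plain-Ruby stand-ins for illustration only; these are NOT Ferret's classes.
SketchToken = Struct.new(:text, :start_offset, :end_offset)

# Tokenizer stand-in: breaks a string into raw tokens on runs of letters.
class SketchLetterTokenizer
  def initialize(string)
    @string = string
    @pos = 0
  end

  # Returns the next token, or nil when the input is exhausted.
  def next
    m = /[[:alpha:]]+/.match(@string, @pos)
    return nil unless m
    @pos = m.end(0)
    SketchToken.new(m[0], m.begin(0), m.end(0))
  end
end

# Filter stand-in: wraps another stream and lowercases each token's text.
class SketchLowerCaseFilter
  def initialize(input)
    @input = input
  end

  def next
    token = @input.next
    return nil unless token
    SketchToken.new(token.text.downcase, token.start_offset, token.end_offset)
  end
end

# Filter stand-in: wraps another stream and drops tokens found in stop_words.
class SketchStopFilter
  def initialize(input, stop_words)
    @input = input
    @stop_words = stop_words
  end

  def next
    while (token = @input.next)
      return token unless @stop_words.include?(token.text)
    end
    nil
  end
end

# Helper: drains a stream and collects the token texts.
def sketch_terms(stream)
  terms = []
  while (token = stream.next)
    terms << token.text
  end
  terms
end

# Tokenizer first, then filters applied to its output.
tokenizer = SketchLetterTokenizer.new("The Quick brown Fox")
stream = SketchStopFilter.new(SketchLowerCaseFilter.new(tokenizer), %w[the])
p sketch_terms(stream)
```

Each filter takes the previous stream as its input, so the order of wrapping decides the order of processing. Here lowercasing runs before stop-word removal, which is why "The" is matched by the lowercase stop word.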

The default Analyzer simply creates a LowerCaseTokenizer, which converts all text to lowercase tokens. See LowerCaseTokenizer for more details.

Instance Method Details

#token_stream(field, string) ⇒ Object

Creates a TokenStream which tokenizes all the text in the provided string. Override this method to let the Analyzer choose a strategy based on the document and/or field.

string

the string representing the text in the field

field

name of the field. Not required: the default implementation ignores it.



# File 'lib/ferret/analysis/analyzers.rb', line 17

def token_stream(field, string)
  return LowerCaseTokenizer.new(string)
end
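
As a rough illustration of overriding token_stream to choose a strategy per field, the sketch below uses plain-Ruby stand-ins rather than Ferret's own classes. All Sketch-prefixed names are hypothetical, the stand-in streams return bare strings instead of Token objects, and the "id" field name is an assumption made for the example.

```ruby
# Plain-Ruby stand-ins for illustration only; these are NOT Ferret's classes.

# Stream stand-in: yields lowercase runs of letters, one per call to #next.
class SketchLowerCaseTokenizer
  def initialize(string)
    @tokens = string.downcase.scan(/[[:alpha:]]+/)
  end

  # Returns the next token text, or nil when exhausted.
  def next
    @tokens.shift
  end
end

# Stream stand-in: yields the whole string as a single, unmodified token.
class SketchKeywordTokenizer
  def initialize(string)
    @tokens = [string]
  end

  def next
    @tokens.shift
  end
end

# Base analyzer stand-in mirroring the default behaviour shown above:
# the field argument is accepted but ignored.
class SketchAnalyzer
  def token_stream(field, string)
    SketchLowerCaseTokenizer.new(string)
  end
end

# Subclass that picks a strategy based on the field name.
class SketchPerFieldAnalyzer < SketchAnalyzer
  def token_stream(field, string)
    if field.to_s == "id" # hypothetical field that must stay untokenized
      SketchKeywordTokenizer.new(string)
    else
      super
    end
  end
end

analyzer = SketchPerFieldAnalyzer.new
id_stream = analyzer.token_stream(:id, "AB-123 X")
title_stream = analyzer.token_stream(:title, "Hello World")
p id_stream.next
p title_stream.next
```

Falling back to super keeps the default lowercase behaviour for every field the subclass does not handle explicitly.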