Class: Ferret::Analysis::AsciiStandardAnalyzer

Inherits:
Object
Defined in:
ext/r_analysis.c

Overview

Summary

The AsciiStandardAnalyzer is the most advanced of the available ASCII analyzers. If it were implemented in Ruby, it would look like this:

class AsciiStandardAnalyzer
  # stop_words: words dropped from the stream; lower: whether to downcase tokens
  def initialize(stop_words = FULL_ENGLISH_STOP_WORDS, lower = true)
    @lower = lower
    @stop_words = stop_words
  end

  # Builds the filter chain for one field's text and returns the final stream
  def token_stream(field, str)
    ts = AsciiStandardTokenizer.new(str)
    ts = AsciiLowerCaseFilter.new(ts) if @lower
    ts = StopFilter.new(ts, @stop_words)
    ts = HyphenFilter.new(ts)
  end
end

As you can see, it uses the AsciiStandardTokenizer, and you can supply your own list of stop words if you wish. Note that this tokenizer won’t recognize non-ASCII characters, so you should use the StandardAnalyzer if you want to analyze multi-byte data such as UTF-8.
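
To make the order of the stages concrete, here is a minimal pure-Ruby sketch of a similar pipeline. It does not use Ferret: `sketch_token_stream` and `SKETCH_STOP_WORDS` are hypothetical names invented for illustration, the token pattern is a rough stand-in for what AsciiStandardTokenizer does, and the hyphen-handling stage is left out. Treat it as an approximation of the idea, not a description of the actual C implementation.

```ruby
# Illustrative stand-ins only; none of these names are part of Ferret.
SKETCH_STOP_WORDS = %w[a an and the of to].freeze # hypothetical short list

# Approximates the tokenizer -> lowercase -> stop-word stages described above.
def sketch_token_stream(str, stop_words: SKETCH_STOP_WORDS, lower: true)
  # Tokenizer stage (assumption): keep runs of ASCII letters and digits only.
  tokens = str.scan(/[A-Za-z0-9]+/)
  # Lowercase stage, applied only when requested.
  tokens = tokens.map(&:downcase) if lower
  # Stop-word stage: drop any token found in the stop list.
  tokens.reject { |t| stop_words.include?(t) }
end

p sketch_token_stream("The Quick Brown Fox and the Lazy Dog")
p sketch_token_stream("The Quick Brown Fox", stop_words: %w[quick])
```

In this sketch the stop list is compared after lowercasing, so passing `lower: false` lets a capitalized "The" through unless the list contains that exact spelling. Non-ASCII characters simply fall outside the token pattern here; how the real tokenizer treats them may differ.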