Class: Ferret::Analysis::StandardTokenizer

Inherits:
RegExpTokenizer
Defined in:
lib/ferret/analysis/standard_tokenizer.rb

Overview

The standard tokenizer is an advanced tokenizer that tokenizes most words correctly and also recognizes tokens such as email addresses, web addresses, and phone numbers.

Constant Summary

ALPHA =
/[[:alpha:]]+/
APOSTROPHE =
/#{ALPHA}('#{ALPHA})+/
ACRONYM =
/#{ALPHA}\.(#{ALPHA}\.)+/
P =
/[_\/.,-]/
HASDIGIT =
/\w*\d\w*/
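
To make the constants above more concrete, the following sketch applies each pattern on its own to a sample string. It only illustrates what the individual regular expressions match. It does not reproduce how StandardTokenizer combines them internally, which this page does not show. The helper `first_match` and the sample strings are hypothetical and not part of Ferret.

```ruby
# Constants copied verbatim from the Constant Summary.
ALPHA      = /[[:alpha:]]+/
APOSTROPHE = /#{ALPHA}('#{ALPHA})+/
ACRONYM    = /#{ALPHA}\.(#{ALPHA}\.)+/
P          = /[_\/.,-]/
HASDIGIT   = /\w*\d\w*/

# Hypothetical helper (not part of Ferret): returns the first substring of
# +text+ matched by +pattern+, or nil when nothing matches.
def first_match(pattern, text)
  text[pattern]
end

# Sample inputs are made up for illustration.
puts first_match(APOSTROPHE, "the O'Reilly book")      # word joined by an apostrophe
puts first_match(ACRONYM, "made in the U.S.A. today")  # dotted acronym
puts first_match(HASDIGIT, "model x86 chip")           # word containing a digit
puts first_match(P, "foo-bar")                         # a single punctuation character
p    first_match(HASDIGIT, "no digits here")           # nil: nothing matches
```

Note that interpolating one Regexp into another, as APOSTROPHE and ACRONYM do with ALPHA, embeds the inner pattern as a non-capturing group, so ALPHA's `+` quantifier stays scoped to the alphabetic run.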

Method Summary

Methods inherited from RegExpTokenizer

#close, #initialize, #next

Methods inherited from Tokenizer

#close

Methods inherited from TokenStream

#close, #each, #next

Constructor Details

This class inherits a constructor from Ferret::Analysis::RegExpTokenizer