Class: Ferret::Analysis::StandardTokenizer
- Inherits:
- RegExpTokenizer
- Object
- TokenStream
- Tokenizer
- RegExpTokenizer
- Ferret::Analysis::StandardTokenizer
- Defined in:
- lib/ferret/analysis/standard_tokenizer.rb
Overview
The standard tokenizer is an advanced tokenizer that tokenizes most words correctly and also handles things like email addresses, web addresses, phone numbers, etc.
Constant Summary
- ALPHA =
/[[:alpha:]]+/
- APOSTROPHE =
/#{ALPHA}('#{ALPHA})+/
- ACRONYM =
/#{ALPHA}\.(#{ALPHA}\.)+/
- P =
/[_\/.,-]/
- HASDIGIT =
/\w*\d\w*/
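The constants above are building blocks for the tokenizer's pattern. The following is a minimal sketch of what each one matches in isolation. It copies the constant definitions from this page and does not load Ferret. The `full_match?` helper is hypothetical (it is not part of the library) and simply anchors a pattern to the whole string. How the tokenizer combines these pieces internally is not shown here.

```ruby
# Constants copied from the summary above.
ALPHA      = /[[:alpha:]]+/
APOSTROPHE = /#{ALPHA}('#{ALPHA})+/
ACRONYM    = /#{ALPHA}\.(#{ALPHA}\.)+/
P          = /[_\/.,-]/
HASDIGIT   = /\w*\d\w*/

# Hypothetical helper, not part of Ferret: true only when the
# entire string matches the given pattern.
def full_match?(pattern, text)
  /\A#{pattern}\z/.match?(text)
end

puts full_match?(APOSTROPHE, "don't")   # true:  letters, apostrophe, letters
puts full_match?(APOSTROPHE, "dont")    # false: no apostrophe group
puts full_match?(ACRONYM, "U.S.A.")     # true:  every letter run ends in a dot
puts full_match?(ACRONYM, "U.S.A")      # false: final dot is missing
puts full_match?(HASDIGIT, "mp3")       # true:  word characters with a digit
puts full_match?(HASDIGIT, "music")     # false: no digit present
puts full_match?(P, "-")                # true:  one of _ / . , -
```

Interpolating one `Regexp` into another, as `APOSTROPHE` and `ACRONYM` do with `ALPHA`, embeds the inner pattern as a non-capturing group, so the `+` inside `ALPHA` applies only to the letter class.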
Method Summary
Methods inherited from RegExpTokenizer
Methods inherited from Tokenizer
Methods inherited from TokenStream
Constructor Details
This class inherits a constructor from Ferret::Analysis::RegExpTokenizer