Class: Tokenizer::Tokenizer
- Inherits: Object
  - Object
    - Tokenizer::Tokenizer
- Defined in: lib/tokenizer/tokenizer.rb
Overview
The Tokenizer class implements the tokenizer itself: it splits a string into tokens on runs of whitespace.
Constant Summary
- WL = /\s+/
  WL is the word limit (token delimiter) used by the tokenizer: a regular expression matching one or more whitespace characters.
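A minimal sketch of what splitting on this pattern does, using plain String#split. The local name `wl` and the sample strings are illustrative assumptions, not part of the library.

```ruby
# Same pattern as the WL constant; `wl` is a hypothetical local name.
wl = /\s+/

# Runs of whitespace (spaces, tabs, newlines) act as a single delimiter,
# and trailing empty fields are dropped by String#split.
plain = "Hallo   Welt\n".split(wl)

# Leading whitespace yields an empty first token when splitting on a regexp.
leading = "  Hallo Welt".split(wl)

# Punctuation is not separated; it stays attached to the neighbouring word.
punct = "Hallo, Welt!".split(wl)

p plain    # ["Hallo", "Welt"]
p leading  # ["", "Hallo", "Welt"]
p punct    # ["Hallo,", "Welt!"]
```

Callers who do not want the empty leading token can strip the input first, for example with `str.strip.split(wl)`.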
Instance Method Summary
- #initialize(lang = :de) ⇒ Tokenizer (constructor)
  Constructs a Tokenizer with the specified language.
- #tokenize(str) ⇒ Object
  Returns the tokens contained in the given string.
Constructor Details
#initialize(lang = :de) ⇒ Tokenizer
Constructs a Tokenizer with the specified language. The default is :de.
    # File 'lib/tokenizer/tokenizer.rb', line 12
    def initialize(lang = :de)
      @lang = lang
    end
Instance Method Details
#tokenize(str) ⇒ Object
Returns the tokens contained in the given string, obtained by splitting it on WL.
    # File 'lib/tokenizer/tokenizer.rb', line 16
    def tokenize(str)
      tokens = str.split(WL)
      tokens
    end
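A short usage sketch may help here. To keep it runnable without the gem installed, it re-declares a stand-in class whose method bodies mirror the source listings on this page; the sample sentences and the `:en` argument are illustrative assumptions.

```ruby
# Stand-in for the documented class so the sketch runs on its own;
# the method bodies mirror the source listings on this page.
module Tokenizer
  class Tokenizer
    WL = /\s+/

    def initialize(lang = :de)
      @lang = lang
    end

    def tokenize(str)
      str.split(WL)
    end
  end
end

# With no argument, lang defaults to :de.
tokenizer = Tokenizer::Tokenizer.new
tokens = tokenizer.tokenize("Ich gehe in die Schule.")
p tokens  # ["Ich", "gehe", "in", "die", "Schule."]

# Passing another language symbol (:en is an assumed example value).
english = Tokenizer::Tokenizer.new(:en).tokenize("I go to school.")
p english  # ["I", "go", "to", "school."]
```

In the listing shown, #tokenize does not consult @lang, so the language argument does not change the result, and the return value is the Array produced by String#split.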