Class: Tokenizer::Tokenizer

Inherits:
Object
Defined in:
lib/tokenizer/tokenizer.rb

Overview

The Tokenizer class implements the tokenizer itself: it is constructed for a language and splits a string into tokens at whitespace.

Constant Summary

WL = /\s+/

WL is the word delimiter used by the tokenizer: a run of one or more whitespace characters.
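As a rough illustration of what splitting on this pattern does, the sketch below applies String#split with a local copy of the documented value. The top-level constant and the sample strings are assumptions made for the example; they are not part of the library.

```ruby
# Local stand-in for Tokenizer::Tokenizer::WL, copied from the value documented above.
WL = /\s+/

# Runs of spaces and tabs count as a single delimiter.
words = "Ich gehe  in\tdie Schule".split(WL)
p words   # ["Ich", "gehe", "in", "die", "Schule"]

# With a regexp delimiter, leading whitespace yields an empty first element.
padded = " a b".split(WL)
p padded  # ["", "a", "b"]
```

Callers who do not want the empty leading element can strip the input before splitting.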

Instance Method Summary

Constructor Details

#initialize(lang = :de) ⇒ Tokenizer

Constructs a Tokenizer for the specified language. The default is :de.



# File 'lib/tokenizer/tokenizer.rb', line 12

def initialize(lang = :de)
  # Remember the language for this tokenizer instance (defaults to :de).
  @lang = lang
end

Instance Method Details

#tokenize(str) ⇒ Object

Returns the tokens contained in the given string as an Array of Strings, obtained by splitting the string on WL.



# File 'lib/tokenizer/tokenizer.rb', line 16

def tokenize(str)
  # Split on runs of whitespace; String#split returns an Array of Strings.
  str.split(WL)
end
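A minimal usage sketch follows. So that it runs on its own, the class is re-declared here from the listings above rather than loaded from the library; the sample sentence is an assumption chosen for illustration.

```ruby
# Re-declaration of the documented class so the example is self-contained.
module Tokenizer
  class Tokenizer
    WL = /\s+/

    def initialize(lang = :de)
      @lang = lang
    end

    def tokenize(str)
      str.split(WL)
    end
  end
end

tokenizer = Tokenizer::Tokenizer.new(:de)
tokens = tokenizer.tokenize("Ich gehe in die Schule.")
p tokens  # ["Ich", "gehe", "in", "die", "Schule."]
```

Because the split is purely whitespace-based, punctuation stays attached to the neighbouring word ("Schule." above), and the language passed to the constructor does not affect the result in the code shown.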