Class: CorpusProcessor::Token

Inherits:
Object
Defined in:
lib/corpus-processor/token.rb

Overview

The internal representation of a token.

Tokens are extracted from the original corpus and consist of single words or punctuation marks.

Each token also carries a category, which originates from the tagging in the corpus.
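As a rough illustration of "single words or punctuation", the sketch below splits plain text into word and punctuation tokens. It is a hypothetical example, not the library's extraction code: the library reads a tagged corpus, while this sketch uses an assumed regular expression and a minimal stand-in `Token` class with the documented `word`/`category` interface. `[[:word:]]` is used instead of `\w` so that accented letters (common in Portuguese text such as "Olá") stay inside one word.

```ruby
# Hypothetical stand-in for CorpusProcessor::Token with the documented interface.
class Token
  attr_accessor :word, :category

  def initialize(word = '', category = nil)
    self.word     = word
    self.category = category
  end
end

# Assumed splitting rule (not the library's): a run of word characters,
# or a single character that is neither a word character nor whitespace.
def tokenize(text)
  text.scan(/[[:word:]]+|[^[:word:][:space:]]/).map { |piece| Token.new(piece) }
end

tokens = tokenize('Hello, world!')
p tokens.map(&:word)      # => ["Hello", ",", "world", "!"]
p tokens.map(&:category)  # => [nil, nil, nil, nil]
```

Untagged text yields tokens whose category stays `nil`; in the real corpus the category would come from the tags.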

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(word = '', category = nil) ⇒ Token

Returns a new instance of Token.

Parameters:

  • word (String) (defaults to: '')

    the word from the text. It should not contain spaces.

  • category (Symbol) (defaults to: nil)

    the type of the CorpusProcessor::Token. It should be a valid category from Categories.



# File 'lib/corpus-processor/token.rb', line 20

def initialize word = '', category = nil
  self.word     = word
  self.category = category
end
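A short usage sketch of the constructor and its defaults. The class below is a hypothetical stand-in that assumes `attr_accessor` supplies the `word=` and `category=` writers the constructor calls; `:location` is an illustrative category, not necessarily one defined in Categories.

```ruby
# Hypothetical stand-in for CorpusProcessor::Token; assumes attr_accessor
# provides the word= and category= writers used by initialize.
class Token
  attr_accessor :word, :category

  def initialize(word = '', category = nil)
    self.word     = word
    self.category = category
  end
end

blank = Token.new
p blank.word      # => ""
p blank.category  # => nil

tagged = Token.new('Lisboa', :location)  # :location is an illustrative category
p tagged.word      # => "Lisboa"
p tagged.category  # => :location
```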

Instance Attribute Details

#category ⇒ Symbol

Returns the type of the CorpusProcessor::Token. It should be a valid category from Categories.

Returns:

  • (Symbol)

    the type of the CorpusProcessor::Token. It should be a valid category from Categories.


# File 'lib/corpus-processor/token.rb', line 15

def category
  @category
end
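The documentation says the category "should be" a valid one, which suggests the constraint is a convention rather than something the reader enforces. If you want it checked, one option is a validating writer, sketched below. `VALID_CATEGORIES` is a hypothetical placeholder; the real set lives in the library's Categories and is not reproduced here.

```ruby
# Hypothetical category list standing in for the library's Categories.
VALID_CATEGORIES = [:person, :location, :organization].freeze

class Token
  attr_accessor :word
  attr_reader :category

  def initialize(word = '', category = nil)
    self.word     = word
    self.category = category
  end

  # Accepts nil (untagged) or a known category; rejects anything else.
  def category=(value)
    unless value.nil? || VALID_CATEGORIES.include?(value)
      raise ArgumentError, "unknown category: #{value.inspect}"
    end
    @category = value
  end
end

p Token.new('Lisboa', :location).category  # => :location

begin
  Token.new('Lisboa', :city)
rescue ArgumentError => e
  puts e.message  # => unknown category: :city
end
```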

#word ⇒ String

Returns the word from the text. It should not contain spaces.

Returns:

  • (String)

    the word from the text. It should not contain spaces.



# File 'lib/corpus-processor/token.rb', line 11

def word
  @word
end
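Like the category, the "no spaces" rule on `word` is stated as a convention. A minimal sketch of enforcing it in a custom writer follows; this is an illustrative variant, and nothing on this page indicates that the library itself performs the check.

```ruby
class Token
  attr_reader :word
  attr_accessor :category

  def initialize(word = '', category = nil)
    self.word     = word
    self.category = category
  end

  # Enforces the documented "no spaces" rule (illustrative; any whitespace is rejected).
  def word=(value)
    raise ArgumentError, "word contains whitespace: #{value.inspect}" if value.match?(/\s/)
    @word = value
  end
end

p Token.new('corpus').word  # => "corpus"

begin
  Token.new('two words')
rescue ArgumentError => e
  puts e.message  # => word contains whitespace: "two words"
end
```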

Instance Method Details

#==(other) ⇒ Object

Determines whether two CorpusProcessor::Tokens are equal, that is, whether they have the same word and the same category.

Parameters:

  • other (CorpusProcessor::Token)

    the token to compare with.


# File 'lib/corpus-processor/token.rb', line 28

def ==(other)
  word == other.word && category == other.category
end
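Two consequences of `==` as written are worth noting. First, it calls `other.word`, so comparing a token with an object that lacks that method (for example `nil`) raises `NoMethodError`. Second, this page documents only `==`; if `eql?` and `hash` are not also overridden, `Array#uniq` and Hash keys fall back to object identity and will not treat two equal tokens as duplicates. The sketch below is a hypothetical variant that adds a type guard and a matching `eql?`/`hash` pair.

```ruby
class Token
  attr_accessor :word, :category

  def initialize(word = '', category = nil)
    self.word     = word
    self.category = category
  end

  # Same comparison as the documented #==, plus a type guard so that
  # comparing with a non-token returns false instead of raising.
  def ==(other)
    other.is_a?(Token) && word == other.word && category == other.category
  end
  alias eql? ==

  # Equal tokens must share a hash for Hash keys and Array#uniq to treat them as duplicates.
  def hash
    [word, category].hash
  end
end

a = Token.new('Lisboa', :location)
b = Token.new('Lisboa', :location)
p a == b                  # => true
p a == nil                # => false
p([a, b].uniq.size)       # => 1
```

Using the class itself as the type check keeps comparisons strict; a duck-typed check such as `other.respond_to?(:word)` would be a looser alternative.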