Class: Linnaeus

Inherits:
Object
Defined in:
lib/linnaeus.rb

Overview

The base class. You won’t use this directly; use one of the subclasses instead.

Direct Known Subclasses

Classifier, Persistence, Trainer

Defined Under Namespace

Classes: Classifier, Persistence, Stopwords, Trainer

Instance Method Summary

Constructor Details

#initialize(opts = {}) ⇒ Linnaeus

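Returns a new instance of Linnaeus. The opts hash is merged over the defaults shown in the source below (persistence_class, stopwords_class, skip_stemming, encoding). Keys supplied by the caller win, and the full merged hash is also passed to the persistence class's constructor.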

# File 'lib/linnaeus.rb', line 9

def initialize(opts = {})
  options = {
    persistence_class: Persistence,
    stopwords_class: Stopwords,
    skip_stemming: false,
    encoding: 'UTF-8'
  }.merge(opts)

  @db = options[:persistence_class].new(options)
  @stopword_generator = options[:stopwords_class].new
  @skip_stemming = options[:skip_stemming]
  @encoding = options[:encoding]
end
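The constructor is a small dependency-injection pattern. Caller options are merged over defaults, the chosen persistence and stopword classes are instantiated, and the persistence class receives every option, including keys the base class never reads itself. The sketch below reproduces that pattern outside the gem so its effect can be observed. RecordingPersistence, TinyStopwords, LinnaeusLike and the :example_setting key are hypothetical names used only for illustration. They are not part of Linnaeus.

```ruby
# Hypothetical stand-in for Linnaeus::Persistence: it just records its options.
class RecordingPersistence
  attr_reader :options

  def initialize(options = {})
    @options = options # keeps the full merged options hash for inspection
  end
end

# Hypothetical stand-in for Linnaeus::Stopwords.
class TinyStopwords; end

# Hypothetical class reproducing the pattern used by Linnaeus#initialize.
class LinnaeusLike
  attr_reader :db, :stopword_generator, :skip_stemming, :encoding

  def initialize(opts = {})
    # Hash#merge: keys supplied by the caller replace the defaults.
    options = {
      persistence_class: RecordingPersistence,
      stopwords_class: TinyStopwords,
      skip_stemming: false,
      encoding: 'UTF-8'
    }.merge(opts)

    # The injected classes are instantiated here. The persistence class sees
    # every option, including keys this class never reads itself.
    @db = options[:persistence_class].new(options)
    @stopword_generator = options[:stopwords_class].new
    @skip_stemming = options[:skip_stemming]
    @encoding = options[:encoding]
  end
end

obj = LinnaeusLike.new(skip_stemming: true, example_setting: 42)
p obj.skip_stemming                # => true (overridden by the caller)
p obj.encoding                     # => "UTF-8" (default kept)
p obj.db.options[:example_setting] # => 42 (passed through untouched)
```

Because the classes are looked up from the options hash, a test can swap in a stub persistence class without touching the base class.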

Instance Method Details

#count_word_occurrences(text = '') ⇒ Object

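Counts occurrences of words in a single document and returns a Hash mapping each word (stemmed unless skip_stemming is set) to its number of occurrences.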

Parameters

text

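A String representing a document (defaults to ''). The text is encoded to the configured encoding, downcased, and split on whitespace. Each word is then stemmed with the Porter stemmer from the “Stemmer” gem (unless skip_stemming was set), and any stemmed word found in the stopword list is discarded.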

# File 'lib/linnaeus.rb', line 28

def count_word_occurrences(text = '')
  count = {}
  text.encode(@encoding).downcase.split.each do |word|
    stemmed_word = (@skip_stemming) ? word : word.stem_porter
    unless stopwords.include? stemmed_word
      count[stemmed_word] = count[stemmed_word] ? count[stemmed_word] + 1 : 1
    end
  end
  count
end
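The counting loop can be sketched without the gem to show the order of operations: encode, downcase, split on whitespace, stem, then filter the stemmed form against the stopwords. This is a minimal sketch, not the library's code. count_words, STOPWORDS and TOY_STEMMER are hypothetical. The toy stemmer only drops a trailing "s" and is not the Porter algorithm that stem_porter applies. It also uses Hash.new(0) instead of the explicit nil check in the original.

```ruby
require 'set'

# Hypothetical stopword list standing in for the Stopwords class.
STOPWORDS = Set.new(%w[the a of and]).freeze

# Hypothetical toy stemmer (NOT Porter stemming): drops one trailing "s".
TOY_STEMMER = ->(word) { word.sub(/s\z/, '') }

# Sketch of count_word_occurrences; stemmer: nil plays the role of skip_stemming.
def count_words(text = '', stopwords: STOPWORDS, encoding: 'UTF-8', stemmer: nil)
  counts = Hash.new(0) # missing keys read as 0, so no nil check is needed
  text.encode(encoding).downcase.split.each do |word|
    # As in the original, stem first, then test the stemmed form.
    stemmed = stemmer ? stemmer.call(word) : word
    counts[stemmed] += 1 unless stopwords.include?(stemmed)
  end
  counts
end

count_words('The cat and the Hat saw a cat')
# returns {"cat" => 2, "hat" => 1, "saw" => 1}

count_words('Cats and a cat', stemmer: TOY_STEMMER)
# returns {"cat" => 2}: stemming merges "cats" into "cat"
```

Note that String#split with no arguments splits only on whitespace, so punctuation stays attached ("cat." and "cat" are counted separately). One behavioral difference from the original: a hash built with Hash.new(0) returns 0 rather than nil for words that never occurred.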