Class: Linnaeus::Trainer

Inherits:
Linnaeus
Defined in:
lib/linnaeus/trainer.rb

Overview

Add documents to, or remove them from, the Bayesian training corpus.

lt = Linnaeus::Trainer.new(<options hash>)
lt.train 'category', 'a string of text' 
lt.train 'differentcategory', 'another string of text' 
lt.untrain 'category', 'a document we just removed'

Constructor Options

persistence_class

A class implementing persistence. The default (Linnaeus::Persistence) uses Redis.

stopwords_class

A class that emits a set of stopwords. The default is Linnaeus::Stopwords.

skip_stemming

Set to true to skip Porter stemming.

encoding

Force text to use this character set. UTF-8 by default.

redis_host

Passed to persistence class constructor. Defaults to “127.0.0.1”.

redis_port

Passed to persistence class constructor. Defaults to “6379”.

redis_db

Passed to persistence class constructor. Defaults to “0”.

redis_*

See Linnaeus::Persistence for the remaining options, which are passed directly to the Redis client connection.
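As a rough illustration of the options above, the sketch below builds an options hash from the documented defaults and overrides two of them. DEFAULT_OPTIONS and build_options are hypothetical names used only here and are not part of Linnaeus. Symbol keys, a false default for skip_stemming, and the host name 'redis.internal' are all assumptions.

```ruby
# Hypothetical sketch: DEFAULT_OPTIONS and build_options are illustrative
# names, not part of the Linnaeus API. Symbol keys are an assumption.
DEFAULT_OPTIONS = {
  skip_stemming: false,     # assumed default; the docs only say "set to true to skip"
  encoding: 'UTF-8',
  redis_host: '127.0.0.1',
  redis_port: '6379',
  redis_db: '0'
}.freeze

# Merge caller overrides on top of the documented defaults.
def build_options(overrides = {})
  DEFAULT_OPTIONS.merge(overrides)
end

# 'redis.internal' is a made-up host name for the example.
options = build_options(redis_host: 'redis.internal', skip_stemming: true)
```

With the gem installed and Redis reachable, a hash like this would presumably be passed as Linnaeus::Trainer.new(options).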

Instance Method Summary

Methods inherited from Linnaeus

#count_word_occurrences, #initialize

Constructor Details

This class inherits a constructor from Linnaeus

Instance Method Details

#train(categories, text) ⇒ Object

Add a document to the training corpus.

Parameters

categories

A string or array of categories

text

The text of the document, as a string.



# File 'lib/linnaeus/trainer.rb', line 34

def train(categories, text)
  # Accept a single category or an array of categories.
  categories = normalize_categories categories
  @db.add_categories(categories)

  # Count the words once, then add those counts to every category.
  word_occurrences = count_word_occurrences text
  categories.each do |cat|
    @db.increment_word_counts_for_category cat, word_occurrences
  end
end
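
To make the data flow in #train concrete, here is a minimal in-memory sketch that runs without Redis. MemoryStore, count_words, and train_sketch are hypothetical stand-ins. The tokenizer is an assumption (lowercase, split on non-letters) and ignores the stopword and stemming steps that the real count_word_occurrences may apply.

```ruby
# Hypothetical in-memory stand-in for the persistence class. Only the two
# methods that #train calls are implemented, with assumed semantics.
class MemoryStore
  attr_reader :categories, :counts

  def initialize
    @categories = []
    # category => { word => count }, with counts starting at 0
    @counts = Hash.new { |hash, cat| hash[cat] = Hash.new(0) }
  end

  def add_categories(cats)
    @categories |= cats # set union, so repeats are ignored
  end

  def increment_word_counts_for_category(cat, occurrences)
    occurrences.each { |word, n| @counts[cat][word] += n }
  end
end

# Assumed tokenizer: lowercase, then keep runs of ASCII letters.
def count_words(text)
  text.downcase.scan(/[a-z]+/).each_with_object(Hash.new(0)) do |word, tally|
    tally[word] += 1
  end
end

# Mirrors the shape of #train, using the stand-ins above.
def train_sketch(db, categories, text)
  categories = Array(categories).map(&:to_s)
  db.add_categories(categories)
  occurrences = count_words(text)
  categories.each do |cat|
    db.increment_word_counts_for_category(cat, occurrences)
  end
end

store = MemoryStore.new
train_sketch(store, %w[spam promo], 'Buy now, buy cheap')
train_sketch(store, 'spam', 'cheap pills')
```

Because the same word counts are added to every listed category, training one document under two categories doubles the writes but not the tokenizing work.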

#untrain(categories, text) ⇒ Object

Remove a document from the training corpus.

Parameters

categories

A string or array of categories

text

The text of the document, as a string.



# File 'lib/linnaeus/trainer.rb', line 51

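def untrain(categories, text)
  # Accept a single category or an array of categories.
  categories = normalize_categories categories

  # Subtract this document's word counts from every category, then
  # clean up words left empty in that category.
  word_occurrences = count_word_occurrences text
  categories.each do |cat|
    @db.decrement_word_counts_for_category cat, word_occurrences
    @db.cleanup_empty_words_in_category cat
  end
end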
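
The decrement-then-cleanup step in #untrain can be sketched on a plain Hash. Both helpers below are hypothetical: their names echo the persistence calls, but the behavior is assumed, in particular that a word is dropped once its count reaches zero or goes below it.

```ruby
# Hypothetical helpers operating on a plain Hash of word => count.
# The semantics are assumptions, not the Linnaeus::Persistence code.
def decrement_word_counts(counts, occurrences)
  occurrences.each do |word, n|
    counts[word] = counts.fetch(word, 0) - n
  end
end

# Drop words whose count is zero or negative (assumed cleanup rule).
def cleanup_empty_words(counts)
  counts.delete_if { |_word, n| n <= 0 }
end

spam = { 'buy' => 2, 'cheap' => 2, 'pills' => 1 }
removed = { 'cheap' => 1, 'pills' => 1 } # word counts of the untrained document

decrement_word_counts(spam, removed)
cleanup_empty_words(spam)
# spam is now { 'buy' => 2, 'cheap' => 1 }
```

Untraining a document that was never trained would push counts negative in this sketch, which is why the cleanup rule here also removes negative entries.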