Class: Basset::NaiveBayes

Inherits:
Object
Includes:
YamlSerialization
Defined in:
lib/basset/naive_bayes.rb

Overview

A class for running Naive Bayes classification. Training documents are added to the classifier with #add_document. Once they have been added, #classify can be used to classify new documents.

Defined Under Namespace

Classes: FeatureCount

Instance Method Summary

Methods included from YamlSerialization

included, #save_to_file

Constructor Details

#initialize ⇒ NaiveBayes



# File 'lib/basset/naive_bayes.rb', line 11

def initialize
  @number_of_documents = 0
  @number_of_documents_in_class = Hash.new(0)
  @features = []
  reset_cached_probabilities
end

Instance Method Details

#add_document(classification, feature_vector) ⇒ Object

Takes a classification (for example, a string) and a vector of numbered features. Each feature must respond to name and value.



# File 'lib/basset/naive_bayes.rb', line 20

def add_document(classification, feature_vector)
  reset_cached_probabilities

  @number_of_documents_in_class[classification] += 1
  @number_of_documents += 1
  
  feature_vector.each do |feature|
    @features[feature.name] ||= FeatureCount.new
    @features[feature.name].add_count_for_class(feature.value, classification)
  end
end
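
As a rough illustration of the bookkeeping above, the sketch below tallies per-class document counts and per-feature, per-class value totals. `Feature` and `TinyCounts` are hypothetical names invented for this example. The real `FeatureCount` class is not shown on this page, so a plain nested Hash is assumed in its place.

```ruby
# Hypothetical stand-in for a feature: anything responding to #name and #value.
Feature = Struct.new(:name, :value)

# Hypothetical helper that mirrors the counting done by add_document.
# It is not part of Basset.
class TinyCounts
  attr_reader :number_of_documents, :number_of_documents_in_class, :feature_totals

  def initialize
    @number_of_documents = 0
    # Hash.new(0) lets unseen classes start at zero.
    @number_of_documents_in_class = Hash.new(0)
    # feature name => { classification => summed feature values }
    @feature_totals = Hash.new { |hash, name| hash[name] = Hash.new(0) }
  end

  def add_document(classification, feature_vector)
    @number_of_documents_in_class[classification] += 1
    @number_of_documents += 1

    feature_vector.each do |feature|
      @feature_totals[feature.name][classification] += feature.value
    end
  end
end

counts = TinyCounts.new
counts.add_document("spam", [Feature.new(0, 2), Feature.new(1, 1)])
counts.add_document("ham",  [Feature.new(1, 3)])
```

After these two calls, `counts.number_of_documents` is 2 and feature 1 has been seen in both classes, with different totals for each.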

#classify(feature_vector) ⇒ Object

Returns the most likely class for a vector of features, as a [log-probability, classification] pair. Ties are broken at random.



# File 'lib/basset/naive_bayes.rb', line 33

def classify(feature_vector)
  class_probabilities = []
  
  @number_of_documents_in_class.keys.each do |classification|
    class_probability = Math.log10(probability_of_class(classification))
    feature_vector.each do |feature|
      class_probability += Math.log10(probability_of_feature_given_class(feature.name, classification)) * feature.value
    end
    class_probabilities << [class_probability, classification]
  end
  
  # this next bit picks a random item first
  # this covers the case that all the class probabilities are equal and we need to randomly select a class
  max = class_probabilities.pick_random
  class_probabilities.each do |cp|
    max = cp if cp.first > max.first
  end
  max
end
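
The scoring loop above works in log space: each class starts from log10 of its prior and adds value × log10 P(feature | class) for every feature. The sketch below reproduces that arithmetic with hard-coded probabilities. `Feature`, `PRIORS`, `LIKELIHOODS`, and `score` are hypothetical names for this example and are not part of Basset. `Array#sample` stands in for the `pick_random` helper, which is not defined on this page.

```ruby
# Hypothetical stand-in for a feature: anything responding to #name and #value.
Feature = Struct.new(:name, :value)

# Assumed, made-up probabilities for illustration only.
PRIORS = { "spam" => 0.5, "ham" => 0.5 }
LIKELIHOODS = {
  "spam" => { 0 => 0.8, 1 => 0.2 },
  "ham"  => { 0 => 0.1, 1 => 0.9 }
}

# log10 P(class) + sum over features of value * log10 P(feature | class)
def score(classification, feature_vector)
  feature_vector.reduce(Math.log10(PRIORS[classification])) do |sum, feature|
    sum + Math.log10(LIKELIHOODS[classification][feature.name]) * feature.value
  end
end

def classify(feature_vector)
  scored = PRIORS.keys.map { |c| [score(c, feature_vector), c] }

  # Start from a random pair so that ties are broken randomly.
  best = scored.sample
  scored.each { |pair| best = pair if pair.first > best.first }
  best
end

log_probability, label = classify([Feature.new(0, 2), Feature.new(1, 1)])
```

Summing logarithms rather than multiplying raw probabilities avoids floating-point underflow when a document has many features. The comparison between classes is unaffected, since the logarithm is monotonic.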