Class: Basset::NaiveBayes
- Inherits: Object (Object > Basset::NaiveBayes)
- Includes: YamlSerialization
- Defined in: lib/basset/naive_bayes.rb
Overview
A class for Naive Bayes classification. You add labeled documents, each represented as a feature vector, to the classifier with #add_document. Once it is trained, #classify returns the most likely class for a new document.
Defined Under Namespace
Classes: FeatureCount
Instance Method Summary
- #add_document(classification, feature_vector) ⇒ Object
  Takes a classification, which can be a string, and a vector of numbered features.
- #classify(feature_vector) ⇒ Object
  Returns the most likely class given a vector of features.
- #initialize ⇒ NaiveBayes (constructor)
  Returns a new instance of NaiveBayes.
Methods included from YamlSerialization
Constructor Details
#initialize ⇒ NaiveBayes
```ruby
# File 'lib/basset/naive_bayes.rb', line 11
def initialize
  @number_of_documents = 0
  @number_of_documents_in_class = Hash.new(0)
  @features = []
  reset_cached_probabilities
end
```
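The constructor relies on two Ruby defaults. Hash.new(0) returns 0 for a class it has never seen, so `+= 1` works without any setup. `@features` is a plain Array, so feature names serve as integer indexes, which is why features must be "numbered". The minimal sketch below shows both behaviors in core Ruby. The class labels are made up for illustration.

```ruby
# Hash.new(0): a missing key reads as 0, so `+= 1` needs no setup.
docs_per_class = Hash.new(0)
docs_per_class["spam"] += 1
docs_per_class["spam"] += 1
docs_per_class["ham"] += 1

# Reading an unseen key returns the default but does NOT store it,
# so #keys still lists only the classes that were incremented.
unseen = docs_per_class["eggs"]

# An Array indexed by integer feature names grows on demand;
# skipped slots stay nil until they are assigned.
features = []
features[3] ||= :counts_for_feature_3
```

Because a read does not insert a key, `@number_of_documents_in_class.keys` in #classify only iterates over classes that were actually trained.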
Instance Method Details
#add_document(classification, feature_vector) ⇒ Object
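Takes a classification, which can be a string, and a vector of numbered features, and updates the document counts and per-feature counts for that class. Each feature must respond to name, an integer used as an index into the feature table, and value.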
```ruby
# File 'lib/basset/naive_bayes.rb', line 20
def add_document(classification, feature_vector)
  reset_cached_probabilities
  @number_of_documents_in_class[classification] += 1
  @number_of_documents += 1
  feature_vector.each do |feature|
    @features[feature.name] ||= FeatureCount.new
    @features[feature.name].add_count_for_class(feature.value, classification)
  end
end
```
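Training with add_document is pure bookkeeping: it bumps the document counters and accumulates feature values per class, and it computes no probabilities. The sketch below mirrors that bookkeeping outside the library under stated assumptions. It assumes that FeatureCount#add_count_for_class sums feature values per class, since its body is not shown here. `Feature` (a Struct with `name` and `value`) and `CountingSketch`, which uses a nested Hash in place of FeatureCount, are hypothetical stand-ins and not Basset's own classes.

```ruby
# Hypothetical stand-in for a feature: an integer name and a numeric value.
Feature = Struct.new(:name, :value)

class CountingSketch
  attr_reader :number_of_documents, :number_of_documents_in_class, :features

  def initialize
    @number_of_documents = 0
    @number_of_documents_in_class = Hash.new(0)
    # features[name] maps classification => summed feature value
    # (a stand-in for FeatureCount, assumed to sum values per class).
    @features = []
  end

  def add_document(classification, feature_vector)
    @number_of_documents_in_class[classification] += 1
    @number_of_documents += 1
    feature_vector.each do |feature|
      @features[feature.name] ||= Hash.new(0)
      @features[feature.name][classification] += feature.value
    end
  end
end

nb = CountingSketch.new
nb.add_document("spam", [Feature.new(0, 2), Feature.new(1, 1)])
nb.add_document("ham", [Feature.new(1, 3)])
```

After these two calls, feature 0 has only been seen in "spam", and feature 1 holds separate totals for "spam" and "ham".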
#classify(feature_vector) ⇒ Object
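Returns the most likely class given a vector of features. The return value is a two-element array, [log10_score, classification]. When all classes score equally, one is chosen at random.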
```ruby
# File 'lib/basset/naive_bayes.rb', line 33
def classify(feature_vector)
  class_probabilities = []
  @number_of_documents_in_class.keys.each do |classification|
    class_probability = Math.log10(probability_of_class(classification))
    feature_vector.each do |feature|
      class_probability += Math.log10(probability_of_feature_given_class(feature.name, classification)) * feature.value
    end
    class_probabilities << [class_probability, classification]
  end
  # this next bit picks a random item first
  # this covers the case that all the class probabilities are equal and we need to randomly select a class
  max = class_probabilities.pick_random
  class_probabilities.each do |cp|
    max = cp if cp.first > max.first
  end
  max
end
```
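classify scores in log space. For each trained class c it computes log10 P(c) + Σ value_f · log10 P(f | c) and keeps the highest score, which avoids underflow from multiplying many small probabilities. The sketch below reproduces only that scoring and the random tie-break. It takes the probabilities as plain Hashes because this page does not show the bodies of probability_of_class and probability_of_feature_given_class, or any smoothing they apply. The original calls pick_random, which is not a core Ruby method, so the sketch substitutes core Ruby's Array#sample. `score_classes` and `pick_best` are hypothetical names.

```ruby
# priors:         { classification => P(class) }
# likelihoods:    { classification => { feature_name => P(feature | class) } }
# feature_vector: array of [feature_name, value] pairs
def score_classes(priors, likelihoods, feature_vector)
  priors.keys.map do |classification|
    score = Math.log10(priors[classification])
    feature_vector.each do |name, value|
      # Each feature's log-likelihood is weighted by its value (e.g. a count).
      score += Math.log10(likelihoods[classification][name]) * value
    end
    [score, classification]
  end
end

# Start from a random candidate, then replace it only on a strictly
# higher score; if every score ties, the random pick is returned.
def pick_best(scored)
  best = scored.sample
  scored.each { |candidate| best = candidate if candidate.first > best.first }
  best
end

priors = { "spam" => 0.5, "ham" => 0.5 }
likelihoods = {
  "spam" => { 0 => 0.6, 1 => 0.4 },
  "ham"  => { 0 => 0.1, 1 => 0.9 }
}
scored = score_classes(priors, likelihoods, [[0, 2], [1, 1]])
best_score, best_class = pick_best(scored)
```

Like the original, pick_best returns the whole [score, classification] pair, so callers that only need the label take its last element.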