Class: Classifier::Bayes
Instance Method Summary
-
#add_category(category) ⇒ Object
(also: #append_category)
Allows you to add categories to the classifier.
-
#categories ⇒ Object
Provides a list of category names. For example: b.categories => ['This', 'That', 'the_other'].
-
#classifications(text) ⇒ Object
Returns the scores in each category for the provided text.
-
#classify(text) ⇒ Object
Returns the classification of the provided text, which is one of the categories given in the initializer.
-
#initialize(*categories) ⇒ Bayes
constructor
The class can be created with one or more categories, each of which will be initialized and given a training method.
-
#method_missing(name, *args) ⇒ Object
Provides training and untraining methods for the categories specified in Bayes#new. For example: b = Classifier::Bayes.new 'This', 'That', 'the_other'; b.train_this "This text"; b.train_that "That text"; b.untrain_that "That text"; b.train_the_other "The other text".
-
#remove_category(category) ⇒ Object
Allows you to remove categories from the classifier.
-
#respond_to_missing?(name, include_private = false) ⇒ Boolean
-
#train(category, text) ⇒ Object
Provides a general training method for all categories specified in Bayes#new. For example: b = Classifier::Bayes.new 'This', 'That', 'the_other'; b.train :this, "This text"; b.train "that", "That text"; b.train "The other", "The other text".
-
#untrain(category, text) ⇒ Object
Provides an untraining method for all categories specified in Bayes#new. Be very careful with this method.
Constructor Details
#initialize(*categories) ⇒ Bayes
The class can be created with one or more categories, each of which will be initialized and given a training method. E.g.,
b = Classifier::Bayes.new 'Interesting', 'Uninteresting', 'Spam'
# File 'lib/classifier/bayes.rb', line 18
def initialize(*categories)
  @categories = {}
  categories.each { |category| @categories[category.prepare_category_name] = {} }
  @total_words = 0
  @category_counts = Hash.new(0)
  @category_word_count = Hash.new(0)
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Provides training and untraining methods for the categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train_this "This text"
b.train_that "That text"
b.untrain_that "That text"
b.train_the_other "The other text"
# File 'lib/classifier/bayes.rb', line 115
def method_missing(name, *args)
  return super unless name.to_s =~ /(un)?train_(\w+)/

  category = name.to_s.gsub(/(un)?train_(\w+)/, '\2').prepare_category_name
  raise StandardError, "No such category: #{category}" unless @categories.key?(category)

  method = name.to_s.start_with?('untrain_') ? :untrain : :train
  args.each { |text| send(method, category, text) }
end
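The dispatch pattern in the listing above can be tried in isolation. The following is a minimal sketch, not the gem's implementation: ToyTrainer and its log are hypothetical names, category names are simply downcased rather than passed through prepare_category_name, and the regular expression is anchored with \A and \z, which the listing does not do.

```ruby
# Toy stand-in for the method_missing dispatch pattern (hypothetical class).
class ToyTrainer
  attr_reader :log

  def initialize(*categories)
    @categories = categories.map { |c| c.to_s.downcase }
    @log = []
  end

  # Record calls instead of updating word counts.
  def train(category, text)
    @log << [:train, category, text]
  end

  def untrain(category, text)
    @log << [:untrain, category, text]
  end

  # Route train_<category> and untrain_<category> to train/untrain.
  def method_missing(name, *args)
    match = name.to_s.match(/\A(un)?train_(\w+)\z/)
    return super unless match

    category = match[2]
    raise StandardError, "No such category: #{category}" unless @categories.include?(category)

    action = match[1] ? :untrain : :train
    args.each { |text| send(action, category, text) }
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    name.to_s.match?(/\A(un)?train_(\w+)\z/) || super
  end
end

toy = ToyTrainer.new('spam', 'ham')
toy.train_spam 'buy now', 'free money' # one train call per argument
toy.untrain_ham 'see you at lunch'
p toy.log
```

Every argument passed to a dynamic method is handled as a separate text, so a single call can train several texts at once.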
Instance Method Details
#add_category(category) ⇒ Object Also known as: append_category
Allows you to add categories to the classifier. For example:
b.add_category "Not spam"
WARNING: Adding categories to a trained classifier will result in an undertrained category that will tend to match more criteria than the trained selective categories. In short, try to define all of your categories at initialization.
# File 'lib/classifier/bayes.rb', line 150
def add_category(category)
  @categories[category.prepare_category_name] = {}
end
#categories ⇒ Object
Provides a list of category names. For example:
b.categories
=> ['This', 'That', 'the_other']
# File 'lib/classifier/bayes.rb', line 136
def categories
  @categories.keys.collect(&:to_s)
end
#classifications(text) ⇒ Object
Returns the scores in each category for the provided text. E.g.,
b.classifications "I hate bad words and you"
=> {"Uninteresting"=>-12.6997928013932, "Interesting"=>-18.4206807439524}
The largest of these scores (the one closest to 0) is the one picked out by #classify.
# File 'lib/classifier/bayes.rb', line 78
def classifications(text)
  words = text.word_hash.keys
  training_count = @category_counts.values.sum.to_f
  vocab_size = [@categories.values.flat_map(&:keys).uniq.size, 1].max

  @categories.to_h do |category, category_words|
    smoothed_total = ((@category_word_count[category] || 0) + vocab_size).to_f

    # Laplace smoothing: P(word|category) = (count + α) / (total + α * V)
    word_score = words.sum { |w| Math.log(((category_words[w] || 0) + 1) / smoothed_total) }

    prior_score = Math.log((@category_counts[category] || 0.1) / training_count)

    [category.to_s, word_score + prior_score]
  end
end
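To make the scoring concrete, here is a small sketch of the same Laplace-smoothed log score (with α = 1) on hand-written counts. The counts, the docs table, and the log_score helper are all hypothetical, chosen only for illustration. As in the listing, each distinct word of the input contributes once.

```ruby
# Hypothetical per-category word counts standing in for trained data.
counts = {
  'Interesting'   => { 'good' => 3, 'words' => 1 },
  'Uninteresting' => { 'bad' => 2, 'hate' => 2 }
}
# Hypothetical number of texts trained into each category.
docs = { 'Interesting' => 2, 'Uninteresting' => 2 }

vocab_size = counts.values.flat_map(&:keys).uniq.size # V: distinct words overall
total_docs = docs.values.sum.to_f

# Sum of log((count + 1) / (total + V)) over the words, plus the log prior.
def log_score(words, word_counts, doc_count, total_docs, vocab_size)
  smoothed_total = (word_counts.values.sum + vocab_size).to_f
  word_score = words.sum { |w| Math.log((word_counts.fetch(w, 0) + 1) / smoothed_total) }
  word_score + Math.log(doc_count / total_docs)
end

words = %w[hate bad words]
scores = counts.to_h do |category, word_counts|
  [category, log_score(words, word_counts, docs[category], total_docs, vocab_size)]
end

scores.each { |category, score| puts format('%-14s %.4f', category, score) }
```

With these counts, "Uninteresting" receives the score closer to 0, because two of the three words were seen in that category. Working in logarithms keeps the product of many small probabilities from underflowing.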
#classify(text) ⇒ Object
Returns the classification of the provided text, which is one of the categories given in the initializer. E.g.,
b.classify "I hate bad words and you"
=> 'Uninteresting'
# File 'lib/classifier/bayes.rb', line 100
def classify(text)
  best = classifications(text).min_by { |a| -a[1] }
  raise StandardError, 'No classifications available' unless best

  best.first.to_s
end
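The selection step can be checked by hand against the scores shown for #classifications. This sketch uses only a plain Hash; the variable names are illustrative.

```ruby
# Scores copied from the #classifications example on this page.
scores = { 'Uninteresting' => -12.6997928013932, 'Interesting' => -18.4206807439524 }

# min_by on the negated score, as in the listing, selects the largest score.
by_negated_min = scores.min_by { |pair| -pair[1] }

# max_by on the score itself selects the same pair.
best_category, best_score = scores.max_by { |_category, score| score }

puts best_category

# With no categories there is nothing to pick; min_by returns nil,
# which is the case the listing guards against before calling best.first.
empty_result = {}.min_by { |pair| -pair[1] }
```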
#remove_category(category) ⇒ Object
Allows you to remove categories from the classifier. For example:
b.remove_category "Spam"
WARNING: Removing categories from a trained classifier will result in the loss of all training data for that category. Make sure you really want to do this before calling this method.
# File 'lib/classifier/bayes.rb', line 165
def remove_category(category)
  category = category.prepare_category_name
  raise StandardError, "No such category: #{category}" unless @categories.key?(category)

  @total_words -= @category_word_count[category].to_i

  @categories.delete(category)
  @category_counts.delete(category)
  @category_word_count.delete(category)
end
#respond_to_missing?(name, include_private = false) ⇒ Boolean
# File 'lib/classifier/bayes.rb', line 126
def respond_to_missing?(name, include_private = false)
  !!(name.to_s =~ /(un)?train_(\w+)/) || super
end
#train(category, text) ⇒ Object
Provides a general training method for all categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train :this, "This text"
b.train "that", "That text"
b.train "The other", "The other text"
# File 'lib/classifier/bayes.rb', line 34
def train(category, text)
  category = category.prepare_category_name
  @category_counts[category] += 1
  text.word_hash.each do |word, count|
    @categories[category][word] ||= 0
    @categories[category][word] += count
    @total_words += count
    @category_word_count[category] += count
  end
end
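The bookkeeping done by #train can be followed with plain hashes. In this sketch, simple_word_hash is a hypothetical tokenizer (lowercase, letters only, no stemming or stop-word removal); the gem's String#word_hash is not shown on this page and may behave differently.

```ruby
# Hypothetical tokenizer: maps each lowercase word to its number of occurrences.
def simple_word_hash(text)
  text.downcase.scan(/[a-z]+/).tally
end

categories = { 'Spam' => Hash.new(0) } # word => count, per category
category_counts = Hash.new(0)          # texts trained per category
category_word_count = Hash.new(0)      # words trained per category
total_words = 0                        # words trained overall

['buy cheap pills', 'cheap cheap offer'].each do |text|
  category_counts['Spam'] += 1
  simple_word_hash(text).each do |word, count|
    categories['Spam'][word] += count
    category_word_count['Spam'] += count
    total_words += count
  end
end

p categories['Spam']
p category_counts['Spam']
p total_words
```

After both texts, "cheap" has been counted three times, the category has seen two texts, and the per-category and overall word totals are both six.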
#untrain(category, text) ⇒ Object
Provides an untraining method for all categories specified in Bayes#new. Be very careful with this method.
For example:
b = Classifier::Bayes.new 'This', 'That', 'the_other'
b.train :this, "This text"
b.untrain :this, "This text"
# File 'lib/classifier/bayes.rb', line 54
def untrain(category, text)
  category = category.prepare_category_name
  @category_counts[category] -= 1
  text.word_hash.each do |word, count|
    next unless @total_words >= 0

    orig = @categories[category][word] || 0
    @categories[category][word] ||= 0
    @categories[category][word] -= count
    if @categories[category][word] <= 0
      @categories[category].delete(word)
      count = orig
    end
    @category_word_count[category] -= count if @category_word_count[category] >= count
    @total_words -= count
  end
end
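One way to see why untraining calls for care is a single-category toy model. ToyCounts below is a hypothetical simplification, not the gem's code: it uses a naive tokenizer and clamps each removal to the stored count, which mirrors the effect of the count = orig branch in the listing.

```ruby
# Hypothetical single-category model of the train/untrain bookkeeping.
class ToyCounts
  attr_reader :words, :docs, :word_total

  def initialize
    @words = {}
    @docs = 0
    @word_total = 0
  end

  def train(text)
    @docs += 1
    tokenize(text).each do |word, count|
      @words[word] = @words.fetch(word, 0) + count
      @word_total += count
    end
  end

  def untrain(text)
    @docs -= 1 # decremented even if the text was never trained
    tokenize(text).each do |word, count|
      have = @words.fetch(word, 0)
      removed = [count, have].min # never remove more than was stored
      if have - removed > 0
        @words[word] = have - removed
      else
        @words.delete(word)
      end
      @word_total -= removed
    end
  end

  private

  # Naive tokenizer; the gem's String#word_hash may differ.
  def tokenize(text)
    text.downcase.scan(/[a-z]+/).tally
  end
end

toy = ToyCounts.new
toy.train 'cheap pills'
toy.train 'cheap offer'
toy.untrain 'cheap offer' # exact inverse of the second train call
after_round_trip = [toy.words.dup, toy.docs, toy.word_total]

toy.untrain 'hello'       # this text was never trained
after_bad_untrain = [toy.words.dup, toy.docs, toy.word_total]
```

Untraining exactly what was trained restores the earlier state. Untraining text that was never trained leaves the word counts alone but still lowers the text count, so the model now reports zero trained texts while words from 'cheap pills' remain.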