Class: Classifier::Bayes
Instance Method Summary
-
#add_category(category) ⇒ Object
(also: #append_category)
Allows you to add categories to the classifier.
-
#categories ⇒ Object
Provides a list of category names. For example: b.categories => ['This', 'That', 'the_other'].
-
#classifications(text) ⇒ Object
Returns the scores in each category for the provided text.
-
#classify(text) ⇒ Object
Returns the classification of the provided text, which is one of the categories given in the initializer.
-
#initialize(lang, *categories) ⇒ Bayes
constructor
The class can be created with one or more categories, each of which will be initialized and given a training method.
-
#method_missing(name, *args) ⇒ Object
Provides training and untraining methods for the categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'en', 'This', 'That', 'the_other'
b.train_this "This text"
b.train_that "That text"
b.untrain_that "That text"
b.train_the_other "The other text"
-
#train(category, text) ⇒ Object
Provides a general training method for all categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'en', 'This', 'That', 'the_other'
b.train :this, "This text"
b.train "that", "That text"
b.train "The other", "The other text"
-
#untrain(category, text) ⇒ Object
Provides an untraining method for all categories specified in Bayes#new. Be very careful with this method.
Constructor Details
#initialize(lang, *categories) ⇒ Bayes
The class is created with a language followed by one or more categories. The language is passed (lower-cased) to Lingua::Stemmer, and each category is initialized and given a training method. E.g.,
b = Classifier::Bayes.new 'en', 'Interesting', 'Uninteresting', 'Spam'
# File 'lib/classifier/bayes.rb', line 13
def initialize(lang, *categories)
  #@categories = Hash.new
  #categories.each { |category| @categories[category.prepare_category_name] = Hash.new }
  # RedisStore.total_words = 0
  @categories = RedisStore.new lang, categories
  @categories.init_total
  @stemmer = Lingua::Stemmer.new(:language => lang.downcase)
end
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
Provides training and untraining methods for the categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'en', 'This', 'That', 'the_other'
b.train_this "This text"
b.train_that "That text"
b.untrain_that "That text"
b.train_the_other "The other text"
# File 'lib/classifier/bayes.rb', line 123
def method_missing(name, *args)
  category = name.to_s.gsub(/(un)?train_([\w]+)/, '\2').prepare_category_name
  # categories.has_key?(key)
  if @categories.names.include? category
    args.each { |text| eval("#{$1}train(category, text)") }
  elsif name.to_s =~ /(un)?train_([\w]+)/
    raise StandardError, "No such category: #{category}"
  else
    super  #raise StandardError, "No such method: #{name}"
  end
end
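The dispatch pattern in the listing above can be illustrated in isolation. The following is only a sketch, not the library's code: TinyTrainer, its log, and the normalize helper are hypothetical names, and normalize merely assumes that category names are lower-cased with spaces turned into underscores (the real prepare_category_name is not shown on this page). It swaps the eval call for public_send and adds respond_to_missing?, which is a common companion to method_missing.

```ruby
# Hypothetical stand-in for the classifier; it only records calls.
class TinyTrainer
  attr_reader :log

  def initialize(*categories)
    @categories = categories.map { |c| normalize(c) }
    @log = []
  end

  def train(category, text)
    @log << [:train, normalize(category), text]
  end

  def untrain(category, text)
    @log << [:untrain, normalize(category), text]
  end

  # Routes train_<category> and untrain_<category> to train/untrain.
  def method_missing(name, *args)
    match = /\A(un)?train_(\w+)\z/.match(name.to_s)
    return super unless match

    category = normalize(match[2])
    unless @categories.include?(category)
      raise StandardError, "No such category: #{category}"
    end

    # match[1] is "un" or nil, so this calls either untrain or train.
    args.each { |text| public_send("#{match[1]}train", category, text) }
  end

  def respond_to_missing?(name, include_private = false)
    match = /\A(un)?train_(\w+)\z/.match(name.to_s)
    (match && @categories.include?(normalize(match[2]))) || super
  end

  private

  # Assumed normalization: lower-case, spaces to underscores.
  def normalize(name)
    name.to_s.downcase.tr(' ', '_')
  end
end

t = TinyTrainer.new('This', 'That', 'the_other')
t.train_this 'This text'
t.untrain_that 'That text'
```

Anchoring the pattern with \A and \z means that only names of exactly the form train_x or untrain_x are intercepted; anything else falls through to super and raises the usual NoMethodError.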
Instance Method Details
#add_category(category) ⇒ Object Also known as: append_category
Allows you to add categories to the classifier. For example:
b.add_category "Not spam"
WARNING: Adding categories to a trained classifier will result in an undertrained category that will tend to match more criteria than the trained, selective categories. In short, try to declare all of your categories at initialization.
# File 'lib/classifier/bayes.rb', line 153
def add_category(category)
  @categories[category.prepare_category_name] = Hash.new
end
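To see why an undertrained category tends to win, it may help to apply the per-word formula used by #classifications, log(s / total), where s falls back to 0.1 for a word the category has never seen. The sketch below is a worked illustration only; word_score and the two totals are hypothetical values chosen for the example.

```ruby
# Score contribution of a single word, following the formula in
# #classifications: log(count / total), with 0.1 used for unseen words.
def word_score(count, total)
  s = count.nil? ? 0.1 : count
  Math.log(s / total.to_f)
end

well_trained_total = 1000 # hypothetical word total of a trained category
new_category_total = 5    # hypothetical word total of a freshly added one

unseen_in_trained = word_score(nil, well_trained_total) # log(0.1 / 1000)
unseen_in_new     = word_score(nil, new_category_total) # log(0.1 / 5)

puts unseen_in_trained
puts unseen_in_new
```

Because the total sits in the denominator, the same unseen word is penalized far less in the small category, so its score ends up closer to 0.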
#categories ⇒ Object
Provides a list of category names. For example:
b.categories
=> ['This', 'That', 'the_other']
# File 'lib/classifier/bayes.rb', line 140
def categories # :nodoc:
  @categories
end
#classifications(text) ⇒ Object
Returns the scores in each category for the provided text. E.g.,
b.classifications "I hate bad words and you"
=> {"Uninteresting"=>-12.6997928013932, "Interesting"=>-18.4206807439524}
The largest of these scores (the one closest to 0) is the one picked out by #classify.
# File 'lib/classifier/bayes.rb', line 83
def classifications(text)
  score = Hash.new
  # actual categories saved in the beggining but each do |category|
  @categories.each do |category, category_words|
    score[category.to_s] = 0
    # total = category_words.values.inject(0) {|sum, element| sum+element}
    begin
      total = category_words.inject(0) { |sum, element| sum + element }
    rescue
      raise "Bayes needs to be trained before trying to classify"
    end
    text.word_hash(@stemmer).each do |word, count|
      #s = category_words.has_key?(word) ? category_words[word] : 0.1
      s = @categories.has_word?(category, word) ? @categories.get(category, word) : 0.1
      score[category.to_s] += Math.log(s/total.to_f)
    end
  end
  return score
end
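The scoring loop can be sketched without Redis or a stemmer. This is an illustration under stated assumptions, not the library's implementation: score_categories is a hypothetical name, counts is a plain nested hash standing in for the store, and words is assumed to be an already tokenized, stemmed list of distinct words. Unlike the listing, the sketch raises explicitly when a category has no words at all.

```ruby
# counts: { category => { word => count } }, a hypothetical in-memory
# stand-in for the Redis-backed store.
# words:  distinct, already stemmed words of the text to score.
def score_categories(counts, words)
  counts.each_with_object({}) do |(category, category_words), score|
    total = category_words.values.sum
    raise 'Bayes needs to be trained before trying to classify' if total.zero?

    # One log term per word; unseen words count as 0.1 occurrences.
    score[category] = words.sum do |word|
      s = category_words.fetch(word, 0.1)
      Math.log(s / total.to_f)
    end
  end
end

counts = {
  'Interesting'   => { 'good' => 3, 'word' => 1 },
  'Uninteresting' => { 'bad' => 2, 'word' => 2 }
}
scores = score_categories(counts, %w[bad word])
puts scores
```

Here "bad" is unseen in 'Interesting', so that category takes the 0.1 fallback and ends up with the lower (more negative) score.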
#classify(text) ⇒ Object
Returns the classification of the provided text, which is one of the categories given in the initializer. E.g.,
b.classify "I hate bad words and you"
=> 'Uninteresting'
# File 'lib/classifier/bayes.rb', line 111
def classify(text)
  (classifications(text).sort_by { |a| -a[1] })[0][0]
end
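The selection step amounts to taking the key with the largest score. As a small sketch using the scores from the #classifications example, the sort-based form in the listing and Enumerable#max_by give the same answer here; example_scores is just an illustrative variable name.

```ruby
example_scores = {
  'Uninteresting' => -12.6997928013932,
  'Interesting'   => -18.4206807439524
}

# Sort descending by score and take the first key, as in the listing.
by_sort = example_scores.sort_by { |_, score| -score }[0][0]

# Same idea without sorting the whole hash.
by_max = example_scores.max_by { |_, score| score }.first

puts by_sort
puts by_max
```

The two forms could differ only when several categories tie for the top score, since neither guarantees a particular tie-breaking order.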
#train(category, text) ⇒ Object
Provides a general training method for all categories specified in Bayes#new. For example:
b = Classifier::Bayes.new 'en', 'This', 'That', 'the_other'
b.train :this, "This text"
b.train "that", "That text"
b.train "The other", "The other text"
# File 'lib/classifier/bayes.rb', line 29
def train(category, text)
  category = category.prepare_category_name
  text.word_hash(@stemmer).each do |word, count|
    # @categories[category][word] ||= 0
    @categories.init(category, word)
    # @categories[category][word] += count
    @categories.incr(category, word, count)
    # @total_words += count
    @categories.incr_total(count)
  end
end
#untrain(category, text) ⇒ Object
Provides an untraining method for all categories specified in Bayes#new. Be very careful with this method.
For example:
b = Classifier::Bayes.new 'en', 'This', 'That', 'the_other'
b.train :this, "This text"
b.untrain :this, "This text"
# File 'lib/classifier/bayes.rb', line 51
def untrain(category, text)
  category = category.prepare_category_name
  text.word_hash(@stemmer).each do |word, count|
    # @total_words >= 0
    if @categories.total_words >= 0
      # orig = @categories[category][word]
      orig = @categories.get(category,word)
      # @categories[category][word] ||= 0
      @categories.init(category, word)
      # @categories[category][word] -= count
      @categories.decr(category, word, count)
      #if @categories[category][word] <= 0
      if @categories.get(category,word) <= 0
        # @categories[category].delete(word)
        @categories.remove(category,word)
        count = orig
      end
      #@total_words -= count
      @categories.decr_total(count)
    end
  end
end
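The bookkeeping behind training and untraining can be sketched with plain hashes, mirroring the commented-out lines in the listings. CountStore is a hypothetical in-memory stand-in, not the Redis-backed store, and it takes ready-made word counts instead of calling word_hash. It shows why untraining deserves care: removing more occurrences than were stored deletes the word and subtracts only the original count from the total.

```ruby
# Hypothetical in-memory counterpart of the store; names are illustrative.
class CountStore
  attr_reader :counts, :total_words

  def initialize
    @counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
    @total_words = 0
  end

  # word_counts: { word => occurrences }, assumed to resemble word_hash output.
  def train(category, word_counts)
    word_counts.each do |word, count|
      @counts[category][word] += count
      @total_words += count
    end
  end

  def untrain(category, word_counts)
    word_counts.each do |word, count|
      orig = @counts[category][word]
      @counts[category][word] -= count
      if @counts[category][word] <= 0
        @counts[category].delete(word)
        count = orig # subtract only what was actually stored
      end
      @total_words -= count
    end
  end
end

store = CountStore.new
store.train('this', { 'text' => 2, 'word' => 1 })
store.untrain('this', { 'text' => 5 })
puts store.counts
puts store.total_words
```

After the untrain call, "text" is gone from the category and the total drops by 2 rather than 5, so the total stays consistent with the remaining counts.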