nb


Yet another Naive Bayes library, with support for in-memory and Redis backends.

Installation

Add this line to your application's Gemfile:

gem 'nb'

And then execute:

$ bundle

Or install it yourself as:

$ gem install nb

Usage

require 'nb'

classifier = NaiveBayes::Classifier.new :love, :hate

classifier.train :love, 'I', 'love', 'you'
classifier.train :hate, 'I', 'hate', 'you'

classifier.classifications(*%w{ I love you }) # => [[:love, 0.5], [:hate, 0.25]]
classifier.classify(*%w{ I love you })        # => [:love, 0.5]
classifier.classify(*%w{ love })              # => [:love, 0.5]
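To show roughly how token counts turn into scores like these, here is a self-contained sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, scored in log space and normalised with log-sum-exp. `TinyBayes` is a hypothetical class for illustration, not the gem's implementation; its numbers differ from the gem's output above (which, for example, does not sum to 1).

```ruby
# Hypothetical minimal Naive Bayes classifier, for illustration only.
# It is NOT the nb gem's implementation.
class TinyBayes
  def initialize(*categories)
    @categories = categories
    @token_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # category => { token => count }
    @doc_counts = Hash.new(0)                               # category => trained documents
    @vocabulary = {}                                        # every token seen, for smoothing
  end

  def train(category, *tokens)
    @doc_counts[category] += 1
    tokens.each do |token|
      @token_counts[category][token] += 1
      @vocabulary[token] = true
    end
  end

  # Returns [[category, posterior], ...] sorted by descending posterior.
  def classifications(*tokens)
    total_docs = @doc_counts.values.sum.to_f
    vocab_size = @vocabulary.size
    log_scores = @categories.map do |category|
      counts = @token_counts[category]
      total_tokens = counts.values.sum
      # log P(category) + sum of log P(token | category), add-one smoothed
      score = Math.log(@doc_counts[category] / total_docs)
      tokens.each do |token|
        score += Math.log((counts[token] + 1.0) / (total_tokens + vocab_size))
      end
      [category, score]
    end
    # Normalise with log-sum-exp so the posteriors sum to 1 without underflow
    max = log_scores.map(&:last).max
    norm = log_scores.sum { |_, s| Math.exp(s - max) }
    log_scores.map { |c, s| [c, Math.exp(s - max) / norm] }.sort_by { |_, p| -p }
  end

  def classify(*tokens)
    classifications(*tokens).first
  end
end

bayes = TinyBayes.new(:love, :hate)
bayes.train :love, 'I', 'love', 'you'
bayes.train :hate, 'I', 'hate', 'you'
bayes.classify(*%w{ I love you }).first # => :love (posterior about 2/3 vs 1/3 for :hate)
```

Working in log space keeps long token lists from multiplying many small probabilities down to 0.0.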

View the top tokens of a category

classifier.top_tokens_of_category(:spam)

Sample output from a spam classifier trained on Chinese text:

+------------+------+--------------------+
| 学生       | 1966 | 0.9995149465854383 |
| 多劳多得   | 1953 | 0.999511719439795  |
| 党         | 1517 | 0.9993714712416684 |
| 结         | 1327 | 0.9992815430836995 |
| 工资       | 1213 | 0.9992140742313297 |
| 不等       | 1135 | 0.999160108836817  |
| 诚聘       | 1107 | 0.9991388832706672 |
| 咨询       | 1095 | 0.9991294545902496 |
| 加入       | 1071 | 0.9991099639327047 |
| 限制       | 1046 | 0.9990887109454397 |
| 50         | 1041 | 0.9990843379645474 |
| 上网       | 1020 | 0.9990655037161098 |
| 流动资金   | 952  | 0.9989988208099915 |
| 曰         | 902  | 0.9989433817121107 |
| 办公室     | 861  | 0.9988931222482719 |
| 职员       | 827  | 0.9988476682254364 |
| 绝对       | 823  | 0.9988420740701035 |
+------------+------+--------------------+
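The sketch below shows one way such a ranking could be built from per-category token counts. The helper `top_tokens` is hypothetical, and so is its score (the share of a token's occurrences that fall in the given category); the gem's third column may be computed differently.

```ruby
# Hypothetical helper, not part of the nb gem.
# token_counts: { category => { token => count } }
# Returns [[token, count, score], ...] for the n most frequent tokens of
# category, where score is the fraction of the token's total occurrences
# (across all categories) that were seen in this category.
def top_tokens(token_counts, category, n = 20)
  token_counts.fetch(category, {})
              .sort_by { |token, count| [-count, token] }
              .first(n)
              .map do |token, count|
                total = token_counts.values.sum { |counts| counts.fetch(token, 0) }
                [token, count, count.to_f / total]
              end
end

counts = {
  spam: { 'cheap' => 3, 'money' => 2, 'hello' => 1 },
  ham:  { 'hello' => 3, 'money' => 1 }
}
p top_tokens(counts, :spam, 2)
```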

Use the Redis backend

classifier = NaiveBayes::Classifier.new(:spam, :ham, backend: :redis, host: 'localhost', port: 30000)

It creates 2 + N keys in Redis, where N is the number of categories:

127.0.0.1:30000> keys *
1) "nb:hash:tokens_count:ham"
2) "nb:hash:tokens_count:spam"
3) "nb:set:categories"
4) "nb:hash:categories_count"
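The "2 + N" count follows from this layout: one set of category names, one hash of per-category counts, and one token-count hash per category. The sketch below simulates that layout with plain Ruby hashes. `FakeRedisLayout` is hypothetical, and the idea that training maps onto Redis operations like `SADD` and `HINCRBY` is an assumption, not a description of the gem's code.

```ruby
require 'set'

# Hypothetical in-memory stand-in for the Redis key layout shown above.
# Key names come from the `keys *` output; the update logic is assumed.
class FakeRedisLayout
  attr_reader :store

  def initialize
    @store = {}
  end

  def train(category, *tokens)
    # "nb:set:categories": every category name (like SADD)
    (@store['nb:set:categories'] ||= Set.new) << category.to_s
    # "nb:hash:categories_count": presumably trained documents per category (like HINCRBY)
    cat_counts = (@store['nb:hash:categories_count'] ||= Hash.new(0))
    cat_counts[category.to_s] += 1
    # one "nb:hash:tokens_count:<category>" hash per category (like HINCRBY)
    token_counts = (@store["nb:hash:tokens_count:#{category}"] ||= Hash.new(0))
    tokens.each { |t| token_counts[t] += 1 }
  end
end

layout = FakeRedisLayout.new
layout.train :spam, 'cheap', 'money'
layout.train :ham, 'hello'
layout.store.keys.size # => 4, i.e. 2 + N with N = 2 categories
```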

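Default category

Set a default category to fall back on when the probability of every category is too low:

@classifier = NaiveBayes::Classifier.new :spam, :ham
@classifier.default_category = :ham

Example log output from an application using this setting. When the scores are usable, the most probable category wins:

bayes filter mark as spam: false
bayes classifications: [[:ham, 5.044818725004143e-80], [:spam, 1.938475275819746e-119]]

When every probability has dropped to 0.0, the default category (:ham) is used, so the message is still not marked as spam:

bayes filter mark as spam: false
bayes classifications: [[:spam, 0.0], [:ham, 0.0]]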
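Multiplying many per-token probabilities, each below 1, can underflow a double to 0.0, as in the second log line; at that point the categories tie and the first one listed would otherwise win. The helper below is a hypothetical sketch of the fallback idea, not the gem's API; the `threshold` parameter is an assumption.

```ruby
# Hypothetical fallback helper, not the nb gem's API.
# classifications: [[category, probability], ...] as in the log lines above.
# Returns the most probable category, or default_category when the best
# probability is not above threshold (e.g. everything underflowed to 0.0).
def classify_with_default(classifications, default_category, threshold = 0.0)
  best_category, best_probability = classifications.max_by { |_, p| p }
  if default_category && best_probability.to_f <= threshold
    [default_category, best_probability.to_f]
  else
    [best_category, best_probability]
  end
end

classify_with_default([[:ham, 5.044818725004143e-80], [:spam, 1.938475275819746e-119]], :ham)
# => [:ham, 5.044818725004143e-80]
classify_with_default([[:spam, 0.0], [:ham, 0.0]], :ham)
# => [:ham, 0.0]
```

Without a default, `max_by` would return the first of the tied entries (`:spam` here).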

Credits

Contributing

  1. Fork it ( https://github.com/forresty/nb/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Changelog

0.1.1 / 2014-12-15

  • Fix the Redis backend

0.1.0 / 2014-12-15

  • Initial implementation of the Redis backend