Class: Basset::Classifier
- Inherits:
- Object
- Object
- Basset::Classifier
- Includes:
- YamlSerialization
- Defined in:
- lib/basset/classifier.rb
Overview
Classifier wraps up all of the operations spread between Document and friends, FeatureExtractor, FeatureSelector, and specific classifiers such as NaiveBayes into one convenient interface.
Direct Known Subclasses
Constant Summary
- DEFAULTS =
{:type => "naive_bayes", :doctype => "document"}
Instance Attribute Summary
- #doctype ⇒ Object (readonly)
Returns the value of attribute doctype.
- #engine ⇒ Object (readonly)
Returns the value of attribute engine.
Instance Method Summary
- #==(other) ⇒ Object
- #classify(text) ⇒ Object
Classifies text based on training.
- #initialize(opts = {}) ⇒ Classifier (constructor)
Create a new classifier object.
- #similarity_score(classification, text) ⇒ Object
Gives a numeric value for the similarity of text to previously seen texts of class classification.
- #train(classification, *texts) ⇒ Object
Trains the classifier with texts of class classification.
- #train_iterative(classification, text) ⇒ Object
Trains the classifier on a text repeatedly until the classifier recognizes it as being in class classification (up to a maximum of 5 retrainings).
Methods included from YamlSerialization
Constructor Details
#initialize(opts = {}) ⇒ Classifier
Create a new classifier object. You can specify the type of classifier and the kind of documents with the options. The defaults are :type => :naive_bayes and :doctype => :document. There is also a uri_document doctype, e.g. opts: {:type => :naive_bayes, :doctype => :uri_document}
# File 'lib/basset/classifier.rb', line 22
def initialize(opts={})
  @engine = constanize_opt(opts[:type] || DEFAULTS[:type]).new
  @doctype = constanize_opt(opts[:doctype] || DEFAULTS[:doctype])
end
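The body of constanize_opt is not shown on this page, so the following is only a sketch of how an option such as "naive_bayes" or :uri_document might be turned into a class. The Sketch module, its placeholder classes, and the resolve helper are all hypothetical and are not part of Basset.

```ruby
# Hypothetical stand-ins; the real classes live in the Basset namespace.
module Sketch
  class NaiveBayes; end
  class Document; end
  class UriDocument; end

  DEFAULTS = { :type => "naive_bayes", :doctype => "document" }

  # Assumed behavior: accept a string or symbol in snake_case and
  # return the constant with the matching CamelCase name.
  def self.resolve(opt)
    class_name = opt.to_s.split("_").map(&:capitalize).join
    const_get(class_name)
  end
end

opts = { :doctype => :uri_document }
engine_class  = Sketch.resolve(opts[:type] || Sketch::DEFAULTS[:type])
doctype_class = Sketch.resolve(opts[:doctype] || Sketch::DEFAULTS[:doctype])

puts engine_class   # prints Sketch::NaiveBayes
puts doctype_class  # prints Sketch::UriDocument
```

Because the option is normalized with to_s first, strings and symbols behave the same in this sketch, which would explain why DEFAULTS can hold strings while the documented examples use symbols.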
Instance Attribute Details
#doctype ⇒ Object (readonly)
Returns the value of attribute doctype.
# File 'lib/basset/classifier.rb', line 15
def doctype
  @doctype
end
#engine ⇒ Object (readonly)
Returns the value of attribute engine.
# File 'lib/basset/classifier.rb', line 15
def engine
  @engine
end
Instance Method Details
#==(other) ⇒ Object
# File 'lib/basset/classifier.rb', line 63
def ==(other)
  other.is_a?(self.class) && other.engine == engine && other.doctype == doctype
end
#classify(text) ⇒ Object
Classifies text based on training.
# File 'lib/basset/classifier.rb', line 50
def classify(text)
  classify_features(features_of(text)).last
end
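The call to .last suggests that classify_features returns a pair whose final element is the winning class. That is an inference from the code above, not something this page states. A minimal sketch under that assumption, with hypothetical scores and helper names:

```ruby
# Hypothetical per-class scores; higher means a closer match.
SCORES = { :spam => -0.41, :ham => -0.95 }

# Assumed return shape: a [score, classification] pair for the best class.
def best_match(scores)
  scores.map { |classification, score| [score, classification] }
        .max_by { |score, _classification| score }
end

# Mirrors the shape of Classifier#classify: keep only the class label.
def pick_class(scores)
  best_match(scores).last
end

p pick_class(SCORES)  # prints :spam
```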
#similarity_score(classification, text) ⇒ Object
Gives a numeric value for the similarity of text to previously seen texts of class classification. For a Naive Bayes filter, this will be the log10 of the probabilities of each token in text occurring in a text of class classification, normalized for the number of tokens.
# File 'lib/basset/classifier.rb', line 59
def similarity_score(classification, text)
  similarity_score_for_features(classification, features_of(text))
end
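The Naive Bayes description above can be illustrated with a small standalone calculation. This is a sketch only: the token counts are made up, the add-one smoothing is an assumption (the page does not say how Basset handles unseen tokens), and the helper names are hypothetical.

```ruby
# Assumed smoothing: add one to every count so that an unseen token
# never produces log10(0).
def token_probability(token, counts, total)
  (counts.fetch(token, 0) + 1).to_f / (total + counts.size)
end

# Sum the log10 probability of each token, then divide by the number
# of tokens so that long and short texts are comparable.
def normalized_log_score(tokens, counts, total)
  return 0.0 if tokens.empty?
  log_sum = tokens.sum { |t| Math.log10(token_probability(t, counts, total)) }
  log_sum / tokens.size
end

# Hypothetical token counts seen while training one class.
spam_counts = { "cheap" => 3, "pills" => 2, "hello" => 1 }
spam_total  = spam_counts.values.sum

familiar   = normalized_log_score(%w[cheap pills], spam_counts, spam_total)
unfamiliar = normalized_log_score(%w[meeting agenda], spam_counts, spam_total)
puts familiar > unfamiliar  # prints true
```

Scores computed this way are negative or zero, and a value closer to zero indicates a text that looks more like the training data for that class.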
#train(classification, *texts) ⇒ Object
Trains the classifier with texts of class classification. texts gets flattened, so you can pass in an array without breaking anything.
# File 'lib/basset/classifier.rb', line 31
def train(classification, *texts)
  texts.flatten.each do |text|
    train_with_features(classification, features_of(text, classification))
  end
end
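The effect of the splat parameter combined with flatten can be seen with a stand-in class that records its arguments instead of training. TrainingLog is hypothetical and only mimics the signature of train.

```ruby
class TrainingLog
  attr_reader :seen

  def initialize
    @seen = []
  end

  # Same signature shape as Classifier#train; recording each
  # (classification, text) pair replaces the real training step.
  def train(classification, *texts)
    texts.flatten.each { |text| @seen << [classification, text] }
  end
end

log = TrainingLog.new
log.train(:spam, "buy now", "free money")       # separate arguments
log.train(:ham, ["see you at noon", "thanks"])  # a single array
puts log.seen.length  # prints 4
```

Both call styles end up visiting each text once, which is why passing an array does not break anything.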
#train_iterative(classification, text) ⇒ Object
Trains the classifier on a text repeatedly until the classifier recognizes it as being in class classification (up to a maximum of 5 retrainings). Handy for training the classifier quickly or when it has been mistrained.
# File 'lib/basset/classifier.rb', line 41
def train_iterative(classification, text)
  (1 .. 5).each do |i|
    train(classification, text)
    break if classify(text) == classification
  end
end
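To make the stopping rule concrete, the loop can be exercised against a stand-in classifier that only "learns" after a fixed number of training calls. StubbornClassifier is hypothetical and exists purely to count iterations. The train_iterative body is copied from the listing above.

```ruby
class StubbornClassifier
  attr_reader :train_calls

  # needed: how many train calls it takes before classify gives the right answer.
  def initialize(needed)
    @needed = needed
    @train_calls = 0
    @learned = nil
  end

  def train(classification, text)
    @train_calls += 1
    @learned = classification if @train_calls >= @needed
  end

  def classify(text)
    @learned
  end

  # Same loop as Classifier#train_iterative.
  def train_iterative(classification, text)
    (1..5).each do
      train(classification, text)
      break if classify(text) == classification
    end
  end
end

quick = StubbornClassifier.new(3)
quick.train_iterative(:spam, "buy now")
puts quick.train_calls  # prints 3

slow = StubbornClassifier.new(10)
slow.train_iterative(:spam, "buy now")
puts slow.train_calls   # prints 5
```

The second case shows that the method gives up silently after five attempts, so a caller who needs certainty should check classify afterwards.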