Class: Basset::Svm
- Inherits: Object
- Defined in: lib/basset/svm.rb
Overview
A class for SVM document classification. It follows the same basic interface as NaiveBayes: add labeled training documents to the classifier, then use it to classify unlabeled documents. Test your accuracy before using the classifier in production, because there are a lot of knobs to tweak. When testing, it is usually best to use a separate set of documents, i.e., not the training set.
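The advice about testing on a separate set can be sketched as follows. This is a minimal illustration and not part of Basset: `accuracy`, `features_for`, `OverviewFeature` and `MajorityStub` are hypothetical names. The stub stands in for any object with the `add_document(classification, feature_vectors)` and `classify(feature_vectors)` interface described on this page, and the feature struct only assumes that features respond to `name`.

```ruby
# Hypothetical stand-in for a feature object; only #name is assumed.
OverviewFeature = Struct.new(:name)

# Hypothetical stub with the same add_document/classify interface as Svm.
# It always predicts the most common training class, so it is a placeholder only.
class MajorityStub
  def initialize
    @counts = Hash.new(0)
  end

  def add_document(classification, _feature_vectors)
    @counts[classification] += 1
  end

  def classify(_feature_vectors)
    @counts.max_by { |_classification, count| count }.first
  end
end

# Fraction of held-out documents whose predicted class matches the known one.
# test_docs is an array of [classification, feature_vectors] pairs.
def accuracy(classifier, test_docs)
  correct = test_docs.count do |classification, features|
    classifier.classify(features) == classification
  end
  correct.to_f / test_docs.size
end

def features_for(text)
  text.split.map { |word| OverviewFeature.new(word) }
end

training = [
  [:spam, features_for("buy cheap pills")],
  [:spam, features_for("cheap offer now")],
  [:ham,  features_for("meeting at noon")]
]
# Held-out documents: never passed to add_document.
held_out = [
  [:spam, features_for("cheap pills offer")],
  [:ham,  features_for("lunch meeting today")]
]

classifier = MajorityStub.new
training.each { |classification, features| classifier.add_document(classification, features) }

held_out_accuracy = accuracy(classifier, held_out)
puts "held-out accuracy: #{held_out_accuracy}"
```

Swapping the stub for a real classifier instance should leave the `accuracy` helper unchanged, since it relies only on `classify`.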
Learning Resources
SVMs can be tricky to understand at first. Try the following references:
- en.wikipedia.org/wiki/Support_vector_machine
- www.igvita.com/2008/01/07/support-vector-machines-svm-in-ruby/
- www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Implementation
This class wraps libsvm-ruby-swig, which is itself a SWIG-based wrapper for libsvm.
- libsvm-ruby-swig: github.com/tomz/libsvm-ruby-swig
- libsvm: www.csie.ntu.edu.tw/~cjlin/libsvm
Full citation: Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at www.csie.ntu.edu.tw/~cjlin/libsvm
There is also the libsvm-ruby implementation. It was originally available from debian.cilibrar.com/debian/pool/main/libs/libsvm-ruby/libsvm-ruby_2.8.4.orig.tar.gz but was not available from there when I last checked. The Ubuntu package was still available as of this writing.
Instance Attribute Summary
- #class_labels ⇒ Object (readonly)
  include YamlSerialization.
- #feature_dictionary ⇒ Object (readonly)
  include YamlSerialization.
Instance Method Summary
- #add_document(classification, feature_vectors) ⇒ Object
  Adds a new document to the training set.
- #classes ⇒ Object
- #classify(feature_vectors) ⇒ Object
- #initialize ⇒ Svm (constructor)
  A new instance of Svm.
- #labels_and_document_vectors ⇒ Object
  Returns the vectorized representation of the training data, suitable for use in the constructor for the libsvm Problem class.
- #parameters ⇒ Object
  Exposes the libsvm-ruby-swig Parameter object.
- #vectorized_docs(classification) ⇒ Object
  Gives the vector representation of the training documents of class classification.
Constructor Details
#initialize ⇒ Svm
Returns a new instance of Svm.
# File 'lib/basset/svm.rb', line 34
def initialize
  @total_classes = 0
  @feature_dictionary = []
  @class_labels = {}
  @documents_for_class = Hash.new {|docs_hash,key| docs_hash[key] = []}
  @svm_parameter = default_svm_parameter
end
Instance Attribute Details
#class_labels ⇒ Object (readonly)
include YamlSerialization
# File 'lib/basset/svm.rb', line 32
def class_labels
  @class_labels
end
#feature_dictionary ⇒ Object (readonly)
include YamlSerialization
# File 'lib/basset/svm.rb', line 32
def feature_dictionary
  @feature_dictionary
end
Instance Method Details
#add_document(classification, feature_vectors) ⇒ Object
Adds a new document to the training set.
# File 'lib/basset/svm.rb', line 43
def add_document(classification, feature_vectors)
  update_class_labels_with_new(classification) if new_class?(classification)
  @feature_dictionary += feature_vectors.map { |fv| fv.name }
  @feature_dictionary.uniq!
  @documents_for_class[classification] << feature_vectors.map { |fv| fv.name }
  reset_memoized_vars!
end
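To illustrate the bookkeeping in add_document, here is a small standalone sketch of how the feature dictionary and the per-class document lists grow. It mirrors only the two data structures shown in the method; `DocFeature` and the `add` lambda are hypothetical names, and class-label updates and memoization resets are left out.

```ruby
# Hypothetical feature object; add_document only calls #name on each feature.
DocFeature = Struct.new(:name)

feature_dictionary = []
# Same default-block idiom as in #initialize: unknown classes start with [].
documents_for_class = Hash.new { |docs_hash, key| docs_hash[key] = [] }

# Sketch of the dictionary/document bookkeeping only.
add = lambda do |classification, feature_vectors|
  names = feature_vectors.map { |fv| fv.name }
  feature_dictionary.concat(names).uniq!   # keep each feature name once
  documents_for_class[classification] << names
end

add.call(:spam, [DocFeature.new("cheap"), DocFeature.new("pills")])
add.call(:ham,  [DocFeature.new("cheap"), DocFeature.new("meeting")])

p feature_dictionary
p documents_for_class
```

Each document is stored as a list of feature names, so the dictionary can keep growing without invalidating earlier documents; vectors are computed later against whatever the dictionary contains at that point.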
#classes ⇒ Object
# File 'lib/basset/svm.rb', line 79
def classes
  @class_labels.keys
end
#classify(feature_vectors) ⇒ Object
# File 'lib/basset/svm.rb', line 75
def classify(feature_vectors)
  class_of_label(model.predict(vectorize_doc(feature_vectors.map { |fv| fv.name })))
end
#labels_and_document_vectors ⇒ Object
Returns the vectorized representation of the training data, suitable for use in the constructor for the libsvm Problem class.
# File 'lib/basset/svm.rb', line 63
def labels_and_document_vectors
  # {labels => [features1-label, features2-label, ...], :features => [features1, features2, ...]}
  labels_features = {:labels => [], :features => []}
  @class_labels.each do |classification, label|
    vectorized_docs(classification).each do |document_vector|
      labels_features[:labels] << label
      labels_features[:features] << document_vector
    end
  end
  labels_features
end
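The shape of the returned hash may be easier to see with concrete data. The sketch below repeats the same flattening loop on plain local variables; the integer labels and the pre-vectorized documents are assumed example values, not output from the real class.

```ruby
# Assumed example data: one integer label per class, and already-vectorized
# training documents per class (the real class builds these internally).
class_labels = { :spam => 0, :ham => 1 }
vectors_for_class = {
  :spam => [[1, 1, 0], [1, 0, 0]],
  :ham  => [[0, 0, 1]]
}

# Same flattening as labels_and_document_vectors: one label per vector,
# kept in two parallel arrays.
labels_features = { :labels => [], :features => [] }
class_labels.each do |classification, label|
  vectors_for_class[classification].each do |document_vector|
    labels_features[:labels] << label
    labels_features[:features] << document_vector
  end
end

p labels_features
```

The two arrays are index-aligned: the label at position i belongs to the feature vector at position i.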
#parameters ⇒ Object
Exposes the libsvm-ruby-swig Parameter object. If given a block, the parameter object is yielded; otherwise, it is returned.
For example, to set parameters to their default values:
basset_svm_obj.parameters do |param|
  param.C = 100
  param.svm_type = NU_SVC
  param.degree = 1
  param.coef0 = 0
  param.eps = 0.001
  param.kernel_type = RBF
end
To access one value:
basset_svm_obj.parameters.svm_type
# => NU_SVC
# File 'lib/basset/svm.rb', line 101
def parameters
  if block_given?
    yield @svm_parameter
  else
    @svm_parameter
  end
end
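The yield-or-return pattern used by parameters can be tried without libsvm installed. In this sketch `FakeParameter` and `ParameterHolder` are hypothetical stand-ins: the struct has only two illustrative fields and does not reproduce the real Parameter object, and the initial values are made up.

```ruby
# Hypothetical stand-in for the libsvm-ruby-swig Parameter object.
FakeParameter = Struct.new(:degree, :eps)

# Hypothetical holder that exposes its parameter object the same way
# Svm#parameters does: yield it if a block is given, otherwise return it.
class ParameterHolder
  def initialize
    @svm_parameter = FakeParameter.new(3, 0.001)  # made-up initial values
  end

  def parameters
    if block_given?
      yield @svm_parameter
    else
      @svm_parameter
    end
  end
end

holder = ParameterHolder.new

# Block form: change several settings in one place.
holder.parameters do |param|
  param.degree = 1
  param.eps = 0.01
end

# Non-block form: read a single value back.
degree = holder.parameters.degree
puts "degree is now #{degree}"
```

In the block form the call evaluates to whatever the block returns rather than to the parameter object, so use the non-block form when you need the object itself.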
#vectorized_docs(classification) ⇒ Object
Gives the vector representation of the training documents of class classification.
# File 'lib/basset/svm.rb', line 53
def vectorized_docs(classification)
  # hardwired to binary representation
  @documents_for_class[classification].map do |features|
    vectorize_doc(features)
    #@feature_dictionary.map { |dict_feature| features.include?(dict_feature) ? 1 : 0}
  end
end
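The source of vectorize_doc is not shown on this page, but the commented-out line above suggests what "binary representation" means here. The following sketch applies that expression to sample data; `binary_vector` is a hypothetical helper name, and the dictionary contents are made up.

```ruby
# Assumed dictionary of feature names collected during training.
feature_dictionary = ["cheap", "pills", "meeting", "noon"]

# Binary representation: 1 if the dictionary feature occurs in the document,
# 0 otherwise. Repeated features and features missing from the dictionary
# do not change the result.
def binary_vector(feature_dictionary, features)
  feature_dictionary.map { |dict_feature| features.include?(dict_feature) ? 1 : 0 }
end

vector = binary_vector(feature_dictionary, ["cheap", "cheap", "noon", "unknown"])
p vector
```

Every vector has the same length as the dictionary, which is what lets documents of different sizes be compared by the SVM.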