Class: Basset::Svm

Inherits:
Object
Defined in:
lib/basset/svm.rb

Overview


A class for SVM document classification. It follows the same basic interface as NaiveBayes: add labeled training documents to the classifier, then use it to classify unlabeled documents. Test your accuracy before using the classifier in production; there are a lot of knobs to tweak. When testing, it is usually best to use a separate set of documents, i.e., not the training set.
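As a minimal sketch of the held-out testing advice above, the helper below measures accuracy on documents that were not used for training. The names `Feature`, `held_out_accuracy`, and `KeywordClassifier` are hypothetical and not part of Basset; the toy classifier only stands in for any object with a `classify(feature_vectors)` method so the example runs without libsvm.

```ruby
# Hypothetical stand-in for a Basset feature: anything that responds to #name.
Feature = Struct.new(:name)

# Fraction of held-out documents the classifier labels correctly.
# `classifier` must respond to #classify(features); `test_docs` is an array
# of [expected_class, features] pairs that were NOT used for training.
def held_out_accuracy(classifier, test_docs)
  return 0.0 if test_docs.empty?
  correct = test_docs.count do |expected, features|
    classifier.classify(features) == expected
  end
  correct.to_f / test_docs.size
end

# Toy classifier used only to make the sketch runnable without libsvm:
# it labels a document :spam if it contains the feature "viagra".
class KeywordClassifier
  def classify(features)
    features.any? { |f| f.name == "viagra" } ? :spam : :ham
  end
end

test_docs = [
  [:spam, [Feature.new("viagra"), Feature.new("cheap")]],
  [:ham,  [Feature.new("meeting"), Feature.new("tomorrow")]],
  [:spam, [Feature.new("lottery")]] # the toy rule gets this one wrong
]

puts held_out_accuracy(KeywordClassifier.new, test_docs)
```

Here the toy rule gets two of the three held-out documents right. With a real classifier, the same helper could be rerun after each parameter change to compare settings.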

Learning Resources

SVM can be tricky to understand at first. Try the following references:

en.wikipedia.org/wiki/Support_vector_machine
www.igvita.com/2008/01/07/support-vector-machines-svm-in-ruby/
www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

Implementation

This class wraps libsvm-ruby-swig, which is itself a SWIG-based wrapper for libsvm.

libsvm-ruby-swig: github.com/tomz/libsvm-ruby-swig
libsvm: www.csie.ntu.edu.tw/~cjlin/libsvm
Verbose version: Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at www.csie.ntu.edu.tw/~cjlin/libsvm

There is also the libsvm-ruby implementation. It was originally available from debian.cilibrar.com/debian/pool/main/libs/libsvm-ruby/libsvm-ruby_2.8.4.orig.tar.gz but was not available from there when I last checked. The Ubuntu package was still available as of this writing.


Constructor Details

#initialize ⇒ Svm

Returns a new instance of Svm.



# File 'lib/basset/svm.rb', line 34

def initialize
  @total_classes = 0
  @feature_dictionary = []
  @class_labels = {}
  @documents_for_class = Hash.new {|docs_hash,key| docs_hash[key] = []}
  @svm_parameter = default_svm_parameter
end

Instance Attribute Details

#class_labels ⇒ Object (readonly)

Returns the hash that maps each classification to the label used for it in the training data.



# File 'lib/basset/svm.rb', line 32

def class_labels
  @class_labels
end

#feature_dictionary ⇒ Object (readonly)

Returns the array of unique feature names seen across all training documents.



# File 'lib/basset/svm.rb', line 32

def feature_dictionary
  @feature_dictionary
end

Instance Method Details

#add_document(classification, feature_vectors) ⇒ Object

Adds a new document to the training set.



# File 'lib/basset/svm.rb', line 43

def add_document(classification, feature_vectors)
  update_class_labels_with_new(classification) if new_class?(classification)
  @feature_dictionary += feature_vectors.map { |fv| fv.name }
  @feature_dictionary.uniq!
  @documents_for_class[classification] << feature_vectors.map { |fv| fv.name }
  reset_memoized_vars!
end
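The bookkeeping done by add_document can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the class itself: `Feature` is a hypothetical stand-in for whatever feature objects are passed in (only `name` is used), and the class-label update and memoization reset are left out.

```ruby
Feature = Struct.new(:name) # hypothetical stand-in; only #name is needed

feature_dictionary  = []
documents_for_class = Hash.new { |hash, key| hash[key] = [] }

# Mirrors the two effects visible in add_document: grow the shared
# dictionary of unique feature names, and store the document's feature
# names under its classification.
add = lambda do |classification, feature_vectors|
  names = feature_vectors.map(&:name)
  feature_dictionary.concat(names).uniq!
  documents_for_class[classification] << names
end

add.call(:spam, [Feature.new("cheap"), Feature.new("pills")])
add.call(:ham,  [Feature.new("cheap"), Feature.new("lunch")])

p feature_dictionary # ["cheap", "pills", "lunch"]
p documents_for_class[:spam]
p documents_for_class[:ham]
```

The dictionary keeps first-seen order, so a feature's position stays stable as later documents are added.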

#classes ⇒ Object



# File 'lib/basset/svm.rb', line 79

def classes
  @class_labels.keys
end

#classify(feature_vectors) ⇒ Object



# File 'lib/basset/svm.rb', line 75

def classify(feature_vectors)
  class_of_label(model.predict(vectorize_doc(feature_vectors.map { |fv| fv.name })))
end

#labels_and_document_vectors ⇒ Object

Returns the vectorized representation of the training data, suitable for use in the constructor for the libsvm Problem class.



# File 'lib/basset/svm.rb', line 63

def labels_and_document_vectors
  # {:labels => [features1-label, features2-label, ...], :features => [features1, features2, ...]}
  labels_features = {:labels => [], :features => []}
  @class_labels.each do |classification, label|
    vectorized_docs(classification).each do |document_vector|
      labels_features[:labels] << label
      labels_features[:features] << document_vector
    end
  end
  labels_features
end
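To make the shape of the return value concrete, here is a small sketch that builds the same two parallel arrays from hand-written data. The label values in `class_labels` and the contents of `vectorized` are hypothetical; how the class assigns labels to classifications is not shown on this page.

```ruby
class_labels = { spam: 0, ham: 1 } # hypothetical label assignment
vectorized = {                     # hypothetical pre-vectorized documents
  spam: [[1, 1, 0], [1, 0, 0]],
  ham:  [[0, 0, 1]]
}

result = { labels: [], features: [] }
class_labels.each do |classification, label|
  vectorized.fetch(classification).each do |document_vector|
    # Entry i of :labels is the label of entry i of :features.
    result[:labels]   << label
    result[:features] << document_vector
  end
end

p result[:labels]   # [0, 0, 1]
p result[:features] # [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The two arrays are index-aligned, which matches the description above of data suitable for the libsvm Problem constructor.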

#parameters ⇒ Object

Exposes the libsvm-ruby-swig Parameter object. If given a block, the parameter object is yielded; otherwise, it is returned.

For example, to set parameters to their default values:

basset_svm_obj.parameters do |param|
  param.C = 100
  param.svm_type = NU_SVC
  param.degree = 1
  param.coef0 = 0
  param.eps = 0.001
  param.kernel_type = RBF
end

To access one value:

basset_svm_obj.parameters.svm_type
=> NU_SVC


# File 'lib/basset/svm.rb', line 101

def parameters
  if block_given?
    yield @svm_parameter
  else
    @svm_parameter
  end
end
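The yield-or-return pattern used by this method can be tried in isolation with the sketch below. `FakeParameter` and `ParameterHolder` are hypothetical stand-ins so that the example runs without libsvm-ruby-swig; only the body of `parameters` is taken from the source above.

```ruby
# Hypothetical stand-in for the libsvm-ruby-swig Parameter object.
FakeParameter = Struct.new(:degree, :eps)

class ParameterHolder
  def initialize
    @svm_parameter = FakeParameter.new(3, 0.1)
  end

  # Same body as Svm#parameters: yield the object when a block is given,
  # otherwise return it.
  def parameters
    if block_given?
      yield @svm_parameter
    else
      @svm_parameter
    end
  end
end

holder = ParameterHolder.new
holder.parameters do |param|
  param.degree = 1
  param.eps = 0.001
end

puts holder.parameters.degree # 1
puts holder.parameters.eps    # 0.001
```

Note that the block form returns whatever the block evaluates to, not the parameter object, so chaining is only useful on the non-block form.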

#vectorized_docs(classification) ⇒ Object

Returns the vector representation of each training document belonging to the class classification.



# File 'lib/basset/svm.rb', line 53

def vectorized_docs(classification)
  # hardwired to binary representation
  @documents_for_class[classification].map do |features| 
    vectorize_doc(features)
    #@feature_dictionary.map { |dict_feature| features.include?(dict_feature) ? 1 : 0}
  end
end
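The body of vectorize_doc is not shown on this page. Going by the "hardwired to binary representation" comment and the commented-out line above, a binary bag-of-words encoding is a reasonable guess at what it does. The helper name `binary_vector` below is hypothetical.

```ruby
# Binary bag-of-words: 1 if the dictionary entry occurs in the document,
# 0 otherwise. The result has one element per dictionary entry.
def binary_vector(feature_dictionary, document_features)
  feature_dictionary.map { |entry| document_features.include?(entry) ? 1 : 0 }
end

dictionary = %w[cheap pills lunch]

p binary_vector(dictionary, %w[cheap lunch]) # [1, 0, 1]
p binary_vector(dictionary, %w[unknown])     # [0, 0, 0]
```

Under this encoding, features that are absent from the dictionary are ignored, and repeated features in a document do not change the vector.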