Class: Classifier

Inherits:
Object
Defined in:
lib/rbbt/bow/classifier.rb

Overview

This class uses R, through RSRuby, to build and apply SVM classification models. It requires the ‘e1071’ R package.

Constructor Details

#initialize(modelfile) ⇒ Classifier

Starts an R interpreter, loads the SVM model stored in modelfile, and reads the model's term names into the terms attribute.



# File 'lib/rbbt/bow/classifier.rb', line 25

def initialize(modelfile)
  @r = RSRuby.instance
  @r.library('e1071')
  @r.source(File.join(Rbbt.datadir, 'classifier/R/classify.R'))

  @r.load(modelfile)

  @model = @r.svm_model
  @terms = @r.eval_R("terms = unlist(attr(attr(svm.model$terms,'factors'),'dimnames')[2])")
end

Instance Attribute Details

#terms ⇒ Object (readonly)

Returns the term names read from the loaded SVM model. They are used as the column names of the feature matrix passed to R.



# File 'lib/rbbt/bow/classifier.rb', line 22

def terms
  @terms
end

Class Method Details

.create_model(featuresfile, modelfile, dictfile = nil) ⇒ Object

Given the path to a features file, which lists instances along with their classes and features in a tab-separated format, uses R to build an SVM model and saves it to the path given as modelfile. The dictfile argument is accepted but is not used by the code shown below.



# File 'lib/rbbt/bow/classifier.rb', line 13

def self.create_model(featuresfile, modelfile, dictfile = nil)

  r = RSRuby.instance
  r.source(File.join(Rbbt.datadir, 'classifier/R/classify.R'))
  r.BOW_classification_model(featuresfile, modelfile)

  nil
end
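
The description above does not spell out the column layout of the features file. As a rough illustration only, the sketch below writes a tab-separated table with a header row followed by one row per instance. The `Name` and `Class` column names, the helper `features_tsv`, and the file names in the final comment are assumptions made for this example; the layout actually expected by classify.R may differ.

```ruby
# Hypothetical layout: a header row, then one row per instance holding
# its name, its class label, and one count per term.
def features_tsv(terms, instances)
  header = ["Name", "Class", *terms].join("\t")
  rows = instances.map do |name, (label, counts)|
    unless counts.size == terms.size
      raise ArgumentError, "#{name}: expected #{terms.size} counts"
    end
    [name, label, *counts].join("\t")
  end
  ([header] + rows).join("\n") + "\n"
end

terms = %w[gene protein]
instances = {
  "doc1" => ["+", [2, 0]],
  "doc2" => ["-", [0, 0]]
}
puts features_tsv(terms, instances)

# With the text written to disk, the call would look like:
#   Classifier.create_model("features.tsv", "model.rda")
```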

Instance Method Details

#classify(input) ⇒ Object

This is a polymorphic method. The input may be a single text, in which case the result is just its class; a hash of inputs, in which case the result is a hash with the result for each input; or an array of inputs, in which case the result is an array of results in the same order. Each input inside a hash or array may be a string, which is transformed into a feature vector, or an array, which is treated as a feature vector itself. The type of the first element decides how the whole collection is handled. An empty hash or array is returned unchanged, and any other kind of input returns nil.



# File 'lib/rbbt/bow/classifier.rb', line 90

def classify(input)
  if input.is_a? String
    return classify_text_array([input]).first
  end

  if input.is_a? Hash
    return {} if input.empty?
    if input.values.first.is_a? String
      return classify_text_hash(input)
    elsif input.values.first.is_a? Array
      return classify_feature_hash(input)
    end
  end

  if input.is_a? Array
    return [] if input.empty?
    if input.first.is_a? String
      return classify_text_array(input)
    elsif input.first.is_a? Array
      return classify_feature_array(input)
    end
  end

end
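
The relationship between input shape and result shape can be tried out without R. The sketch below is a stand-in, not the real class: `ToyClassifier`, its `TERMS` vocabulary, and its decision rule are all hypothetical, and only the dispatch on input type follows the method above.

```ruby
# Stand-in that copies only the input-shape dispatch of Classifier#classify.
# TERMS and the decision rule are hypothetical; no R or SVM is involved.
class ToyClassifier
  TERMS = %w[gene protein].freeze

  def classify(input)
    case input
    when String
      classify_vectors([featurize(input)]).first
    when Hash
      return {} if input.empty?
      input.keys.zip(classify_vectors(vectorize(input.values))).to_h
    when Array
      return [] if input.empty?
      classify_vectors(vectorize(input))
    end
  end

  private

  # As in the original, the first element decides how the rest is read.
  def vectorize(items)
    items.first.is_a?(String) ? items.map { |text| featurize(text) } : items
  end

  # Count how often each vocabulary term occurs in the text.
  def featurize(text)
    words = text.downcase.scan(/\w+/)
    TERMS.map { |term| words.count(term) }
  end

  # Hypothetical rule: class "1" if any term occurs, otherwise "0".
  def classify_vectors(vectors)
    vectors.map { |vector| vector.sum > 0 ? "1" : "0" }
  end
end

toy = ToyClassifier.new
p toy.classify("a gene name")                                   # => "1"
p toy.classify({ "a" => "protein kinase", "b" => "no match" })  # => {"a"=>"1", "b"=>"0"}
p toy.classify([[0, 2], [0, 0]])                                # => ["1", "0"]
p toy.classify(42)                                              # => nil
```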

#classify_feature_array(input) ⇒ Object

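Internal helper, marked :nodoc: in the source. Classifies an array of feature vectors and returns an array with the predicted class of each.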



# File 'lib/rbbt/bow/classifier.rb', line 36

def classify_feature_array(input) #:nodoc:
  @r.assign('input', input)

  @r.eval_R('input = t(as.data.frame(input))')
  @r.eval_R('rownames(input) <- NULL')
  @r.eval_R('colnames(input) <- terms')

  results = @r.eval_R('BOW.classification.classify(svm.model, input, svm.weights)')
  results.sort.collect{|p| p[1]}
end
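
The last line of this method sorts the result by key before extracting the classes. The sketch below shows what that expression does in plain Ruby, under the assumption that `eval_R` returns a Hash keyed by row labels as strings ("1", "2", ...); that return shape is not confirmed by this page. If the assumption holds, string keys sort lexicographically, so "10" comes before "2". The helper names are hypothetical, and the second one is only a possible variant, not the library's code.

```ruby
# Same expression as in classify_feature_array: sort the pairs by key,
# then keep the value of each pair.
def classes_in_key_order(results)
  results.sort.collect { |p| p[1] }
end

# Possible variant that compares the labels as integers instead.
def classes_in_numeric_order(results)
  results.sort_by { |label, _| label.to_i }.collect { |p| p[1] }
end

# Assumed shape of the R result: row label => predicted class.
results = { "1" => "a", "2" => "b", "10" => "c" }

p classes_in_key_order(results)      # => ["a", "c", "b"]
p classes_in_numeric_order(results)  # => ["a", "b", "c"]
```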

#classify_feature_hash(input) ⇒ Object

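Internal helper, marked :nodoc: in the source. Classifies a hash of feature vectors and returns the R result, with rows named after the hash keys.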



# File 'lib/rbbt/bow/classifier.rb', line 47

def classify_feature_hash(input) #:nodoc:
  names = []
  features = []
  input.each{|name, feats|
    names << name.to_s
    features << feats
  }

  @r.assign('input', features)
  @r.assign('input.names', names)

  @r.eval_R('input = t(as.data.frame(input))')
  @r.eval_R('rownames(input) <- input.names')
  @r.eval_R('colnames(input) <- terms')

  @r.eval_R('BOW.classification.classify(svm.model, input, svm.weights)')
end

#classify_text_array(input) ⇒ Object

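Internal helper, marked :nodoc: in the source. Converts each text into a feature vector with BagOfWords.features and passes the vectors to #classify_feature_array.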



# File 'lib/rbbt/bow/classifier.rb', line 65

def classify_text_array(input) #:nodoc:
  features = input.collect{|text|
    BagOfWords.features(text, @terms)
  }

  classify_feature_array(features)
end
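
BagOfWords.features is defined elsewhere in the library and is not shown on this page. As a rough illustration of what turning a text into a feature vector over a list of terms could look like, the sketch below counts term occurrences. The function `toy_features` and its tokenization are assumptions; the real method may tokenize, stem, or weight terms differently.

```ruby
# Hypothetical stand-in for BagOfWords.features: returns one count per
# term, in the same order as the terms list, so the vector lines up with
# the column names assigned in R.
def toy_features(text, terms)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z0-9]+/) { |word| counts[word] += 1 }
  terms.map { |term| counts[term] }
end

terms = %w[gene protein product]
p toy_features("Gene and gene product", terms)  # => [2, 0, 1]
```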

#classify_text_hash(input) ⇒ Object

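Internal helper, marked :nodoc: in the source. Converts each text into a feature vector with BagOfWords.features, keeping its key, and passes the resulting hash to #classify_feature_hash.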



# File 'lib/rbbt/bow/classifier.rb', line 73

def classify_text_hash(input) #:nodoc:
  features = {}
  input.each{|key,text|
    features[key] = BagOfWords.features(text, @terms)
  }

  classify_feature_hash(features)
end