Class: Eluka::FeatureVectors

Inherits:
Object
Defined in:
lib/eluka/feature_vector.rb

Instance Method Summary

Constructor Details

#initialize(features, train) ⇒ FeatureVectors

A feature vector for a data point depends on the global list of features and their ids.

During training, newly encountered features are added to that list, so the constructor must be told whether the vectors are being computed for training or for classification.



# File 'lib/eluka/feature_vector.rb', line 25

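def initialize(features, train)
  @fvs      = []        # stored [vector, label] pairs, converted on demand
  @features = features  # instance of features: the global feature list and ids
  @train    = train     # Boolean: true for training, false for classification
end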
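
As a rough illustration of why the train flag matters, the sketch below uses a plain Array as a hypothetical stand-in for the global feature list. The names visible_terms and known_terms are illustrative only and are not part of Eluka.

```ruby
# Hypothetical stand-in for the global feature list (not the Eluka API).
known_terms = ["cheap", "offer"]

# Returns the terms of a data point that can be encoded.
# In training mode, unseen terms are first appended to the list;
# in classification mode the list is left alone and unseen terms are dropped.
def visible_terms(vector, known_terms, train)
  known_terms.concat(vector.keys - known_terms) if train
  vector.keys & known_terms
end

point = { "cheap" => 1, "discount" => 2 }

classified     = visible_terms(point, known_terms, false)
after_classify = known_terms.dup   # snapshot taken before any training call

trained = visible_terms(point, known_terms, true)
```

In this sketch the classification call leaves the list untouched and ignores "discount", while the training call appends it, which matches the behaviour the constructor comment describes.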

Instance Method Details

#add(vector, label = 0) ⇒ Object

Stores the data point and its label as given. Data points are converted to feature vectors only on demand.



# File 'lib/eluka/feature_vector.rb', line 34

def add(vector, label = 0)
  @fvs.push([vector, label])
end

#define_features ⇒ Object

For training data, ensures that every term of every stored data point is added to the feature list.



# File 'lib/eluka/feature_vector.rb', line 41

def define_features
  @fvs.each do |vector, _label|
    # Only the terms matter here; values and labels are ignored
    vector.each_key do |term|
      @features.add(term)
    end
  end
end
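
The registration step can be tried in isolation with a small stand-in for the feature list. SketchFeatures below is hypothetical: it only mimics the three calls this page uses (add, term, f_count) and assumes that ids start at 1 and follow insertion order, which the real class may or may not do.

```ruby
# Hypothetical stand-in for the feature list; assumes ids start at 1
# and are assigned in the order terms are first seen.
class SketchFeatures
  def initialize
    @ids   = {}   # term => id
    @terms = {}   # id => term
  end

  # Registers a term once; repeated calls return the existing id.
  def add(term)
    return @ids[term] if @ids.key?(term)
    id = @ids.size + 1
    @ids[term] = id
    @terms[id] = term
    id
  end

  def term(id)
    @terms[id]
  end

  def f_count
    @ids.size
  end
end

features = SketchFeatures.new
points = [
  [{ "cheap" => 2, "offer" => 1 }, 1],
  [{ "offer" => 3, "meeting" => 1 }, 0]
]

# Same traversal as define_features: every term of every data point is added.
points.each do |vector, _label|
  vector.each_key { |term| features.add(term) }
end
```

Because add is idempotent in this sketch, the term "offer" is registered once even though it occurs in both data points.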

#to_libSVM(sel_features = nil) ⇒ Object

Creates the feature vectors and converts them to LibSVM format: a multiline string with one data point per line, each line holding the label followed by space-separated id:value pairs.

If a list of selected features is given, only those features are written.



# File 'lib/eluka/feature_vector.rb', line 56

def to_libSVM(sel_features = nil)
  # Load the selected features into a Hash for fast lookup
  sf = {}
  sel_features.each { |f| sf[f] = 1 } if sel_features

  # Registering features is needed only for training data
  define_features if @train

  output = []
  @fvs.each do |vector, label|
    line = [label]

    # OPTIMIZE: Change this line to consider sorting in case of terms being features
    (1..@features.f_count).each do |id|
      term  = @features.term(id)
      value = vector[term]
      next unless value
      line.push([id, value].join(":")) if sf[term] || !sel_features
    end
    output.push(line.join(" "))
  end
  output.join("\n")
end
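
To make the output format concrete, here is a minimal sketch of the same loop as a standalone function. sketch_to_libsvm is a hypothetical helper, and the ordered terms array (position + 1 used as the feature id) is an assumption standing in for the feature list object.

```ruby
# Hypothetical helper mirroring the loop in to_libSVM.
# terms is an ordered list; a term's position + 1 serves as its feature id.
def sketch_to_libsvm(points, terms, sel_features = nil)
  points.map do |vector, label|
    pairs = terms.each_with_index.map do |term, index|
      value = vector[term]
      next unless value                                       # term absent from this data point
      next if sel_features && !sel_features.include?(term)    # filtered out by the selection
      "#{index + 1}:#{value}"
    end
    ([label] + pairs.compact).join(" ")
  end.join("\n")
end

terms  = ["cheap", "offer", "meeting"]
points = [
  [{ "cheap" => 2, "offer" => 1 }, 1],
  [{ "offer" => 3, "meeting" => 1 }, 0]
]

puts sketch_to_libsvm(points, terms)
# 1 1:2 2:1
# 0 2:3 3:1

puts sketch_to_libsvm(points, terms, ["offer"])
# 1 2:1
# 0 2:3
```

Note that selected features keep their original ids rather than being renumbered, as in the method above, so a model trained on the filtered output still refers to the same feature list.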