Class: SvmToolkit::Problem

Inherits:
Object
Defined in:
lib/svm_toolkit/problem.rb

Overview

Extends the Java Problem class with some additional features.

Constant Summary

SvmLight = 0

To select SvmLight input file format

Csv = 1

To select Csv input file format

Arff = 2

To select ARFF input file format

Class Method Summary

Instance Method Summary

Class Method Details

.from_array(instances, labels) ⇒ Object

Support constructing a problem from arrays of double values.

instances

an array of instances, each instance being an array of doubles.

labels

an array of doubles, forming the labels for each instance.

An ArgumentError exception is raised unless all of the following conditions are met:

  • the number of instances must equal the number of labels,

  • there must be at least one instance, and

  • every instance must have the same number of features.
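The three conditions above can be sketched in plain Ruby, without the Java classes. The helper name `check_problem_arrays` is hypothetical and not part of SvmToolkit; it only mirrors the checks that from_array is documented to perform.

```ruby
# Hypothetical helper (not part of SvmToolkit): applies the same three
# checks that from_array makes before it builds a problem.
def check_problem_arrays(instances, labels)
  unless instances.size == labels.size
    raise ArgumentError, "Number of instances must equal number of labels"
  end
  if instances.empty?
    raise ArgumentError, "There must be at least one instance."
  end
  unless instances.map(&:size).uniq.size == 1
    raise ArgumentError, "All instances must have the same size"
  end
  true
end

# Two instances with two features each, and two labels: accepted.
check_problem_arrays([[0.0, 1.0], [1.0, 0.0]], [1.0, -1.0])

# Instances of different lengths: rejected.
begin
  check_problem_arrays([[0.0, 1.0], [1.0]], [1.0, -1.0])
rescue ArgumentError => e
  puts e.message
end
```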



# File 'lib/svm_toolkit/problem.rb', line 17

def Problem.from_array(instances, labels)
  unless instances.size == labels.size
    raise ArgumentError.new "Number of instances must equal number of labels"
  end
  unless instances.size > 0
    raise ArgumentError.new "There must be at least one instance."
  end
  unless instances.collect {|i| i.size}.min == instances.collect {|i| i.size}.max
    raise ArgumentError.new "All instances must have the same size"
  end

  problem = Problem.new
  problem.l = labels.size
  # -- add in the training data

  problem.x = Node[instances.size, instances[0].size].new
  instances.each_with_index do |instance, i|
    instance.each_with_index do |v, j|
      problem.x[i][j] = Node.new(j, v)
    end
  end
  # -- add in the labels

  problem.y = Java::double[labels.size].new
  labels.each_with_index do |v, i| 
    problem.y[i] = v
  end

  return problem
end

.from_file(filename, format = SvmLight) ⇒ Object

Read in a problem definition from a file.

filename

the name of the file

format

either Problem::SvmLight (default), Problem::Csv or Problem::Arff

Raises ArgumentError if there is any error in format.
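The choice of reader by format can be sketched in plain Ruby as follows. The constants and the helper `reader_name_for` are hypothetical stand-ins, not part of SvmToolkit. Rejecting an unrecognised format with ArgumentError is an assumption of this sketch: the source listing for from_file has no else branch.

```ruby
# Hypothetical stand-ins for the format constants in the Constant Summary.
SVMLIGHT = 0
CSV_FORMAT = 1
ARFF = 2

# Hypothetical helper: maps a format constant to the name of the reader
# method that from_file delegates to. Raising on any other value is an
# assumption of this sketch.
def reader_name_for(format)
  case format
  when SVMLIGHT then :from_file_svmlight
  when CSV_FORMAT then :from_file_csv
  when ARFF then :from_file_arff
  else
    raise ArgumentError, "Unknown file format: #{format.inspect}"
  end
end

puts reader_name_for(CSV_FORMAT)
```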



# File 'lib/svm_toolkit/problem.rb', line 63

def Problem.from_file(filename, format = SvmLight)
  case format 
  when SvmLight
    return Problem.from_file_svmlight filename
  when Csv
    return Problem.from_file_csv filename
  when Arff
    return Problem.from_file_arff filename
  end
end

.from_file_arff(filename) ⇒ Object

Read in a problem definition in ARFF format. Assumes all values are numeric (non-numeric values are converted to 0.0) and that the class is the last field.

filename

the name of the file

Raises ArgumentError if there is any error in format.
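A minimal sketch of the parsing rules described above, working on a string rather than a file. The helper `parse_arff_rows` is hypothetical and returns plain Ruby arrays instead of a Problem. It assumes every field before the last is a feature and that blank lines are skipped; neither detail is taken from the library.

```ruby
# Hypothetical helper (not part of SvmToolkit): splits ARFF text into
# feature rows and labels, taking the last field of each row as the class.
def parse_arff_rows(text)
  instances = []
  labels = []
  in_data = false
  text.each_line do |line|
    line = line.strip
    unless in_data
      # Everything up to and including the @data marker is header.
      in_data = (line.downcase == "@data")
      next
    end
    next if line.empty? # assumption: blank lines carry no instance
    tokens = line.split(",")
    labels << tokens.last.to_f
    # String#to_f turns non-numeric text such as "?" into 0.0.
    instances << tokens[0...-1].map(&:to_f)
  end
  [instances, labels]
end

arff = <<~ARFF
  @relation demo
  @attribute a numeric
  @attribute b numeric
  @attribute class numeric
  @data
  1.5,2.0,1
  0.5,?,0
ARFF

instances, labels = parse_arff_rows(arff)
# instances => [[1.5, 2.0], [0.5, 0.0]]
# labels    => [1.0, 0.0]
```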



# File 'lib/svm_toolkit/problem.rb', line 186

def Problem.from_file_arff filename
  instances = []
  labels = []
  max_index = 0
  found_data = false
  IO.foreach(filename) do |line|
    unless found_data
      puts "Ignoring", line
      found_data = line.downcase.strip == "@data"
      next # repeat the loop

    end
    tokens = line.split(",")
    labels << tokens.last.to_f
    instance = []
    tokens[1...-1].each_with_index do |value, index|
      instance << Node.new(index, value.to_f)
    end
    max_index = [tokens.size, max_index].max 
    instances << instance
  end
  max_index += 1 # to allow for 0 position

  unless instances.size == labels.size
    raise ArgumentError.new "Number of labels read differs from number of instances"
  end
  # now create a Problem definition

  problem = Problem.new
  problem.l = instances.size
  # -- add in the training data

  problem.x = Node[instances.size, max_index].new
  # -- fill with blank nodes

  instances.size.times do |i|
    max_index.times do |j|
      problem.x[i][j] = Node.new(i, 0)
    end
  end
  # -- add known values

  instances.each_with_index do |instance, i|
    instance.each do |node|
      problem.x[i][node.index] = node
    end
  end
  # -- add in the labels

  problem.y = Java::double[labels.size].new
  labels.each_with_index do |v, i| 
    problem.y[i] = v
  end

  return problem
end

.from_file_csv(filename) ⇒ Object

Read in a problem definition in CSV format. The label is taken from the first field of each row; the remaining fields are the features.

filename

the name of the file

Raises ArgumentError if there is any error in format.
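The row layout this reader works with (label first, features after) can be sketched with Ruby's standard CSV library. The helper `parse_csv_rows` is hypothetical and returns plain arrays rather than a Problem.

```ruby
require "csv"

# Hypothetical helper (not part of SvmToolkit): reads CSV text in which
# the first field of each row is the label and the rest are features.
def parse_csv_rows(text)
  instances = []
  labels = []
  CSV.parse(text, headers: false).each do |tokens|
    labels << tokens[0].to_f
    # Empty fields arrive as nil; nil.to_f is 0.0.
    instances << tokens[1..-1].map(&:to_f)
  end
  [instances, labels]
end

instances, labels = parse_csv_rows("1,0.5,0.25\n-1,0.0,1.0\n")
# instances => [[0.5, 0.25], [0.0, 1.0]]
# labels    => [1.0, -1.0]
```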



# File 'lib/svm_toolkit/problem.rb', line 133

def Problem.from_file_csv filename
  instances = []
  labels = []
  max_index = 0
  csv_data = CSV.parse(File.read(filename), headers: false)
  csv_data.each do |tokens|
    labels << tokens[0].to_f
    instance = []
    tokens[1..-1].each_with_index do |value, index|
      instance << Node.new(index, value.to_f)
    end
    max_index = [tokens.size, max_index].max 
    instances << instance
  end
  max_index += 1 # to allow for 0 position

  unless instances.size == labels.size
    raise ArgumentError.new "Number of labels read differs from number of instances"
  end
  # now create a Problem definition

  problem = Problem.new
  problem.l = instances.size
  # -- add in the training data

  problem.x = Node[instances.size, max_index].new
  # -- fill with blank nodes

  instances.size.times do |i|
    max_index.times do |j|
      problem.x[i][j] = Node.new(i, 0)
    end
  end
  # -- add known values

  instances.each_with_index do |instance, i|
    instance.each do |node|
      problem.x[i][node.index] = node
    end
  end
  # -- add in the labels

  problem.y = Java::double[labels.size].new
  labels.each_with_index do |v, i| 
    problem.y[i] = v
  end

  return problem
end

.from_file_svmlight(filename) ⇒ Object

Read in a problem definition in svmlight format. Each line holds a label followed by space-separated index:value pairs.

filename

the name of the file

Raises ArgumentError if there is any error in format.
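A minimal sketch of how sparse svmlight lines expand into dense rows. The helper `parse_svmlight_rows` is hypothetical and uses plain arrays and hashes in place of Node objects. Like the source listing, it sizes each row as the largest index plus one; skipping blank lines is an assumption of the sketch.

```ruby
# Hypothetical helper (not part of SvmToolkit): expands svmlight lines
# of the form "label index:value index:value ..." into dense rows.
def parse_svmlight_rows(text)
  labels = []
  sparse = []
  max_index = 0
  text.each_line do |line|
    tokens = line.split(" ")
    next if tokens.empty? # assumption: blank lines carry no instance
    labels << tokens[0].to_f
    row = {}
    tokens[1..-1].each do |feature|
      index, value = feature.split(":")
      row[index.to_i] = value.to_f
      max_index = [index.to_i, max_index].max
    end
    sparse << row
  end
  width = max_index + 1 # allow for position 0
  # Features missing from a line default to 0.0.
  dense = sparse.map { |row| Array.new(width) { |j| row.fetch(j, 0.0) } }
  [dense, labels]
end

dense, labels = parse_svmlight_rows("1 1:0.5 3:2.0\n-1 2:1.5\n")
# dense  => [[0.0, 0.5, 0.0, 2.0], [0.0, 0.0, 1.5, 0.0]]
# labels => [1.0, -1.0]
```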



# File 'lib/svm_toolkit/problem.rb', line 81

def Problem.from_file_svmlight filename
  instances = []
  labels = []
  max_index = 0
  IO.foreach(filename) do |line|
    tokens = line.split(" ")
    labels << tokens[0].to_f
    instance = []
    tokens[1..-1].each do |feature|
      index, value = feature.split(":")
      instance << Node.new(index.to_i, value.to_f)
      max_index = [index.to_i, max_index].max 
    end
    instances << instance
  end
  max_index += 1 # to allow for 0 position

  unless instances.size == labels.size
    raise ArgumentError.new "Number of labels read differs from number of instances"
  end
  # now create a Problem definition

  problem = Problem.new
  problem.l = instances.size
  # -- add in the training data

  problem.x = Node[instances.size, max_index].new
  # -- fill with blank nodes

  instances.size.times do |i|
    max_index.times do |j|
      problem.x[i][j] = Node.new(i, 0)
    end
  end
  # -- add known values

  instances.each_with_index do |instance, i|
    instance.each do |node|
      problem.x[i][node.index] = node
    end
  end
  # -- add in the labels

  problem.y = Java::double[labels.size].new
  labels.each_with_index do |v, i| 
    problem.y[i] = v
  end

  return problem
end

Instance Method Details

#merge(problem) ⇒ Object

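Create a new problem by combining the instances in this problem with those in the given problem. The instances of this problem come first, followed by those of the given problem. Raises ArgumentError if the two problems have different numbers of features.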
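The effect of a merge can be sketched on plain Ruby arrays. The helper `merge_rows` is hypothetical; it stands in for the Java-backed x and y arrays and, like the source listing, compares only the feature count of the first instance on each side.

```ruby
# Hypothetical helper (not part of SvmToolkit): concatenates two sets of
# feature rows and labels after checking that the feature counts agree.
def merge_rows(x1, y1, x2, y2)
  unless x1[0].size == x2[0].size
    raise ArgumentError, "Cannot merge two problems with different numbers of features"
  end
  [x1 + x2, y1 + y2]
end

x, y = merge_rows([[1.0, 2.0]], [1.0], [[3.0, 4.0], [5.0, 6.0]], [-1.0, 1.0])
# x => [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# y => [1.0, -1.0, 1.0]
```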



# File 'lib/svm_toolkit/problem.rb', line 253

def merge problem
  unless self.x[0].size == problem.x[0].size
    raise ArgumentError.new "Cannot merge two problems with different numbers of features"
  end
  num_features = self.x[0].size
  num_instances = size + problem.size

  new_problem = Problem.new
  new_problem.l = num_instances
  new_problem.x = Node[num_instances, num_features].new
  new_problem.y = Java::double[num_instances].new
  # fill out the features

  num_instances.times do |i|
    num_features.times do |j|
      if i < size
        new_problem.x[i][j] = self.x[i][j]
      else
        new_problem.x[i][j] = problem.x[i-size][j]
      end
    end
  end
  # fill out the labels

  num_instances.times do |i|
    if i < size
      new_problem.y[i] = self.y[i]
    else
      new_problem.y[i] = problem.y[i-size]
    end
  end

  return new_problem
end

#rescale(min_value = 0.0, max_value = 1.0) ⇒ Object

Rescale the values of each feature in the problem to lie in the range min_value to max_value. Does nothing if the problem has no instances.

For SVM models, it is recommended that all features be in the range [0,1] or [-1,1].
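The per-column helper rescale_column is not shown on this page, so the following is only a sketch of one common approach, min-max scaling, applied to plain arrays. The helper `rescale_columns` is hypothetical, and mapping a constant column to min_value is an assumption.

```ruby
# Hypothetical helper (not part of SvmToolkit): min-max rescales each
# column of a row-major matrix into the range min_value..max_value.
def rescale_columns(rows, min_value = 0.0, max_value = 1.0)
  return rows if rows.empty?
  columns = rows.transpose.map do |column|
    lo, hi = column.minmax
    span = hi - lo
    column.map do |v|
      # A constant column has no spread; map it to min_value (assumption).
      span.zero? ? min_value : min_value + (v - lo) * (max_value - min_value) / span
    end
  end
  columns.transpose
end

scaled = rescale_columns([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
# scaled => [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Passing -1.0 and 1.0 as the bounds gives the other recommended range.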



# File 'lib/svm_toolkit/problem.rb', line 244

def rescale(min_value = 0.0, max_value = 1.0)
  return if self.l.zero?
  x[0].size.times do |i|
    rescale_column(i, min_value, max_value)
  end
end

#size ⇒ Object

Returns the number of instances.



# File 'lib/svm_toolkit/problem.rb', line 237

def size
  self.l
end