Class: LibSVMLoader

Inherits:
Object
Defined in:
lib/libsvmloader.rb,
lib/libsvmloader/version.rb

Overview

LibSVMLoader loads and dumps dataset files in the libsvm file format.

Constant Summary

VERSION =
'0.2.1'.freeze

Class Method Details

.dump_libsvm_file(data, labels, filename, zero_based: false) ⇒ Object

Dump a dataset to a file in the libsvm file format.

Parameters:

  • data (Array)

    (n_samples x n_features) matrix consisting of feature vectors.

  • labels (Array)

    (n_samples) vector consisting of labels or target values.

  • filename (String)

    Path to the output libsvm file.

  • zero_based (Boolean) (defaults to: false)

    Whether the column index starts from 0 (true) or 1 (false).



# File 'lib/libsvmloader.rb', line 40

def dump_libsvm_file(data, labels, filename, zero_based: false)
  # Write only as many rows as both data and labels provide.
  n_samples = [data.size, labels.size].min
  # Output formats are chosen from the first label and the first feature value.
  label_format = detect_format(labels.first)
  value_format = detect_format(data.flatten.first)
  File.open(filename, 'w') do |file|
    n_samples.times do |n|
      file.puts(dump_libsvm_line(labels[n], data[n], label_format, value_format, zero_based))
    end
  end
end
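
To make the zero_based option concrete, here is a minimal sketch of how one sample might be rendered as a libsvm line of the form "<label> <index>:<value> ...". The helper `format_libsvm_line` is hypothetical and written only for illustration; it is not the gem's private `dump_libsvm_line`, whose source is not shown here. Omitting zero-valued features is an assumption based on the sparse libsvm convention.

```ruby
# Hypothetical helper (not part of LibSVMLoader): renders one sample as a
# libsvm-format line, "<label> <index>:<value> ...".
def format_libsvm_line(label, features, zero_based: false)
  # Column indices start at 0 when zero_based is true, otherwise at 1.
  offset = zero_based ? 0 : 1
  pairs = features.each_with_index
                  .reject { |value, _| value.zero? } # assumption: zeros are omitted
                  .map { |value, i| "#{i + offset}:#{value}" }
  ([label.to_s] + pairs).join(' ')
end

puts format_libsvm_line(1, [0.5, 0.0, 2.0])                   # 1 1:0.5 3:2.0
puts format_libsvm_line(1, [0.5, 0.0, 2.0], zero_based: true) # 1 0:0.5 2:2.0
```

The same feature vector yields indices 1 and 3 with the default setting and indices 0 and 2 with zero_based: true, so a file must be read back with the same setting it was written with.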

.load_libsvm_file(filename, zero_based: false, label_dtype: 'int', value_dtype: 'float') ⇒ Array<Array>

Load a dataset from a file in the libsvm file format.

Parameters:

  • filename (String)

    Path to a dataset file.

  • zero_based (Boolean) (defaults to: false)

    Whether the column index starts from 0 (true) or 1 (false).

  • label_dtype (String) (defaults to: 'int')

    Data type of labels or target values (‘int’, ‘float’, ‘complex’).

  • value_dtype (String) (defaults to: 'float')

    Data type of feature vectors (‘int’, ‘float’, ‘complex’).

Returns:

  • (Array<Array>)

    A two-element array: the (n_samples x n_features) matrix of feature vectors, followed by the (n_samples) vector of labels or target values.



# File 'lib/libsvmloader.rb', line 19

def load_libsvm_file(filename, zero_based: false, label_dtype: 'int', value_dtype: 'float')
  labels = []
  ftvecs = []
  maxids = []
  label_class = parse_dtype(label_dtype)
  value_class = parse_dtype(value_dtype)
  # "\s" is a single space in a double-quoted Ruby string, so each line is split on spaces.
  CSV.foreach(filename, col_sep: "\s", headers: false) do |row|
    label, ftvec, maxid = parse_libsvm_row(row, zero_based, label_class, value_class)
    labels.push(label)
    ftvecs.push(ftvec)
    maxids.push(maxid)
  end
  # The matrix width comes from the largest index reported across all rows.
  [convert_to_matrix(ftvecs, maxids.max + 1, value_class), labels]
end
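
As a rough illustration of what loading involves, the sketch below parses a single libsvm line into a label and a dense feature vector. `parse_libsvm_line` is a hypothetical helper, not the gem's private `parse_libsvm_row`. It assumes integer labels and float values (matching the documented defaults), and it takes the vector length as an argument, whereas the method above derives the matrix width from the indices found in the file.

```ruby
# Hypothetical helper (not part of LibSVMLoader): parses one libsvm line into
# a label and a dense feature vector of length n_features.
def parse_libsvm_line(line, n_features, zero_based: false)
  label, *pairs = line.split
  # Column indices start at 0 when zero_based is true, otherwise at 1.
  offset = zero_based ? 0 : 1
  vector = Array.new(n_features, 0.0) # features absent from the line stay 0.0
  pairs.each do |pair|
    index, value = pair.split(':')
    vector[index.to_i - offset] = Float(value)
  end
  [Integer(label), vector]
end

label, vector = parse_libsvm_line('1 1:0.5 3:2.0', 3)
p label  # 1
p vector # [0.5, 0.0, 2.0]
```

The returned pair uses the opposite order from load_libsvm_file, which returns the feature matrix first and the labels second.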