Class: SVMKit::Preprocessing::OneHotEncoder

Inherits:
Object
Includes:
Base::BaseEstimator, Base::Transformer
Defined in:
lib/svmkit/preprocessing/one_hot_encoder.rb

Overview

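Encode categorical integer features to one-hot vectors. Each feature is expanded into (maximum value seen during fitting + 1) binary columns, and each sample gets a 1 in the column that matches its value for that feature.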

Examples:

require 'svmkit'

encoder = SVMKit::Preprocessing::OneHotEncoder.new
labels = Numo::Int32[0, 0, 2, 3, 2, 1]
one_hot_vectors = encoder.fit_transform(labels)
# > pp one_hot_vectors
# Numo::DFloat#shape[6, 4]
# [[1, 0, 0, 0],
#  [1, 0, 0, 0],
#  [0, 0, 1, 0],
#  [0, 0, 0, 1],
#  [0, 0, 1, 0],
#  [0, 1, 0, 0]]
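For a one-dimensional label array like the one above, the encoding gives every integer from 0 to the largest label its own column. The following plain-Ruby sketch reproduces the printed result with ordinary arrays instead of Numo; the helper `one_hot_labels` is hypothetical and not part of SVMKit.

```ruby
# Hypothetical helper (not SVMKit's API): one-hot encode a flat array of
# non-negative integer labels. Each value 0..max gets its own column.
def one_hot_labels(labels)
  n_columns = labels.max + 1
  labels.map do |label|
    row = Array.new(n_columns, 0)
    row[label] = 1 # mark the column that corresponds to this label
    row
  end
end

p one_hot_labels([0, 0, 2, 3, 2, 1])
# prints [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
```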

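Instance Attribute Summary

  • #feature_indices ⇒ Numo::Int32 (readonly)
    Return the indices to feature ranges.

  • #n_values ⇒ Numo::Int32 (readonly)
    Return the number of values for each feature.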

Attributes included from Base::BaseEstimator

#params

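Instance Method Summary

  • #fit(x) ⇒ OneHotEncoder
    Fit the one-hot encoder to samples.

  • #fit_transform(x) ⇒ Numo::DFloat
    Fit the one-hot encoder to samples, then encode the samples into one-hot vectors.

  • #initialize ⇒ OneHotEncoder (constructor)
    Create a new encoder for encoding categorical integer features to one-hot vectors.

  • #marshal_dump ⇒ Hash
    Dump marshal data.

  • #marshal_load(obj) ⇒ nil
    Load marshal data.

  • #transform(x) ⇒ Numo::DFloat
    Encode samples into one-hot vectors.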

Constructor Details

#initialize ⇒ OneHotEncoder

Create a new encoder for encoding categorical integer features to one-hot vectors.



# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 36

def initialize
  @params = {}
  @n_values = nil
  @feature_indices = nil
end

Instance Attribute Details

#feature_indices ⇒ Numo::Int32 (readonly)

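Return the indices that delimit each feature's block of columns in the encoded output: feature i occupies columns feature_indices[i] up to, but not including, feature_indices[i + 1].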

Returns:

  • (Numo::Int32)

    (shape: [n_features + 1])



# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 33

def feature_indices
  @feature_indices
end
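Because `feature_indices` is a running sum that starts at 0, consecutive entries bound each feature's block of columns. The sketch below, in plain Ruby with arrays standing in for `Numo::Int32`, reads those ranges and decodes one encoded row back to integer values; `column_range` and `decode_row` are hypothetical helpers, and the sample `feature_indices` assumes `n_values = [3, 2, 4]`.

```ruby
# Hypothetical helpers (not SVMKit's API) showing how feature_indices
# partitions the columns of the encoded output.

# Columns occupied by feature i: feature_indices[i] up to feature_indices[i + 1], exclusive.
def column_range(feature_indices, i)
  feature_indices[i]...feature_indices[i + 1]
end

# Recover the original integer value of each feature from one encoded row.
def decode_row(feature_indices, encoded_row)
  (0...(feature_indices.size - 1)).map do |i|
    block = encoded_row[column_range(feature_indices, i)]
    block.index(1.0) # position of the 1 within this feature's block
  end
end

feature_indices = [0, 3, 5, 9] # assumes n_values = [3, 2, 4]
row = [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
p column_range(feature_indices, 2) # prints 5...9
p decode_row(feature_indices, row) # prints [1, 0, 3]
```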

#n_values ⇒ Numo::Int32 (readonly)

Return the number of values for each feature, that is, the maximum value seen during fitting plus one.

Returns:

  • (Numo::Int32)

    (shape: [n_features])



# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 29

def n_values
  @n_values
end
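As `fit` computes it (`x.max(0) + 1`), `n_values` counts the integers from 0 up to each feature's maximum, not the distinct values actually observed. A small plain-Ruby illustration with made-up samples, using nested arrays in place of a `Numo::Int32` matrix:

```ruby
# Plain-Ruby sketch: for each feature (column), n_values = maximum value + 1.
samples = [
  [0, 1],
  [2, 0],
  [0, 4] # feature 1 never takes the values 2 or 3
]

n_values = samples.transpose.map { |column| column.max + 1 }
p n_values # prints [3, 5]: feature 1 still gets 5 columns (values 0..4)
```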

Instance Method Details

#fit(x) ⇒ OneHotEncoder

Fit the one-hot encoder to samples.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to fit the one-hot encoder to.

Returns:

  • (OneHotEncoder)

    The fitted encoder itself.

# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 48

def fit(x, _y = nil)
  SVMKit::Validation.check_params_type(Numo::Int32, x: x)
  @n_values = x.max(0) + 1
  @feature_indices = Numo::Int32.hstack([[0], @n_values]).cumsum
  self
end
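The two assignments in `fit` reduce to a column-wise maximum followed by a prefix sum that starts at 0. A minimal plain-Ruby sketch of the same computation, assuming `x` is an array of rows of non-negative integers; `fit_one_hot` is a hypothetical name, and returning a hash is only for illustration:

```ruby
# Hypothetical plain-Ruby equivalent of the fitting step (not SVMKit's API).
def fit_one_hot(x)
  # Column-wise maximum plus one, like x.max(0) + 1.
  n_values = x.transpose.map { |column| column.max + 1 }
  # Prepend 0 and take the running sum, like hstack([[0], n_values]).cumsum.
  feature_indices = [0]
  n_values.each { |n| feature_indices << feature_indices.last + n }
  { n_values: n_values, feature_indices: feature_indices }
end

model = fit_one_hot([[0, 1, 3], [2, 0, 1]])
p model[:n_values]        # prints [3, 2, 4]
p model[:feature_indices] # prints [0, 3, 5, 9]
```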

#fit_transform(x) ⇒ Numo::DFloat

Fit the one-hot encoder to samples, then encode the samples into one-hot vectors.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to encode into one-hot vectors.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, feature_indices[-1]]) The one-hot vectors.



# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 61

def fit_transform(x, _y = nil)
  SVMKit::Validation.check_params_type(Numo::Int32, x: x)
  fit(x).transform(x)
end

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data about OneHotEncoder.



# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 83

def marshal_dump
  { params: @params,
    n_values: @n_values,
    feature_indices: @feature_indices }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 91

def marshal_load(obj)
  @params = obj[:params]
  @n_values = obj[:n_values]
  @feature_indices = obj[:feature_indices]
  nil
end
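Ruby's `Marshal.dump` calls `marshal_dump` to obtain a serializable object, and `Marshal.load` allocates a new instance without running `initialize` and passes that object to `marshal_load`. The hypothetical `ToyEncoder` below follows the same pattern with plain arrays, so the round trip can be tried without SVMKit or Numo:

```ruby
# Hypothetical stand-in class (not SVMKit's) showing how Ruby's Marshal uses
# the marshal_dump / marshal_load hooks with plain arrays instead of Numo.
class ToyEncoder
  attr_reader :params, :n_values, :feature_indices

  def initialize(n_values)
    @params = {}
    @n_values = n_values
    @feature_indices = [0]
    n_values.each { |n| @feature_indices << @feature_indices.last + n }
  end

  # Called by Marshal.dump: return the state to serialize.
  def marshal_dump
    { params: @params, n_values: @n_values, feature_indices: @feature_indices }
  end

  # Called by Marshal.load on a freshly allocated object (initialize is not run).
  def marshal_load(obj)
    @params = obj[:params]
    @n_values = obj[:n_values]
    @feature_indices = obj[:feature_indices]
    nil
  end
end

original = ToyEncoder.new([3, 2, 4])
restored = Marshal.load(Marshal.dump(original))
p restored.feature_indices # prints [0, 3, 5, 9]
```

The same round trip should let a fitted encoder be saved to a file and restored later.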

#transform(x) ⇒ Numo::DFloat

Encode samples into one-hot vectors.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to encode into one-hot vectors.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, feature_indices[-1]]) The one-hot vectors.



# File 'lib/svmkit/preprocessing/one_hot_encoder.rb', line 70

def transform(x)
  SVMKit::Validation.check_params_type(Numo::Int32, x: x)
  n_samples, n_features = x.shape
  n_features = 1 if n_features.nil?
  column_indices = (x + @feature_indices[0...-1]).flatten.to_a
  row_indices = Numo::Int32.new(n_samples).seq.repeat(n_features).to_a
  codes = Numo::DFloat.zeros(n_samples, @feature_indices[-1])
  row_indices.zip(column_indices).each { |r, c| codes[r, c] = 1.0 }
  codes
end
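`transform` places a 1 at column `feature_indices[j] + x[r, j]` for every sample `r` and feature `j`. The plain-Ruby sketch below performs the same scatter with arrays; `transform_one_hot` is hypothetical, and its range check is an added assumption. Reading the method above, an out-of-range value appears to set a 1 inside the next feature's block (or index past the last column) rather than being rejected.

```ruby
# Hypothetical plain-Ruby sketch of the encoding step (not SVMKit's API).
# x: array of rows of integers; feature_indices: as computed by fit.
# Unlike the method above, this sketch rejects values outside the fitted range.
def transform_one_hot(x, feature_indices)
  n_columns = feature_indices.last
  x.map do |sample|
    codes = Array.new(n_columns, 0.0)
    sample.each_with_index do |value, j|
      column = feature_indices[j] + value # offset into feature j's block
      unless value >= 0 && column < feature_indices[j + 1]
        raise ArgumentError, "value #{value} of feature #{j} is outside the fitted range"
      end
      codes[column] = 1.0
    end
    codes
  end
end

p transform_one_hot([[1, 0, 3]], [0, 3, 5, 9])
# prints [[0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
```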