Class: Rumale::Preprocessing::OneHotEncoder

Inherits:
Object
Includes:
Base::BaseEstimator, Base::Transformer
Defined in:
lib/rumale/preprocessing/one_hot_encoder.rb

Overview

Encode categorical integer features to one-hot-vectors.

Examples:

encoder = Rumale::Preprocessing::OneHotEncoder.new
labels = Numo::Int32[0, 0, 2, 3, 2, 1]
one_hot_vectors = encoder.fit_transform(labels)
# > pp one_hot_vectors
# Numo::DFloat#shape[6, 4]
# [[1, 0, 0, 0],
#  [1, 0, 0, 0],
#  [0, 0, 1, 0],
#  [0, 0, 0, 1],
#  [0, 0, 1, 0],
#  [0, 1, 0, 0]]

Instance Attribute Summary

Attributes included from Base::BaseEstimator

#params

Instance Method Summary

Constructor Details

#initialize ⇒ OneHotEncoder

Create a new encoder for encoding categorical integer features to one-hot-vectors.



# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 35

def initialize
  @params = {}
  @n_values = nil
  @feature_indices = nil
end

Instance Attribute Details

#feature_indices ⇒ Numo::Int32 (readonly)

Return the indices to feature ranges.

Returns:

  • (Numo::Int32)

    (shape: [n_features + 1])



# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 32

def feature_indices
  @feature_indices
end

#n_values ⇒ Numo::Int32 (readonly)

Return the number of values for each feature (the maximum value plus one).

Returns:

  • (Numo::Int32)

    (shape: [n_features])



# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 28

def n_values
  @n_values
end

Instance Method Details

#fit(x) ⇒ OneHotEncoder

Fit one-hot-encoder to samples.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to fit one-hot-encoder.

Returns:

  • (OneHotEncoder)

    The fitted encoder itself.



# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 47

def fit(x, _y = nil)
  check_params_type(Numo::Int32, x: x)
  @n_values = x.max(0) + 1
  @feature_indices = Numo::Int32.hstack([[0], @n_values]).cumsum
  self
end
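
The arithmetic in `fit` can be followed by hand with plain Ruby arrays. The sketch below is not Rumale code: it assumes a small two-feature sample matrix and a hypothetical helper named `sketch_fit`, and it mirrors the column-wise maximum plus one followed by a cumulative sum with a leading zero.

```ruby
# Sketch only: plain Ruby arrays stand in for Numo::Int32.
# `sketch_fit` is a hypothetical helper, not part of Rumale.
def sketch_fit(samples)
  # Column-wise maximum plus one: the number of codes reserved per feature.
  n_values = samples.transpose.map { |column| column.max + 1 }
  # Running total with a leading zero: the start offset of each feature's block.
  feature_indices = [0]
  n_values.each { |n| feature_indices << feature_indices.last + n }
  [n_values, feature_indices]
end

samples = [[0, 1], [2, 0], [1, 2]]
n_values, feature_indices = sketch_fit(samples)
p n_values         # => [3, 3]
p feature_indices  # => [0, 3, 6]
```

With these assumed samples, each feature reserves three columns, so the first feature occupies columns 0 to 2 and the second occupies columns 3 to 5. The final entry, 6, is the total width of the encoded output.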

#fit_transform(x) ⇒ Numo::DFloat

Fit one-hot-encoder to samples, then encode samples into one-hot-vectors.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to encode into one-hot-vectors.

Returns:

  • (Numo::DFloat)

    The one-hot-vectors.



# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 60

def fit_transform(x, _y = nil)
  check_params_type(Numo::Int32, x: x)
  fit(x).transform(x)
end

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data about OneHotEncoder.



# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 82

def marshal_dump
  { params: @params,
    n_values: @n_values,
    feature_indices: @feature_indices }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 90

def marshal_load(obj)
  @params = obj[:params]
  @n_values = obj[:n_values]
  @feature_indices = obj[:feature_indices]
  nil
end
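
Ruby's `Marshal.dump` and `Marshal.load` call `marshal_dump` and `marshal_load` when an object defines them, which is how a fitted encoder can be saved and restored. The round trip below is a minimal sketch using a hypothetical stand-in class, `TinyEncoder`, with plain arrays in place of Numo arrays. It is not the Rumale class itself.

```ruby
# Sketch only: `TinyEncoder` is a hypothetical stand-in, not the Rumale class.
class TinyEncoder
  attr_reader :n_values, :feature_indices

  def initialize(n_values = nil, feature_indices = nil)
    @params = {}
    @n_values = n_values
    @feature_indices = feature_indices
  end

  # Called by Marshal.dump: return the state to serialize.
  def marshal_dump
    { params: @params, n_values: @n_values, feature_indices: @feature_indices }
  end

  # Called by Marshal.load on a freshly allocated object: restore the state.
  def marshal_load(obj)
    @params = obj[:params]
    @n_values = obj[:n_values]
    @feature_indices = obj[:feature_indices]
    nil
  end
end

fitted = TinyEncoder.new([3, 3], [0, 3, 6])
restored = Marshal.load(Marshal.dump(fitted))
p restored.feature_indices  # => [0, 3, 6]
```

Because the dumped hash carries `n_values` and `feature_indices`, the restored object holds the same fitted state without being fitted again.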

#transform(x) ⇒ Numo::DFloat

Encode samples into one-hot-vectors.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to encode into one-hot-vectors.

Returns:

  • (Numo::DFloat)

    The one-hot-vectors.



# File 'lib/rumale/preprocessing/one_hot_encoder.rb', line 69

def transform(x)
  check_params_type(Numo::Int32, x: x)
  n_samples, n_features = x.shape
  n_features = 1 if n_features.nil?
  column_indices = (x + @feature_indices[0...-1]).flatten.to_a
  row_indices = Numo::Int32.new(n_samples).seq.repeat(n_features).to_a
  codes = Numo::DFloat.zeros(n_samples, @feature_indices[-1])
  row_indices.zip(column_indices).each { |r, c| codes[r, c] = 1.0 }
  codes
end
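
The index arithmetic in `transform` shifts each feature value by that feature's starting offset and sets the resulting column to 1.0. The following is a minimal sketch of that idea with plain Ruby arrays. The helper `sketch_transform` and the sample values are assumptions for illustration, not part of Rumale.

```ruby
# Sketch only: plain Ruby arrays stand in for Numo arrays.
# `sketch_transform` is a hypothetical helper, not part of Rumale.
def sketch_transform(samples, feature_indices)
  width = feature_indices.last       # total number of output columns
  offsets = feature_indices[0...-1]  # first column of each feature's block
  samples.map do |row|
    codes = Array.new(width, 0.0)
    # Shift each value by its feature's offset and switch that column on.
    row.each_with_index { |value, j| codes[value + offsets[j]] = 1.0 }
    codes
  end
end

encoded = sketch_transform([[0, 1], [2, 0]], [0, 3, 6])
encoded.each { |row| p row }
# [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
# [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```

Each output row contains exactly one 1.0 per input feature. The sketch performs no range check, so it assumes every value is non-negative and no larger than the maximum seen when the offsets were computed.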