Class: Rumale::Tree::GradientTreeRegressor

Inherits: Object

Includes: Base::BaseEstimator, Base::Regressor

Defined in: lib/rumale/tree/gradient_tree_regressor.rb

Overview

GradientTreeRegressor is a class that implements a decision tree for regression with the exact greedy algorithm. This class is used internally by estimators based on gradient tree boosting.

Reference

  • J. H. Friedman, “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics, 29 (5), pp. 1189–1232, 2001.

  • J. H. Friedman, “Stochastic Gradient Boosting,” Computational Statistics and Data Analysis, 38 (4), pp. 367–378, 2002.

  • T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” Proc. KDD’16, pp. 785–794, 2016.

Instance Attribute Summary

Attributes included from Base::BaseEstimator

#params

Instance Method Summary

Methods included from Base::Regressor

#score

Constructor Details

#initialize(reg_lambda: 0.0, shrinkage_rate: 1.0, max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, random_seed: nil) ⇒ GradientTreeRegressor

Initialize a gradient tree regressor

Parameters:

  • reg_lambda (Float) (defaults to: 0.0)

    The L2 regularization term on weight.

  • shrinkage_rate (Float) (defaults to: 1.0)

    The shrinkage rate for weight.

  • max_depth (Integer) (defaults to: nil)

    The maximum depth of the tree. If nil is given, the decision tree grows without a depth limit.

  • max_leaf_nodes (Integer) (defaults to: nil)

    The maximum number of leaves on the decision tree. If nil is given, the number of leaves is not limited.

  • min_samples_leaf (Integer) (defaults to: 1)

    The minimum number of samples at a leaf node.

  • max_features (Integer) (defaults to: nil)

    The number of features to consider when searching for the optimal split point. If nil is given, the split process considers all features.

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding the splitting point.



# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 53

def initialize(reg_lambda: 0.0, shrinkage_rate: 1.0,
               max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, random_seed: nil)
  check_params_type_or_nil(Integer, max_depth: max_depth, max_leaf_nodes: max_leaf_nodes,
                                    max_features: max_features, random_seed: random_seed)
  check_params_float(reg_lambda: reg_lambda, shrinkage_rate: shrinkage_rate)
  check_params_integer(min_samples_leaf: min_samples_leaf)
  check_params_positive(reg_lambda: reg_lambda, shrinkage_rate: shrinkage_rate,
                        max_depth: max_depth, max_leaf_nodes: max_leaf_nodes,
                        min_samples_leaf: min_samples_leaf, max_features: max_features)
  @params = {}
  @params[:reg_lambda] = reg_lambda
  @params[:shrinkage_rate] = shrinkage_rate
  @params[:max_depth] = max_depth
  @params[:max_leaf_nodes] = max_leaf_nodes
  @params[:min_samples_leaf] = min_samples_leaf
  @params[:max_features] = max_features
  @params[:random_seed] = random_seed
  @params[:random_seed] ||= srand
  @tree = nil
  @feature_importances = nil
  @n_leaves = nil
  @leaf_weights = nil
  @rng = Random.new(@params[:random_seed])
end
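
A minimal construction sketch (assumes the rumale gem is installed; the parameter values below are illustrative, not recommendations):

require 'rumale'

# All keyword arguments are optional; these values only illustrate the
# signature documented above.
tree = Rumale::Tree::GradientTreeRegressor.new(
  reg_lambda: 1.0, shrinkage_rate: 0.3,
  max_depth: 4, min_samples_leaf: 5, random_seed: 1
)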

Instance Attribute Details

#feature_importances ⇒ Numo::DFloat (readonly)

Return the importance for each feature. The feature importances are calculated based on the number of times each feature is used for splitting.

Returns:

  • (Numo::DFloat)

    (shape: [n_features])



# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 26

def feature_importances
  @feature_importances
end
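
A short sketch of reading the importances from a fitted estimator (the variable name tree is illustrative):

# One value per feature; larger values mean the feature was chosen for
# splitting more often.
tree.feature_importances        # => Numo::DFloat, shape: [n_features]
tree.feature_importances.to_a   # plain Ruby Array, e.g. for printing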

#leaf_weights ⇒ Numo::DFloat (readonly)

Return the values assigned to each leaf.

Returns:

  • (Numo::DFloat)

    (shape: [n_leaves])



# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 38

def leaf_weights
  @leaf_weights
end

#rng ⇒ Random (readonly)

Return the random generator for random selection of feature index.

Returns:

  • (Random)


# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 34

def rng
  @rng
end

#tree ⇒ Node (readonly)

Return the learned tree.

Returns:

  • (Node)

# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 30

def tree
  @tree
end

Instance Method Details

#apply(x) ⇒ Numo::Int32

Return the index of the leaf that each sample reached.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be assigned leaf indices.

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) Leaf index for each sample.



# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 116

def apply(x)
  check_sample_array(x)
  Numo::Int32[*(Array.new(x.shape[0]) { |n| apply_at_node(@tree, x[n, true]) })]
end
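
A brief sketch (assumes a fitted estimator tree and a Numo::DFloat matrix x_test; both names are illustrative):

leaf_ids = tree.apply(x_test)    # Numo::Int32, one leaf index per sample
# As the #predict source below shows, predictions are the weights of the
# reached leaves:
tree.leaf_weights[leaf_ids]      # same values as tree.predict(x_test)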

#fit(x, y, g, h) ⇒ GradientTreeRegressor

Fit the model with given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model.

  • y (Numo::DFloat)

    (shape: [n_samples]) The target values to be used for fitting the model.

  • g (Numo::DFloat)

    (shape: [n_samples]) The gradient of the loss function.

  • h (Numo::DFloat)

    (shape: [n_samples]) The hessian of the loss function.

Returns:

  • (GradientTreeRegressor)

    The learned regressor itself.

# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 85

def fit(x, y, g, h)
  check_sample_array(x)
  check_tvalue_array(y)
  check_sample_tvalue_size(x, y)
  check_params_type(Numo::DFloat, g: g, h: h)
  # Initialize some variables.
  n_features = x.shape[1]
  @params[:max_features] ||= n_features
  @n_leaves = 0
  @leaf_weights = []
  @feature_importances = Numo::DFloat.zeros(n_features)
  @sub_rng = @rng.dup
  # Build tree.
  build_tree(x, y, g, h)
  @leaf_weights = Numo::DFloat[*@leaf_weights]
  self
end
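
A hedged end-to-end sketch using squared-error loss, for which the gradient is f - y and the hessian is 1 for every sample (data, parameter values, and variable names are illustrative):

require 'rumale'

x = Numo::DFloat.new(200, 3).rand          # synthetic features
y = x[true, 0] * 3.0                       # synthetic targets
f = Numo::DFloat.zeros(y.size)             # current ensemble prediction

5.times do
  g = f - y                                # gradient of 0.5 * (f - y)**2
  h = Numo::DFloat.ones(y.size)            # hessian of 0.5 * (f - y)**2
  tree = Rumale::Tree::GradientTreeRegressor.new(
    reg_lambda: 1.0, shrinkage_rate: 0.3, max_depth: 3, random_seed: 1
  )
  tree.fit(x, y, g, h)
  f += tree.predict(x)                     # accumulate this tree's contribution
end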

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data about GradientTreeRegressor.



# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 123

def marshal_dump
  { params: @params,
    tree: @tree,
    feature_importances: @feature_importances,
    leaf_weights: @leaf_weights,
    rng: @rng }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 133

def marshal_load(obj)
  @params = obj[:params]
  @tree = obj[:tree]
  @feature_importances = obj[:feature_importances]
  @leaf_weights = obj[:leaf_weights]
  @rng = obj[:rng]
  nil
end
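
Because these hooks are defined, a fitted estimator can be serialized with Ruby's standard Marshal module. A quick roundtrip sketch (file name and variable names are illustrative):

# Persist a fitted estimator and restore it later.
File.binwrite('gradient_tree.dat', Marshal.dump(tree))
restored = Marshal.load(File.binread('gradient_tree.dat'))
restored.predict(x_test)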

#predict(x) ⇒ Numo::DFloat

Predict values for samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to predict the values.

Returns:

  • (Numo::DFloat)

    (size: n_samples) Predicted values per sample.



# File 'lib/rumale/tree/gradient_tree_regressor.rb', line 107

def predict(x)
  check_sample_array(x)
  @leaf_weights[apply(x)].dup
end
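
A short usage sketch (assumes a fitted estimator tree and a Numo::DFloat matrix x_test; names are illustrative):

predicted = tree.predict(x_test)   # Numo::DFloat, size: n_samples
# #score from Base::Regressor can then evaluate the fit against known targets:
# tree.score(x_test, y_test)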