Class: SVMKit::Ensemble::AdaBoostClassifier
- Inherits: Object
- Includes:
- Base::BaseEstimator, Base::Classifier
- Defined in:
- lib/svmkit/ensemble/ada_boost_classifier.rb
Overview
AdaBoostClassifier is a class that implements AdaBoost (SAMME.R) for classification. This class uses decision trees as weak learners.
Reference
- J. Zhu, S. Rosset, H. Zou, and T. Hastie, “Multi-class AdaBoost,” Technical Report No. 430, Department of Statistics, University of Michigan, 2005.
Instance Attribute Summary
-
#classes ⇒ Numo::Int32
readonly
Return the class labels.
-
#estimators ⇒ Array<DecisionTreeClassifier>
readonly
Return the set of estimators.
-
#feature_importances ⇒ Numo::DFloat
readonly
Return the importance for each feature.
-
#rng ⇒ Random
readonly
Return the random generator for random selection of feature index.
Attributes included from Base::BaseEstimator
Instance Method Summary
-
#decision_function(x) ⇒ Numo::DFloat
Calculate confidence scores for samples.
-
#fit(x, y) ⇒ AdaBoostClassifier
Fit the model with given training data.
-
#initialize(n_estimators: 50, criterion: 'gini', max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, random_seed: nil) ⇒ AdaBoostClassifier
constructor
Create a new classifier with AdaBoost.
-
#marshal_dump ⇒ Hash
Dump marshal data.
-
#marshal_load(obj) ⇒ nil
Load marshal data.
-
#predict(x) ⇒ Numo::Int32
Predict class labels for samples.
-
#predict_proba(x) ⇒ Numo::DFloat
Predict class probabilities for samples.
Methods included from Base::Classifier
Constructor Details
#initialize(n_estimators: 50, criterion: 'gini', max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, random_seed: nil) ⇒ AdaBoostClassifier
Create a new classifier with AdaBoost.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 55

def initialize(n_estimators: 50, criterion: 'gini', max_depth: nil, max_leaf_nodes: nil,
               min_samples_leaf: 1, max_features: nil, random_seed: nil)
  SVMKit::Validation.check_params_type_or_nil(Integer, max_depth: max_depth, max_leaf_nodes: max_leaf_nodes,
                                              max_features: max_features, random_seed: random_seed)
  SVMKit::Validation.check_params_integer(n_estimators: n_estimators, min_samples_leaf: min_samples_leaf)
  SVMKit::Validation.check_params_string(criterion: criterion)
  SVMKit::Validation.check_params_positive(n_estimators: n_estimators, max_depth: max_depth,
                                           max_leaf_nodes: max_leaf_nodes, min_samples_leaf: min_samples_leaf,
                                           max_features: max_features)
  @params = {}
  @params[:n_estimators] = n_estimators
  @params[:criterion] = criterion
  @params[:max_depth] = max_depth
  @params[:max_leaf_nodes] = max_leaf_nodes
  @params[:min_samples_leaf] = min_samples_leaf
  @params[:max_features] = max_features
  @params[:random_seed] = random_seed
  @params[:random_seed] ||= srand
  @estimators = nil
  @classes = nil
  @feature_importances = nil
  @rng = Random.new(@params[:random_seed])
end
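The `@params[:random_seed] ||= srand` fallback relies on `Kernel#srand` returning the previously set seed, so the classifier always ends up holding a concrete, reproducible seed. A minimal plain-Ruby sketch of that behavior (the `resolve_seed` helper is hypothetical, not part of SVMKit):

```ruby
# When the user passes no seed, srand reseeds the global RNG and returns
# the previous seed, which we adopt as the stored seed value.
def resolve_seed(random_seed)
  random_seed || srand
end

seed = resolve_seed(nil)
# Two generators built from the same stored seed produce identical streams,
# which is what makes a fitted model reproducible from its dumped params.
rng1 = Random.new(seed)
rng2 = Random.new(seed)
```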
Instance Attribute Details
#classes ⇒ Numo::Int32 (readonly)
Return the class labels.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 32

def classes
  @classes
end
#estimators ⇒ Array<DecisionTreeClassifier> (readonly)
Return the set of estimators.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 28

def estimators
  @estimators
end
#feature_importances ⇒ Numo::DFloat (readonly)
Return the importance for each feature.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 36

def feature_importances
  @feature_importances
end
#rng ⇒ Random (readonly)
Return the random generator for random selection of feature index.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 40

def rng
  @rng
end
Instance Method Details
#decision_function(x) ⇒ Numo::DFloat
Calculate confidence scores for samples.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 136

def decision_function(x)
  SVMKit::Validation.check_sample_array(x)
  n_samples, = x.shape
  n_classes = @classes.size
  sum_probs = Numo::DFloat.zeros(n_samples, n_classes)
  @estimators.each do |tree|
    log_proba = Numo::NMath.log(tree.predict_proba(x).clip(1.0e-15, nil))
    sum_probs += (n_classes - 1) * (log_proba - 1.fdiv(n_classes) * Numo::DFloat[log_proba.sum(1)].transpose)
  end
  sum_probs /= @estimators.size
end
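The per-class confidence above is the SAMME.R score: each weak learner contributes `(K - 1) * (log p_k - mean of log p)` per class, and the contributions are averaged over the ensemble. A plain-Ruby sketch for a single sample (the `samme_r_scores` helper and the probability values are illustrative, not part of SVMKit):

```ruby
# Compute the averaged SAMME.R confidence scores for one sample, given each
# weak learner's predicted class probabilities for that sample.
def samme_r_scores(tree_probas)
  k = tree_probas.first.size
  sums = Array.new(k, 0.0)
  tree_probas.each do |proba|
    # Clip probabilities away from zero before taking the log, as in the source.
    log_p = proba.map { |v| Math.log([v, 1.0e-15].max) }
    mean_log = log_p.sum / k
    # Each learner's contribution is zero-sum across the K classes.
    k.times { |c| sums[c] += (k - 1) * (log_p[c] - mean_log) }
  end
  sums.map { |s| s / tree_probas.size }
end

# Two weak learners' predicted probabilities over three classes for one sample:
probas = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
scores = samme_r_scores(probas)
```

Because each learner's contribution is centered by the mean log-probability, the scores for a sample always sum to zero; the predicted class is the one with the largest score.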
#fit(x, y) ⇒ AdaBoostClassifier
Fit the model with given training data.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 84

def fit(x, y) # rubocop:disable Metrics/AbcSize
  SVMKit::Validation.check_sample_array(x)
  SVMKit::Validation.check_label_array(y)
  SVMKit::Validation.check_sample_label_size(x, y)
  ## Initialize some variables.
  n_samples, n_features = x.shape
  @estimators = []
  @feature_importances = Numo::DFloat.zeros(n_features)
  @params[:max_features] = n_features unless @params[:max_features].is_a?(Integer)
  @params[:max_features] = [[1, @params[:max_features]].max, n_features].min
  @classes = Numo::Int32.asarray(y.to_a.uniq.sort)
  n_classes = @classes.shape[0]
  ## Boosting.
  classes_arr = @classes.to_a
  y_codes = Numo::DFloat.zeros(n_samples, n_classes) - 1.fdiv(n_classes - 1)
  n_samples.times { |n| y_codes[n, classes_arr.index(y[n])] = 1.0 }
  observation_weights = Numo::DFloat.zeros(n_samples) + 1.fdiv(n_samples)
  @params[:n_estimators].times do |_t|
    # Fit classifier.
    ids = weighted_sampling(observation_weights)
    break if y[ids].to_a.uniq.size != n_classes
    tree = Tree::DecisionTreeClassifier.new(
      criterion: @params[:criterion], max_depth: @params[:max_depth],
      max_leaf_nodes: @params[:max_leaf_nodes], min_samples_leaf: @params[:min_samples_leaf],
      max_features: @params[:max_features], random_seed: @rng.rand(int_max)
    )
    tree.fit(x[ids, true], y[ids])
    # Calculate estimator error.
    proba = tree.predict_proba(x).clip(1.0e-15, nil)
    p = Numo::Int32.asarray(Array.new(n_samples) { |n| @classes[proba[n, true].max_index] })
    inds = p.ne(y)
    error = (observation_weights * inds).sum / observation_weights.sum
    # Store model.
    @estimators.push(tree)
    @feature_importances += tree.feature_importances
    break if error.zero?
    # Update observation weights.
    log_proba = Numo::NMath.log(proba)
    observation_weights *= Numo::NMath.exp(-1.0 * (n_classes - 1).fdiv(n_classes) * (y_codes * log_proba).sum(1))
    observation_weights = observation_weights.clip(1.0e-15, nil)
    sum_observation_weights = observation_weights.sum
    break if sum_observation_weights.zero?
    observation_weights /= sum_observation_weights
  end
  @feature_importances /= @feature_importances.sum
  self
end
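The heart of the boosting loop is the observation-weight update: `y_codes` encode the true class as `1.0` and every other class as `-1/(K - 1)`, so samples whose predicted probability on the true class is low get their weight increased before the next round. A plain-Ruby sketch with illustrative numbers (the `update_weights` helper is not part of SVMKit):

```ruby
# Apply one round of the SAMME.R weight update and renormalize.
def update_weights(weights, y_codes, probas)
  k = y_codes.first.size
  updated = weights.each_with_index.map do |w, n|
    log_p = probas[n].map { |v| Math.log([v, 1.0e-15].max) }
    # Inner product of the coded label row with the log-probabilities.
    inner = y_codes[n].zip(log_p).sum { |c, lp| c * lp }
    w * Math.exp(-1.0 * (k - 1).fdiv(k) * inner)
  end
  total = updated.sum
  updated.map { |w| w / total }
end

# Three samples, two classes; the first sample is misclassified
# (low predicted probability on its true class).
weights = [1.0 / 3, 1.0 / 3, 1.0 / 3]
y_codes = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0]]  # K = 2, so -1/(K-1) = -1
probas  = [[0.2, 0.8], [0.1, 0.9], [0.9, 0.1]]
new_weights = update_weights(weights, y_codes, probas)
```

After the update the misclassified first sample carries the largest weight, so the next weak learner's weighted resampling focuses on it.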
#marshal_dump ⇒ Hash
Dump marshal data.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 174

def marshal_dump
  { params: @params, estimators: @estimators, classes: @classes,
    feature_importances: @feature_importances, rng: @rng }
end
#marshal_load(obj) ⇒ nil
Load marshal data.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 181

def marshal_load(obj)
  @params = obj[:params]
  @estimators = obj[:estimators]
  @classes = obj[:classes]
  @feature_importances = obj[:feature_importances]
  @rng = obj[:rng]
  nil
end
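These two methods implement Ruby's standard `marshal_dump`/`marshal_load` hooks: `Marshal.dump` serializes the Hash returned by `marshal_dump`, and `Marshal.load` allocates a bare instance and hands that Hash to `marshal_load`, so a trained classifier survives a round trip. A minimal sketch of the same pattern with a stand-in `Toy` class (not part of SVMKit):

```ruby
# A stand-in class using the same dump/load hook pattern as the classifier.
class Toy
  attr_reader :params, :classes

  def initialize(params, classes)
    @params = params
    @classes = classes
  end

  # Called by Marshal.dump: return the state to serialize.
  def marshal_dump
    { params: @params, classes: @classes }
  end

  # Called by Marshal.load on a freshly allocated instance (initialize is skipped).
  def marshal_load(obj)
    @params = obj[:params]
    @classes = obj[:classes]
    nil
  end
end

model = Toy.new({ n_estimators: 50 }, [0, 1, 2])
restored = Marshal.load(Marshal.dump(model))
```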
#predict(x) ⇒ Numo::Int32
Predict class labels for samples.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 152

def predict(x)
  SVMKit::Validation.check_sample_array(x)
  n_samples, = x.shape
  probs = decision_function(x)
  Numo::Int32.asarray(Array.new(n_samples) { |n| @classes[probs[n, true].max_index] })
end
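Prediction is simply the argmax of the decision-function scores, mapped back through the stored class labels. A plain-Ruby sketch for one sample (the `predict_label` helper and the score values are illustrative):

```ruby
# Pick the class label whose confidence score is largest.
def predict_label(classes, scores)
  classes[scores.each_with_index.max_by { |s, _i| s }.last]
end

label = predict_label([0, 1, 2], [-0.3, 1.2, -0.9])
```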
#predict_proba(x) ⇒ Numo::DFloat
Predict class probabilities for samples.
# File 'lib/svmkit/ensemble/ada_boost_classifier.rb', line 163

def predict_proba(x)
  SVMKit::Validation.check_sample_array(x)
  n_classes = @classes.size
  probs = Numo::NMath.exp(1.fdiv(n_classes - 1) * decision_function(x))
  sum_probs = probs.sum(1)
  probs /= Numo::DFloat[sum_probs].transpose
  probs
end
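The probabilities are recovered from the decision-function scores by exponentiating each score scaled by `1/(K - 1)` and normalizing each row to sum to one, i.e. a softmax with temperature `K - 1`. A plain-Ruby sketch for one sample (the `scores_to_proba` helper and the score values are illustrative, not part of SVMKit):

```ruby
# Convert one sample's SAMME.R confidence scores into class probabilities.
def scores_to_proba(scores)
  k = scores.size
  exps = scores.map { |s| Math.exp(s.fdiv(k - 1)) }
  total = exps.sum
  exps.map { |e| e / total }
end

proba = scores_to_proba([2.0, -1.0, -1.0])
```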