Class: Rumale::NaiveBayes::BernoulliNB

Inherits:
BaseNaiveBayes
Defined in:
lib/rumale/naive_bayes/naive_bayes.rb

Overview

BernoulliNB is a class that implements the Bernoulli Naive Bayes classifier.

Reference

  • C. D. Manning, P. Raghavan, and H. Schütze, “Introduction to Information Retrieval,” Cambridge University Press, 2008.

Examples:

estimator = Rumale::NaiveBayes::BernoulliNB.new(smoothing_param: 1.0, bin_threshold: 0.0)
estimator.fit(training_samples, training_labels)
results = estimator.predict(testing_samples)

Instance Attribute Summary

Attributes included from Base::BaseEstimator

#params

Instance Method Summary

Methods inherited from BaseNaiveBayes

#predict, #predict_log_proba, #predict_proba

Methods included from Base::Classifier

#predict, #score

Constructor Details

#initialize(smoothing_param: 1.0, bin_threshold: 0.0) ⇒ BernoulliNB

Create a new Bernoulli Naive Bayes classifier.

Parameters:

  • smoothing_param (Float) (defaults to: 1.0)

    The Laplace smoothing parameter.

  • bin_threshold (Float) (defaults to: 0.0)

    The threshold for binarizing features. Values greater than the threshold are treated as 1, all others as 0.
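
To illustrate how bin_threshold is applied, here is a minimal pure-Ruby sketch. The `binarize` helper is hypothetical and not part of Rumale; it assumes the strict greater-than comparison (`x.gt`) that appears in the source listings of #fit and #decision_function.

```ruby
# Hypothetical helper (not part of Rumale): binarize one row of feature values.
# A feature becomes 1.0 only when it is strictly greater than the threshold;
# values equal to or below the threshold become 0.0.
def binarize(row, bin_threshold = 0.0)
  row.map { |v| v > bin_threshold ? 1.0 : 0.0 }
end

sample = [0.0, 2.5, -1.0, 0.3]
p binarize(sample)       # prints [0.0, 1.0, 0.0, 1.0]
p binarize(sample, 0.5)  # prints [0.0, 1.0, 0.0, 0.0]
```

Note that a value exactly equal to the threshold is mapped to 0.0 under this assumption.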



# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 245

def initialize(smoothing_param: 1.0, bin_threshold: 0.0)
  check_params_float(smoothing_param: smoothing_param, bin_threshold: bin_threshold)
  check_params_positive(smoothing_param: smoothing_param)
  @params = {}
  @params[:smoothing_param] = smoothing_param
  @params[:bin_threshold] = bin_threshold
end

Instance Attribute Details

#class_priors ⇒ Numo::DFloat (readonly)

Return the prior probabilities of the classes.

Returns:

  • (Numo::DFloat)

    (shape: [n_classes])



# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 235

def class_priors
  @class_priors
end

#classes ⇒ Numo::Int32 (readonly)

Return the class labels.

Returns:

  • (Numo::Int32)

    (shape: [n_classes])



# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 231

def classes
  @classes
end

#feature_probs ⇒ Numo::DFloat (readonly)

Return the conditional probabilities for features of each class.

Returns:

  • (Numo::DFloat)

    (shape: [n_classes, n_features])



# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 239

def feature_probs
  @feature_probs
end

Instance Method Details

#decision_function(x) ⇒ Numo::DFloat

Calculate confidence scores for samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples for which to compute the scores.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_classes]) Confidence scores per sample for each class.



# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 280

def decision_function(x)
  check_sample_array(x)
  n_classes = @classes.size
  # Indicator matrices: features above the threshold and their complement.
  bin_x = Numo::DFloat[*x.gt(@params[:bin_threshold])]
  not_bin_x = Numo::DFloat[*x.le(@params[:bin_threshold])]
  log_likelihoods = Array.new(n_classes) do |l|
    # Log prior plus the Bernoulli log-likelihood of present and absent features.
    Math.log(@class_priors[l]) + (
      (Numo::DFloat[*bin_x] * Numo::NMath.log(@feature_probs[l, true])).sum(1) +
      (Numo::DFloat[*not_bin_x] * Numo::NMath.log(1.0 - @feature_probs[l, true])).sum(1))
  end
  Numo::DFloat[*log_likelihoods].transpose
end
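
The score for each class is the log prior plus the Bernoulli log-likelihood of the binarized features. The following is a minimal pure-Ruby sketch for a single sample, not the library's implementation: `bernoulli_log_score` is a hypothetical helper, and the priors and probabilities are made-up values chosen for illustration.

```ruby
# Hypothetical sketch (not Rumale code): score one sample against one class.
# row:           raw feature values for a single sample
# prior:         prior probability of the class
# feature_probs: per-feature probability of being "on" within the class
def bernoulli_log_score(row, prior, feature_probs, bin_threshold = 0.0)
  score = Math.log(prior)
  row.each_with_index do |v, j|
    p_on = feature_probs[j]
    # Features above the threshold contribute log(p); the rest contribute log(1 - p).
    score += v > bin_threshold ? Math.log(p_on) : Math.log(1.0 - p_on)
  end
  score
end

# Made-up model parameters for two classes and two features.
priors = [0.5, 0.5]
probs = [[0.75, 0.5], [0.25, 0.5]]

sample = [1.0, 0.0]
scores = priors.each_index.map { |c| bernoulli_log_score(sample, priors[c], probs[c]) }

# The predicted class is the index of the largest score.
predicted = scores.each_with_index.max[1]
```

Under these assumed values the first feature is far more likely to be present in class 0, so that class receives the higher score.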

#fit(x, y) ⇒ BernoulliNB

Fit the model with given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model.

  • y (Numo::Int32)

    (shape: [n_samples]) The categorical variables (e.g. labels) to be used for fitting the model.

Returns:

  • (BernoulliNB)

    The fitted classifier itself.

# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 259

def fit(x, y)
  check_sample_array(x)
  check_label_array(y)
  check_sample_label_size(x, y)
  n_samples, = x.shape
  bin_x = Numo::DFloat[*x.gt(@params[:bin_threshold])]
  @classes = Numo::Int32[*y.to_a.uniq.sort]
  n_samples_each_class = Numo::DFloat[*@classes.to_a.map { |l| y.eq(l).count.to_f }]
  @class_priors = n_samples_each_class / n_samples
  count_features = Numo::DFloat[*@classes.to_a.map { |l| bin_x[y.eq(l).where, true].sum(0) }]
  count_features += @params[:smoothing_param]
  n_samples_each_class += 2.0 * @params[:smoothing_param]
  n_classes = @classes.size
  @feature_probs = count_features / n_samples_each_class.reshape(n_classes, 1)
  self
end
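
The estimation performed by #fit can be sketched without Numo as follows. This is an illustrative approximation under stated assumptions, not the library's implementation: `estimate_bernoulli_params` is a hypothetical name, and the input rows are assumed to be binarized already. Each feature probability is the smoothed ratio (count + smoothing_param) / (n_class_samples + 2 * smoothing_param), matching the arithmetic in the listing above.

```ruby
# Hypothetical pure-Ruby sketch (not Rumale code) of the estimates computed in #fit.
# samples: array of binarized rows (0.0 or 1.0), labels: array of integers.
def estimate_bernoulli_params(samples, labels, smoothing_param = 1.0)
  classes = labels.uniq.sort
  n_features = samples.first.size
  priors = []
  feature_probs = []
  classes.each do |c|
    rows = samples.each_index.select { |i| labels[i] == c }.map { |i| samples[i] }
    # Class prior: fraction of training samples that belong to the class.
    priors << rows.size.to_f / samples.size
    # Count how often each feature is "on" within the class, then smooth.
    counts = (0...n_features).map { |j| rows.sum { |r| r[j] } }
    feature_probs << counts.map do |k|
      (k + smoothing_param) / (rows.size + 2.0 * smoothing_param)
    end
  end
  [classes, priors, feature_probs]
end

samples = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
labels = [0, 0, 1, 1]
classes, priors, probs = estimate_bernoulli_params(samples, labels)
p classes # prints [0, 1]
p priors  # prints [0.5, 0.5]
p probs   # prints [[0.75, 0.5], [0.25, 0.5]]
```

The factor of 2 in the denominator reflects the two possible outcomes of each binary feature, so a smoothed probability never reaches exactly 0 or 1 when smoothing_param is greater than zero.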

#marshal_dump ⇒ Hash

Dump marshal data.

Returns:

  • (Hash)

    The marshal data of the BernoulliNB instance.



# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 296

def marshal_dump
  { params: @params,
    classes: @classes,
    class_priors: @class_priors,
    feature_probs: @feature_probs }
end

#marshal_load(obj) ⇒ nil

Load marshal data.

Returns:

  • (nil)


# File 'lib/rumale/naive_bayes/naive_bayes.rb', line 306

def marshal_load(obj)
  @params = obj[:params]
  @classes = obj[:classes]
  @class_priors = obj[:class_priors]
  @feature_probs = obj[:feature_probs]
  nil
end
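
The marshal_dump and marshal_load hooks are what Ruby's standard Marshal module calls when an object is serialized and restored. The sketch below shows the round trip with a hypothetical stand-in class, `TinyModel`, which is not part of Rumale and only mimics the hook pattern used above.

```ruby
# Hypothetical stand-in class (not Rumale code) using the same hook pattern.
class TinyModel
  attr_reader :params, :classes

  def initialize(params = {}, classes = [])
    @params = params
    @classes = classes
  end

  # Called by Marshal.dump; returns the state to serialize.
  def marshal_dump
    { params: @params, classes: @classes }
  end

  # Called by Marshal.load with the object returned by marshal_dump.
  def marshal_load(obj)
    @params = obj[:params]
    @classes = obj[:classes]
    nil
  end
end

model = TinyModel.new({ smoothing_param: 1.0, bin_threshold: 0.0 }, [0, 1])

# Serialize to a binary string and restore a new, equivalent object.
restored = Marshal.load(Marshal.dump(model))
p restored.params  # prints {:smoothing_param=>1.0, :bin_threshold=>0.0} (formatting varies by Ruby version)
```

A fitted estimator would presumably be saved the same way, by writing the string returned by Marshal.dump to a file and passing its contents to Marshal.load later.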