Class: ConfusionMatrix

Inherits: Object
Defined in: lib/confusion_matrix.rb
Overview
This class holds the confusion matrix information. It is designed to be called incrementally, as results are obtained from the classifier model.
At any point, statistics may be obtained by calling the relevant methods.
A two-class example is:
Classified   Classified  |
Positive     Negative    | Actual
-------------------------+----------
    a            b       | Positive
    c            d       | Negative
Statistical methods will be described with reference to this example.
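For example, a minimal sketch of this incremental workflow, assuming the class is loaded with require 'confusion_matrix' (the labels and counts are illustrative):

require 'confusion_matrix'

cm = ConfusionMatrix.new(:positive, :negative)

# record each result as it comes back from the classifier
cm.add_for(:positive, :positive)   # correctly classified positive (cell a)
cm.add_for(:positive, :negative)   # positive classified as negative (cell b)
cm.add_for(:negative, :positive)   # negative classified as positive (cell c)
cm.add_for(:negative, :negative)   # correctly classified negative (cell d)

cm.precision(:positive)            # => 0.5
cm.overall_accuracy                # => 0.5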
Instance Method Summary
- #add_for(actual, prediction, n = 1) ⇒ Object
  Adds one result to the matrix for a given (actual, prediction) pair of labels.
- #count_for(actual, prediction) ⇒ Integer
  Returns the count for the (actual, prediction) pair.
- #f_measure(label = @labels.first) ⇒ Float
  The F-measure for a given label is the harmonic mean of the precision and recall for that label.
- #false_negative(label = @labels.first) ⇒ Float
  Returns the number of instances of the given class label which are incorrectly classified.
- #false_positive(label = @labels.first) ⇒ Float
  Returns the number of instances incorrectly classified with the given class label.
- #false_rate(label = @labels.first) ⇒ Float
  The false rate for a given class label is the proportion of instances incorrectly classified as that label, out of all those instances not originally of that label.
- #geometric_mean ⇒ Float
  The geometric mean is the nth-root of the product of the true_rate for each label.
- #initialize(*labels) ⇒ ConfusionMatrix (constructor)
  Creates a new, empty instance of a confusion matrix.
- #kappa(label = @labels.first) ⇒ Float
  The Kappa statistic compares the observed accuracy with an expected accuracy.
- #labels ⇒ Array<String>
  Returns a list of labels used in the matrix.
- #matthews_correlation(label = @labels.first) ⇒ Float
  Matthews Correlation Coefficient is a measure of the quality of binary classifications.
- #overall_accuracy ⇒ Float
  The overall accuracy is the proportion of instances which are correctly labelled.
- #precision(label = @labels.first) ⇒ Float
  The precision for a given class label is the proportion of instances classified as that class which are correct.
- #prevalence(label = @labels.first) ⇒ Float
  The prevalence for a given class label is the proportion of instances which are actually of that label, out of the total.
- #recall(label = @labels.first) ⇒ Float
  The recall is another name for the true rate.
- #sensitivity(label = @labels.first) ⇒ Float
  Sensitivity is another name for the true rate.
- #specificity(label = @labels.first) ⇒ Float
  The specificity for a given class label is 1 - false_rate(label).
- #to_s ⇒ String
  Returns a string representation of the matrix, formatted as a printable table.
- #total ⇒ Integer
  Returns the total number of instances referenced in the matrix.
- #true_negative(label = @labels.first) ⇒ Integer
  Returns the number of instances NOT of the given class label which are correctly classified.
- #true_positive(label = @labels.first) ⇒ Integer
  Returns the number of instances of the given class label which are correctly classified.
- #true_rate(label = @labels.first) ⇒ Float
  The true rate for a given class label is the proportion of instances of that class which are correctly classified.
Constructor Details
#initialize(*labels) ⇒ ConfusionMatrix
Creates a new, empty instance of a confusion matrix.
# File 'lib/confusion_matrix.rb', line 25

def initialize(*labels)
  @matrix = {}
  @labels = labels.uniq
  if @labels.size == 1
    raise ArgumentError.new("If labels are provided, there must be at least two.")
  else # preset the matrix Hash
    @labels.each do |actual|
      @matrix[actual] = {}
      @labels.each do |predicted|
        @matrix[actual][predicted] = 0
      end
    end
  end
end
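A short sketch of the two construction styles (label names are illustrative):

# with a pre-defined label list, labels are fixed up front
cm = ConfusionMatrix.new(:positive, :negative)
cm.labels                        # => [:positive, :negative]

# with no arguments, labels accumulate as results are added
cm = ConfusionMatrix.new
cm.add_for(:yes, :no)
cm.labels                        # => [:no, :yes]

# a single label is rejected
ConfusionMatrix.new(:positive)   # raises ArgumentError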
Instance Method Details
#add_for(actual, prediction, n = 1) ⇒ Object
Adds one result to the matrix for a given (actual, prediction) pair of labels. If the matrix was given a pre-defined list of labels on construction, then these given labels must be from the pre-defined list. If no pre-defined list of labels was used in constructing the matrix, then the labels will be added to the matrix.
Class labels may be any hashable value, though ideally they are strings or symbols.
# File 'lib/confusion_matrix.rb', line 99

def add_for(actual, prediction, n = 1)
  validate_label actual, prediction
  if !@matrix.has_key?(actual)
    @matrix[actual] = {}
  end
  predictions = @matrix[actual]
  if !predictions.has_key?(prediction)
    predictions[prediction] = 0
  end
  unless n.class == Integer and n.positive?
    raise ArgumentError.new("add_for requires n to be a positive Integer, but got #{n}")
  end
  @matrix[actual][prediction] += n
end
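For example, a sketch of batching counts with the n argument (labels and counts are illustrative):

cm = ConfusionMatrix.new
cm.add_for(:pos, :pos, 10)   # record ten correct :pos results in one call
cm.add_for(:pos, :neg)       # n defaults to 1
cm.count_for(:pos, :pos)     # => 10
cm.total                     # => 11

cm.add_for(:pos, :neg, 2.5)  # raises ArgumentError: n must be a positive Integer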
#count_for(actual, prediction) ⇒ Integer
Returns the count for the (actual, prediction) pair.
cm = ConfusionMatrix.new
cm.add_for(:pos, :neg)
cm.count_for(:pos, :neg) # => 1
# File 'lib/confusion_matrix.rb', line 77

def count_for(actual, prediction)
  validate_label actual, prediction
  predictions = @matrix.fetch(actual, {})
  predictions.fetch(prediction, 0)
end
#f_measure(label = @labels.first) ⇒ Float
The F-measure for a given label is the harmonic mean of the precision and recall for that label.
F = 2*(precision*recall)/(precision+recall)
# File 'lib/confusion_matrix.rb', line 184

def f_measure(label = @labels.first)
  validate_label label
  2*precision(label)*recall(label)/(precision(label) + recall(label))
end
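A worked sketch using the a/b/c/d cells of the two-class example (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 3)   # a = 3
cm.add_for(:positive, :negative, 1)   # b = 1
cm.add_for(:negative, :positive, 1)   # c = 1
cm.add_for(:negative, :negative, 5)   # d = 5

# precision = 3/4 = 0.75 and recall = 3/4 = 0.75
cm.f_measure(:positive)   # => 2 * (0.75 * 0.75) / (0.75 + 0.75) = 0.75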
#false_negative(label = @labels.first) ⇒ Float
Returns the number of instances of the given class label which are incorrectly classified.
false_negative(:positive) = b
# File 'lib/confusion_matrix.rb', line 124

def false_negative(label = @labels.first)
  validate_label label
  predictions = @matrix.fetch(label, {})

  total = 0
  predictions.each_pair do |key, count|
    if key != label
      total += count
    end
  end

  total
end
#false_positive(label = @labels.first) ⇒ Float
Returns the number of instances incorrectly classified with the given class label.
false_positive(:positive) = c
# File 'lib/confusion_matrix.rb', line 146

def false_positive(label = @labels.first)
  validate_label label

  total = 0
  @matrix.each_pair do |key, predictions|
    if key != label
      total += predictions.fetch(label, 0)
    end
  end

  total
end
#false_rate(label = @labels.first) ⇒ Float
The false rate for a given class label is the proportion of instances incorrectly classified as that label, out of all those instances not originally of that label.
false_rate(:positive) = c/(c+d)
# File 'lib/confusion_matrix.rb', line 168

def false_rate(label = @labels.first)
  validate_label label
  fp = false_positive(label)
  tn = true_negative(label)

  divide(fp, fp+tn)
end
#geometric_mean ⇒ Float
The geometric mean is the nth-root of the product of the true_rate for each label.
a1 = a/(a+b)
a2 = d/(c+d)
geometric_mean = Math.sqrt(a1*a2)
# File 'lib/confusion_matrix.rb', line 197

def geometric_mean
  product = 1

  @matrix.each_key do |key|
    product *= true_rate(key)
  end

  product**(1.0/@matrix.size)
end
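A worked sketch for the two-class case (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 40)   # a
cm.add_for(:positive, :negative, 10)   # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# true_rate(:positive) = 40/50 = 0.8, true_rate(:negative) = 45/50 = 0.9
cm.geometric_mean   # => sqrt(0.8 * 0.9), approximately 0.8485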
#kappa(label = @labels.first) ⇒ Float
The Kappa statistic compares the observed accuracy with an expected accuracy.
# File 'lib/confusion_matrix.rb', line 213

def kappa(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)
  total = tp+fn+fp+tn

  total_accuracy = divide(tp+tn, tp+tn+fp+fn)
  random_accuracy = divide((tn+fp)*(tn+fn) + (fn+tp)*(fp+tp), total*total)

  divide(total_accuracy - random_accuracy, 1 - random_accuracy)
end
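A worked sketch, with illustrative counts chosen so the arithmetic stays exact:

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 40)   # a
cm.add_for(:positive, :negative, 10)   # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# observed accuracy = (40 + 45) / 100 = 0.85
# expected accuracy = ((45+5)*(45+10) + (10+40)*(5+40)) / 100**2 = 0.5
cm.kappa(:positive)   # => (0.85 - 0.5) / (1 - 0.5) = 0.7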
#labels ⇒ Array<String>
Returns a list of labels used in the matrix.
cm = ConfusionMatrix.new
cm.add_for(:pos, :neg)
cm.labels # => [:neg, :pos]
# File 'lib/confusion_matrix.rb', line 47

def labels
  if @labels.size >= 2 # if we defined some labels, return them
    @labels
  else
    result = []

    @matrix.each_pair do |key, predictions|
      result << key
      predictions.each_key do |key|
        result << key
      end
    end

    result.uniq.sort
  end
end
#matthews_correlation(label = @labels.first) ⇒ Float
Matthews Correlation Coefficient is a measure of the quality of binary classifications.
matthews_correlation(:positive) = (a*d - c*b) / sqrt((a+c)(a+b)(d+c)(d+b))
# File 'lib/confusion_matrix.rb', line 235

def matthews_correlation(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)

  divide(tp*tn - fp*fn, Math.sqrt((tp+fp)*(tp+fn)*(tn+fp)*(tn+fn)))
end
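A worked sketch for a balanced two-class case (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 45)   # a
cm.add_for(:positive, :negative, 5)    # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# (45*45 - 5*5) / sqrt(50 * 50 * 50 * 50) = 2000 / 2500
cm.matthews_correlation(:positive)   # => 0.8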
#overall_accuracy ⇒ Float
The overall accuracy is the proportion of instances which are correctly labelled.
overall_accuracy = (a+d)/(a+b+c+d)
# File 'lib/confusion_matrix.rb', line 251

def overall_accuracy
  total_correct = 0

  @matrix.each_pair do |key, predictions|
    total_correct += true_positive(key)
  end

  divide(total_correct, total)
end
#precision(label = @labels.first) ⇒ Float
The precision for a given class label is the proportion of instances classified as that class which are correct.
precision(:positive) = a/(a+c)
# File 'lib/confusion_matrix.rb', line 269

def precision(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fp = false_positive(label)

  divide(tp, tp+fp)
end
#prevalence(label = @labels.first) ⇒ Float
The prevalence for a given class label is the proportion of instances which are actually of that label, out of the total.
prevalence(:positive) = (a+b)/(a+b+c+d)
# File 'lib/confusion_matrix.rb', line 285

def prevalence(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)
  total = tp+fn+fp+tn

  divide(tp+fn, total)
end
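A worked sketch with illustrative counts:

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 40)   # a
cm.add_for(:positive, :negative, 10)   # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# 50 of the 100 instances are actually :positive
cm.prevalence(:positive)   # => 0.5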
#recall(label = @labels.first) ⇒ Float
The recall is another name for the true rate.
# File 'lib/confusion_matrix.rb', line 302

def recall(label = @labels.first)
  validate_label label
  true_rate(label)
end
#sensitivity(label = @labels.first) ⇒ Float
Sensitivity is another name for the true rate.
# File 'lib/confusion_matrix.rb', line 313

def sensitivity(label = @labels.first)
  validate_label label
  true_rate(label)
end
#specificity(label = @labels.first) ⇒ Float
The specificity for a given class label is 1 - false_rate(label).
In the two-class case, specificity(:positive) = 1 - false_rate(:positive) = d/(c+d).
# File 'lib/confusion_matrix.rb', line 325

def specificity(label = @labels.first)
  validate_label label
  1-false_rate(label)
end
#to_s ⇒ String
Returns a string representation of the matrix, formatted as a printable table.
# File 'lib/confusion_matrix.rb', line 334

def to_s
  ls = labels
  result = ""

  title_line = "Predicted "
  label_line = ""
  ls.each { |l| label_line << "#{l} " }
  label_line << " " while label_line.size < title_line.size
  title_line << " " while title_line.size < label_line.size

  result << title_line << "|\n" << label_line << "| Actual\n"
  result << "-"*title_line.size << "+-------\n"

  ls.each do |l|
    count_line = ""
    ls.each_with_index do |m, i|
      count_line << "#{count_for(l, m)}".rjust(labels[i].size) << " "
    end
    result << count_line.ljust(title_line.size) << "| #{l}\n"
  end

  result
end
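For example (labels and counts are illustrative; the exact column spacing depends on the label lengths):

cm = ConfusionMatrix.new(:pos, :neg)
cm.add_for(:pos, :pos, 3)
cm.add_for(:pos, :neg, 1)
cm.add_for(:neg, :pos, 1)
cm.add_for(:neg, :neg, 5)

puts cm
# Predicted |
# pos neg   | Actual
# ----------+-------
#   3   1   | pos
#   1   5   | neg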
#total ⇒ Integer
Returns the total number of instances referenced in the matrix.
total = a+b+c+d
# File 'lib/confusion_matrix.rb', line 362

def total
  total = 0

  @matrix.each_value do |predictions|
    predictions.each_value do |count|
      total += count
    end
  end

  total
end
#true_negative(label = @labels.first) ⇒ Integer
Returns the number of instances NOT of the given class label which are correctly classified.
true_negative(:positive) = d
# File 'lib/confusion_matrix.rb', line 382

def true_negative(label = @labels.first)
  validate_label label

  total = 0
  @matrix.each_pair do |key, predictions|
    if key != label
      total += predictions.fetch(key, 0)
    end
  end

  total
end
#true_positive(label = @labels.first) ⇒ Integer
Returns the number of instances of the given class label which are correctly classified.
true_positive(:positive) = a
# File 'lib/confusion_matrix.rb', line 403

def true_positive(label = @labels.first)
  validate_label label
  predictions = @matrix.fetch(label, {})
  predictions.fetch(label, 0)
end
#true_rate(label = @labels.first) ⇒ Float
The true rate for a given class label is the proportion of instances of that class which are correctly classified.
true_rate(:positive) = a/(a+b)
# File 'lib/confusion_matrix.rb', line 417

def true_rate(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)

  divide(tp, tp+fn)
end
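A worked sketch, also showing that #recall and #sensitivity return the same value (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 8)   # a
cm.add_for(:positive, :negative, 2)   # b
cm.add_for(:negative, :negative, 5)   # d

cm.true_rate(:positive)     # => 8 / 10.0 = 0.8
cm.recall(:positive)        # => 0.8 (same value)
cm.sensitivity(:positive)   # => 0.8 (same value)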