Class: ConfusionMatrix

Inherits:
Object
Defined in:
lib/confusion_matrix.rb

Overview

This class holds the confusion matrix information. It is designed to be called incrementally, as results are obtained from the classifier model.

At any point, statistics may be obtained by calling the relevant methods.

A two-class example is:

Classified      Classified    | 
Positive        Negative      | Actual
------------------------------+------------
    a               b         | Positive
    c               d         | Negative

Statistical methods will be described with reference to this example.
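As a minimal sketch of the bookkeeping involved (the counts a=40, b=10, c=5, d=45 are hypothetical, chosen only to fill the table above), the matrix is essentially a nested Hash keyed by actual label, then predicted label:

```ruby
# counts[actual][predicted] holds one cell of the confusion table.
counts = Hash.new { |h, k| h[k] = Hash.new(0) }

# Record the example table incrementally, as a classifier would:
counts[:positive][:positive] += 40  # a: actual positive, classified positive
counts[:positive][:negative] += 10  # b: actual positive, classified negative
counts[:negative][:positive] += 5   # c: actual negative, classified positive
counts[:negative][:negative] += 45  # d: actual negative, classified negative

counts[:positive][:negative]  # => 10 (cell b)
```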

Instance Method Summary

Constructor Details

#initialize(*labels) ⇒ ConfusionMatrix

Creates a new, empty instance of a confusion matrix.

Parameters:

  • labels (<String, Symbol>, ...)

    if provided, the matrix uses the first label as its default label, and checks that all operations use one of the pre-defined labels.

Raises:

  • (ArgumentError)

    if labels are provided but fewer than two of them are unique.



# File 'lib/confusion_matrix.rb', line 25

def initialize(*labels)
  @matrix = {}
  @labels = labels.uniq
  if @labels.size == 1
    raise ArgumentError.new("If labels are provided, there must be at least two.")
  else # preset the matrix Hash

    @labels.each do |actual|
      @matrix[actual] = {}
      @labels.each do |predicted|
        @matrix[actual][predicted] = 0
      end
    end
  end
end

Instance Method Details

#add_for(actual, prediction, n = 1) ⇒ Object

Adds one result to the matrix for a given (actual, prediction) pair of labels. If the matrix was given a pre-defined list of labels on construction, the given labels must come from that list. Otherwise, new labels are added to the matrix as they are encountered.

Class labels may be any hashable value, though ideally they are strings or symbols.

Parameters:

  • actual (String, Symbol)

    is the actual class of the instance, which we expect the classifier to predict

  • prediction (String, Symbol)

    is the predicted class of the instance, as output from the classifier

  • n (Integer) (defaults to: 1)

    number of observations to add

Raises:

  • (ArgumentError)

    if n is not a positive Integer

  • (ArgumentError)

    if actual or prediction is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 99

def add_for(actual, prediction, n = 1)
  validate_label actual, prediction

  # validate n before mutating the matrix structure
  unless n.is_a?(Integer) && n.positive?
    raise ArgumentError.new("add_for requires n to be a positive Integer, but got #{n}")
  end

  @matrix[actual] ||= {}            # add unseen labels as they arrive
  @matrix[actual][prediction] ||= 0

  @matrix[actual][prediction] += n
end

#count_for(actual, prediction) ⇒ Integer

Returns the count for the (actual, prediction) pair.

cm = ConfusionMatrix.new
cm.add_for(:pos, :neg)
cm.count_for(:pos, :neg) # => 1

Parameters:

  • actual (String, Symbol)

    is the actual class of the instance, which we expect the classifier to predict

  • prediction (String, Symbol)

    is the predicted class of the instance, as output from the classifier

Returns:

  • (Integer)

    number of observations of (actual, prediction) pair

Raises:

  • (ArgumentError)

    if actual or prediction is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 77

def count_for(actual, prediction)
  validate_label actual, prediction
  predictions = @matrix.fetch(actual, {})
  predictions.fetch(prediction, 0)
end

#f_measure(label = @labels.first) ⇒ Float

The F-measure for a given label is the harmonic mean of the precision and recall for that label.

F = 2*(precision*recall)/(precision+recall)

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    value of F-measure

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 184

def f_measure(label = @labels.first)
  validate_label label
  2*precision(label)*recall(label)/(precision(label) + recall(label))
end
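A worked example of the F-measure formula, using hypothetical counts (a=40, b=10, c=5, d=45) for the two-class table in the Overview:

```ruby
# Hypothetical counts for the two-class table (d completes the table; unused here)
a, b, c, d = 40, 10, 5, 45

precision = a.fdiv(a + c)  # a/(a+c) = 40/45
recall    = a.fdiv(a + b)  # a/(a+b) = 40/50
f_measure = 2 * precision * recall / (precision + recall)

f_measure.round(4)  # => 0.8421
```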

#false_negative(label = @labels.first) ⇒ Integer

Returns the number of instances of the given class label which are incorrectly classified.

false_negative(:positive) = b

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Integer)

    number of false negatives for the given label

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 124

def false_negative(label = @labels.first)
  validate_label label
  predictions = @matrix.fetch(label, {})
  total = 0

  predictions.each_pair do |key, count|
    if key != label 
      total += count
    end
  end

  total
end

#false_positive(label = @labels.first) ⇒ Integer

Returns the number of instances incorrectly classified with the given class label.

false_positive(:positive) = c

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Integer)

    number of false positives for the given label

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 146

def false_positive(label = @labels.first)
  validate_label label
  total = 0

  @matrix.each_pair do |key, predictions|
    if key != label
      total += predictions.fetch(label, 0)
    end
  end

  total
end

#false_rate(label = @labels.first) ⇒ Float

The false rate for a given class label is the proportion of instances incorrectly classified as that label, out of all those instances not originally of that label.

false_rate(:positive) = c/(c+d)

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    value of false rate

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 168

def false_rate(label = @labels.first)
  validate_label label
  fp = false_positive(label)
  tn = true_negative(label)

  divide(fp, fp+tn)
end
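With the hypothetical counts a=40, b=10, c=5, d=45 from the Overview table, the false rate for :positive works out as:

```ruby
a, b, c, d = 40, 10, 5, 45  # hypothetical counts for the two-class table

false_rate_positive = c.fdiv(c + d)  # c/(c+d) = 5/50

false_rate_positive  # => 0.1
```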

#geometric_mean ⇒ Float

The geometric mean is the nth-root of the product of the true_rate for each label.

a1 = a/(a+b)
a2 = d/(c+d)
geometric_mean = Math.sqrt(a1*a2)

Returns:

  • (Float)

    value of geometric mean



# File 'lib/confusion_matrix.rb', line 197

def geometric_mean
  product = 1

  @matrix.each_key do |key|
    product *= true_rate(key)
  end

  product**(1.0/@matrix.size)
end
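For the two-class case the nth root is a square root. Under the hypothetical counts a=40, b=10, c=5, d=45:

```ruby
a, b, c, d = 40, 10, 5, 45  # hypothetical counts for the two-class table

a1 = a.fdiv(a + b)  # true rate of :positive = 0.8
a2 = d.fdiv(c + d)  # true rate of :negative = 0.9
geometric_mean = (a1 * a2)**(1.0 / 2)

geometric_mean.round(4)  # => 0.8485
```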

#kappa(label = @labels.first) ⇒ Float

The Kappa statistic compares the observed accuracy with an expected accuracy.

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    value of Cohen’s Kappa Statistic

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 213

def kappa(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)
  total = tp+fn+fp+tn

  total_accuracy = divide(tp+tn, tp+tn+fp+fn)
  random_accuracy = divide((tn+fp)*(tn+fn) + (fn+tp)*(fp+tp), total*total)

  divide(total_accuracy - random_accuracy, 1 - random_accuracy)
end

#labelsArray<String>

Returns a list of labels used in the matrix.

cm = ConfusionMatrix.new
cm.add_for(:pos, :neg)
cm.labels # => [:neg, :pos]

Returns:

  • (Array<String>)

    labels used in the matrix.



# File 'lib/confusion_matrix.rb', line 47

def labels
  if @labels.size >= 2 # if we defined some labels, return them

    @labels
  else
    result = []

    @matrix.each_pair do |actual, predictions|
      result << actual
      predictions.each_key do |predicted|
        result << predicted
      end
    end

    result.uniq.sort
  end
end

#matthews_correlation(label = @labels.first) ⇒ Float

Matthews Correlation Coefficient is a measure of the quality of binary classifications.

matthews_correlation(:positive) = (a*d - c*b) / sqrt((a+c)(a+b)(d+c)(d+b))

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    value of Matthews Correlation Coefficient

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 235

def matthews_correlation(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)

  divide(tp*tn - fp*fn, Math.sqrt((tp+fp)*(tp+fn)*(tn+fp)*(tn+fn)))
end
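A worked example under the hypothetical counts a=40, b=10, c=5, d=45:

```ruby
tp, fn, fp, tn = 40, 10, 5, 45  # hypothetical a, b, c, d from the table

numerator   = tp * tn - fp * fn  # a*d - c*b = 1750
denominator = Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc = numerator / denominator

mcc.round(4)  # => 0.7035
```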

#overall_accuracyFloat

The overall accuracy is the proportion of instances which are correctly labelled.

overall_accuracy = (a+d)/(a+b+c+d)

Returns:

  • (Float)

    value of overall accuracy



# File 'lib/confusion_matrix.rb', line 251

def overall_accuracy
  total_correct = 0

  @matrix.each_key do |key|
    total_correct += true_positive(key)
  end

  divide(total_correct, total)
end
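Under the hypothetical counts a=40, b=10, c=5, d=45, the overall accuracy is:

```ruby
a, b, c, d = 40, 10, 5, 45  # hypothetical counts for the two-class table

overall_accuracy = (a + d).fdiv(a + b + c + d)  # (a+d)/total

overall_accuracy  # => 0.85
```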

#precision(label = @labels.first) ⇒ Float

The precision for a given class label is the proportion of instances classified as that class which are correct.

precision(:positive) = a/(a+c)

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    value of precision

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 269

def precision(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fp = false_positive(label)

  divide(tp, tp+fp)
end
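Under the hypothetical counts a=40, b=10, c=5, d=45, the precision for :positive is:

```ruby
a, b, c, d = 40, 10, 5, 45  # hypothetical counts for the two-class table

precision_positive = a.fdiv(a + c)  # a/(a+c) = 40/45

precision_positive.round(4)  # => 0.8889
```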

#prevalence(label = @labels.first) ⇒ Float

The prevalence for a given class label is the proportion of instances whose actual class is that label, out of the total.

prevalence(:positive) = (a+b)/(a+b+c+d)

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    value of prevalence

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 285

def prevalence(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)
  total = tp+fn+fp+tn

  divide(tp+fn, total)
end
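Following the implementation above, which divides tp+fn (i.e. the actual instances of the label, a+b) by the total, with hypothetical counts a=40, b=10, c=5, d=45:

```ruby
a, b, c, d = 40, 10, 5, 45  # hypothetical counts for the two-class table

prevalence_positive = (a + b).fdiv(a + b + c + d)  # (tp + fn) / total

prevalence_positive  # => 0.5
```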

#recall(label = @labels.first) ⇒ Float

The recall is another name for the true rate.

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    proportion of instances which are correctly classified

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix

See Also:

  • #true_rate


# File 'lib/confusion_matrix.rb', line 302

def recall(label = @labels.first)
  validate_label label
  true_rate(label)
end

#sensitivity(label = @labels.first) ⇒ Float

Sensitivity is another name for the true rate.

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    proportion of instances which are correctly classified

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix

See Also:

  • #true_rate


# File 'lib/confusion_matrix.rb', line 313

def sensitivity(label = @labels.first)
  validate_label label
  true_rate(label)
end

#specificity(label = @labels.first) ⇒ Float

The specificity for a given class label is 1 - false_rate(label).

In the two-class case, specificity = 1 - false_positive_rate.

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    value of specificity

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 325

def specificity(label = @labels.first)
  validate_label label
  1-false_rate(label)
end
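Under the hypothetical counts a=40, b=10, c=5, d=45, the specificity for :positive is:

```ruby
a, b, c, d = 40, 10, 5, 45  # hypothetical counts for the two-class table

false_rate_positive  = c.fdiv(c + d)  # c/(c+d) = 0.1
specificity_positive = 1 - false_rate_positive

specificity_positive  # => 0.9
```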

#to_s ⇒ String

Returns the matrix entries formatted as a printable table.

Returns:

  • (String)

    representation as a printable table.



# File 'lib/confusion_matrix.rb', line 334

def to_s
  ls = labels
  result = ""

  title_line = "Predicted " 
  label_line = ""
  ls.each { |l| label_line << "#{l} " }
  label_line << " " while label_line.size < title_line.size
  title_line << " " while title_line.size < label_line.size
  result << title_line << "|\n" << label_line << "| Actual\n"
  result << "-"*title_line.size << "+-------\n"

  ls.each do |l|
    count_line = ""
    ls.each_with_index do |m, i|
      count_line << "#{count_for(l, m)}".rjust(ls[i].size) << " "
    end
    result << count_line.ljust(title_line.size) << "| #{l}\n"
  end

  result
end

#total ⇒ Integer

Returns the total number of instances referenced in the matrix.

total = a+b+c+d

Returns:

  • (Integer)

    total number of instances referenced in the matrix.



# File 'lib/confusion_matrix.rb', line 362

def total
  total = 0

  @matrix.each_value do |predictions|
    predictions.each_value do |count|
      total += count
    end
  end

  total
end

#true_negative(label = @labels.first) ⇒ Integer

Returns the number of instances NOT of the given class label which are correctly classified.

true_negative(:positive) = d

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Integer)

    number of instances not of given label which are correctly classified

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 382

def true_negative(label = @labels.first)
  validate_label label
  total = 0

  @matrix.each_pair do |key, predictions|
    if key != label 
      total += predictions.fetch(key, 0)
    end
  end

  total
end

#true_positive(label = @labels.first) ⇒ Integer

Returns the number of instances of the given class label which are correctly classified.

true_positive(:positive) = a

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Integer)

    number of instances of given label which are correctly classified

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 403

def true_positive(label = @labels.first)
  validate_label label
  predictions = @matrix.fetch(label, {})
  predictions.fetch(label, 0)
end

#true_rate(label = @labels.first) ⇒ Float

The true rate for a given class label is the proportion of instances of that class which are correctly classified.

true_rate(:positive) = a/(a+b)

Parameters:

  • label (String, Symbol) (defaults to: @labels.first)

    class label to use; defaults to the first of any pre-defined labels in the matrix

Returns:

  • (Float)

    proportion of instances which are correctly classified

Raises:

  • (ArgumentError)

    if label is not one of the pre-defined labels in the matrix



# File 'lib/confusion_matrix.rb', line 417

def true_rate(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)

  divide(tp, tp+fn)
end
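Under the hypothetical counts a=40, b=10, c=5, d=45, the true rate for :positive is:

```ruby
a, b, c, d = 40, 10, 5, 45  # hypothetical counts for the two-class table

true_rate_positive = a.fdiv(a + b)  # a/(a+b) = 40/50

true_rate_positive  # => 0.8
```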