Class: ConfusionMatrix

Inherits: Object
Defined in: lib/confusion_matrix.rb
Overview
This class holds the confusion matrix information. It is designed to be called incrementally, as results are obtained from the classifier model.
At any point, statistics may be obtained by calling the relevant methods.
A two-class example is:
Classified   Classified  |
Positive     Negative    | Actual
-------------------------+----------
    a            b       | Positive
    c            d       | Negative
Statistical methods will be described with reference to this example.
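For example, a minimal sketch of this incremental workflow, assuming the class is loaded with require 'confusion_matrix' (the labels and counts are illustrative):

require 'confusion_matrix'

cm = ConfusionMatrix.new(:positive, :negative)

# record each result as it comes back from the classifier
cm.add_for(:positive, :positive)   # correctly classified positive (cell a)
cm.add_for(:positive, :negative)   # positive classified as negative (cell b)
cm.add_for(:negative, :positive)   # negative classified as positive (cell c)
cm.add_for(:negative, :negative)   # correctly classified negative (cell d)

cm.precision(:positive)            # => 0.5
cm.overall_accuracy                # => 0.5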
Instance Method Summary
- #add_for(actual, prediction, n = 1) ⇒ Object
  Adds one result to the matrix for a given (actual, prediction) pair of labels.
- #count_for(actual, prediction) ⇒ Integer
  Returns the count for the (actual, prediction) pair.
- #f_measure(label = @labels.first) ⇒ Float
  The F-measure for a given label is the harmonic mean of the precision and recall for that label.
- #false_negative(label = @labels.first) ⇒ Float
  Returns the number of instances of the given class label which are incorrectly classified.
- #false_positive(label = @labels.first) ⇒ Float
  Returns the number of instances incorrectly classified with the given class label.
- #false_rate(label = @labels.first) ⇒ Float
  The false rate for a given class label is the proportion of instances incorrectly classified as that label, out of all those instances not originally of that label.
- #geometric_mean ⇒ Float
  The geometric mean is the nth-root of the product of the true_rate for each label.
- #initialize(*labels) ⇒ ConfusionMatrix (constructor)
  Creates a new, empty instance of a confusion matrix.
- #kappa(label = @labels.first) ⇒ Float
  The Kappa statistic compares the observed accuracy with an expected accuracy.
- #labels ⇒ Array<String>
  Returns a list of labels used in the matrix.
- #matthews_correlation(label = @labels.first) ⇒ Float
  Matthews Correlation Coefficient is a measure of the quality of binary classifications.
- #overall_accuracy ⇒ Float
  The overall accuracy is the proportion of instances which are correctly labelled.
- #precision(label = @labels.first) ⇒ Float
  The precision for a given class label is the proportion of instances classified as that class which are correct.
- #prevalence(label = @labels.first) ⇒ Float
  The prevalence for a given class label is the proportion of instances which are actually of that label, out of the total.
- #recall(label = @labels.first) ⇒ Float
  The recall is another name for the true rate.
- #sensitivity(label = @labels.first) ⇒ Float
  Sensitivity is another name for the true rate.
- #specificity(label = @labels.first) ⇒ Float
  The specificity for a given class label is 1 - false_rate(label).
- #to_s ⇒ String
  Returns a string representation of the matrix, formatted as a printable table.
- #total ⇒ Integer
  Returns the total number of instances referenced in the matrix.
- #true_negative(label = @labels.first) ⇒ Integer
  Returns the number of instances NOT of the given class label which are correctly classified.
- #true_positive(label = @labels.first) ⇒ Integer
  Returns the number of instances of the given class label which are correctly classified.
- #true_rate(label = @labels.first) ⇒ Float
  The true rate for a given class label is the proportion of instances of that class which are correctly classified.
Constructor Details
#initialize(*labels) ⇒ ConfusionMatrix
Creates a new, empty instance of a confusion matrix.
# File 'lib/confusion_matrix.rb', line 25

def initialize(*labels)
  @matrix = {}
  @labels = labels.uniq
  if @labels.size == 1
    raise ArgumentError.new("If labels are provided, there must be at least two.")
  else # preset the matrix Hash
    @labels.each do |actual|
      @matrix[actual] = {}
      @labels.each do |predicted|
        @matrix[actual][predicted] = 0
      end
    end
  end
end
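A short sketch of the two construction styles (label names are illustrative):

# with a pre-defined label list, labels are fixed up front
cm = ConfusionMatrix.new(:positive, :negative)
cm.labels                        # => [:positive, :negative]

# with no arguments, labels accumulate as results are added
cm = ConfusionMatrix.new
cm.add_for(:yes, :no)
cm.labels                        # => [:no, :yes]

# a single label is rejected
ConfusionMatrix.new(:positive)   # raises ArgumentError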
Instance Method Details
#add_for(actual, prediction, n = 1) ⇒ Object
Adds one result to the matrix for a given (actual, prediction) pair of labels. If the matrix was given a pre-defined list of labels on construction, then these given labels must be from the pre-defined list. If no pre-defined list of labels was used in constructing the matrix, then the labels will be added to the matrix.
Class labels may be any hashable value, though ideally they are strings or symbols.
# File 'lib/confusion_matrix.rb', line 99

def add_for(actual, prediction, n = 1)
  validate_label actual, prediction
  if !@matrix.has_key?(actual)
    @matrix[actual] = {}
  end
  predictions = @matrix[actual]
  if !predictions.has_key?(prediction)
    predictions[prediction] = 0
  end
  unless n.class == Integer and n.positive?
    raise ArgumentError.new("add_for requires n to be a positive Integer, but got #{n}")
  end
  @matrix[actual][prediction] += n
end
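For example, a sketch of batching counts with the n argument (labels and counts are illustrative):

cm = ConfusionMatrix.new
cm.add_for(:pos, :pos, 10)   # record ten correct :pos results in one call
cm.add_for(:pos, :neg)       # n defaults to 1
cm.count_for(:pos, :pos)     # => 10
cm.total                     # => 11

cm.add_for(:pos, :neg, 2.5)  # raises ArgumentError: n must be a positive Integer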
#count_for(actual, prediction) ⇒ Integer
Returns the count for the (actual, prediction) pair.
cm = ConfusionMatrix.new
cm.add_for(:pos, :neg)
cm.count_for(:pos, :neg) # => 1
# File 'lib/confusion_matrix.rb', line 77

def count_for(actual, prediction)
  validate_label actual, prediction
  predictions = @matrix.fetch(actual, {})
  predictions.fetch(prediction, 0)
end
#f_measure(label = @labels.first) ⇒ Float
The F-measure for a given label is the harmonic mean of the precision and recall for that label.
F = 2*(precision*recall)/(precision+recall)
# File 'lib/confusion_matrix.rb', line 184

def f_measure(label = @labels.first)
  validate_label label
  2*precision(label)*recall(label)/(precision(label) + recall(label))
end
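A worked sketch using the a/b/c/d cells of the two-class example (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 3)   # a = 3
cm.add_for(:positive, :negative, 1)   # b = 1
cm.add_for(:negative, :positive, 1)   # c = 1
cm.add_for(:negative, :negative, 5)   # d = 5

# precision = 3/4 = 0.75 and recall = 3/4 = 0.75
cm.f_measure(:positive)   # => 2 * (0.75 * 0.75) / (0.75 + 0.75) = 0.75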
#false_negative(label = @labels.first) ⇒ Float
Returns the number of instances of the given class label which are incorrectly classified.
false_negative(:positive) = b
# File 'lib/confusion_matrix.rb', line 124

def false_negative(label = @labels.first)
  validate_label label
  predictions = @matrix.fetch(label, {})

  total = 0
  predictions.each_pair do |key, count|
    if key != label
      total += count
    end
  end

  total
end
#false_positive(label = @labels.first) ⇒ Float
Returns the number of instances incorrectly classified with the given class label.
false_positive(:positive) = c
# File 'lib/confusion_matrix.rb', line 146

def false_positive(label = @labels.first)
  validate_label label

  total = 0
  @matrix.each_pair do |key, predictions|
    if key != label
      total += predictions.fetch(label, 0)
    end
  end

  total
end
#false_rate(label = @labels.first) ⇒ Float
The false rate for a given class label is the proportion of instances incorrectly classified as that label, out of all those instances not originally of that label.
false_rate(:positive) = c/(c+d)
# File 'lib/confusion_matrix.rb', line 168

def false_rate(label = @labels.first)
  validate_label label
  fp = false_positive(label)
  tn = true_negative(label)

  divide(fp, fp+tn)
end
#geometric_mean ⇒ Float
The geometric mean is the nth-root of the product of the true_rate for each label.
a1 = a/(a+b)
a2 = d/(c+d)
geometric_mean = Math.sqrt(a1*a2)
# File 'lib/confusion_matrix.rb', line 197

def geometric_mean
  product = 1

  @matrix.each_key do |key|
    product *= true_rate(key)
  end

  product**(1.0/@matrix.size)
end
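A worked sketch for the two-class case (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 40)   # a
cm.add_for(:positive, :negative, 10)   # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# true_rate(:positive) = 40/50 = 0.8, true_rate(:negative) = 45/50 = 0.9
cm.geometric_mean   # => sqrt(0.8 * 0.9), approximately 0.8485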
#kappa(label = @labels.first) ⇒ Float
The Kappa statistic compares the observed accuracy with an expected accuracy.
# File 'lib/confusion_matrix.rb', line 213

def kappa(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)
  total = tp+fn+fp+tn

  total_accuracy = divide(tp+tn, tp+tn+fp+fn)
  random_accuracy = divide((tn+fp)*(tn+fn) + (fn+tp)*(fp+tp), total*total)

  divide(total_accuracy - random_accuracy, 1 - random_accuracy)
end
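A worked sketch, with illustrative counts chosen so the arithmetic stays exact:

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 40)   # a
cm.add_for(:positive, :negative, 10)   # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# observed accuracy = (40 + 45) / 100 = 0.85
# expected accuracy = ((45+5)*(45+10) + (10+40)*(5+40)) / 100**2 = 0.5
cm.kappa(:positive)   # => (0.85 - 0.5) / (1 - 0.5) = 0.7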
#labels ⇒ Array<String>
Returns a list of labels used in the matrix.
cm = ConfusionMatrix.new
cm.add_for(:pos, :neg)
cm.labels # => [:neg, :pos]
# File 'lib/confusion_matrix.rb', line 47

def labels
  if @labels.size >= 2 # if we defined some labels, return them
    @labels
  else
    result = []

    @matrix.each_pair do |key, predictions|
      result << key
      predictions.each_key do |key|
        result << key
      end
    end

    result.uniq.sort
  end
end
#matthews_correlation(label = @labels.first) ⇒ Float
Matthews Correlation Coefficient is a measure of the quality of binary classifications.
matthews_correlation(:positive) = (a*d - c*b) / sqrt((a+c)(a+b)(d+c)(d+b))
# File 'lib/confusion_matrix.rb', line 235

def matthews_correlation(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)

  divide(tp*tn - fp*fn, Math.sqrt((tp+fp)*(tp+fn)*(tn+fp)*(tn+fn)))
end
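A worked sketch for a balanced two-class case (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 45)   # a
cm.add_for(:positive, :negative, 5)    # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# (45*45 - 5*5) / sqrt(50 * 50 * 50 * 50) = 2000 / 2500
cm.matthews_correlation(:positive)   # => 0.8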
#overall_accuracy ⇒ Float
The overall accuracy is the proportion of instances which are correctly labelled.
overall_accuracy = (a+d)/(a+b+c+d)
# File 'lib/confusion_matrix.rb', line 251

def overall_accuracy
  total_correct = 0

  @matrix.each_pair do |key, predictions|
    total_correct += true_positive(key)
  end

  divide(total_correct, total)
end
#precision(label = @labels.first) ⇒ Float
The precision for a given class label is the proportion of instances classified as that class which are correct.
precision(:positive) = a/(a+c)
# File 'lib/confusion_matrix.rb', line 269

def precision(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fp = false_positive(label)

  divide(tp, tp+fp)
end
#prevalence(label = @labels.first) ⇒ Float
The prevalence for a given class label is the proportion of instances which are actually of that label, out of the total.
prevalence(:positive) = (a+b)/(a+b+c+d)
# File 'lib/confusion_matrix.rb', line 285

def prevalence(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)
  fp = false_positive(label)
  tn = true_negative(label)
  total = tp+fn+fp+tn

  divide(tp+fn, total)
end
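A worked sketch with illustrative counts:

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 40)   # a
cm.add_for(:positive, :negative, 10)   # b
cm.add_for(:negative, :positive, 5)    # c
cm.add_for(:negative, :negative, 45)   # d

# 50 of the 100 instances are actually :positive
cm.prevalence(:positive)   # => 0.5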
#recall(label = @labels.first) ⇒ Float
The recall is another name for the true rate.
# File 'lib/confusion_matrix.rb', line 302

def recall(label = @labels.first)
  validate_label label
  true_rate(label)
end
#sensitivity(label = @labels.first) ⇒ Float
Sensitivity is another name for the true rate.
# File 'lib/confusion_matrix.rb', line 313

def sensitivity(label = @labels.first)
  validate_label label
  true_rate(label)
end
#specificity(label = @labels.first) ⇒ Float
The specificity for a given class label is 1 - false_rate(label).
In the two-class case, specificity(:positive) = 1 - false_rate(:positive) = d/(c+d).
# File 'lib/confusion_matrix.rb', line 325

def specificity(label = @labels.first)
  validate_label label
  1-false_rate(label)
end
#to_s ⇒ String
Returns a string representation of the matrix, formatted as a printable table.
# File 'lib/confusion_matrix.rb', line 334

def to_s
  ls = labels
  result = ""

  title_line = "Predicted "
  label_line = ""
  ls.each { |l| label_line << "#{l} " }
  label_line << " " while label_line.size < title_line.size
  title_line << " " while title_line.size < label_line.size

  result << title_line << "|\n" << label_line << "| Actual\n"
  result << "-"*title_line.size << "+-------\n"

  ls.each do |l|
    count_line = ""
    ls.each_with_index do |m, i|
      count_line << "#{count_for(l, m)}".rjust(labels[i].size) << " "
    end
    result << count_line.ljust(title_line.size) << "| #{l}\n"
  end

  result
end
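For example (labels and counts are illustrative; the exact column spacing depends on the label lengths):

cm = ConfusionMatrix.new(:pos, :neg)
cm.add_for(:pos, :pos, 3)
cm.add_for(:pos, :neg, 1)
cm.add_for(:neg, :pos, 1)
cm.add_for(:neg, :neg, 5)

puts cm
# Predicted |
# pos neg   | Actual
# ----------+-------
#   3   1   | pos
#   1   5   | neg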
#total ⇒ Integer
Returns the total number of instances referenced in the matrix.
total = a+b+c+d
# File 'lib/confusion_matrix.rb', line 362

def total
  total = 0

  @matrix.each_value do |predictions|
    predictions.each_value do |count|
      total += count
    end
  end

  total
end
#true_negative(label = @labels.first) ⇒ Integer
Returns the number of instances NOT of the given class label which are correctly classified.
true_negative(:positive) = d
# File 'lib/confusion_matrix.rb', line 382

def true_negative(label = @labels.first)
  validate_label label

  total = 0
  @matrix.each_pair do |key, predictions|
    if key != label
      total += predictions.fetch(key, 0)
    end
  end

  total
end
#true_positive(label = @labels.first) ⇒ Integer
Returns the number of instances of the given class label which are correctly classified.
true_positive(:positive) = a
# File 'lib/confusion_matrix.rb', line 403

def true_positive(label = @labels.first)
  validate_label label
  predictions = @matrix.fetch(label, {})
  predictions.fetch(label, 0)
end
#true_rate(label = @labels.first) ⇒ Float
The true rate for a given class label is the proportion of instances of that class which are correctly classified.
true_rate(:positive) = a/(a+b)
# File 'lib/confusion_matrix.rb', line 417

def true_rate(label = @labels.first)
  validate_label label
  tp = true_positive(label)
  fn = false_negative(label)

  divide(tp, tp+fn)
end
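A worked sketch, also showing that #recall and #sensitivity return the same value (counts are illustrative):

cm = ConfusionMatrix.new(:positive, :negative)
cm.add_for(:positive, :positive, 8)   # a
cm.add_for(:positive, :negative, 2)   # b
cm.add_for(:negative, :negative, 5)   # d

cm.true_rate(:positive)     # => 8 / 10.0 = 0.8
cm.recall(:positive)        # => 0.8 (same value)
cm.sensitivity(:positive)   # => 0.8 (same value)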