Class: Idhja22::BinaryClassifier

Inherits:
Object
Defined in:
lib/idhja22/binary_classifier.rb

Direct Known Subclasses

Bayes, Tree

Class Method Details

.train(dataset, opts = {}) ⇒ Object

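Trains a classifier on the provided Dataset and returns it. The classifier is an instance of whichever class the method is called on (e.g. Bayes or Tree). If opts[:attributes] is given, only those attribute labels are used for training; otherwise all of the dataset's attribute_labels are used.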

# File 'lib/idhja22/binary_classifier.rb', line 6

def train(dataset, opts = {})
  attributes_to_use = (opts[:attributes] || dataset.attribute_labels)
  classifier = new
  classifier.train(dataset, attributes_to_use)
  return classifier
end
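The class method creates a fresh instance of whichever class it is called on (inside a class method, `new` resolves to `Bayes.new` or `Tree.new`) and delegates the actual training to that instance. The sketch below reproduces that factory pattern in isolation; `ToyClassifier`, `MajorityClassifier`, and `ToyDataset` are hypothetical stand-ins, not part of Idhja22.

```ruby
# Hypothetical stand-in for a dataset: rows are hashes with a :category key.
ToyDataset = Struct.new(:rows, :attribute_labels)

class ToyClassifier
  # Same shape as BinaryClassifier.train: instantiate the receiving class
  # (possibly a subclass) and let that instance train itself.
  def self.train(dataset, opts = {})
    attributes_to_use = opts[:attributes] || dataset.attribute_labels
    classifier = new
    classifier.train(dataset, attributes_to_use)
    classifier
  end
end

# Hypothetical subclass: estimates P(category == 'Y') as the training frequency of 'Y'.
class MajorityClassifier < ToyClassifier
  attr_reader :attributes, :p_yes

  def train(dataset, attributes)
    @attributes = attributes
    yes = dataset.rows.count { |row| row[:category] == 'Y' }
    @p_yes = yes.to_f / dataset.rows.size
  end
end

ds = ToyDataset.new(
  [{ colour: 'red', category: 'Y' }, { colour: 'blue', category: 'N' },
   { colour: 'red', category: 'Y' }, { colour: 'red', category: 'Y' }],
  [:colour]
)
model = MajorityClassifier.train(ds)
puts model.class # MajorityClassifier
puts model.p_yes # 0.75
```

Because `train` is inherited, every subclass gets the same entry point without redefining it; only the instance-level `train` differs.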

.train_and_validate(dataset, opts = {}) ⇒ Object

Takes a dataset and randomly splits it into training and validation data. Uses the training data to train a classifier, whose performance is then measured on the validation data. Returns the trained classifier together with its validation score (see #validate).

Parameters:

  • opts[:"training-proportion"] (Float) (defaults to: 0.5)

    Proportion of the dataset to use for training. The rest is used to validate the resulting classifier.

# File 'lib/idhja22/binary_classifier.rb', line 16

def train_and_validate(dataset, opts = {})
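  # Default to a 50/50 split between training and validation data
  opts[:"training-proportion"] ||= 0.5
  training_set, validation_set = dataset.split(opts[:"training-proportion"])
  # The trained model may be any subclass (e.g. Bayes or Tree), not only a Tree
  classifier = self.train(training_set, opts)
  validation_value = classifier.validate(validation_set)
  return classifier, validation_value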
end
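The split-train-score flow can be sketched without the library as follows. `split_rows` and `train_and_validate` here are hypothetical helpers; in particular, Idhja22's own `Dataset#split` is not shown on this page and may round or sample differently from the shuffle-and-cut approach assumed below.

```ruby
# Hypothetical helper: shuffle the rows, then cut at the requested proportion.
# Idhja22's Dataset#split may round or sample differently.
def split_rows(rows, proportion, random: Random.new)
  unless (0.0..1.0).cover?(proportion)
    raise ArgumentError, 'proportion must be between 0 and 1'
  end

  shuffled = rows.shuffle(random: random)
  cut = (rows.size * proportion).round
  [shuffled.first(cut), shuffled.drop(cut)]
end

# Hypothetical driver with the same shape as BinaryClassifier.train_and_validate:
# train on one part, score on the rest, return [model, score].
def train_and_validate(rows, train:, validate:, proportion: 0.5, random: Random.new)
  training, validation = split_rows(rows, proportion, random: random)
  model = train.call(training)
  [model, validate.call(model, validation)]
end

rows = (1..10).map { |i| { id: i, category: i.even? ? 'Y' : 'N' } }

# Toy model: the fraction of 'Y' rows seen in training.
fraction_yes = ->(data) { data.count { |r| r[:category] == 'Y' }.to_f / data.size }
# Toy score: mean probability given to each row's true category.
mean_correct = lambda do |p_yes, data|
  data.sum { |r| r[:category] == 'Y' ? p_yes : 1.0 - p_yes } / data.size
end

model, score = train_and_validate(rows, train: fraction_yes, validate: mean_correct,
                                  proportion: 0.7, random: Random.new(42))
puts "p_yes=#{model.round(3)} score=#{score.round(3)}"
```

Passing a seeded `Random` makes the split reproducible, which is useful when comparing classifiers on the same partition.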

.train_and_validate_from_csv(filename, opts = {}) ⇒ Object

Note:

Takes a CSV filename rather than a Dataset; the file is loaded with Dataset.from_csv.

See .train_and_validate.

# File 'lib/idhja22/binary_classifier.rb', line 33

def train_and_validate_from_csv(filename, opts={})
  ds = Dataset.from_csv(filename)
  train_and_validate(ds, opts)
end
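Both CSV wrappers hand the filename to Dataset.from_csv, whose expected file layout is not documented on this page. The sketch below assumes a common layout (a header row of attribute names, with the last column holding the 'Y'/'N' category) and parses it with Ruby's standard `csv` library; `rows_from_csv` and `Row` are hypothetical names.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical row format: header line of attribute names, final column is the
# 'Y'/'N' category. Idhja22's Dataset.from_csv may expect a different layout.
Row = Struct.new(:attributes, :category)

def rows_from_csv(filename)
  table = CSV.read(filename, headers: true)
  *attribute_labels, category_label = table.headers
  rows = table.map do |record|
    Row.new(attribute_labels.to_h { |label| [label, record[label]] },
            record[category_label])
  end
  [attribute_labels, rows]
end

Tempfile.create(['weather', '.csv']) do |file|
  file.write("outlook,windy,play\nsunny,no,Y\nrainy,yes,N\n")
  file.flush
  labels, rows = rows_from_csv(file.path)
  p labels # ["outlook", "windy"]
  p rows.map(&:category)
end
```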

.train_from_csv(filename, opts = {}) ⇒ Object

Note:

Takes a CSV filename rather than a Dataset; the file is loaded with Dataset.from_csv.

See .train.

# File 'lib/idhja22/binary_classifier.rb', line 26

def train_from_csv(filename, opts={})
  ds = Dataset.from_csv(filename)
  train(ds, opts)
end

Instance Method Details

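#validate(ds) ⇒ Object

Returns the mean probability that the classifier assigns to the correct category of each point in ds, treating the value returned by evaluate as the probability of category 'Y'. Points containing an attribute value the classifier does not recognise contribute 0, as if they were always misclassified.
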
# File 'lib/idhja22/binary_classifier.rb', line 39

def validate(ds)
  output = 0
  ds.data.each do |validation_point|
    begin
      prob = evaluate(validation_point)
      output += (validation_point.category == 'Y' ? prob : 1.0 - prob)
    rescue Idhja22::Dataset::Datum::UnknownAttributeValue
      # If the example contains an attribute value the classifier doesn't recognise,
      # assume the worst: this point will never be classified correctly.
      # That is equivalent to output += 0, so there is nothing to do here.
    end
  end
  return output.to_f/ds.size.to_f
end
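The scoring rule can be checked on a small example. In the sketch below, `UnknownAttributeValue`, `Point`, and `validation_score` are hypothetical stand-ins for Idhja22's own classes and method; the empty-input guard is an addition of this sketch (the method above would compute 0.0 / 0.0, which is NaN).

```ruby
# Hypothetical stand-ins for Idhja22's datum class and its unknown-value error.
class UnknownAttributeValue < StandardError; end
Point = Struct.new(:attributes, :category)

# Mean probability assigned to each point's true category. `evaluate` returns
# P(category == 'Y'); a point it cannot evaluate contributes 0 to the total.
def validation_score(points, evaluate)
  # Guard added in this sketch: the original would compute 0.0 / 0.0 (NaN).
  return 0.0 if points.empty?

  total = points.sum do |point|
    prob = evaluate.call(point)
    point.category == 'Y' ? prob : 1.0 - prob
  rescue UnknownAttributeValue
    0.0
  end
  total / points.size
end

# Toy model: P('Y') looked up by the :outlook attribute; unseen values raise.
p_yes_by_outlook = { 'sunny' => 0.9, 'rainy' => 0.2 }
evaluate = lambda do |point|
  p_yes_by_outlook.fetch(point.attributes[:outlook]) { raise UnknownAttributeValue }
end

points = [
  Point.new({ outlook: 'sunny' }, 'Y'), # contributes 0.9
  Point.new({ outlook: 'rainy' }, 'N'), # contributes 1.0 - 0.2
  Point.new({ outlook: 'foggy' }, 'Y')  # unknown value: contributes 0
]
puts validation_score(points, evaluate)
```

Here the score is (0.9 + 0.8 + 0) / 3, roughly 0.567: the unknown 'foggy' point drags the average down exactly as an always-wrong prediction would.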