Class: Basset::AnomalyDetector

Inherits:
Classifier
Includes:
YamlSerialization
Defined in:
lib/basset/classifier.rb

Overview

A class for anomaly detection.

The purpose of this class is to enable a statistical machine learning approach even when you cannot, or do not want to, assume that “abnormal” documents will share certain features or fit neatly into classes.

An example use case is an anomaly-based IDS (intrusion detection system), where you do not want to classify different kinds of attacks but instead want to find every event that deviates from an established baseline.

With the default NaiveBayes classification method, the detector uses the log10 of the Bayesian probability that a document belongs to the normal behavior group as a similarity score. Any document whose score falls below a threshold (the training-set average minus four standard deviations, see #minimum_acceptable_score) is considered anomalous.
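The scoring-and-threshold idea described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not Basset's implementation: the `ToyDetector` class, its whitespace tokenizer, the add-one smoothing, and the per-word averaging are all hypothetical choices made for the example. Only the cutoff (training-set average minus four standard deviations) mirrors `#minimum_acceptable_score`.

```ruby
# Toy anomaly detector: a hypothetical stand-in, not Basset's code.
class ToyDetector
  def initialize
    @counts = Hash.new(0) # word => occurrences across normal documents
    @total  = 0           # total number of words seen in training
    @docs   = []          # tokenized training documents
  end

  # Record each text as an example of normal behavior.
  def train(*texts)
    texts.flatten.each do |text|
      words = text.split
      @docs << words
      words.each { |w| @counts[w] += 1 }
      @total += words.length
    end
  end

  # Average log10 of the add-one-smoothed word probabilities (assumed model).
  def score(words)
    denom = (@total + @counts.size + 1).to_f
    logs = words.map { |w| Math.log10((@counts.fetch(w, 0) + 1) / denom) }
    logs.sum / logs.length.to_f
  end

  # Training-set average minus four (population) standard deviations.
  def threshold
    scores = @docs.map { |words| score(words) }
    mean = scores.sum / scores.length
    variance = scores.sum { |s| (s - mean)**2 } / scores.length
    mean - 4 * Math.sqrt(variance)
  end

  def anomalous?(text)
    score(text.split) < threshold
  end
end

detector = ToyDetector.new
detector.train("GET /index.html 200", "GET /about.html 200",
               "GET /index.html 200", "GET /contact.html 200",
               "GET /index.html 304")
puts detector.anomalous?("DROP TABLE users")
puts detector.anomalous?("GET /index.html 200")
```

Only examples of normal behavior are supplied; the detector never sees a labeled attack, which is the point of the approach.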

Constant Summary

Constants inherited from Classifier

Classifier::DEFAULTS

Instance Attribute Summary

Attributes inherited from Classifier

#doctype, #engine

Instance Method Summary

Methods included from YamlSerialization

included, #save_to_file

Methods inherited from Classifier

#==

Constructor Details

#initialize(opts = {}) ⇒ AnomalyDetector

Returns a new instance of AnomalyDetector.



# File 'lib/basset/classifier.rb', line 111

def initialize(opts={})
  @training_features=[]
  @updated = true
  super(opts)
end

Instance Method Details

#anomalous?(text) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/basset/classifier.rb', line 121

def anomalous?(text)
  minimum_acceptable_score > similarity_score(text)
end

#anomaly_score(text) ⇒ Object

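Gives the negated similarity score in units of the training set's standard deviation. As written, the training-set average is not subtracted first, so the result is a scaled score rather than a true count of standard deviations from the average.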



# File 'lib/basset/classifier.rb', line 143

def anomaly_score(text)
  -1 * similarity_score(text) / stddev_of_scores_of_training_set
end
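The listing above divides the negated similarity score by the standard deviation without subtracting the training-set average. The sketch below contrasts that value with a conventional "standard deviations below the average" figure. The helper names and the sample scores are hypothetical, and the population standard deviation is an assumption; Basset's own `Math.stddev` may use a different definition.

```ruby
# Hypothetical helpers for illustration; not part of Basset.

# Mirrors the arithmetic in the listing: negated score over stddev.
def listed_anomaly_score(score, stddev)
  -1 * score / stddev
end

# Conventional alternative: how far the score lies below the average.
def deviations_below_average(score, mean, stddev)
  (mean - score) / stddev
end

# Made-up log10 scores for a training set.
scores   = [-10.0, -12.0, -11.0, -13.0, -14.0]
mean     = scores.sum / scores.length
variance = scores.sum { |s| (s - mean)**2 } / scores.length
stddev   = Math.sqrt(variance)

candidate = -20.0
puts listed_anomaly_score(candidate, stddev)
puts deviations_below_average(candidate, mean, stddev)
```

The two values differ by `mean / stddev`, so they rank documents in the same order but are not interchangeable as thresholds.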

#avg_score_of_training_set ⇒ Object



# File 'lib/basset/classifier.rb', line 155

def avg_score_of_training_set
  scores_for_training_set.inject(0) { |sum, score| sum + score } / scores_for_training_set.length.to_f
end
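One consequence of the averaging above is worth seeing in isolation: with no training data, the float division yields `NaN` rather than raising. The standalone `average` helper below is a hypothetical copy of the same arithmetic, written only to demonstrate that behavior.

```ruby
# Hypothetical standalone copy of the averaging arithmetic.
def average(scores)
  scores.inject(0) { |sum, score| sum + score } / scores.length.to_f
end

puts average([-2.0, -4.0])
puts average([]).nan?
```

Because any comparison against `NaN` is false, it seems prudent to train the detector before asking it to classify anything.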

#classify(text) ⇒ Object



# File 'lib/basset/classifier.rb', line 117

def classify(text)
  anomalous?(text) ? :anomalous : :normal
end

#minimum_acceptable_score ⇒ Object



# File 'lib/basset/classifier.rb', line 170

def minimum_acceptable_score
  avg_score_of_training_set - (4 * stddev_of_scores_of_training_set)
end

#normal?(text) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/basset/classifier.rb', line 125

def normal?(text)
  !anomalous?(text)
end

#reset_memoized_values ⇒ Object



# File 'lib/basset/classifier.rb', line 181

def reset_memoized_values
  @memoized_vals_stale = true
  @stddev_of_scores_of_training_set = nil
  @scores_for_training_set = nil
end
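The method above clears cached values so that they are recomputed after the training set changes. A minimal sketch of that memoize-and-invalidate pattern follows; the `MemoizedStats` class and its `computations` counter are hypothetical and exist only to make the caching visible.

```ruby
# Hypothetical class illustrating memoization with explicit invalidation.
class MemoizedStats
  attr_reader :computations

  def initialize
    @values = []
    @computations = 0 # counts how often the expensive step actually runs
  end

  # Adding data makes any cached result stale, so clear it.
  def add(value)
    @values << value
    reset_memoized_values
  end

  # Computed on first use, then served from the cache.
  def total
    @total ||= begin
      @computations += 1
      @values.sum
    end
  end

  def reset_memoized_values
    @total = nil
  end
end

stats = MemoizedStats.new
stats.add(1)
stats.add(2)
stats.total
stats.total
puts stats.computations
```

Calling `total` twice in a row triggers only one computation; a later `add` forces the next call to recompute.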

#score_range_of_training_set ⇒ Object



# File 'lib/basset/classifier.rb', line 159

def score_range_of_training_set
  scores_for_training_set.min .. scores_for_training_set.max
end

#scores_for_training_set ⇒ Object



# File 'lib/basset/classifier.rb', line 147

def scores_for_training_set
  unless @scores_for_training_set
    @scores_for_training_set = @training_features.map { |feature_set| similarity_score_for_features(:normal, feature_set)}
    stddev_of_scores_of_training_set
  end
  @scores_for_training_set
end

#similarity_score(text) ⇒ Object



# File 'lib/basset/classifier.rb', line 138

def similarity_score(text)
  super(:normal, text)
end

#stddev_of_scores_of_training_set ⇒ Object



# File 'lib/basset/classifier.rb', line 163

def stddev_of_scores_of_training_set
  unless @stddev_of_scores_of_training_set
    @stddev_of_scores_of_training_set = Math.stddev(scores_for_training_set)
  end
  @stddev_of_scores_of_training_set
end
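`Math.stddev` is not part of Ruby's standard `Math` module, so it is presumably supplied by Basset itself. For readers who want to reproduce the calculation outside the library, here is one possible version. The `population_stddev` name is hypothetical, and the choice of the population formula (dividing by n rather than n - 1) is an assumption that may not match Basset's helper.

```ruby
# Hypothetical helper: population standard deviation of a list of numbers.
def population_stddev(values)
  return 0.0 if values.empty? # avoid dividing by zero on empty input

  mean = values.sum / values.length.to_f
  # Average squared distance from the mean, then the square root.
  Math.sqrt(values.sum { |v| (v - mean)**2 } / values.length)
end

puts population_stddev([2, 4, 4, 4, 5, 5, 7, 9])
```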

#train(*texts) ⇒ Object



# File 'lib/basset/classifier.rb', line 129

def train(*texts)
  texts.flatten.each do |text|
    features = features_of(text)
    @training_features << features
    train_with_features(:normal, features)
  end
  reset_memoized_values
end

#train_iterative(text) ⇒ Object



# File 'lib/basset/classifier.rb', line 174

def train_iterative(text)
  (1 .. 5).each do
    train(text)
    break if normal?(text)
  end
end
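The loop above retrains on the same text at most five times and stops as soon as the text is classified as normal. The sketch below isolates that control flow with a hypothetical `StubDetector`, which simply treats a text as normal after a fixed number of training passes; it stands in for the real scoring and is not Basset code.

```ruby
# Hypothetical stub: "normal" once trained a given number of times.
class StubDetector
  attr_reader :passes

  def initialize(passes_needed)
    @passes_needed = passes_needed
    @passes = 0
  end

  def train(_text)
    @passes += 1
  end

  def normal?(_text)
    @passes >= @passes_needed
  end

  # Same shape as the listing: bounded retries with an early exit.
  def train_iterative(text)
    5.times do
      train(text)
      break if normal?(text)
    end
  end
end

quick = StubDetector.new(2)
quick.train_iterative("example event")
puts quick.passes

stubborn = StubDetector.new(10)
stubborn.train_iterative("example event")
puts stubborn.passes
```

The second case shows the cap at work: the loop gives up after five passes even though the text is still not considered normal, so callers cannot assume `normal?(text)` holds afterwards.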