Class: Basset::AnomalyDetector
- Inherits: Classifier
  - Object
  - Classifier
  - Basset::AnomalyDetector
- Includes: YamlSerialization
- Defined in: lib/basset/classifier.rb
Overview
A class for anomaly detection.
It enables a statistical machine learning approach even when you cannot, or do not want to, assume that “abnormal” documents share particular features or fit neatly into classes.
An example use case is an anomaly-based IDS (intrusion detection system), where you do not want to classify different kinds of attacks but instead want to find all events that deviate from an established baseline.
With the default NaiveBayes classification method, it uses the log10 of the Bayesian probability of a document belonging to the normal behavior group as a similarity score; any document whose score falls below a threshold (the training-set average minus four standard deviations, per #minimum_acceptable_score) is considered anomalous.
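The decision rule can be sketched without Basset itself, following the #anomalous? and #minimum_acceptable_score listings further down. This is only an illustration: the score values and the helpers `mean`, `population_stddev` and `anomalous_score?` are hypothetical, and the population formula is an assumption, since the definition of the library's `Math.stddev` is not shown on this page.

```ruby
# Hypothetical log10 similarity scores for a training set of "normal" documents.
training_scores = [-10.0, -12.0, -11.0, -9.0, -13.0]

def mean(values)
  values.sum / values.length.to_f
end

# Population standard deviation (an assumption; Basset's Math.stddev may differ).
def population_stddev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.length)
end

# Threshold as in #minimum_acceptable_score: average minus four standard deviations.
threshold = mean(training_scores) - 4 * population_stddev(training_scores)

# A document is anomalous when its score falls below the threshold.
def anomalous_score?(score, threshold)
  score < threshold
end

puts anomalous_score?(-11.5, threshold) # false
puts anomalous_score?(-40.0, threshold) # true
```

With these made-up scores the threshold comes out at roughly -16.66, so a score close to the training average passes while a much lower one is flagged.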
Instance Method Summary
- #anomalous?(text) ⇒ Boolean
- #anomaly_score(text) ⇒ Object
  Returns the negated similarity score divided by the standard deviation of the training-set scores.
- #avg_score_of_training_set ⇒ Object
- #classify(text) ⇒ Object
- #initialize(opts = {}) ⇒ AnomalyDetector (constructor)
  A new instance of AnomalyDetector.
- #minimum_acceptable_score ⇒ Object
- #normal?(text) ⇒ Boolean
- #reset_memoized_values ⇒ Object
- #score_range_of_training_set ⇒ Object
- #scores_for_training_set ⇒ Object
- #similarity_score(text) ⇒ Object
- #stddev_of_scores_of_training_set ⇒ Object
- #train(*texts) ⇒ Object
- #train_iterative(text) ⇒ Object
Constructor Details
#initialize(opts = {}) ⇒ AnomalyDetector
Returns a new instance of AnomalyDetector.
# File 'lib/basset/classifier.rb', line 111
def initialize(opts={})
  @training_features=[]
  @updated = true
  super(opts)
end
Instance Method Details
#anomalous?(text) ⇒ Boolean
# File 'lib/basset/classifier.rb', line 121
def anomalous?(text)
  minimum_acceptable_score > similarity_score(text)
end
#anomaly_score(text) ⇒ Object
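Returns the negated similarity score divided by the standard deviation of the training-set scores. The source comment describes this as the number of standard deviations from average, although the listed code does not subtract the average.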
# File 'lib/basset/classifier.rb', line 143
def anomaly_score(text)
  -1 * similarity_score(text) / stddev_of_scores_of_training_set
end
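A small worked example may make the arithmetic of #anomaly_score concrete. The first function mirrors the listed expression. The second is a hypothetical mean-centered variant that is not part of Basset, included only to show what a literal "standard deviations from average" would compute. All numbers are made up.

```ruby
# Mirrors the arithmetic in the listing: negate the score, divide by the spread.
def anomaly_score_as_listed(similarity, stddev)
  -1 * similarity / stddev
end

# Hypothetical mean-centered variant (not part of Basset), shown for comparison:
# how many standard deviations the score lies below the training average.
def deviations_below_average(similarity, avg, stddev)
  (avg - similarity) / stddev
end

avg    = -11.0 # made-up training average
stddev = 2.0   # made-up training standard deviation

puts anomaly_score_as_listed(-19.0, stddev)       # 9.5
puts deviations_below_average(-19.0, avg, stddev) # 4.0
```

For the same document the two values differ by the average divided by the standard deviation (here 5.5), so the listed score is best compared against other listed scores rather than read as a z-score.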
#avg_score_of_training_set ⇒ Object
# File 'lib/basset/classifier.rb', line 155
def avg_score_of_training_set
  scores_for_training_set.inject(0) { |sum, score| sum += score } / scores_for_training_set.length.to_f
end
#classify(text) ⇒ Object
# File 'lib/basset/classifier.rb', line 117
def classify(text)
  anomalous?(text) ? :anomalous : :normal
end
#minimum_acceptable_score ⇒ Object
# File 'lib/basset/classifier.rb', line 170
def minimum_acceptable_score
  avg_score_of_training_set - (4 * stddev_of_scores_of_training_set)
end
#normal?(text) ⇒ Boolean
# File 'lib/basset/classifier.rb', line 125
def normal?(text)
  !anomalous?(text)
end
#reset_memoized_values ⇒ Object
# File 'lib/basset/classifier.rb', line 181
def reset_memoized_values
  @memoized_vals_stale = true
  @stddev_of_scores_of_training_set = nil
  @scores_for_training_set = nil
end
#score_range_of_training_set ⇒ Object
# File 'lib/basset/classifier.rb', line 159
def score_range_of_training_set
  scores_for_training_set.min .. scores_for_training_set.max
end
#scores_for_training_set ⇒ Object
# File 'lib/basset/classifier.rb', line 147
def scores_for_training_set
  unless @scores_for_training_set
    @scores_for_training_set = @training_features.map { |feature_set| similarity_score_for_features(:normal, feature_set)}
    stddev_of_scores_of_training_set
  end
  @scores_for_training_set
end
#similarity_score(text) ⇒ Object
# File 'lib/basset/classifier.rb', line 138
def similarity_score(text)
  super(:normal, text)
end
#stddev_of_scores_of_training_set ⇒ Object
# File 'lib/basset/classifier.rb', line 163
def stddev_of_scores_of_training_set
  unless @stddev_of_scores_of_training_set
    @stddev_of_scores_of_training_set = Math.stddev(scores_for_training_set)
  end
  @stddev_of_scores_of_training_set
end
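The score and standard-deviation accessors cache their results in instance variables, and #reset_memoized_values clears them after training. The same compute-once, invalidate-on-change pattern can be sketched in isolation. `ScoreStats` below is a hypothetical stand-in, not Basset code, and it assumes a population standard deviation because Ruby's core `Math` module has no `stddev` and the library's own definition is not shown here.

```ruby
# Hypothetical stand-in (not Basset code): caches a derived statistic and
# clears the cache whenever new data arrives.
class ScoreStats
  attr_reader :computations

  def initialize
    @scores = []
    @computations = 0 # counts how often the stddev is actually recomputed
  end

  def add(score)
    @scores << score
    @stddev = nil # invalidate the cached value
  end

  def stddev
    @stddev ||= compute_stddev
  end

  private

  # Population standard deviation (an assumption about the formula).
  def compute_stddev
    @computations += 1
    mean = @scores.sum / @scores.length.to_f
    Math.sqrt(@scores.sum { |s| (s - mean)**2 } / @scores.length)
  end
end

stats = ScoreStats.new
[-10.0, -12.0].each { |s| stats.add(s) }
stats.stddev
stats.stddev            # served from the cache
puts stats.computations # 1
stats.add(-14.0)
stats.stddev            # recomputed after invalidation
puts stats.computations # 2
```

Forgetting the invalidation step would leave thresholds computed from stale scores, which is presumably why #train ends by resetting the memoized values.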
#train(*texts) ⇒ Object
# File 'lib/basset/classifier.rb', line 129
def train(*texts)
  texts.flatten.each do |text|
    features = features_of(text)
    @training_features << features
    train_with_features(:normal, features)
  end
  reset_memoized_values
end
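Because #train takes a splat and then flattens it, separate arguments and a single array should be handled the same way. A minimal sketch of just that argument handling, using a hypothetical helper `normalize_texts` that is not part of Basset:

```ruby
# Hypothetical helper (not Basset code) isolating the `*texts` + `flatten` idiom.
def normalize_texts(*texts)
  texts.flatten
end

p normalize_texts("login ok", "logout ok")   # ["login ok", "logout ok"]
p normalize_texts(["login ok", "logout ok"]) # ["login ok", "logout ok"]
p normalize_texts(["login ok"], "logout ok") # ["login ok", "logout ok"]
```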
#train_iterative(text) ⇒ Object
# File 'lib/basset/classifier.rb', line 174
def train_iterative(text)
  (1 .. 5).each do
    train(text)
    break if normal?(text)
  end
end
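The loop in #train_iterative is bounded: it trains on the same text at most five times and stops early once the text is judged normal. The control flow can be sketched with a toy class. `ToyDetector` and its "normal after N trainings" rule are hypothetical and stand in for the real statistical check.

```ruby
# Hypothetical toy detector (not Basset code): a text counts as normal once it
# has been trained on at least `needed` times.
class ToyDetector
  attr_reader :seen

  def initialize(needed)
    @needed = needed
    @seen = Hash.new(0)
  end

  def train(text)
    @seen[text] += 1
  end

  def normal?(text)
    @seen[text] >= @needed
  end

  # Same shape as the listing: at most five passes, stopping early.
  def train_iterative(text)
    5.times do
      train(text)
      break if normal?(text)
    end
  end
end

quick = ToyDetector.new(2)
quick.train_iterative("GET /index.html")
puts quick.seen["GET /index.html"]       # 2

stubborn = ToyDetector.new(10)
stubborn.train_iterative("GET /index.html")
puts stubborn.seen["GET /index.html"]    # 5
puts stubborn.normal?("GET /index.html") # false
```

The second case shows that the cap can be reached while the text is still not considered normal, so a caller that needs a guarantee would have to check #normal? afterwards.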