Anomaly

Easy-to-use anomaly detection

Installation

Add this line to your application's Gemfile:

gem "anomaly"

And then execute:

bundle install

For maximum performance (roughly 2x faster), also install the NArray gem:

gem "narray"

Anomaly will automatically detect it and use it.

How to Use

Say we have weather data for sunny days and we're trying to detect days that aren't sunny. The data looks like:

# Each row is a different day.
# [temperature (°F), humidity (%), pressure (in)]
weather_data = [
  [85, 68, 10.4],
  [88, 62, 12.1],
  [86, 64, 13.6],
  ...
]

Train the detector with only non-anomalies (sunny days in our case).

ad = Anomaly::Detector.new(weather_data)

That's it! Let's test for anomalies.

# 79°F, 66% humidity, 12.3 in. pressure
test_sample = [79, 66, 12.3]
ad.probability(test_sample)
# => 7.537174740907633e-08
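
To give a feel for where a number like this can come from, here is a minimal sketch. It assumes each feature is modelled as an independent normal distribution fitted to the training rows, and that the probability is the product of the per-feature densities. That model is an assumption made for illustration, and `fit_gaussian` and `gaussian_probability` are hypothetical helpers, not part of the gem's API. The training rows are the three sample rows shown above.

```ruby
# Illustrative sketch only: not the gem's actual implementation.
# Assumes every feature follows its own independent normal distribution.

# Returns [means, stds], with one entry per feature column.
def fit_gaussian(rows)
  n = rows.size.to_f
  columns = rows.transpose
  means = columns.map { |col| col.sum / n }
  stds = columns.each_with_index.map do |col, i|
    Math.sqrt(col.sum { |x| (x - means[i])**2 } / n)
  end
  [means, stds]
end

# Product of the per-feature normal densities evaluated at the sample.
def gaussian_probability(sample, means, stds)
  sample.each_with_index.reduce(1.0) do |prob, (x, i)|
    density = Math.exp(-((x - means[i])**2) / (2 * stds[i]**2)) /
              (Math.sqrt(2 * Math::PI) * stds[i])
    prob * density
  end
end

training = [[85, 68, 10.4], [88, 62, 12.1], [86, 64, 13.6]]
means, stds = fit_gaussian(training)

# A sample close to the training data versus one far away from it.
typical = gaussian_probability([86, 65, 12.0], means, stds)
unusual = gaussian_probability([60, 95, 12.0], means, stds)
```

Under this model, `unusual` comes out many orders of magnitude smaller than `typical`, which is why the values are compared against a threshold rather than read as ordinary probabilities.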

Super-important: You must select a threshold for anomalies, which we denote with ε ("epsilon").

Probabilities less than ε are considered anomalies. The higher ε is, the more samples are flagged as anomalies.

ad.anomaly?(test_sample, 1e-10)
# => false
ad.anomaly?(test_sample, 1e-5)
# => true

The wiki has sample code to help you find the best ε for your application.
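
As a rough idea of what such a search can look like, the sketch below picks ε from a list of candidates by maximizing the F1 score on a labeled validation set. This is one possible approach and not necessarily what the wiki does. `f1_score` and `best_epsilon` are hypothetical helpers, not part of the gem, and the probabilities and labels are made-up illustrative values standing in for the results of calling `ad.probability` on your own validation samples.

```ruby
# Hypothetical helpers, not part of the gem's API.
# probabilities: one value per validation sample (e.g. from ad.probability)
# labels: true when the matching sample is a known anomaly

# F1 score obtained when samples with probability < epsilon are flagged.
def f1_score(probabilities, labels, epsilon)
  tp = fp = fn = 0
  probabilities.zip(labels).each do |prob, anomalous|
    flagged = prob < epsilon
    tp += 1 if flagged && anomalous
    fp += 1 if flagged && !anomalous
    fn += 1 if !flagged && anomalous
  end
  return 0.0 if tp.zero?

  precision = tp.to_f / (tp + fp)
  recall = tp.to_f / (tp + fn)
  2 * precision * recall / (precision + recall)
end

# Returns the candidate threshold with the highest F1 score.
def best_epsilon(probabilities, labels, candidates)
  candidates.max_by { |epsilon| f1_score(probabilities, labels, epsilon) }
end

# Made-up validation data for illustration only.
probabilities = [3.2e-4, 1.1e-3, 7.5e-8, 2.0e-9, 5.0e-4]
labels        = [false,  false,  true,   true,   false]

# Candidate thresholds: 1e-1, 1e-2, ..., 1e-12.
candidates = (1..12).map { |k| 10.0**(-k) }
epsilon = best_epsilon(probabilities, labels, candidates)
```

The chosen `epsilon` can then be passed as the second argument to `ad.anomaly?`. F1 is used here because anomalies are usually rare, which makes plain accuracy a misleading measure.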

Persistence

You can easily persist the detector to a file or database, since the serialized object is very small.

serialized_ad = Marshal.dump(ad)

# Save to a file (binary mode, because Marshal output is binary data)
File.open("anomaly_detector.dump", "wb") {|f| f.write(serialized_ad) }

# ...

# Read it later (File.binread reads in binary mode and closes the file)
ad2 = Marshal.load(File.binread("anomaly_detector.dump"))

TODO

  • Train in chunks (for very large datasets)
  • Multivariate normal distribution (possibly)

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Thanks

A special thanks to Andrew Ng.