Anomaly

Easy-to-use anomaly detection

Installation

Add this line to your application's Gemfile:

gem "anomaly"

And then execute:

bundle install

For maximum performance (training is roughly 3x faster on large datasets), also install the NArray gem:

gem "narray"

Anomaly will automatically detect it and use it.

How to Use

Say we have weather data and we want to predict whether a day is sunny. In this example, sunny days are non-anomalies, and days with other types of weather (rain, snow, etc.) are anomalies. The data looks like:

# [temperature(°F), humidity(%), pressure(in), sunny?(y=0, n=1)]
weather_data = [
  [85, 68, 10.4, 0],
  [88, 62, 12.1, 0],
  [86, 64, 13.6, 0],
  [88, 90, 11.1, 1],
  ...
]

The last column must be 0 for non-anomalies and 1 for anomalies. Only the non-anomalies are used to train the detector; both anomalies and non-anomalies are used to find the best value of ε.
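To make the roles of the two classes concrete, here is a minimal sketch of one common approach: fit an independent normal distribution to each feature using only the non-anomalous rows, then treat a sample as an anomaly when the product of its per-feature densities falls below ε. This illustrates the general technique under that assumption and is not the gem's actual implementation; `fit_gaussians` and `density` are hypothetical helper names.

```ruby
# Hypothetical helpers for illustration; not part of the Anomaly gem's API.

# Fit a mean and standard deviation to each feature column,
# using only rows whose last column is 0 (non-anomalies).
def fit_gaussians(rows)
  normal = rows.select { |r| r.last == 0 }.map { |r| r[0...-1] }
  normal.transpose.map do |col|
    mean = col.sum.to_f / col.size
    var  = col.sum { |x| (x - mean)**2 } / col.size
    [mean, Math.sqrt(var)]
  end
end

# Product of the per-feature normal densities for one sample.
def density(params, sample)
  params.zip(sample).reduce(1.0) do |p, ((mean, std), x)|
    p * Math.exp(-((x - mean)**2) / (2 * std**2)) / (std * Math.sqrt(2 * Math::PI))
  end
end

rows = [
  [85, 68, 10.4, 0],
  [88, 62, 12.1, 0],
  [86, 64, 13.6, 0],
  [88, 90, 11.1, 1]
]

params  = fit_gaussians(rows)
typical = density(params, [86, 65, 12.0])  # close to the sunny-day averages
unusual = density(params, [88, 90, 11.1])  # humidity far outside the sunny range

# A sample would be flagged as an anomaly when its density is below eps.
puts typical > unusual
```

The sample with 90% humidity receives a far lower density than the one near the sunny-day averages, which is what lets a single threshold separate the two.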

To train the detector and test for anomalies, run:

ad = Anomaly::Detector.new(weather_data)

# 85°F, 42% humidity, 12.3 in. pressure
ad.anomaly?([85, 42, 12.3])
# => true

Anomaly automatically finds the best value for ε, which you can access with:

ad.eps
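For intuition about what "best" could mean here, the sketch below picks ε from a list of candidates by maximizing the F1 score on labeled examples. This is an assumption about the selection criterion, not a statement of what the gem does internally; `f1_score`, `best_eps`, and the density values are made up for illustration.

```ruby
# Hypothetical sketch: choose eps by maximizing F1 on labeled densities.
# `scored` is a list of [density, label] pairs, where label 1 means anomaly.
def f1_score(scored, eps)
  tp = scored.count { |p, y| p < eps && y == 1 }   # anomalies correctly flagged
  fp = scored.count { |p, y| p < eps && y == 0 }   # normal samples wrongly flagged
  fn = scored.count { |p, y| p >= eps && y == 1 }  # anomalies missed
  return 0.0 if tp.zero?

  precision = tp.to_f / (tp + fp)
  recall    = tp.to_f / (tp + fn)
  2 * precision * recall / (precision + recall)
end

def best_eps(scored, candidates)
  candidates.max_by { |eps| f1_score(scored, eps) }
end

# Illustrative values only, not output from a real detector.
scored     = [[0.20, 0], [0.15, 0], [0.18, 0], [0.001, 1], [0.002, 1]]
candidates = [0.0005, 0.005, 0.5]

eps = best_eps(scored, candidates)
puts eps
```

A threshold that is too low misses every anomaly, and one that is too high flags normal samples as well; F1 balances the two kinds of error.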

If you already know you want ε = 0.01, initialize the detector with:

ad = Anomaly::Detector.new(weather_data, {:eps => 0.01})

Persistence

You can easily persist the detector to a file or database. The serialized object is very small.

serialized_ad = Marshal.dump(ad)

# Save to a file (binary mode, since Marshal output is not text)
File.open("anomaly_detector.dump", "wb") {|f| f.write(serialized_ad) }

# ...

# Read it later (binread opens in binary mode and closes the file)
ad2 = Marshal.load(File.binread("anomaly_detector.dump"))
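For the database case, note that Marshal output is binary, so a text column needs an encoding step. The sketch below shows one way to do that with Base64 via `Array#pack`. A plain Hash stands in for the trained detector so the snippet runs without the gem; that stand-in and its field names are assumptions for illustration.

```ruby
# Stand-in for a trained detector; in practice this would be an Anomaly::Detector.
detector_state = { eps: 0.01, means: [86.3, 64.7, 12.0] }

# Encode the binary Marshal output as Base64 so it fits in a text column.
encoded = [Marshal.dump(detector_state)].pack("m0")

# Later: decode and restore. Only unmarshal data from a source you trust.
restored = Marshal.load(encoded.unpack1("m0"))

puts restored == detector_state
```

If the database offers a binary column type, the raw `Marshal.dump` string can be stored directly and the Base64 step skipped.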

TODO

Train in chunks (for very large datasets)
Multivariate normal distribution (possibly)

Contributing

Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Added some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request

Thanks

A special thanks to Andrew Ng.