Anomaly
Easy-to-use anomaly detection
Installation
Add this line to your application's Gemfile:
gem "anomaly"
And then execute:
bundle install
For maximum performance (about 2x faster), also add the NArray gem:
gem "narray"
Anomaly will automatically detect it and use it.
How to Use
Say we have weather data for sunny days and we're trying to detect days that aren't sunny. The data looks like:
# Each row is a different day.
# [temperature (°F), humidity (%), pressure (in)]
weather_data = [
[85, 68, 10.4],
[88, 62, 12.1],
[86, 64, 13.6],
...
]
Train the detector with only non-anomalies (sunny days in our case).
ad = Anomaly::Detector.new(weather_data)
That's it! Let's test for anomalies.
# 79°F, 66% humidity, 12.3 in. pressure
test_sample = [79, 66, 12.3]
ad.probability(test_sample)
# => 7.537174740907633e-08
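This README does not spell out how the probability is computed. As a rough mental model, here is a minimal sketch that assumes each feature is treated as an independent normal distribution (the TODO item about a multivariate normal hints at this, but it is an assumption). `SketchDetector` is a hypothetical stand-in written for illustration, not the gem's code, and its numbers will not necessarily match the gem's output.

```ruby
# Hypothetical stand-in for illustration only; not the gem's implementation.
# Assumption: features are independent and normally distributed.
class SketchDetector
  # rows: array of samples, each an array of numeric features.
  def initialize(rows)
    n = rows.size.to_f
    columns = rows.transpose
    @means = columns.map { |col| col.sum / n }
    # Population standard deviation per feature (an assumption).
    # A constant feature gives a zero deviation, which this sketch does not handle.
    @stds = columns.each_with_index.map do |col, i|
      Math.sqrt(col.sum { |v| (v - @means[i])**2 } / n)
    end
  end

  # Product of the per-feature normal densities at the sample.
  def probability(sample)
    sample.each_with_index.reduce(1.0) do |product, (value, i)|
      product * normal_pdf(value, @means[i], @stds[i])
    end
  end

  # A sample is flagged when its probability falls below epsilon.
  def anomaly?(sample, epsilon)
    probability(sample) < epsilon
  end

  private

  def normal_pdf(x, mean, std)
    Math.exp(-((x - mean)**2) / (2.0 * std**2)) / (std * Math.sqrt(2.0 * Math::PI))
  end
end
```

Under this model, a sample that sits far from the training mean in any single feature drives the whole product toward zero, which is why the probabilities can be very small.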
Super-important: You must select a threshold for anomalies (which we denote with ε - "epsilon").
Samples with a probability less than ε are considered anomalies, so a higher ε flags more samples as anomalies.
ad.anomaly?(test_sample, 1e-10)
# => false
ad.anomaly?(test_sample, 1e-5)
# => true
The wiki has sample code to help you find the best ε for your application.
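The wiki's code is not reproduced here. As one possible approach, the sketch below picks ε from a list of candidates by maximizing the F1 score on a labeled validation set. The helpers `best_epsilon` and `f1_score` are hypothetical names, not part of the gem, and F1 is an assumed selection criterion; your application may care more about precision or recall.

```ruby
# Hypothetical helpers for illustration; not part of the gem.
# scored: array of [probability, anomalous] pairs, where probability comes
# from the detector and anomalous is the known true/false label.

# Returns the candidate epsilon with the highest F1 score.
def best_epsilon(scored, candidates)
  candidates.max_by { |epsilon| f1_score(scored, epsilon) }
end

# F1 score when every sample with probability < epsilon is flagged.
def f1_score(scored, epsilon)
  tp = fp = fn = 0
  scored.each do |probability, anomalous|
    flagged = probability < epsilon
    if flagged && anomalous
      tp += 1
    elsif flagged
      fp += 1
    elsif anomalous
      fn += 1
    end
  end
  return 0.0 if tp.zero?

  precision = tp.to_f / (tp + fp)
  recall = tp.to_f / (tp + fn)
  2 * precision * recall / (precision + recall)
end
```

To use it, build `scored` by calling `ad.probability(sample)` on each labeled validation sample, then pass a range of candidate values such as powers of ten.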
Persistence
You can easily persist the detector to a file or database. The serialized detector is very small.
serialized_ad = Marshal.dump(ad)
# Save to a file (binary mode, since Marshal output is binary data)
File.open("anomaly_detector.dump", "wb") { |f| f.write(serialized_ad) }
# ...
# Read it later (File.binread opens, reads, and closes the file)
ad2 = Marshal.load(File.binread("anomaly_detector.dump"))
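For the database case, the Marshal output is binary, so it needs either a binary column or a text-safe encoding. Here is a minimal sketch of the second option using Base64 through Ruby's built-in `pack`/`unpack1`. The helper names `encode_detector` and `decode_detector` are hypothetical, and how you store the resulting string depends on your database layer.

```ruby
# Hypothetical helpers for illustration; not part of the gem.

# Serializes any Marshal-compatible object to a Base64 string
# ("m0" is strict Base64 with no line breaks), safe for a text column.
def encode_detector(detector)
  [Marshal.dump(detector)].pack("m0")
end

# Restores the object from the Base64 string.
# Only call this on data you trust: Marshal.load can instantiate arbitrary objects.
def decode_detector(text)
  Marshal.load(text.unpack1("m0"))
end
```

Store the result of `encode_detector(ad)` in a text column and pass it back through `decode_detector` when you need the detector again.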
TODO
- Train in chunks (for very large datasets)
- Multivariate normal distribution (possibly)
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
Thanks
A special thanks to Andrew Ng.