OutlierTree Ruby

:deciduous_tree: OutlierTree - explainable outlier/anomaly detection - for Ruby

Produces human-readable explanations for why values are detected as outliers

Price (2.50) looks low given Department is Books and Sale is false

:evergreen_tree: Check out IsoTree for an alternative approach that uses Isolation Forest

Installation

Add this line to your application’s Gemfile:

gem "outliertree"

Getting Started

Prep your data

data = [
  {department: "Books",  sale: false, price: 2.50},
  {department: "Books",  sale: true,  price: 3.00},
  {department: "Movies", sale: false, price: 5.00},
  # ...
]

Train a model

model = OutlierTree.new
model.fit(data)

Get outliers

model.outliers(data)

Parameters

Pass parameters - default values below

OutlierTree.new(
  max_depth: 4,
  min_gain: 0.01,
  z_norm: 2.67,
  z_outlier: 8.0,
  pct_outliers: 0.01,
  min_size_numeric: 25,
  min_size_categ: 50,
  categ_split: "binarize",
  categ_outliers: "tail",
  numeric_split: "raw",
  follow_all: false,
  gain_as_pct: true,
  nthreads: -1
)

See a detailed explanation

Data

Data can be an array of hashes

[
  {department: "Books",  sale: false, price: 2.50},
  {department: "Books",  sale: true,  price: 3.00},
  {department: "Movies", sale: false, price: 5.00}
]

Or a Rover data frame

Rover.read_csv("data.csv")
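
If your records start out as rows of strings (for example, cells split out of a text file) and you are not using Rover, they need to be reshaped into the array-of-hashes form first. The following is a minimal sketch in plain Ruby; `rows_to_hashes` and `coerce` are hypothetical helper names and not part of the gem, and the type-coercion rules are assumptions you may need to adjust for your data.

```ruby
# Hypothetical helper (not part of outliertree): convert one string cell
# into a boolean, integer, float, or leave it as a string.
def coerce(cell)
  text = cell.to_s.strip
  case text
  when /\Atrue\z/i then true
  when /\Afalse\z/i then false
  when /\A-?\d+\z/ then text.to_i
  when /\A-?\d+\.\d+\z/ then text.to_f
  else text
  end
end

# Hypothetical helper (not part of outliertree): pair each row with the
# header names, producing hashes with symbol keys.
def rows_to_hashes(header, rows)
  keys = header.map { |name| name.strip.downcase.to_sym }
  rows.map do |row|
    keys.zip(row.map { |cell| coerce(cell) }).to_h
  end
end

header = ["department", "sale", "price"]
rows = [
  ["Books",  "false", "2.50"],
  ["Books",  "true",  "3.00"],
  ["Movies", "false", "5.00"]
]

data = rows_to_hashes(header, rows)
```

The resulting `data` has the same shape as the array of hashes shown above, so it can be passed to `model.fit(data)`.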

Performance

OutlierTree uses OpenMP when possible for best performance. To enable OpenMP on Mac, run:

brew install libomp

Then reinstall the gem.

gem uninstall outliertree --force
bundle install

Resources

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone --recursive https://github.com/ankane/outliertree-ruby.git
cd outliertree-ruby
bundle install
bundle exec rake compile
bundle exec rake test