Neighbor

Nearest neighbor search for Rails and Postgres

Installation

Add this line to your application’s Gemfile:

gem 'neighbor'

Choose An Extension

Neighbor supports two extensions: cube and vector. cube ships with Postgres, while vector supports approximate nearest neighbor search.

For cube, run:

rails generate neighbor:cube
rails db:migrate

For vector, install pgvector and run:

rails generate neighbor:vector
rails db:migrate

Getting Started

Create a migration

class AddNeighborVectorToItems < ActiveRecord::Migration[6.1]
  def change
    # For cube:
    add_column :items, :neighbor_vector, :cube
    # Or, for vector (limit sets the number of dimensions):
    add_column :items, :neighbor_vector, :vector, limit: 3
  end
end

Add to your model

class Item < ApplicationRecord
  has_neighbors
end

Update the vectors

item.update(neighbor_vector: [1.0, 1.2, 0.5])

Get the nearest neighbors to a record

item.nearest_neighbors(distance: "euclidean").first(5)

Get the nearest neighbors to a vector

Item.nearest_neighbors([0.9, 1.3, 1.1], distance: "euclidean").first(5)

Distance

Supported values are:

  • euclidean
  • cosine
  • taxicab (cube only)
  • chebyshev (cube only)
  • inner_product (vector only)
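
As a rough guide to what these names mean, here is a plain-Ruby sketch of the five measures. The DistanceSketch module and its method names are made up for illustration and are not part of Neighbor; the actual computation happens inside Postgres.

```ruby
# Illustrative reference implementations of the distance names listed above.
# DistanceSketch is a hypothetical module, not part of Neighbor.
module DistanceSketch
  module_function

  # Straight-line distance.
  def euclidean(a, b)
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end

  # Sum of absolute differences per dimension.
  def taxicab(a, b)
    a.zip(b).sum { |x, y| (x - y).abs }
  end

  # Largest absolute difference in any single dimension.
  def chebyshev(a, b)
    a.zip(b).map { |x, y| (x - y).abs }.max
  end

  # Dot product. Unlike the others, a larger value means more similar.
  def inner_product(a, b)
    a.zip(b).sum { |x, y| x * y }
  end

  # One minus the cosine of the angle between the vectors.
  def cosine(a, b)
    norms = Math.sqrt(inner_product(a, a)) * Math.sqrt(inner_product(b, b))
    1.0 - inner_product(a, b) / norms
  end
end

a = [1.0, 1.2, 0.5]
b = [0.9, 1.3, 1.1]
puts DistanceSketch.euclidean(a, b)
puts DistanceSketch.cosine(a, b)
```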

For cosine distance with cube, vectors must be normalized before being stored.

class Item < ApplicationRecord
  has_neighbors normalize: true
end
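
One way to see why normalization helps: for unit-length vectors, half the squared Euclidean distance equals the cosine distance, so a Euclidean search ranks neighbors the same way a cosine search would. The sketch below checks that identity in plain Ruby. The helper names are hypothetical, and this is not necessarily how Neighbor computes it internally.

```ruby
# Scale a vector to unit length. A zero vector has no direction,
# so it cannot be normalized (this sketch does not guard against it).
def normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  v.map { |x| x / norm }
end

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

a = normalize([1.0, 1.2, 0.5])
b = normalize([0.9, 1.3, 1.1])

# For unit vectors: |a - b|^2 = 2 - 2 * cos(theta), so |a - b|^2 / 2 = 1 - cos(theta)
puts euclidean(a, b)**2 / 2
puts cosine_distance(a, b)
```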

For inner product with cube, see this example.

Records returned from nearest_neighbors will have a neighbor_distance attribute

nearest_item = item.nearest_neighbors(distance: "euclidean").first
nearest_item.neighbor_distance
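
Conceptually, an exact nearest neighbor query computes a distance for every row, sorts by it, and exposes the value on each result. A minimal in-memory sketch of that idea, assuming Euclidean distance (Row and nearest_neighbors_sketch are hypothetical names, and the real query runs in Postgres rather than in Ruby):

```ruby
# Hypothetical stand-in for a model row with a stored vector.
Row = Struct.new(:name, :neighbor_vector, :neighbor_distance)

# Brute-force search: compute the Euclidean distance from the query to
# every row, attach it as neighbor_distance, and sort closest first.
def nearest_neighbors_sketch(rows, query)
  rows.map do |row|
    distance = Math.sqrt(row.neighbor_vector.zip(query).sum { |x, y| (x - y)**2 })
    Row.new(row.name, row.neighbor_vector, distance)
  end.sort_by(&:neighbor_distance)
end

rows = [
  Row.new("a", [1.0, 1.2, 0.5]),
  Row.new("b", [0.9, 1.3, 1.1]),
  Row.new("c", [5.0, 5.0, 5.0])
]

nearest = nearest_neighbors_sketch(rows, [0.9, 1.3, 1.0]).first
puts "#{nearest.name}: #{nearest.neighbor_distance}"
```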

Dimensions

The cube data type is limited to 100 dimensions by default. See the Postgres docs for how to increase this. The vector data type is limited to 1024 dimensions.

For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.

class Movie < ApplicationRecord
  has_neighbors dimensions: 3
end
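
The kind of check this enables can be sketched in plain Ruby. The check_dimensions! helper below is hypothetical, and Neighbor's own validation may behave differently (for example, in the error it raises).

```ruby
# Hypothetical guard: reject vectors whose length differs from the
# expected number of dimensions, otherwise return the vector unchanged.
def check_dimensions!(vector, dimensions)
  unless vector.size == dimensions
    raise ArgumentError, "expected #{dimensions} dimensions, got #{vector.size}"
  end
  vector
end

check_dimensions!([1.0, 1.2, 0.5], 3) # passes

begin
  check_dimensions!([1.0, 1.2], 3)
rescue ArgumentError => e
  puts e.message
end
```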

Indexing

For vector, add an approximate index to speed up queries. Create a migration with:

class AddIndexToItemsNeighborVector < ActiveRecord::Migration[6.1]
  def change
    add_index :items, :neighbor_vector, using: :ivfflat
  end
end

Add opclass: :vector_cosine_ops for cosine distance and opclass: :vector_ip_ops for inner product.

Set the number of probes

Item.connection.execute("SET ivfflat.probes = 3")
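
To build intuition for what probes trades off, here is a toy inverted-file index in plain Ruby. Vectors are grouped under their nearest centroid, and a query scans only the closest groups. ToyIvfIndex is a hypothetical class with hand-picked centroids; it illustrates the general idea behind this style of index, not pgvector's implementation.

```ruby
# Toy inverted-file index. More probes means more lists scanned:
# slower queries, but fewer missed neighbors.
class ToyIvfIndex
  def initialize(centroids)
    @centroids = centroids
    @lists = Array.new(centroids.size) { [] }
  end

  # Store the vector in the list of its nearest centroid.
  def add(vector)
    @lists[nearest_centroids(vector, 1).first] << vector
  end

  # Scan only the `probes` lists whose centroids are closest to the query.
  def search(query, k:, probes:)
    candidates = nearest_centroids(query, probes).flat_map { |i| @lists[i] }
    candidates.min_by(k) { |v| distance(v, query) }
  end

  private

  def distance(a, b)
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end

  # Indexes of the n centroids closest to the vector, closest first.
  def nearest_centroids(vector, n)
    @centroids.each_index.min_by(n) { |i| distance(@centroids[i], vector) }
  end
end

index = ToyIvfIndex.new([[0.0, 0.0], [10.0, 0.0]])
index.add([4.99, 0.0]) # lands in the first list
index.add([6.0, 0.0])  # lands in the second list

query = [5.05, 0.0]
one_probe = index.search(query, k: 1, probes: 1)
two_probes = index.search(query, k: 1, probes: 2)
puts one_probe.inspect
puts two_probes.inspect
```

In this example the query falls just inside the second list's region, so a single probe returns [6.0, 0.0] and misses the closer [4.99, 0.0] stored in the first list. Probing both lists finds it.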

Example

You can use Neighbor for online item-based recommendations with Disco. We’ll use MovieLens data for this example.

Generate a model

rails generate model Movie name:string neighbor_vector:cube
rails db:migrate

And add has_neighbors

class Movie < ApplicationRecord
  has_neighbors dimensions: 20, normalize: true
end

Fit the recommender

data = Disco.load_movielens
recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)

Use item factors for the neighbor vector

recommender.item_ids.each do |item_id|
  Movie.create!(name: item_id, neighbor_vector: recommender.item_factors(item_id))
end

And get similar movies

movie = Movie.find_by(name: "Star Wars (1977)")
movie.nearest_neighbors(distance: "cosine").first(5).map(&:name)

Complete code

Upgrading

0.2.0

The distance option has been moved from has_neighbors to nearest_neighbors, and there is no longer a default. If you use cosine distance, set:

class Item < ApplicationRecord
  has_neighbors normalize: true
end

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/neighbor.git
cd neighbor
bundle install
bundle exec rake test