VSS - Vector Space Search
A simple vector space search engine with tf*idf ranking.
More info, and details of how it works.
Install
Just install the gem:
gem install vss
Usage
To perform a search on a collection of documents:
require "vss"
docs = ["hello", "goodbye", "hello and goodbye", "hello, hello!"]
engine = VSS::Engine.new(docs)
engine.search("hello") #=> ["hello", "hello, hello!", "hello and goodbye", "goodbye"]
Rails/ActiveRecord
If you want to search a collection of ActiveRecord
objects, you need to pass a documentizer proc
when initializing VSS::Engine
which will convert the objects into documents (which are simply strings). For example:
class Page < ActiveRecord::Base
#attrs: title, content
end
docs = Page.all
documentizer = proc { |record| record.title + " " + record.content }
engine = VSS::Engine.new(docs, documentizer)
Notes
This isn't designed to be used on huge collections of records. The original use case was for ranking a smallish set of ActiveRecord
results obtained via a query (using SearchLogic). So, essentially, the search consisted of 2 stages; getting the corpus via a SQL query, then doing the VSS on that.
Credits
Heavily inspired by Joesph Wilk's article on building a vector space search engine in Perl.
Written by Mark Dodwell (Design & Code)