Class: Minhash::Algorithm
- Inherits: Object
  - Object
  - Minhash::Algorithm
- Defined in: lib/minhash.rb
Overview
The Minhash signature algorithm.
See section 3.3 of the book Mining of Massive Datasets (www.mmds.org/): infolab.stanford.edu/~ullman/mmds/ch3.pdf
Simple XORs of random integer bit masks are used as the hash functions.
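As a rough illustration of the idea, each hash function XORs an integer token with one fixed mask, and each signature component is the smallest hashed value over the token set. The two 4-bit masks and the token values below are made up for readability; the library itself draws random masks.

```ruby
# Hypothetical small masks chosen for illustration only.
masks = [0b1010, 0b0110]

# One hash function per mask: XOR the token with the mask.
hash_functions = masks.map { |mask| ->(i) { i ^ mask } }

# Tokens are assumed to be integers, since they are XORed with a mask.
tokens = [1, 4, 7]

# The signature keeps the minimum hashed token for each hash function.
signature = hash_functions.map { |f| tokens.map(&f).min }

p signature # => [11, 1]
```

With mask 0b1010 the tokens hash to 11, 14 and 13, so the first component is 11; with mask 0b0110 they hash to 7, 2 and 1, so the second is 1.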
Instance Attribute Summary
- #masks ⇒ Object (readonly)
  Returns the bit masks used to implement the hash functions.
Class Method Summary
- .create(length) ⇒ Object
  Creates a new instance of the algorithm with length random bit masks.
Instance Method Summary
- #initialize(masks) ⇒ Algorithm (constructor)
  Creates a new instance of the algorithm, with the given bit masks.
- #signature(tokens) ⇒ Object
  Returns the minhash signature for a set of tokens.
Constructor Details
#initialize(masks) ⇒ Algorithm
Creates a new instance of the algorithm, with the given bit masks.
# File 'lib/minhash.rb', line 90
def initialize(masks)
  @masks = masks.freeze
  @hash_functions ||= @masks.map {|mask| lambda {|i| i ^ mask } }
end
Instance Attribute Details
#masks ⇒ Object (readonly)
Returns the bit masks used to implement the hash functions.
# File 'lib/minhash.rb', line 86
def masks
  @masks
end
Class Method Details
.create(length) ⇒ Object
Creates a new instance of the algorithm with length random bit masks.
# File 'lib/minhash.rb', line 97
def self.create(length)
  new length.times.map { rand(2 ** 32 - 1) }
end
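Because .create draws its masks from Ruby's global random number generator, two separately created instances will almost certainly hold different masks, and signatures are only comparable when they come from the same masks. One possible way to get repeatable masks is to generate them from a seeded Random and hand them to the constructor. The helper name seeded_masks and the seed value are hypothetical and not part of the library.

```ruby
# Hypothetical helper: builds `length` masks from a seeded generator,
# using the same upper bound as the listing above.
def seeded_masks(length, seed)
  rng = Random.new(seed)
  Array.new(length) { rng.rand(2**32 - 1) }
end

masks = seeded_masks(4, 42)

# The same seed reproduces the same masks on every run.
puts masks == seeded_masks(4, 42) # prints "true"
```

The resulting array could then be passed to Minhash::Algorithm.new(masks), or the masks of an existing instance could be read back through #masks and stored alongside the signatures.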
Instance Method Details
#signature(tokens) ⇒ Object
Returns the minhash signature for a set of tokens.
# File 'lib/minhash.rb', line 102
def signature(tokens)
  @hash_functions.map {|f| tokens.map(&f).min }
end
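A minimal usage sketch follows, assuming integer tokens. Two signatures built with the same masks can be compared position by position, and the fraction of matching positions serves as an estimate of the Jaccard similarity of the two sets. So that the sketch runs without the gem installed, it includes a stand-in class that mirrors the listings on this page. The token sets and the signature length of 200 are arbitrary, and how close the estimate lands depends on how well the XOR-mask hash functions behave on the data.

```ruby
# Stand-in mirroring the listings on this page (not the gem itself).
module Minhash
  class Algorithm
    attr_reader :masks

    def initialize(masks)
      @masks = masks.freeze
      @hash_functions = @masks.map { |mask| ->(i) { i ^ mask } }
    end

    def self.create(length)
      new(length.times.map { rand(2**32 - 1) })
    end

    def signature(tokens)
      @hash_functions.map { |f| tokens.map(&f).min }
    end
  end
end

algorithm = Minhash::Algorithm.create(200)

# Two overlapping sets of integer tokens (illustrative data).
a = (1..100).to_a
b = (51..150).to_a

sig_a = algorithm.signature(a)
sig_b = algorithm.signature(b)

# Fraction of signature positions on which the two sets agree.
matches  = sig_a.zip(sig_b).count { |x, y| x == y }
estimate = matches.to_f / sig_a.length

# Exact Jaccard similarity for comparison: |A ∩ B| / |A ∪ B| = 50 / 150.
exact = (a & b).length.to_f / (a | b).length

puts "estimated: #{estimate.round(3)}, exact: #{exact.round(3)}"
```

Both signatures must come from the same Algorithm instance (or from instances sharing the same masks); otherwise the positions do not correspond and the comparison is meaningless.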