Class: Minhash::Minhash
- Inherits: Object
- Defined in: lib/doc_sim/minhash.rb
Overview
Generates Minhash signatures for sets. A signature holds, for each hash function, the minimum hash value over the set's elements.
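Minhash signatures are usually compared slot by slot. If the hash functions behave like random permutations, two sets share the minimum in a given slot with probability equal to their Jaccard similarity, so the fraction of matching slots estimates it. The helper below, `estimated_jaccard`, is a hypothetical illustration and not part of this class. It assumes both signatures came from the same Minhash instance, or from instances built with the same n_hashes and seed_root.

```ruby
# Hypothetical helper (not part of Minhash::Minhash): estimates the Jaccard
# similarity of two sets from their Minhash signatures.
def estimated_jaccard(sig_a, sig_b)
  raise ArgumentError, "signatures must have the same length" unless sig_a.length == sig_b.length
  raise ArgumentError, "signatures must not be empty" if sig_a.empty?

  # Count the slots where both sets produced the same minimum hash.
  matches = sig_a.zip(sig_b).count { |a, b| a == b }
  matches.fdiv(sig_a.length)
end

puts estimated_jaccard([3, 7, 1, 9], [3, 8, 1, 9]) # prints 0.75
```

Longer signatures (a larger n_hashes) reduce the variance of the estimate, at the cost of more hashing per element.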
Constant Summary
- HASH_MAX = (2**32) + 1
  Hashes are always below 2**32, so this value is larger than any hash and serves as the initial value of every signature slot.
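HASH_MAX acts as "infinity" for each slot's running minimum. Every 32-bit hash is at most 2**32 - 1, so the first real hash always replaces it, and a slot keeps HASH_MAX only when the set is empty. A minimal sketch of that fold for one hash function's outputs, using a hypothetical `slot_minimum` helper:

```ruby
HASH_MAX = (2**32) + 1 # same value as Minhash::HASH_MAX

# Hypothetical helper: running minimum of one hash function's outputs,
# starting from HASH_MAX as the signature method does.
def slot_minimum(hash_values)
  hash_values.reduce(HASH_MAX) { |min, h| h < min ? h : min }
end

puts slot_minimum([2**32 - 1])   # prints 4294967295 (largest 32-bit hash still wins)
puts slot_minimum([])            # prints 4294967297 (empty set keeps the sentinel)
puts slot_minimum([42, 7, 1_000]) # prints 7
```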
Instance Attribute Summary
- #seed_root ⇒ Object (readonly)
  Base seed for the hash functions; hash function i is seeded with seed_root + i.
Instance Method Summary
- #initialize(n_hashes = 1, seed_root = rand(2**32)) ⇒ Minhash (constructor)
  A new instance of Minhash.
- #signature(set) ⇒ Array[Integer]
  Produces the Minhash signature for a given Set.
Constructor Details
#initialize(n_hashes = 1, seed_root = rand(2**32)) ⇒ Minhash
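Returns a new instance of Minhash with n_hashes hash functions, which is also the signature length. Hash function i is MurmurHash3::V32.str_hash seeded with seed_root + i. Because seed_root defaults to a random value in 0...2**32, pass it explicitly when signatures from separate instances must be comparable.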
```ruby
# File 'lib/doc_sim/minhash.rb', line 13
def initialize(n_hashes = 1, seed_root = rand(2**32))
  @seed_root = seed_root
  # One hash function per signature slot; slot `seed` (0-based) hashes
  # with MurmurHash3 seeded by seed_root + seed.
  @hashes = Array.new(n_hashes) do |seed|
    ->(x) { MurmurHash3::V32.str_hash(x, seed_root + seed) }
  end
end
```
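The constructor derives a whole family of hash functions from one base seed. The sketch below reproduces that structure with the standard library only; `make_hashes` and `seeded_hash32` are hypothetical names, and the MD5-based 32-bit hash is a stand-in for MurmurHash3::V32.str_hash, so its values differ from the class's. It shows the property that matters: the same (n_hashes, seed_root) pair always yields the same functions.

```ruby
require "digest/md5"

# Stand-in 32-bit seeded string hash (NOT MurmurHash3): the first 8 hex
# digits of MD5("seed:string"), read as an unsigned integer in 0...2**32.
def seeded_hash32(str, seed)
  Digest::MD5.hexdigest("#{seed}:#{str}")[0, 8].to_i(16)
end

# Hypothetical mirror of Minhash#initialize: one lambda per slot,
# slot i seeded with seed_root + i.
def make_hashes(n_hashes, seed_root)
  Array.new(n_hashes) do |seed|
    ->(x) { seeded_hash32(x, seed_root + seed) }
  end
end

first = make_hashes(3, 12_345)
second = make_hashes(3, 12_345)
p first.map { |h| h.call("shingle") } == second.map { |h| h.call("shingle") } # prints true
```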
Instance Attribute Details
#seed_root ⇒ Object (readonly)
Returns the base seed for the hash functions; hash function i is seeded with seed_root + i.
```ruby
# File 'lib/doc_sim/minhash.rb', line 8
def seed_root
  @seed_root
end
```
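seed_root is readable after construction, but n_hashes is not exposed as an attribute. Signatures are only comparable when they come from the same n_hashes and seed_root, so code that stores signatures would also need to record both values. A minimal sketch using JSON from the standard library; the key names are a hypothetical choice, and the commented-out line assumes Minhash::Minhash and its murmurhash3 dependency are loaded:

```ruby
require "json"

# Hypothetical parameters recorded when signatures are first computed.
# In real use, seed_root would come from an existing instance (minhash.seed_root).
n_hashes = 64
seed_root = rand(2**32)

saved = JSON.generate({ "n_hashes" => n_hashes, "seed_root" => seed_root })

# Later: recover the parameters and rebuild a compatible instance.
params = JSON.parse(saved)
# minhash = Minhash::Minhash.new(params["n_hashes"], params["seed_root"])
```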
Instance Method Details
#signature(set) ⇒ Array[Integer]
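Produces the Minhash signature for a given Set: an array with one entry per hash function, where entry i is the minimum value hash function i produces over the set's elements. Elements are passed to MurmurHash3::V32.str_hash, so they should be strings. An empty set yields an array filled with HASH_MAX.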
```ruby
# File 'lib/doc_sim/minhash.rb', line 25
def signature(set)
  # Every slot starts at HASH_MAX, which is larger than any 32-bit hash.
  counter = Array.new(@hashes.length, Minhash::HASH_MAX)
  set.each do |elem|
    # Slot i keeps the smallest value hash function i has produced so far.
    @hashes.each_with_index do |hash_func, i|
      counter[i] = [counter[i], hash_func.call(elem)].min
    end
  end
  counter
end
```
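Putting the constructor and signature together, the sketch below is a standard-library-only analogue of the class for experimenting without the murmurhash3 gem. `SketchMinhash` and its MD5-based hash are hypothetical stand-ins, so the values differ from Minhash::Minhash; the loop in `signature` follows the method above. It compares the estimate from matching slots with the exact Jaccard similarity of two word sets (7 shared words out of 10 distinct, so exactly 0.7).

```ruby
require "digest/md5"
require "set"

# Hypothetical stdlib-only analogue of Minhash::Minhash. The MD5-based hash
# stands in for MurmurHash3::V32.str_hash, so values differ from the real class.
class SketchMinhash
  HASH_MAX = (2**32) + 1 # larger than any 32-bit hash

  attr_reader :seed_root

  def initialize(n_hashes = 1, seed_root = rand(2**32))
    @seed_root = seed_root
    # Slot i uses the stand-in hash seeded with seed_root + i.
    @hashes = Array.new(n_hashes) do |seed|
      ->(x) { Digest::MD5.hexdigest("#{seed_root + seed}:#{x}")[0, 8].to_i(16) }
    end
  end

  # Same algorithm as Minhash#signature: per-slot minimum over the set.
  def signature(set)
    counter = Array.new(@hashes.length, HASH_MAX)
    set.each do |elem|
      @hashes.each_with_index do |hash_func, i|
        h = hash_func.call(elem)
        counter[i] = h if h < counter[i]
      end
    end
    counter
  end
end

a = Set.new(%w[the quick brown fox jumps over the lazy dog])
b = Set.new(%w[the quick brown cat jumps over a lazy dog])

mh = SketchMinhash.new(256, 42)
sig_a = mh.signature(a)
sig_b = mh.signature(b)

# Fraction of matching slots vs. |A ∩ B| / |A ∪ B|.
estimate = sig_a.zip(sig_b).count { |x, y| x == y }.fdiv(sig_a.length)
exact = (a & b).size.fdiv((a | b).size)
puts format("estimated %.2f, exact %.2f", estimate, exact)
```

Both signatures must come from the same instance (or the same n_hashes and seed_root); otherwise slot i of each signature was computed with a different hash function and the comparison is meaningless.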