Class: LooseTightDictionary::Blocking

Inherits:
Object
Defined in:
lib/loose_tight_dictionary/blocking.rb

Overview

“Record linkage typically involves two main steps: blocking and scoring…” en.wikipedia.org/wiki/Record_linkage

Blockings effectively divide the haystack into groups of records that match a pattern.

A blocking (as in a grouping) takes effect when a string matches its regexp. The needle must then match the same regexp as well.
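To make the grouping idea concrete, here is a minimal sketch that uses only Ruby's core Regexp and Enumerable. It does not use the library class. The regexp and the haystack records are hypothetical, and keying each group by the regexp's captures is an assumption drawn from how #join? compares captures.

```ruby
# Hypothetical data: a regexp standing in for a blocking, plus a small haystack.
blocking_regexp = /(Boeing|Airbus)/
haystack = ["Boeing 737", "Boeing 747", "Airbus A320", "Cessna 172"]

# Group records by what the regexp captured. Records that do not match
# fall outside the blocking and end up under the nil key.
groups = haystack.group_by do |record|
  match = blocking_regexp.match(record)
  match && match.captures
end

groups.each { |key, records| puts "#{key.inspect}: #{records.inspect}" }
```

Records that share a key here are the ones a blocking built on this regexp would treat as belonging together.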

Constructor Details

#initialize(regexp_or_str) ⇒ Blocking

Returns a new instance of Blocking.



# File 'lib/loose_tight_dictionary/blocking.rb', line 12

def initialize(regexp_or_str)
  @regexp = regexp_or_str.to_regexp
end

Instance Attribute Details

#regexp ⇒ Object (readonly)

Returns the value of attribute regexp.



# File 'lib/loose_tight_dictionary/blocking.rb', line 10

def regexp
  @regexp
end

Instance Method Details

#join?(str1, str2) ⇒ Boolean

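If a blocking “joins” two strings, that means they both fit into it.

Returns true if both strings match the blocking’s regexp and their captures are identical. Returns false if str2 matches but str1 does not, or if their captures differ. Returns nil if the blocking doesn’t apply, i.e. str2 doesn’t match the blocking’s regexp.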

Returns:

  • (Boolean, nil)


# File 'lib/loose_tight_dictionary/blocking.rb', line 24

def join?(str1, str2)
  if str2_match_data = regexp.match(str2)
    if str1_match_data = regexp.match(str1)
      str2_match_data.captures == str1_match_data.captures
    else
      false
    end
  else
    nil
  end
end
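
The three possible results of #join? can be seen in a small sketch. BlockingSketch is a hypothetical stand-in written for this example, not the library class: it takes a Regexp directly instead of calling #to_regexp on its argument, and the sample strings are invented.

```ruby
# Stand-in for illustration only; mirrors the join? logic shown above.
class BlockingSketch
  attr_reader :regexp

  def initialize(regexp)
    @regexp = regexp # assumption: the caller already passes a Regexp
  end

  def join?(str1, str2)
    str2_match = regexp.match(str2)
    return nil unless str2_match    # blocking does not apply to str2

    str1_match = regexp.match(str1)
    return false unless str1_match  # str1 does not fit the blocking

    str2_match.captures == str1_match.captures
  end
end

blocking = BlockingSketch.new(/(Boeing|Airbus)/)

same_group  = blocking.join?("Boeing 737", "Boeing 747")  # both match, same capture
other_group = blocking.join?("Airbus A320", "Boeing 747") # both match, captures differ
str1_misses = blocking.join?("Cessna 172", "Boeing 747")  # only str2 matches
not_applied = blocking.join?("Boeing 737", "Cessna 172")  # str2 does not match

p [same_group, other_group, str1_misses, not_applied]
```

Note that a regexp without capture groups yields empty captures for every match, so with such a pattern any two matching strings would be joined.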

#match?(str) ⇒ Boolean

Returns true if str matches the blocking’s regexp, false otherwise.

Returns:

  • (Boolean)


# File 'lib/loose_tight_dictionary/blocking.rb', line 16

def match?(str)
  !!(regexp.match(str))
end