Class: Company::Mapping::CompanyMapper

Inherits:
Object
Defined in:
lib/company/mapping/company_mapper.rb

Overview

Given a corpus of documents containing company names, CompanyMapper maps a new document to an existing document in the corpus, if a sufficiently similar one exists.

Instance Method Summary

Constructor Details

#initialize(corpus) ⇒ CompanyMapper

Returns a new instance of CompanyMapper. The TF-IDF weights of every document in corpus are computed once, at construction time.



# File 'lib/company/mapping/company_mapper.rb', line 8

def initialize(corpus)
  @corpus = corpus
  # Build the TF-IDF model once; every call to #map reuses it.
  @tfidf = TFIDF.new(@corpus)
  @tfidf.calculate
end
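The TFIDF class is not documented on this page, so the following is only a generic sketch of the kind of weighting that `@tfidf.calculate` presumably precomputes for each corpus document: term frequency scaled by inverse document frequency. The helper names `tokenize` and `tfidf_weights`, and the lowercase alphanumeric tokenization, are assumptions for illustration, not the gem's API.

```ruby
# Hypothetical TF-IDF sketch; this is NOT the gem's TFIDF implementation.

# Split a company name into lowercase alphanumeric tokens (assumed tokenization).
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Returns one Hash(term => weight) per input text, where
# weight = (term count / document length) * log(N / document frequency).
def tfidf_weights(texts)
  docs = texts.map { |t| tokenize(t) }
  n = docs.size.to_f

  # Number of documents in which each term appears at least once.
  df = Hash.new(0)
  docs.each { |tokens| tokens.uniq.each { |term| df[term] += 1 } }

  docs.map do |tokens|
    counts = tokens.tally
    counts.to_h do |term, count|
      tf = count.to_f / tokens.size
      [term, tf * Math.log(n / df[term])]
    end
  end
end

weights = tfidf_weights(["Acme Corp", "Acme Holdings", "Globex Corp"])
```

In this toy corpus, "holdings" appears in only one name, so it receives a higher weight in the second document than "acme", which appears in two of the three names. A term present in every document gets weight 0.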

Instance Method Details

#map(company, threshold) ⇒ Object

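Maps the given company to a company that exists in the corpus. company may be either a String (the company name) or a TextDocument. If the highest name similarity found strictly exceeds threshold, the matched company's id is returned as an Integer (the part of the corpus document id before the first underscore); otherwise nil is returned.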
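The matching step amounts to an argmax over similarities followed by a strict threshold test. Below is a minimal sketch, assuming each document is a sparse Hash(term => weight) and that similarity is cosine similarity; TFIDF#similarity is not shown on this page, so cosine is an assumption. The names `cosine` and `best_match` are hypothetical.

```ruby
# Cosine similarity between two sparse vectors stored as Hash(term => weight).
def cosine(a, b)
  dot = a.sum { |term, w| w * b.fetch(term, 0.0) }
  norm_a = Math.sqrt(a.values.sum { |w| w * w })
  norm_b = Math.sqrt(b.values.sum { |w| w * w })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Follows the same control flow as #map: track the most similar corpus
# entry, stop early on a perfect match, and return its id only if the
# best similarity strictly exceeds the threshold (nil otherwise).
def best_match(query, corpus, threshold)
  best_id = nil
  best_sim = 0.0
  corpus.each do |id, vector|
    sim = cosine(query, vector)
    next unless sim > best_sim

    best_sim = sim
    best_id = id
    break if best_sim >= 1.0
  end
  best_sim > threshold ? best_id : nil
end

corpus = {
  "12_acme"   => { "acme" => 0.4, "corp" => 0.4 },
  "34_globex" => { "globex" => 1.1, "corp" => 0.4 }
}
best_match({ "acme" => 0.4 }, corpus, 0.5) # => "12_acme"
best_match({ "acme" => 0.4 }, corpus, 0.8) # => nil
```

Here the query shares one of the two equally weighted terms of "12_acme", giving a similarity of about 0.707: enough to pass a threshold of 0.5 but not 0.8.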



# File 'lib/company/mapping/company_mapper.rb', line 16

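def map(company, threshold)
  # Accept a raw company name by wrapping it in a TextDocument
  # with a placeholder id.
  if company.is_a?(String)
    content = company
    company = TextDocument.new
    company.contents = content
    company.id = "new_comp"
  end

  # Compute TF-IDF weights for the new document so it can be
  # compared with the corpus documents.
  @tfidf.calculate_tfidf_weights_of_new_document(company)

  # Find the corpus document most similar to the new one.
  max_sim = 0.0
  mapped_company = ""
  @corpus.each do |d|
    similarity = @tfidf.similarity(d.id, company.id)
    next unless similarity > max_sim
    max_sim = similarity
    mapped_company = d.id
    break if max_sim == 1 # stop early on a perfect match
  end

  # Report a match only if the best similarity strictly exceeds the
  # threshold; return the id prefix before the first "_" as an Integer.
  return unless max_sim > threshold
  mapped_company.to_s.sub(/_.*/, "").to_i
end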
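The last line of #map converts the matched document id to an Integer: everything from the first underscore onward is removed, and the remainder goes through String#to_i. This suggests corpus ids of the form "<number>_<suffix>", which is an inference from the code rather than a documented contract. The sketch below isolates that conversion under the hypothetical name `company_id_from` so its edge cases are visible.

```ruby
# Same conversion as the last line of #map: drop everything from the
# first "_" onward, then parse the leading digits as an Integer.
def company_id_from(doc_id)
  doc_id.to_s.sub(/_.*/, "").to_i
end

company_id_from("42_acme")     # => 42
company_id_from("7_acme_corp") # => 7  (only the first "_" matters)
company_id_from(99)            # => 99 (non-String ids are stringified first)
company_id_from("acme")        # => 0  (no numeric prefix)
```

Because String#to_i returns 0 rather than raising on non-numeric input, an id without a numeric prefix is silently reported as company 0, which callers may want to guard against.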