Class: JapaneseNames::Splitter

Inherits:
Object
Defined in:
lib/japanese_names/splitter.rb

Overview

Provides methods to split a full Japanese name string into a surname and a given name.

Instance Method Summary

Instance Method Details

#split(kanji, kana) ⇒ Object

Given the kanji and kana representations of a name, splits it into family and given names.

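When no candidate split matches on both sides, a family-name match is preferred over a given-name match. This priority is arbitrary; further analysis is needed to determine whether the given name or the family name should take precedence.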

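Returns Array [[kanji_fam, kanji_giv], [kana_fam, kana_giv]] if there was a match. When the whole string matches as a family name, kanji_giv and kana_giv are nil. Returns nil if there was no match, or if either argument is nil.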
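Because the result is either a nested array or nil, and the given-name slots can themselves be nil, callers may want to unpack it defensively. The following is a minimal sketch of one way to do that. `describe_split` is a hypothetical helper and not part of the library, and the sample value is hand-written in the documented shape rather than produced by the gem.

```ruby
# Hypothetical helper (not part of JapaneseNames): converts the value returned
# by Splitter#split into a hash, tolerating nil (no match) and nil given names.
def describe_split(result)
  return nil if result.nil?

  # Nested destructuring of [[kanji_fam, kanji_giv], [kana_fam, kana_giv]].
  (kanji_fam, kanji_giv), (kana_fam, kana_giv) = result
  {
    family: { kanji: kanji_fam, kana: kana_fam },
    given:  { kanji: kanji_giv, kana: kana_giv }
  }
end

# Hand-written sample in the documented shape, not real gem output.
sample = [%w[上原 望], %w[ウエハラ ノゾミ]]
p describe_split(sample)
p describe_split(nil)
```

Returning nil unchanged keeps the "no match" case distinguishable from a match whose given name is nil.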



# File 'lib/japanese_names/splitter.rb', line 13

def split(kanji, kana)
  return nil unless kanji && kana
  kanji = kanji.strip
  kana  = kana.strip

  # Short-circuit: Return last name if it can match the full string
  if kanji.size <= 3 && kana.size <= 4
    full_match = finder.find(kanji).detect { |d| d[0] == kanji && d[1] =~ /\A#{hk kana}\z/ }
    return [[kanji, nil], [kana, nil]] if full_match
  end

  # Partition kanji into candidate n-grams
  kanji_ngrams = Util::Ngram.ngram_partition(kanji)

  # Find all possible matches of all kanji n-grams in dictionary
  dict = finder.find(kanji_ngrams.flatten.uniq)

  first_lhs_match = nil
  first_rhs_match = nil
  kanji_ngrams.each do |kanji_pair|
    lhs_dict = dict.select { |d| d[0] == kanji_pair[0] }
    rhs_dict = dict.select { |d| d[0] == kanji_pair[1] }

    lhs_match = detect_lhs(lhs_dict, kanji, kana)
    rhs_match = detect_rhs(rhs_dict, kanji, kana)

    return lhs_match if lhs_match && lhs_match == rhs_match

    first_lhs_match ||= lhs_match
    first_rhs_match ||= rhs_match
  end

  # As a fallback, return single-sided match prioritizing surname match first
  first_lhs_match || first_rhs_match
end
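
The loop in the listing tries each way of cutting the kanji string into a left and a right piece, returns as soon as both sides agree on the same split, and otherwise falls back to the first one-sided match. A simplified sketch of that control flow follows. Everything in it is an assumption made for illustration: `two_way_partitions` is a guess at the shape of what `Util::Ngram.ngram_partition` yields (its real output and ordering are not shown here), `toy_dict` is a hand-made stand-in for the dictionary behind `finder`, and matching is reduced to plain kana prefix and suffix checks instead of the `detect_lhs` and `detect_rhs` helpers.

```ruby
# Assumed shape of the partition step: every cut of the string into a
# non-empty left part and a non-empty right part, longest left part first.
def two_way_partitions(str)
  (1...str.size).to_a.reverse.map { |i| [str[0...i], str[i..-1]] }
end

# Hand-made toy dictionary of [kanji, kana] readings (not the gem's data).
def toy_dict
  [%w[上原 ウエハラ], %w[上 ウエ], %w[望 ノゾミ]]
end

# Simplified stand-in for the agree-or-fall-back loop in #split.
def toy_split(kanji, kana)
  first_lhs = first_rhs = nil

  two_way_partitions(kanji).each do |lhs_kanji, rhs_kanji|
    # Left side: a reading of the left kanji that is a prefix of the kana.
    lhs_entry = toy_dict.find { |k, r| k == lhs_kanji && kana.start_with?(r) }
    # Right side: a reading of the right kanji that is a suffix of the kana.
    rhs_entry = toy_dict.find { |k, r| k == rhs_kanji && kana.end_with?(r) }

    lhs = lhs_entry && [[lhs_kanji, rhs_kanji],
                        [lhs_entry[1], kana[lhs_entry[1].size..-1]]]
    rhs = rhs_entry && [[lhs_kanji, rhs_kanji],
                        [kana[0...-rhs_entry[1].size], rhs_entry[1]]]

    # Both sides independently produced the same split: accept it at once.
    return lhs if lhs && lhs == rhs

    first_lhs ||= lhs
    first_rhs ||= rhs
  end

  # Fallback: first one-sided match, left (family-name) side preferred.
  first_lhs || first_rhs
end

p toy_split('上原望', 'ウエハラノゾミ') # both sides agree
p toy_split('上原愛', 'ウエハラアイ')   # only the left side matches
p toy_split('山田', 'ヤマダ')           # no dictionary entry at all
```

In the second call only the family-name side is found in the toy dictionary, so the result comes from the fallback on the last line, which is where the family-name priority described above takes effect.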