Class: JapaneseNames::Splitter
- Inherits: Object
- Defined in: lib/japanese_names/splitter.rb
Overview
Provides methods to split full Japanese name strings into surname and given name.
Instance Method Summary
- #split(kanji, kana) ⇒ Object
  Given kanji and kana representations of a name, splits it into family and given names.
Instance Method Details
#split(kanji, kana) ⇒ Object
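Given kanji and kana representations of a name, splits it into family and given names.
The choice to prioritize the family name is arbitrary: when no candidate split has agreeing family-name and given-name matches, the first family-name match is returned in preference to the first given-name match. Further analysis is needed to determine whether the given or family name should be prioritized.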
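Returns Array [[kanji_fam, kanji_giv], [kana_fam, kana_giv]] if there was a match. When a short name (at most 3 kanji and 4 kana characters) matches a dictionary entry in full, the given-name slots are nil. Returns nil if there was no match or if either argument is nil.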
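The return shape described above can be consumed with nested destructuring. The following is a minimal sketch, not part of the library: `describe_split` is a hypothetical helper, and the sample arrays are hand-written stand-ins for results rather than output of a real `#split` call.

```ruby
# Hypothetical helper (not part of JapaneseNames) that handles the three
# documented outcomes of Splitter#split: nil, family-only, and a full split.
def describe_split(result)
  return 'no match' if result.nil?

  # Unpack [[kanji_fam, kanji_giv], [kana_fam, kana_giv]] in one step.
  (kanji_fam, kanji_giv), (kana_fam, kana_giv) = result

  # The short-circuit path leaves both given-name slots nil.
  return "family only: #{kanji_fam} (#{kana_fam})" if kanji_giv.nil?

  "family: #{kanji_fam} (#{kana_fam}), given: #{kanji_giv} (#{kana_giv})"
end

# Hand-written sample values in the documented shape (assumed, for illustration).
full_split  = [['上原', '望'], ['うえはら', 'のぞみ']]
family_only = [['上原', nil], ['うえはら', nil]]

puts describe_split(full_split)
puts describe_split(family_only)
puts describe_split(nil)
```

Checking for nil before destructuring matters, because `#split` returns nil rather than an empty array when nothing matches.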
    # File 'lib/japanese_names/splitter.rb', line 13

    def split(kanji, kana)
      return nil unless kanji && kana

      kanji = kanji.strip
      kana = kana.strip

      # Short-circuit: Return last name if it can match the full string
      if kanji.size <= 3 && kana.size <= 4
        full_match = finder.find(kanji).detect { |d| d[0] == kanji && d[1] =~ /\A#{hk kana}\z/ }
        return [[kanji, nil], [kana, nil]] if full_match
      end

      # Partition kanji into candidate n-grams
      kanji_ngrams = Util::Ngram.ngram_partition(kanji)

      # Find all possible matches of all kanji n-grams in dictionary
      dict = finder.find(kanji_ngrams.flatten.uniq)

      first_lhs_match = nil
      first_rhs_match = nil

      kanji_ngrams.each do |kanji_pair|
        lhs_dict = dict.select { |d| d[0] == kanji_pair[0] }
        rhs_dict = dict.select { |d| d[0] == kanji_pair[1] }

        lhs_match = detect_lhs(lhs_dict, kanji, kana)
        rhs_match = detect_rhs(rhs_dict, kanji, kana)

        return lhs_match if lhs_match && lhs_match == rhs_match

        first_lhs_match ||= lhs_match
        first_rhs_match ||= rhs_match
      end

      # As a fallback, return single-sided match prioritizing surname match first
      first_lhs_match || first_rhs_match
    end
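
The control flow of `#split` (partition the kanji, look up each side, prefer a split where both sides agree, otherwise fall back to a single-sided match) can be sketched in isolation. Everything below is an assumption made for illustration: `TOY_DICT`, `toy_partition`, `toy_detect_lhs`, `toy_detect_rhs`, and `toy_split` are hypothetical stand-ins, the short-circuit step is omitted, and the real `finder`, `Util::Ngram.ngram_partition`, `detect_lhs`, and `detect_rhs` may order candidates or match readings differently.

```ruby
# Toy dictionary of [kanji, reading] rows. Hypothetical data, not the gem's dictionary.
TOY_DICT = [
  ['上',   'うえ'],
  ['上原', 'うえはら'],
  ['望',   'のぞみ']
].freeze

# Every way to cut a string into a non-empty left and right part.
# Stand-in for Util::Ngram.ngram_partition; the real ordering may differ.
def toy_partition(str)
  (1...str.size).map { |i| [str[0...i], str[i..-1]] }
end

# Treat the left part as a family name: its reading must be a prefix of the kana.
def toy_detect_lhs(kanji, kana, lhs)
  entry = TOY_DICT.find { |k, r| k == lhs && kana.start_with?(r) }
  return nil unless entry

  reading = entry[1]
  [[lhs, kanji[lhs.size..-1]], [reading, kana[reading.size..-1]]]
end

# Treat the right part as a given name: its reading must be a suffix of the kana.
def toy_detect_rhs(kanji, kana, rhs)
  entry = TOY_DICT.find { |k, r| k == rhs && kana.end_with?(r) }
  return nil unless entry

  reading = entry[1]
  [[kanji[0, kanji.size - rhs.size], rhs], [kana[0, kana.size - reading.size], reading]]
end

def toy_split(kanji, kana)
  return nil unless kanji && kana

  kanji = kanji.strip
  kana = kana.strip
  first_lhs = nil
  first_rhs = nil

  toy_partition(kanji).each do |lhs, rhs|
    lhs_match = toy_detect_lhs(kanji, kana, lhs)
    rhs_match = toy_detect_rhs(kanji, kana, rhs)

    # Both sides agree on the same split: return it immediately.
    return lhs_match if lhs_match && lhs_match == rhs_match

    first_lhs ||= lhs_match
    first_rhs ||= rhs_match
  end

  # Fallback: a single-sided match, family-name side first.
  first_lhs || first_rhs
end

p toy_split('上原望', 'うえはらのぞみ')
p toy_split('田中望', 'たなかのぞみ')
p toy_split('山田', 'やまだ')
```

In the first call the cut after 上 yields only a family-name match, so the loop continues to the cut after 上原, where both sides agree and that split is returned. The second call has no family-name entry at all and falls back to the given-name match; the third finds nothing and returns nil.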