Method: UnicodeUtils.compatibility_decomposition

Defined in:
lib/unicode_utils/compatibility_decomposition.rb

.compatibility_decomposition(str) ⇒ Object

Returns the compatibility decomposition of the given string, also known as Normalization Form KD (NFKD).

Compatibility decomposition decomposes more code points than canonical decomposition. Unlike Normalization Forms D and C, this normalization can alter how a string is displayed.
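
The difference between the two decompositions can be illustrated without this library. The following is a minimal sketch that assumes Ruby 2.2 or later and uses the standard library's String#unicode_normalize rather than UnicodeUtils. The variable names are illustrative only.

```ruby
# Assumption: Ruby 2.2+, where String#unicode_normalize is available.
ligature = "\uFB01" # LATIN SMALL LIGATURE FI
e_acute  = "\u00E9" # LATIN SMALL LETTER E WITH ACUTE

# Canonical decomposition (NFD) leaves the ligature untouched,
# because its mapping is a compatibility mapping only.
nfd_ligature = ligature.unicode_normalize(:nfd)

# Compatibility decomposition (NFKD) replaces it with the letters f and i,
# which changes how the string is displayed.
nfkd_ligature = ligature.unicode_normalize(:nfkd)

# A character with a canonical mapping is decomposed by both forms.
nfd_e  = e_acute.unicode_normalize(:nfd)
nfkd_e = e_acute.unicode_normalize(:nfkd)

puts nfd_ligature == ligature # true
puts nfkd_ligature            # fi
puts nfd_e == nfkd_e          # true
```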

Example:

require "unicode_utils/compatibility_decomposition"
# LATIN SMALL LIGATURE FI => LATIN SMALL LETTER F, LATIN SMALL LETTER I
UnicodeUtils.compatibility_decomposition("\uFB01") # => "fi"

See also: UnicodeUtils.nfkd



# File 'lib/unicode_utils/compatibility_decomposition.rb', line 26

def compatibility_decomposition(str)
  res = String.new.force_encoding(str.encoding)
  str.each_codepoint { |cp|
    if cp >= 0xAC00 && cp <= 0xD7A3 # hangul syllable
      Impl.append_hangul_syllable_decomposition(res, cp)
    else
      Impl.append_recursive_compatibility_decomposition_mapping(res, cp)
    end
  }
  Impl.put_into_canonical_order(res)
end
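
The Hangul branch above handles the precomposed syllables U+AC00 through U+D7A3, which the Unicode Standard decomposes arithmetically instead of through a mapping table. The sketch below shows that arithmetic under stated assumptions: hangul_decompose is a hypothetical helper written for illustration, not the library's Impl.append_hangul_syllable_decomposition, and it returns an array of code points instead of appending to a string.

```ruby
# Constants from the Unicode Standard's Hangul syllable decomposition algorithm.
S_BASE  = 0xAC00 # first precomposed syllable
L_BASE  = 0x1100 # first leading consonant jamo
V_BASE  = 0x1161 # first vowel jamo
T_BASE  = 0x11A7 # one before the first trailing consonant jamo
V_COUNT = 21
T_COUNT = 28
N_COUNT = V_COUNT * T_COUNT # 588 syllables per leading consonant

# Hypothetical helper (not part of UnicodeUtils): decomposes one Hangul
# syllable code point into its leading, vowel, and optional trailing jamo.
def hangul_decompose(cp)
  s_index = cp - S_BASE
  leading = L_BASE + s_index / N_COUNT
  vowel   = V_BASE + (s_index % N_COUNT) / T_COUNT
  t_index = s_index % T_COUNT
  jamo = [leading, vowel]
  # A trailing index of zero means the syllable has no final consonant.
  jamo << T_BASE + t_index unless t_index.zero?
  jamo
end

# U+D55C decomposes into three jamo, U+AC00 into two.
p hangul_decompose(0xD55C).map { |c| format("U+%04X", c) }
p hangul_decompose(0xAC00).map { |c| format("U+%04X", c) }
```

Because every syllable in the range follows this formula, a range check on the code point is enough to choose this path before falling back to the recursive table lookup.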