Method: Janeway::Lexer#convert_surrogate_pair_to_codepoint

Defined in: lib/janeway/lexer.rb

#convert_surrogate_pair_to_codepoint(high_surrogate_hex, low_surrogate_hex) ⇒ `String`

Convert a valid UTF-16 surrogate pair into a UTF-8 string containing a single code point.

Parameters:

high_surrogate_hex (String): string of exactly 4 hex digits, e.g. “D83D”

low_surrogate_hex (String): string of exactly 4 hex digits, e.g. “DE09”

Returns:

(String): UTF-8 string containing a single multi-byte Unicode character, e.g. “😉”

# File 'lib/janeway/lexer.rb', line 286

def convert_surrogate_pair_to_codepoint(high_surrogate_hex, low_surrogate_hex)
  [high_surrogate_hex, low_surrogate_hex].each do |hex_str|
    raise ArgumentError, "expect 4 hex digits, got #{hex_str.inspect}" unless hex_str.size == 4
  end

  # Calculate the code point from the surrogate pair values
  # algorithm from https://russellcottrell.com/greek/utilities/SurrogatePairCalculator.htm
  high = high_surrogate_hex.hex
  low = low_surrogate_hex.hex
  codepoint = ((high - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000
  [codepoint].pack('U') # convert integer codepoint to single character string
end
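
To make the arithmetic concrete, here is a minimal standalone sketch that traces the documented example pair (“D83D”, “DE09”) through the same formula. The helper name `surrogate_pair_to_char` is hypothetical and is not part of Janeway. It only mirrors the calculation shown above, so it can be run without a lexer instance.

```ruby
# Hypothetical helper (not part of Janeway) mirroring the calculation above.
def surrogate_pair_to_char(high_hex, low_hex)
  high = high_hex.hex # "D83D" => 0xD83D
  low = low_hex.hex   # "DE09" => 0xDE09

  # Each surrogate contributes 10 bits of the offset above U+FFFF:
  # (0xD83D - 0xD800) * 0x400 = 0xF400, and 0xDE09 - 0xDC00 = 0x209.
  offset = ((high - 0xD800) * 0x400) + (low - 0xDC00)

  # Adding 0x10000 moves the offset into the supplementary planes;
  # pack('U') encodes the integer as a single UTF-8 character.
  [offset + 0x10000].pack('U')
end

char = surrogate_pair_to_char('D83D', 'DE09')
puts format('U+%X', char.ord) # U+1F609
puts char.bytesize            # 4
```

For the example pair the offset is 0xF400 + 0x209 = 0xF609, so the resulting code point is U+1F609, which UTF-8 encodes in four bytes.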
