Method: Janeway::Lexer#convert_surrogate_pair_to_codepoint
Defined in: lib/janeway/lexer.rb
#convert_surrogate_pair_to_codepoint(high_surrogate_hex, low_surrogate_hex) ⇒ String
Convert a valid UTF-16 surrogate pair into a UTF-8 string containing a single code point.
# File 'lib/janeway/lexer.rb', line 286

def convert_surrogate_pair_to_codepoint(high_surrogate_hex, low_surrogate_hex)
  [high_surrogate_hex, low_surrogate_hex].each do |hex_str|
    raise ArgumentError, "expect 4 hex digits, got #{hex_str.inspect}" unless hex_str.size == 4
  end

  # Calculate the code point from the surrogate pair values
  # algorithm from https://russellcottrell.com/greek/utilities/SurrogatePairCalculator.htm
  high = high_surrogate_hex.hex
  low = low_surrogate_hex.hex
  codepoint = ((high - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000
  [codepoint].pack('U') # convert integer codepoint to single character string
end
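The arithmetic above can be checked with a small standalone sketch. The helper name `surrogate_pair_to_char` is hypothetical and not part of Janeway. It repeats only the calculation, without the length check, and assumes the inputs are already a valid high/low surrogate pair given as 4-digit hex strings.

```ruby
# Hypothetical standalone helper (not part of Janeway) that mirrors the
# calculation in the method above, without the argument length check.
def surrogate_pair_to_char(high_hex, low_hex)
  high = high_hex.hex # e.g. "D83D" -> 0xD83D
  low = low_hex.hex   # e.g. "DE00" -> 0xDE00

  # The high surrogate supplies the upper 10 bits and the low surrogate the
  # lower 10 bits of an offset that starts at U+10000.
  codepoint = ((high - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000

  [codepoint].pack('U') # 'U' packs the integer as a UTF-8 encoded character
end

# The JSON escape sequence \uD83D\uDE00 encodes U+1F600.
char = surrogate_pair_to_char('D83D', 'DE00')
puts format('U+%X', char.ord) # prints "U+1F600"
```

Worked by hand for this input: (0xD83D - 0xD800) * 0x400 = 0xF400, 0xDE00 - 0xDC00 = 0x200, and 0xF400 + 0x200 + 0x10000 = 0x1F600.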