Module: Lightstreamer::UTF16

Defined in:
lib/lightstreamer/utf16.rb

Overview

This module supports decoding UTF-16 escape sequences of the form ‘\uXXXX’, including surrogate pairs that encode codepoints above U+FFFF.

Class Method Summary

Class Method Details

.decode_escape_sequences(string) ⇒ String

Decodes any UTF-16 escape sequences of the form ‘\uXXXX’ in the passed string. Surrogate pairs are decoded first; any remaining escape sequence for an unpaired surrogate is invalid and is removed.

Parameters:

  • string (String)

    The string to decode.

Returns:

  • (String)


# File 'lib/lightstreamer/utf16.rb', line 12

def decode_escape_sequences(string)
  string = decode_surrogate_pairs_escape_sequences string

  # Match all remaining \uXXXX escape sequences
  string.gsub(/\\u[A-F\d]{4}/i) do |escape_sequence|
    codepoint = escape_sequence[2..-1].hex

    # Unpaired surrogates (0xD800-0xDFFF) are invalid and are removed
    (0xD800..0xDFFF).cover?(codepoint) ? '' : [codepoint].pack('U')
  end
end
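The following standalone sketch mirrors the two-pass approach above so it can be run without the gem: surrogate pairs are combined first, then the remaining ‘\uXXXX’ escapes are decoded. The module name HypotheticalUTF16 and its method names are illustrative only and are not part of Lightstreamer; the sketch assumes UTF-8 input strings and treats only unpaired surrogates (U+D800–U+DFFF) as invalid.

```ruby
# Standalone sketch of two-pass \uXXXX decoding.
# HypotheticalUTF16 is an illustrative name, not part of the Lightstreamer gem.
module HypotheticalUTF16
  module_function

  # Pass 1: combine a high surrogate (\uD800-\uDBFF) followed by a
  # low surrogate (\uDC00-\uDFFF) into a single supplementary codepoint.
  def decode_surrogate_pairs(string)
    string.gsub(/\\uD[89AB][A-F\d]{2}\\uD[C-F][A-F\d]{2}/i) do |pair|
      high = pair[2...6].hex
      low = pair[8...12].hex
      [0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)].pack('U')
    end
  end

  # Pass 2: decode the remaining \uXXXX escapes; unpaired surrogates are dropped.
  def decode(string)
    decode_surrogate_pairs(string).gsub(/\\u[A-F\d]{4}/i) do |escape|
      codepoint = escape[2..-1].hex
      (0xD800..0xDFFF).cover?(codepoint) ? '' : [codepoint].pack('U')
    end
  end
end

puts HypotheticalUTF16.decode('caf\\u00e9')         # prints: café
puts HypotheticalUTF16.decode('\\uD83D\\uDE00 ok')  # prints: 😀 ok
puts HypotheticalUTF16.decode('a\\uD83Db')          # prints: ab (lone high surrogate removed)
```

Running the pairs pass first matters: if the single-escape pass ran first, each half of ‘\uD83D\uDE00’ would be removed as an unpaired surrogate and the character would be lost.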

.decode_surrogate_pairs_escape_sequences(string) ⇒ String

Decodes any UTF-16 surrogate pair escape sequences of the form ‘\uXXXX\uYYYY’ in the passed string, where ‘\uXXXX’ is a high surrogate (D800–DBFF) and ‘\uYYYY’ is a low surrogate (DC00–DFFF).

Parameters:

  • string (String)

    The string to decode.

Returns:

  • (String)


# File 'lib/lightstreamer/utf16.rb', line 29

def decode_surrogate_pairs_escape_sequences(string)
  string.gsub(/\\uD[89AB][A-F\d]{2}\\uD[C-F][A-F\d]{2}/i) do |escape_sequence|
    high_surrogate = escape_sequence[2...6].hex
    low_surrogate = escape_sequence[8...12].hex

    codepoint = 0x10000 + ((high_surrogate - 0xD800) << 10) + (low_surrogate - 0xDC00)

    [codepoint].pack 'U'
  end
end
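The surrogate-pair formula above can be checked by hand: each surrogate carries 10 bits of the codepoint's offset from 0x10000, with the high surrogate supplying the upper 10 bits. The snippet below applies the formula to the pair ‘\uD83D\uDE00’ (values chosen for illustration) and cross-checks the result against Ruby's built-in UTF-16 encoder via String#encode and String#unpack.

```ruby
# Worked example of the surrogate-pair formula, using an illustrative pair.
high = 0xD83D # high (lead) surrogate, range 0xD800-0xDBFF
low  = 0xDE00 # low (trail) surrogate, range 0xDC00-0xDFFF

# Upper 10 bits come from the high surrogate, lower 10 bits from the low one.
codepoint = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
puts format('U+%X', codepoint) # prints: U+1F600

# Cross-check: encode the character as big-endian UTF-16 and read back
# the two 16-bit code units.
units = [codepoint].pack('U').encode('UTF-16BE').unpack('n*')
puts units.map { |u| format('%04X', u) }.join(' ') # prints: D83D DE00
```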