Module: Lightstreamer::UTF16
- Defined in: lib/lightstreamer/utf16.rb
Overview
This module supports the decoding of UTF-16 escape sequences.
Class Method Summary
- .decode_escape_sequences(string) ⇒ Object
  Decodes any UTF-16 escape sequences in the form ‘\uXXXX’ into a new string.
- .decode_surrogate_pairs_escape_sequences(string) ⇒ Object
  Converts any UTF-16 surrogate pair escape sequences in the form ‘\uXXXX\uYYYY’ into UTF-8.
Class Method Details
.decode_escape_sequences(string) ⇒ Object
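Decodes any UTF-16 escape sequences in the form ‘\uXXXX’ into a new string. Surrogate pairs are decoded first. Any remaining escape sequence with a value of 0xD800 or above (for example, an unpaired surrogate) is treated as invalid and removed.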
# File 'lib/lightstreamer/utf16.rb', line 7
def decode_escape_sequences(string)
  # Decode surrogate pairs first so that only single escapes remain
  string = decode_surrogate_pairs_escape_sequences string

  # Match all remaining escape sequences
  string.gsub(/\\u[A-F\d]{4}/i) do |escape_sequence|
    codepoint = escape_sequence[2..-1].hex

    # Values of 0xD800 and above are treated as invalid here and replaced with an empty string
    codepoint < 0xD800 ? [codepoint].pack('U') : ''
  end
end
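A minimal usage sketch, assuming the two methods documented on this page are copied into a standalone module so that the example runs without the gem installed. The module name UTF16Sketch is hypothetical and used only for illustration. The method bodies are taken from the listings on this page. The inputs use single-quoted Ruby strings, so each `\u` is a literal backslash followed by `u`.

```ruby
# UTF16Sketch is a hypothetical stand-in for Lightstreamer::UTF16,
# holding copies of the two methods shown on this page.
module UTF16Sketch
  module_function

  def decode_escape_sequences(string)
    string = decode_surrogate_pairs_escape_sequences string

    string.gsub(/\\u[A-F\d]{4}/i) do |escape_sequence|
      codepoint = escape_sequence[2..-1].hex
      # Only values below 0xD800 are decoded; everything else is dropped
      codepoint < 0xD800 ? [codepoint].pack('U') : ''
    end
  end

  def decode_surrogate_pairs_escape_sequences(string)
    string.gsub(/\\uD[89AB][A-F\d]{2}\\uD[C-F][A-F\d]{2}/i) do |escape_sequence|
      high_surrogate = escape_sequence[2...6].hex
      low_surrogate = escape_sequence[8...12].hex
      codepoint = 0x10000 + ((high_surrogate - 0xD800) << 10) + (low_surrogate - 0xDC00)
      [codepoint].pack 'U'
    end
  end
end

# A single escape below 0xD800 is decoded to its character.
puts UTF16Sketch.decode_escape_sequences('caf\u00E9')     # café

# A surrogate pair is combined into one codepoint (U+1F600).
puts UTF16Sketch.decode_escape_sequences('\uD83D\uDE00')  # 😀

# An unpaired surrogate is removed.
puts UTF16Sketch.decode_escape_sequences('a\uD83Db')      # ab

# As the code is written, escapes from \uE000 upwards are removed as well.
puts UTF16Sketch.decode_escape_sequences('x\uE000y')      # xy
```

The last call reflects the `codepoint < 0xD800` check in the listing: it discards every remaining escape at or above 0xD800, not only unpaired surrogates.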
.decode_surrogate_pairs_escape_sequences(string) ⇒ Object
Converts any UTF-16 surrogate pair escape sequences in the form ‘\uXXXX\uYYYY’ into UTF-8, where XXXX is a high surrogate (D800 to DBFF) and YYYY is a low surrogate (DC00 to DFFF).
# File 'lib/lightstreamer/utf16.rb', line 20
def decode_surrogate_pairs_escape_sequences(string)
  string.gsub(/\\uD[89AB][A-F\d]{2}\\uD[C-F][A-F\d]{2}/i) do |escape_sequence|
    high_surrogate = escape_sequence[2...6].hex
    low_surrogate = escape_sequence[8...12].hex

    codepoint = 0x10000 + ((high_surrogate - 0xD800) << 10) + (low_surrogate - 0xDC00)

    [codepoint].pack 'U'
  end
end
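The surrogate arithmetic in the method above can be checked in isolation. The following is a small sketch, assuming a hypothetical helper named combine_surrogates that is not part of the gem and only repeats the formula from the listing. Each surrogate contributes 10 bits: the high surrogate supplies the upper 10 and the low surrogate the lower 10, offset by 0x10000.

```ruby
# Hypothetical helper (not part of the gem) that isolates the formula
# used by decode_surrogate_pairs_escape_sequences.
def combine_surrogates(high_surrogate, low_surrogate)
  0x10000 + ((high_surrogate - 0xD800) << 10) + (low_surrogate - 0xDC00)
end

# Worked example for the pair \uD83D\uDE00:
#   (0xD83D - 0xD800) << 10 = 0x3D << 10 = 0xF400
#    0xDE00 - 0xDC00        = 0x200
#    0x10000 + 0xF400 + 0x200 = 0x1F600
codepoint = combine_surrogates(0xD83D, 0xDE00)
puts format('U+%X', codepoint)   # U+1F600

# Encode the codepoint as UTF-8, as the method does with pack('U').
puts [codepoint].pack('U')       # 😀

# The smallest and largest pairs map to the ends of the supplementary range.
puts format('U+%X', combine_surrogates(0xD800, 0xDC00))  # U+10000
puts format('U+%X', combine_surrogates(0xDBFF, 0xDFFF))  # U+10FFFF
```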