Module: UTF8Encoding
- Includes: ControlCharacters, ForceBinary, ObjectSupport
- Included in: NdrSupport::YAML::SerializationMigration
- Defined in:
  lib/ndr_support/utf8_encoding.rb,
  lib/ndr_support/utf8_encoding/force_binary.rb,
  lib/ndr_support/utf8_encoding/object_support.rb,
  lib/ndr_support/utf8_encoding/control_characters.rb
Overview
Allows any object (if supported) to have all related strings encoded in place to UTF-8.
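As a rough illustration of what encoding "all related strings in place" can mean for nested data, here is a hypothetical sketch. It is not the ObjectSupport implementation: `to_utf8_sketch!` and `ensure_utf8_object_sketch!` are invented names, the per-string step is simplified to "keep valid UTF-8, otherwise assume Windows-1252", and hash keys are deliberately left alone.

```ruby
# Hypothetical, simplified stand-in for ensure_utf8!: keep the bytes if they
# are already valid UTF-8, otherwise reinterpret them as Windows-1252.
# (Raises Encoding::UndefinedConversionError for bytes such as 0x81.)
def to_utf8_sketch!(string)
  string.force_encoding('UTF-8')
  unless string.valid_encoding?
    string.force_encoding('Windows-1252').encode!('UTF-8')
  end
  string
end

# Hypothetical recursive walk: converts strings inside nested arrays and
# hash values in place, leaving hash keys and non-string objects untouched.
def ensure_utf8_object_sketch!(object)
  case object
  when String then to_utf8_sketch!(object)
  when Array  then object.each { |item| ensure_utf8_object_sketch!(item) }
  when Hash   then object.each_value { |value| ensure_utf8_object_sketch!(value) }
  end
  object
end

record = { name: "Ren\xE9".b, tags: ["caf\xC3\xA9".b, 42] }
ensure_utf8_object_sketch!(record)
```

Because every string is mutated rather than replaced, callers holding references to the same string objects see the converted values too.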
Defined Under Namespace
Modules: ControlCharacters, ForceBinary, ObjectSupport
Classes: UTF8CoercionError
Constant Summary
- AUTO_ENCODINGS =
Our known source encodings, in order of preference:
%w( UTF-8 UTF-16 Windows-1252 )
- REPLACEMENT_SCHEME =
How unmappable characters are escaped when forcing encoding (each becomes a hexadecimal token such as 0xff):
lambda { |char| '0x' + char.ord.to_s(16).rjust(2, '0') }
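The lambda is shaped for the `fallback:` option of Ruby's `String#encode`, which calls it once for each character that has no mapping in the target encoding; the `coerce_utf8!` listing below passes it in exactly this way. A minimal standalone demonstration, using an arbitrary sample byte string:

```ruby
# Copied from the constant summary above.
REPLACEMENT_SCHEME = lambda { |char| '0x' + char.ord.to_s(16).rjust(2, '0') }

# Raw bytes: ASCII text plus two bytes (0xE9, 0xFF) above 0x7F.
raw = "caf\xE9 \xFF".b

# Read as BINARY, bytes above 0x7F have no UTF-8 mapping, so each one is
# handed to the fallback lambda and replaced by its hex token.
escaped = raw.encode('UTF-8', 'BINARY', fallback: REPLACEMENT_SCHEME)

puts escaped          # prints caf0xe9 0xff
puts escaped.encoding # prints UTF-8
```

The `rjust(2, '0')` keeps every token two hex digits wide, so escaped bytes are easy to spot and parse back.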
Constants included from ControlCharacters
ControlCharacters::ALLOWED_CONTROL_CHARACTERS, ControlCharacters::CONTROL_CHARACTERS
Instance Method Summary
- #coerce_utf8(string, source_encoding = nil) ⇒ Object
  Returns a UTF-8 version of `string`, escaping any unmappable characters.
- #coerce_utf8!(string, source_encoding = nil) ⇒ Object
  Coerces `string` to UTF-8, in place, escaping any unmappable characters.
- #ensure_utf8(string, source_encoding = nil) ⇒ Object
  Returns a new string with valid UTF-8 encoding, or raises an exception if encoding fails.
- #ensure_utf8!(string, source_encoding = nil) ⇒ Object
  Attempts to encode `string` to UTF-8, in place.
Methods included from ObjectSupport
#ensure_utf8_array!, #ensure_utf8_hash!, #ensure_utf8_object!
Methods included from ForceBinary
Methods included from ControlCharacters
#escape_control_chars, #escape_control_chars!, #escape_control_chars_in_array!, #escape_control_chars_in_hash!, #escape_control_chars_in_object!
Instance Method Details
#coerce_utf8(string, source_encoding = nil) ⇒ Object
Returns a UTF-8 version of `string`, escaping any unmappable characters.
```ruby
# File 'lib/ndr_support/utf8_encoding.rb', line 44
def coerce_utf8(string, source_encoding = nil)
  coerce_utf8!(string.dup, source_encoding)
end
```
#coerce_utf8!(string, source_encoding = nil) ⇒ Object
Coerces `string` to UTF-8, in place, escaping any unmappable characters.
```ruby
# File 'lib/ndr_support/utf8_encoding.rb', line 49
def coerce_utf8!(string, source_encoding = nil)
  # Try normally first...
  ensure_utf8!(string, source_encoding)
rescue UTF8CoercionError
  # ...before going back-to-basics, and replacing things that don't map:
  string.encode!('UTF-8', 'BINARY', :fallback => REPLACEMENT_SCHEME)
end
```
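The two-stage structure (strict conversion first, byte-level escaping only when that raises `UTF8CoercionError`) can be reproduced in isolation. In this sketch the strict stage is a hypothetical simplification, `strict_utf8_sketch!`, that accepts only bytes already valid as UTF-8, whereas the real `ensure_utf8!` also tries the other AUTO_ENCODINGS candidates. Because the second stage reads the string as BINARY, it works byte by byte: once it runs, even well-formed multi-byte characters in the same string are escaped.

```ruby
class UTF8CoercionError < StandardError; end

REPLACEMENT_SCHEME = lambda { |char| '0x' + char.ord.to_s(16).rjust(2, '0') }

# Hypothetical strict stage: accept the bytes only if they are valid UTF-8.
def strict_utf8_sketch!(string)
  string.force_encoding('UTF-8')
  raise UTF8CoercionError, 'not valid UTF-8' unless string.valid_encoding?
  string
end

# Same shape as coerce_utf8!: strict first, then escape unmappable bytes.
def coerce_utf8_sketch!(string)
  strict_utf8_sketch!(string)
rescue UTF8CoercionError
  # Treat the string as raw bytes; every byte above 0x7F goes through
  # REPLACEMENT_SCHEME, including bytes of otherwise valid characters.
  string.encode!('UTF-8', 'BINARY', fallback: REPLACEMENT_SCHEME)
end

clean = coerce_utf8_sketch!("caf\xC3\xA9".b)      # valid UTF-8: kept as-is
mixed = coerce_utf8_sketch!("caf\xC3\xA9 \xFF".b) # stray 0xFF: byte-level escaping
```

In `mixed`, the "é" (bytes C3 A9) is escaped along with the stray 0xFF, which is the price of the back-to-basics fallback.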
#ensure_utf8(string, source_encoding = nil) ⇒ Object
Returns a new string with valid UTF-8 encoding, or raises an exception if encoding fails.
```ruby
# File 'lib/ndr_support/utf8_encoding.rb', line 21
def ensure_utf8(string, source_encoding = nil)
  ensure_utf8!(string.dup, source_encoding)
end
```
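The non-bang methods (`ensure_utf8` here, and `coerce_utf8` above) simply run their bang counterpart on `string.dup`. The sketch below shows one practical consequence, using a hypothetical stand-in for the bang method: the caller's string keeps its bytes and encoding, and because `dup` returns an unfrozen copy, the non-bang form also accepts frozen input, where a bang method that mutates its argument would raise `FrozenError`.

```ruby
# Hypothetical stand-in for a bang method that re-tags its argument in place.
def utf8_sketch!(string)
  string.force_encoding('UTF-8')
  raise ArgumentError, 'not valid UTF-8' unless string.valid_encoding?
  string
end

# Non-bang variant, built the same way as ensure_utf8: mutate a copy.
def utf8_sketch(string)
  utf8_sketch!(string.dup)
end

input = "caf\xC3\xA9".b.freeze

result = utf8_sketch(input) # works: dup returns an unfrozen copy

bang_raised = begin
  utf8_sketch!(input)       # the bang form cannot mutate a frozen string
  false
rescue FrozenError
  true
end
```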
#ensure_utf8!(string, source_encoding = nil) ⇒ Object
Attempts to encode `string` to UTF-8, in place. Returns `string`, or raises UTF8CoercionError if none of the candidate source encodings yields valid UTF-8.
```ruby
# File 'lib/ndr_support/utf8_encoding.rb', line 27
def ensure_utf8!(string, source_encoding = nil)
  # A list of encodings we should try from:
  candidates = source_encoding ? Array.wrap(source_encoding) : AUTO_ENCODINGS

  # Attempt to coerce the string to UTF-8, from one of the source
  # candidates (in order of preference):
  apply_candidates!(string, candidates)

  unless string.valid_encoding?
    # None of our candidate source encodings worked, so fail:
    fail(UTF8CoercionError, "Attempted to use: #{candidates}")
  end

  string
end
```
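`apply_candidates!` is private and its body is not shown on this page. The sketch below is one hypothetical way to implement the same candidate loop: the helper name `apply_candidates_sketch!`, returning a boolean, rescuing `EncodingError`, and restoring the original encoding on failure are all assumptions. It uses `Kernel#Array` in place of ActiveSupport's `Array.wrap` and leaves UTF-16 out of the candidate list to keep the example simple. The `forced` case also suggests why UTF-8 presumably comes first: Windows-1252 accepts almost any byte sequence, so applied to genuine UTF-8 it "succeeds" with the wrong characters.

```ruby
class UTF8CoercionError < StandardError; end

# Assumed candidate list for this sketch: UTF-16 is left out for simplicity.
CANDIDATE_ENCODINGS = %w(UTF-8 Windows-1252).freeze

# Hypothetical stand-in for the private apply_candidates! helper.
# Tries each source encoding in order; returns true on the first success.
def apply_candidates_sketch!(string, candidates)
  original = string.encoding
  candidates.each do |candidate|
    string.force_encoding(candidate)
    next unless string.valid_encoding? # bytes not well-formed in this encoding

    begin
      string.encode!('UTF-8')
      return true
    rescue EncodingError
      next # e.g. byte 0x81 has no Windows-1252 mapping
    end
  end
  string.force_encoding(original) # nothing worked: restore the original tag
  false
end

# Same outline as ensure_utf8!, with Kernel#Array instead of Array.wrap.
def ensure_utf8_sketch!(string, source_encoding = nil)
  candidates = source_encoding ? Array(source_encoding) : CANDIDATE_ENCODINGS
  return string if apply_candidates_sketch!(string, candidates)

  raise UTF8CoercionError, "Attempted to use: #{candidates}"
end

utf8   = ensure_utf8_sketch!("caf\xC3\xA9".b)                 # first candidate: UTF-8
latin  = ensure_utf8_sketch!("caf\xE9".b)                     # falls through to Windows-1252
forced = ensure_utf8_sketch!("caf\xC3\xA9".b, 'Windows-1252') # explicit source: mojibake

failed = begin
  ensure_utf8_sketch!("ok \x81".b) # no candidate can decode byte 0x81
  false
rescue UTF8CoercionError
  true
end
```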