Module: ActiveSupport::Multibyte

Defined in:
lib/active_support/multibyte/chars.rb,
lib/active_support/multibyte.rb,
lib/active_support/multibyte/utils.rb,
lib/active_support/multibyte/exceptions.rb,
lib/active_support/multibyte/unicode_database.rb

Overview

:nodoc:

Defined Under Namespace

Classes: Chars, Codepoint, EncodingError, UnicodeDatabase

Constant Summary

NORMALIZATION_FORMS =

A list of all available normalization forms. See www.unicode.org/reports/tr15/tr15-29.html for more information about normalization.

[:c, :kc, :d, :kd]
UNICODE_VERSION =

The Unicode version that is supported by the implementation

'5.1.0'
VALID_CHARACTER =

Regular expressions that describe valid byte sequences for a character

{
  # Borrowed from the Kconv library by Shinji KONO - (also as seen on the W3C site)
  'UTF-8' => /\A(?:
              [\x00-\x7f]                                         |
              [\xc2-\xdf] [\x80-\xbf]                             |
              \xe0        [\xa0-\xbf] [\x80-\xbf]                 |
              [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]                 |
              \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf]     |
              [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf]     |
              \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf])\z /xn,
  # Quick check for valid Shift-JIS characters, disregards the odd-even pairing
  'Shift_JIS' => /\A(?:
              [\x00-\x7e \xa1-\xdf]                                     |
              [\x81-\x9f \xe0-\xef] [\x40-\x7e \x80-\x9e \x9f-\xfc])\z /xn
}
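
As a rough illustration of how the 'UTF-8' pattern can be applied, the sketch below copies it into a local constant and tests single-character byte sequences. The names UTF8_CHAR and valid_utf8_char? are hypothetical and not part of ActiveSupport. The input is converted to a binary string first, on the assumption that the byte-oriented /n pattern should be matched against raw bytes.

```ruby
# Local copy of the 'UTF-8' entry from VALID_CHARACTER (constant name is hypothetical).
UTF8_CHAR = /\A(?:
  [\x00-\x7f]                                     |
  [\xc2-\xdf] [\x80-\xbf]                         |
  \xe0        [\xa0-\xbf] [\x80-\xbf]             |
  [\xe1-\xef] [\x80-\xbf] [\x80-\xbf]             |
  \xf0        [\x90-\xbf] [\x80-\xbf] [\x80-\xbf] |
  [\xf1-\xf3] [\x80-\xbf] [\x80-\xbf] [\x80-\xbf] |
  \xf4        [\x80-\x8f] [\x80-\xbf] [\x80-\xbf])\z/xn

# Hypothetical helper: true when +bytes+ is exactly one well-formed UTF-8 character.
def valid_utf8_char?(bytes)
  # String#b returns a binary (ASCII-8BIT) copy, matching the regexp's encoding.
  !(UTF8_CHAR =~ bytes.b).nil?
end

two_byte  = valid_utf8_char?("\u00e9")   # U+00E9 is encoded as C3 A9
overlong  = valid_utf8_char?("\xC0\xAF") # C0 is never a valid lead byte
two_chars = valid_utf8_char?("ab")       # the pattern is anchored to one character

puts two_byte, overlong, two_chars
```

Because the pattern is anchored with \A and \z, it describes a single character rather than a whole string, so a caller would apply it character by character.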
UCD =

The Unicode database

UnicodeDatabase.new

Class Method Summary

Class Method Details

.clean(string) ⇒ Object

Removes all invalid characters from the string.

Note: this method is a no-op in Ruby 1.9



# File 'lib/active_support/multibyte/utils.rb', line 46

def self.clean(string)
  string
end

.proxy_class ⇒ Object

Returns the current proxy class



# File 'lib/active_support/multibyte.rb', line 31

def self.proxy_class
  @proxy_class ||= ActiveSupport::Multibyte::Chars
end

.proxy_class=(klass) ⇒ Object

Sets the proxy class returned when calling mb_chars. You can use this accessor to configure your own proxy class so that you can support other encodings. See the ActiveSupport::Multibyte::Chars implementation for an example of how to do this.

Example:

ActiveSupport::Multibyte.proxy_class = CharsForUTF32


# File 'lib/active_support/multibyte.rb', line 26

def self.proxy_class=(klass)
  @proxy_class = klass
end
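
The reader/writer pair above can be sketched in isolation as follows. This is a stand-alone approximation that runs without ActiveSupport: MultibyteSketch and its Chars class are hypothetical stand-ins, and CharsForUTF32 is only the placeholder name from the example above, not a class that ships with the library.

```ruby
# Hypothetical stand-in for ActiveSupport::Multibyte, reduced to the accessor pair.
module MultibyteSketch
  # Minimal default proxy: it only remembers the string it wraps.
  class Chars
    attr_reader :wrapped_string

    def initialize(string)
      @wrapped_string = string
    end
  end

  # Writer: replaces the proxy class used from now on.
  def self.proxy_class=(klass)
    @proxy_class = klass
  end

  # Reader: falls back to the default Chars class until a writer call is made.
  def self.proxy_class
    @proxy_class ||= Chars
  end
end

# Placeholder custom proxy; a real one would override encoding-specific methods.
class CharsForUTF32 < MultibyteSketch::Chars; end

default_proxy = MultibyteSketch.proxy_class
MultibyteSketch.proxy_class = CharsForUTF32
custom_proxy = MultibyteSketch.proxy_class

puts default_proxy # MultibyteSketch::Chars
puts custom_proxy  # CharsForUTF32
```

The `||=` in the reader means the default is assigned lazily on first read, so a custom class set beforehand is never overwritten.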

.valid_character ⇒ Object

Returns a regular expression that matches valid characters in the current encoding



# File 'lib/active_support/multibyte/utils.rb', line 7

def self.valid_character
  VALID_CHARACTER[Encoding.default_external.to_s]
end

.verify(string) ⇒ Object

Verifies the encoding of a string



# File 'lib/active_support/multibyte/utils.rb', line 23

def self.verify(string)
  string.valid_encoding?
end
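
Since verify simply delegates to String#valid_encoding?, its behaviour can be tried without loading ActiveSupport. In the sketch below the top-level verify method is a local copy of the body shown above, and the sample strings are illustrative.

```ruby
# Local copy of the method body shown above, so the sketch runs without ActiveSupport.
def verify(string)
  string.valid_encoding?
end

valid   = "caf\u00e9"                                   # well-formed UTF-8
invalid = "caf\xE9".dup.force_encoding(Encoding::UTF_8) # lone Latin-1 byte, malformed as UTF-8

puts verify(valid)   # true
puts verify(invalid) # false
```

The check is relative to the encoding the string is tagged with: the same bytes that fail as UTF-8 would pass if the string were tagged as ISO-8859-1.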

.verify!(string) ⇒ Object

Verifies the encoding of the string and raises an exception when it’s not valid

Raises:

(EncodingError)


# File 'lib/active_support/multibyte/utils.rb', line 38

def self.verify!(string)
  raise EncodingError.new("Found characters with invalid encoding") unless verify(string)
end
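
One way a caller might use the raising variant is to rescue the error and fall back to a repaired copy of the input. The sketch below runs without ActiveSupport: SketchEncodingError and safe_label are hypothetical names, verify! is a local approximation of the method above, and String#scrub assumes Ruby 2.1 or later.

```ruby
# Hypothetical stand-in for ActiveSupport::Multibyte::EncodingError.
class SketchEncodingError < StandardError; end

# Local approximation of verify!: raises when the string is malformed, returns nil otherwise.
def verify!(string)
  raise SketchEncodingError, "Found characters with invalid encoding" unless string.valid_encoding?
end

# Hypothetical caller: returns the input unchanged when valid, a scrubbed copy otherwise.
def safe_label(bytes)
  verify!(bytes)
  bytes
rescue SketchEncodingError
  bytes.scrub("?") # replace each malformed byte sequence with "?"
end

ok    = safe_label("caf\u00e9")
fixed = safe_label("caf\xE9")

puts ok    # café
puts fixed # caf?
```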