Class: UTF8Utils::Char
- Inherits: Array
- Inheritance chain: Object, Array, UTF8Utils::Char
- Defined in: lib/utf8_utils/char.rb
Instance Method Summary
- #expected_length ⇒ Object
  Given the first byte, how many bytes long should this character be?
- #invalid? ⇒ Boolean
  Is the character invalid?
- #tidy ⇒ Object
  Attempt to rescue a valid UTF-8 character from a malformed character.
- #to_codepoint ⇒ Object
- #to_s ⇒ Object
  Get a multibyte character from the bytes.
- #valid? ⇒ Boolean
Instance Method Details
#expected_length ⇒ Object
Given the first byte, how many bytes long should this character be?
    # File 'lib/utf8_utils/char.rb', line 6
    def expected_length
      (first.continuations rescue 0) + 1
    end
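The relationship between a leading byte and the character's length can be sketched without the library. This is a minimal illustration that assumes plain integer bytes; `expected_length_for` is a hypothetical helper, not part of UTF8Utils, and its fallback to 1 only mimics the `rescue 0` plus 1 behaviour shown above.

```ruby
# Hypothetical stand-in for Char#expected_length, working on a plain Integer.
# The high bits of a UTF-8 leading byte announce how many bytes follow.
def expected_length_for(first_byte)
  case first_byte
  when 0xC0..0xDF then 2 # 110xxxxx: one continuation byte follows
  when 0xE0..0xEF then 3 # 1110xxxx: two continuation bytes follow
  when 0xF0..0xF7 then 4 # 11110xxx: three continuation bytes follow
  else 1                 # ASCII, stray continuation bytes, nil: assume length 1
  end
end

puts expected_length_for(0x41) # 1
puts expected_length_for(0xC3) # 2
puts expected_length_for(0xE2) # 3
```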
#invalid? ⇒ Boolean
Is the character invalid?
    # File 'lib/utf8_utils/char.rb', line 11
    def invalid?
      !valid?
    end
#tidy ⇒ Object
Attempt to rescue a valid UTF-8 character from a malformed character. It first tries to convert the byte from CP1251. If that isn’t possible, it prepends a valid leading byte, treating the original byte as the last byte of a two-byte character (bytes of 192 and above are also shifted down by 64). Note that much of the logic here is taken from ActiveSupport; the difference is that this works on Ruby 1.8.6 through 1.9.1.
    # File 'lib/utf8_utils/char.rb', line 20
    def tidy
      return self if valid?
      byte = first.to_i
      if UTF8Utils::CP1251.key? byte
        # Known byte in the lookup table: use its mapped bytes.
        self.class.new [UTF8Utils::CP1251[byte]]
      elsif byte < 192
        # Otherwise prepend a leading byte to form a two-byte character.
        self.class.new [194, byte]
      else
        self.class.new [195, byte - 64]
      end
    end
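The fallback branches of `tidy` can be tried in isolation. The sketch below assumes plain integer bytes and a modern Ruby; `rescue_byte` and `bytes_to_string` are hypothetical names, and the optional `table` argument merely stands in for the library's lookup hash, whose contents are not shown on this page.

```ruby
# Hypothetical re-statement of the tidy fallback for a single stray byte.
# `table` stands in for the library's lookup hash (assumed: byte => byte array).
def rescue_byte(byte, table = {})
  return table[byte] if table.key?(byte)
  # Prepend a leading byte so the result is a well-formed two-byte sequence.
  byte < 192 ? [194, byte] : [195, byte - 64]
end

# Same pack/unpack chain that #to_s uses to build a UTF-8 string.
def bytes_to_string(bytes)
  bytes.pack("C*").unpack("U*").pack("U*")
end

puts bytes_to_string(rescue_byte(0xA9)) # ©
puts bytes_to_string(rescue_byte(0xE9)) # é
```

With an empty table, the two branches amount to reading the stray byte as a Latin-1 code point and re-encoding it as two UTF-8 bytes.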
#to_codepoint ⇒ Object
    # File 'lib/utf8_utils/char.rb', line 37
    def to_codepoint
      flatten.map {|b| b.to_i }.pack("C*").unpack("U*")[0]
    end
#to_s ⇒ Object
Get a multibyte character from the bytes.
    # File 'lib/utf8_utils/char.rb', line 33
    def to_s
      flatten.map {|b| b.to_i }.pack("C*").unpack("U*").pack("U*")
    end
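The pack/unpack chain shared by `#to_s` and `#to_codepoint` can be followed step by step with core Ruby alone. This sketch assumes the bytes are already plain integers, so the `flatten` and `to_i` steps are left out; the variable names are illustrative only.

```ruby
bytes = [226, 130, 172]               # UTF-8 encoding of the euro sign

raw = bytes.pack("C*")                # bytes as a binary string
codepoint = raw.unpack("U*")[0]       # decode as UTF-8, take the first code point
string = raw.unpack("U*").pack("U*")  # re-encode the code points as a UTF-8 string

puts codepoint # 8364
puts string    # €
```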
#valid? ⇒ Boolean
    # File 'lib/utf8_utils/char.rb', line 41
    def valid?
      # The byte count must match what the leading byte announces.
      return false if length != expected_length
      each_with_index do |byte, index|
        return false if byte.invalid?
        # Only the first byte may be a non-continuation byte.
        return false if index == 0 && byte.continuation?
        return false if index > 0 && !byte.continuation?
      end
      true
    end
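The three checks in `valid?` can be reproduced on plain integer arrays. This is only a sketch: every helper below is hypothetical, and since the library's byte class is not shown on this page, the definition of a never-valid byte (0xC0, 0xC1, and 0xF5 and above) is an assumption based on the UTF-8 encoding rules, not on the library.

```ruby
# Hypothetical helpers operating on Integers.
def continuation_byte?(b)
  (0x80..0xBF).cover?(b) # 10xxxxxx
end

# Assumption: bytes that can never occur in well-formed UTF-8.
def never_valid_byte?(b)
  b == 0xC0 || b == 0xC1 || b >= 0xF5
end

def expected_length_for(first_byte)
  case first_byte
  when 0xC0..0xDF then 2
  when 0xE0..0xEF then 3
  when 0xF0..0xF7 then 4
  else 1
  end
end

# Mirrors the structure of valid?: length check first, then per-byte checks.
def valid_sequence?(bytes)
  return false if bytes.length != expected_length_for(bytes.first)
  bytes.each_with_index do |b, i|
    return false if never_valid_byte?(b)
    return false if i == 0 && continuation_byte?(b)
    return false if i > 0 && !continuation_byte?(b)
  end
  true
end

puts valid_sequence?([0xC3, 0xA9]) # true
puts valid_sequence?([0xA9])       # false
puts valid_sequence?([0xC3, 0x41]) # false
```

Like the original method, this checks structure only; it does not reject overlong three- or four-byte forms or surrogate code points.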