Module: RUnicode::UTF8
- Defined in: lib/runicode/utf8.rb
Overview
lib/runicode/utf8.rb
Copyright © Tom Adams 2006
This program is free software.
You can distribute/modify this program under
the terms of the Ruby License.
Defined Under Namespace
Classes: ThisShouldNotHappen
Constant Summary
- INVALID_BYTES =
((0xC0..0xC1).to_a + (0xF5..0xFE).to_a + [0b11000000, 0b11100000, 0b11110000])
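How the library consults this constant is not shown on this page. As a minimal sketch, assuming it is used as a simple membership list, a byte could be checked like this. The helper name `invalid_byte?` is hypothetical and not part of RUnicode::UTF8; the constant is copied from the summary above so the snippet runs on its own.

```ruby
# Copied verbatim from the constant summary so the snippet is self-contained.
INVALID_BYTES =
  ((0xC0..0xC1).to_a + (0xF5..0xFE).to_a + [0b11000000, 0b11100000, 0b11110000])

# Hypothetical helper (not part of the library): true if the byte is listed.
def invalid_byte?(byte)
  INVALID_BYTES.include?(byte)
end

puts invalid_byte?(0xC0) # prints true  (listed in the 0xC0..0xC1 range)
puts invalid_byte?(0xF5) # prints true  (listed in the 0xF5..0xFE range)
puts invalid_byte?(0x41) # prints false (ASCII "A" is not listed)
```

Note that 0b11000000 is the same value as 0xC0, so that entry appears twice in the array; this does not affect membership checks.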
Class Method Summary
- .bytes_to_char(bytes) ⇒ Object
  Converts the bytes of a single UTF-8 encoded character into an integer codepoint.
- .rest(bytes) ⇒ Object
  Combines the low six bits (the x bits of 10xxxxxx) of each byte into a single integer.
Class Method Details
.bytes_to_char(bytes) ⇒ Object
Converts the bytes of a single UTF-8 encoded character (one to four bytes) into an integer codepoint. Raises ThisShouldNotHappen for any other byte count.
    # File 'lib/runicode/utf8.rb', line 21
    def self.bytes_to_char(bytes)
      case bytes.size
      when 1 # 0xxxxxxx
        return bytes.first & 0b01111111
      when 2 # 110xxxxx 10xxxxxx
        return ((bytes.first & 0b00011111) << 6) + rest(bytes[1..-1])
      when 3 # 1110xxxx 10xxxxxx 10xxxxxx
        return ((bytes.first & 0b00001111) << 12) + rest(bytes[1..-1])
      when 4 # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return ((bytes.first & 0b00000111) << 18) + rest(bytes[1..-1])
      else
        raise ThisShouldNotHappen
      end
    end
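To make the bit arithmetic concrete, here is a small runnable sketch that mirrors the method in a stand-alone module and decodes the euro sign. The module name `UTF8Sketch` is a hypothetical stand-in so that the snippet does not need the library loaded, and it raises `ArgumentError` where the library raises `ThisShouldNotHappen`.

```ruby
# Hypothetical stand-in for RUnicode::UTF8, so the example runs by itself.
module UTF8Sketch
  # Fold the low six bits of each trailing byte into one integer.
  def self.rest(bytes)
    bytes.inject(0) { |val, b| (val << 6) + (b & 0b00111111) }
  end

  # Mask the payload bits of the lead byte, shift them past the trailing
  # bytes' payload (6 bits each), then add the trailing payload.
  def self.bytes_to_char(bytes)
    case bytes.size
    when 1 then bytes.first & 0b01111111
    when 2 then ((bytes.first & 0b00011111) << 6) + rest(bytes[1..-1])
    when 3 then ((bytes.first & 0b00001111) << 12) + rest(bytes[1..-1])
    when 4 then ((bytes.first & 0b00000111) << 18) + rest(bytes[1..-1])
    else raise ArgumentError, "expected 1 to 4 bytes, got #{bytes.size}"
    end
  end
end

euro = "\u20AC"                       # encoded as the bytes 0xE2 0x82 0xAC
codepoint = UTF8Sketch.bytes_to_char(euro.bytes)
puts format("U+%04X", codepoint)      # prints U+20AC
```

For the three-byte case, the lead byte 0xE2 contributes 0x2 shifted left by 12 (0x2000) and the two trailing bytes contribute 0xAC, giving 0x20AC. The result matches Ruby's built-in `String#ord` for the same character.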
.rest(bytes) ⇒ Object
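Combines the low six bits (the x bits of 10xxxxxx) of each byte into a single integer, most significant byte first. The top two bits of each byte are masked off without being checked.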
    # File 'lib/runicode/utf8.rb', line 36
    def self.rest(bytes)
      val = 0b0
      bytes.each do |b|
        val = (val << 6)
        val += (b & 0b00111111)
      end
      val
    end
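The shift-and-add loop can be followed step by step with a traced copy. This is an illustrative sketch only; `rest_with_trace` is a hypothetical name, and the input is the pair of trailing bytes from the UTF-8 encoding of U+20AC.

```ruby
# Hypothetical traced copy of the accumulation loop (not part of the library).
def rest_with_trace(bytes)
  val = 0
  bytes.each do |b|
    payload = b & 0b00111111          # keep the six x bits of 10xxxxxx
    val = (val << 6) + payload        # make room for them, then append
    puts format("byte 0x%02X  payload %06b  running value %d", b, payload, val)
  end
  val
end

total = rest_with_trace([0x82, 0xAC])
puts total # prints 172
```

The first byte leaves a running value of 2; shifting that left by six bits gives 128, and adding the second payload (44) gives 172, which is 0xAC.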