Module: RUnicode::UTF8

Defined in:
lib/runicode/utf8.rb

Overview

lib/runicode/utf8.rb

Copyright © Tom Adams 2006

This program is free software.
You can redistribute and/or modify it under
the terms of the Ruby License.

Defined Under Namespace

Classes: ThisShouldNotHappen

Constant Summary

INVALID_BYTES =
((0xC0..0xC1).to_a + (0xF5..0xFE).to_a +
[0b11000000, 0b11100000, 0b11110000])
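The constant mixes ranges with binary literals, so its exact contents are easiest to see by evaluating the same expression on its own. This minimal sketch copies the expression into a top-level constant (not the library's namespaced `RUnicode::UTF8::INVALID_BYTES`) and prints it in hex. The binary literals 0b11000000, 0b11100000 and 0b11110000 are 0xC0, 0xE0 and 0xF0, so 0xC0 ends up listed twice.

```ruby
# Standalone copy of the expression from lib/runicode/utf8.rb,
# evaluated outside the module so its contents can be inspected.
INVALID_BYTES =
  ((0xC0..0xC1).to_a + (0xF5..0xFE).to_a +
   [0b11000000, 0b11100000, 0b11110000])

# Print every entry in hex, in the order the expression builds them.
puts INVALID_BYTES.map { |b| format("0x%02X", b) }.join(" ")

# 0xC0 comes from both the range and 0b11000000, so one entry is a duplicate.
puts INVALID_BYTES.size       # 15
puts INVALID_BYTES.uniq.size  # 14
```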

Class Method Summary

Class Method Details

.bytes_to_char(bytes) ⇒ Object

Converts the bytes that make up one UTF-8-encoded character (an array of one to four byte values) into its integer code point.

# File 'lib/runicode/utf8.rb', line 21

def self.bytes_to_char(bytes)
  case bytes.size
  when 1 # 0xxxxxxx
    return bytes.first & 0b01111111
  when 2 # 110xxxxx 10xxxxxx
    return ((bytes.first & 0b00011111) << 6) + rest(bytes[1..-1])
  when 3 # 1110xxxx 10xxxxxx 10xxxxxx
    return ((bytes.first & 0b00001111) << 12) + rest(bytes[1..-1])
  when 4 # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return ((bytes.first & 0b00000111) << 18) + rest(bytes[1..-1])
  else raise ThisShouldNotHappen
  end
end
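To try the decoding without loading runicode itself, the logic above can be copied into a small standalone module. This is a sketch under stated assumptions: the module name `UTF8Sketch` is hypothetical, `ThisShouldNotHappen` is assumed to be an ordinary `StandardError` subclass, and `rest` is written with `inject` but performs the same six-bits-per-byte accumulation as the listing. Each result is compared with Ruby's built-in `String#ord`.

```ruby
# Hypothetical standalone module mirroring RUnicode::UTF8.bytes_to_char.
module UTF8Sketch
  # Assumption: the library's exception is a plain StandardError subclass.
  class ThisShouldNotHappen < StandardError; end

  # Accumulate the low six bits of each 10xxxxxx continuation byte.
  def self.rest(bytes)
    bytes.inject(0) { |val, b| (val << 6) + (b & 0b0011_1111) }
  end

  # Decode the 1-4 bytes of a single UTF-8 character into a code point.
  def self.bytes_to_char(bytes)
    case bytes.size
    when 1 then bytes.first & 0b0111_1111                                # 0xxxxxxx
    when 2 then ((bytes.first & 0b0001_1111) << 6) + rest(bytes[1..-1])  # 110xxxxx
    when 3 then ((bytes.first & 0b0000_1111) << 12) + rest(bytes[1..-1]) # 1110xxxx
    when 4 then ((bytes.first & 0b0000_0111) << 18) + rest(bytes[1..-1]) # 11110xxx
    else raise ThisShouldNotHappen
    end
  end
end

# Let Ruby encode a few characters, then decode their bytes with the sketch.
["A", "\u00E9", "\u20AC", "\u{1F600}"].each do |ch|
  cp = UTF8Sketch.bytes_to_char(ch.bytes)
  hex = ch.bytes.map { |b| format("%02X", b) }.join(" ")
  puts format("%-11s -> U+%04X (String#ord: U+%04X)", hex, cp, ch.ord)
end
```

The four samples cover every branch of the `case`: one, two, three and four bytes. An array of any other length reaches the `else` branch and raises.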

.rest(bytes) ⇒ Object

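Combines the low six bits (the xxxxxx of 10xxxxxx) of each continuation byte into a single integer, treating the first byte as the most significant.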

# File 'lib/runicode/utf8.rb', line 36

def self.rest(bytes)
  val = 0b0
  bytes.each {|b|
    val = (val << 6)
    val += (b & 0b00111111)
  }
  val
end
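The accumulation is easiest to follow on a concrete character. This sketch uses a hypothetical `rest_traced` helper that performs the same arithmetic as `rest` but prints the accumulator after each byte, applied to the continuation bytes of U+20AC (EURO SIGN, encoded as E2 82 AC). It then combines the result with the lead byte exactly as the three-byte branch of `bytes_to_char` does.

```ruby
# Hypothetical tracing variant of rest: same arithmetic, plus a log line
# showing the accumulator after each continuation byte.
def rest_traced(bytes)
  val = 0
  bytes.each do |b|
    payload = b & 0b0011_1111   # strip the leading 10 marker bits
    val = (val << 6) + payload  # shift earlier bits up, append six new ones
    puts format("byte 0x%02X -> payload 0x%02X -> val 0x%X", b, payload, val)
  end
  val
end

# Continuation bytes of U+20AC; the lead byte 0xE2 matches 1110xxxx.
low  = rest_traced([0x82, 0xAC])
lead = 0xE2 & 0b0000_1111       # four payload bits from the lead byte
puts format("(0x%X << 12) + 0x%X = U+%04X", lead, low, (lead << 12) + low)
```

Running it shows payloads 0x02 and 0x2C accumulating to 0xAC, and the final line reports U+20AC.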