Module: Babosa::UTF8::Proxy

Included in:
DumbProxy, JavaProxy, UnicodeProxy
Defined in:
lib/babosa/utf8/proxy.rb

Overview

A UTF-8 proxy for Babosa can be any object that responds to the methods in this module. Babosa provides the following proxies: ActiveSupportProxy, DumbProxy, JavaProxy, and UnicodeProxy.
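As a rough illustration of the interface described above, the sketch below implements the three stub methods using only Ruby's standard String methods. The class name `StdlibProxySketch` is hypothetical and is not one of the proxies shipped with Babosa. A complete proxy would also need `tidy_bytes`, which this module defines. The sketch assumes Ruby 2.4 or later, where `String#downcase` and `String#upcase` apply full Unicode case mapping.

```ruby
# Hypothetical proxy for illustration only; not part of Babosa.
class StdlibProxySketch
  # Unicode-aware downcasing (String#downcase handles non-ASCII since Ruby 2.4).
  def downcase(string)
    string.downcase
  end

  # Unicode-aware upcasing.
  def upcase(string)
    string.upcase
  end

  # NFC normalization via the standard library.
  def normalize_utf8(string)
    string.unicode_normalize(:nfc)
  end
end

proxy = StdlibProxySketch.new
puts proxy.upcase("\u00E9")                 # LATIN SMALL LETTER E WITH ACUTE, upcased
puts proxy.normalize_utf8("e\u0301").length # combining sequence composed into one code point
```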

Constant Summary

CP1252 =
{
  128 => [226, 130, 172],
  129 => nil,
  130 => [226, 128, 154],
  131 => [198, 146],
  132 => [226, 128, 158],
  133 => [226, 128, 166],
  134 => [226, 128, 160],
  135 => [226, 128, 161],
  136 => [203, 134],
  137 => [226, 128, 176],
  138 => [197, 160],
  139 => [226, 128, 185],
  140 => [197, 146],
  141 => nil,
  142 => [197, 189],
  143 => nil,
  144 => nil,
  145 => [226, 128, 152],
  146 => [226, 128, 153],
  147 => [226, 128, 156],
  148 => [226, 128, 157],
  149 => [226, 128, 162],
  150 => [226, 128, 147],
  151 => [226, 128, 148],
  152 => [203, 156],
  153 => [226, 132, 162],
  154 => [197, 161],
  155 => [226, 128, 186],
  156 => [197, 147],
  157 => nil,
  158 => [197, 190],
  159 => [197, 184]
}
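Each key in the table is a Windows CP-1252 byte in the range 128 to 159, and each value is the UTF-8 byte sequence for the corresponding character. The `nil` entries are the byte values that CP-1252 leaves undefined. A minimal sketch of how one entry can be read, cross-checked against Ruby's built-in Windows-1252 transcoder (the variable names are illustrative):

```ruby
# Value stored under key 128 in the table above.
euro_bytes = [226, 130, 172]

# Interpret the three bytes as a UTF-8 string.
euro = euro_bytes.pack('C*').force_encoding('UTF-8')

# Cross-check: transcode the single CP-1252 byte 128 with Ruby's own converter.
builtin = [128].pack('C').force_encoding('Windows-1252').encode('UTF-8')

puts euro == builtin # both should be the euro sign, U+20AC
```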

Instance Method Summary

Instance Method Details

#downcase(string) ⇒ Object

This is a stub for a method that should return a Unicode-aware downcased version of the given string.

Raises:

  • (NotImplementedError)


# File 'lib/babosa/utf8/proxy.rb', line 49

def downcase(string)
  raise NotImplementedError
end

#normalize_utf8(string) ⇒ Object

This is a stub for a method that should return the Unicode NFC normalization of the given string.

Raises:

  • (NotImplementedError)


# File 'lib/babosa/utf8/proxy.rb', line 61

def normalize_utf8(string)
  raise NotImplementedError
end

#tidy_bytes(string) ⇒ Object

Attempts to replace invalid UTF-8 bytes with valid ones. This method naively assumes that any invalid UTF-8 bytes are either Windows CP-1252 or ISO-8859-1. In practice this isn’t a bad assumption, but it may not always work.



# File 'lib/babosa/utf8/proxy.rb', line 70

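def tidy_bytes(string)
  # String#scrub yields each invalid byte sequence to the block.
  string.scrub do |bad|
    # Pack the replacement bytes, then round-trip through code points
    # so the result is a UTF-8 encoded string.
    tidy_byte(*bad.bytes).flatten.compact.pack('C*').unpack('U*').pack('U*')
  end
end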
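The `tidy_byte` helper is not shown on this page, so the following is only a sketch of the general technique under stated assumptions, not Babosa's implementation. `CP1252_SAMPLE`, `sketch_tidy_byte`, and `sketch_tidy_bytes` are hypothetical names. The sketch assumes that bytes below 160 are looked up in a trimmed copy of the CP1252 table (unknown bytes are dropped) and that bytes from 160 upward are treated as ISO-8859-1, where each byte value equals its Unicode code point.

```ruby
# Hypothetical, trimmed copy of the CP1252 table for illustration.
CP1252_SAMPLE = {
  128 => [226, 130, 172], # euro sign
  147 => [226, 128, 156], # left double quotation mark
  148 => [226, 128, 157]  # right double quotation mark
}.freeze

# Hypothetical helper: map one invalid byte to an array of UTF-8 bytes.
def sketch_tidy_byte(byte)
  if byte < 160
    CP1252_SAMPLE[byte] || [] # drop bytes with no known mapping
  else
    [byte].pack('U').bytes    # ISO-8859-1 byte value == Unicode code point
  end
end

# Replace every invalid byte sequence with its assumed UTF-8 equivalent.
def sketch_tidy_bytes(string)
  string.scrub do |bad|
    bad.bytes.flat_map { |b| sketch_tidy_byte(b) }.pack('C*').force_encoding('UTF-8')
  end
end

puts sketch_tidy_bytes("caf\xE9").valid_encoding? # the stray 0xE9 byte has been replaced
```

The block passed to `scrub` must return a validly encoded, compatible string, which is why the sketch tags the packed bytes as UTF-8 before returning them.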

#upcase(string) ⇒ Object

This is a stub for a method that should return a Unicode-aware upcased version of the given string.

Raises:

  • (NotImplementedError)


# File 'lib/babosa/utf8/proxy.rb', line 55

def upcase(string)
  raise NotImplementedError
end