Method: PDF::Reader::Encoding#to_utf8

Defined in:
lib/pdf/reader/encoding.rb

#to_utf8(str) ⇒ Object

Converts the specified string to UTF-8:

  • unpack raw bytes into codepoints

  • replace any that have entries in the differences table with a glyph name

  • convert codepoints from source encoding to Unicode codepoints

  • convert any glyph names to Unicode codepoints

  • replace any characters that could not be converted to Unicode with a valid placeholder

  • pack the final array of Unicode codepoints into a utf-8 string

  • mark the string as UTF-8 if we’re running on an M17N-aware VM
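
The steps listed above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions, not the library's implementation: the method name `sketch_to_utf8`, both lookup tables, the single-byte `"C*"` unpack format, and the choice of U+25AF as the placeholder are all hypothetical.

```ruby
# Hypothetical tables for illustration only; not pdf-reader's real data.
SKETCH_SOURCE_TO_UNICODE = { 0x41 => 0x0041, 0x80 => 0x20AC }.freeze # byte code => Unicode codepoint
SKETCH_GLYPH_TO_UNICODE  = { eacute: 0x00E9 }.freeze                 # glyph name => Unicode codepoint
SKETCH_PLACEHOLDER       = 0x25AF                                    # assumed "little box" character

# differences maps a byte code to a glyph name (a Symbol), as in the second step.
def sketch_to_utf8(str, differences = {})
  # 1. unpack raw bytes into codepoints (a single-byte encoding is assumed)
  codes = str.unpack("C*")
  # 2. replace codes that have an entry in the differences table with a glyph name
  codes = codes.map { |c| differences.fetch(c, c) }
  # 3. convert the remaining integer codes from the source encoding to Unicode
  codes = codes.map { |c| c.is_a?(Integer) ? SKETCH_SOURCE_TO_UNICODE[c] : c }
  # 4. convert glyph names to Unicode codepoints
  codes = codes.map { |c| c.is_a?(Symbol) ? SKETCH_GLYPH_TO_UNICODE[c] : c }
  # 5. anything that failed to convert is nil by now; substitute the placeholder
  codes = codes.map { |c| c.nil? ? SKETCH_PLACEHOLDER : c }
  # 6. pack the codepoints into a string and 7. mark it as UTF-8
  codes.pack("U*").force_encoding("UTF-8")
end
```

For example, `sketch_to_utf8("A\x80\x01".b, { 0x01 => :eacute })` walks the three bytes through the table lookups, the differences entry, and the glyph-name lookup in turn.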



# File 'lib/pdf/reader/encoding.rb', line 103

def to_utf8(str)
  if utf8_conversion_impossible?
    little_boxes(str.unpack(unpack).size)
  else
    convert_to_utf8(str)
  end
end
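
The fallback branch in the listing above returns one placeholder per character that the unpack format yields from the input. A minimal sketch of that idea follows; the names `sketch_little_boxes` and `sketch_fallback`, the default `"C*"` format, and the use of U+25AF as the box character are assumptions made for illustration and may not match what `little_boxes` and `unpack` do in the library.

```ruby
SKETCH_BOX = 0x25AF # assumed placeholder codepoint (WHITE VERTICAL RECTANGLE)

# Build a UTF-8 string made of `count` placeholder characters.
def sketch_little_boxes(count)
  ([SKETCH_BOX] * count).pack("U*")
end

# Count the characters the unpack format yields, then emit that many boxes.
def sketch_fallback(str, unpack_format = "C*")
  sketch_little_boxes(str.unpack(unpack_format).size)
end
```

With a two-byte format such as `"n*"`, a four-byte input yields two boxes rather than four, which is why the count is taken after unpacking rather than from the raw byte length.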