Class: HexaPDF::Font::CMap
- Inherits: Object (Object > HexaPDF::Font::CMap)
- Defined in: lib/hexapdf/font/cmap.rb, lib/hexapdf/font/cmap/parser.rb, lib/hexapdf/font/cmap/writer.rb
Overview
Represents a CMap, a mapping from character codes to CIDs (character IDs) or to their Unicode value.
See: PDF1.7 s9.7.5, s9.10.3; Adobe Technical Notes #5014 and #5411
Defined Under Namespace
Classes: Parser, Writer
Constant Summary
- CMAP_DIR = File.join(HexaPDF.data_dir, 'cmap')
Instance Attribute Summary
- #name ⇒ Object
  The name of the CMap.
- #ordering ⇒ Object
  The ordering part of the CMap version.
- #registry ⇒ Object
  The registry part of the CMap version.
- #supplement ⇒ Object
  The supplement part of the CMap version.
- #wmode ⇒ Object
  The writing mode of the CMap: 0 for horizontal, 1 for vertical writing.
Class Method Summary
- .create_to_unicode_cmap(mapping) ⇒ Object
  Returns a string containing a ToUnicode CMap that represents the given code to Unicode codepoint mapping.
- .for_name(name) ⇒ Object
  Creates a new CMap object by parsing a predefined CMap with the given name.
- .parse(string) ⇒ Object
  Creates a new CMap object from the given string which needs to contain a valid CMap file.
- .predefined?(name) ⇒ Boolean
  Returns true if the given name specifies a predefined CMap.
Instance Method Summary
- #add_cid_mapping(code, cid) ⇒ Object
  Adds an individual mapping from character code to CID.
- #add_cid_range(start_code, end_code, start_cid) ⇒ Object
  Adds a CID range, mapping character codes from start_code to end_code to CIDs starting with start_cid.
- #add_codespace_range(first, *rest) ⇒ Object
  Add a codespace range using an array of ranges for the individual bytes.
- #add_unicode_mapping(code, string) ⇒ Object
  Adds a mapping from character code to Unicode string in UTF-8 encoding.
- #initialize ⇒ CMap (constructor)
  Creates a new CMap object.
- #read_codes(string) ⇒ Object
  Parses the string and returns all character codes.
- #to_cid(code) ⇒ Object
  Returns the CID for the given character code, or 0 if no mapping was found.
- #to_unicode(code) ⇒ Object
  Returns the Unicode string in UTF-8 encoding for the given character code, or nil if no mapping was found.
- #use_cmap(cmap) ⇒ Object
  Add all mappings from the given CMap to this CMap.
Constructor Details
#initialize ⇒ CMap
Creates a new CMap object.
    # File 'lib/hexapdf/font/cmap.rb', line 108
    def initialize
      @codespace_ranges = []
      @cid_mapping = {}
      @cid_range_mappings = []
      @unicode_mapping = {}
    end
Instance Attribute Details
#name ⇒ Object
The name of the CMap.
    # File 'lib/hexapdf/font/cmap.rb', line 96
    def name
      @name
    end
#ordering ⇒ Object
The ordering part of the CMap version.
    # File 'lib/hexapdf/font/cmap.rb', line 90
    def ordering
      @ordering
    end
#registry ⇒ Object
The registry part of the CMap version.
    # File 'lib/hexapdf/font/cmap.rb', line 87
    def registry
      @registry
    end
#supplement ⇒ Object
The supplement part of the CMap version.
    # File 'lib/hexapdf/font/cmap.rb', line 93
    def supplement
      @supplement
    end
#wmode ⇒ Object
The writing mode of the CMap: 0 for horizontal, 1 for vertical writing.
    # File 'lib/hexapdf/font/cmap.rb', line 99
    def wmode
      @wmode
    end
Class Method Details
.create_to_unicode_cmap(mapping) ⇒ Object
Returns a string containing a ToUnicode CMap that represents the given code to Unicode codepoint mapping.
See: Writer#create_to_unicode_cmap
    # File 'lib/hexapdf/font/cmap.rb', line 81
    def self.create_to_unicode_cmap(mapping)
      Writer.new.create_to_unicode_cmap(mapping)
    end
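For orientation, much of a ToUnicode CMap consists of bfchar entries that pair a character code with its Unicode value, both written as hexadecimal strings. The sketch below is not HexaPDF's Writer. `bfchar_section` is a hypothetical helper, and it assumes that the mapping is an array of [code, codepoint] pairs, that codes fit into two bytes, and that all codepoints lie in the Basic Multilingual Plane. It also assumes the usual limit of 100 entries per block.

```ruby
# Hypothetical helper, not part of HexaPDF: formats [code, codepoint] pairs
# as bfchar blocks of at most 100 entries each.
def bfchar_section(mapping)
  mapping.each_slice(100).map do |slice|
    # Each entry is "<code> <unicode value>", both as 4-digit hex strings.
    lines = slice.map { |code, codepoint| format("<%04X> <%04X>", code, codepoint) }
    "#{slice.length} beginbfchar\n#{lines.join("\n")}\nendbfchar"
  end.join("\n")
end

puts bfchar_section([[1, 0x48], [2, 0x69]])
# 2 beginbfchar
# <0001> <0048>
# <0002> <0069>
# endbfchar
```

A complete CMap file additionally needs the surrounding header, the codespace range definition and the trailer, which this sketch leaves out.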
.for_name(name) ⇒ Object
Creates a new CMap object by parsing a predefined CMap with the given name.
Raises an error if the given CMap is not found.
    # File 'lib/hexapdf/font/cmap.rb', line 61
    def self.for_name(name)
      return @cmap_cache[name] if @cmap_cache.key?(name)

      file = File.join(CMAP_DIR, name)
      if File.exist?(file)
        @cmap_cache[name] = parse(File.read(file, encoding: ::Encoding::UTF_8))
      else
        raise HexaPDF::Error, "No CMap named '#{name}' found"
      end
    end
.parse(string) ⇒ Object
Creates a new CMap object from the given string which needs to contain a valid CMap file.
    # File 'lib/hexapdf/font/cmap.rb', line 73
    def self.parse(string)
      Parser.new.parse(string)
    end
.predefined?(name) ⇒ Boolean
Returns true if the given name specifies a predefined CMap.
    # File 'lib/hexapdf/font/cmap.rb', line 54
    def self.predefined?(name)
      File.exist?(File.join(CMAP_DIR, name))
    end
Instance Method Details
#add_cid_mapping(code, cid) ⇒ Object
Adds an individual mapping from character code to CID.
    # File 'lib/hexapdf/font/cmap.rb', line 168
    def add_cid_mapping(code, cid)
      @cid_mapping[code] = cid
    end
#add_cid_range(start_code, end_code, start_cid) ⇒ Object
Adds a CID range, mapping character codes from start_code to end_code to CIDs starting with start_cid.
    # File 'lib/hexapdf/font/cmap.rb', line 174
    def add_cid_range(start_code, end_code, start_cid)
      @cid_range_mappings << [start_code..end_code, start_cid]
    end
#add_codespace_range(first, *rest) ⇒ Object
Add a codespace range using an array of ranges for the individual bytes.
This means that the first range is checked against the first byte, the second range against the second byte and so on.
    # File 'lib/hexapdf/font/cmap.rb', line 127
    def add_codespace_range(first, *rest)
      @codespace_ranges << [first, rest]
    end
#add_unicode_mapping(code, string) ⇒ Object
Adds a mapping from character code to Unicode string in UTF-8 encoding.
    # File 'lib/hexapdf/font/cmap.rb', line 193
    def add_unicode_mapping(code, string)
      @unicode_mapping[code] = string
    end
#read_codes(string) ⇒ Object
Parses the string and returns all character codes.
An error is raised if the string contains invalid bytes.
    # File 'lib/hexapdf/font/cmap.rb', line 134
    def read_codes(string)
      codes = []
      bytes = string.each_byte

      loop do
        byte = bytes.next
        code = 0

        found = @codespace_ranges.any? do |first_byte_range, rest_ranges|
          next unless first_byte_range.cover?(byte)

          code = (code << 8) + byte
          valid = rest_ranges.all? do |range|
            begin
              byte = bytes.next
            rescue StopIteration
              raise HexaPDF::Error, "Missing bytes while reading codes via CMap"
            end
            code = (code << 8) + byte
            range.cover?(byte)
          end

          codes << code if valid
        end

        unless found
          raise HexaPDF::Error, "Invalid byte while reading codes via CMap: #{byte}"
        end
      end

      codes
    end
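To illustrate how codespace ranges split a byte string into character codes of varying length, here is a simplified stand-alone sketch. It is not the method above: `decode_codes` is a hypothetical helper that uses index-based lookahead instead of an enumerator and raises ArgumentError rather than HexaPDF::Error. The range data is made up for the example and uses the same [first_byte_range, rest_ranges] layout that #add_codespace_range stores.

```ruby
# Hypothetical helper, not part of HexaPDF: decodes a byte string into
# character codes using codespace ranges of the form [first_range, [rest...]].
def decode_codes(codespace_ranges, string)
  codes = []
  bytes = string.bytes
  pos = 0
  while pos < bytes.length
    first = bytes[pos]
    # Find the first codespace range whose per-byte ranges all match.
    match = codespace_ranges.find do |first_range, rest|
      next false unless first_range.cover?(first)
      tail = bytes[pos + 1, rest.length] || []
      tail.length == rest.length &&
        rest.zip(tail).all? { |range, byte| range.cover?(byte) }
    end
    raise ArgumentError, "Invalid byte at offset #{pos}: #{first}" unless match

    # Combine the matched bytes into one big-endian integer code.
    length = 1 + match[1].length
    codes << bytes[pos, length].inject(0) { |code, byte| (code << 8) + byte }
    pos += length
  end
  codes
end

ranges = [
  [0x00..0x80, []],            # one-byte codes 0x00..0x80
  [0x81..0x9F, [0x40..0xFC]],  # two-byte codes, second byte 0x40..0xFC
]
codes = decode_codes(ranges, "A\x81\x40".b)
p codes.map { |code| format("0x%X", code) }  # ["0x41", "0x8140"]
```

The first byte selects the candidate range, and each further byte is checked against the range at the same position, which is why a code's length follows from its first byte.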
#to_cid(code) ⇒ Object
Returns the CID for the given character code, or 0 if no mapping was found.
    # File 'lib/hexapdf/font/cmap.rb', line 179
    def to_cid(code)
      cid = @cid_mapping.fetch(code, -1)
      if cid == -1
        @cid_range_mappings.reverse_each do |range, start_cid|
          if range.cover?(code)
            cid = start_cid + code - range.first
            break
          end
        end
      end
      (cid == -1 ? 0 : cid)
    end
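The lookup order in the listing above can be tried out in isolation. The following sketch mirrors that logic with plain data structures. `lookup_cid` and the sample tables are hypothetical and only stand in for the CMap's internal state.

```ruby
# Hypothetical stand-alone version of the lookup: individual mappings first,
# then ranges from last added to first added, and 0 as the fallback.
def lookup_cid(cid_mapping, cid_range_mappings, code)
  return cid_mapping[code] if cid_mapping.key?(code)

  cid_range_mappings.reverse_each do |range, start_cid|
    # Offset of the code within the range is added to the range's first CID.
    return start_cid + code - range.first if range.cover?(code)
  end
  0
end

singles = {0x20 => 1}
ranges  = [[0x20..0x7E, 100], [0x41..0x5A, 500]]

p lookup_cid(singles, ranges, 0x20)  # 1, the individual mapping wins
p lookup_cid(singles, ranges, 0x21)  # 101, offset 1 into the first range
p lookup_cid(singles, ranges, 0x41)  # 500, the range added last wins
p lookup_cid(singles, ranges, 0x80)  # 0, no mapping found
```

Because ranges are searched in reverse, a range added later overrides an earlier, overlapping one.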
#to_unicode(code) ⇒ Object
Returns the Unicode string in UTF-8 encoding for the given character code, or nil if no mapping was found.
    # File 'lib/hexapdf/font/cmap.rb', line 199
    def to_unicode(code)
      unicode_mapping[code]
    end
#use_cmap(cmap) ⇒ Object
Add all mappings from the given CMap to this CMap.
    # File 'lib/hexapdf/font/cmap.rb', line 116
    def use_cmap(cmap)
      @codespace_ranges.concat(cmap.codespace_ranges)
      @cid_mapping.merge!(cmap.cid_mapping)
      @cid_range_mappings.concat(cmap.cid_range_mappings)
      @unicode_mapping.merge!(cmap.unicode_mapping)
    end
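What the merge means for conflicting entries follows from the behavior of Hash#merge! and Array#concat. The sketch below shows this with plain hashes and arrays. All variable names and values are hypothetical stand-ins for the internal tables of two CMaps.

```ruby
# Hypothetical data standing in for the tables of the receiving CMap ("own")
# and of the CMap passed to use_cmap ("other").
own_cid_mapping   = {0x20 => 1, 0x21 => 2}
other_cid_mapping = {0x21 => 99, 0x22 => 3}

own_ranges   = [[0x30..0x39, 10]]
other_ranges = [[0x30..0x39, 50]]

# merge! overwrites entries of the receiver that share a key with the argument.
own_cid_mapping.merge!(other_cid_mapping)
# concat appends, so the other CMap's ranges end up after the existing ones.
own_ranges.concat(other_ranges)

p own_cid_mapping[0x21]  # 99
p own_ranges.last        # [48..57, 50]
```

For individual mappings, entries of the used CMap therefore replace existing entries for the same code. Since #to_cid searches the ranges in reverse order, the appended ranges are also consulted before the ones that were already present.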