Class: PDF::Reader::ObjectHash
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/pdf/reader/object_hash.rb
Overview
Provides low level access to the objects in a PDF file via a hash-like object.
A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
Each PDF object can be mapped to a ruby object, so by passing an object ID to the [] method, a ruby representation of that object will be retrieved.
The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is no []= method - the hash is read only.
Basic Usage
h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469
h[PDF::Reader::Reference.new(1,0)]
=> 3469
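The hash-map view described in the overview can be illustrated without the library. The sketch below is a toy model, not pdf-reader's implementation: TOY_FILE, TOY_XREF and toy_lookup are hypothetical names, and the "file" is a short string holding two fake object bodies. It only shows why a table of byte offsets makes a lookup a single seek.

```ruby
require "stringio"

# Toy "file": two object bodies stored back to back (no real PDF parsing here).
TOY_FILE = "1 0 obj\n3469\nendobj\n2 0 obj\n(hello)\nendobj\n"

# Toy cross-reference table: [object id, generation] => byte offset.
TOY_XREF = {
  [1, 0] => 0,
  [2, 0] => TOY_FILE.index("2 0 obj"),
}

# Look up an object by seeking straight to its recorded offset.
def toy_lookup(io, xref, id, gen = 0)
  offset = xref[[id, gen]]
  return nil if offset.nil?

  io.seek(offset)
  io.gets         # skip the "N G obj" header line
  io.gets.chomp   # the object's body, as raw text
end

io = StringIO.new(TOY_FILE)
puts toy_lookup(io, TOY_XREF, 1)
puts toy_lookup(io, TOY_XREF, 2)
```

The cost of toy_lookup does not depend on how many objects the file holds, which is the property the overview refers to.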
Instance Attribute Summary
-
#default ⇒ Object
Returns the value of attribute default.
-
#pdf_version ⇒ Object
readonly
Returns the value of attribute pdf_version.
-
#sec_handler ⇒ Object
readonly
Returns the value of attribute sec_handler.
-
#trailer ⇒ Object
readonly
Returns the value of attribute trailer.
Instance Method Summary
-
#[](key) ⇒ Object
Access an object from the PDF.
-
#deref!(key) ⇒ Object
Recursively dereferences the object referred to by key.
-
#deref_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
- #deref_array!(key) ⇒ Object
-
#deref_array_of_numbers(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#deref_hash(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
- #deref_hash!(key) ⇒ Object
-
#deref_integer(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#deref_name(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#deref_name_or_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#deref_number(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#deref_stream(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#deref_stream_or_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#deref_string(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#each(&block) ⇒ Object
(also: #each_pair)
iterate over each key, value.
-
#each_key(&block) ⇒ Object
iterate over each key.
-
#each_value(&block) ⇒ Object
iterate over each value.
-
#empty? ⇒ Boolean
return true if there are no objects in this file.
- #encrypted? ⇒ Boolean
-
#fetch(key, local_default = nil) ⇒ Object
Access an object from the PDF.
-
#has_key?(check_key) ⇒ Boolean
(also: #include?, #key?, #member?, #value?)
return true if the specified key exists in the file.
-
#has_value?(value) ⇒ Boolean
return true if the specified value exists in the file.
-
#initialize(input, opts = {}) ⇒ ObjectHash
constructor
Creates a new ObjectHash object.
-
#keys ⇒ Object
return an array of all keys in the file.
-
#obj_type(ref) ⇒ Object
returns the type of object a ref points to.
-
#object(key) ⇒ Object
(also: #deref)
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it.
-
#page_references ⇒ Object
returns an array of PDF::Reader::References.
- #sec_handler? ⇒ Boolean
-
#size ⇒ Object
(also: #length)
return the number of objects in the file.
-
#stream?(ref) ⇒ Boolean
returns true if the supplied reference points to an object with a stream.
-
#to_a ⇒ Object
return an array of arrays.
- #to_s ⇒ Object
-
#values ⇒ Object
return an array of all values in the file.
-
#values_at(*ids) ⇒ Object
return an array of all values from the specified keys.
Constructor Details
#initialize(input, opts = {}) ⇒ ObjectHash
Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.
Valid options:
:password - the user password to decrypt the source PDF
# File 'lib/pdf/reader/object_hash.rb', line 44
def initialize(input, opts = {})
  @io = extract_io_from(input)
  @xref = PDF::Reader::XRef.new(@io)
  @pdf_version = read_version
  @trailer = @xref.trailer
  @cache = opts[:cache] || PDF::Reader::ObjectCache.new
  @sec_handler = NullSecurityHandler.new
  @sec_handler = SecurityHandlerFactory.build(
    deref(trailer[:Encrypt]),
    deref(trailer[:ID]),
    opts[:password]
  )
end
Instance Attribute Details
#default ⇒ Object
Returns the value of attribute default.
# File 'lib/pdf/reader/object_hash.rb', line 33
def default
  @default
end
#pdf_version ⇒ Object (readonly)
Returns the value of attribute pdf_version.
# File 'lib/pdf/reader/object_hash.rb', line 34
def pdf_version
  @pdf_version
end
#sec_handler ⇒ Object (readonly)
Returns the value of attribute sec_handler.
# File 'lib/pdf/reader/object_hash.rb', line 35
def sec_handler
  @sec_handler
end
#trailer ⇒ Object (readonly)
Returns the value of attribute trailer.
# File 'lib/pdf/reader/object_hash.rb', line 34
def trailer
  @trailer
end
Instance Method Details
#[](key) ⇒ Object
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
# File 'lib/pdf/reader/object_hash.rb', line 79
def [](key)
  return default if key.to_i <= 0

  unless key.is_a?(PDF::Reader::Reference)
    key = PDF::Reader::Reference.new(key.to_i, 0)
  end

  @cache[key] ||= fetch_object(key) || fetch_object_stream(key)
rescue InvalidObjectError
  return default
end
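The key handling in the listing above can be tried in isolation. The following is a minimal sketch with stand-ins: ToyRef plays the role of PDF::Reader::Reference and ToyObjects is a hypothetical read-only store backed by a plain Hash. Neither is part of pdf-reader, and the real method also reads from the file and from object streams.

```ruby
# Stand-in for PDF::Reader::Reference; hypothetical, for illustration only.
ToyRef = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

# Minimal read-only store that mimics the key normalisation shown in #[].
class ToyObjects
  attr_accessor :default

  def initialize(objects)
    @objects = objects   # { ToyRef => value }
    @cache = {}
  end

  def [](key)
    # Ids of zero or below can never name an object.
    return default if key.to_i <= 0

    # A bare integer means "that id, generation 0".
    key = ToyRef.new(key.to_i, 0) unless key.is_a?(ToyRef)
    @cache[key] ||= @objects[key]
  end
end

store = ToyObjects.new({ ToyRef.new(1, 0) => 3469, ToyRef.new(2, 1) => :Catalog })
p store[1]                  # integer key, generation 0 assumed
p store[ToyRef.new(2, 1)]   # explicit generation
p store[2]                  # there is no generation-0 object with id 2
p store[0]                  # non-positive ids return the default
```

Because ToyRef is a Struct, two references with the same id and generation compare equal and hash identically, which is what lets them act as hash keys in this sketch.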
#deref!(key) ⇒ Object
Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, the key is returned unchanged.
# File 'lib/pdf/reader/object_hash.rb', line 314
def deref!(key)
  deref_internal!(key, {})
end
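The body of deref_internal! is not shown on this page, so the following is only a sketch of what recursive dereferencing generally involves. It assumes, without confirmation from the source, that Hash and Array values are walked and that the hash argument records references already visited. DRef, TOY_TABLE and toy_deref! are hypothetical names.

```ruby
# Stand-in for PDF::Reader::Reference; hypothetical, for illustration only.
DRef = Struct.new(:id, :gen)

# Hypothetical object table: values may themselves contain references.
TOY_TABLE = {
  DRef.new(1, 0) => { Type: :Catalog, Pages: DRef.new(2, 0) },
  DRef.new(2, 0) => { Type: :Pages, Count: DRef.new(3, 0), Kids: [DRef.new(3, 0)] },
  DRef.new(3, 0) => 1,
}

# Resolve a value and everything reachable from it.
# `seen` remembers resolved references; a reference met again while it is
# still being resolved comes back as nil, so a circular graph cannot loop.
def toy_deref!(key, table, seen = {})
  case key
  when DRef
    return seen[key] if seen.key?(key)

    seen[key] = nil
    seen[key] = toy_deref!(table[key], table, seen)
  when Hash
    key.each_with_object({}) { |(k, v), out| out[k] = toy_deref!(v, table, seen) }
  when Array
    key.map { |item| toy_deref!(item, table, seen) }
  else
    key   # plain values are returned unchanged
  end
end

p toy_deref!(DRef.new(1, 0), TOY_TABLE)
```

A non-recursive dereference would stop at the first level and leave the inner DRef values in place; the recursive form returns a structure with no references left in it.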
#deref_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return an Array or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting an Array and no other type will do.
# File 'lib/pdf/reader/object_hash.rb', line 105
def deref_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be an Array or nil" if !obj.is_a?(Array)
  }
end
#deref_array!(key) ⇒ Object
# File 'lib/pdf/reader/object_hash.rb', line 318
def deref_array!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be an Array or nil"
    end
  }
end
#deref_array_of_numbers(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return an Array of Numerics or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting an Array of numbers and no other type will do.
Some effort to cast array elements to a number is made for any non-numeric elements.
# File 'lib/pdf/reader/object_hash.rb', line 123
def deref_array_of_numbers(key)
  arr = deref(key)

  return arr if arr.nil?

  raise MalformedPDFError, "expected object to be an Array" unless arr.is_a?(Array)

  arr.map { |item|
    if item.is_a?(Numeric)
      item
    elsif item.respond_to?(:to_f)
      item.to_f
    elsif item.respond_to?(:to_i)
      item.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  }
end
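The element coercion in the listing above can be exercised on its own. This sketch copies the same branch order into a standalone method; toy_array_of_numbers and ToyMalformedError are hypothetical stand-ins (the latter for MalformedPDFError), and the dereferencing step is left out.

```ruby
# Hypothetical error class standing in for MalformedPDFError.
class ToyMalformedError < StandardError; end

# Keep Numerics as they are; otherwise try to_f, then to_i, then give up.
def toy_array_of_numbers(arr)
  return nil if arr.nil?
  raise ToyMalformedError, "expected object to be an Array" unless arr.is_a?(Array)

  arr.map do |item|
    if item.is_a?(Numeric)
      item
    elsif item.respond_to?(:to_f)
      item.to_f
    elsif item.respond_to?(:to_i)
      item.to_i
    else
      raise ToyMalformedError, "expected object to be a number"
    end
  end
end

# A rectangle-like array where one entry arrived as a string.
p toy_array_of_numbers([0, 0, 612, "792"])
```

Since to_f is tried first, a numeric string comes back as a Float rather than an Integer. Values that respond to neither conversion, such as a Symbol, raise the error.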
#deref_hash(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a Hash or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting a Hash and no other type will do.
# File 'lib/pdf/reader/object_hash.rb', line 149
def deref_hash(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be a Hash or nil" if !obj.is_a?(Hash)
  }
end
#deref_hash!(key) ⇒ Object
# File 'lib/pdf/reader/object_hash.rb', line 326
def deref_hash!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Hash)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be a Hash or nil"
    end
  }
end
#deref_integer(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return an Integer or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting an Integer and no other type will do.
Some effort to cast to an int is made when the reference points to a non-integer.
# File 'lib/pdf/reader/object_hash.rb', line 191
def deref_integer(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Integer)
    if obj.respond_to?(:to_i)
      obj = obj.to_i
    else
      raise MalformedPDFError, "expected object to be an Integer"
    end
  end

  obj
end
#deref_name(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF name (Symbol) or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting a Name and no other type will do.
Some effort to cast to a symbol is made when the reference points to a non-symbol.
# File 'lib/pdf/reader/object_hash.rb', line 167
def deref_name(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Symbol)
    if obj.respond_to?(:to_sym)
      obj = obj.to_sym
    else
      raise MalformedPDFError, "expected object to be a Name"
    end
  end

  obj
end
#deref_name_or_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF Name (Symbol), Array or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting a Name or Array and no other type will do.
# File 'lib/pdf/reader/object_hash.rb', line 281
def deref_name_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(Symbol) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Name"
    end
  }
end
#deref_number(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a Numeric or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting a number and no other type will do.
Some effort to cast to a number is made when the reference points to a non-number.
# File 'lib/pdf/reader/object_hash.rb', line 215
def deref_number(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Numeric)
    if obj.respond_to?(:to_f)
      obj = obj.to_f
    elsif obj.respond_to?(:to_i)
      # the result is not assigned here, so obj is returned unchanged in this branch
      obj.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  end

  obj
end
#deref_stream(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF::Reader::Stream or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting a stream and no other type will do.
# File 'lib/pdf/reader/object_hash.rb', line 239
def deref_stream(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream)
      raise MalformedPDFError, "expected object to be a Stream or nil"
    end
  }
end
#deref_stream_or_array(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a PDF::Reader::Stream, Array or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting a stream or Array and no other type will do.
# File 'lib/pdf/reader/object_hash.rb', line 299
def deref_stream_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Stream"
    end
  }
end
#deref_string(key) ⇒ Object
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
Guaranteed to only return a String or nil. If the dereference results in any other type then a MalformedPDFError exception is raised. Useful when expecting a string and no other type will do.
Some effort to cast to a string is made when the reference points to a non-string.
# File 'lib/pdf/reader/object_hash.rb', line 259
def deref_string(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(String)
    if obj.respond_to?(:to_s)
      obj = obj.to_s
    else
      raise MalformedPDFError, "expected object to be a string"
    end
  end

  obj
end
#each(&block) ⇒ Object Also known as: each_pair
iterate over each key, value. Just like a ruby hash.
# File 'lib/pdf/reader/object_hash.rb', line 359
def each(&block)
  @xref.each do |ref|
    yield ref, self[ref]
  end
end
#each_key(&block) ⇒ Object
iterate over each key. Just like a ruby hash.
# File 'lib/pdf/reader/object_hash.rb', line 368
def each_key(&block)
  each do |id, obj|
    yield id
  end
end
#each_value(&block) ⇒ Object
iterate over each value. Just like a ruby hash.
# File 'lib/pdf/reader/object_hash.rb', line 376
def each_value(&block)
  each do |id, obj|
    yield obj
  end
end
#empty? ⇒ Boolean
return true if there are no objects in this file
# File 'lib/pdf/reader/object_hash.rb', line 391
def empty?
  size == 0 ? true : false
end
#encrypted? ⇒ Boolean
# File 'lib/pdf/reader/object_hash.rb', line 474
def encrypted?
  trailer.has_key?(:Encrypt)
end
#fetch(key, local_default = nil) ⇒ Object
Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.
If an int is used, the object with that ID and a generation number of 0 will be returned.
If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.
local_default is the object that will be returned if the requested key doesn’t exist.
# File 'lib/pdf/reader/object_hash.rb', line 346
def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
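Going by the listing above, the branch order of #fetch differs from Ruby's Hash#fetch in one respect worth knowing: a missing key raises only when it is zero or negative. The sketch below reproduces that order over a plain Hash so it can be tried directly. toy_fetch is a hypothetical stand-in, and the plain Hash replaces the #[] lookup.

```ruby
# Mirrors the branch order of the #fetch listing over a plain Hash.
def toy_fetch(objects, key, local_default = nil)
  obj = objects[key]
  if obj
    obj
  elsif local_default
    local_default
  elsif key.to_i <= 0
    raise IndexError, "#{key} is invalid"
  end
end

objects = { 1 => 3469 }

p toy_fetch(objects, 1)             # found: the stored value
p toy_fetch(objects, 7, :fallback)  # missing, default supplied
p toy_fetch(objects, 7)             # missing, positive id, no default: nil

begin
  toy_fetch(objects, 0)             # missing and non-positive: IndexError
rescue IndexError => e
  puts e.message
end
```

Callers that need a hard failure for every missing object would have to check the return value for nil themselves.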
#has_key?(check_key) ⇒ Boolean Also known as: include?, key?, member?, value?
return true if the specified key exists in the file. key can be an int or a PDF::Reader::Reference
# File 'lib/pdf/reader/object_hash.rb', line 398
def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
#has_value?(value) ⇒ Boolean
return true if the specified value exists in the file
# File 'lib/pdf/reader/object_hash.rb', line 415
def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end
#keys ⇒ Object
return an array of all keys in the file
# File 'lib/pdf/reader/object_hash.rb', line 430
def keys
  ret = []
  each_key { |k| ret << k }
  ret
end
#obj_type(ref) ⇒ Object
returns the type of object a ref points to
# File 'lib/pdf/reader/object_hash.rb', line 59
def obj_type(ref)
  self[ref].class.to_s.to_sym
rescue
  nil
end
#object(key) ⇒ Object Also known as: deref
If key is a PDF::Reader::Reference object, lookup the corresponding object in the PDF and return it. Otherwise return key untouched.
# File 'lib/pdf/reader/object_hash.rb', line 94
def object(key)
  key.is_a?(PDF::Reader::Reference) ? self[key] : key
end
#page_references ⇒ Object
returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, etc.
Useful for apps that want to extract data from specific pages.
# File 'lib/pdf/reader/object_hash.rb', line 466
def page_references
  root = fetch(trailer[:Root])
  @page_references ||= begin
    pages_root = deref_hash(root[:Pages]) || {}
    get_page_objects(pages_root)
  end
end
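get_page_objects is not shown on this page. As a rough illustration of the kind of traversal involved, the sketch below walks a PDF page tree, in which /Pages nodes list their children under /Kids and /Page nodes are the leaves. PRef, PAGE_OBJECTS and toy_page_refs are hypothetical names, and the real helper may differ in how it handles malformed or circular trees.

```ruby
# Stand-in for PDF::Reader::Reference; hypothetical, for illustration only.
PRef = Struct.new(:id, :gen)

# Hypothetical object table for a three-page document with a nested page tree.
PAGE_OBJECTS = {
  PRef.new(2, 0) => { Type: :Pages, Kids: [PRef.new(3, 0), PRef.new(4, 0)] },
  PRef.new(3, 0) => { Type: :Page },
  PRef.new(4, 0) => { Type: :Pages, Kids: [PRef.new(5, 0), PRef.new(6, 0)] },
  PRef.new(5, 0) => { Type: :Page },
  PRef.new(6, 0) => { Type: :Page },
}

# Depth-first walk: /Pages nodes are expanded via /Kids, /Page leaves are collected.
def toy_page_refs(node_ref, objects)
  node = objects.fetch(node_ref)
  return [node_ref] if node[:Type] == :Page

  Array(node[:Kids]).flat_map { |kid| toy_page_refs(kid, objects) }
end

refs = toy_page_refs(PRef.new(2, 0), PAGE_OBJECTS)
p refs.map(&:id)
```

Visiting /Kids in order and depth first is what makes the position of a reference in the result correspond to the page number.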
#sec_handler? ⇒ Boolean
# File 'lib/pdf/reader/object_hash.rb', line 478
def sec_handler?
  !!sec_handler
end
#size ⇒ Object Also known as: length
return the number of objects in the file. An object with multiple generations is counted once.
# File 'lib/pdf/reader/object_hash.rb', line 384
def size
  xref.size
end
#stream?(ref) ⇒ Boolean
returns true if the supplied reference points to an object with a stream
# File 'lib/pdf/reader/object_hash.rb', line 66
def stream?(ref)
  self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
end
#to_a ⇒ Object
return an array of arrays. Each sub array contains a key/value pair.
# File 'lib/pdf/reader/object_hash.rb', line 452
def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end
#to_s ⇒ Object
# File 'lib/pdf/reader/object_hash.rb', line 424
def to_s
  "<PDF::Reader::ObjectHash size: #{self.size}>"
end
#values ⇒ Object
return an array of all values in the file
# File 'lib/pdf/reader/object_hash.rb', line 438
def values
  ret = []
  each_value { |v| ret << v }
  ret
end
#values_at(*ids) ⇒ Object
return an array of all values from the specified keys
# File 'lib/pdf/reader/object_hash.rb', line 446
def values_at(*ids)
  ids.map { |id| self[id] }
end