Class: PDF::Reader::ObjectHash

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/pdf/reader/object_hash.rb

Overview

Provides low-level access to the objects in a PDF file via a hash-like object.

A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
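To make the hash-map view concrete, here is a minimal sketch in plain Ruby. It is not how PDF::Reader parses files: TOY_BODIES, build_toy_file and lookup are hypothetical names, and the offset table is built by hand rather than read from a real cross-reference section.

```ruby
require "stringio"

# Toy object bodies keyed by object ID (hypothetical data, not parsed PDF syntax).
TOY_BODIES = {
  1 => "1 0 obj\n3469\nendobj\n",
  2 => "2 0 obj\n(hello)\nendobj\n",
  3 => "3 0 obj\n/Name\nendobj\n"
}.freeze

# Concatenate the bodies into one "file" and record each object's byte offset.
def build_toy_file(bodies)
  data = +""
  offsets = {}
  bodies.each do |id, body|
    offsets[id] = data.bytesize
    data << body
  end
  [StringIO.new(data), offsets]
end

# One hash access, one seek, then read up to the end of the object.
def lookup(io, offsets, id)
  io.seek(offsets.fetch(id))
  io.gets("endobj\n")
end
```

The cost of lookup does not depend on how many objects precede the requested one, which is the property the paragraph above describes.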

Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method returns a Ruby representation of that object.

The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.
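The general pattern of a read-only, hash-like class that mixes in Enumerable can be sketched as follows. ReadOnlyTable is a hypothetical stand-in written for illustration, not part of PDF::Reader.

```ruby
# Hypothetical stand-in class: readable like a Hash, but with no []= writer.
class ReadOnlyTable
  include Enumerable

  def initialize(pairs)
    @pairs = pairs.dup.freeze
  end

  def [](key)
    @pairs[key]
  end

  # Enumerable only needs #each; map, select, to_a and friends build on it.
  def each(&block)
    @pairs.each(&block)
  end
end

table = ReadOnlyTable.new({ 1 => 3469, 2 => "hello" })
table[1]                      # reads work
table.map { |id, _obj| id }   # Enumerable methods work
table.respond_to?(:[]=)       # false, no writer is defined
```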

Basic Usage

h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469

h[PDF::Reader::Reference.new(1,0)]
=> 3469

Constructor Details

#initialize(input, opts = {}) ⇒ ObjectHash

Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.

Valid options:

:password - the user password to decrypt the source PDF


# File 'lib/pdf/reader/object_hash.rb', line 44

def initialize(input, opts = {})
  @io          = extract_io_from(input)
  @xref        = PDF::Reader::XRef.new(@io)
  @pdf_version = read_version
  @trailer     = @xref.trailer
  @cache       = opts[:cache] || PDF::Reader::ObjectCache.new
  @sec_handler = NullSecurityHandler.new
  @sec_handler = SecurityHandlerFactory.build(
    deref(trailer[:Encrypt]),
    deref(trailer[:ID]),
    opts[:password]
  )
end

Instance Attribute Details

#default ⇒ Object

Returns the value of attribute default.



# File 'lib/pdf/reader/object_hash.rb', line 33

def default
  @default
end

#pdf_version ⇒ Object (readonly)

Returns the value of attribute pdf_version.



# File 'lib/pdf/reader/object_hash.rb', line 34

def pdf_version
  @pdf_version
end

#sec_handler ⇒ Object (readonly)

Returns the value of attribute sec_handler.



# File 'lib/pdf/reader/object_hash.rb', line 35

def sec_handler
  @sec_handler
end

#trailer ⇒ Object (readonly)

Returns the value of attribute trailer.



# File 'lib/pdf/reader/object_hash.rb', line 34

def trailer
  @trailer
end

Instance Method Details

#[](key) ⇒ Object

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.



# File 'lib/pdf/reader/object_hash.rb', line 79

def [](key)
  return default if key.to_i <= 0

  unless key.is_a?(PDF::Reader::Reference)
    key = PDF::Reader::Reference.new(key.to_i, 0)
  end

  @cache[key] ||= fetch_object(key) || fetch_object_stream(key)
rescue InvalidObjectError
  return default
end
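The key handling in [] can be illustrated in isolation. This is a sketch only: Ref is a hypothetical stand-in for PDF::Reader::Reference, normalize_key is an invented helper, and nil stands in for the default value.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference: an object ID plus a generation.
Ref = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

# Integers (or anything with to_i) become a generation-0 reference;
# existing references pass through unchanged; non-positive IDs are rejected.
def normalize_key(key)
  return nil if key.to_i <= 0

  key.is_a?(Ref) ? key : Ref.new(key.to_i, 0)
end

normalize_key(1) == normalize_key(Ref.new(1, 0))  # true, both name object 1, generation 0
```

This is why h[1] and h[PDF::Reader::Reference.new(1,0)] return the same object in the Basic Usage example.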

#deref!(key) ⇒ Object

Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, the key is returned unchanged.



# File 'lib/pdf/reader/object_hash.rb', line 314

def deref!(key)
  deref_internal!(key, {})
end
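The body of deref_internal! is not shown on this page, so the following is only a guess at the general shape of a recursive dereference, written against a toy object store. Ref, TOY_STORE and deep_deref are hypothetical names, and this version does not guard against cyclic references.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference.
Ref = Struct.new(:id, :gen)

# Toy object store keyed by reference (hypothetical data).
TOY_STORE = {
  Ref.new(1, 0) => { Kids: [Ref.new(2, 0)], Count: Ref.new(3, 0) },
  Ref.new(2, 0) => { Type: :Page },
  Ref.new(3, 0) => 1
}.freeze

# Replace every reference, at any depth, with the object it points to.
# `seen` memoizes references that have already been resolved.
def deep_deref(value, store, seen = {})
  case value
  when Ref
    seen[value] ||= deep_deref(store[value], store, seen)
  when Hash
    value.transform_values { |v| deep_deref(v, store, seen) }
  when Array
    value.map { |v| deep_deref(v, store, seen) }
  else
    value
  end
end

deep_deref(Ref.new(1, 0), TOY_STORE)
# => {:Kids=>[{:Type=>:Page}], :Count=>1} (inspect formatting varies by Ruby version)
```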

#deref_array(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only an Array or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Array and no other type will do.



# File 'lib/pdf/reader/object_hash.rb', line 105

def deref_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be an Array or nil" if !obj.is_a?(Array)
  }
end
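The strict deref_* helpers (Array, Hash, Stream and the "or" variants) share one pattern: nil passes through, an allowed type passes through, and anything else raises. A generic sketch of that pattern, using a hypothetical ToyMalformedError in place of MalformedPDFError and an invented expect_type helper:

```ruby
# Hypothetical stand-in for PDF::Reader::MalformedPDFError.
class ToyMalformedError < StandardError; end

# Returns obj when it is nil or an instance of one of the allowed classes,
# otherwise raises with a message naming the expected types.
def expect_type(obj, *allowed)
  return obj if obj.nil? || allowed.any? { |klass| obj.is_a?(klass) }

  names = allowed.map(&:name).join(" or ")
  raise ToyMalformedError, "expected object to be #{names} or nil"
end

expect_type([1, 2], Array)          # returns the array
expect_type(nil, Array)             # returns nil
expect_type(:Name, Symbol, Array)   # returns :Name
```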

#deref_array!(key) ⇒ Object



# File 'lib/pdf/reader/object_hash.rb', line 318

def deref_array!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be an Array or nil"
    end
  }
end

#deref_array_of_numbers(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only an Array of Numerics or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Array of numbers and no other type will do.

Some effort is made to cast non-numeric array elements to a number.

Raises:

  • (MalformedPDFError)



# File 'lib/pdf/reader/object_hash.rb', line 123

def deref_array_of_numbers(key)
  arr = deref(key)

  return arr if arr.nil?

  raise MalformedPDFError, "expected object to be an Array" unless arr.is_a?(Array)

  arr.map { |item|
    if item.is_a?(Numeric)
      item
    elsif item.respond_to?(:to_f)
      item.to_f
    elsif item.respond_to?(:to_i)
      item.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  }
end
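The per-element casting order in the listing above (Numeric, then to_f, then to_i) can be tried out on its own. In this sketch cast_number is a hypothetical helper and ArgumentError stands in for MalformedPDFError.

```ruby
# Same casting order as the listing above; ArgumentError is a stand-in
# for MalformedPDFError so the sketch has no dependencies.
def cast_number(item)
  if item.is_a?(Numeric)
    item
  elsif item.respond_to?(:to_f)
    item.to_f
  elsif item.respond_to?(:to_i)
    item.to_i
  else
    raise ArgumentError, "expected object to be a number"
  end
end

cast_number(1)      # => 1
cast_number("3.5")  # => 3.5
cast_number("abc")  # => 0.0, because String#to_f never raises
cast_number(nil)    # => 0.0, because nil responds to to_f
```

Because strings and nil both respond to to_f, they are converted rather than rejected. Only objects with neither to_f nor to_i, such as Symbols, reach the raise.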

#deref_hash(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a Hash or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Hash and no other type will do.



# File 'lib/pdf/reader/object_hash.rb', line 149

def deref_hash(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    raise MalformedPDFError, "expected object to be a Hash or nil" if !obj.is_a?(Hash)
  }
end

#deref_hash!(key) ⇒ Object



# File 'lib/pdf/reader/object_hash.rb', line 326

def deref_hash!(key)
  deref!(key).tap { |obj|
    if !obj.nil? && !obj.is_a?(Hash)
      raise MalformedPDFError, "expected object (#{obj.inspect}) to be a Hash or nil"
    end
  }
end

#deref_integer(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only an Integer or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting an Integer and no other type will do.

Some effort is made to cast to an Integer when the reference points to a non-integer.



# File 'lib/pdf/reader/object_hash.rb', line 191

def deref_integer(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Integer)
    if obj.respond_to?(:to_i)
      obj = obj.to_i
    else
      raise MalformedPDFError, "expected object to be an Integer"
    end
  end

  obj
end

#deref_name(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF Name (Symbol) or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Name and no other type will do.

Some effort is made to cast to a Symbol when the reference points to a non-symbol.



# File 'lib/pdf/reader/object_hash.rb', line 167

def deref_name(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Symbol)
    if obj.respond_to?(:to_sym)
      obj = obj.to_sym
    else
      raise MalformedPDFError, "expected object to be a Name"
    end
  end

  obj
end

#deref_name_or_array(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF Name (Symbol), an Array, or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a Name or Array and no other type will do.



# File 'lib/pdf/reader/object_hash.rb', line 281

def deref_name_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(Symbol) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Name"
    end
  }
end

#deref_number(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a Numeric or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a number and no other type will do.

Some effort is made to cast to a number when the reference points to a non-number.



# File 'lib/pdf/reader/object_hash.rb', line 215

def deref_number(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(Numeric)
    if obj.respond_to?(:to_f)
      obj = obj.to_f
    elsif obj.respond_to?(:to_i)
      obj = obj.to_i
    else
      raise MalformedPDFError, "expected object to be a number"
    end
  end

  obj
end

#deref_stream(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF::Reader::Stream or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a stream and no other type will do.



# File 'lib/pdf/reader/object_hash.rb', line 239

def deref_stream(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream)
      raise MalformedPDFError, "expected object to be a Stream or nil"
    end
  }
end

#deref_stream_or_array(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a PDF::Reader::Stream, an Array, or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a stream or Array and no other type will do.



# File 'lib/pdf/reader/object_hash.rb', line 299

def deref_stream_or_array(key)
  obj = deref(key)

  return obj if obj.nil?

  obj.tap { |obj|
    if !obj.is_a?(PDF::Reader::Stream) && !obj.is_a?(Array)
      raise MalformedPDFError, "expected object to be an Array or Stream"
    end
  }
end

#deref_string(key) ⇒ Object

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

Guaranteed to return only a String or nil. If the dereference results in any other type, a MalformedPDFError is raised. Useful when expecting a string and no other type will do.

Some effort is made to cast to a String when the reference points to a non-string.



# File 'lib/pdf/reader/object_hash.rb', line 259

def deref_string(key)
  obj = deref(key)

  return obj if obj.nil?

  if !obj.is_a?(String)
    if obj.respond_to?(:to_s)
      obj = obj.to_s
    else
      raise MalformedPDFError, "expected object to be a string"
    end
  end

  obj
end

#each(&block) ⇒ Object Also known as: each_pair

Iterates over each key/value pair, just like a Ruby hash.



# File 'lib/pdf/reader/object_hash.rb', line 359

def each(&block)
  @xref.each do |ref|
    yield ref, self[ref]
  end
end

#each_key(&block) ⇒ Object

Iterates over each key, just like a Ruby hash.



# File 'lib/pdf/reader/object_hash.rb', line 368

def each_key(&block)
  each do |id, obj|
    yield id
  end
end

#each_value(&block) ⇒ Object

Iterates over each value, just like a Ruby hash.



# File 'lib/pdf/reader/object_hash.rb', line 376

def each_value(&block)
  each do |id, obj|
    yield obj
  end
end

#empty? ⇒ Boolean

Returns true if there are no objects in this file.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 391

def empty?
  size == 0
end

#encrypted? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 474

def encrypted?
  trailer.has_key?(:Encrypt)
end

#fetch(key, local_default = nil) ⇒ Object

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used the exact ID and generation number can be specified.

local_default is the object that will be returned if the requested key doesn’t exist.



# File 'lib/pdf/reader/object_hash.rb', line 346

def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
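The branch order in the listing above has a consequence that is easy to miss: IndexError is raised only for non-positive keys, so a missing but valid ID with no local_default yields nil instead of raising. The sketch below mirrors that logic over a plain Hash. TOY_OBJECTS and toy_fetch are hypothetical names, and the nil result assumes the default attribute has not been set.

```ruby
# Toy lookup table standing in for the PDF objects (hypothetical data).
TOY_OBJECTS = { 1 => 3469 }.freeze

# Mirrors the branch order of the fetch listing above.
def toy_fetch(key, local_default = nil)
  obj = key.to_i > 0 ? TOY_OBJECTS[key.to_i] : nil
  if obj
    obj
  elsif local_default
    local_default
  elsif key.to_i <= 0
    raise IndexError, "#{key} is invalid"
  end
end

toy_fetch(1)             # => 3469
toy_fetch(99, :missing)  # => :missing
toy_fetch(99)            # => nil, a missing but valid ID does not raise
```

This differs from Hash#fetch, which raises KeyError for any missing key when no default is given.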

#has_key?(check_key) ⇒ Boolean Also known as: include?, key?, member?, value?

Returns true if the specified key exists in the file. key can be an int or a PDF::Reader::Reference.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 398

def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
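One way the TODO in the listing above might be addressed is to index the references once and answer membership checks with a hash lookup. This is a sketch under that assumption, not the library's implementation: Ref and ToyKeyIndex are hypothetical names.

```ruby
require "set"

# Hypothetical stand-in for PDF::Reader::Reference.
Ref = Struct.new(:id, :gen)

# Builds two sets up front so each membership check is O(1) on average.
class ToyKeyIndex
  def initialize(refs)
    @by_ref = refs.to_set
    @by_id  = refs.map(&:id).to_set
  end

  # Same matching rules as the listing above: exact match for references,
  # ID-only match for anything else.
  def key?(check_key)
    if check_key.is_a?(Ref)
      @by_ref.include?(check_key)
    else
      @by_id.include?(check_key.to_i)
    end
  end
end

index = ToyKeyIndex.new([Ref.new(1, 0), Ref.new(2, 0), Ref.new(2, 1)])
index.key?(2)              # => true
index.key?(Ref.new(3, 0))  # => false
```

The trade-off is extra memory for the two sets and the need to rebuild them if the set of references ever changes.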

#has_value?(value) ⇒ Boolean

Returns true if the specified value exists in the file.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 415

def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end

#keys ⇒ Object

Returns an array of all keys in the file.



# File 'lib/pdf/reader/object_hash.rb', line 430

def keys
  ret = []
  each_key { |k| ret << k }
  ret
end

#obj_type(ref) ⇒ Object

Returns the type of object a ref points to.



# File 'lib/pdf/reader/object_hash.rb', line 59

def obj_type(ref)
  self[ref].class.to_s.to_sym
rescue
  nil
end

#object(key) ⇒ Object Also known as: deref

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.



# File 'lib/pdf/reader/object_hash.rb', line 94

def object(key)
  key.is_a?(PDF::Reader::Reference) ? self[key] : key
end

#page_references ⇒ Object

Returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, and so on.

Useful for apps that want to extract data from specific pages.



# File 'lib/pdf/reader/object_hash.rb', line 466

def page_references
  root  = fetch(trailer[:Root])
  @page_references ||= begin
                         pages_root = deref_hash(root[:Pages]) || {}
                         get_page_objects(pages_root)
                       end
end
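get_page_objects is not shown on this page. As a rough illustration of what such a walk involves, the sketch below does a depth-first traversal of a toy page tree, where intermediate nodes have Type :Pages and a Kids array and leaves have Type :Page. Ref, TOY_PAGE_OBJECTS and collect_page_refs are hypothetical names.

```ruby
# Hypothetical stand-in for PDF::Reader::Reference.
Ref = Struct.new(:id, :gen)

# Toy object table: a two-level page tree (hypothetical data).
TOY_PAGE_OBJECTS = {
  Ref.new(2, 0) => { Type: :Pages, Kids: [Ref.new(3, 0), Ref.new(4, 0)] },
  Ref.new(3, 0) => { Type: :Page },
  Ref.new(4, 0) => { Type: :Pages, Kids: [Ref.new(5, 0), Ref.new(6, 0)] },
  Ref.new(5, 0) => { Type: :Page },
  Ref.new(6, 0) => { Type: :Page }
}.freeze

# Depth-first walk that collects references to leaf Page objects in document order.
def collect_page_refs(node, objs)
  Array(node[:Kids]).flat_map do |kid|
    child = objs[kid]
    if child[:Type] == :Pages
      collect_page_refs(child, objs)  # descend into an intermediate node
    else
      [kid]                           # a leaf: keep the reference itself
    end
  end
end

root = TOY_PAGE_OBJECTS[Ref.new(2, 0)]
collect_page_refs(root, TOY_PAGE_OBJECTS).map(&:id)  # => [3, 5, 6]
```

An empty Hash, like the fallback used when the Pages entry is missing, produces an empty array.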

#sec_handler? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 478

def sec_handler?
  !!sec_handler
end

#size ⇒ Object Also known as: length

Returns the number of objects in the file. An object with multiple generations is counted once.



# File 'lib/pdf/reader/object_hash.rb', line 384

def size
  xref.size
end

#stream?(ref) ⇒ Boolean

Returns true if the supplied reference points to an object with a stream.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/object_hash.rb', line 66

def stream?(ref)
  self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
end

#to_a ⇒ Object

Returns an array of arrays. Each sub-array contains a key/value pair.



# File 'lib/pdf/reader/object_hash.rb', line 452

def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end

#to_s ⇒ Object



# File 'lib/pdf/reader/object_hash.rb', line 424

def to_s
  "<PDF::Reader::ObjectHash size: #{self.size}>"
end

#values ⇒ Object

Returns an array of all values in the file.



# File 'lib/pdf/reader/object_hash.rb', line 438

def values
  ret = []
  each_value { |v| ret << v }
  ret
end

#values_at(*ids) ⇒ Object

Returns an array of the values for the specified keys.



# File 'lib/pdf/reader/object_hash.rb', line 446

def values_at(*ids)
  ids.map { |id| self[id] }
end