Class: PDF::Hash

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/pdf/hash.rb

Overview

Provides low-level access to the objects in a PDF file via a hash-like object.

A PDF file can be viewed as a large hash map. It is a series of objects stored at exact byte offsets, plus a table that maps object IDs to those offsets. Given an object ID, looking up an object is an O(1) operation.
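The offset-table idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `objects` data, the `offsets` table and the `lookup` helper are hypothetical, and the table stores a byte length alongside each offset only to keep the example short. It is not how PDF::Hash or a real cross-reference table is implemented.

```ruby
require "stringio"

# Hypothetical stand-in for a PDF body: three "objects" written back to back.
# Real PDF syntax is far richer; only the offset-table idea is shown here.
objects = { 1 => "first object", 2 => "second object", 3 => "third" }

io = StringIO.new
offsets = {} # object ID => [byte offset, byte length] (length is a simplification)
objects.each do |id, body|
  offsets[id] = [io.pos, body.bytesize]
  io.write(body)
end

# A lookup is one hash access plus one seek, however many objects exist.
def lookup(io, offsets, id)
  offset, length = offsets.fetch(id)
  io.seek(offset)
  io.read(length)
end

puts lookup(io, offsets, 2) # prints "second object"
```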

Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method returns a Ruby representation of that object.

The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.
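A minimal sketch of the same pattern, a read-only, hash-like class that gains its iteration helpers from Enumerable. ReadOnlyTable is a hypothetical stand-in written for this example and is not part of PDF::Hash.

```ruby
# ReadOnlyTable is a hypothetical stand-in, not part of PDF::Hash.
class ReadOnlyTable
  include Enumerable

  def initialize(pairs)
    @pairs = pairs.dup.freeze
  end

  # Read access only; no []= is defined.
  def [](key)
    @pairs[key]
  end

  # Enumerable only needs #each; map, select, to_a and friends follow.
  def each(&block)
    @pairs.each(&block)
  end
end

table = ReadOnlyTable.new(1 => "catalog", 2 => "pages")
table[1]                    # => "catalog"
table.map { |id, _obj| id } # => [1, 2]
table.respond_to?(:[]=)     # => false
```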

Basic Usage

h = PDF::Hash.new("somefile.pdf")
h[1]
=> 3469

h[PDF::Reader::Reference.new(1,0)]
=> 3469


Constructor Details

#initialize(input) ⇒ Hash

Creates a new PDF::Hash object. input can be a string with a valid filename or an IO-like object (IO or StringIO). Any other input, including a string that holds raw PDF data rather than a path, raises ArgumentError.



# File 'lib/pdf/hash.rb', line 35

def initialize(input)
  if input.kind_of?(IO) || input.kind_of?(StringIO)
    io = input
  elsif File.file?(input.to_s)
    if File.respond_to?(:binread)
      input = File.binread(input.to_s)
    else
      input = File.read(input.to_s)
    end
    io = StringIO.new(input)
  else
    raise ArgumentError, "input must be an IO-like object or a filename"
  end
  @version = read_version(io)
  @xref  = PDF::Reader::XRef.new(io)
  @trailer = @xref.load
end
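The input branching in the listing can be tried in isolation. The sketch below assumes a hypothetical helper, open_input, that copies only that branching; it is not a method of PDF::Hash, and it skips the version and xref loading steps.

```ruby
require "stringio"
require "tempfile"

# open_input is a hypothetical helper that mirrors the branching in the
# constructor listing; it is not a method of PDF::Hash.
def open_input(input)
  if input.kind_of?(IO) || input.kind_of?(StringIO)
    input
  elsif File.file?(input.to_s)
    StringIO.new(File.binread(input.to_s))
  else
    raise ArgumentError, "input must be an IO-like object or a filename"
  end
end

# A filename is read in binary mode and wrapped in a StringIO.
Tempfile.create("sketch") do |f|
  f.write("%PDF-1.4\n")
  f.flush
  open_input(f.path).read(8) # => "%PDF-1.4"
end

# An IO-like object is passed through untouched.
open_input(StringIO.new("%PDF-1.7")).read # => "%PDF-1.7"

# A string that is not an existing path is rejected.
begin
  open_input("%PDF-1.4 raw data that is not a path")
rescue ArgumentError => e
  e.message # => "input must be an IO-like object or a filename"
end
```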

Instance Attribute Details

#default ⇒ Object

Returns the value of attribute default.



# File 'lib/pdf/hash.rb', line 29

def default
  @default
end

#trailer ⇒ Object (readonly)

Returns the value of attribute trailer.



# File 'lib/pdf/hash.rb', line 30

def trailer
  @trailer
end

#version ⇒ Object (readonly)

Returns the value of attribute version.



# File 'lib/pdf/hash.rb', line 30

def version
  @version
end

Instance Method Details

#[](key) ⇒ Object

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

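If a PDF::Reader::Reference object is used, the exact ID and generation number can be specified. Keys whose integer value is zero or negative, and lookups that raise an error, return the default attribute instead.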



# File 'lib/pdf/hash.rb', line 62

def [](key)
  return default if key.to_i <= 0

  begin
    unless key.kind_of?(PDF::Reader::Reference)
      key = PDF::Reader::Reference.new(key.to_i, 0)
    end
    @xref.object(key)
  rescue
    return default
  end
end
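The key handling can be sketched without a PDF file. Ref and normalize_key below are hypothetical stand-ins for PDF::Reader::Reference and the logic inside #[]; the sketch assumes, as the listing implies, that a reference responds to to_i with its object ID.

```ruby
# Ref and normalize_key are hypothetical stand-ins for
# PDF::Reader::Reference and the logic inside #[].
Ref = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

def normalize_key(key)
  return nil if key.to_i <= 0 # invalid IDs fall through to the default
  key.kind_of?(Ref) ? key : Ref.new(key.to_i, 0)
end

store = { Ref.new(1, 0) => "catalog", Ref.new(7, 2) => "revised page" }

store[normalize_key(1)]             # => "catalog"
store[normalize_key(Ref.new(7, 2))] # => "revised page"
store[normalize_key(7)]             # => nil, generation 0 of object 7 is absent
store[normalize_key(0)]             # => nil
```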

#each(&block) ⇒ Object Also known as: each_pair

Iterates over each key/value pair, just like a Ruby hash.



# File 'lib/pdf/hash.rb', line 100

def each(&block)
  @xref.each do |ref, obj|
    yield ref, obj
  end
end

#each_key(&block) ⇒ Object

Iterates over each key, just like a Ruby hash.



# File 'lib/pdf/hash.rb', line 109

def each_key(&block)
  each do |id, obj|
    yield id
  end
end

#each_value(&block) ⇒ Object

Iterates over each value, just like a Ruby hash.



# File 'lib/pdf/hash.rb', line 117

def each_value(&block)
  each do |id, obj|
    yield obj
  end
end

#empty? ⇒ Boolean

Returns true if there are no objects in this file.

Returns:

  • (Boolean)


# File 'lib/pdf/hash.rb', line 132

def empty?
  size == 0 ? true : false
end

#fetch(key, local_default = nil) ⇒ Object

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

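If a PDF::Reader::Reference object is used, the exact ID and generation number can be specified.

local_default is the object that will be returned if the requested key doesn’t exist. If it is omitted, an IndexError is raised for keys whose integer value is zero or negative, and nil is returned for any other missing key.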



# File 'lib/pdf/hash.rb', line 87

def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
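The three outcomes of this control flow can be seen with a small stand-in. TinyStore is hypothetical: it copies the branching of the #fetch listing onto a plain Ruby Hash and assumes a default of nil.

```ruby
# TinyStore is a hypothetical stand-in that copies the control flow of the
# #fetch listing onto a plain Ruby Hash.
class TinyStore
  def initialize(objects)
    @objects = objects
  end

  def [](key)
    return nil if key.to_i <= 0
    @objects[key.to_i]
  end

  def fetch(key, local_default = nil)
    obj = self[key]
    if obj
      obj
    elsif local_default
      local_default
    else
      raise IndexError, "#{key} is invalid" if key.to_i <= 0
    end
  end
end

store = TinyStore.new(1 => "catalog")
store.fetch(1)            # => "catalog"
store.fetch(99, :missing) # => :missing
store.fetch(99)           # => nil (positive ID, no default, no exception)
begin
  store.fetch(0)
rescue IndexError => e
  e.message               # => "0 is invalid"
end
```

Note the contrast with Hash#fetch, which raises KeyError for any missing key when no default is given.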

#has_key?(check_key) ⇒ Boolean Also known as: include?, key?, member?, value?

Returns true if the specified key exists in the file. key can be an int or a PDF::Reader::Reference object. An int matches any generation of that object ID, while a reference must match exactly.

Returns:

  • (Boolean)


# File 'lib/pdf/hash.rb', line 139

def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
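One possible way to address the TODO in the listing is to keep two sets built once from the keys, so each membership test is a hash lookup. This is a sketch only: KeyRef and IdIndex are hypothetical names, and nothing here reflects how PDF::Hash or its xref table actually stores keys.

```ruby
require "set"

# KeyRef and IdIndex are hypothetical; PDF::Hash does not expose such an index.
KeyRef = Struct.new(:id, :gen)

class IdIndex
  def initialize(refs)
    @refs = Set.new(refs)           # exact id + generation matches
    @ids  = Set.new(refs.map(&:id)) # integer lookups ignore generation
  end

  # Same semantics as the listing, but each check is a set lookup.
  def has_key?(check_key)
    if check_key.kind_of?(KeyRef)
      @refs.include?(check_key)
    else
      @ids.include?(check_key.to_i)
    end
  end
end

index = IdIndex.new([KeyRef.new(1, 0), KeyRef.new(2, 1)])
index.has_key?(2)                # => true, any generation of object 2
index.has_key?(KeyRef.new(2, 0)) # => false, generation 0 is absent
```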

#has_value?(value) ⇒ Boolean

Returns true if the specified value exists in the file.

Returns:

  • (Boolean)


# File 'lib/pdf/hash.rb', line 156

def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end

#keys ⇒ Object

Returns an array of all keys in the file.



# File 'lib/pdf/hash.rb', line 171

def keys
  ret = []
  each_key { |k| ret << k }
  ret
end

#size ⇒ Object Also known as: length

Returns the number of objects in the file. An object with multiple generations is counted once.



# File 'lib/pdf/hash.rb', line 125

def size
  @xref.size
end
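The counting rule can be illustrated on a plain list of references. The listing delegates to @xref.size, whose implementation is not shown, so Entry and count_objects below are hypothetical and only demonstrate the documented rule.

```ruby
# Entry and count_objects are hypothetical; they only illustrate the rule
# that several generations of one object ID count once.
Entry = Struct.new(:id, :gen)

def count_objects(refs)
  refs.map(&:id).uniq.size
end

refs = [Entry.new(1, 0), Entry.new(2, 0), Entry.new(2, 1), Entry.new(3, 0)]
count_objects(refs) # => 3
```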

#to_a ⇒ Object

Returns an array of arrays. Each sub-array contains a key/value pair.



# File 'lib/pdf/hash.rb', line 193

def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end

#to_s ⇒ Object



# File 'lib/pdf/hash.rb', line 165

def to_s
  "<PDF::Hash size: #{self.size}>"
end

#values ⇒ Object

Returns an array of all values in the file.



# File 'lib/pdf/hash.rb', line 179

def values
  ret = []
  each_value { |v| ret << v }
  ret
end

#values_at(*ids) ⇒ Object

Returns an array of the values for the specified keys, in the order the keys were given.



# File 'lib/pdf/hash.rb', line 187

def values_at(*ids)
  ids.map { |id| self[id] }
end
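Because each ID is mapped through the lookup independently, the result keeps the argument order and holds one slot per ID, including missing ones. The sketch below shows this on a plain Hash; the objects data and the values_at lambda are hypothetical, and nil stands in for the default attribute.

```ruby
# Hypothetical data; nil plays the role of the default for missing IDs.
objects = { 1 => "catalog", 2 => "pages" }

# Same shape as the listing: map every requested ID through the lookup.
values_at = ->(*ids) { ids.map { |id| objects[id] } }

values_at.call(2, 99, 1) # => ["pages", nil, "catalog"]
```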