Class: PDF::Reader::XRef

Inherits:
Object
Defined in:
lib/pdf/reader/xref.rb

Overview

An internal PDF::Reader class that represents the Xref (cross-reference) table in a PDF file. An Xref table maps object identifiers to byte offsets. Whenever a particular object needs to be found, the Xref table is used to find where it is stored in the file.
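
As a rough illustration of that mapping, the sketch below models the table as a nested Hash keyed by object ID and then by generation number, which is the shape the `store` method further down builds. The IDs and offsets are invented sample values, not taken from a real file.

```ruby
# Hypothetical sample data: object ID => { generation => byte offset }.
xref = {
  1 => { 0 => 15 },
  2 => { 0 => 74 },
  3 => { 0 => 182, 1 => 911 }
}

# Finding an object means looking up its ID, then its generation.
offset = xref[3][1]
puts "object 3, generation 1 starts at byte #{offset}"
```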

Constructor Details

#initialize(buffer) ⇒ XRef

Creates a new Xref table based on the contents of the supplied PDF::Reader::Buffer object.



# File 'lib/pdf/reader/xref.rb', line 35

def initialize(buffer)
  @buffer = buffer
  @xref = {}
end

Instance Method Details

#load(offset = nil) ⇒ Object

Reads the xref table from the underlying buffer. If offset is specified, the table is loaded from there; otherwise the default offset is located and used.

Raises PDF::Reader::MalformedPDFError if there is no xref table at the requested offset, and PDF::Reader::UnsupportedFeatureError if the offset points at an XRef stream.



# File 'lib/pdf/reader/xref.rb', line 54

def load(offset = nil)
  offset ||= @buffer.find_first_xref_offset
  @buffer.seek(offset)
  token = @buffer.token

  if token == "xref" || token == "ref"
    load_xref_table
  elsif token.to_i >= 0 && @buffer.token.to_i >= 0 && @buffer.token == "obj"
    raise PDF::Reader::UnsupportedFeatureError, "XRef streams are not supported in PDF::Reader yet"
  else
    raise PDF::Reader::MalformedPDFError, "xref table not found at offset #{offset} (#{token} != xref)"
  end
end
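
The three-way branch in `load` can be tried out in isolation. The sketch below is a standalone approximation: `classify` is a hypothetical helper that takes the first three tokens as an Array instead of reading them from a PDF::Reader::Buffer, and it returns a Symbol where the real method loads the table or raises.

```ruby
# Hypothetical helper mirroring the token checks in #load.
def classify(tokens)
  first, second, third = tokens

  if first == "xref" || first == "ref"
    :xref_table      # a classic cross-reference table follows
  elsif first.to_i >= 0 && second.to_i >= 0 && third == "obj"
    :xref_stream     # "<id> <gen> obj": an object, i.e. an XRef stream
  else
    :malformed       # nothing recognisable at this offset
  end
end

p classify(["xref", "0", "3"])
p classify(["12", "0", "obj"])
p classify(["startxref", "1234", "%%EOF"])
```

Note that `String#to_i` returns 0 for non-numeric text, so the numeric checks in the second branch are permissive; the `"obj"` token is what really decides it.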

#load_xref_table ⇒ Object

Assumes the underlying buffer is positioned at the start of an Xref table and processes it into memory.

Raises:

  • (MalformedPDFError)



# File 'lib/pdf/reader/xref.rb', line 83

def load_xref_table
  tok_one = tok_two = nil

  begin
    # loop over all subsections of the xref table
    # In a well formed PDF, the 'trailer' token will indicate
    # the end of the table. However we need to be careful in case
    # we're processing a malformed pdf that is missing the trailer.
    loop do
      tok_one, tok_two = @buffer.token, @buffer.token
      if tok_one != "trailer" && !tok_one.match(/\d+/)
        raise MalformedPDFError, "PDF malformed, missing trailer after cross reference"
      end
      break if tok_one == "trailer" or tok_one.nil?
      objid, count = tok_one.to_i, tok_two.to_i

      count.times do
        offset = @buffer.token.to_i
        generation = @buffer.token.to_i
        state = @buffer.token

        store(objid, generation, offset) if state == "n"
        objid += 1
      end
    end
  rescue EOFError => e
    raise MalformedPDFError, "PDF malformed, missing trailer after cross reference"
  end

  raise MalformedPDFError, "PDF malformed, trailer should be a dictionary" unless tok_two == "<<"

  trailer = Parser.new(@buffer, self).dictionary
  load(trailer[:Prev].to_i) if trailer.has_key?(:Prev)

  trailer
end
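
To make the subsection loop easier to follow, here is a minimal standalone sketch of the same idea. It is an approximation under stated assumptions: the table text is hypothetical sample data, tokens are produced by splitting on whitespace rather than by PDF::Reader::Buffer, and the trailer dictionary and `:Prev` handling are left out.

```ruby
# Hypothetical xref table: two subsections followed by the trailer keyword.
table = <<~XREF
  0 3
  0000000000 65535 f
  0000000015 00000 n
  0000000074 00000 n
  5 1
  0000000182 00002 n
  trailer
XREF

tokens = table.split
xref = {}

until tokens.empty?
  first = tokens.shift
  break if first == "trailer"

  # Each subsection header is "<first object id> <entry count>".
  objid = first.to_i
  count = tokens.shift.to_i

  count.times do
    offset     = tokens.shift.to_i
    generation = tokens.shift.to_i
    state      = tokens.shift

    # Only in-use ("n") entries are recorded; free ("f") entries are skipped.
    (xref[objid] ||= {})[generation] ||= offset if state == "n"
    objid += 1
  end
end

p xref
```

Object 0 is marked free in the sample, so it never reaches the Hash, and the second subsection starts numbering again at object 5.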

#obj_type(ref) ⇒ Object

Returns the type of the object a ref points to, as a Symbol built from the object's class name.



# File 'lib/pdf/reader/xref.rb', line 120

def obj_type(ref)
  obj = object(ref)
  obj.class.to_s.to_sym
end
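
The class-name-to-Symbol conversion is plain Ruby and can be seen without a PDF at hand. In this sketch `type_of` is a hypothetical stand-in that skips the `object(ref)` lookup and works on any value directly.

```ruby
# Hypothetical stand-in for the conversion performed by #obj_type.
def type_of(obj)
  obj.class.to_s.to_sym
end

p type_of({})
p type_of([])
p type_of("text")
```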

#object(ref, save_pos = true) ⇒ Object

Returns a string containing the contents of an entire PDF object. The object is requested by passing a PDF::Reader::Reference object that contains the object's ID and revision number. Any argument that is not a Reference is returned unchanged.

If the object is a stream, the stream is returned as well.



# File 'lib/pdf/reader/xref.rb', line 73

def object(ref, save_pos = true)
  return ref unless ref.kind_of?(Reference)
  pos = @buffer.pos if save_pos
  obj = Parser.new(@buffer.seek(offset_for(ref)), self).object(ref.id, ref.gen)
  @buffer.seek(pos) if save_pos
  return obj
end
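
The `save_pos` flag amounts to a save, seek, read, restore pattern on the buffer. The sketch below shows that pattern with Ruby's standard StringIO as a stand-in for PDF::Reader::Buffer; the data and offsets are made up.

```ruby
require "stringio"

# StringIO stands in for the PDF buffer in this illustration.
io = StringIO.new("0123456789")
io.seek(2)                 # pretend the caller was reading here

saved = io.pos             # remember where the caller was
io.seek(7)                 # jump to the object's offset
chunk = io.read(3)         # read the object
io.seek(saved)             # put the position back for the caller

puts "read #{chunk.inspect}, position restored to #{io.pos}"
```

Passing `save_pos = false` skips the first and last steps, which leaves the buffer positioned just after the object that was read.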

#offset_for(ref) ⇒ Object

Returns the byte offset for the specified PDF object.

ref - a PDF::Reader::Reference object containing an object ID and revision number



# File 'lib/pdf/reader/xref.rb', line 133

def offset_for(ref)
  @xref[ref.id][ref.gen]
rescue
  raise InvalidObjectError, "Object #{ref.id}, Generation #{ref.gen} is invalid"
end
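
One consequence of the bare `rescue` is worth spelling out: an unknown object ID triggers the error, but a known ID with an unknown generation simply yields nil. The sketch below reproduces the lookup with stand-in types; `Reference`, `InvalidObjectError` and the `XREF` constant here are local illustrations, not the PDF::Reader classes.

```ruby
# Stand-in types for illustration only.
Reference = Struct.new(:id, :gen)
class InvalidObjectError < StandardError; end

# Hypothetical table with a single entry.
XREF = { 1 => { 0 => 15 } }

def offset_for(ref)
  XREF[ref.id][ref.gen]
rescue
  raise InvalidObjectError, "Object #{ref.id}, Generation #{ref.gen} is invalid"
end

known       = offset_for(Reference.new(1, 0))  # entry exists
missing_gen = offset_for(Reference.new(1, 7))  # known ID, unknown generation

message = nil
begin
  offset_for(Reference.new(99, 0))             # unknown ID: nil[0] fails
rescue InvalidObjectError => e
  message = e.message
end

p known, missing_gen, message
```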

#pdf_version ⇒ Object

Returns the PDF version of the current document. Technically this isn’t part of the XRef table, but it is one of the lowest-level data items in the file, so we’ve lumped it in with the cross-reference code.

Raises:

  • (MalformedPDFError)



# File 'lib/pdf/reader/xref.rb', line 43

def pdf_version
  @buffer.seek(0)
  m, version = *@buffer.read(8).match(/%PDF-(\d\.\d)/)
  raise MalformedPDFError, 'invalid PDF version' if version.nil?
  return version.to_f
end
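
The header check can be exercised on its own. This sketch assumes a hypothetical in-memory file held in a StringIO rather than a PDF::Reader::Buffer, and it escapes the dot in the pattern so that only a literal period is accepted.

```ruby
require "stringio"

# Hypothetical file contents that begin with a PDF header line.
io = StringIO.new("%PDF-1.4\n1 0 obj\n")

io.seek(0)
header = io.read(8)                               # the first eight bytes
_match, version = *header.match(/%PDF-(\d\.\d)/)  # MatchData splats to [whole, group]
raise "invalid PDF version" if version.nil?

puts version.to_f
```

If the header does not match, `match` returns nil, the splat produces an empty list, and `version` ends up nil, which is what the nil check relies on.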

#store(id, gen, offset) ⇒ Object

Stores an offset value for a particular PDF object ID and revision number. An offset that has already been stored for the same ID and revision is not overwritten.



# File 'lib/pdf/reader/xref.rb', line 140

def store(id, gen, offset)
  (@xref[id] ||= {})[gen] ||= offset
end
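
Because both assignments use `||=`, the first offset stored for an ID and generation wins. Since `load_xref_table` stores its own entries before following `:Prev` to an older table, that appears to be how newer entries take precedence. The sketch below shows the effect with a local Hash and a lambda standing in for the instance method; the offsets are invented.

```ruby
xref = {}

# Stand-in for #store, closing over a local Hash instead of @xref.
store = lambda do |id, gen, offset|
  (xref[id] ||= {})[gen] ||= offset
end

store.call(4, 0, 900)  # from the newest table, which is loaded first
store.call(4, 0, 120)  # same object from an older table: ignored
store.call(5, 0, 300)  # only present in the older table: kept

p xref
```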

#stream?(ref) ⇒ Boolean

Returns true if the supplied reference points to an object with a stream.

Returns:

  • (Boolean)


# File 'lib/pdf/reader/xref.rb', line 125

def stream?(ref)
  obj, stream = object(ref)
  stream ? true : false
end