Class: PDF::Reader::Parser

Inherits:
Object
Defined in:
lib/pdf/reader/parser.rb

Overview

An internal PDF::Reader class that reads objects from the PDF file and converts them into usable Ruby objects (hashes, arrays, true, false, etc.).

Constant Summary

TOKEN_STRATEGY =
proc { |parser, token| Token.new(token) }
STRATEGIES =
{
  "/"  => proc { |parser, token| parser.send(:pdf_name) },
  "<<" => proc { |parser, token| parser.send(:dictionary) },
  "["  => proc { |parser, token| parser.send(:array) },
  "("  => proc { |parser, token| parser.send(:string) },
  "<"  => proc { |parser, token| parser.send(:hex_string) },

  nil     => proc { nil },
  "true"  => proc { true },
  "false" => proc { false },
  "null"  => proc { nil },

  "obj"       => TOKEN_STRATEGY,
  "endobj"    => TOKEN_STRATEGY,
  "stream"    => TOKEN_STRATEGY,
  "endstream" => TOKEN_STRATEGY,
  ">>"        => TOKEN_STRATEGY,
  "]"         => TOKEN_STRATEGY,
  ">"         => TOKEN_STRATEGY,
  ")"         => TOKEN_STRATEGY
}
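
STRATEGIES works as a dispatch table: each key is a raw token and each value is a callable that produces the Ruby value for that token. The standalone sketch below illustrates only the lookup pattern. `SKETCH_STRATEGIES` and `dispatch` are hypothetical names that are not part of PDF::Reader, and the procs here return symbols where the real ones call private parser methods.

```ruby
# Hypothetical stand-in for a token dispatch table (not the library's own).
SKETCH_STRATEGIES = {
  "true"  => proc { true },
  "false" => proc { false },
  "null"  => proc { nil },
  # Delimiters hand off to a handler; this sketch only reports which one.
  "<<"    => proc { |_parser, _token| :dictionary },
  "["     => proc { |_parser, _token| :array }
}.freeze

# Look the token up and call its strategy; return :no_strategy otherwise.
def dispatch(token, parser = nil)
  return :no_strategy unless SKETCH_STRATEGIES.key?(token)

  SKETCH_STRATEGIES[token].call(parser, token)
end
```

Keeping the mapping in a hash means a new token type is one more entry in the table and the lookup code stays unchanged.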

Instance Method Summary

Constructor Details

#initialize(buffer, objects = nil) ⇒ Parser

Create a new parser around a PDF::Reader::Buffer object

buffer - a PDF::Reader::Buffer object that contains PDF data
objects - a PDF::Reader::ObjectHash object that can return objects from the PDF file



# File 'lib/pdf/reader/parser.rb', line 65

def initialize(buffer, objects=nil)
  @buffer = buffer
  @objects  = objects
end

Instance Method Details

#object(id, gen) ⇒ Object

Reads an entire PDF object from the buffer and returns it as a Ruby object. If the object is a content stream, returns both the stream and the dictionary that describes it.

id - the object ID to return
gen - the object revision (generation) number to return



# File 'lib/pdf/reader/parser.rb', line 98

def object(id, gen)
  idCheck = parse_token

  # Sometimes the xref table is corrupt and points to an offset slightly too early in the file.
  # check the next token, maybe we can find the start of the object we're looking for
  if idCheck != id
    Error.assert_equal(parse_token, id)
  end
  Error.assert_equal(parse_token, gen)
  Error.str_assert(parse_token, "obj")

  obj = parse_token
  post_obj = parse_token

  if obj.is_a?(Hash) && post_obj == "stream"
    stream(obj)
  else
    obj
  end
end
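
The header check in #object can be sketched on its own: it expects the tokens `<id> <gen> obj`, and tolerates one stray token before the id in case the xref offset points slightly too early. The helper below is a hypothetical illustration that works on a plain array of already-parsed tokens. `read_object_header` is not a PDF::Reader method, and ArgumentError stands in for the library's own Error helpers.

```ruby
# Hypothetical helper: consume "<id> <gen> obj" from an array of
# already-parsed tokens, tolerating one stray token before the id.
def read_object_header(tokens, id, gen)
  first = tokens.shift
  # If the first token is not the id, the offset may be slightly early,
  # so the next token gets one more chance to match.
  if first != id
    second = tokens.shift
    raise ArgumentError, "expected id #{id}, got #{second.inspect}" unless second == id
  end

  found_gen = tokens.shift
  raise ArgumentError, "expected gen #{gen}, got #{found_gen.inspect}" unless found_gen == gen

  keyword = tokens.shift
  raise ArgumentError, "expected obj, got #{keyword.inspect}" unless keyword.to_s == "obj"

  tokens # whatever remains belongs to the object body
end
```

Only a single extra token is skipped, so a badly misplaced offset still fails fast and does not scan ahead through the file.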

#parse_token(operators = {}) ⇒ Object

Reads the next token from the underlying buffer and converts it to an appropriate object.

operators - a hash of supported operators to read from the underlying buffer.



# File 'lib/pdf/reader/parser.rb', line 74

def parse_token(operators={})
  token = @buffer.token

  if STRATEGIES.has_key? token
    STRATEGIES[token].call(self, token)
  elsif token.is_a? PDF::Reader::Reference
    token
  elsif operators.has_key? token
    Token.new(token)
  elsif token.frozen?
    token
  elsif token =~ /\d*\.\d/
    token.to_f
  else
    token.to_i
  end
end
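
The last two branches of #parse_token decide between a float and an integer by looking for a decimal point followed by a digit. The sketch below isolates that step so it can be tried without a buffer. `numeric_fallback` is a hypothetical name, and it assumes the token is a plain String that none of the earlier branches matched.

```ruby
# Hypothetical helper mirroring the final two branches of parse_token.
def numeric_fallback(token)
  if token =~ /\d*\.\d/
    token.to_f # a "." followed by a digit: treat as a real number
  else
    token.to_i # everything else: treat as an integer
  end
end

numeric_fallback("1.5") # takes the float branch
numeric_fallback("42")  # takes the integer branch
```

Because String#to_i returns 0 for text with no leading digits, a non-numeric token that reaches this step in the sketch becomes 0 and raises no error.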