Class: PDF::Reader::Buffer

Inherits:
Object
Defined in:
lib/pdf/reader/buffer.rb

Overview

An internal PDF::Reader class that mediates access to the underlying PDF file or IO stream.

Constructor Details

#initialize(io) ⇒ Buffer

Creates a new buffer around the specified IO object. No data is read until a method requests it.

# File 'lib/pdf/reader/buffer.rb', line 32

def initialize (io)
  @io = io
  @buffer = nil
end

Instance Method Details

#eof? ⇒ Boolean

Returns true if the underlying IO object is at its end and the internal buffer is empty (or has not been filled yet).

Returns:

  • (Boolean)

# File 'lib/pdf/reader/buffer.rb', line 61

def eof?
  if @buffer
    @buffer.empty? && @io.eof?
  else
    @io.eof?
  end
end
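
The two branches of eof? reduce to a single condition. The sketch below restates it as a hypothetical helper, buffer_eof?, that takes the buffer string and the IO explicitly (the name and signature are illustrative, not part of PDF::Reader), and shows why checking the IO alone is not enough.

```ruby
require "stringio"

# Hypothetical restatement of Buffer#eof?: data remains while either the
# in-memory buffer holds characters or the IO still has unread bytes.
def buffer_eof?(buffer, io)
  (buffer.nil? || buffer.empty?) && io.eof?
end

io = StringIO.new("trailer\n")
line = io.readline.chomp # the IO is now exhausted...
buffer_eof?(line, io)    # => false, "trailer" is still buffered
buffer_eof?("", io)      # => true
buffer_eof?(nil, io)     # => true, nothing was ever buffered
```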

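#find_first_xref_offset ⇒ Object

The xref table in a PDF file is an index that records the byte offset of each object in the file. This method attempts to locate the byte offset of the xref table in the underlying IO stream: it scans the last 1024 bytes for the %%EOF marker and returns the integer on the line immediately before it (the value that follows the startxref keyword).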

Raises:

  • (MalformedPDFError)

# File 'lib/pdf/reader/buffer.rb', line 117

def find_first_xref_offset
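  # start 1024 bytes before the end of the file; if the file is shorter than
  # that the seek raises, so fall back to Buffer#seek(0)
  @io.seek(-1024, IO::SEEK_END) rescue seek(0)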
  data = @io.read(1024)

  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
  # To ensure we find the xref offset correctly, change all possible options to a 
  # standard format
  data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
  # reverse the lines so the search below starts from the end of the file
  lines = data.split(/\n/).reverse

  eof_index = nil

  lines.each_with_index do |line, index|
    if line =~ /^%%EOF\r?$/
      eof_index = index
      break
    end
  end

  raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
  raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
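  # in the reversed list, the line after %%EOF is the one preceding it in the
  # file: the byte offset written after the startxref keyword
  lines[eof_index+1].to_i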
end
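
The offset search can be tried without a real PDF file. The following sketch reproduces the same steps (read the tail, normalise line endings, look for %%EOF, take the preceding line) in a hypothetical helper named startxref_offset, using a stand-in MalformedPDFError class; it illustrates the technique under those assumptions and is not the library's own code path.

```ruby
require "stringio"

# Stand-in for the MalformedPDFError raised in the original, so this runs alone.
class MalformedPDFError < StandardError; end

# Hypothetical helper mirroring the steps of Buffer#find_first_xref_offset
# for any seekable IO (File, StringIO, ...).
def startxref_offset(io)
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue Errno::EINVAL
    io.seek(0, IO::SEEK_SET) # the file is shorter than 1024 bytes
  end
  data = io.read(1024)

  # normalise \r\n, \n\r and lone \r to \n, then search from the end
  data = data.gsub("\r\n", "\n").gsub("\n\r", "\n").gsub("\r", "\n")
  lines = data.split("\n").reverse

  eof_index = lines.index("%%EOF")
  raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
  raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size - 1

  lines[eof_index + 1].to_i # the number written after "startxref"
end

pdf = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\nxref\n0 1\n" \
      "trailer\n<< /Size 1 >>\nstartxref\r\n30\r\n%%EOF\r\n"
offset = startxref_offset(StringIO.new(pdf)) # => 30
pdf[offset, 4]                               # => "xref"
```

Checking that the bytes at the returned offset begin with xref is a cheap sanity test before attempting to parse the table.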

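#head(chars, with_strip = true) ⇒ Object

Removes and returns the first chars characters of the internal buffer. When with_strip is true, leading whitespace is then stripped from the characters that remain in the buffer.
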
# File 'lib/pdf/reader/buffer.rb', line 102

def head (chars, with_strip=true)
  val = @buffer[0, chars]
  @buffer = @buffer[chars .. -1] || ""
  @buffer.lstrip! if with_strip
  val
end
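
The effect of with_strip is easiest to see on plain strings. This sketch is a hypothetical, non-mutating copy of head named take_head (not part of PDF::Reader) that returns both the extracted characters and what the buffer would contain afterwards.

```ruby
# Hypothetical non-mutating copy of Buffer#head: returns the first `chars`
# characters and the buffer contents that would remain afterwards.
def take_head(buffer, chars, with_strip = true)
  val = buffer[0, chars]
  rest = buffer[chars..-1] || "" # nil when chars is past the end
  rest = rest.lstrip if with_strip
  [val, rest]
end

take_head("obj  << /Type /Page >>", 3) # => ["obj", "<< /Type /Page >>"]
take_head("(  padded)", 1, false)      # => ["(", "  padded)"]
take_head("ab", 5)                     # => ["ab", ""]
```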

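#pos ⇒ Object

Returns the current byte offset of the underlying IO object. Because ready_token reads a whole line into the internal buffer, this can be beyond the start of the next unconsumed token.
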
# File 'lib/pdf/reader/buffer.rb', line 69

def pos
  @io.pos
end

#raw ⇒ Object

Returns the internal buffer used by this class when reading from the IO stream. It is nil until a line has been read, and is reset to nil by seek.

# File 'lib/pdf/reader/buffer.rb', line 110

def raw
  @buffer
end

#read(length) ⇒ Object

Reads the requested number of bytes. Any data already held in the internal buffer is returned first; the remainder is read from the underlying IO stream.

length should be a positive integer.

# File 'lib/pdf/reader/buffer.rb', line 47

def read (length)
  out = ""

  if @buffer and !@buffer.empty?
    out << head(length)
    length -= out.length
  end

  # note: IO#read returns nil at end of file, and String#<< raises TypeError on nil
  out << @io.read(length) if length > 0
  out
end
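
The two-stage strategy in read (drain the buffer, then fetch only the shortfall from the IO) can be sketched with plain strings. The helper read_with_buffer below is hypothetical; unlike the original, it does not strip leading whitespace from what remains in the buffer, and it treats a nil from IO#read as an empty string.

```ruby
require "stringio"

# Hypothetical sketch of Buffer#read: bytes already pulled into the line
# buffer are returned first, and only the shortfall is read from the IO.
def read_with_buffer(buffer, io, length)
  out = +""
  unless buffer.empty?
    out << buffer.slice!(0, length) # consume from the front of the buffer
    length -= out.length
  end
  out << io.read(length).to_s if length > 0 # io.read returns nil at EOF
  out
end

buffer = +"abc"
io = StringIO.new("stream data that follows")
read_with_buffer(buffer, io, 10) # => "abcstream "
buffer                           # => ""
```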

#ready_token(with_strip = true, skip_blanks = true) ⇒ Object

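PDF files are processed by tokenising the content into a series of objects and commands. This method prepares the buffer for tokenising by reading the next line into memory and removing any % comment and the line terminator; when with_strip is true, leading whitespace is removed as well. When skip_blanks is true, it keeps reading until it finds a line that is not empty after this cleanup.
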
# File 'lib/pdf/reader/buffer.rb', line 75

def ready_token (with_strip=true, skip_blanks=true)
  while @buffer.nil? or @buffer.empty?
    @buffer = @io.readline
    # strip % comments; the pattern also cuts a line at a % inside a string literal
    @buffer.sub!(/%.*$/, '')
    @buffer.chomp!
    @buffer.lstrip! if with_strip
    break unless skip_blanks
  end
end
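
The loop in ready_token can be exercised against a StringIO. This hypothetical helper, next_nonblank_line, applies the same comment regex, chomp and lstrip (with with_strip fixed to true) and returns the cleaned line instead of storing it; the last call shows the same regex truncating a string literal that contains %.

```ruby
require "stringio"

# Hypothetical stand-alone version of the loop in Buffer#ready_token:
# keep reading until a line is non-empty after removing the comment,
# the line terminator and leading whitespace.
def next_nonblank_line(io, skip_blanks: true)
  line = ""
  while line.empty?
    line = io.readline                       # raises EOFError once the IO is exhausted
    line = line.sub(/%.*$/, "").chomp.lstrip # drop comment, terminator, indentation
    break unless skip_blanks
  end
  line
end

io = StringIO.new("%PDF-1.4\n% binary comment\n\n  1 0 obj\n")
next_nonblank_line(io)                        # => "1 0 obj"
next_nonblank_line(StringIO.new("(50%) Tj\n")) # => "(50"
```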

#seek(offset) ⇒ Object

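Seeks to the requested byte offset, measured from the start of the IO stream, and discards the internal buffer. Returns the buffer object itself, so calls can be chained.
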
# File 'lib/pdf/reader/buffer.rb', line 38

def seek (offset)
  @io.seek(offset, IO::SEEK_SET)
  @buffer = nil
  self
end
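
Resetting @buffer matters because a buffered line belongs to the old IO position. The sketch below uses a hypothetical LineReader class (not part of PDF::Reader) that caches one line the way Buffer caches @buffer; without the reset in seek, the second call to line would return the stale "%PDF-1.4".

```ruby
require "stringio"

# Hypothetical minimal reader illustrating why Buffer#seek clears @buffer.
class LineReader
  def initialize(io)
    @io = io
    @buffer = nil
  end

  def seek(offset)
    @io.seek(offset, IO::SEEK_SET)
    @buffer = nil # drop data read from the previous position
    self          # returning self allows reader.seek(n).line
  end

  def line
    @buffer ||= @io.readline.chomp
  end
end

reader = LineReader.new(StringIO.new("%PDF-1.4\nxref\n0 1\n"))
reader.line         # => "%PDF-1.4"
reader.seek(9).line # => "xref"
```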

#token ⇒ Object

Returns the next token from the underlying IO stream, reading a new line via ready_token if the internal buffer is empty.

# File 'lib/pdf/reader/buffer.rb', line 86

def token
  ready_token

  # index of the first delimiter or whitespace character, or the line length if none
  i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size

  token_chars = 
    if i == 0 and @buffer[i,2] == "<<"    then 2
    elsif i == 0 and @buffer[i,2] == ">>" then 2
    elsif i == 0                          then 1
    else                                    i
    end

  # keep the whitespace after "(" because it belongs to the string literal
  strip_space = !(i == 0 and @buffer[0,1] == '(')
  head(token_chars, strip_space)
end
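
To make the splitting rule in token concrete, the sketch below re-implements it as two hypothetical helpers, next_token! and tokenize, that operate on a single String rather than on a Buffer (so there is no comment stripping and no reading of further lines). One consequence worth noticing: a name such as /Type comes back as two tokens, "/" and "Type", because "/" is itself in the delimiter class.

```ruby
# Hypothetical stand-alone version of the splitting rule in Buffer#token,
# applied to one line held in a String (no IO, no comment stripping).
DELIMITER = /[\[\]()<>{}\s\/]/

# Removes the next token from the front of buf and returns it.
def next_token!(buf)
  i = buf.index(DELIMITER) || buf.size
  count =
    if i.zero? && ["<<", ">>"].include?(buf[0, 2]) then 2 # dictionary delimiter
    elsif i.zero? then 1                                   # any other single delimiter
    else i                                                 # run of regular characters
    end
  strip = !(i.zero? && buf[0, 1] == "(")                   # keep spaces after "("
  token = buf[0, count]
  buf.replace(buf[count..-1] || "")
  buf.lstrip! if strip
  token
end

# Splits a whole line into tokens using next_token!.
def tokenize(line)
  buf = line.lstrip
  tokens = []
  tokens << next_token!(buf) until buf.empty?
  tokens
end

tokenize("<< /Type /Page /Count 3 >>")
# => ["<<", "/", "Type", "/", "Page", "/", "Count", "3", ">>"]
```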