Class: PDF::Reader::Buffer

Inherits:
Object
Defined in:
lib/pdf/reader/buffer.rb

Overview

An internal PDF::Reader class that mediates access to the underlying PDF file or IO stream.


Constructor Details

#initialize(io) ⇒ Buffer

Creates a new buffer around the specified IO object. The internal buffer starts out empty (nil).



# File 'lib/pdf/reader/buffer.rb', line 32

def initialize (io)
  @io = io
  @buffer = nil
end

Instance Method Details

#eof? ⇒ Boolean

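Returns true if the underlying IO object is at end of file and the internal buffer is empty. It calls #ready_token first, so it may read the next line from the IO stream into the buffer.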

Returns:

  • (Boolean)


# File 'lib/pdf/reader/buffer.rb', line 88

def eof?
  ready_token
  if @buffer
    @buffer.empty? && @io.eof?
  else
    @io.eof?
  end
end
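
The check needs both conditions because the IO stream can already report EOF while unconsumed data still sits in the buffer. The sketch below is a hypothetical, stripped-down stand-in (MiniBuffer is not part of PDF::Reader) that fills its buffer with a simplified version of ready_token and applies the same eof? logic:

```ruby
require "stringio"

# Hypothetical stand-in for PDF::Reader::Buffer, reduced to the eof? logic.
class MiniBuffer
  def initialize(io)
    @io = io
    @buffer = nil
  end

  # Simplified ready_token: load one line if the buffer is empty.
  def fill
    @buffer = @io.readline.chomp if (@buffer.nil? || @buffer.empty?) && !@io.eof?
  end

  # Consume everything currently buffered.
  def drain
    out = @buffer.to_s
    @buffer = ""
    out
  end

  def eof?
    fill
    if @buffer
      @buffer.empty? && @io.eof?
    else
      @io.eof?
    end
  end
end

buf = MiniBuffer.new(StringIO.new("trailer\n"))
p buf.eof?   # => false (the IO is at EOF, but "trailer" is still buffered)
p buf.drain  # => "trailer"
p buf.eof?   # => true
```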

#find_first_xref_offset ⇒ Object

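The xref table in a PDF file acts as an aid for finding the location of the various objects in the file. This method locates the byte offset of the xref table in the underlying IO stream: it examines the last 1024 bytes of the stream, finds the last %%EOF marker, and returns the integer on the line immediately before it (the value that follows the startxref keyword in a well-formed file).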

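Raises:

  • (MalformedPDFError) if no %%EOF marker is found in the last 1024 bytes, or if no line precedes the marker.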


# File 'lib/pdf/reader/buffer.rb', line 159

def find_first_xref_offset
  @io.seek(-1024, IO::SEEK_END) rescue seek(0)
  data = @io.read(1024)

  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
  # To ensure we find the xref offset correctly, change all possible options to a
  # standard format
  data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
  lines = data.split(/\n/).reverse

  eof_index = nil

  lines.each_with_index do |line, index|
    if line =~ /^%%EOF\r?$/
      eof_index = index
      break
    end
  end

  raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
  raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
  lines[eof_index+1].to_i
end
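
To make the search concrete, here is a self-contained sketch of the same approach applied to an in-memory file. startxref_offset is a hypothetical helper, not part of PDF::Reader, and it raises ArgumentError where the library raises MalformedPDFError:

```ruby
require "stringio"

# Hypothetical helper mirroring find_first_xref_offset's approach.
def startxref_offset(io)
  # Look at the last 1024 bytes; seeking before the start raises
  # Errno::EINVAL, so small inputs are read from the beginning instead.
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue Errno::EINVAL
    io.seek(0)
  end
  data = io.read(1024).to_s

  # Normalise \r\n, \n\r and bare \r to \n, then scan lines from the end.
  data = data.gsub("\r\n", "\n").gsub("\n\r", "\n").gsub("\r", "\n")
  lines = data.split("\n").reverse

  eof_index = lines.index { |line| line == "%%EOF" }
  raise ArgumentError, "no %%EOF marker" if eof_index.nil?
  raise ArgumentError, "no offset before %%EOF" if eof_index >= lines.size - 1

  # In reversed order, the entry after %%EOF is the line just before it.
  lines[eof_index + 1].to_i
end

trailer = "xref\r\n0 1\r\n0000000000 65535 f\r\ntrailer\r\n<< /Size 1 >>\r\nstartxref\r\n9\r\n%%EOF\r\n"
p startxref_offset(StringIO.new("%PDF-1.4\n" + trailer))  # => 9
```

The offset 9 is the length of the "%PDF-1.4\n" header, which is where the xref keyword starts in this toy file.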

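#head(chars, with_strip = true) ⇒ Object

Removes and returns the first chars characters of the internal buffer. When with_strip is true, leading whitespace is then stripped from what remains in the buffer.
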
# File 'lib/pdf/reader/buffer.rb', line 144

def head (chars, with_strip=true)
  val = @buffer[0, chars]
  @buffer = @buffer[chars .. -1] || ""
  @buffer.lstrip! if with_strip
  val
end
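
head relies on two details of Ruby string slicing: str[0, n] returns at most n characters, and str[n..-1] returns nil once n is past the end of the string, which is why the original falls back to "". A hypothetical standalone version (take_head is an illustrative name, not a library method) that returns both the taken text and the remaining buffer:

```ruby
# Hypothetical standalone version of Buffer#head operating on a plain string.
# Returns [taken, remaining].
def take_head(buffer, chars, with_strip = true)
  val = buffer[0, chars]
  rest = buffer[chars..-1] || ""   # nil when chars > buffer.size
  rest = rest.lstrip if with_strip
  [val, rest]
end

p take_head("obj  << /Type /Page >>", 3)   # => ["obj", "<< /Type /Page >>"]
p take_head("( keep )", 1, false)          # => ["(", " keep )"]
p "abc"[4..-1]                             # => nil
```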

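#pos ⇒ Object

Returns the current byte position of the underlying IO stream. This does not account for data already read into the internal buffer; see #pos_without_buf.
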
# File 'lib/pdf/reader/buffer.rb', line 97

def pos
  @io.pos
end

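#pos_without_buf ⇒ Object

Returns the IO position minus the length of the unconsumed internal buffer, an approximation of the offset of the next unconsumed byte. Because the buffered line has had its line terminator removed by chomp! (and possibly leading whitespace stripped), the result can be a few bytes past the true offset.
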
# File 'lib/pdf/reader/buffer.rb', line 101

def pos_without_buf
  @io.pos - @buffer.to_s.size
end
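
The arithmetic can be reproduced with a plain StringIO. This sketch (not library code) fills a string the way ready_token does, with readline followed by chomp!, and then evaluates the same expression as pos_without_buf:

```ruby
require "stringio"

io = StringIO.new("abc\ndef\n")
buffer = io.readline   # "abc\n"; io.pos is now 4
buffer.chomp!          # "abc"

# Same expression as pos_without_buf.
approx = io.pos - buffer.to_s.size
p approx   # => 1, although the buffered text starts at offset 0
```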

#raw ⇒ Object

Returns the internal buffer used by this class when reading from the IO stream. This is nil until the first line has been read, and again after every #seek.



# File 'lib/pdf/reader/buffer.rb', line 152

def raw
  @buffer
end

#read(length) ⇒ Object

Reads the requested number of bytes, taking them first from the internal buffer and then, if more are needed, from the underlying IO stream.

length should be a positive integer.



# File 'lib/pdf/reader/buffer.rb', line 47

def read (length)
  out = ""

  if @buffer and !@buffer.empty?
    out << head(length)
    length -= out.length
  end

  out << @io.read(length) if length > 0
  out
end
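
The buffer-first ordering can be shown without the class. read_buffered below is a hypothetical sketch, not a library method; like the original it strips leading whitespace from whatever remains in the buffer, but unlike the original it treats a nil result from io.read (EOF) as an empty string instead of appending nil:

```ruby
require "stringio"

# Hypothetical sketch: read `length` bytes, draining `buffer` before `io`.
# Returns [data, remaining_buffer].
def read_buffered(buffer, io, length)
  out = +""
  if buffer && !buffer.empty?
    out << buffer[0, length]
    buffer = (buffer[length..-1] || "").lstrip
    length -= out.length
  end
  # io.read returns nil at EOF, so guard before appending.
  out << (io.read(length) || "") if length > 0
  [out, buffer]
end

data, rest = read_buffered("<< /Length 6 >>", StringIO.new("\nstream data"), 20)
p data   # => "<< /Length 6 >>\nstre"
p rest   # => ""
```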

#read_until(bytes) ⇒ Object

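Reads until the specified bytes are found. Data already held in the internal buffer is searched first; if the bytes are not there, the buffer is consumed and reading continues one byte at a time from the IO stream.

bytes - the bytes to search for.

A match inside the buffer is returned including the search bytes. A match found while reading from the IO stream is returned without them, and the stream is rewound so they are read next. If the stream reaches EOF without a match, @io.read(1) returns nil and appending it raises TypeError.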



# File 'lib/pdf/reader/buffer.rb', line 62

def read_until(bytes)
  out = ""
  size = bytes.size

  if @buffer && !@buffer.empty?
    if @buffer.include?(bytes)
      offset = @buffer.index(bytes) + size
      return head(offset)
    else
      out << head(@buffer.size)
    end
  end

  loop do
    out << @io.read(1)
    if out[-1 * size,size].eql?(bytes)
      out = out[0, out.size - size]
      seek(pos - size)
      break
    end
  end
  out
end
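
A hypothetical variant of the IO-reading loop, with an EOF guard added (the original appends @io.read(1) unconditionally). read_until_io is an illustrative name, not a library method; like the original IO branch, it excludes the search bytes from the result and rewinds the stream so they are read next:

```ruby
require "stringio"

# Hypothetical variant of read_until's IO loop: returns everything before
# `bytes`, leaves the stream positioned at the start of `bytes`, and
# returns what was read if EOF arrives first.
def read_until_io(io, bytes)
  out = +""
  size = bytes.size
  loop do
    chunk = io.read(1)
    break if chunk.nil?                      # EOF without a match
    out << chunk
    if out[-size, size] == bytes
      out = out[0, out.size - size]
      io.seek(io.pos - size, IO::SEEK_SET)   # rewind so `bytes` is read next
      break
    end
  end
  out
end

io = StringIO.new("BT /F1 12 Tf ET endstream\nendobj")
p read_until_io(io, "endstream")   # => "BT /F1 12 Tf ET "
p io.read(9)                       # => "endstream"
```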

#ready_token(with_strip = true, skip_blanks = true) ⇒ Object

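PDF files are processed by tokenising the content into a series of objects and commands. This prepares the buffer for use: if the internal buffer is empty, it reads lines from the IO stream into it (skipping blank lines when skip_blanks is true, or reading at most one line when it is false), then strips leading whitespace when with_strip is true.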



# File 'lib/pdf/reader/buffer.rb', line 107

def ready_token (with_strip=true, skip_blanks=true)
  while (@buffer.nil? or @buffer.empty?) && !@io.eof?
    @buffer = @io.readline
    @buffer.force_encoding("BINARY") if @buffer.respond_to?(:force_encoding)
    #@buffer.sub!(/%.*$/, '') if strip_comments
    @buffer.chomp!
    break unless skip_blanks
  end
  @buffer.lstrip! if with_strip
end
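
The refill loop can be tried on its own. next_line is a hypothetical standalone version of ready_token that takes the buffer as an argument and returns it; unlike the original, it skips the final lstrip! when the buffer is still nil (an IO that was already at EOF):

```ruby
require "stringio"

# Hypothetical standalone version of ready_token's loop.
# Returns the (possibly refilled) buffer.
def next_line(io, buffer, with_strip: true, skip_blanks: true)
  while (buffer.nil? || buffer.empty?) && !io.eof?
    buffer = io.readline
    buffer.force_encoding("BINARY")
    buffer.chomp!
    break unless skip_blanks   # read at most one line when not skipping blanks
  end
  buffer.lstrip! if with_strip && buffer   # guard: nil on an already-empty IO
  buffer
end

p next_line(StringIO.new("\n\n   1 0 obj\n"), nil)                 # => "1 0 obj"
p next_line(StringIO.new("\nx\n"), nil, skip_blanks: false)        # => ""
```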

#seek(offset) ⇒ Object

Seeks to the requested absolute byte offset in the IO stream and discards the internal buffer. Returns self.



# File 'lib/pdf/reader/buffer.rb', line 38

def seek (offset)
  @io.seek(offset, IO::SEEK_SET)
  @buffer = nil
  self
end
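
Because seek discards the internal buffer and returns self, a caller can reposition and continue in one expression. TinyBuffer below is a hypothetical miniature illustrating that contract, not a library class:

```ruby
require "stringio"

# Hypothetical miniature of Buffer's seek contract.
class TinyBuffer
  def initialize(io)
    @io = io
    @buffer = nil
  end

  def raw
    @buffer
  end

  # Read the next line into the buffer (raises EOFError at end of stream).
  def fill
    @buffer = @io.readline.chomp
    self
  end

  def seek(offset)
    @io.seek(offset, IO::SEEK_SET)
    @buffer = nil   # buffered data no longer matches the stream position
    self            # allow chaining
  end
end

buf = TinyBuffer.new(StringIO.new("%PDF-1.4\n1 0 obj\n"))
p buf.fill.raw           # => "%PDF-1.4"
p buf.seek(9).raw        # => nil
p buf.seek(9).fill.raw   # => "1 0 obj"
```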

#token ⇒ Object

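Returns the next token from the underlying IO stream, or nil if none remain. Tokens are split at the delimiters [ ] ( ) < > { } / and whitespace; << and >> are returned as two-character tokens, and any other delimiter as a one-character token. Comments (tokens beginning with %) are discarded along with the rest of their line.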



# File 'lib/pdf/reader/buffer.rb', line 119

def token
  ready_token

  i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size

  token_chars =
    if i == 0 and @buffer[i,2] == "<<"    then 2
    elsif i == 0 and @buffer[i,2] == ">>" then 2
    elsif i == 0                          then 1
    else                                    i
    end

  strip_space = !(i == 0 and @buffer[0,1] == '(')
  tok = head(token_chars, strip_space)

  if tok == ""
    nil
  elsif tok[0,1] == "%"
    @buffer = ""
    token
  else
    tok
  end
end
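
The splitting rules can be exercised on a single line without the IO machinery. tokenize_line is a hypothetical, simplified re-implementation, not a library method: it applies the same delimiter regex, the <</>> and single-character rules, and the "no strip after (" rule to one string, and it stops at a comment instead of moving on to the next line:

```ruby
# Hypothetical single-line tokenizer using Buffer#token's splitting rules.
DELIMS = /[\[\]()<>{}\s\/]/

def tokenize_line(line)
  buffer = line.lstrip
  tokens = []
  until buffer.empty?
    i = buffer.index(DELIMS) || buffer.size
    n =
      if i == 0 && ["<<", ">>"].include?(buffer[0, 2]) then 2
      elsif i == 0 then 1
      else i
      end
    strip_space = !(i == 0 && buffer[0, 1] == "(")
    tok = buffer[0, n]
    buffer = buffer[n..-1] || ""
    buffer = buffer.lstrip if strip_space
    break if tok.start_with?("%")   # a comment consumes the rest of the line
    tokens << tok
  end
  tokens
end

p tokenize_line("<< /Type /Page >> % trailing comment")
# => ["<<", "/", "Type", "/", "Page", ">>"]
```

Note that, following the same rules, a name such as /Type comes back as two tokens, "/" and "Type".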