Class: PDF::Reader::Buffer

Inherits:

Object

Defined in: lib/pdf/reader/buffer.rb

Overview

An internal PDF::Reader class that mediates access to the underlying PDF file or IO stream: it buffers data read from the stream and splits it into tokens.

Instance Method Summary

#eof? ⇒ Boolean

Returns true if the underlying IO object is at end and the internal buffer is empty.
#find_first_xref_offset ⇒ Object

The xref (cross-reference) table in a PDF file acts as an aid for finding the location of various objects in the file.
#head(chars, with_strip = true) ⇒ Object

Removes and returns the first chars characters of the internal buffer.
#initialize(io) ⇒ Buffer constructor

Creates a new buffer around the specified IO object.
#pos ⇒ Object

Returns the current byte position of the underlying IO object.
#raw ⇒ Object

Returns the internal buffer used by this class when reading from the IO stream.
#read(length) ⇒ Object

Reads the requested number of bytes from the underlying IO stream.
#ready_token(with_strip = true, skip_blanks = true) ⇒ Object

PDF files are processed by tokenising the content into a series of objects and commands.
#seek(offset) ⇒ Object

Seeks to the requested byte in the IO stream.
#token ⇒ Object

Returns the next token from the underlying IO stream.

Constructor Details

#initialize(io) ⇒ `Buffer`

Creates a new buffer around the specified IO object.

# File 'lib/pdf/reader/buffer.rb', line 32

def initialize (io)
  @io = io
  @buffer = nil
end

Instance Method Details

#eof? ⇒ `Boolean`

Returns true if the underlying IO object is at end and the internal buffer is empty. If nothing has been buffered yet, only the IO object is checked.

Returns:

(Boolean)

# File 'lib/pdf/reader/buffer.rb', line 61

def eof?
  if @buffer
    @buffer.empty? && @io.eof?
  else
    @io.eof?
  end
end
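The distinction matters because a line can be fully read from the IO object while its tokens are still waiting in the buffer. A minimal standalone sketch, assuming only Ruby's stringio library: EofSketch and its fill/drain helpers are hypothetical (not part of pdf-reader); only the eof? body is copied from the listing above.

```ruby
require "stringio"

# EofSketch is a hypothetical stand-in, not a pdf-reader class.
# Its eof? body is copied from PDF::Reader::Buffer#eof? above.
class EofSketch
  def initialize(io)
    @io = io
    @buffer = nil
  end

  # Hypothetical helper simulating ready_token: pull the next line into the buffer.
  def fill
    @buffer = @io.readline.chomp
  end

  # Hypothetical helper simulating the consumption of every buffered character.
  def drain
    @buffer = ""
  end

  def eof?
    if @buffer
      @buffer.empty? && @io.eof?
    else
      @io.eof?
    end
  end
end

buf = EofSketch.new(StringIO.new("1 0 obj\n"))
buf.eof?   # => false (nothing buffered, IO not exhausted)
buf.fill
buf.eof?   # => false (IO exhausted, but "1 0 obj" is still buffered)
buf.drain
buf.eof?   # => true
```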

#find_first_xref_offset ⇒ `Object`

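The xref (cross-reference) table in a PDF file acts as an aid for finding the location of various objects in the file. This method attempts to locate the byte offset of the xref table in the underlying IO stream. It reads the last 1024 bytes of the stream (the whole stream, if it is shorter), normalises line endings, finds the last %%EOF marker and returns the integer on the line immediately before it. In a well-formed file, that line holds the offset that follows the startxref keyword.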

Raises:

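(MalformedPDFError) if no %%EOF marker is found, or if no line precedes the marker. A non-numeric line before the marker is not detected: String#to_i returns 0 in that case.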

# File 'lib/pdf/reader/buffer.rb', line 117

def find_first_xref_offset
  @io.seek(-1024, IO::SEEK_END) rescue seek(0)
  data = @io.read(1024)

  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
  # To ensure we find the xref offset correctly, change all possible options to a 
  # standard format
  data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
  lines = data.split(/\n/).reverse

  eof_index = nil

  lines.each_with_index do |line, index|
    if line =~ /^%%EOF\r?$/
      eof_index = index
      break
    end
  end

  raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
  raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
  lines[eof_index+1].to_i
end
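The search can be tried without the gem. The sketch below is a standalone copy of the logic above, assuming only Ruby's stringio library. The method name sketch_xref_offset is hypothetical. It raises ArgumentError instead of MalformedPDFError and narrows the bare rescue to SystemCallError (the Errno::EINVAL raised when seeking before the start of a short stream).

```ruby
require "stringio"

# Hypothetical standalone version of find_first_xref_offset.
# Raises ArgumentError instead of PDF::Reader::MalformedPDFError.
def sketch_xref_offset(io)
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue SystemCallError   # Errno::EINVAL: stream shorter than 1024 bytes
    io.seek(0)
  end
  data = io.read(1024).to_s  # read returns nil on an empty stream

  # Normalise \r\n, \n\r and lone \r to \n, as the original does.
  data = data.gsub("\r\n", "\n").gsub("\n\r", "\n").gsub("\r", "\n")
  lines = data.split("\n").reverse   # last line of the file comes first

  eof_index = lines.index { |line| line =~ /^%%EOF$/ }
  raise ArgumentError, "no %%EOF marker" if eof_index.nil?
  raise ArgumentError, "%%EOF not preceded by an offset" if eof_index >= lines.size - 1

  # In reversed order, the line after %%EOF is the one before it in the file.
  lines[eof_index + 1].to_i
end

tail = "trailer\n<< /Size 5 >>\nstartxref\n1234\n%%EOF\n"
sketch_xref_offset(StringIO.new(tail))                   # => 1234
sketch_xref_offset(StringIO.new(tail.gsub("\n", "\r")))  # => 1234
```

Reversing the lines means the first match is the last %%EOF in the file, which is the one that matters when a PDF has been incrementally updated and carries several trailers.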

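#head(chars, with_strip = true) ⇒ `Object`

Removes and returns the first chars characters of the internal buffer. If with_strip is true, leading whitespace is then stripped from the characters that remain in the buffer.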

# File 'lib/pdf/reader/buffer.rb', line 102

def head (chars, with_strip=true)
  val = @buffer[0, chars]
  @buffer = @buffer[chars .. -1] || ""
  @buffer.lstrip! if with_strip
  val
end
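head has two effects, the returned slice and the trimmed buffer, so a side-effect-free version makes both visible. In this sketch, sketch_head is a hypothetical helper (not a pdf-reader method) that mirrors the body above but returns the remaining buffer instead of storing it.

```ruby
# Hypothetical pure-function version of head: returns [taken, remaining].
def sketch_head(buffer, chars, with_strip = true)
  val = buffer[0, chars]
  rest = buffer[chars..-1] || ""   # nil when chars runs past the end
  rest = rest.lstrip if with_strip
  [val, rest]
end

sketch_head("obj   << /Type", 3)    # => ["obj", "<< /Type"]
sketch_head("(  text)", 1, false)   # => ["(", "  text)"]
```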

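#pos ⇒ `Object`

Returns the current byte position of the underlying IO object. Data already read into the internal buffer but not yet consumed is not taken into account, so this position can be ahead of the next token.
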
# File 'lib/pdf/reader/buffer.rb', line 69

def pos
  @io.pos
end

#raw ⇒ `Object`

Returns the internal buffer used by this class when reading from the IO stream. This is nil until a line has been buffered, and again after a call to seek.

# File 'lib/pdf/reader/buffer.rb', line 110

def raw
  @buffer
end

#read(length) ⇒ `Object`

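Reads the requested number of bytes, taking any data held in the internal buffer first and reading the remainder directly from the underlying IO stream.

length should be a positive integer. Data taken from the internal buffer is removed via head with stripping enabled, so leading whitespace in whatever stays buffered is discarded.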

# File 'lib/pdf/reader/buffer.rb', line 47

def read (length)
  out = ""

  if @buffer and !@buffer.empty?
    out << head(length)
    length -= out.length
  end

  out << @io.read(length) if length > 0
  out
end
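The buffer-first order can be checked with a small stand-in. ReadSketch is hypothetical, and its constructor accepts pre-buffered text only to simulate data left over from tokenising. One deliberate change from the listing: the IO read is followed by .to_s, because IO#read(length) returns nil at end of stream and appending nil to a String raises TypeError.

```ruby
require "stringio"

# Hypothetical class reproducing read and head as listed above.
class ReadSketch
  def initialize(io, buffered = nil)
    @io = io
    @buffer = buffered   # simulates data left in the buffer by tokenising
  end

  def head(chars, with_strip = true)
    val = @buffer[0, chars]
    @buffer = @buffer[chars..-1] || ""
    @buffer.lstrip! if with_strip
    val
  end

  def read(length)
    out = +""
    if @buffer && !@buffer.empty?
      out << head(length)       # buffered data is consumed first
      length -= out.length
    end
    # .to_s is an addition: IO#read returns nil at end of stream.
    out << @io.read(length).to_s if length > 0
    out
  end
end

r = ReadSketch.new(StringIO.new("XYZ"), "ab")
r.read(4)   # => "abXY" (two characters from the buffer, two from the IO)
```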

#ready_token(with_strip = true, skip_blanks = true) ⇒ `Object`

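PDF files are processed by tokenising the content into a series of objects and commands. This prepares the buffer for use by reading the next line of tokens into memory. While the internal buffer is empty, it reads a line from the IO stream, removes any comment (from % to the end of the line), removes the line terminator and, if with_strip is true, strips leading whitespace. If skip_blanks is true, lines that end up empty are skipped; otherwise at most one line is read.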

# File 'lib/pdf/reader/buffer.rb', line 75

def ready_token (with_strip=true, skip_blanks=true)
  while @buffer.nil? or @buffer.empty?
    @buffer = @io.readline
    @buffer.sub!(/%.*$/, '')
    @buffer.chomp!
    @buffer.lstrip! if with_strip
    break unless skip_blanks
  end
end
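The per-line clean-up can be seen in isolation with a hypothetical function, sketch_next_line (not a pdf-reader method), that follows the loop above but returns the prepared line instead of storing it in @buffer. It assumes only Ruby's stringio library.

```ruby
require "stringio"

# Hypothetical standalone version of ready_token's line preparation.
def sketch_next_line(io, with_strip = true, skip_blanks = true)
  line = nil
  while line.nil? || line.empty?
    line = io.readline
    line = line.sub(/%.*$/, "").chomp   # drop the comment, then the terminator
    line = line.lstrip if with_strip
    break unless skip_blanks
  end
  line
end

io = StringIO.new("% header comment\n\n   1 0 obj % first object\n")
sketch_next_line(io)   # => "1 0 obj " (comment line and blank line skipped)
```

Only leading whitespace is stripped, so the space before the removed comment survives. The comment pattern also does not know about string literals: a line such as `(50%) Tj` would be cut at the `%`.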

#seek(offset) ⇒ `Object`

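Seeks to the requested byte offset, measured from the start of the IO stream, and discards the internal buffer. Returns self, so calls can be chained.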

# File 'lib/pdf/reader/buffer.rb', line 38

def seek (offset)
  @io.seek(offset, IO::SEEK_SET)
  @buffer = nil
  self
end
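Discarding the buffer on seek is what keeps stale tokens from the old position from leaking into reads at the new one. The hypothetical SeekSketch below (not a pdf-reader class) copies seek and pos from the listings above and adds a fill_line helper that stands in for ready_token.

```ruby
require "stringio"

# Hypothetical stand-in; seek and pos follow the listings above.
class SeekSketch
  attr_reader :buffer

  def initialize(io)
    @io = io
    @buffer = nil
  end

  # Hypothetical helper standing in for ready_token.
  def fill_line
    @buffer = @io.readline.chomp
  end

  def seek(offset)
    @io.seek(offset, IO::SEEK_SET)
    @buffer = nil   # drop data buffered from the old position
    self            # allows chaining, e.g. seek(9).pos
  end

  def pos
    @io.pos
  end
end

s = SeekSketch.new(StringIO.new("%PDF-1.4\n1 0 obj\n"))
s.fill_line    # => "%PDF-1.4"
s.seek(9).pos  # => 9
s.buffer       # => nil
s.fill_line    # => "1 0 obj"
```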

#token ⇒ `Object`

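Returns the next token from the underlying IO stream. Tokens are separated by whitespace and by the delimiter characters [ ] ( ) < > { } and /. Each delimiter is returned as a one-character token, except that << and >> are returned whole. Whitespace after a token is skipped, except after an opening parenthesis, where it may belong to a string literal.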

# File 'lib/pdf/reader/buffer.rb', line 86

def token
  ready_token

  i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size

  token_chars = 
    if i == 0 and @buffer[i,2] == "<<"    then 2
    elsif i == 0 and @buffer[i,2] == ">>" then 2
    elsif i == 0                          then 1
    else                                    i
    end

  strip_space = !(i == 0 and @buffer[0,1] == '(')
  head(token_chars, strip_space)
end
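Tracing token by hand is tedious, so here is a standalone sketch, assuming only Ruby's stringio library. TokenSketch is hypothetical. Its head, ready_token and token bodies follow the listings above, with the two << / >> branches merged into a single condition.

```ruby
require "stringio"

# Hypothetical standalone copy of head, ready_token and token.
class TokenSketch
  def initialize(io)
    @io = io
    @buffer = nil
  end

  def head(chars, with_strip = true)
    val = @buffer[0, chars]
    @buffer = @buffer[chars..-1] || ""
    @buffer.lstrip! if with_strip
    val
  end

  def ready_token(with_strip = true, skip_blanks = true)
    while @buffer.nil? || @buffer.empty?
      @buffer = @io.readline      # raises EOFError at end of stream
      @buffer.sub!(/%.*$/, "")    # drop comments
      @buffer.chomp!
      @buffer.lstrip! if with_strip
      break unless skip_blanks
    end
  end

  def token
    ready_token
    # Position of the first delimiter or whitespace character.
    i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size
    token_chars =
      if i == 0 && (@buffer[0, 2] == "<<" || @buffer[0, 2] == ">>") then 2
      elsif i == 0 then 1
      else i
      end
    # Keep whitespace after "(" since it may be part of a string literal.
    strip_space = !(i == 0 && @buffer[0, 1] == "(")
    head(token_chars, strip_space)
  end
end

t = TokenSketch.new(StringIO.new("<</Type /Page>> % a comment\n[1 2]\n"))
Array.new(10) { t.token }
# => ["<<", "/", "Type", "/", "Page", ">>", "[", "1", "2", "]"]
```

Note that a name such as /Type comes back as two tokens, "/" and "Type", and that the comment is gone before tokenising starts.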
