Class: PDF::Reader::Buffer

Inherits: Object

Defined in: lib/pdf/reader/buffer.rb

Overview

An internal PDF::Reader class that mediates access to the underlying PDF file or IO stream.

Instance Method Summary

#eof? ⇒ Boolean

Returns true if the underlying IO object is at its end and the internal buffer is empty.

#find_first_xref_offset ⇒ Object

The Xref table in a PDF file acts as an aid for finding the location of various objects in the file.

#head(chars, with_strip = true) ⇒ Object

#initialize(io) ⇒ Buffer (constructor)

Creates a new buffer around the specified IO object.

#pos ⇒ Object

#pos_without_buf ⇒ Object

#raw ⇒ Object

Returns the internal buffer used by this class when reading from the IO stream.

#read(length) ⇒ Object

Reads the requested number of bytes from the underlying IO stream.

#read_until(bytes) ⇒ Object

Reads until the specified byte sequence is found.

#ready_token(with_strip = true, skip_blanks = true) ⇒ Object

PDF files are processed by tokenising the content into a series of objects and commands.

#seek(offset) ⇒ Object

Seeks to the requested byte in the IO stream.

#token ⇒ Object

Returns the next token from the underlying IO stream.

Constructor Details

#initialize(io) ⇒ `Buffer`

Creates a new buffer around the specified IO object. The internal buffer starts out empty (nil) and is filled lazily when tokens are requested.

# File 'lib/pdf/reader/buffer.rb', line 32

def initialize(io)
  @io = io
  @buffer = nil
end

Instance Method Details

#eof? ⇒ `Boolean`

Returns true if the underlying IO object is at its end and the internal buffer is empty.

Returns:

(Boolean)

# File 'lib/pdf/reader/buffer.rb', line 88

def eof?
  ready_token
  if @buffer
    @buffer.empty? && @io.eof?
  else
    @io.eof?
  end
end

#find_first_xref_offset ⇒ `Object`

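The Xref table in a PDF file acts as an aid for finding the location of various objects in the file. This method attempts to locate the byte offset of the xref table in the underlying IO stream. It reads the last 1024 bytes of the stream, finds the %%EOF marker, and returns the integer on the line immediately before it. A MalformedPDFError is raised if the marker is missing or no line precedes it.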

Raises:

(MalformedPDFError)

# File 'lib/pdf/reader/buffer.rb', line 159

def find_first_xref_offset
  @io.seek(-1024, IO::SEEK_END) rescue seek(0)
  data = @io.read(1024)

  # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
  # To ensure we find the xref offset correctly, change all possible options to a
  # standard format
  data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
  lines = data.split(/\n/).reverse

  eof_index = nil

  lines.each_with_index do |line, index|
    if line =~ /^%%EOF\r?$/
      eof_index = index
      break
    end
  end

  raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
  raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
  lines[eof_index+1].to_i
end
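
The lookup above can be tried outside the library. The following is a minimal sketch, not PDF::Reader's own code: `xref_offset_from_tail` is a hypothetical standalone helper, it raises ArgumentError in place of MalformedPDFError so that it runs without the library, and the sample trailer is invented for illustration.

```ruby
require "stringio"

# Hypothetical helper (not part of PDF::Reader) mirroring the approach above:
# read the tail of the stream, normalise line endings, then return the integer
# found on the line just before the %%EOF marker.
def xref_offset_from_tail(io)
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue SystemCallError
    io.seek(0) # stream is shorter than 1024 bytes, so read it from the start
  end
  data = io.read(1024).to_s

  # Collapse \r\n, \n\r and bare \r into \n, in the same order as the original.
  data = data.gsub("\r\n", "\n").gsub("\n\r", "\n").gsub("\r", "\n")
  lines = data.split("\n").reverse

  eof_index = lines.index { |line| line.strip == "%%EOF" }
  raise ArgumentError, "no %%EOF marker found" if eof_index.nil?
  raise ArgumentError, "no line precedes %%EOF" if eof_index >= lines.size - 1

  lines[eof_index + 1].to_i
end

# Invented trailer with mixed line endings.
tail = "trailer\n<< /Size 6 >>\r\nstartxref\r\n1234\r\n%%EOF\r\n"
offset = xref_offset_from_tail(StringIO.new(tail))
```

Because the lines are reversed before the search, the marker closest to the end of the stream is the one that is matched.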

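#head(chars, with_strip = true) ⇒ `Object`

Removes and returns the first chars characters of the internal buffer. When with_strip is true, leading whitespace is stripped from what remains in the buffer.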

# File 'lib/pdf/reader/buffer.rb', line 144

def head(chars, with_strip = true)
  val = @buffer[0, chars]
  @buffer = @buffer[chars .. -1] || ""
  @buffer.lstrip! if with_strip
  val
end
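
To make the slicing concrete, here is a small sketch that performs the same steps on a plain string. `take_head` is a hypothetical name; unlike #head it returns the remainder instead of mutating an instance variable.

```ruby
# Hypothetical standalone version of the slicing done by #head. It returns
# both the consumed prefix and the remaining buffer.
def take_head(buffer, chars, with_strip = true)
  val  = buffer[0, chars]
  rest = buffer[chars..-1] || "" # nil when chars runs past the end of the string
  rest = rest.lstrip if with_strip
  [val, rest]
end

# Consumes "/Type"; the space before "/Page" is stripped from the remainder.
name, rest = take_head("/Type /Page", 5)

# With with_strip = false the remainder keeps its leading spaces.
paren, literal = take_head("(  text)", 1, false)
```

Passing false for the strip flag matters when the remainder is the body of a literal string, where leading spaces are significant.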

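#pos ⇒ `Object`

Returns the current byte position of the underlying IO stream. Data already held in the internal buffer is not taken into account.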

# File 'lib/pdf/reader/buffer.rb', line 97

def pos
  @io.pos
end

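#pos_without_buf ⇒ `Object`

Returns the position of the underlying IO stream minus the size of the internal buffer, i.e. roughly where the unconsumed buffered data begins.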

# File 'lib/pdf/reader/buffer.rb', line 101

def pos_without_buf
  @io.pos - @buffer.to_s.size
end
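
The arithmetic can be illustrated with a StringIO. This is a sketch under simple assumptions: `logical_position` is a hypothetical helper, and the sample line has no trailing newline. Since #ready_token chomps the line and #head may strip whitespace, bytes can leave the buffer without the IO position changing, so the result is best treated as an approximation in the general case.

```ruby
require "stringio"

# Hypothetical helper: position of the IO minus whatever is still buffered.
# to_s makes a nil buffer count as zero bytes, as in the method above.
def logical_position(io_pos, buffer)
  io_pos - buffer.to_s.size
end

io = StringIO.new("abc def")
buffer = io.readline          # whole line buffered, io.pos is now 7
buffer = buffer[3..-1].lstrip # "abc" consumed, "def" still waiting

position = logical_position(io.pos, buffer) # 7 - 3, the index of "d"
```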

#raw ⇒ `Object`

Returns the internal buffer used by this class when reading from the IO stream.

# File 'lib/pdf/reader/buffer.rb', line 152

def raw
  @buffer
end

#read(length) ⇒ `Object`

Reads the requested number of bytes. Bytes are taken from the internal buffer first; any shortfall is then read from the underlying IO stream.

length should be a positive integer.

# File 'lib/pdf/reader/buffer.rb', line 47

def read(length)
  out = ""

  if @buffer and !@buffer.empty?
    out << head(length)
    length -= out.length
  end

  out << @io.read(length) if length > 0
  out
end
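
The buffer-first behaviour can be sketched on plain values. `read_bytes` is a hypothetical standalone function, not the library method; the `.to_s` guard against a nil result at end of stream is an addition that the original does not have.

```ruby
require "stringio"

# Hypothetical standalone version of the buffer-then-IO read. Returns the
# bytes read and the remaining buffer.
def read_bytes(buffer, io, length)
  out = +""

  unless buffer.empty?
    out << buffer[0, length]
    buffer = buffer[length..-1] || ""
    length -= out.length
  end

  # IO#read returns nil at end of stream; to_s turns that into "" (an addition).
  out << io.read(length).to_s if length > 0
  [out, buffer]
end

# Three bytes come from the buffer, the remaining two from the stream.
data, rest = read_bytes("abc", StringIO.new("defgh"), 5)
```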

#read_until(bytes) ⇒ `Object`

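Reads until the specified byte sequence is found. The internal buffer is searched first; if the sequence is not there, the stream is read one byte at a time until it appears, and the stream is then repositioned to the start of the sequence.

bytes - the bytes to search for.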

# File 'lib/pdf/reader/buffer.rb', line 62

def read_until(bytes)
  out = ""
  size = bytes.size

  if @buffer && !@buffer.empty?
    if @buffer.include?(bytes)
      offset = @buffer.index(bytes) + size
      return head(offset)
    else
      out << head(@buffer.size)
    end
  end

  loop do
    out << @io.read(1)
    if out[-1 * size,size].eql?(bytes)
      out = out[0, out.size - size]
      seek(pos - size)
      break
    end
  end
  out
end
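
The byte-at-a-time scan of the stream can be sketched on its own. `read_until_marker` is a hypothetical function that covers only the IO part of the method (no internal buffer), and its end-of-stream check is an addition: it returns whatever was read if the marker never appears.

```ruby
require "stringio"

# Hypothetical standalone version of the scan loop: read one byte at a time
# until the output ends with the marker, then rewind to the marker's start.
def read_until_marker(io, marker)
  out = +""
  until io.eof? # stop at end of stream instead of looping (an addition)
    out << io.read(1)
    if out.end_with?(marker)
      io.seek(io.pos - marker.size) # the marker becomes the next thing read
      return out[0, out.size - marker.size]
    end
  end
  out
end

io = StringIO.new("BT /F1 12 Tf ET\nendstream\nendobj")
body = read_until_marker(io, "endstream") # everything before the marker
```

After the call the stream is positioned at the marker, so a following read starts with "endstream".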

#ready_token(with_strip = true, skip_blanks = true) ⇒ `Object`

PDF files are processed by tokenising the content into a series of objects and commands. This prepares the buffer for use by reading the next line of tokens into memory.

# File 'lib/pdf/reader/buffer.rb', line 107

def ready_token(with_strip = true, skip_blanks = true)
  while (@buffer.nil? or @buffer.empty?) && !@io.eof?
    @buffer = @io.readline
    @buffer.force_encoding("BINARY") if @buffer.respond_to?(:force_encoding)
    #@buffer.sub!(/%.*$/, '') if strip_comments
    @buffer.chomp!
    break unless skip_blanks
  end
  @buffer.lstrip! if with_strip
end
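
The line-loading loop can be sketched as a function that returns the next non-blank line. `next_nonblank_line` is a hypothetical name; encoding handling is left out, and an exhausted stream yields an empty string.

```ruby
require "stringio"

# Hypothetical standalone version of the loop: keep reading lines until one
# is non-empty after chomping, then strip its leading whitespace.
def next_nonblank_line(io)
  line = ""
  while line.empty? && !io.eof?
    line = io.readline.chomp
  end
  line.lstrip
end

io = StringIO.new("\n\n  1 0 obj\n<<\n")
first  = next_nonblank_line(io) # the two empty lines are skipped
second = next_nonblank_line(io)
```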

#seek(offset) ⇒ `Object`

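Seeks to the requested byte in the IO stream. The internal buffer is discarded, and the buffer object itself is returned so that calls can be chained.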

# File 'lib/pdf/reader/buffer.rb', line 38

def seek(offset)
  @io.seek(offset, IO::SEEK_SET)
  @buffer = nil
  self
end

#token ⇒ `Object`

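Returns the next token from the underlying IO stream, or nil when no token remains. A token that starts with % is treated as a comment: the rest of the buffered line is discarded and the following token is returned instead.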

# File 'lib/pdf/reader/buffer.rb', line 119

def token
  ready_token

  i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size

  token_chars =
    if i == 0 and @buffer[i,2] == "<<"    then 2
    elsif i == 0 and @buffer[i,2] == ">>" then 2
    elsif i == 0                          then 1
    else                                    i
    end

  strip_space = !(i == 0 and @buffer[0,1] == '(')
  tok = head(token_chars, strip_space)

  if tok == ""
    nil
  elsif tok[0,1] == "%"
    @buffer = ""
    token
  else
    tok
  end
end
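
The splitting rules are easier to follow on a single line of input. The sketch below applies the same delimiter pattern and width rules to a string and collects the tokens in an array. `tokenize_line` and `TOKEN_DELIMITERS` are hypothetical names, and the sketch stops at a comment instead of moving on to the next line.

```ruby
# Same delimiter character class as in #token.
TOKEN_DELIMITERS = /[\[\]()<>{}\s\/]/

# Hypothetical single-line tokeniser mirroring the rules above.
def tokenize_line(line)
  buffer = line.lstrip
  tokens = []

  until buffer.empty?
    i = buffer.index(TOKEN_DELIMITERS) || buffer.size

    width =
      if i == 0 && ["<<", ">>"].include?(buffer[0, 2]) then 2 # dictionary markers
      elsif i == 0 then 1                                     # single delimiter
      else i                                                  # run up to the next delimiter
      end

    tok = buffer[0, width]
    break if tok.start_with?("%") # comment: ignore the rest of the line

    tokens << tok
    buffer = buffer[width..-1] || ""
    # Spaces after an opening parenthesis belong to the literal string.
    buffer = buffer.lstrip unless tok == "("
  end

  tokens
end

tokens = tokenize_line("<< /Type /Page >>")
```

Note that the slash is emitted as a token of its own, separate from the name that follows it.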
