Class: PDF::Reader::Buffer
- Inherits: Object
- Inheritance: Object, PDF::Reader::Buffer
- Defined in: lib/pdf/reader/buffer.rb
Overview
An internal PDF::Reader class that mediates access to the underlying PDF file or IO stream.
Instance Method Summary
-
#eof? ⇒ Boolean
Returns true if the underlying IO object is at its end and the internal buffer is empty.
-
#find_first_xref_offset ⇒ Object
The Xref table in a PDF file acts as an aid for finding the location of various objects in the file.
- #head(chars, with_strip = true) ⇒ Object
-
#initialize(io) ⇒ Buffer
constructor
Creates a new buffer around the specified IO object.
- #pos ⇒ Object
-
#raw ⇒ Object
Returns the internal buffer used by this class when reading from the IO stream.
-
#read(length) ⇒ Object
Reads the requested number of bytes from the underlying IO stream.
-
#ready_token(with_strip = true, skip_blanks = true) ⇒ Object
PDF files are processed by tokenising the content into a series of objects and commands.
-
#seek(offset) ⇒ Object
Seek to the requested byte in the IO stream.
-
#token ⇒ Object
Returns the next token from the underlying IO stream.
Constructor Details
#initialize(io) ⇒ Buffer
Creates a new buffer around the specified IO object
    # File 'lib/pdf/reader/buffer.rb', line 32

    def initialize (io)
      @io = io
      @buffer = nil
    end
Instance Method Details
#eof? ⇒ Boolean
Returns true if the underlying IO object is at its end and the internal buffer is empty. If nothing has been buffered yet, only the IO object is checked.
    # File 'lib/pdf/reader/buffer.rb', line 61

    def eof?
      if @buffer
        @buffer.empty? && @io.eof?
      else
        @io.eof?
      end
    end
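The two branches are easier to see with concrete values. The following is a minimal sketch, not the library's code: `buffer_eof?` is a hypothetical free-standing helper that takes the IO object and the buffer as arguments instead of reading instance variables.

```ruby
require "stringio"

# Hypothetical helper mirroring the eof? logic shown above.
# A nil buffer means nothing has been read into the buffer yet.
def buffer_eof?(io, buffer)
  if buffer
    buffer.empty? && io.eof?
  else
    io.eof?
  end
end

io = StringIO.new("last line")
io.read                                  # consume everything, so io.eof? is now true

puts buffer_eof?(io, "unread tokens")    # false: the buffer still holds data
puts buffer_eof?(io, "")                 # true: buffer drained and IO exhausted
puts buffer_eof?(io, nil)                # true: no buffer, IO exhausted
```

The point of the first case is that an exhausted IO object alone does not mean the reader is finished, because tokens may still be waiting in the buffer.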
#find_first_xref_offset ⇒ Object
The Xref table in a PDF file acts as an aid for finding the location of various objects in the file. This method attempts to locate the byte offset of the xref table in the underlying IO stream.
    # File 'lib/pdf/reader/buffer.rb', line 117

    def find_first_xref_offset
      @io.seek(-1024, IO::SEEK_END) rescue seek(0)
      data = @io.read(1024)

      # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
      # To ensure we find the xref offset correctly, change all possible options to a
      # standard format
      data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
      lines = data.split(/\n/).reverse

      eof_index = nil

      lines.each_with_index do |line, index|
        if line =~ /^%%EOF\r?$/
          eof_index = index
          break
        end
      end

      raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
      raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
      lines[eof_index+1].to_i
    end
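To see the lookup on real bytes, the sketch below applies the same steps to a hand-written file tail held in a StringIO. It is an illustration under assumptions: `first_xref_offset` is a hypothetical stand-alone function, it raises ArgumentError instead of the library's MalformedPDFError so that it runs without the gem, and the sample trailer is invented.

```ruby
require "stringio"

# Hypothetical stand-alone version of the lookup shown above.
def first_xref_offset(io)
  # Look at the last 1024 bytes; files shorter than that are read from the start.
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue SystemCallError
    io.seek(0, IO::SEEK_SET)
  end
  data = io.read(1024).to_s

  # Normalise every EOL variant to "\n" before splitting into lines.
  data = data.gsub("\r\n", "\n").gsub("\n\r", "\n").gsub("\r", "\n")
  lines = data.split("\n").reverse

  # Searching the reversed lines finds the %%EOF marker closest to the end of the file.
  eof_index = lines.index { |line| line.match?(/\A%%EOF\z/) }
  raise ArgumentError, "no EOF marker found" if eof_index.nil?
  raise ArgumentError, "no offset line before the EOF marker" if eof_index >= lines.size - 1

  # In the reversed list, the line after %%EOF is the one that precedes it in the file.
  lines[eof_index + 1].to_i
end

# Invented file tail with mixed line endings.
tail = "trailer\n<< /Size 4 >>\r\nstartxref\r187\n%%EOF\n"
puts first_xref_offset(StringIO.new(tail))   # prints 187
```

Reversing the lines means the search starts from the end of the file, so the marker found is the last one in the 1024-byte window rather than the first.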
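#head(chars, with_strip = true) ⇒ Object
Removes and returns the first chars characters of the internal buffer. When with_strip is true, leading whitespace is stripped from what remains.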
    # File 'lib/pdf/reader/buffer.rb', line 102

    def head (chars, with_strip=true)
      val = @buffer[0, chars]
      @buffer = @buffer[chars .. -1] || ""
      @buffer.lstrip! if with_strip
      val
    end
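The effect of with_strip is easiest to see side by side. This is a small sketch rather than the library method: `take_head` is a hypothetical pure function that returns both the removed chunk and the remaining buffer instead of mutating an instance variable.

```ruby
# Hypothetical pure-function variant of #head.
# Returns [chunk, remaining_buffer] and leaves the input string untouched.
def take_head(buffer, chars, with_strip = true)
  val = buffer[0, chars]
  rest = buffer[chars..-1] || ""      # slicing past the end yields nil, so fall back to ""
  rest = rest.lstrip if with_strip
  [val, rest]
end

p take_head("<< /Type /Page >>", 2)   # ["<<", "/Type /Page >>"]
p take_head("(  padded)", 1, false)   # ["(", "  padded)"]
```

The second call shows why stripping is optional: whitespace directly after an opening parenthesis belongs to the string literal and must be preserved.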
#pos ⇒ Object
Returns the current position of the underlying IO object.
    # File 'lib/pdf/reader/buffer.rb', line 69

    def pos
      @io.pos
    end
#raw ⇒ Object
Returns the internal buffer used by this class when reading from the IO stream.

    # File 'lib/pdf/reader/buffer.rb', line 110

    def raw
      @buffer
    end
#read(length) ⇒ Object
Reads the requested number of bytes. Any data already held in the internal buffer is consumed first, and the remainder is read from the underlying IO stream.
length should be a positive integer.
    # File 'lib/pdf/reader/buffer.rb', line 47

    def read (length)
      out = ""

      if @buffer and !@buffer.empty?
        out << head(length)
        length -= out.length
      end

      out << @io.read(length) if length > 0
      out
    end
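The order in which bytes are served (buffer first, then the IO stream) can be shown with a cut-down stand-in. `TinyBuffer` is a hypothetical class written for this example; it accepts pre-buffered text in its constructor, skips the whitespace stripping that #head performs by default, and treats a nil result from the IO as an empty string.

```ruby
require "stringio"

# Hypothetical stand-in for illustration; not the real PDF::Reader::Buffer.
class TinyBuffer
  def initialize(io, buffered = nil)
    @io = io
    @buffer = buffered
  end

  def read(length)
    out = String.new
    if @buffer && !@buffer.empty?
      # Serve as much as possible from the buffer.
      chunk = @buffer[0, length]
      @buffer = @buffer[length..-1] || ""
      out << chunk
      length -= chunk.length
    end
    # Whatever is still missing comes from the IO stream.
    out << @io.read(length).to_s if length > 0
    out
  end
end

buf = TinyBuffer.new(StringIO.new("stream data"), "abc")
p buf.read(2)   # "ab"     (entirely from the buffer)
p buf.read(5)   # "cstre"  (one buffered byte, then four from the IO)
```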
#ready_token(with_strip = true, skip_blanks = true) ⇒ Object
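PDF files are processed by tokenising the content into a series of objects and commands. This prepares the buffer for use by reading the next line of tokens into memory. Comments (everything from a % to the end of the line) are removed, and blank lines are skipped while skip_blanks is true.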
    # File 'lib/pdf/reader/buffer.rb', line 75

    def ready_token (with_strip=true, skip_blanks=true)
      while @buffer.nil? or @buffer.empty?
        @buffer = @io.readline
        @buffer.sub!(/%.*$/, '')
        @buffer.chomp!
        @buffer.lstrip! if with_strip
        break unless skip_blanks
      end
    end
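A short run over sample input shows what ends up in the buffer. The sketch assumes a hypothetical helper, `next_token_line`, that follows the same loop but returns the prepared line instead of storing it in @buffer; the input text is invented.

```ruby
require "stringio"

# Hypothetical helper following the ready_token loop shown above.
def next_token_line(io, with_strip = true, skip_blanks = true)
  buffer = nil
  while buffer.nil? || buffer.empty?
    buffer = io.readline               # raises EOFError once the stream is exhausted
    buffer = buffer.sub(/%.*$/, "")    # drop everything from the first % to end of line
    buffer = buffer.chomp
    buffer = buffer.lstrip if with_strip
    break unless skip_blanks
  end
  buffer
end

io = StringIO.new("% a comment line\n\n  /Type /Catalog % trailing\n")
p next_token_line(io)   # "/Type /Catalog "
```

The comment-only line and the empty line both reduce to an empty string, so the loop keeps reading until it reaches a line with content. As written, the pattern treats any % as the start of a comment, including one that appears inside a string literal.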
#seek(offset) ⇒ Object
Seeks to the requested byte in the IO stream. The internal buffer is discarded, and the buffer object itself is returned so that calls can be chained.

    # File 'lib/pdf/reader/buffer.rb', line 38

    def seek (offset)
      @io.seek(offset, IO::SEEK_SET)
      @buffer = nil
      self
    end
#token ⇒ Object
Returns the next token from the underlying IO stream.

    # File 'lib/pdf/reader/buffer.rb', line 86

    def token
      ready_token

      i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size

      token_chars =
        if i == 0 and @buffer[i,2] == "<<" then 2
        elsif i == 0 and @buffer[i,2] == ">>" then 2
        elsif i == 0 then 1
        else i
        end

      strip_space = !(i == 0 and @buffer[0,1] == '(')
      head(token_chars, strip_space)
    end
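Applying the same splitting rules repeatedly to one line shows which tokens come out. This is a sketch under assumptions: `split_tokens` is a hypothetical function that works on a single string rather than an IO stream, and it combines the delimiter search from #token with the chunk removal from #head.

```ruby
# Hypothetical single-line tokeniser built from the rules in #token and #head.
def split_tokens(line)
  buffer = line.lstrip
  tokens = []
  until buffer.empty?
    # Position of the next delimiter, or the whole buffer if there is none.
    i = buffer.index(/[\[\]()<>{}\s\/]/) || buffer.size
    len =
      if i == 0 && ["<<", ">>"].include?(buffer[0, 2])
        2          # dictionary delimiters are two characters wide
      elsif i == 0
        1          # any other delimiter is a token on its own
      else
        i          # ordinary token: everything up to the next delimiter
      end
    # Whitespace after "(" is part of a string literal, so it is kept.
    strip_space = !(i == 0 && buffer[0, 1] == "(")
    tokens << buffer[0, len]
    buffer = buffer[len..-1] || ""
    buffer = buffer.lstrip if strip_space
  end
  tokens
end

p split_tokens("<< /Type /Page /MediaBox [0 0 612 792] >>")
# ["<<", "/", "Type", "/", "Page", "/", "MediaBox", "[", "0", "0", "612", "792", "]", ">>"]
```

Under these rules the slash that introduces a name is returned as a separate token from the name itself, so whatever consumes the tokens has to join them again.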