Class: PDF::Reader::Buffer
- Inherits: Object
  - Object
  - PDF::Reader::Buffer
- Defined in: lib/pdf/reader/buffer.rb
Overview
An internal PDF::Reader class that mediates access to the underlying PDF file or IO stream.
Instance Method Summary
- #eof? ⇒ Boolean
  Returns true if the underlying IO object is at its end and the internal buffer is empty.
- #find_first_xref_offset ⇒ Object
  The Xref table in a PDF file acts as an aid for finding the location of various objects in the file.
- #head(chars, with_strip = true) ⇒ Object
- #initialize(io) ⇒ Buffer (constructor)
  Creates a new buffer around the specified IO object.
- #pos ⇒ Object
- #pos_without_buf ⇒ Object
- #raw ⇒ Object
  Returns the internal buffer used by this class when reading from the IO stream.
- #read(length) ⇒ Object
  Reads the requested number of bytes from the underlying IO stream.
- #read_until(bytes) ⇒ Object
  Reads from the buffer until the specified token is found, or the end of the buffer.
- #ready_token(with_strip = true, skip_blanks = true) ⇒ Object
  PDF files are processed by tokenising the content into a series of objects and commands.
- #seek(offset) ⇒ Object
  Seeks to the requested byte in the IO stream.
- #token ⇒ Object
  Returns the next token from the underlying IO stream.
Constructor Details
#initialize(io) ⇒ Buffer
Creates a new buffer around the specified IO object.
    # File 'lib/pdf/reader/buffer.rb', line 32
    def initialize (io)
      @io = io
      @buffer = nil
    end
Instance Method Details
#eof? ⇒ Boolean
Returns true if the underlying IO object is at its end and the internal buffer is empty.
    # File 'lib/pdf/reader/buffer.rb', line 88
    def eof?
      ready_token
      if @buffer
        @buffer.empty? && @io.eof?
      else
        @io.eof?
      end
    end
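The reason for checking both conditions is that a line can already have been pulled out of the IO while still waiting to be consumed. A minimal sketch of that check, assuming a hypothetical standalone helper (`drained?` is not part of PDF::Reader) that takes the pending line and the IO as arguments:

```ruby
require "stringio"

# Hypothetical helper: input is exhausted only when the pending buffer is
# empty (or absent) AND the IO itself has nothing left to give.
def drained?(pending, io)
  (pending.nil? || pending.empty?) && io.eof?
end

io = StringIO.new("%%EOF")
pending = io.readline  # the whole stream now sits in the pending buffer

io.eof?                # the IO alone reports end of file here
drained?(pending, io)  # the buffered line has not been consumed yet
```

Asking only the IO would report the end of input one line too early in this situation.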
#find_first_xref_offset ⇒ Object
The Xref table in a PDF file acts as an aid for finding the location of various objects in the file. This method attempts to locate the byte offset of the xref table in the underlying IO stream.
    # File 'lib/pdf/reader/buffer.rb', line 159
    def find_first_xref_offset
      @io.seek(-1024, IO::SEEK_END) rescue seek(0)
      data = @io.read(1024)

      # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
      # To ensure we find the xref offset correctly, change all possible options to a
      # standard format
      data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
      lines = data.split(/\n/).reverse

      eof_index = nil

      lines.each_with_index do |line, index|
        if line =~ /^%%EOF\r?$/
          eof_index = index
          break
        end
      end

      raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
      raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
      lines[eof_index+1].to_i
    end
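The lookup can be tried without a real PDF file. The following is a minimal sketch, assuming a hypothetical standalone helper (`first_xref_offset` is not part of PDF::Reader) that repeats the same steps on any IO-like object. It raises ArgumentError rather than MalformedPDFError so that it runs without the library loaded.

```ruby
require "stringio"

# Hypothetical helper mirroring the steps of the listing above.
def first_xref_offset(io)
  # Look only at the final 1024 bytes of the stream.
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue SystemCallError
    io.seek(0) # the stream is shorter than 1024 bytes
  end
  data = io.read(1024).to_s

  # Normalise every EOL style to "\n", then walk the lines from the end.
  data = data.gsub("\r\n", "\n").gsub("\n\r", "\n").gsub("\r", "\n")
  lines = data.split("\n").reverse
  eof_index = lines.index { |line| line =~ /\A%%EOF\z/ }

  raise ArgumentError, "no %%EOF marker found" if eof_index.nil?
  raise ArgumentError, "no offset line before %%EOF" if eof_index >= lines.size - 1

  # The line just before %%EOF holds the byte offset of the xref table.
  lines[eof_index + 1].to_i
end

tail = "trailer\n<< /Size 6 >>\r\nstartxref\r416\n%%EOF\n"
first_xref_offset(StringIO.new(tail)) # the value on the line before %%EOF
```

Because only the last 1024 bytes are inspected, a file with more than that much trailing data after its %%EOF marker would not be handled by this approach.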
#head(chars, with_strip = true) ⇒ Object
    # File 'lib/pdf/reader/buffer.rb', line 144
    def head (chars, with_strip=true)
      val = @buffer[0, chars]
      @buffer = @buffer[chars .. -1] || ""
      @buffer.lstrip! if with_strip
      val
    end
#pos ⇒ Object
    # File 'lib/pdf/reader/buffer.rb', line 97
    def pos
      @io.pos
    end
#pos_without_buf ⇒ Object
    # File 'lib/pdf/reader/buffer.rb', line 101
    def pos_without_buf
      @io.pos - @buffer.to_s.size
    end
#raw ⇒ Object
Returns the internal buffer used by this class when reading from the IO stream.
    # File 'lib/pdf/reader/buffer.rb', line 152
    def raw
      @buffer
    end
#read(length) ⇒ Object
Reads the requested number of bytes from the underlying IO stream.
length should be a positive integer.
    # File 'lib/pdf/reader/buffer.rb', line 47
    def read (length)
      out = ""

      if @buffer and !@buffer.empty?
        out << head(length)
        length -= out.length
      end

      out << @io.read(length) if length > 0
      out
    end
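The two-stage read (pending buffer first, then the IO) can be sketched in isolation. This is an illustration only, assuming a hypothetical helper `buffered_read` that takes the pending buffer as an argument and returns the leftover buffer instead of keeping it in an instance variable.

```ruby
require "stringio"

# Hypothetical helper: serve `length` bytes from the pending buffer first,
# then top up from the IO. Returns [bytes_read, leftover_buffer].
def buffered_read(buffer, io, length)
  out = +""
  unless buffer.nil? || buffer.empty?
    out << buffer[0, length]
    buffer = buffer[length..-1] || ""
    length -= out.length
  end
  out << io.read(length).to_s if length > 0
  [out, buffer]
end

io = StringIO.new("stream\nBT /F1 12 Tf ET\nendstream\n")
pending = io.readline.chomp # "stream" is buffered; the IO sits after the newline
bytes, pending = buffered_read(pending, io, 10)
# bytes combines the 6 buffered characters with 4 more taken from the IO
```

Note that the newline removed when the line was buffered is not part of the result, so a read that crosses from the buffer into the IO does not return a byte-exact copy of the stream.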
#read_until(bytes) ⇒ Object
Reads from the buffer until the specified token is found, or the end of the buffer.
bytes: the bytes to search for.
    # File 'lib/pdf/reader/buffer.rb', line 62
    def read_until(bytes)
      out = ""
      size = bytes.size

      if @buffer && !@buffer.empty?
        if @buffer.include?(bytes)
          offset = @buffer.index(bytes) + size
          return head(offset)
        else
          out << head(@buffer.size)
        end
      end

      loop do
        out << @io.read(1)
        if out[-1 * size,size].eql?(bytes)
          out = out[0, out.size - size]
          seek(pos - size)
          break
        end
      end
      out
    end
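The byte-at-a-time scan in the loop can be illustrated on its own. A minimal sketch, assuming a hypothetical helper `read_up_to` that works directly on an IO and ignores the pending buffer; unlike the listing, it also stops when the stream ends without the marker.

```ruby
require "stringio"

# Hypothetical helper: consume bytes from `io` until `marker` appears, return
# everything before it, and rewind so the marker is the next thing read.
def read_up_to(io, marker)
  out = +""
  until io.eof?
    out << io.read(1)
    if out.end_with?(marker)
      io.seek(io.pos - marker.size)
      return out[0, out.size - marker.size]
    end
  end
  out # marker never found: return whatever was read
end

io = StringIO.new("abc123endstream\nendobj")
body = read_up_to(io, "endstream") # everything before the marker
io.read(9)                         # the marker itself is still unread
```

Reading one byte per iteration is simple but slow on large streams; it is kept here only because it matches the shape of the listing.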
#ready_token(with_strip = true, skip_blanks = true) ⇒ Object
PDF files are processed by tokenising the content into a series of objects and commands. This prepares the buffer for use by reading the next line of tokens into memory.
    # File 'lib/pdf/reader/buffer.rb', line 107
    def ready_token (with_strip=true, skip_blanks=true)
      while (@buffer.nil? or @buffer.empty?) && !@io.eof?
        @buffer = @io.readline
        @buffer.force_encoding("BINARY") if @buffer.respond_to?(:force_encoding)
        #@buffer.sub!(/%.*$/, '') if strip_comments
        @buffer.chomp!
        break unless skip_blanks
      end
      @buffer.lstrip! if with_strip
    end
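The effect of the loop, with both options left at their defaults, is to skip empty lines and hand back the next line with its EOL and leading whitespace removed. A minimal sketch of that behaviour, assuming a hypothetical helper `next_token_line` that returns the line instead of storing it in an instance variable:

```ruby
require "stringio"

# Hypothetical helper: read lines until a non-empty one turns up (or the
# stream ends), then strip leading whitespace. Returns "" at end of stream.
def next_token_line(io)
  line = ""
  # String#b returns a binary (ASCII-8BIT) copy, so PDF bytes are never
  # interpreted as multi-byte characters.
  line = io.readline.b.chomp while line.empty? && !io.eof?
  line.lstrip
end

io = StringIO.new("\n\r\n  1 0 obj\nendobj\n")
first  = next_token_line(io) # blank lines are skipped, indentation removed
second = next_token_line(io)
```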
#seek(offset) ⇒ Object
Seeks to the requested byte in the IO stream and discards any buffered data.
    # File 'lib/pdf/reader/buffer.rb', line 38
    def seek (offset)
      @io.seek(offset, IO::SEEK_SET)
      @buffer = nil
      self
    end
#token ⇒ Object
Returns the next token from the underlying IO stream.
    # File 'lib/pdf/reader/buffer.rb', line 119
    def token
      ready_token

      i = @buffer.index(/[\[\]()<>{}\s\/]/) || @buffer.size

      token_chars =
        if i == 0 and @buffer[i,2] == "<<" then 2
        elsif i == 0 and @buffer[i,2] == ">>" then 2
        elsif i == 0 then 1
        else i
        end

      strip_space = !(i == 0 and @buffer[0,1] == '(')
      tok = head(token_chars, strip_space)

      if tok == ""
        nil
      elsif tok[0,1] == "%"
        @buffer = ""
        token
      else
        tok
      end
    end
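The delimiter rule in the listing can be applied to a single line of text to see which tokens it produces. A minimal sketch, assuming a hypothetical helper `split_tokens`; it leaves out the comment handling and the special case that preserves whitespace after an opening parenthesis.

```ruby
# Delimiters as in the listing above: brackets, slash and whitespace.
DELIMITER = /[\[\]()<>{}\s\/]/

# Hypothetical helper: "<<" and ">>" are two-character tokens, any other
# delimiter is a token on its own, and anything else runs up to the next
# delimiter or whitespace.
def split_tokens(line)
  rest = line.lstrip
  tokens = []
  until rest.empty?
    i = rest.index(DELIMITER) || rest.size
    width = if i.zero? && ["<<", ">>"].include?(rest[0, 2]) then 2
            elsif i.zero? then 1
            else i
            end
    tokens << rest[0, width]
    rest = rest[width..-1].lstrip
  end
  tokens
end

split_tokens("<< /Type /Page /Count 3 >>")
# => ["<<", "/", "Type", "/", "Page", "/", "Count", "3", ">>"]
```

The slash that introduces a PDF name is returned as its own token under this rule, so joining it with the following word is left to the caller.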