Class: PDF::Reader::Buffer
- Inherits:
- Object
  - Object
  - PDF::Reader::Buffer
- Defined in:
- lib/pdf/reader/buffer.rb
Overview
A string tokeniser that recognises PDF grammar. When passed an IO stream or a string, repeated calls to token() will return the next token from the source.
This is very low level, and getting the raw tokens is not very useful in itself.
This will usually be used in conjunction with PDF::Reader::Parser, which converts the raw tokens into objects we can work with (strings, ints, arrays, etc.).
Instance Attribute Summary
-
#pos ⇒ Object
readonly
Returns the value of attribute pos.
Instance Method Summary
-
#empty? ⇒ Boolean
return true if there are no more tokens left.
-
#find_first_xref_offset ⇒ Object
return the byte offset where the first XRef table in the source can be found.
-
#initialize(io, opts = {}) ⇒ Buffer
constructor
Creates a new buffer.
-
#read(bytes, opts = {}) ⇒ Object
return raw bytes from the underlying IO stream.
-
#read_until(needle) ⇒ Object
return raw bytes from the underlying IO stream.
-
#token ⇒ Object
return the next token from the source.
Constructor Details
#initialize(io, opts = {}) ⇒ Buffer
Creates a new buffer.
Params:
io - an IO stream or string with the raw data to tokenise
options:
:seek - a byte offset to seek to before starting to tokenise
    # File 'lib/pdf/reader/buffer.rb', line 52
    def initialize(io, opts = {})
      @io = io
      @tokens = []
      @options = opts
      @io.seek(opts[:seek]) if opts[:seek]
      @pos = @io.pos
    end
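The effect of the :seek option can be tried out on a plain StringIO. The sketch below does not use PDF::Reader; it only mirrors the two IO calls from the constructor listing, and `start_position` is a hypothetical helper name introduced for illustration.

```ruby
require 'stringio'

# Mirrors the constructor's handling of :seek on a plain StringIO.
# `start_position` is a hypothetical helper, not part of PDF::Reader.
def start_position(io, opts = {})
  io.seek(opts[:seek]) if opts[:seek]
  io.pos
end

data = "%PDF-1.4\n1 0 obj\n"

# Without :seek the stream is left at byte 0.
start_position(StringIO.new(data))

# With :seek => 9 the stream starts at the "1" of "1 0 obj".
start_position(StringIO.new(data), :seek => 9)
```

Because the offset is passed straight to `seek`, it is a byte offset from the start of the stream, not a token or line count.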
Instance Attribute Details
#pos ⇒ Object (readonly)
Returns the value of attribute pos.
    # File 'lib/pdf/reader/buffer.rb', line 40
    def pos
      @pos
    end
Instance Method Details
#empty? ⇒ Boolean
return true if there are no more tokens left
    # File 'lib/pdf/reader/buffer.rb', line 63
    def empty?
      prepare_tokens if @tokens.size < 3

      @tokens.empty?
    end
#find_first_xref_offset ⇒ Object
return the byte offset where the first XRef table in the source can be found.
    # File 'lib/pdf/reader/buffer.rb', line 139
    def find_first_xref_offset
      @io.seek(-1024, IO::SEEK_END) rescue @io.seek(0)
      data = @io.read(1024)

      # the PDF 1.7 spec (section #3.4) says that EOL markers can be either \r, \n, or both.
      # To ensure we find the xref offset correctly, change all possible options to a
      # standard format
      data = data.gsub("\r\n","\n").gsub("\n\r","\n").gsub("\r","\n")
      lines = data.split(/\n/).reverse

      eof_index = nil

      lines.each_with_index do |line, index|
        if line =~ /^%%EOF\r?$/
          eof_index = index
          break
        end
      end

      raise MalformedPDFError, "PDF does not contain EOF marker" if eof_index.nil?
      raise MalformedPDFError, "PDF EOF marker does not follow offset" if eof_index >= lines.size-1
      lines[eof_index+1].to_i
    end
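To see the scan in isolation, the logic of the listing above can be re-expressed against a plain StringIO. This is a minimal sketch for experimentation, not the library's code: `first_xref_offset` is a hypothetical name, and it raises ArgumentError where the library raises MalformedPDFError.

```ruby
require 'stringio'

# Standalone re-expression of the scan in the listing above.
# `first_xref_offset` is a hypothetical helper, not part of PDF::Reader.
def first_xref_offset(io)
  # Look at the last 1024 bytes, or the whole source if it is shorter.
  begin
    io.seek(-1024, IO::SEEK_END)
  rescue SystemCallError
    io.seek(0)
  end

  # Normalise every EOL style to "\n" before splitting into lines.
  tail = io.read(1024).gsub("\r\n", "\n").gsub("\n\r", "\n").gsub("\r", "\n")
  lines = tail.split("\n").reverse

  # Walking backwards, the line after %%EOF holds the offset.
  eof_index = lines.index { |line| line == "%%EOF" }
  raise ArgumentError, "no EOF marker" if eof_index.nil?
  raise ArgumentError, "EOF marker does not follow offset" if eof_index >= lines.size - 1

  lines[eof_index + 1].to_i
end

# A trailer with deliberately mixed EOL markers.
trailer = "trailer\n<< /Size 4 >>\r\nstartxref\r416\n%%EOF\n"
first_xref_offset(StringIO.new(trailer)) # => 416
```

Reversing the lines means the search starts from the end of the file, so the %%EOF marker closest to the end is the one that is used.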
#read(bytes, opts = {}) ⇒ Object
return raw bytes from the underlying IO stream.
bytes - the number of bytes to read
options:
:skip_eol - if true, the IO stream is advanced past any LF or CR
bytes before it reads any data. This is to handle
content streams, which have a CRLF or LF after the stream
token.
    # File 'lib/pdf/reader/buffer.rb', line 80
    def read(bytes, opts = {})
      reset_pos

      if opts[:skip_eol]
        done = false
        while !done
          chr = @io.read(1)
          if chr.nil?
            return nil
          elsif chr != "\n" && chr != "\r"
            @io.seek(-1, IO::SEEK_CUR)
            done = true
          end
        end
      end

      bytes = @io.read(bytes)
      save_pos
      bytes
    end
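The :skip_eol behaviour can be sketched on a plain StringIO, leaving out the position bookkeeping (`reset_pos` and `save_pos`) that the real method performs. `read_skipping_eol` is a hypothetical helper written for this example.

```ruby
require 'stringio'

# Sketch of the :skip_eol behaviour on a plain StringIO.
# `read_skipping_eol` is a hypothetical helper, not a PDF::Reader method.
def read_skipping_eol(io, bytes)
  loop do
    chr = io.read(1)
    return nil if chr.nil?              # ran out of data while skipping
    next if chr == "\n" || chr == "\r"  # still inside the EOL run
    io.seek(-1, IO::SEEK_CUR)           # step back onto the first data byte
    break
  end
  io.read(bytes)
end

io = StringIO.new("stream\r\nBT /F1 12 Tf ET\nendstream")
io.read(6)                # consume the "stream" keyword
read_skipping_eol(io, 15) # => "BT /F1 12 Tf ET"
```

Note that every leading CR or LF is skipped, not just one EOL marker, so this approach suits the bytes immediately after the stream keyword rather than arbitrary binary data.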
#read_until(needle) ⇒ Object
return raw bytes from the underlying IO stream. All bytes up to the first occurrence of needle will be returned. The match (if any) is not returned. The IO stream cursor is left on the first byte of the match.
needle - a string to search the IO stream for
    # File 'lib/pdf/reader/buffer.rb', line 107
    def read_until(needle)
      reset_pos
      out = ""
      size = needle.size

      while out[size * -1, size] != needle && !@io.eof?
        out << @io.read(1)
      end

      if out[size * -1, size] == needle
        out = out[0, out.size - size]
        @io.seek(size * -1, IO::SEEK_CUR)
      end

      save_pos
      out
    end
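The cursor behaviour described for read_until (the match is excluded from the result and the stream is left on its first byte) can be checked with a standalone sketch on a StringIO. `read_up_to` is a hypothetical name, and the position bookkeeping of the real method is omitted.

```ruby
require 'stringio'

# Sketch of the read_until loop on a plain StringIO.
# `read_up_to` is a hypothetical helper, not a PDF::Reader method.
def read_up_to(io, needle)
  out = +""
  size = needle.size

  # Read one byte at a time until the tail of `out` equals the needle.
  while out[-size, size] != needle && !io.eof?
    out << io.read(1)
  end

  if out[-size, size] == needle
    out = out[0, out.size - size]  # drop the match from the result
    io.seek(-size, IO::SEEK_CUR)   # rewind onto the first byte of the match
  end

  out
end

io = StringIO.new("BT /F1 12 Tf ET\nendstream\nendobj")
read_up_to(io, "endstream") # => "BT /F1 12 Tf ET\n"
io.read(9)                  # => "endstream"
```

If the needle never occurs, the loop runs to the end of the stream and everything that was read is returned.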
#token ⇒ Object
return the next token from the source. Returns a string if a token is found, nil if there are no tokens left.
    # File 'lib/pdf/reader/buffer.rb', line 128
    def token
      reset_pos
      prepare_tokens if @tokens.size < 3
      merge_indirect_reference
      prepare_tokens if @tokens.size < 3

      @tokens.shift
    end
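The listing keeps at least three tokens queued before calling merge_indirect_reference, whose body is not shown on this page. One plausible reading, and it is only an assumption, is that three tokens of lookahead are enough to recognise an indirect reference such as `7 0 R` and collapse it into a single value. The sketch below illustrates that idea with a hypothetical `Ref` struct and `merge_reference` helper; neither is part of PDF::Reader.

```ruby
# Hypothetical stand-ins: the real merge_indirect_reference is not shown
# on this page, so the merge rule below is an assumption.
Ref = Struct.new(:id, :gen)

# Collapse a leading "<int> <int> R" triple into a single Ref.
def merge_reference(tokens)
  return tokens if tokens.size < 3

  id, gen, marker = tokens[0], tokens[1], tokens[2]
  integer = /\A\d+\z/
  if marker == "R" && id.is_a?(String) && gen.is_a?(String) &&
     id.match?(integer) && gen.match?(integer)
    tokens[0, 3] = [Ref.new(id.to_i, gen.to_i)]
  end
  tokens
end

queue = ["7", "0", "R", "/", "Kids"]
merge_reference(queue)
queue.shift # a single Ref for object 7, generation 0
```

Under this assumption, the second `prepare_tokens` call in the listing would top the queue back up after a merge has shortened it by two entries.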