Class: ForkingRegexpLexer
Overview
NOTE: If more performance is needed, it might be good to use one character of lookahead to group tokens and reduce the number of tokens that need to be tested.
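That optimization could be sketched as follows. This is purely illustrative: the `Token` struct and its `first_chars` field are invented here for the sketch and are not part of the rpdf2txt-rockit API, which would need a way to compute the possible first characters of each token's regexp.

```ruby
# Hypothetical sketch of the one-character-lookahead grouping suggested above.
# Assumes each token can report the characters its regexp may start with
# (the real Token class has no such method; first_chars is invented here).
Token = Struct.new(:name, :regexp, :first_chars)

tokens = [
  Token.new(:number, /\d+/,    ("0".."9").to_a),
  Token.new(:ident,  /[a-z]+/, ("a".."z").to_a),
]

# Build the dispatch table once: character => tokens worth testing there.
table = Hash.new { |h, k| h[k] = [] }
tokens.each { |t| t.first_chars.each { |c| table[c] << t } }

# At lex time, one character of lookahead selects a (usually short) bucket,
# so only those tokens' regexps need to be tried at this position.
input = "42"
candidates = table[input[0]]
```

With many token kinds the bucket for any given character is typically much smaller than the full token list, which is where the speedup would come from.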
Direct Known Subclasses
Constant Summary
Instance Attribute Summary
- #eof_token ⇒ Object (readonly)
  Returns the value of attribute eof_token.
- #position ⇒ Object
  Returns the value of attribute position.
- #scanner ⇒ Object (readonly)
  Returns the value of attribute scanner.
- #tokens ⇒ Object (readonly)
  Returns the value of attribute tokens.
Instance Method Summary
- #init(aString) ⇒ Object
- #initialize(tokens, eofToken = nil) ⇒ ForkingRegexpLexer (constructor)
  A new instance of ForkingRegexpLexer.
- #inspect ⇒ Object
- #peek ⇒ Object
  Refactor! Complex interactions when tokens are skipped, since next_lexer updates “our” scanner.
Constructor Details
#initialize(tokens, eofToken = nil) ⇒ ForkingRegexpLexer
Returns a new instance of ForkingRegexpLexer.
# File 'lib/rpdf2txt-rockit/token.rb', line 264

def initialize(tokens, eofToken = nil)
  @tokens = tokens
  @eof_token = tokens.detect {|t| t.kind_of?(EofToken)}
  @tokens.delete_if {|t| t.kind_of?(EofToken)}
end
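As a side note, the detect/delete_if pair in the constructor first picks out the EofToken and then removes every EofToken from the list. A minimal illustration of that pattern, with symbols standing in for token objects:

```ruby
# Same detect/delete_if pattern as in initialize: capture the first element
# matching a predicate, then strip all matching elements from the array.
tokens = [:a, :eof, :b]
eof = tokens.detect { |t| t == :eof }  # => :eof
tokens.delete_if { |t| t == :eof }     # tokens is now [:a, :b]
```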
Instance Attribute Details
#eof_token ⇒ Object (readonly)
Returns the value of attribute eof_token.
# File 'lib/rpdf2txt-rockit/token.rb', line 261

def eof_token
  @eof_token
end
#position ⇒ Object
Returns the value of attribute position.
# File 'lib/rpdf2txt-rockit/token.rb', line 260

def position
  @position
end
#scanner ⇒ Object (readonly)
Returns the value of attribute scanner.
# File 'lib/rpdf2txt-rockit/token.rb', line 261

def scanner
  @scanner
end
#tokens ⇒ Object (readonly)
Returns the value of attribute tokens.
# File 'lib/rpdf2txt-rockit/token.rb', line 261

def tokens
  @tokens
end
Instance Method Details
#init(aString) ⇒ Object
# File 'lib/rpdf2txt-rockit/token.rb', line 272

def init(aString)
  @position, @current_tokens = LexerPosition.new, nil
  @scanner = StringScanner.new(aString)

  # We speed things up by only having one lexer at each position. Since there
  # are typically only a small number of positions we use a BoundedLruCache
  # of size 20 to keep them in. The cache throws out the oldest (least recently
  # used, NOTE! accessed in the cache, not used in the parser) lexer when a
  # new one is inserted. This is to keep the memory consumption down.
  #
  @lexer_cache = BoundedLruCache.new(20)
end
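The BoundedLruCache class itself is defined elsewhere in the library. A hypothetical sketch of the behavior the comment describes (evict the entry least recently accessed in the cache once capacity is exceeded) might look like this; the class name and API here are invented for illustration:

```ruby
# Hypothetical sketch of a bounded LRU cache like the one used in #init.
# The real BoundedLruCache lives elsewhere in rpdf2txt-rockit.
class SimpleLruCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}  # Ruby Hashes preserve insertion order
  end

  def [](key)
    return nil unless @store.key?(key)
    # Re-insert on access so the key becomes the most recently used.
    value = @store.delete(key)
    @store[key] = value
  end

  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    # Evict the least recently used entry when over capacity.
    @store.delete(@store.keys.first) if @store.size > @max_size
  end

  def size
    @store.size
  end
end
```

Note that, as the comment stresses, "least recently used" here means least recently *accessed in the cache*, not least recently used by the parser.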
#inspect ⇒ Object
# File 'lib/rpdf2txt-rockit/token.rb', line 313

def inspect
  "Lexer(#{@position.inspect})"
end
#peek ⇒ Object
Refactor! Complex interactions when tokens are skipped, since next_lexer updates “our” scanner. Find a cleaner way of expressing this!
# File 'lib/rpdf2txt-rockit/token.rb', line 287

def peek
  return @current_tokens if @current_tokens
  scanner.pointer = @position.char_position
  @current_tokens = Array.new
  tokens.each do |token|
    if (match = scanner.check(token.regexp))
      if token.skip
        # Token to be skipped => return tokens matching after the skipped one
        @current_tokens.concat next_lexer(match).peek
        scanner.pointer = @position.char_position
      else
        @current_tokens.push LexerToken.new(match, token, next_lexer(match), @position)
      end
    end
  end
  if @current_tokens.length == 0
    @string_length = scanner.string.length unless @string_length
    if @position.char_position >= @string_length
      @current_tokens.push LexerToken.new(nil, eof_token || @@eof_token, nil, @position)
    end
  end
  return @current_tokens
end
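The method relies on StringScanner#check from Ruby's standard strscan library, which tests a regexp at the current position without consuming input. That is what allows peek to try every token regexp against the same position (and why the pointer must be reset after a skipped token's next_lexer runs). A minimal illustration:

```ruby
require 'strscan'

# StringScanner#check matches at the current position without advancing
# the scan pointer, unlike #scan which consumes the match.
s = StringScanner.new("abc 123")
match = s.check(/\w+/)  # => "abc"
# The pointer has not moved, so another regexp can be checked here too:
s.pos                   # => 0
```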