Class: ForkingRegexpLexer

Inherits:
Object
Defined in:
lib/rpdf2txt-rockit/token.rb

Overview

NOTE: If more performance is needed, it might help to use one character of lookahead to group tokens and so reduce the number of tokens that need to be tested.
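The lookahead idea in the note above could be sketched roughly as follows. This is only an illustration, not code from the library: `Tok`, its `first_chars` field, `TOKENS`, `BY_FIRST_CHAR`, and `candidates_at` are all hypothetical names, and the sketch assumes each token can declare the characters it may start with.

```ruby
require 'strscan'

# Hypothetical token type: each token declares the characters it can start with.
Tok = Struct.new(:name, :regexp, :first_chars)

TOKENS = [
  Tok.new(:number, /\d+/,      ('0'..'9').to_a),
  Tok.new(:ident,  /[a-z]\w*/, ('a'..'z').to_a),
  Tok.new(:if_kw,  /if\b/,     ['i'])
]

# Index the tokens by possible first character, so that only plausible
# candidates are tested against the input at a given position.
BY_FIRST_CHAR = Hash.new { |hash, key| hash[key] = [] }
TOKENS.each do |tok|
  tok.first_chars.each { |char| BY_FIRST_CHAR[char] << tok }
end

# Returns the names of all tokens matching at the scanner's current position.
# Assumes single-byte (ASCII) input, since StringScanner#peek counts bytes.
def candidates_at(scanner)
  lookahead = scanner.peek(1)                 # "" at end of input
  BY_FIRST_CHAR.fetch(lookahead, [])
               .select { |tok| scanner.check(tok.regexp) }
               .map(&:name)
end
```

With this index, input starting with a digit is only tested against the `:number` regexp, while input starting with `i` is tested against both `:ident` and `:if_kw`.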

Direct Known Subclasses

ReferencingRegexpLexer

Constant Summary

@@eof_token = EofToken.new

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(tokens, eofToken = nil) ⇒ ForkingRegexpLexer

Returns a new instance of ForkingRegexpLexer.



# File 'lib/rpdf2txt-rockit/token.rb', line 264

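def initialize(tokens, eofToken = nil)
  @tokens = tokens
  # NOTE: the eofToken argument is not used; the EOF token is taken from tokens.
  @eof_token = tokens.detect {|t| t.kind_of?(EofToken)}
  # Remove EOF tokens in place; this mutates the array passed in by the caller.
  @tokens.delete_if {|t| t.kind_of?(EofToken)}
end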
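A small sketch of what the constructor does to its argument may help here. The `Token` and `EofToken` classes below are hypothetical stand-ins (the real ones are defined elsewhere in token.rb and may look different), and `split_eof` is an illustrative helper that repeats the constructor's two steps.

```ruby
# Hypothetical stand-ins for the real token classes.
class Token
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class EofToken < Token
  def initialize
    super(:eof)
  end
end

# Same two steps as the constructor: remember the first EofToken found,
# then delete every EofToken from the array in place.
def split_eof(tokens)
  eof_token = tokens.detect { |t| t.kind_of?(EofToken) }
  tokens.delete_if { |t| t.kind_of?(EofToken) }
  [eof_token, tokens]
end

original = [Token.new(:number), EofToken.new, Token.new(:ident)]
eof, remaining = split_eof(original)
puts eof.name                      # name of the extracted EOF token
puts remaining.map(&:name).inspect # names of the remaining tokens
puts remaining.equal?(original)    # true: the caller's array was modified
```

Because `delete_if` works in place, a caller that wants to keep its own token list intact would need to pass in a copy, for example `tokens.dup`.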

Instance Attribute Details

#eof_token ⇒ Object (readonly)

Returns the value of attribute eof_token.



# File 'lib/rpdf2txt-rockit/token.rb', line 261

def eof_token
  @eof_token
end

#position ⇒ Object

Returns the value of attribute position.



# File 'lib/rpdf2txt-rockit/token.rb', line 260

def position
  @position
end

#scanner ⇒ Object (readonly)

Returns the value of attribute scanner.



# File 'lib/rpdf2txt-rockit/token.rb', line 261

def scanner
  @scanner
end

#tokens ⇒ Object (readonly)

Returns the value of attribute tokens.



# File 'lib/rpdf2txt-rockit/token.rb', line 261

def tokens
  @tokens
end

Instance Method Details

#init(aString) ⇒ Object



# File 'lib/rpdf2txt-rockit/token.rb', line 272

def init(aString)
  @position, @current_tokens = LexerPosition.new, nil
  @scanner = StringScanner.new(aString)

  # We speed things up by having only one lexer at each position. Since there
  # are typically only a small number of positions, we keep them in a
  # BoundedLruCache of size 20. The cache evicts the least recently used lexer
  # when a new one is inserted (NOTE: "used" means accessed in the cache, not
  # used by the parser). This keeps memory consumption down.
  #
  @lexer_cache = BoundedLruCache.new(20)
end
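The eviction behaviour described in the comment above can be illustrated with a minimal bounded LRU cache. This is a sketch only, not the `BoundedLruCache` used by the library, whose implementation and interface are not shown on this page; the class name `SketchLruCache` and its methods are assumptions made for the example.

```ruby
# Hypothetical sketch of a bounded least-recently-used cache.
class SketchLruCache
  def initialize(max_size)
    @max_size = max_size
    @store = {} # Ruby hashes keep insertion order, so the oldest entry is first
  end

  # Reading an entry marks it as most recently used by re-inserting it.
  def [](key)
    return nil unless @store.key?(key)
    value = @store.delete(key)
    @store[key] = value
  end

  # Writing an entry may evict the least recently used one.
  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift if @store.size > @max_size # drop the oldest entry
  end

  def key?(key)
    @store.key?(key)
  end

  def size
    @store.size
  end
end

cache = SketchLruCache.new(2)
cache[:pos0] = "lexer at 0"
cache[:pos5] = "lexer at 5"
cache[:pos0]                 # access makes :pos0 the most recently used
cache[:pos9] = "lexer at 9"  # evicts :pos5, the least recently accessed
```

As the original comment stresses, recency here is defined by cache access, so a lexer that the parser still holds can be evicted if nobody has looked it up in the cache recently.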

#inspect ⇒ Object



# File 'lib/rpdf2txt-rockit/token.rb', line 313

def inspect
  "Lexer(#{@position.inspect})"
end

#peek ⇒ Object

Refactor! Skipped tokens cause complex interactions, since next_lexer updates “our” scanner. Find a cleaner way of expressing this!



# File 'lib/rpdf2txt-rockit/token.rb', line 287

def peek
  return @current_tokens if @current_tokens
  scanner.pointer = @position.char_position
  @current_tokens = Array.new
  tokens.each do |token|
    if (match = scanner.check(token.regexp))
      if token.skip
        # Token to be skipped => return tokens matching after the skipped one
        @current_tokens.concat next_lexer(match).peek
        scanner.pointer = @position.char_position
      else
        @current_tokens.push LexerToken.new(match, token,
                                            next_lexer(match), @position)
      end
    end
  end
  if @current_tokens.length == 0
    @string_length = scanner.string.length unless @string_length
    if @position.char_position >= @string_length
      @current_tokens.push LexerToken.new(nil, eof_token || @@eof_token,
                                          nil, @position)
    end
  end
  return @current_tokens
end
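The "forking" behaviour of peek, where every token that matches at the current position is returned and skipped tokens are replaced by whatever matches after them, could be sketched in a simplified, stand-alone form as below. This is not the library's code: `SketchToken`, `SKETCH_TOKENS`, and `fork_matches` are hypothetical names, there is no lexer cache or `LexerToken`, and EOF handling is left out.

```ruby
require 'strscan'

# Hypothetical token type for illustration only.
SketchToken = Struct.new(:name, :regexp, :skip)

SKETCH_TOKENS = [
  SketchToken.new(:blank, /\s+/,    true),
  SketchToken.new(:ident, /[a-z]+/, false),
  SketchToken.new(:if_kw, /if/,     false)
]

# Returns [name, lexeme, end_position] for every token matching at `pos`.
# Positions are byte offsets, as used by StringScanner#pos.
def fork_matches(string, pos, tokens = SKETCH_TOKENS)
  scanner = StringScanner.new(string)
  results = []
  tokens.each do |token|
    scanner.pos = pos
    match = scanner.check(token.regexp) # check does not advance the scan pointer
    next unless match
    end_pos = pos + match.bytesize
    if token.skip
      # Skipped token: collect the tokens that match right after it instead.
      results.concat(fork_matches(string, end_pos, tokens))
    else
      results << [token.name, match, end_pos]
    end
  end
  results
end

p fork_matches("  if", 0)
```

For the input `"  if"`, the blank token is skipped and both `:ident` and `:if_kw` are reported for the lexeme `"if"`, which is the ambiguity a forking lexer hands on to the parser rather than resolving itself.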