Class: ForkingRegexpLexer

Inherits: Object

Defined in: lib/rpdf2txt-rockit/token.rb

Overview

NOTE: If more performance is needed, it might be worth using one character of lookahead to group tokens and so reduce the number of tokens that need to be tested at each position.
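The lookahead idea in the note above could be sketched roughly as follows. This is only an illustration, not code from the library: `SketchToken`, `SKETCH_TOKENS`, `build_first_char_index`, and `candidates_at` are hypothetical names, and the set of possible first characters is supplied by hand rather than derived from each regexp.

```ruby
require 'strscan'

# Hypothetical token description (not the library's token class): a name,
# a regexp, and the characters a match may start with, listed by hand.
SketchToken = Struct.new(:name, :regexp, :first_chars)

SKETCH_TOKENS = [
  SketchToken.new(:int,   /\d+/,      ('0'..'9').to_a),
  SketchToken.new(:ident, /[a-z]\w*/, ('a'..'z').to_a),
  SketchToken.new(:kw_if, /if\b/,     ['i']),
  SketchToken.new(:plus,  /\+/,       ['+'])
]

# Group tokens by the characters they can start with.
def build_first_char_index(tokens)
  index = Hash.new { |hash, key| hash[key] = [] }
  tokens.each do |token|
    token.first_chars.each { |ch| index[ch] << token }
  end
  index
end

# Look one character ahead and return only the tokens that could match there.
# key? is used so that a miss does not add an empty entry to the index.
def candidates_at(scanner, index)
  ch = scanner.peek(1)
  index.key?(ch) ? index[ch] : []
end

index   = build_first_char_index(SKETCH_TOKENS)
scanner = StringScanner.new("if x")

# Only the candidate group is tested against the input, not every token.
matching = candidates_at(scanner, index).select { |t| scanner.check(t.regexp) }
puts matching.map(&:name).inspect
```

With this grouping, the regexps for `:int` and `:plus` are never tried at a position that starts with a letter.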

Direct Known Subclasses

ReferencingRegexpLexer

Constant Summary

@@eof_token = EofToken.new

Instance Attribute Summary

#eof_token ⇒ Object readonly

Returns the value of attribute eof_token.

#position ⇒ Object

Returns the value of attribute position.

#scanner ⇒ Object readonly

Returns the value of attribute scanner.

#tokens ⇒ Object readonly

Returns the value of attribute tokens.

Instance Method Summary

#init(aString) ⇒ Object

#initialize(tokens, eofToken = nil) ⇒ ForkingRegexpLexer constructor

A new instance of ForkingRegexpLexer.

#inspect ⇒ Object

#peek ⇒ Object

Refactor! Complex interactions arise when tokens are skipped, since the next_lexer updates “our” scanner.

Constructor Details

#initialize(tokens, eofToken = nil) ⇒ `ForkingRegexpLexer`

Returns a new instance of ForkingRegexpLexer.

# File 'lib/rpdf2txt-rockit/token.rb', line 264

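def initialize(tokens, eofToken = nil)
  # NOTE: eofToken is accepted but not used here; the EOF token is taken
  # from the tokens list instead (nil if the list contains none).
  @tokens = tokens
  @eof_token = tokens.detect {|t| t.kind_of?(EofToken)}
  # Remove EOF tokens in place, so the caller's array is modified too.
  @tokens.delete_if {|t| t.kind_of?(EofToken)}
end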
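The way the constructor separates the EOF token from the regexp tokens can be illustrated in isolation. The sketch below uses stand-in classes (`SketchEofToken`, `SketchRegexpToken`) and a hypothetical helper `split_eof`, none of which exist in the library; it only mirrors the `detect` / `delete_if` pair shown above.

```ruby
# Stand-ins for the library's token classes (assumptions for illustration).
class SketchEofToken; end
SketchRegexpToken = Struct.new(:name, :regexp)

# Mirrors the constructor body: find the first EOF token, then delete all
# EOF tokens from the same array in place.
def split_eof(tokens)
  eof = tokens.detect { |t| t.kind_of?(SketchEofToken) }
  tokens.delete_if { |t| t.kind_of?(SketchEofToken) }
  [tokens, eof]
end

list = [SketchRegexpToken.new(:word, /\w+/), SketchEofToken.new]
rest, eof = split_eof(list)

puts rest.length        # the regexp tokens that remain
puts list.equal?(rest)  # same array object: the caller's list was mutated
puts eof.class
```

Because `delete_if` works in place, a caller who wants to keep the original list intact would need to pass a copy (for example `tokens.dup`).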

Instance Attribute Details

#eof_token ⇒ `Object` (readonly)

Returns the value of attribute eof_token.




# File 'lib/rpdf2txt-rockit/token.rb', line 261

def eof_token
  @eof_token
end

#position ⇒ `Object`

Returns the value of attribute position.




# File 'lib/rpdf2txt-rockit/token.rb', line 260

def position
  @position
end

#scanner ⇒ `Object` (readonly)

Returns the value of attribute scanner.




# File 'lib/rpdf2txt-rockit/token.rb', line 261

def scanner
  @scanner
end

#tokens ⇒ `Object` (readonly)

Returns the value of attribute tokens.




# File 'lib/rpdf2txt-rockit/token.rb', line 261

def tokens
  @tokens
end

Instance Method Details

#init(aString) ⇒ `Object`

# File 'lib/rpdf2txt-rockit/token.rb', line 272

def init(aString)
  @position, @current_tokens = LexerPosition.new, nil
  @scanner = StringScanner.new(aString)

  # We speed things up by having only one lexer at each position. Since there
  # are typically only a small number of positions, we keep them in a
  # BoundedLruCache of size 20. When a new lexer is inserted, the cache throws
  # out the least recently used one (NOTE: "used" means accessed in the
  # cache, not used in the parser). This keeps memory consumption down.
  #
  @lexer_cache = BoundedLruCache.new(20)
end
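The eviction behaviour described in the comment can be sketched with a small bounded LRU cache. This is not the library's `BoundedLruCache`, whose interface is not shown on this page; `SketchLruCache` and its `[]` / `[]=` methods are assumptions made for illustration. It relies on Ruby hashes preserving insertion order.

```ruby
# A minimal bounded LRU cache (hypothetical; not the library's class).
class SketchLruCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  # Reading an entry re-inserts it, marking it as most recently used.
  def [](key)
    return nil unless @store.key?(key)
    value = @store.delete(key)
    @store[key] = value
  end

  # Writing an entry may push the cache over its bound; in that case the
  # first entry in insertion order (the least recently used) is dropped.
  def []=(key, value)
    @store.delete(key)
    @store[key] = value
    @store.shift if @store.size > @max_size
    value
  end

  def size
    @store.size
  end
end

cache = SketchLruCache.new(2)
cache[0] = :lexer_at_0
cache[5] = :lexer_at_5
cache[0]                 # touch position 0 so that 5 becomes the oldest
cache[9] = :lexer_at_9   # evicts the entry for position 5
puts cache[5].inspect
puts cache.size
```

As in the comment above, "recently used" here refers only to accesses through the cache itself.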

#inspect ⇒ `Object`




# File 'lib/rpdf2txt-rockit/token.rb', line 313

def inspect
  "Lexer(#{@position.inspect})"
end

#peek ⇒ `Object`

Refactor! Complex interactions arise when tokens are skipped, since the next_lexer updates “our” scanner. Find a cleaner way of expressing this!

# File 'lib/rpdf2txt-rockit/token.rb', line 287

def peek
  return @current_tokens if @current_tokens
  scanner.pointer = @position.char_position
  @current_tokens = Array.new
  tokens.each do |token|
    if (match = scanner.check(token.regexp))
      if token.skip
        # Token to be skipped => return tokens matching after the skipped one
        @current_tokens.concat next_lexer(match).peek
        scanner.pointer = @position.char_position
      else
        @current_tokens.push LexerToken.new(match, token,
                                            next_lexer(match), @position)
      end
    end
  end
  if @current_tokens.length == 0
    @string_length = scanner.string.length unless @string_length
    if @position.char_position >= @string_length
      @current_tokens.push LexerToken.new(nil, eof_token || @@eof_token,
                                          nil, @position)
    end
  end
  return @current_tokens
end
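The "forking" behaviour of `peek` (every token whose regexp matches at the current position is returned, and skipped tokens are replaced by whatever matches after them) can be sketched without the surrounding lexer machinery. The following is a simplified stand-alone version under stated assumptions: `SketchRule` and `sketch_peek` are hypothetical names, results are plain arrays instead of `LexerToken` objects, a fresh scanner is created per call instead of sharing one, and there is no lexer cache.

```ruby
require 'strscan'

# Hypothetical rule: a name, a regexp, and a flag marking skipped tokens.
SketchRule = Struct.new(:name, :regexp, :skip)

# Returns [name, matched_text, next_position] for every rule that matches
# at byte offset pos. Skip rules must consume at least one character,
# otherwise the recursion below would not terminate.
def sketch_peek(string, pos, rules)
  scanner = StringScanner.new(string)
  scanner.pointer = pos
  found = []
  rules.each do |rule|
    match = scanner.check(rule.regexp)  # check does not advance the scanner
    next unless match
    next_pos = pos + match.bytesize
    if rule.skip
      # Skipped token: return whatever matches after it instead.
      found.concat(sketch_peek(string, next_pos, rules))
    else
      found << [rule.name, match, next_pos]
    end
  end
  # Nothing matched and the input is exhausted: report end of input.
  found << [:eof, nil, pos] if found.empty? && pos >= string.bytesize
  found
end

rules = [
  SketchRule.new(:ws,    /\s+/,    true),
  SketchRule.new(:ident, /[a-z]+/, false),
  SketchRule.new(:kw_if, /if/,     false)
]

puts sketch_peek("if x", 0, rules).inspect  # two alternatives for "if"
puts sketch_peek("if x", 2, rules).inspect  # whitespace skipped, then "x"
puts sketch_peek("if x", 4, rules).inspect  # end of input
```

At position 0 both `:ident` and `:kw_if` match, so the caller receives two alternatives and can pursue either; that ambiguity is what the real method preserves by attaching a `next_lexer` to each returned token.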
