Class: ANTLR3::Recognizer

Inherits: Object

Extended by: ClassMacros

Includes: Constants, Error, TokenFactory

Defined in: lib/antlr3/recognizers.rb

Overview

Recognizer

As the base class of all ANTLR-generated recognizers, Recognizer provides much of the shared functionality and structure used in the recognition process. For all practical purposes, the class and its immediate subclasses Lexer, Parser, and TreeParser are abstract classes. They can be instantiated, but they’re pretty useless on their own. Instead, to make useful code, you write an ANTLR grammar and ANTLR generates classes which inherit from one of the recognizer base classes and provide the implementation of the grammar rules. The generated classes rely on this group of base classes to implement necessary tasks. Recognizer defines methods related to:

token and character matching
prediction and recognition strategy
recovering from errors
reporting errors
memoization
simple rule tracing and debugging

Direct Known Subclasses

AST::TreeParser, Lexer, Parser

Constant Summary

Constants included from Constants

Constants::BUILT_IN_TOKEN_NAMES, Constants::DEFAULT, Constants::DOWN, Constants::EOF, Constants::EOF_TOKEN, Constants::EOR_TOKEN_TYPE, Constants::HIDDEN, Constants::INVALID, Constants::INVALID_NODE, Constants::INVALID_TOKEN, Constants::MEMO_RULE_FAILED, Constants::MEMO_RULE_UNKNOWN, Constants::MIN_TOKEN_TYPE, Constants::SKIP_TOKEN, Constants::UP

Class Attribute Summary

.antlr_version ⇒ Object readonly

Returns the value of attribute antlr_version.
.antlr_version_string ⇒ Object readonly

Returns the value of attribute antlr_version_string.
.default_rule ⇒ Object

Returns the value of attribute default_rule.
.grammar_file_name ⇒ Object readonly

Returns the value of attribute grammar_file_name.
.grammar_home ⇒ Object readonly

Returns the value of attribute grammar_home.
.library_version_string ⇒ Object readonly

Returns the value of attribute library_version_string.
.token_scheme ⇒ Object

Returns the value of attribute token_scheme.

Instance Attribute Summary

#input ⇒ Object

Returns the value of attribute input.
#state ⇒ Object readonly

Returns the value of attribute state.

Attributes included from TokenFactory

#token_class

Class Method Summary

.debug? ⇒ Boolean
.define_return_scope(*members) ⇒ Object

This method is used to generate return-value structures for rules with multiple return values.
.generated_using(grammar_file, antlr_version, library_version = nil) ⇒ Object

Generated recognizer code uses this method to stamp the code with the name of the grammar file and the current version of ANTLR being used to generate the code.
.generic_return_scope ⇒ Object

Sets up and returns the generic rule return scope for a recognizer.
.imported_grammars ⇒ Object
.imports(*grammar_names) ⇒ Object
.master ⇒ Object
.master_grammars ⇒ Object
.masters(*grammar_names) ⇒ Object
.profile? ⇒ Boolean
.return_scope_members ⇒ Object

Used as a hook to add additional default members to default return value structures. For example, all AST-building parsers override this method to add an extra :tree field to all rule return structures.
.rules ⇒ Object
.Scope(*declarations, &body) ⇒ Object
.token_class ⇒ Object

Instance Method Summary

#already_parsed_rule?(rule) ⇒ Boolean
#antlr_version ⇒ Object
#antlr_version_string ⇒ Object
#backtrack ⇒ Object
#backtracking? ⇒ Boolean

Returns true if the recognizer is currently in a decision for which backtracking has been enabled.
#backtracking_level ⇒ Object (also: #backtracking)
#backtracking_level=(n) ⇒ Object (also: #backtracking=)
#begin_resync ⇒ Object

Overridable hook method that is executed at the start of the resyncing procedure in recover.
#combine_follows(exact) ⇒ Object
#compute_context_sensitive_rule_follow ⇒ Object

Compute the context-sensitive FOLLOW set for current rule.
#compute_error_recovery_set ⇒ Object

(The following explanation has been lifted directly from the source code documentation of the ANTLR Java runtime library).
#consume_until(types) ⇒ Object

Consume input symbols until one matches a type within types.
#current_symbol ⇒ Object

Match needs to return the current input symbol, which gets put into the label for the associated token ref; e.g., x=ID.
#display_recognition_error(e = $!) ⇒ Object

Error reporting hook for presenting the information. The default implementation builds appropriate error message text using error_header and error_message, and calls emit_error_message to write the error message out to some source.
#each_delegate ⇒ Object
#emit_error_message(message) ⇒ Object

Write the error report data out to some source.
#end_resync ⇒ Object

Overridable hook method that is executed after the resyncing procedure has completed.
#error_header(e = $!) ⇒ Object

Used to add a tag to the error message that indicates the location of the input stream when the error occurred.
#error_message(e = $!) ⇒ Object

Used to construct an appropriate error message based on the specific type of error and the error’s attributes.
#grammar_file_name ⇒ Object
#initialize(options = {}) ⇒ Recognizer constructor

Create a new recognizer.
#match(type, follow) ⇒ Object

Attempt to match the current input symbol against the token type specified by type.
#match_any ⇒ Object

Match anything (wildcard match).
#memoize(rule, start_index, success) ⇒ Object
#mismatch_is_missing_token?(follow) ⇒ Boolean
#mismatch_is_unwanted_token?(type) ⇒ Boolean
#missing_symbol(error, expected_token_type, follow) ⇒ Object

Conjure up a missing token during error recovery.
#number_of_syntax_errors ⇒ Object

Returns the number of syntax errors recorded in the recognizer’s shared state.
#recover(error = $!) ⇒ Object

Recover from a recognition error by resynchronizing the input.
#recover_from_mismatched_element(e, follow) ⇒ Object
#recover_from_mismatched_set(e, follow) ⇒ Object
#recover_from_mismatched_token(type, follow) ⇒ Object
#report_error(e = $!) ⇒ Object

When a recognition error occurs, this method is the main hook for carrying out the error reporting process.
#reset ⇒ Object

Resets the recognizer’s state data to initial values.
#resync ⇒ Object
#rule_memoization(rule, start_index) ⇒ Object
#syntactic_predicate?(name) ⇒ Boolean
#syntax_errors? ⇒ Boolean
#token_error_display(token) ⇒ Object

Formats a token object appropriately for inspection within an error message.
#trace_in(rule_name, rule_index, input_symbol) ⇒ Object
#trace_out(rule_name, rule_index, input_symbol) ⇒ Object

Methods included from TokenFactory

#create_token

Methods included from Error

EarlyExit, FailedPredicate, MismatchedNotSet, MismatchedRange, MismatchedSet, MismatchedToken, MismatchedTreeNode, MissingToken, NoViableAlternative, RewriteCardinalityError, RewriteEarlyExit, RewriteEmptyStream, UnwantedToken

Constructor Details

#initialize(options = {}) ⇒ `Recognizer`

Create a new recognizer. The constructor simply ensures that all recognizers are initialized with a shared state object. See the main recognizer subclasses for more specific information about creating recognizer objects like lexers and parsers.

# File 'lib/antlr3/recognizers.rb', line 360

def initialize( options = {} )
  @state  = options[ :state ] || RecognizerSharedState.new
  @error_output = options.fetch( :error_output, $stderr )
  defined?( @input ) or @input = nil
  initialize_dfas
end

Class Attribute Details

.antlr_version ⇒ `Object` (readonly)

Returns the value of attribute antlr_version.



# File 'lib/antlr3/recognizers.rb', line 207

def antlr_version
  @antlr_version
end

.antlr_version_string ⇒ `Object` (readonly)

Returns the value of attribute antlr_version_string.



# File 'lib/antlr3/recognizers.rb', line 207

def antlr_version_string
  @antlr_version_string
end

.default_rule ⇒ `Object`

Returns the value of attribute default_rule.



# File 'lib/antlr3/recognizers.rb', line 213

def default_rule
  @default_rule
end

.grammar_file_name ⇒ `Object` (readonly)

Returns the value of attribute grammar_file_name.



# File 'lib/antlr3/recognizers.rb', line 207

def grammar_file_name
  @grammar_file_name
end

.grammar_home ⇒ `Object` (readonly)

Returns the value of attribute grammar_home.



# File 'lib/antlr3/recognizers.rb', line 207

def grammar_home
  @grammar_home
end

.library_version_string ⇒ `Object` (readonly)

Returns the value of attribute library_version_string.



# File 'lib/antlr3/recognizers.rb', line 207

def library_version_string
  @library_version_string
end

.token_scheme ⇒ `Object`

Returns the value of attribute token_scheme.



# File 'lib/antlr3/recognizers.rb', line 213

def token_scheme
  @token_scheme
end

Instance Attribute Details

#input ⇒ `Object`

Returns the value of attribute input.



# File 'lib/antlr3/recognizers.rb', line 344

def input
  @input
end

#state ⇒ `Object` (readonly)

Returns the value of attribute state.



# File 'lib/antlr3/recognizers.rb', line 345

def state
  @state
end

Class Method Details

.debug? ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/antlr3/recognizers.rb', line 306

def debug?
  return false
end

.define_return_scope(*members) ⇒ `Object`

This method is used to generate return-value structures for rules with multiple return values. To avoid generating a special class for every rule in AST parsers and such (where most rules have the same default set of return values), each recognizer gets a default return value structure assigned to the constant Return. Rules which don’t require additional custom members will have a rule-return name constant that just points to the generic return value.

# File 'lib/antlr3/recognizers.rb', line 241

def define_return_scope( *members )
  if members.empty? then generic_return_scope
  else
    members += return_scope_members
    Struct.new( *members )
  end
end
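
The interplay between define_return_scope, generic_return_scope and the return_scope_members hook can be sketched outside the runtime. The classes SketchRecognizer and SketchTreeBuilder below are hypothetical stand-ins, not part of ANTLR3; they only illustrate, under that assumption, how a subclass that overrides the hook changes every structure built afterwards.

```ruby
# Hypothetical stand-in for a recognizer class; not the ANTLR3 runtime itself.
class SketchRecognizer
  class << self
    # Default members shared by every rule return structure.
    def return_scope_members
      [ :start, :stop ]
    end

    # Lazily build the generic structure and publish it as the constant Return.
    def generic_return_scope
      @generic_return_scope ||=
        const_set( :Return, Struct.new( *return_scope_members ) )
    end

    # Custom members come first, followed by the shared defaults.
    def define_return_scope( *members )
      return generic_return_scope if members.empty?
      Struct.new( *( members + return_scope_members ) )
    end
  end
end

# Mimics an AST-building parser that adds a :tree field to every structure.
class SketchTreeBuilder < SketchRecognizer
  def self.return_scope_members
    super + [ :tree ]
  end
end

generic = SketchRecognizer.define_return_scope
custom  = SketchTreeBuilder.define_return_scope( :value )
p generic.members  # members of the generic structure
p custom.members   # custom member, then the defaults, then :tree
```

Because the generic structure is cached per class, calling define_return_scope with no arguments repeatedly yields the same Struct class each time.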

.generated_using(grammar_file, antlr_version, library_version = nil) ⇒ `Object`

Generated recognizer code uses this method to stamp the code with the name of the grammar file and the current version of ANTLR being used to generate the code.

# File 'lib/antlr3/recognizers.rb', line 219

def generated_using( grammar_file, antlr_version, library_version = nil )
  @grammar_file_name = grammar_file.freeze
  @antlr_version_string = antlr_version.freeze
  @library_version = Util.parse_version( library_version )
  if @antlr_version_string =~ /^(\d+)\.(\d+)(?:\.(\d+)(?:b(\d+))?)?(.*)$/
    @antlr_version = [ $1, $2, $3, $4 ].map! { |str| str.to_i }
    timestamp = $5.strip
    #@antlr_release_time = $5.empty? ? nil : Time.parse($5)
  else
    raise "bad version string: %p" % @antlr_version_string
  end
end
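
To see what the version pattern accepts, here is a small standalone sketch that reuses the same regular expression. The helper name parse_antlr_version and the sample version strings are assumptions made for illustration; they do not come from the library.

```ruby
# Same pattern as in generated_using: major.minor[.patch[bN]] plus trailing text.
VERSION_PATTERN = /^(\d+)\.(\d+)(?:\.(\d+)(?:b(\d+))?)?(.*)$/

# Hypothetical helper: returns [ [major, minor, patch, beta], trailing_text ].
def parse_antlr_version( version_string )
  match = VERSION_PATTERN.match( version_string ) or
    raise ArgumentError, "bad version string: %p" % version_string
  # Missing optional groups are nil, and nil.to_i is 0.
  numbers = match.captures.first( 4 ).map { |str| str.to_i }
  [ numbers, match[ 5 ].strip ]
end

full  = parse_antlr_version( "3.2.1b4 Jan 1, 2010" )
short = parse_antlr_version( "3.1" )
p full
p short
```

Missing patch and beta components come back as 0 rather than nil, which keeps the resulting array comparable element by element.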

.generic_return_scope ⇒ `Object`

Sets up and returns the generic rule return scope for a recognizer.

# File 'lib/antlr3/recognizers.rb', line 260

def generic_return_scope
  @generic_return_scope ||= begin
    struct = Struct.new( *return_scope_members )
    const_set( :Return, struct )
  end
end

.imported_grammars ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 267

def imported_grammars
  @imported_grammars ||= Set.new
end

.imports(*grammar_names) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 289

def imports( *grammar_names )
  for grammar in grammar_names
    imported_grammars.add?( grammar.to_sym ) and
      attr_reader( Util.snake_case( grammar ) )
  end
  return imported_grammars
end

.master ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 275

def master
  master_grammars.last
end

.master_grammars ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 271

def master_grammars
  @master_grammars ||= []
end

.masters(*grammar_names) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 279

def masters( *grammar_names )
  for grammar in grammar_names
    unless master_grammars.include?( grammar )
      master_grammars << grammar
      attr_reader( Util.snake_case( grammar ) )
    end
  end
end

.profile? ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/antlr3/recognizers.rb', line 310

def profile?
  return false
end

.return_scope_members ⇒ `Object`

Used as a hook to add additional default members to default return value structures. For example, all AST-building parsers override this method to add an extra :tree field to all rule return structures.



# File 'lib/antlr3/recognizers.rb', line 254

def return_scope_members
  [ :start, :stop ]
end

.rules ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 298

def rules
  self::RULE_METHODS.dup rescue []
end

.Scope(*declarations, &body) ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 314

def Scope( *declarations, &body )
  Scope.new( *declarations, &body )
end

.token_class ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 318

def token_class
  @token_class ||= begin
    self::Token            rescue
    superclass.token_class rescue
    ANTLR3::CommonToken
  end
end

Instance Method Details

#already_parsed_rule?(rule) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/antlr3/recognizers.rb', line 866

def already_parsed_rule?( rule )
  stop_index = rule_memoization( rule, @input.index )
  case stop_index
  when MEMO_RULE_UNKNOWN then return false
  when MEMO_RULE_FAILED
    raise BacktrackingFailed
  else
    @input.seek( stop_index + 1 )
  end
  return true
end

#antlr_version ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 336

def antlr_version
  self.class.antlr_version
end

#antlr_version_string ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 340

def antlr_version_string
  self.class.antlr_version_string
end

#backtrack ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 839

def backtrack
  @state.backtracking += 1
  start = @input.mark
  success =
    begin yield
    rescue BacktrackingFailed then false
    else true
    end
  return success
ensure
  @input.rewind( start )
  @state.backtracking -= 1
end
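
The mark/rewind discipline used by backtrack can be tried in isolation. The sketch below assumes a hypothetical SketchStream and SketchRecognizer (neither is part of the runtime) and a locally defined BacktrackingFailed; it shows that the input position and the backtracking depth are restored whether the speculative block succeeds or fails.

```ruby
# Hypothetical error class and stream; stand-ins for the runtime's own types.
class BacktrackingFailed < StandardError; end

class SketchStream
  attr_reader :index

  def initialize( symbols )
    @symbols = symbols
    @index = 0
  end

  def peek
    @symbols[ @index ]
  end

  def consume
    @index += 1
  end

  def mark
    @index
  end

  def rewind( marker )
    @index = marker
  end
end

class SketchRecognizer
  attr_reader :backtracking

  def initialize( input )
    @input = input
    @backtracking = 0
  end

  # Run the block speculatively; always restore the input position afterwards.
  def backtrack
    @backtracking += 1
    start = @input.mark
    begin
      yield
      true
    rescue BacktrackingFailed
      false
    end
  ensure
    @input.rewind( start )
    @backtracking -= 1
  end

  # While speculating, a mismatch raises instead of attempting recovery.
  def match( type )
    raise BacktrackingFailed unless @input.peek == type
    @input.consume
  end
end

input = SketchStream.new( [ :ID, :EQ, :INT ] )
rec = SketchRecognizer.new( input )
ok  = rec.backtrack { rec.match( :ID ); rec.match( :EQ ) }
bad = rec.backtrack { rec.match( :ID ); rec.match( :INT ) }
p [ ok, bad, input.index ]
```

Note that even the successful attempt leaves the stream where it started: the result only tells the caller which alternative to commit to.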

#backtracking? ⇒ `Boolean`

Returns true if the recognizer is currently in a decision for which backtracking has been enabled.

Returns:

(Boolean)



# File 'lib/antlr3/recognizers.rb', line 827

def backtracking?
  @state.backtracking > 0
end

#backtracking_level ⇒ `Object` Also known as: backtracking



# File 'lib/antlr3/recognizers.rb', line 831

def backtracking_level
  @state.backtracking
end

#backtracking_level=(n) ⇒ `Object` Also known as: backtracking=



# File 'lib/antlr3/recognizers.rb', line 835

def backtracking_level=( n )
  @state.backtracking = n
end

#begin_resync ⇒ `Object`

Overridable hook method that is executed at the start of the resyncing procedure in recover.

By default, it does nothing.



# File 'lib/antlr3/recognizers.rb', line 519

def begin_resync
  # do nothing
end

#combine_follows(exact) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 779

def combine_follows( exact )
  follow_set = Set.new
  @state.following.each_with_index.reverse_each do |local_follow_set, index|
    follow_set |= local_follow_set
    if exact
      if local_follow_set.include?( EOR_TOKEN_TYPE )
        follow_set.delete( EOR_TOKEN_TYPE ) if index > 0
      else
        break
      end
    end
  end
  return follow_set
end
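
The effect of the exact flag is easier to see on a concrete stack. The following sketch re-implements the same loop as a free function; the EOR symbol and the contents of the stack are assumptions chosen for illustration, not values taken from the runtime.

```ruby
require 'set'

EOR = :eor  # hypothetical stand-in for Constants::EOR_TOKEN_TYPE

# following: stack of local FOLLOW sets, outermost rule invocation first.
def combine_follows( following, exact )
  follow_set = Set.new
  following.each_with_index.reverse_each do |local_follow_set, index|
    follow_set |= local_follow_set
    next unless exact
    # Stop once a rule cannot be exited without consuming a symbol.
    break unless local_follow_set.include?( EOR )
    follow_set.delete( EOR ) if index > 0
  end
  follow_set
end

stack = [ Set[ :EOF ], Set[ ';' ], Set[ '+', EOR ] ]
p combine_follows( stack, false )  # union of every set on the stack
p combine_follows( stack, true )   # only what can follow in this call chain
```

In exact mode the walk stops at the first enclosing rule whose local set lacks the end-of-rule marker, because nothing beyond that rule can appear next.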

#compute_context_sensitive_rule_follow ⇒ `Object`

Compute the context-sensitive FOLLOW set for the current rule. This is the set of token types that can follow a specific rule reference given a specific call chain. You get the set of viable tokens that can possibly come next (look depth 1) given the current call chain. Contrast this with the definition of plain FOLLOW for rule r:

FOLLOW(r)={x | S=>*alpha r beta in G and x in FIRST(beta)}

where x in T* and alpha, beta in V*; T is the set of terminals and V is the set of terminals and nonterminals. In other words, FOLLOW(r) is the set of all tokens that can possibly follow references to r in any sentential form (context). At runtime, however, we know precisely which context applies as we have the call chain. We may compute the exact (rather than covering superset) set of following tokens.

For example, consider grammar:

stat : ID '=' expr ';'      // FOLLOW(stat)=={EOF}
     | "return" expr '.'
     ;
expr : atom ('+' atom)* ;   // FOLLOW(expr)=={';','.',')'}
atom : INT                  // FOLLOW(atom)=={'+',')',';','.'}
     | '(' expr ')'
     ;

The FOLLOW sets are all inclusive whereas context-sensitive FOLLOW sets are precisely what could follow a rule reference. For input “i=(3);”, here is the derivation:

stat => ID '=' expr ';'
     => ID '=' atom ('+' atom)* ';'
     => ID '=' '(' expr ')' ('+' atom)* ';'
     => ID '=' '(' atom ')' ('+' atom)* ';'
     => ID '=' '(' INT ')' ('+' atom)* ';'
     => ID '=' '(' INT ')' ';'

At the “3” token, you’d have a call chain of

stat -> expr -> atom -> expr -> atom

What can follow that specific nested ref to atom? Exactly ‘)’ as you can see by looking at the derivation of this specific input. Contrast this with FOLLOW(atom)={‘+’,‘)’,‘;’,‘.’}.

You want the exact viable token set when recovering from a token mismatch. Upon token mismatch, if LA(1) is member of the viable next token set, then you know there is most likely a missing token in the input stream. “Insert” one by just not throwing an exception.



# File 'lib/antlr3/recognizers.rb', line 775

def compute_context_sensitive_rule_follow
  combine_follows true
end

#compute_error_recovery_set ⇒ `Object`

(The following explanation has been lifted directly from the source code documentation of the ANTLR Java runtime library.)

Compute the error recovery set for the current rule. During rule invocation, the parser pushes the set of tokens that can follow that rule reference on the stack; this amounts to computing FIRST of what follows the rule reference in the enclosing rule. This local follow set only includes tokens from within the rule; i.e., the FIRST computation done by ANTLR stops at the end of a rule.

EXAMPLE

When you find a “no viable alt exception”, the input is not consistent with any of the alternatives for rule r. The best thing to do is to consume tokens until you see something that can legally follow a call to r or any rule that called r. You don’t want the exact set of viable next tokens because the input might just be missing a token–you might consume the rest of the input looking for one of the missing tokens.

Consider grammar:

a : '[' b ']'
  | '(' b ')'
  ;
b : c '^' INT ;
c : ID
  | INT
  ;

At each rule invocation, the set of tokens that could follow that rule is pushed on a stack. Here are the various “local” follow sets:

FOLLOW( b1_in_a ) = FIRST( ']' ) = ']'
FOLLOW( b2_in_a ) = FIRST( ')' ) = ')'
FOLLOW( c_in_b ) = FIRST( '^' ) = '^'

Upon erroneous input “[]”, the call chain is

a -> b -> c

and, hence, the follow context stack is:

depth  local follow set     after call to rule
  0         <EOF>                    a (from main( ) )
  1          ']'                     b
  2          '^'                     c

Notice that ')' is not included, because b would have to have been called from a different context in rule a for ‘)’ to be included.

For error recovery, we cannot consider FOLLOW(c) (context-sensitive or otherwise). We need the combined set of all context-sensitive FOLLOW sets–the set of all tokens that could follow any reference in the call chain. We need to resync to one of those tokens. Note that FOLLOW(c)=‘^’ and if we resync’d to that token, we’d consume until EOF. We need to sync to context-sensitive FOLLOWs for a, b, and c: {‘]’,‘^’}. In this case, for input “[]”, LA(1) is in this set so we would not consume anything and, after printing an error, rule c would return normally. It would not find the required ‘^’ though. At this point, it gets a mismatched token error and throws an exception (since LA(1) is not in the viable following token set). The rule exception handler tries to recover, but finds the same recovery set and doesn’t consume anything. Rule b exits normally returning to rule a. Now it finds the ‘]’ (and with the successful match exits errorRecovery mode).

So, you can see that the parser walks up the call chain looking for the token that was a member of the recovery set.

Errors are not generated in errorRecovery mode.

ANTLR’s error recovery mechanism is based upon original ideas:

“Algorithms + Data Structures = Programs” by Niklaus Wirth

and

“A note on error recovery in recursive descent parsers”: portal.acm.org/citation.cfm?id=947902.947905

Later, Josef Grosch had some good ideas:

“Efficient and Comfortable Error Recovery in Recursive Descent Parsers”: www.cocolab.com/products/cocktail/doca4.ps/ell.ps.zip

Like Grosch I implemented local FOLLOW sets that are combined at run-time upon error to avoid overhead during parsing.



# File 'lib/antlr3/recognizers.rb', line 623

def compute_error_recovery_set
  combine_follows( false )
end

#consume_until(types) ⇒ `Object`

Consume input symbols until one matches a type within types.

types can be a single symbol type or a set of symbol types.

# File 'lib/antlr3/recognizers.rb', line 813

def consume_until( types )
  types.is_a?( Set ) or types = Set[ *types ]
  type = @input.peek
  until type == EOF or types.include?( type )
    @input.consume
    type = @input.peek
  end
  return( type )
end
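
A standalone sketch of the same loop shows where the stream ends up after resynchronization. SketchStream, the EOF symbol and the sample input are assumptions for illustration; the function takes the stream as an explicit argument instead of reading @input.

```ruby
require 'set'

EOF = :EOF  # hypothetical stand-in for Constants::EOF

# Minimal stream of symbol types; peek reports EOF once the input is exhausted.
class SketchStream
  attr_reader :index

  def initialize( types )
    @types = types
    @index = 0
  end

  def peek
    @types.fetch( @index, EOF )
  end

  def consume
    @index += 1
  end
end

# Skip symbols until the current one is in types or the input runs out.
def consume_until( input, types )
  types = Set[ *types ] unless types.is_a?( Set )
  type = input.peek
  until type == EOF or types.include?( type )
    input.consume
    type = input.peek
  end
  type
end

input = SketchStream.new( [ :ID, :INT, ']', :ID ] )
stopped_at = consume_until( input, Set[ ']', '^' ] )
p stopped_at   # the symbol type the stream is now positioned on
p input.index  # number of symbols skipped

tail = SketchStream.new( [ :ID ] )
p consume_until( tail, :INT )  # a single type works too; nothing matches here
```

The matching symbol itself is not consumed, so the rule that resumes parsing still sees it as the current input.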

#current_symbol ⇒ `Object`

Match needs to return the current input symbol, which gets put into the label for the associated token ref; e.g., x=ID. Token and tree parsers need to return different objects. Rather than test for input stream type or change the IntStream interface, I use a simple method to ask the recognizer to tell me what the current input symbol is.

This is ignored for lexers.



# File 'lib/antlr3/recognizers.rb', line 804

def current_symbol
  @input.look
end

#display_recognition_error(e = $!) ⇒ `Object`

Error reporting hook for presenting the information. The default implementation builds appropriate error message text using error_header and error_message, and calls emit_error_message to write the error message out to some source.

# File 'lib/antlr3/recognizers.rb', line 424

def display_recognition_error( e = $! )
  header = error_header( e )
  message = error_message( e )
  emit_error_message( "#{ header } #{ message }" )
end

#each_delegate ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 347

def each_delegate
  block_given? or return enum_for( __method__ )
  for grammar in self.class.imported_grammars
    del = __send__( Util.snake_case( grammar ) ) and
      yield( del )
  end
end

#emit_error_message(message) ⇒ `Object`

Write the error report data out to some source. By default, the error message is written to $stderr.



# File 'lib/antlr3/recognizers.rb', line 491

def emit_error_message( message )
  @error_output.puts( message ) if @error_output
end

#end_resync ⇒ `Object`

Overridable hook method that is executed after the resyncing procedure has completed.

By default, it does nothing.



# File 'lib/antlr3/recognizers.rb', line 526

def end_resync
  # do nothing
end

#error_header(e = $!) ⇒ `Object`

Used to add a tag to the error message that indicates the location of the input stream when the error occurred.



# File 'lib/antlr3/recognizers.rb', line 466

def error_header( e = $! )
  e.location
end

#error_message(e = $!) ⇒ `Object`

Used to construct an appropriate error message based on the specific type of error and the error’s attributes.

# File 'lib/antlr3/recognizers.rb', line 433

def error_message( e = $! )
  case e
  when UnwantedToken
    token_name = token_name( e.expecting )
    "extraneous input #{ token_error_display( e.unexpected_token ) } expecting #{ token_name }"
  when MissingToken
    token_name = token_name( e.expecting )
    "missing #{ token_name } at #{ token_error_display( e.symbol ) }"
  when MismatchedToken
    token_name = token_name( e.expecting )
    "mismatched input #{ token_error_display( e.symbol ) } expecting #{ token_name }"
  when MismatchedTreeNode
    token_name = token_name( e.expecting )
    "mismatched tree node: #{ e.symbol } expecting #{ token_name }"
  when NoViableAlternative
    "no viable alternative at input " << token_error_display( e.symbol )
  when MismatchedSet
    "mismatched input %s expecting set %s" %
      [ token_error_display( e.symbol ), e.expecting.inspect ]
  when MismatchedNotSet
    "mismatched input %s expecting set %s" %
      [ token_error_display( e.symbol ), e.expecting.inspect ]
  when FailedPredicate
    "rule %s failed predicate: { %s }?" % [ e.rule_name, e.predicate_text ]
  else e.message
  end
end

#grammar_file_name ⇒ `Object`



# File 'lib/antlr3/recognizers.rb', line 332

def grammar_file_name
  self.class.grammar_file_name
end

#match(type, follow) ⇒ `Object`

Attempt to match the current input symbol against the token type specified by type. If the symbol matches the type, consume the current symbol and return its value. If the symbol doesn’t match, attempt to use the follow-set data provided by follow to recover from the mismatched token.

Raises:

(BacktrackingFailed)

# File 'lib/antlr3/recognizers.rb', line 385

def match( type, follow )
  matched_symbol = current_symbol
  if @input.peek == type
    @input.consume
    @state.error_recovery = false
    return matched_symbol
  end
  raise( BacktrackingFailed ) if @state.backtracking > 0
  return recover_from_mismatched_token( type, follow )
end

#match_any ⇒ `Object`

Match anything, i.e. wildcard match. Simply consume the current symbol from the input stream.

# File 'lib/antlr3/recognizers.rb', line 398

def match_any
  @state.error_recovery = false
  @input.consume
end

#memoize(rule, start_index, success) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 878

def memoize( rule, start_index, success )
  stop_index = success ? @input.index - 1 : MEMO_RULE_FAILED
  memo = @state.rule_memory[ rule ] and memo[ start_index ] = stop_index
end
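
The memoization table shared by rule_memoization, memoize and already_parsed_rule? can be modelled as a hash of hashes. In the sketch below, the class SketchMemo and the numeric values of the two sentinels are assumptions; the runtime takes its sentinels from Constants.

```ruby
# Assumed sentinel values; the runtime defines its own in Constants.
MEMO_RULE_UNKNOWN = -1
MEMO_RULE_FAILED  = -2

class SketchMemo
  def initialize
    @rule_memory = {}
  end

  # Look up where rule stopped when it was last tried at start_index.
  # The per-rule table is created on first lookup, defaulting to "unknown".
  def lookup( rule, start_index )
    table = @rule_memory.fetch( rule ) do
      @rule_memory[ rule ] = Hash.new( MEMO_RULE_UNKNOWN )
    end
    table[ start_index ]
  end

  # Record the stop index on success, or the failure sentinel when it is nil.
  # Nothing is stored for a rule that has never been looked up.
  def store( rule, start_index, stop_index )
    table = @rule_memory[ rule ] or return
    table[ start_index ] = stop_index || MEMO_RULE_FAILED
  end
end

memo = SketchMemo.new
first = memo.lookup( :expr, 0 )  # never tried: unknown
memo.store( :expr, 0, 4 )        # expr matched symbols 0..4
memo.store( :expr, 7, nil )      # expr failed at index 7
results = [ first, memo.lookup( :expr, 0 ), memo.lookup( :expr, 7 ) ]
p results
```

A later attempt at the same rule and position can then skip straight past the recorded stop index, or fail immediately, without re-running the rule.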

#mismatch_is_missing_token?(follow) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/antlr3/recognizers.rb', line 692

def mismatch_is_missing_token?( follow )
  follow.nil? and return false
  if follow.include?( EOR_TOKEN_TYPE )
    viable_tokens = compute_context_sensitive_rule_follow
    follow = follow | viable_tokens
    
    follow.delete( EOR_TOKEN_TYPE ) unless @state.following.empty?
  end
  if follow.include?( @input.peek ) or follow.include?( EOR_TOKEN_TYPE )
    return true
  end
  return false
end

#mismatch_is_unwanted_token?(type) ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/antlr3/recognizers.rb', line 688

def mismatch_is_unwanted_token?( type )
  @input.peek( 2 ) == type
end

#missing_symbol(error, expected_token_type, follow) ⇒ `Object`

Conjure up a missing token during error recovery.

The recognizer attempts to recover from single missing symbols. But, actions might refer to that missing symbol. For example, x=ID {f($x);}. The action clearly assumes that there has been an identifier matched previously and that $x points at that token. If that token is missing, but the next token in the stream is what we want, we assume that this token is missing and we keep going. Because we have to return some token to replace the missing token, we have to conjure one up. This method gives the user control over the tokens returned for missing tokens. Mostly, you will want to create something special for identifier tokens. For literals such as ‘{’ and ‘,’, the default action in the parser or tree parser works. It simply creates a CommonToken of the appropriate type. The text will be the token. If you change what tokens must be created by the lexer, override this method to create the appropriate tokens.



# File 'lib/antlr3/recognizers.rb', line 684

def missing_symbol( error, expected_token_type, follow )
  return nil
end

#number_of_syntax_errors ⇒ `Object`

Returns the number of syntax errors recorded in the recognizer’s shared state.



# File 'lib/antlr3/recognizers.rb', line 718

def number_of_syntax_errors
  @state.syntax_errors
end

#recover(error = $!) ⇒ `Object`

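Recover from a recognition error by resynchronizing the input. If no progress has been made since the last error, one symbol is consumed first; the recognizer then consumes symbols until it reaches one that belongs to the error recovery set.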

# File 'lib/antlr3/recognizers.rb', line 499

def recover( error = $! )
  @state.last_error_index == @input.index and @input.consume
  @state.last_error_index = @input.index
  
  follow_set = compute_error_recovery_set
  
  resync { consume_until( follow_set ) }
end

#recover_from_mismatched_element(e, follow) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 653

def recover_from_mismatched_element( e, follow )
  follow.nil? and return false
  if follow.include?( EOR_TOKEN_TYPE )
    viable_tokens = compute_context_sensitive_rule_follow
    follow = ( follow | viable_tokens ) - Set[ EOR_TOKEN_TYPE ]
  end
  if follow.include?( @input.peek )
    report_error( e )
    return true
  end
  return false
end

#recover_from_mismatched_set(e, follow) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 645

def recover_from_mismatched_set( e, follow )
  if mismatch_is_missing_token?( follow )
    report_error( e )
    return missing_symbol( e, INVALID_TOKEN_TYPE, follow )
  end
  raise e
end

#recover_from_mismatched_token(type, follow) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 627

def recover_from_mismatched_token( type, follow )
  if mismatch_is_unwanted_token?( type )
    err = UnwantedToken( type )
    resync { @input.consume }
    report_error( err )
    
    return @input.consume
  end
  
  if mismatch_is_missing_token?( follow )
    inserted = missing_symbol( nil, type, follow )
    report_error( MissingToken( type, inserted ) )
    return inserted
  end
  
  raise MismatchedToken( type )
end
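
The order of the checks above amounts to a small decision procedure: try single-token deletion first, then single-token insertion, and give up otherwise. The function classify_mismatch below is a hypothetical simplification for illustration; in particular it treats follow as a plain set and leaves out the end-of-rule handling done by mismatch_is_missing_token?.

```ruby
require 'set'

# Hypothetical classifier mirroring the order of checks in
# recover_from_mismatched_token. symbols is the remaining input,
# so symbols[ 0 ] is LA(1) and symbols[ 1 ] is LA(2).
def classify_mismatch( symbols, expected, follow )
  # LA(2) is what we wanted: LA(1) is an extra symbol that can be skipped.
  return :unwanted_token if symbols[ 1 ] == expected
  # LA(1) may legally come after the expected symbol: pretend it was there.
  return :missing_token if follow.include?( symbols[ 0 ] )
  :mismatched_token
end

follow = Set[ ';' ]
extra   = classify_mismatch( [ :INT, :ID, ';' ], :ID, follow )  # extra INT before ID
omitted = classify_mismatch( [ ';' ], :ID, follow )             # ID left out
neither = classify_mismatch( [ :INT, :INT ], :ID, follow )      # no repair applies
p [ extra, omitted, neither ]
```

Only the last case corresponds to raising MismatchedToken; the first two report an error and let the rule carry on.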

#report_error(e = $!) ⇒ `Object`

When a recognition error occurs, this method is the main hook for carrying out the error reporting process. The default implementation calls display_recognition_error to display the error info on $stderr.

# File 'lib/antlr3/recognizers.rb', line 412

def report_error( e = $! )
  @state.error_recovery and return
  @state.syntax_errors += 1
  @state.error_recovery = true
  display_recognition_error( e )
end
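
The error_recovery flag is what keeps one mistake in the input from producing a cascade of messages. The sketch below models that behaviour with a hypothetical SketchReporter class that keeps the counters itself rather than in a shared state object, and uses a StringIO as the :error_output target so the messages can be inspected.

```ruby
require 'stringio'

class SketchReporter
  attr_reader :syntax_errors

  def initialize( options = {} )
    @error_output   = options.fetch( :error_output, $stderr )
    @syntax_errors  = 0
    @error_recovery = false
  end

  # Report only the first error of a cascade.
  def report_error( message )
    return if @error_recovery
    @syntax_errors += 1
    @error_recovery = true
    @error_output.puts( message ) if @error_output
  end

  # A successful match ends recovery mode, so later errors are reported again.
  def matched!
    @error_recovery = false
  end
end

out = StringIO.new
rec = SketchReporter.new( :error_output => out )
rec.report_error( "missing ID" )
rec.report_error( "mismatched input" )  # suppressed: still recovering
rec.matched!
rec.report_error( "extraneous input" )
p rec.syntax_errors
print out.string
```

Passing nil as the output target would silence the messages while still counting the errors.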

#reset ⇒ `Object`

Resets the recognizer’s state data to initial values. As a result, all error tracking and error recovery data accumulated in the current state will be cleared. It will also attempt to reset the input stream via input.reset, but it ignores any errors received from doing so. Thus the input stream is not guaranteed to be rewound to its initial position.

# File 'lib/antlr3/recognizers.rb', line 374

def reset
  @state and @state.reset!
  @input and @input.reset rescue nil
end

#resync ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 508

def resync
  begin_resync
  return( yield )
ensure
  end_resync
end
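
Since begin_resync and end_resync are meant to be overridden, a subclass can observe the resynchronization step without touching recover. The classes below are hypothetical stand-ins written for this sketch; they show that end_resync runs even when the wrapped block raises.

```ruby
class SketchRecognizer
  def begin_resync; end  # hooks do nothing by default
  def end_resync; end

  # Wrap the resynchronization step; end_resync runs even if the block raises.
  def resync
    begin_resync
    yield
  ensure
    end_resync
  end
end

# Hypothetical subclass that records when resynchronization starts and stops.
class LoggingRecognizer < SketchRecognizer
  attr_reader :events

  def initialize
    @events = []
  end

  def begin_resync
    @events << :begin
  end

  def end_resync
    @events << :end
  end
end

rec = LoggingRecognizer.new
result = rec.resync { rec.events << :consume; :done }
p result
p rec.events
```

The return value of the block passes through unchanged, which is how recover_from_mismatched_token can wrap a single consume call in resync.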

#rule_memoization(rule, start_index) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 860

def rule_memoization( rule, start_index )
  @state.rule_memory.fetch( rule ) do
    @state.rule_memory[ rule ] = Hash.new( MEMO_RULE_UNKNOWN )
  end[ start_index ]
end

#syntactic_predicate?(name) ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/antlr3/recognizers.rb', line 853

def syntactic_predicate?( name )
  backtrack { send name }
end

#syntax_errors? ⇒ `Boolean`

Returns:

(Boolean)



# File 'lib/antlr3/recognizers.rb', line 706

def syntax_errors?
  ( error_count = @state.syntax_errors ) > 0 and return( error_count )
end

#token_error_display(token) ⇒ `Object`

Formats a token object appropriately for inspection within an error message.

# File 'lib/antlr3/recognizers.rb', line 474

def token_error_display( token )
  unless text = token.text || ( token.source_text rescue nil )
    text =
      case
      when token.type == EOF then '<EOF>'
      when name = token_name( token.type ) rescue nil then "<#{ name }>"
      when token.respond_to?( :name ) then "<#{ token.name }>"
      else "<#{ token.type }>"
      end
  end
  return text.inspect
end
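
The fallback order in the listing (token text, then an EOF marker, then the token's name, then its raw type) can be tried out with a simplified stand-in. Everything named below is hypothetical: DemoToken is a bare Struct, DEMO_EOF and DEMO_TOKEN_NAMES are values invented for the example, and the name lookup is a plain hash instead of token_name.

```ruby
DEMO_EOF = -1                       # assumed EOF type for the sketch
DEMO_TOKEN_NAMES = { 4 => "ID" }    # hypothetical type-to-name table
DemoToken = Struct.new( :type, :text )

# Simplified version of token_error_display for illustration only.
def demo_token_error_display( token )
  # DemoToken has no source_text method; the rescue turns that into nil.
  unless text = token.text || ( token.source_text rescue nil )
    text =
      case
      when token.type == DEMO_EOF then '<EOF>'
      when name = DEMO_TOKEN_NAMES[ token.type ] then "<#{ name }>"
      when token.respond_to?( :name ) then "<#{ token.name }>"
      else "<#{ token.type }>"
      end
  end
  return text.inspect
end

with_text = demo_token_error_display( DemoToken.new( 4, "foo" ) )
at_eof    = demo_token_error_display( DemoToken.new( DEMO_EOF, nil ) )
by_name   = demo_token_error_display( DemoToken.new( 4, nil ) )
by_type   = demo_token_error_display( DemoToken.new( 99, nil ) )

puts with_text  # "foo"
puts at_eof     # "<EOF>"
puts by_name    # "<ID>"
puts by_type    # "<99>"
```

Since the result goes through inspect, each string comes back wrapped in double quotes, which keeps whitespace and escapes visible inside an error message.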

#trace_in(rule_name, rule_index, input_symbol) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 883

def trace_in( rule_name, rule_index, input_symbol )
  @error_output.printf( "--> enter %s on %s", rule_name, input_symbol )
  @state.backtracking > 0 and @error_output.printf( 
    " (in backtracking mode: depth = %s)", @state.backtracking
  )
  @error_output.print( "\n" )
end

#trace_out(rule_name, rule_index, input_symbol) ⇒ `Object`

# File 'lib/antlr3/recognizers.rb', line 891

def trace_out( rule_name, rule_index, input_symbol )
  @error_output.printf( "<-- exit %s on %s", rule_name, input_symbol )
  @state.backtracking > 0 and @error_output.printf( 
    " (in backtracking mode: depth = %s)", @state.backtracking
  )
  @error_output.print( "\n" )
end
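
The two tracing hooks write one line each, with a backtracking note appended only when the depth is greater than zero. The sketch below captures that output in a StringIO. TraceState and TraceDemo are hypothetical stand-ins, and assigning a StringIO to @error_output is an assumption made for the example.

```ruby
require 'stringio'

# Hypothetical state object holding only the backtracking depth.
TraceState = Struct.new( :backtracking )

class TraceDemo
  attr_reader :state, :error_output

  def initialize
    @state = TraceState.new( 0 )
    @error_output = StringIO.new   # captured instead of a real stream
  end

  # Same bodies as the two listings above.
  def trace_in( rule_name, rule_index, input_symbol )
    @error_output.printf( "--> enter %s on %s", rule_name, input_symbol )
    @state.backtracking > 0 and @error_output.printf(
      " (in backtracking mode: depth = %s)", @state.backtracking
    )
    @error_output.print( "\n" )
  end

  def trace_out( rule_name, rule_index, input_symbol )
    @error_output.printf( "<-- exit %s on %s", rule_name, input_symbol )
    @state.backtracking > 0 and @error_output.printf(
      " (in backtracking mode: depth = %s)", @state.backtracking
    )
    @error_output.print( "\n" )
  end
end

demo = TraceDemo.new
demo.trace_in( :expr, 1, "x" )
demo.state.backtracking = 2
demo.trace_out( :expr, 1, "x" )

trace = demo.error_output.string
puts trace
# --> enter expr on x
# <-- exit expr on x (in backtracking mode: depth = 2)
```

Note that rule_index is accepted but not used by either method in the listings.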
