Class: ANTLR3::Recognizer
- Inherits: Object
- Extended by: ClassMacros
- Includes: Constants, Error, TokenFactory
- Defined in: lib/antlr3/recognizers.rb
Overview
Recognizer
As the base class of all ANTLR-generated recognizers, Recognizer provides much of the shared functionality and structure used in the recognition process. For all practical purposes, the class and its immediate subclasses Lexer, Parser, and TreeParser are abstract classes. They can be instantiated, but they are of little use on their own. Instead, you write an ANTLR grammar, and ANTLR generates classes that inherit from one of the recognizer base classes and provide the implementation of the grammar rules. The generated code relies on this group of classes to implement necessary tasks. Recognizer defines methods related to:
- token and character matching
- prediction and recognition strategy
- recovering from errors
- reporting errors
- memoization
- simple rule tracing and debugging
Direct Known Subclasses
Constant Summary
Constants included from Constants
Constants::BUILT_IN_TOKEN_NAMES, Constants::DEFAULT, Constants::DOWN, Constants::EOF, Constants::EOF_TOKEN, Constants::EOR_TOKEN_TYPE, Constants::HIDDEN, Constants::INVALID, Constants::INVALID_NODE, Constants::INVALID_TOKEN, Constants::MEMO_RULE_FAILED, Constants::MEMO_RULE_UNKNOWN, Constants::MIN_TOKEN_TYPE, Constants::SKIP_TOKEN, Constants::UP
Class Attribute Summary
- .antlr_version ⇒ Object (readonly)
  Returns the value of attribute antlr_version.
- .antlr_version_string ⇒ Object (readonly)
  Returns the value of attribute antlr_version_string.
- .default_rule ⇒ Object
  Returns the value of attribute default_rule.
- .grammar_file_name ⇒ Object (readonly)
  Returns the value of attribute grammar_file_name.
- .grammar_home ⇒ Object (readonly)
  Returns the value of attribute grammar_home.
- .library_version_string ⇒ Object (readonly)
  Returns the value of attribute library_version_string.
- .token_scheme ⇒ Object
  Returns the value of attribute token_scheme.
Instance Attribute Summary
- #input ⇒ Object
  Returns the value of attribute input.
- #state ⇒ Object (readonly)
  Returns the value of attribute state.
Attributes included from TokenFactory
Class Method Summary
- .debug? ⇒ Boolean
- .define_return_scope(*members) ⇒ Object
  Generates return-value structures for rules with multiple return values.
- .generated_using(grammar_file, antlr_version, library_version = nil) ⇒ Object
  Generated recognizer code uses this method to stamp the code with the name of the grammar file and the version of ANTLR used to generate the code.
- .generic_return_scope ⇒ Object
  Sets up and returns the generic rule return scope for a recognizer.
- .imported_grammars ⇒ Object
- .imports(*grammar_names) ⇒ Object
- .master ⇒ Object
- .master_grammars ⇒ Object
- .masters(*grammar_names) ⇒ Object
- .profile? ⇒ Boolean
- .return_scope_members ⇒ Object
  Used as a hook to add additional default members to default return value structures. For example, all AST-building parsers override this method to add an extra :tree field to all rule return structures.
- .rules ⇒ Object
- .Scope(*declarations, &body) ⇒ Object
- .token_class ⇒ Object
Instance Method Summary
- #already_parsed_rule?(rule) ⇒ Boolean
- #antlr_version ⇒ Object
- #antlr_version_string ⇒ Object
- #backtrack ⇒ Object
- #backtracking? ⇒ Boolean
  Returns true if the recognizer is currently in a decision for which backtracking has been enabled.
- #backtracking_level ⇒ Object (also: #backtracking)
- #backtracking_level=(n) ⇒ Object (also: #backtracking=)
- #begin_resync ⇒ Object
  Overridable hook method that is executed at the start of the resyncing procedure in recover.
- #combine_follows(exact) ⇒ Object
- #compute_context_sensitive_rule_follow ⇒ Object
  Compute the context-sensitive FOLLOW set for the current rule.
- #compute_error_recovery_set ⇒ Object
  Compute the error recovery set for the current rule.
- #consume_until(types) ⇒ Object
  Consume input symbols until one matches a type within types.
- #current_symbol ⇒ Object
  Match needs to return the current input symbol, which gets put into the label for the associated token ref; e.g., x=ID.
- #display_recognition_error(e = $!) ⇒ Object
  Error reporting hook for presenting the information. The default implementation builds appropriate error message text using error_header and error_message, and calls emit_error_message to write the error message out to some source.
- #each_delegate ⇒ Object
- #emit_error_message(message) ⇒ Object
  Write the error report data out to some source.
- #end_resync ⇒ Object
  Overridable hook method that is executed after the resyncing procedure has completed.
- #error_header(e = $!) ⇒ Object
  Used to add a tag to the error message that indicates the location of the input stream when the error occurred.
- #error_message(e = $!) ⇒ Object
  Used to construct an appropriate error message based on the specific type of error and the error’s attributes.
- #grammar_file_name ⇒ Object
- #initialize(options = {}) ⇒ Recognizer (constructor)
  Create a new recognizer.
- #match(type, follow) ⇒ Object
  Attempt to match the current input symbol against the token type specified by type.
- #match_any ⇒ Object
  Match anything, i.e., a wildcard match.
- #memoize(rule, start_index, success) ⇒ Object
- #mismatch_is_missing_token?(follow) ⇒ Boolean
- #mismatch_is_unwanted_token?(type) ⇒ Boolean
- #missing_symbol(error, expected_token_type, follow) ⇒ Object
  Conjure up a missing token during error recovery.
- #number_of_syntax_errors ⇒ Object
  Returns the number of syntax errors recorded in the recognizer’s state.
- #recover(error = $!) ⇒ Object
  Recovers from a recognition error by resyncing the input to the error recovery set.
- #recover_from_mismatched_element(e, follow) ⇒ Object
- #recover_from_mismatched_set(e, follow) ⇒ Object
- #recover_from_mismatched_token(type, follow) ⇒ Object
- #report_error(e = $!) ⇒ Object
  When a recognition error occurs, this method is the main hook for carrying out the error reporting process.
- #reset ⇒ Object
  Resets the recognizer’s state data to initial values.
- #resync ⇒ Object
- #rule_memoization(rule, start_index) ⇒ Object
- #syntactic_predicate?(name) ⇒ Boolean
- #syntax_errors? ⇒ Boolean
- #token_error_display(token) ⇒ Object
  Formats a token object appropriately for inspection within an error message.
- #trace_in(rule_name, rule_index, input_symbol) ⇒ Object
- #trace_out(rule_name, rule_index, input_symbol) ⇒ Object
Methods included from TokenFactory
Methods included from Error
EarlyExit, FailedPredicate, MismatchedNotSet, MismatchedRange, MismatchedSet, MismatchedToken, MismatchedTreeNode, MissingToken, NoViableAlternative, RewriteCardinalityError, RewriteEarlyExit, RewriteEmptyStream, UnwantedToken
Constructor Details
#initialize(options = {}) ⇒ Recognizer
Create a new recognizer. The constructor simply ensures that all recognizers are initialized with a shared state object. See the main recognizer subclasses for more specific information about creating recognizer objects like lexers and parsers.
# File 'lib/antlr3/recognizers.rb', line 360
def initialize( options = {} )
  @state = options[ :state ] || RecognizerSharedState.new
  @error_output = options.fetch( :error_output, $stderr )
  defined?( @input ) or @input = nil
  initialize_dfas
end
Class Attribute Details
.antlr_version ⇒ Object (readonly)
Returns the value of attribute antlr_version.
# File 'lib/antlr3/recognizers.rb', line 207
def antlr_version
  @antlr_version
end
.antlr_version_string ⇒ Object (readonly)
Returns the value of attribute antlr_version_string.
# File 'lib/antlr3/recognizers.rb', line 207
def antlr_version_string
  @antlr_version_string
end
.default_rule ⇒ Object
Returns the value of attribute default_rule.
# File 'lib/antlr3/recognizers.rb', line 213
def default_rule
  @default_rule
end
.grammar_file_name ⇒ Object (readonly)
Returns the value of attribute grammar_file_name.
# File 'lib/antlr3/recognizers.rb', line 207
def grammar_file_name
  @grammar_file_name
end
.grammar_home ⇒ Object (readonly)
Returns the value of attribute grammar_home.
# File 'lib/antlr3/recognizers.rb', line 207
def grammar_home
  @grammar_home
end
.library_version_string ⇒ Object (readonly)
Returns the value of attribute library_version_string.
# File 'lib/antlr3/recognizers.rb', line 207
def library_version_string
  @library_version_string
end
.token_scheme ⇒ Object
Returns the value of attribute token_scheme.
# File 'lib/antlr3/recognizers.rb', line 213
def token_scheme
  @token_scheme
end
Instance Attribute Details
#input ⇒ Object
Returns the value of attribute input.
# File 'lib/antlr3/recognizers.rb', line 344
def input
  @input
end
#state ⇒ Object (readonly)
Returns the value of attribute state.
# File 'lib/antlr3/recognizers.rb', line 345
def state
  @state
end
Class Method Details
.debug? ⇒ Boolean
# File 'lib/antlr3/recognizers.rb', line 306
def debug?
  return false
end
.define_return_scope(*members) ⇒ Object
This method is used to generate return-value structures for rules with multiple return values. To avoid generating a special class for every rule in AST parsers and such (where most rules have the same default set of return values), each recognizer gets a default return value structure assigned to the constant Return. Rules which don’t require additional custom members will have a rule-return name constant that just points to the generic return value.
# File 'lib/antlr3/recognizers.rb', line 241
def define_return_scope( *members )
  if members.empty? then generic_return_scope
  else
    members += return_scope_members
    Struct.new( *members )
  end
end
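The relationship between the generic return scope, custom return scopes, and the return_scope_members hook can be sketched without the runtime. The classes SketchRecognizer and SketchTreeBuilder below are hypothetical stand-ins that copy only the three class methods involved; they are not part of ANTLR3.

```ruby
# Hypothetical stand-in for a recognizer base class. It mirrors only the
# return-scope class methods and is not the ANTLR3 runtime.
class SketchRecognizer
  class << self
    # Default members shared by every rule return structure.
    def return_scope_members
      [ :start, :stop ]
    end

    # Build the generic structure once and expose it as the constant Return.
    def generic_return_scope
      @generic_return_scope ||=
        const_set( :Return, Struct.new( *return_scope_members ) )
    end

    # With no extra members, reuse the generic structure; otherwise build a
    # dedicated Struct holding the extra members followed by the defaults.
    def define_return_scope( *members )
      if members.empty?
        generic_return_scope
      else
        Struct.new( *( members + return_scope_members ) )
      end
    end
  end
end

# Hypothetical AST-building subclass that adds :tree to every return scope.
class SketchTreeBuilder < SketchRecognizer
  def self.return_scope_members
    super + [ :tree ]
  end
end

PlainReturn = SketchRecognizer.define_return_scope
ExprReturn  = SketchRecognizer.define_return_scope( :value )
TreeReturn  = SketchTreeBuilder.define_return_scope
```

Because the generic structure is cached per class, a subclass that overrides return_scope_members gets its own Return constant with the extended member list.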
.generated_using(grammar_file, antlr_version, library_version = nil) ⇒ Object
generated recognizer code uses this method to stamp the code with the name of the grammar file and the current version of ANTLR being used to generate the code
219 220 221 222 223 224 225 226 227 228 229 230 |
# File 'lib/antlr3/recognizers.rb', line 219 def generated_using( grammar_file, antlr_version, library_version = nil ) @grammar_file_name = grammar_file.freeze @antlr_version_string = antlr_version.freeze @library_version = Util.parse_version( library_version ) if @antlr_version_string =~ /^(\d+)\.(\d+)(?:\.(\d+)(?:b(\d+))?)?(.*)$/ @antlr_version = [ $1, $2, $3, $4 ].map! { |str| str.to_i } = $5.strip #@antlr_release_time = $5.empty? ? nil : Time.parse($5) else raise "bad version string: %p" % version_string end end |
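As a rough illustration of what the version pattern accepts, the sketch below applies the same regular expression outside the class. The helper name parse_antlr_version and the sample version strings are assumptions made for this example only.

```ruby
# Same pattern as in generated_using: major.minor[.patch[bN]] plus trailing text.
VERSION_PATTERN = /^(\d+)\.(\d+)(?:\.(\d+)(?:b(\d+))?)?(.*)$/

# Hypothetical helper: returns [ numeric_parts, trailing_text ].
def parse_antlr_version( version_string )
  match = VERSION_PATTERN.match( version_string ) or
    raise ArgumentError, "bad version string: %p" % version_string
  # Unmatched optional groups are nil, and nil.to_i is 0.
  numbers = match.captures.first( 4 ).map { |str| str.to_i }
  [ numbers, match[ 5 ].strip ]
end

numbers, rest = parse_antlr_version( "3.2.1 Sep 23, 2009 12:02:23" )
beta_numbers, beta_rest = parse_antlr_version( "3.2.1b4" )
```

Missing patch or beta components come back as 0 because the optional capture groups yield nil.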
.generic_return_scope ⇒ Object
sets up and returns the generic rule return scope for a recognizer
# File 'lib/antlr3/recognizers.rb', line 260
def generic_return_scope
  @generic_return_scope ||= begin
    struct = Struct.new( *return_scope_members )
    const_set( :Return, struct )
  end
end
.imported_grammars ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 267
def imported_grammars
  @imported_grammars ||= Set.new
end
.imports(*grammar_names) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 289
def imports( *grammar_names )
  for grammar in grammar_names
    imported_grammars.add?( grammar.to_sym ) and
      attr_reader( Util.snake_case( grammar ) )
  end
  return imported_grammars
end
.master ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 275
def master
  master_grammars.last
end
.master_grammars ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 271
def master_grammars
  @master_grammars ||= []
end
.masters(*grammar_names) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 279
def masters( *grammar_names )
  for grammar in grammar_names
    unless master_grammars.include?( grammar )
      master_grammars << grammar
      attr_reader( Util.snake_case( grammar ) )
    end
  end
end
.profile? ⇒ Boolean
# File 'lib/antlr3/recognizers.rb', line 310
def profile?
  return false
end
.return_scope_members ⇒ Object
Used as a hook to add additional default members to default return value structures. For example, all AST-building parsers override this method to add an extra :tree field to all rule return structures.
# File 'lib/antlr3/recognizers.rb', line 254
def return_scope_members
  [ :start, :stop ]
end
.rules ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 298
def rules
  self::RULE_METHODS.dup rescue []
end
.Scope(*declarations, &body) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 314
def Scope( *declarations, &body )
  Scope.new( *declarations, &body )
end
.token_class ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 318
def token_class
  @token_class ||= begin
    self::Token rescue
    superclass.token_class rescue
    ANTLR3::CommonToken
  end
end
Instance Method Details
#already_parsed_rule?(rule) ⇒ Boolean
# File 'lib/antlr3/recognizers.rb', line 866
def already_parsed_rule?( rule )
  stop_index = rule_memoization( rule, @input.index )
  case stop_index
  when MEMO_RULE_UNKNOWN then return false
  when MEMO_RULE_FAILED
    raise BacktrackingFailed
  else
    @input.seek( stop_index + 1 )
  end
  return true
end
#antlr_version ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 336
def antlr_version
  self.class.antlr_version
end
#antlr_version_string ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 340
def antlr_version_string
  self.class.antlr_version_string
end
#backtrack ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 839
def backtrack
  @state.backtracking += 1
  start = @input.mark
  success =
    begin yield
    rescue BacktrackingFailed then false
    else true
    end
  return success
ensure
  @input.rewind( start )
  @state.backtracking -= 1
end
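The mark, speculate, and rewind pattern used by backtrack can be tried in isolation. The sketch below is a minimal approximation: SketchStream, SketchSpeculator, the expect helper, and the locally defined BacktrackingFailed class are all hypothetical stand-ins, not the runtime's classes.

```ruby
# Hypothetical stand-ins for this sketch only.
class BacktrackingFailed < StandardError; end

class SketchStream
  attr_reader :index

  def initialize( symbols )
    @symbols = symbols
    @index = 0
  end

  def peek
    @symbols[ @index ]
  end

  def consume
    @index += 1
  end

  # A marker is just the current position in this simplified stream.
  def mark
    @index
  end

  def rewind( marker )
    @index = marker
  end
end

class SketchSpeculator
  attr_reader :depth

  def initialize( input )
    @input = input
    @depth = 0
  end

  # Run the block speculatively; always restore the input position and the
  # backtracking depth, and report whether the block matched.
  def backtrack
    @depth += 1
    start = @input.mark
    begin
      yield
      true
    rescue BacktrackingFailed
      false
    end
  ensure
    @input.rewind( start )
    @depth -= 1
  end

  # Consume the next symbol if it is the expected one, otherwise fail.
  def expect( symbol )
    raise BacktrackingFailed unless @input.peek == symbol
    @input.consume
  end
end

input = SketchStream.new( [ :ID, :EQ, :INT ] )
speculator = SketchSpeculator.new( input )
matched  = speculator.backtrack { speculator.expect( :ID ); speculator.expect( :EQ ) }
rejected = speculator.backtrack { speculator.expect( :ID ); speculator.expect( :INT ) }
```

Whether the speculation succeeds or fails, the ensure clause leaves the stream where it started, which is what lets a syntactic predicate test an alternative without consuming input.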
#backtracking? ⇒ Boolean
Returns true if the recognizer is currently in a decision for which backtracking has been enabled
# File 'lib/antlr3/recognizers.rb', line 827
def backtracking?
  @state.backtracking > 0
end
#backtracking_level ⇒ Object Also known as: backtracking
# File 'lib/antlr3/recognizers.rb', line 831
def backtracking_level
  @state.backtracking
end
#backtracking_level=(n) ⇒ Object Also known as: backtracking=
# File 'lib/antlr3/recognizers.rb', line 835
def backtracking_level=( n )
  @state.backtracking = n
end
#begin_resync ⇒ Object
overridable hook method that is executed at the start of the resyncing procedure in recover
by default, it does nothing
# File 'lib/antlr3/recognizers.rb', line 519
def begin_resync
  # do nothing
end
#combine_follows(exact) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 779
def combine_follows( exact )
  follow_set = Set.new
  @state.following.each_with_index.reverse_each do |local_follow_set, index|
    follow_set |= local_follow_set
    if exact
      if local_follow_set.include?( EOR_TOKEN_TYPE )
        follow_set.delete( EOR_TOKEN_TYPE ) if index > 0
      else
        break
      end
    end
  end
  return follow_set
end
#compute_context_sensitive_rule_follow ⇒ Object
Compute the context-sensitive FOLLOW set for the current rule. This is the set of token types that can follow a specific rule reference given a specific call chain. You get the set of viable tokens that can possibly come next (lookahead depth 1) given the current call chain. Contrast this with the definition of plain FOLLOW for rule r:
FOLLOW(r) = {x | S =>* alpha r beta in G and x in FIRST(beta)}
where x in T* and alpha, beta in V*; T is the set of terminals and V is the set of terminals and nonterminals. In other words, FOLLOW(r) is the set of all tokens that can possibly follow references to r in any sentential form (context). At runtime, however, we know precisely which context applies, as we have the call chain. We may compute the exact (rather than covering superset) set of following tokens.
For example, consider grammar:
stat : ID '=' expr ';' // FOLLOW(stat)=={EOF}
| "return" expr '.'
;
expr : atom ('+' atom)* ; // FOLLOW(expr)=={';','.',')'}
atom : INT // FOLLOW(atom)=={'+',')',';','.'}
| '(' expr ')'
;
The FOLLOW sets are all inclusive whereas context-sensitive FOLLOW sets are precisely what could follow a rule reference. For input “i=(3);”, here is the derivation:
stat => ID '=' expr ';'
=> ID '=' atom ('+' atom)* ';'
=> ID '=' '(' expr ')' ('+' atom)* ';'
=> ID '=' '(' atom ')' ('+' atom)* ';'
=> ID '=' '(' INT ')' ('+' atom)* ';'
=> ID '=' '(' INT ')' ';'
At the “3” token, you’d have a call chain of
stat -> expr -> atom -> expr -> atom
What can follow that specific nested ref to atom? Exactly ‘)’ as you can see by looking at the derivation of this specific input. Contrast this with FOLLOW(atom)={‘+’,‘)’,‘;’,‘.’}.
You want the exact viable token set when recovering from a token mismatch. Upon token mismatch, if LA(1) is member of the viable next token set, then you know there is most likely a missing token in the input stream. “Insert” one by just not throwing an exception.
# File 'lib/antlr3/recognizers.rb', line 775
def compute_context_sensitive_rule_follow
  combine_follows true
end
#compute_error_recovery_set ⇒ Object
(The following explanation has been lifted directly from the source code documentation of the ANTLR Java runtime library.)
Compute the error recovery set for the current rule. During rule invocation, the parser pushes the set of tokens that can follow that rule reference on the stack; this amounts to computing FIRST of what follows the rule reference in the enclosing rule. This local follow set only includes tokens from within the rule; i.e., the FIRST computation done by ANTLR stops at the end of a rule.
EXAMPLE
When you find a “no viable alt exception”, the input is not consistent with any of the alternatives for rule r. The best thing to do is to consume tokens until you see something that can legally follow a call to r or any rule that called r. You don’t want the exact set of viable next tokens because the input might just be missing a token–you might consume the rest of the input looking for one of the missing tokens.
Consider grammar:
a : '[' b ']'
| '(' b ')'
;
b : c '^' INT ;
c : ID
| INT
;
At each rule invocation, the set of tokens that could follow that rule is pushed on a stack. Here are the various “local” follow sets:
FOLLOW( b1_in_a ) = FIRST( ']' ) = ']'
FOLLOW( b2_in_a ) = FIRST( ')' ) = ')'
FOLLOW( c_in_b ) = FIRST( '^' ) = '^'
Upon erroneous input “[]”, the call chain is
a -> b -> c
and, hence, the follow context stack is:
depth  local follow set  after call to rule
0      <EOF>             a (from main( ))
1      ']'               b
2      '^'               c
Notice that ')' is not included, because b would have to have been called from a different context in rule a for ')' to be included.
For error recovery, we cannot consider FOLLOW(c) (context-sensitive or otherwise). We need the combined set of all context-sensitive FOLLOW sets: the set of all tokens that could follow any reference in the call chain. We need to resync to one of those tokens. Note that FOLLOW(c)={‘^’} and if we resync’d to that token, we’d consume until EOF. We need to sync to context-sensitive FOLLOWs for a, b, and c: {‘]’,‘^’}. In this case, for input “[]”, LA(1) is in this set, so we would not consume anything and, after printing an error, rule c would return normally. It would not find the required ‘^’ though. At this point, it gets a mismatched token error and throws an exception (since LA(1) is not in the viable following token set). The rule exception handler tries to recover, but finds the same recovery set and doesn’t consume anything. Rule b exits normally, returning to rule a. Now it finds the ‘]’ (and with the successful match exits errorRecovery mode).
So, you can see that the parser walks up the call chain looking for the token that was a member of the recovery set.
Errors are not generated in errorRecovery mode.
ANTLR’s error recovery mechanism is based upon original ideas:
“Algorithms + Data Structures = Programs” by Niklaus Wirth
and
“A note on error recovery in recursive descent parsers”: portal.acm.org/citation.cfm?id=947902.947905
Later, Josef Grosch had some good ideas:
“Efficient and Comfortable Error Recovery in Recursive Descent Parsers”: www.cocolab.com/products/cocktail/doca4.ps/ell.ps.zip
Like Grosch I implemented local FOLLOW sets that are combined at run-time upon error to avoid overhead during parsing.
# File 'lib/antlr3/recognizers.rb', line 623
def compute_error_recovery_set
  combine_follows( false )
end
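The difference between the combined recovery set and the exact, context-sensitive follow set can be checked against the follow stack from the example above. This is a sketch under stated assumptions: combine_follows is rewritten here as a standalone function that takes the stack as an argument, and the EOR and :EOF markers are hypothetical placeholders for the runtime's token type constants.

```ruby
require 'set'

# Hypothetical end-of-rule marker; the runtime uses EOR_TOKEN_TYPE.
EOR = :eor

# Standalone version of the combining loop; `following` is the follow stack,
# bottom first. With exact = false every local set is merged. With
# exact = true the walk stops at the first set, counted from the top, that
# does not contain the end-of-rule marker.
def combine_follows( following, exact )
  follow_set = Set.new
  following.each_with_index.reverse_each do |local_follow_set, index|
    follow_set |= local_follow_set
    if exact
      if local_follow_set.include?( EOR )
        follow_set.delete( EOR ) if index > 0
      else
        break
      end
    end
  end
  follow_set
end

# Follow stack for the call chain a -> b -> c on input "[]".
stack = [ Set[ :EOF ], Set[ ']' ], Set[ '^' ] ]

recovery_set = combine_follows( stack, false )
exact_follow = combine_follows( stack, true )

# When a rule reference can end its enclosing rule, its local set holds the
# marker and the exact computation looks one level further down the stack.
# This second stack is made up for illustration.
exact_through_rule_end =
  combine_follows( [ Set[ :EOF ], Set[ ';' ], Set[ EOR ] ], true )
```

For this stack the recovery set contains the end-of-file marker, ']' and '^', while the exact follow set contains only '^', which matches the discussion of why resyncing to FOLLOW(c) alone would consume too much input.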
#consume_until(types) ⇒ Object
Consume input symbols until one matches a type within types
types can be a single symbol type or a set of symbol types
# File 'lib/antlr3/recognizers.rb', line 813
def consume_until( types )
  types.is_a?( Set ) or types = Set[ *types ]
  type = @input.peek
  until type == EOF or types.include?( type )
    @input.consume
    type = @input.peek
  end
  return( type )
end
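A small simulation shows how the loop stops either at a wanted type or at end of input. SketchInput, EOF_TYPE, and the token type symbols are assumptions made for this sketch, and consume_until is written as a standalone function that receives the input explicitly.

```ruby
require 'set'

EOF_TYPE = :eof  # hypothetical end-of-input marker for this sketch

# Hypothetical input stream over an array of token types.
class SketchInput
  attr_reader :index

  def initialize( types )
    @types = types
    @index = 0
  end

  # Type of the current symbol, or the end-of-input marker past the end.
  def peek
    @types.fetch( @index, EOF_TYPE )
  end

  def consume
    @index += 1
  end
end

# Skip symbols until the current one has a type in `types` or input runs out.
# `types` may be a single type or a Set of types.
def consume_until( input, types )
  types = Set[ *types ] unless types.is_a?( Set )
  type = input.peek
  until type == EOF_TYPE or types.include?( type )
    input.consume
    type = input.peek
  end
  type
end

input = SketchInput.new( [ :ID, :PLUS, :SEMI, :ID ] )
stopped_at = consume_until( input, :SEMI )
index_after_first = input.index
ran_out = consume_until( input, Set[ :RBRACK ] )
```

Note that the matching symbol itself is not consumed; the stream is left positioned on it so that normal matching can resume there.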
#current_symbol ⇒ Object
Match needs to return the current input symbol, which gets put into the label for the associated token ref; e.g., x=ID. Token and tree parsers need to return different objects. Rather than test for input stream type or change the IntStream interface, I use a simple method to ask the recognizer to tell me what the current input symbol is.
This is ignored for lexers.
# File 'lib/antlr3/recognizers.rb', line 804
def current_symbol
  @input.look
end
#display_recognition_error(e = $!) ⇒ Object
Error reporting hook for presenting the information. The default implementation builds appropriate error message text using error_header and error_message, and calls emit_error_message to write the error message out to some source.
# File 'lib/antlr3/recognizers.rb', line 424
def display_recognition_error( e = $! )
  header = error_header( e )
  message = error_message( e )
  emit_error_message( "#{ header } #{ message }" )
end
#each_delegate ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 347
def each_delegate
  block_given? or return enum_for( __method__ )
  for grammar in self.class.imported_grammars
    del = __send__( Util.snake_case( grammar ) ) and
      yield( del )
  end
end
#emit_error_message(message) ⇒ Object
Write the error report data out to some source. By default, the error message is written to $stderr
# File 'lib/antlr3/recognizers.rb', line 491
def emit_error_message( message )
  @error_output.puts( message ) if @error_output
end
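The reporting chain offers two ways to redirect messages: pass a different :error_output object to the constructor, or override emit_error_message. The sketch below illustrates both on hypothetical stand-in classes (SketchReporter and CollectingReporter); the fixed "line 1:0" header is a placeholder, since the real error_header reads the location from the error object.

```ruby
require 'stringio'

# Hypothetical stand-in that mirrors the reporting chain; not the runtime.
class SketchReporter
  def initialize( options = {} )
    @error_output = options.fetch( :error_output, $stderr )
  end

  def display_recognition_error( e )
    emit_error_message( "#{ error_header( e ) } #{ error_message( e ) }" )
  end

  # Placeholder header for this sketch.
  def error_header( e )
    "line 1:0"
  end

  def error_message( e )
    e.message
  end

  def emit_error_message( message )
    @error_output.puts( message ) if @error_output
  end
end

# Option 2: override the final hook and keep messages in memory instead.
class CollectingReporter < SketchReporter
  attr_reader :messages

  def initialize( options = {} )
    super
    @messages = []
  end

  def emit_error_message( message )
    @messages << message
  end
end

# Option 1: hand the constructor any object that responds to puts.
buffer = StringIO.new
reporter = SketchReporter.new( :error_output => buffer )
reporter.display_recognition_error( RuntimeError.new( "mismatched input" ) )

collector = CollectingReporter.new
collector.display_recognition_error( RuntimeError.new( "missing ID" ) )
```

Collecting messages in memory is convenient in tests, where writing to $stderr would clutter the output.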
#end_resync ⇒ Object
Overridable hook method that is executed after the resyncing procedure has completed
by default, it does nothing
# File 'lib/antlr3/recognizers.rb', line 526
def end_resync
  # do nothing
end
#error_header(e = $!) ⇒ Object
used to add a tag to the error message that indicates the location of the input stream when the error occurred
# File 'lib/antlr3/recognizers.rb', line 466
def error_header( e = $! )
  e.location
end
#error_message(e = $!) ⇒ Object
used to construct an appropriate error message based on the specific type of error and the error’s attributes
# File 'lib/antlr3/recognizers.rb', line 433
def error_message( e = $! )
  case e
  when UnwantedToken
    token_name = token_name( e.expecting )
    "extraneous input #{ token_error_display( e.unexpected_token ) } expecting #{ token_name }"
  when MissingToken
    token_name = token_name( e.expecting )
    "missing #{ token_name } at #{ token_error_display( e.symbol ) }"
  when MismatchedToken
    token_name = token_name( e.expecting )
    "mismatched input #{ token_error_display( e.symbol ) } expecting #{ token_name }"
  when MismatchedTreeNode
    token_name = token_name( e.expecting )
    "mismatched tree node: #{ e.symbol } expecting #{ token_name }"
  when NoViableAlternative
    "no viable alternative at input " << token_error_display( e.symbol )
  when MismatchedSet
    "mismatched input %s expecting set %s" %
      [ token_error_display( e.symbol ), e.expecting.inspect ]
  when MismatchedNotSet
    "mismatched input %s expecting set %s" %
      [ token_error_display( e.symbol ), e.expecting.inspect ]
  when FailedPredicate
    "rule %s failed predicate: { %s }?" % [ e.rule_name, e.predicate_text ]
  else e.message
  end
end
#grammar_file_name ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 332
def grammar_file_name
  self.class.grammar_file_name
end
#match(type, follow) ⇒ Object
Attempt to match the current input symbol against the token type specified by type. If the symbol matches the type, consume the current symbol and return its value. If the symbol doesn’t match, attempt to use the follow-set data provided by follow to recover from the mismatched token.
# File 'lib/antlr3/recognizers.rb', line 385
def match( type, follow )
  matched_symbol = current_symbol
  if @input.peek == type
    @input.consume
    @state.error_recovery = false
    return matched_symbol
  end
  raise( BacktrackingFailed ) if @state.backtracking > 0
  return recover_from_mismatched_token( type, follow )
end
#match_any ⇒ Object
match anything – i.e. wildcard match. Simply consume the current symbol from the input stream.
# File 'lib/antlr3/recognizers.rb', line 398
def match_any
  @state.error_recovery = false
  @input.consume
end
#memoize(rule, start_index, success) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 878
def memoize( rule, start_index, success )
  stop_index = success ? @input.index - 1 : MEMO_RULE_FAILED
  memo = @state.rule_memory[ rule ] and memo[ start_index ] = stop_index
end
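The memo table can be sketched as a hash of hashes keyed by rule and start index. Everything named here is an assumption for illustration: SketchMemo is a stand-in class, the two sentinel values are chosen locally, and memoize takes the current input index as an explicit argument because the sketch has no input stream.

```ruby
# Hypothetical sentinel values for this sketch only.
MEMO_RULE_FAILED  = -2
MEMO_RULE_UNKNOWN = -1

class SketchMemo
  def initialize
    @rule_memory = {}
  end

  # Look up the recorded stop index for a rule at a start position. The
  # per-rule table is created on first lookup and answers "unknown" for
  # positions that were never recorded.
  def rule_memoization( rule, start_index )
    @rule_memory.fetch( rule ) do
      @rule_memory[ rule ] = Hash.new( MEMO_RULE_UNKNOWN )
    end[ start_index ]
  end

  # Record the outcome of a rule attempt. Nothing is stored unless the
  # rule's table already exists, i.e. a lookup happened first.
  def memoize( rule, start_index, current_index, success )
    stop_index = success ? current_index - 1 : MEMO_RULE_FAILED
    memo = @rule_memory[ rule ] and memo[ start_index ] = stop_index
  end
end

memo = SketchMemo.new
before = memo.rule_memoization( :expr, 0 )   # first lookup creates the table
memo.memoize( :expr, 0, 5, true )            # rule consumed symbols 0..4
after = memo.rule_memoization( :expr, 0 )
memo.memoize( :expr, 7, 7, false )           # rule failed at position 7
failed = memo.rule_memoization( :expr, 7 )
memo.memoize( :stat, 0, 3, true )            # no table for :stat yet
untouched = memo.rule_memoization( :stat, 0 )
```

A later attempt at the same rule and position can then skip straight past the recorded stop index or fail immediately, which is the check that already_parsed_rule? performs.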
#mismatch_is_missing_token?(follow) ⇒ Boolean
# File 'lib/antlr3/recognizers.rb', line 692
def mismatch_is_missing_token?( follow )
  follow.nil? and return false
  if follow.include?( EOR_TOKEN_TYPE )
    viable_tokens = compute_context_sensitive_rule_follow
    follow = follow | viable_tokens

    follow.delete( EOR_TOKEN_TYPE ) unless @state.following.empty?
  end
  if follow.include?( @input.peek ) or follow.include?( EOR_TOKEN_TYPE )
    return true
  end
  return false
end
#mismatch_is_unwanted_token?(type) ⇒ Boolean
# File 'lib/antlr3/recognizers.rb', line 688
def mismatch_is_unwanted_token?( type )
  @input.peek( 2 ) == type
end
#missing_symbol(error, expected_token_type, follow) ⇒ Object
Conjure up a missing token during error recovery.
The recognizer attempts to recover from single missing symbols. But, actions might refer to that missing symbol. For example, x=ID f($x);. The action clearly assumes that there has been an identifier matched previously and that $x points at that token. If that token is missing, but the next token in the stream is what we want we assume that this token is missing and we keep going. Because we have to return some token to replace the missing token, we have to conjure one up. This method gives the user control over the tokens returned for missing tokens. Mostly, you will want to create something special for identifier tokens. For literals such as ‘{’ and ‘,’, the default action in the parser or tree parser works. It simply creates a CommonToken of the appropriate type. The text will be the token. If you change what tokens must be created by the lexer, override this method to create the appropriate tokens.
# File 'lib/antlr3/recognizers.rb', line 684
def missing_symbol( error, expected_token_type, follow )
  return nil
end
#number_of_syntax_errors ⇒ Object
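Returns the number of syntax errors recorded in the recognizer’s state. The comment attached to this method in the source describes a token-mismatch hook instead: factor out what to do upon token mismatch so tree parsers can behave differently.
- override this method in your parser to do things like bailing out after the first error
- just raise the exception instead of calling the recovery method.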
# File 'lib/antlr3/recognizers.rb', line 718
def number_of_syntax_errors
  @state.syntax_errors
end
#recover(error = $!) ⇒ Object
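Recovers from a recognition error. If the input has not advanced since the last error, one symbol is consumed to guarantee progress. The method then computes the error recovery set and consumes input, inside resync, until the current symbol belongs to that set.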
# File 'lib/antlr3/recognizers.rb', line 499
def recover( error = $! )
  @state.last_error_index == @input.index and @input.consume
  @state.last_error_index = @input.index

  follow_set = compute_error_recovery_set

  resync { consume_until( follow_set ) }
end
#recover_from_mismatched_element(e, follow) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 653
def recover_from_mismatched_element( e, follow )
  follow.nil? and return false
  if follow.include?( EOR_TOKEN_TYPE )
    viable_tokens = compute_context_sensitive_rule_follow
    follow = ( follow | viable_tokens ) - Set[ EOR_TOKEN_TYPE ]
  end
  if follow.include?( @input.peek )
    report_error( e )
    return true
  end
  return false
end
#recover_from_mismatched_set(e, follow) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 645
def recover_from_mismatched_set( e, follow )
  if mismatch_is_missing_token?( follow )
    report_error( e )
    return missing_symbol( e, INVALID_TOKEN_TYPE, follow )
  end
  raise e
end
#recover_from_mismatched_token(type, follow) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 627
def recover_from_mismatched_token( type, follow )
  if mismatch_is_unwanted_token?( type )
    err = UnwantedToken( type )
    resync { @input.consume }
    report_error( err )
    return @input.consume
  end

  if mismatch_is_missing_token?( follow )
    inserted = missing_symbol( nil, type, follow )
    report_error( MissingToken( type, inserted ) )
    return inserted
  end

  raise MismatchedToken( type )
end
#report_error(e = $!) ⇒ Object
When a recognition error occurs, this method is the main hook for carrying out the error reporting process. The default implementation calls display_recognition_error to display the error info on $stderr.
# File 'lib/antlr3/recognizers.rb', line 412
def report_error( e = $! )
  @state.error_recovery and return
  @state.syntax_errors += 1
  @state.error_recovery = true
  display_recognition_error( e )
end
#reset ⇒ Object
Resets the recognizer’s state data to initial values. As a result, all error tracking and error recovery data accumulated in the current state will be cleared. It will also attempt to reset the input stream via input.reset, but it ignores any errors received from doing so. Thus the input stream is not guaranteed to be rewound to its initial position.
# File 'lib/antlr3/recognizers.rb', line 374
def reset
  @state and @state.reset!
  @input and @input.reset rescue nil
end
#resync ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 508
def resync
  begin_resync
  return( yield )
ensure
  end_resync
end
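The begin_resync and end_resync hooks bracket whatever the block passed to resync does, even if that block raises. The sketch below shows the pattern on SketchResyncer, a hypothetical stand-in class that records each step in a log array, which the real hooks do not do.

```ruby
# Hypothetical stand-in that records the order in which the hooks run.
class SketchResyncer
  attr_reader :log

  def initialize
    @log = []
  end

  def begin_resync
    @log << :begin
  end

  def end_resync
    @log << :end
  end

  # Run the block between the two hooks; end_resync runs even on error.
  def resync
    begin_resync
    yield
  ensure
    end_resync
  end
end

resyncer = SketchResyncer.new
result = resyncer.resync { resyncer.log << :consume; 42 }
log_after_success = resyncer.log.dup

begin
  resyncer.resync { raise "input exhausted" }
rescue RuntimeError
  # the error propagates, but end_resync has already run
end
```

A subclass could use the same pair of hooks to notify a debugger or profiler that the recognizer is skipping input.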
#rule_memoization(rule, start_index) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 860
def rule_memoization( rule, start_index )
  @state.rule_memory.fetch( rule ) do
    @state.rule_memory[ rule ] = Hash.new( MEMO_RULE_UNKNOWN )
  end[ start_index ]
end
#syntactic_predicate?(name) ⇒ Boolean
# File 'lib/antlr3/recognizers.rb', line 853
def syntactic_predicate?( name )
  backtrack { send name }
end
#syntax_errors? ⇒ Boolean
# File 'lib/antlr3/recognizers.rb', line 706
def syntax_errors?
  ( error_count = @state.syntax_errors ) > 0 and return( error_count )
end
#token_error_display(token) ⇒ Object
formats a token object appropriately for inspection within an error message
# File 'lib/antlr3/recognizers.rb', line 474
def token_error_display( token )
  unless text = token.text || ( token.source_text rescue nil )
    text =
      case
      when token.type == EOF then '<EOF>'
      when name = token_name( token.type ) rescue nil then "<#{ name }>"
      when token.respond_to?( :name ) then "<#{ token.name }>"
      else "<#{ token.type }>"
      end
  end
  return text.inspect
end
#trace_in(rule_name, rule_index, input_symbol) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 883
def trace_in( rule_name, rule_index, input_symbol )
  @error_output.printf( "--> enter %s on %s", rule_name, input_symbol )
  @state.backtracking > 0 and @error_output.printf(
    " (in backtracking mode: depth = %s)", @state.backtracking
  )
  @error_output.print( "\n" )
end
#trace_out(rule_name, rule_index, input_symbol) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 891
def trace_out( rule_name, rule_index, input_symbol )
  @error_output.printf( "<-- exit %s on %s", rule_name, input_symbol )
  @state.backtracking > 0 and @error_output.printf(
    " (in backtracking mode: depth = %s)", @state.backtracking
  )
  @error_output.print( "\n" )
end