Class: ANTLR3::Lexer
Overview
Lexer
Lexer is the default superclass of all lexers generated by ANTLR. The class tailors the core functionality provided by Recognizer to the task of matching patterns in the text input and breaking the input into tokens.
About Lexers
A lexer’s job is to take input text and break it up into tokens – objects that encapsulate a piece of text, a type label (such as ID or INTEGER), and the position of the text within the input. Thus, a lexer is essentially a sophisticated iterator that steps through an input stream and produces a sequence of tokens. Sometimes a lexer is enough to carry out a task on its own, such as source-code highlighting or simple code analysis. Usually, however, the lexer converts text into tokens for use by a parser, which recognizes larger structures within the text.
ANTLR parsers have a variety of entry points specified by parser rules, each of which defines the structure of a specific type of sentence in a grammar. Lexers, however, are primarily intended to have a single entry point: the lexer looks at the characters starting at the current input position, decides whether the chunk of text matches one of a number of possible token-type definitions, wraps the chunk into a token with information on its type and location, and advances the input stream to the next position.
ANTLR Lexers and the Lexer API
ANTLR-generated lexers subclass this class unless specified otherwise within a grammar file. The generated class provides an implementation of each lexer rule as a method of the same name. The subclass also provides an implementation of the abstract method #token!, whose purpose is to multiplex the token-type definitions and predict which rule definition to execute to fetch a token. The primary method in the lexer API, #next_token, uses #token! to fetch the next token and drive the iteration.
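The division of labor between the driver method and the rule-multiplexing method can be sketched in plain Ruby. Everything below (the SketchLexer class, its rule methods, and the skip sentinel) is a simplified, hypothetical model of the scheme just described, not the actual ANTLR3 implementation:

```ruby
require 'strscan'

SKIP_TOKEN = Object.new.freeze   # sentinel for matched text that should not be emitted

class SketchLexer
  Token = Struct.new( :type, :text )

  def initialize( source )
    @scanner = StringScanner.new( source )
  end

  # the public driver: repeatedly ask token! for a token,
  # silently discarding skipped (hidden) matches
  def next_token
    loop do
      return nil if @scanner.eos?
      token = token!
      return token unless token.equal?( SKIP_TOKEN )
    end
  end

  # the multiplexer: peek at the upcoming text and dispatch to
  # whichever rule method should handle it
  def token!
    case
    when @scanner.check( /\s/    ) then white_space!
    when @scanner.check( /\d/    ) then integer!
    when @scanner.check( /[a-z]/ ) then id!
    else raise "no viable alternative at position #{ @scanner.pos }"
    end
  end

private

  def white_space!
    @scanner.scan( /\s+/ )
    SKIP_TOKEN                    # matched, but hidden from the consumer
  end

  def integer!
    Token.new( :INTEGER, @scanner.scan( /\d+/ ) )
  end

  def id!
    Token.new( :ID, @scanner.scan( /[a-z]+/ ) )
  end
end
```

Calling `next_token` on `SketchLexer.new( "abc 12" )` produces the ID token, then the INTEGER token (the whitespace match is swallowed by the driver loop), then nil at end of input.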
If the lexer is preparing tokens for use by an ANTLR-generated parser, it will generally be used to build a TokenStream object. The following code example demonstrates the typical setup for using ANTLR parsers and lexers in Ruby.
# in HypotheticalLexer.rb
module Hypothetical
  class Lexer < ANTLR3::Lexer
    # ... generated lexer rules ...
  end
end

# in HypotheticalParser.rb
module Hypothetical
  class Parser < ANTLR3::Parser
    # ... generated parser rules ...
  end
end

# the full pipeline: character stream -> lexer -> token stream -> parser
source = "some hypothetical source code"
input  = ANTLR3::StringStream.new( source, :file => 'blah-de-blah.hyp' )
lexer  = Hypothetical::Lexer.new( input )
tokens = ANTLR3::CommonTokenStream.new( lexer )
parser = Hypothetical::Parser.new( tokens )

# or, more concisely, since the lexer will cast a string to a StringStream
# and the parser will wrap a bare lexer in a CommonTokenStream:
lexer  = Hypothetical::Lexer.new( "some hypothetical source code", :file => 'blah-de-blah.hyp' )
parser = Hypothetical::Parser.new( lexer )
Constant Summary
Constants included from Constants
Constants::BUILT_IN_TOKEN_NAMES, Constants::DEFAULT, Constants::DOWN, Constants::EOF, Constants::EOF_TOKEN, Constants::EOR_TOKEN_TYPE, Constants::HIDDEN, Constants::INVALID, Constants::INVALID_NODE, Constants::INVALID_TOKEN, Constants::MEMO_RULE_FAILED, Constants::MEMO_RULE_UNKNOWN, Constants::MIN_TOKEN_TYPE, Constants::SKIP_TOKEN, Constants::UP
Instance Attribute Summary
Attributes inherited from Recognizer
#input, #state
Attributes included from TokenFactory
#token_class
Class Method Summary
Instance Method Summary
#each, #next, #to_stream
Methods inherited from Recognizer
Scope, #already_parsed_rule?, #antlr_version, #antlr_version_string, #backtrack, #backtracking?, #backtracking_level, #backtracking_level=, #begin_resync, #combine_follows, #compute_context_sensitive_rule_follow, #compute_error_recovery_set, #consume_until, debug?, define_return_scope, #display_recognition_error, #each_delegate, #emit_error_message, #end_resync, #error_header, generated_using, generic_return_scope, #grammar_file_name, imported_grammars, imports, master, master_grammars, masters, #memoize, #mismatch_is_missing_token?, #mismatch_is_unwanted_token?, #missing_symbol, #number_of_syntax_errors, profile?, #recover_from_mismatched_element, #recover_from_mismatched_set, #recover_from_mismatched_token, #reset, #resync, return_scope_members, #rule_memoization, rules, #syntactic_predicate?, #syntax_errors?, token_class, #token_error_display
Methods included from Error
EarlyExit, FailedPredicate, MismatchedNotSet, MismatchedRange, MismatchedSet, MismatchedToken, MismatchedTreeNode, MissingToken, NoViableAlternative, RewriteCardinalityError, RewriteEarlyExit, RewriteEmptyStream, UnwantedToken
Constructor Details
#initialize(input, options = {}) ⇒ Lexer
Returns a new instance of Lexer.
# File 'lib/antlr3/recognizers.rb', line 1014

def initialize( input, options = {} )
  super( options )
  @input = cast_input( input, options )
end
Class Method Details
.associated_parser ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1001

def self.associated_parser
  @associated_parser ||= begin
    @grammar_home and @grammar_home::Parser
  rescue NameError
    grammar_name = @grammar_home.name.split( "::" ).last
    begin
      require "#{ grammar_name }Parser"
      @grammar_home::Parser
    rescue LoadError, NameError
    end
  end
end
.default_rule ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 991

def self.default_rule
  @default_rule ||= :token!
end
.main(argv = ARGV, options = {}) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 995

def self.main( argv = ARGV, options = {} )
  if argv.is_a?( ::Hash ) then argv, options = ARGV, argv end
  main = ANTLR3::Main::LexerMain.new( self, options )
  block_given? ? yield( main ) : main.execute( argv )
end
Instance Method Details
#char_stream=(input) ⇒ Object
Also known as: input=
# File 'lib/antlr3/recognizers.rb', line 1060

def char_stream=( input )
  @input = nil
  reset()
  @input = input
end
#character_error_display(char) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1163

def character_error_display( char )
  case char
  when EOF then '<EOF>'
  when Integer then char.chr.inspect
  else char.inspect
  end
end
#character_index ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1124

def character_index
  @input.index
end
#column ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1120

def column
  @input.column
end
#current_symbol ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1019

def current_symbol
  nil
end
#emit(token = @state.token) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1070

def emit( token = @state.token )
  token ||= create_token
  @state.token = token
  return token
end
#error_message(e) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1141

def error_message( e )
  char = character_error_display( e.symbol ) rescue nil
  case e
  when Error::MismatchedToken
    expecting = character_error_display( e.expecting )
    "mismatched character #{ char }; expecting #{ expecting }"
  when Error::NoViableAlternative
    "no viable alternative at character #{ char }"
  when Error::EarlyExit
    "required ( ... )+ loop did not match anything at character #{ char }"
  when Error::MismatchedNotSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedRange
    a = character_error_display( e.min )
    b = character_error_display( e.max )
    "mismatched character %s; expecting set %s..%s" % [ char, a, b ]
  else super
  end
end
#exhaust ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1056

def exhaust
  self.to_a
end
#line ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1116

def line
  @input.line
end
#match(expected) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1076

def match( expected )
  case expected
  when String
    expected.each_byte do |char|
      unless @input.peek == char
        @state.backtracking > 0 and raise BacktrackingFailed
        error = MismatchedToken( char )
        recover( error )
        raise error
      end
      @input.consume()
    end
  else
    unless @input.peek == expected
      @state.backtracking > 0 and raise BacktrackingFailed
      error = MismatchedToken( expected )
      recover( error )
      raise error
    end
    @input.consume
  end
  return true
end
#match_any ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1100

def match_any
  @input.consume
end
#match_range(min, max) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1104

def match_range( min, max )
  char = @input.peek
  if char.between?( min, max ) then @input.consume
  else
    @state.backtracking > 0 and raise BacktrackingFailed
    error = MismatchedRange( min.chr, max.chr )
    recover( error )
    raise( error )
  end
  return true
end
#next_token ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1023

def next_token
  loop do
    @state.token = nil
    @state.channel = DEFAULT_CHANNEL
    @state.token_start_position = @input.index
    @state.token_start_column = @input.column
    @state.token_start_line = @input.line
    @state.text = nil
    @input.peek == EOF and return EOF_TOKEN
    begin
      token!
      case token = @state.token
      when nil then return( emit )
      when SKIP_TOKEN then next
      else
        return token
      end
    rescue NoViableAlternative => re
      report_error( re )
      recover( re )
    rescue Error::RecognitionError => re
      report_error( re )
    end
  end
end
#recover(re) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1171

def recover( re )
  @input.consume
end
#report_error(e) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1137

def report_error( e )
  display_recognition_error( e )
end
#skip ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1050

def skip
  @state.token = SKIP_TOKEN
end
#source_name ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1066

def source_name
  @input.source_name
end
#text ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1128

def text
  @state.text and return @state.text
  @input.substring( @state.token_start_position, character_index - 1 )
end
#text=(text) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1133

def text=( text )
  @state.text = text
end