Class: ANTLR3::Lexer

Inherits:
Recognizer
Includes:
TokenSource
Defined in:
lib/antlr3/recognizers.rb

Overview

Lexer

Lexer is the default superclass of all lexers generated by ANTLR. The class tailors the core functionality provided by Recognizer to the task of matching patterns in the text input and breaking the input into tokens.

About Lexers

A lexer's job is to take input text and break it up into tokens: objects that encapsulate a piece of text, a type label (such as ID or INTEGER), and the position of the text within the input. A lexer is thus essentially an elaborate iterator that steps through an input stream and produces a sequence of tokens. Sometimes a lexer is enough on its own, as in source code highlighting or simple code analysis. Usually, however, the lexer converts text into tokens for use by a parser, which recognizes larger structures within the text.

ANTLR parsers have a variety of entry points specified by parser rules, each of which defines the structure of a specific type of sentence in a grammar. Lexers, however, are primarily intended to have a single entry point. That entry point looks at the characters starting at the current input position, decides whether the chunk of text matches one of a number of possible token type definitions, wraps the chunk into a token with information on its type and location, and advances the input stream to the next position.
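The single-entry-point idea can be sketched without ANTLR at all. The following is a minimal illustration in plain Ruby, not the code ANTLR generates: ToyToken, TOY_RULES, toy_next_token, and toy_tokenize are hypothetical names, and the token definitions are made up for the example.

```ruby
require 'strscan'

# Hypothetical token record: a type label, the matched text, and its position.
ToyToken = Struct.new(:type, :text, :start)

# Hypothetical token type definitions, tried in order at the current position.
TOY_RULES = [
  [:INTEGER, /\d+/],
  [:ID,      /[A-Za-z_]\w*/],
  [:WS,      /\s+/],
]

# Single entry point: match exactly one token at the scanner's position
# and leave the scanner positioned just after it.
def toy_next_token(scanner)
  return nil if scanner.eos?
  start = scanner.pos
  TOY_RULES.each do |type, pattern|
    text = scanner.scan(pattern)
    return ToyToken.new(type, text, start) if text
  end
  raise "no viable alternative at character #{scanner.peek(1).inspect}"
end

# The "iterator" view of a lexer: call the entry point until input runs out.
def toy_tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  while (token = toy_next_token(scanner))
    tokens << token unless token.type == :WS
  end
  tokens
end

tokens = toy_tokenize("width 42")
tokens.each { |t| puts "#{t.type} #{t.text.inspect} @#{t.start}" }
# prints:
#   ID "width" @0
#   INTEGER "42" @6
```

Each call to the entry point starts where the previous one stopped, which is why one method is enough to walk the whole input.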

ANTLR Lexers and the Lexer API

ANTLR-generated lexers subclass this class unless the grammar file specifies otherwise. The generated class implements each lexer rule as a method of the same name. It also implements the abstract method #token!, which multiplexes the token type definitions and predicts which rule definition to execute to fetch a token. The primary method in the lexer API, #next_token, calls #token! to fetch the next token and drive the iteration.
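The division of labor between #next_token and the dispatch method it calls (token! in the #next_token source listed under Instance Method Details) might look roughly like the sketch below. ToyLexer and its rule methods are hypothetical stand-ins written for this example. Real generated lexers predict rules with ANTLR's own lookahead logic, not a case statement on one character.

```ruby
# Toy stand-in for a generated lexer; it does not use ANTLR3 at all.
class ToyLexer
  SKIP  = :skip                        # hypothetical stand-in for SKIP_TOKEN
  Token = Struct.new(:type, :text)

  def initialize(source)
    @chars = source.chars
    @index = 0
    @token = nil
  end

  # Drives the iteration: runs token! until a rule yields a real token.
  def next_token
    loop do
      return nil if @index >= @chars.length   # end of input in this sketch
      @token = nil
      @start = @index
      token!
      next if @token == SKIP                   # rule asked to be skipped
      return @token
    end
  end

  private

  # Predicts which rule applies from the next character, then runs it.
  def token!
    case @chars[@index]
    when /\d/ then integer!
    when /\s/ then ws!
    else word!
    end
  end

  def integer!
    consume_while(/\d/)
    @token = Token.new(:INTEGER, text)
  end

  def word!
    consume_while(/\S/)
    @token = Token.new(:WORD, text)
  end

  def ws!
    consume_while(/\s/)
    @token = SKIP
  end

  def consume_while(pattern)
    @index += 1 while @index < @chars.length && @chars[@index] =~ pattern
  end

  # Text matched since the current token started.
  def text
    @chars[@start...@index].join
  end
end

lexer = ToyLexer.new("abc 12")
result = []
while (t = lexer.next_token)
  result << [t.type, t.text]
end
# result is [[:WORD, "abc"], [:INTEGER, "12"]]
```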

If the lexer is preparing tokens for use by an ANTLR-generated parser, it will generally be used to build a TokenStream object. The following code example demonstrates the typical setup for using ANTLR parsers and lexers in Ruby.

# in HypotheticalLexer.rb
module Hypothetical
  class Lexer < ANTLR3::Lexer
    # ...
    # ANTLR generated code
    # ...
  end
end

# in HypotheticalParser.rb
module Hypothetical
  class Parser < ANTLR3::Parser
    # ...
    # more ANTLR generated code
    # ...
  end
end

# to take hypothetical source code and prepare it for parsing,
# there is generally a four-step construction process

source = "some hypothetical source code"
input = ANTLR3::StringStream.new(source, :file => 'blah-de-blah.hyp')
lexer = Hypothetical::Lexer.new( input )
tokens = ANTLR3::CommonTokenStream.new( lexer )
parser = Hypothetical::Parser.new( tokens )

# if you're using the standard streams, ANTLR3::StringStream and
# ANTLR3::CommonTokenStream, you can write the same process 
# shown above more succinctly:

lexer  = Hypothetical::Lexer.new("some hypothetical source code", :file => 'blah-de-blah.hyp')
parser = Hypothetical::Parser.new( lexer )

Direct Known Subclasses

Template::GroupFile::Lexer

Constant Summary

Constants included from Constants

Constants::BUILT_IN_TOKEN_NAMES, Constants::DEFAULT, Constants::DOWN, Constants::EOF, Constants::EOF_TOKEN, Constants::EOR_TOKEN_TYPE, Constants::HIDDEN, Constants::INVALID_TOKEN, Constants::INVALID_TOKEN_TYPE, Constants::MEMO_RULE_FAILED, Constants::MEMO_RULE_UNKNOWN, Constants::MIN_TOKEN_TYPE, Constants::SKIP_TOKEN, Constants::UP

Instance Attribute Summary

Attributes inherited from Recognizer

#input, #state

Attributes included from TokenFactory

#token_class

Class Method Summary

Instance Method Summary

Methods included from TokenSource

#each, #next, #to_stream

Methods inherited from Recognizer

Scope, #already_parsed_rule?, #antlr_version, #antlr_version_string, #backtrack, #backtracking?, #backtracking_level, #backtracking_level=, #begin_resync, #combine_follows, #compute_context_sensitive_rule_follow, #compute_error_recovery_set, #consume_until, debug?, define_return_scope, #display_recognition_error, #each_delegate, #emit_error_message, #end_resync, #error_header, generic_return_scope, #grammar_file_name, imported_grammars, master, master_grammars, #memoize, #mismatch_is_missing_token?, #mismatch_is_unwanted_token?, #missing_symbol, #number_of_syntax_errors, profile?, #recover_from_mismatched_element, #recover_from_mismatched_set, #recover_from_mismatched_token, #reset, #resync, return_scope_members, #rule_memoization, rules, #syntactic_predicate?, #syntax_errors?, token_class, #token_error_display

Methods included from Error

#EarlyExit, #FailedPredicate, #MismatchedNotSet, #MismatchedRange, #MismatchedSet, #MismatchedToken, #MismatchedTreeNode, #MissingToken, #NoViableAlternative, #RewriteCardinalityError, #RewriteEarlyExit, #RewriteEmptyStream, #UnwantedToken

Constructor Details

#initialize(input, options = {}) ⇒ Lexer

Returns a new instance of Lexer



# File 'lib/antlr3/recognizers.rb', line 1014

def initialize( input, options = {} )
  super( options )
  @input = cast_input( input, options )
end

Class Method Details

.associated_parser ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1001

def self.associated_parser
  @associated_parser ||= begin
    @grammar_home and @grammar_home::Parser
  rescue NameError
    grammar_name = @grammar_home.name.split( "::" ).last
    begin
      require "#{ grammar_name }Parser"
      @grammar_home::Parser
    rescue LoadError, NameError
    end
  end
end

.default_rule ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 991

def self.default_rule
  @default_rule ||= :token!
end

.main(argv = ARGV, options = {}) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 995

def self.main( argv = ARGV, options = {} )
  if argv.is_a?( ::Hash ) then argv, options = ARGV, argv end
  main = ANTLR3::Main::LexerMain.new( self, options )
  block_given? ? yield( main ) : main.execute( argv )
end
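The first line of .main lets a caller pass only an options hash, in which case the argument list falls back to ARGV. The idiom can be tried in isolation with the sketch below. toy_normalize_main_args is a hypothetical helper written for this illustration and is not part of the library.

```ruby
# Hypothetical helper mirroring the argument handling at the top of .main.
def toy_normalize_main_args(argv = ARGV, options = {})
  # If the first argument is really the options hash, shift it over
  # and fall back to the process arguments.
  argv, options = ARGV, argv if argv.is_a?(Hash)
  [argv, options]
end

toy_normalize_main_args(%w[input.hyp], { :debug => true })
# returns [["input.hyp"], {:debug => true}]

toy_normalize_main_args({ :debug => true })
# returns [ARGV, {:debug => true}]
```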

Instance Method Details

#char_stream=(input) ⇒ Object Also known as: input=



# File 'lib/antlr3/recognizers.rb', line 1060

def char_stream=( input )
  @input = nil
  reset()
  @input = input
end

#character_error_display(char) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1163

def character_error_display( char )
  case char
  when EOF then '<EOF>'
  when Integer then char.chr.inspect
  else char.inspect
  end
end
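The effect of the three branches is easiest to see with concrete values. The sketch below copies the logic into a free-standing method so it runs without the library. TOY_EOF is an assumption standing in for the EOF constant (taken here to be -1), and toy_character_error_display is a hypothetical name.

```ruby
TOY_EOF = -1   # assumption: stands in for the library's EOF constant

def toy_character_error_display(char)
  case char
  when TOY_EOF then '<EOF>'            # checked first, before the Integer branch
  when Integer then char.chr.inspect   # character code to quoted character
  else char.inspect                    # anything else, e.g. a String
  end
end

puts toy_character_error_display(97)        # prints "a" (with the quotes)
puts toy_character_error_display(10)        # prints "\n" (escaped, with the quotes)
puts toy_character_error_display(TOY_EOF)   # prints <EOF>
puts toy_character_error_display("x")       # prints "x" (with the quotes)
```

Because inspect is used, control characters such as a newline show up in error messages in escaped form rather than breaking the message across lines.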

#character_index ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1124

def character_index
  @input.index
end

#column ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1120

def column
  @input.column
end

#current_symbol ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1019

def current_symbol
  nil
end

#emit(token = @state.token) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1070

def emit( token = @state.token )
  token ||= create_token
  @state.token = token
  return token
end

#error_message(e) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1141

def error_message( e )
  char = character_error_display( e.symbol ) rescue nil
  case e
  when Error::MismatchedToken
    expecting = character_error_display( e.expecting )
    "mismatched character #{ char }; expecting #{ expecting }"
  when Error::NoViableAlternative
    "no viable alternative at character #{ char }"
  when Error::EarlyExit
    "required ( ... )+ loop did not match anything at character #{ char }"
  when Error::MismatchedNotSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedRange
    a = character_error_display( e.min )
    b = character_error_display( e.max )
    "mismatched character %s; expecting set %s..%s" % [ char, a, b ]
  else super
  end
end

#exhaust ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1056

def exhaust
  self.to_a
end
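Since #exhaust is just to_a, it depends on the #each method that TokenSource contributes. The sketch below shows the general pattern under the assumption that the token source behaves like a Ruby Enumerable built on top of next_token. ToySource is a hypothetical class and uses nil rather than an EOF token to signal the end of input.

```ruby
# Hypothetical token source: each is defined in terms of next_token,
# and Enumerable supplies to_a on top of each.
class ToySource
  include Enumerable

  def initialize(items)
    @items = items.dup
  end

  # Returns the next item, or nil once the input is used up.
  def next_token
    @items.shift
  end

  def each
    return enum_for(:each) unless block_given?
    while (token = next_token)
      yield token
    end
    self
  end

  # Same shape as Lexer#exhaust: collect everything that is left.
  def exhaust
    to_a
  end
end

source = ToySource.new(%w[a b c])
first  = source.exhaust   # ["a", "b", "c"]
second = source.exhaust   # [] because the source has already been consumed
```

As the second call shows, collecting the tokens this way uses up the input, so the result should be kept if it is needed more than once.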

#line ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1116

def line
  @input.line
end

#match(expected) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1076

def match( expected )
  case expected
  when String
    expected.each_byte do |char|
      unless @input.peek == char
        @state.backtracking > 0 and raise BacktrackingFailed
        error = MismatchedToken( char )
        recover( error )
        raise error
      end
      @input.consume()
    end
  else # single integer character
    unless @input.peek == expected
      @state.backtracking > 0 and raise BacktrackingFailed
      error = MismatchedToken( expected )
      recover( error )
      raise error
    end
    @input.consume
  end
  return true
end
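To see the peek/compare/consume cycle in action without the library, the following sketch pairs a minimal character stream with a simplified version of the method. ToyStream, ToyMismatch, and toy_match are hypothetical names, -1 is assumed as the end-of-input value, and the backtracking check from the real method is left out.

```ruby
# Hypothetical minimal character stream exposing peek and consume.
class ToyStream
  attr_reader :index

  def initialize(source)
    @bytes = source.bytes
    @index = 0
  end

  # Current character code, or -1 at end of input (assumed EOF value).
  def peek
    @bytes.fetch(@index, -1)
  end

  def consume
    @index += 1
  end
end

class ToyMismatch < StandardError; end

# Simplified match: accepts a String (compared byte by byte)
# or a single integer character code.
def toy_match(stream, expected)
  codes = expected.is_a?(String) ? expected.bytes : [expected]
  codes.each do |code|
    unless stream.peek == code
      stream.consume   # same recovery as #recover: drop one character
      raise ToyMismatch, "expecting #{code.chr.inspect}"
    end
    stream.consume
  end
  true
end

stream = ToyStream.new("if x")
toy_match(stream, "if")   # returns true, stream.index is now 2
toy_match(stream, 32)     # a single character code (space), index is now 3
```

A failed match still advances the stream by one character before raising, so the caller can resume lexing past the offending input.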

#match_any ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1100

def match_any
  @input.consume
end

#match_range(min, max) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1104

def match_range( min, max )
  char = @input.peek
  if char.between?( min, max ) then @input.consume
  else
    @state.backtracking > 0 and raise BacktrackingFailed
    error = MismatchedRange( min.chr, max.chr )
    recover( error )
    raise( error )
  end
  return true
end

#next_token ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1023

def next_token
  loop do
    @state.token = nil
    @state.channel = DEFAULT_CHANNEL
    @state.token_start_position = @input.index
    @state.token_start_column = @input.column
    @state.token_start_line = @input.line
    @state.text = nil
    @input.peek == EOF and return EOF_TOKEN
    begin
      token!
      
      case token = @state.token
      when nil then return( emit )
      when SKIP_TOKEN then next
      else
        return token
      end
    rescue NoViableAlternative => re
      report_error( re )
      recover( re )
    rescue Error::RecognitionError => re
      report_error( re )
    end
  end
end
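The rescue clauses mean that a bad character does not stop the lexer: the error is reported, one character is dropped, and the loop tries again. A simplified, library-free sketch of that report-and-continue behavior follows. toy_scan and ToyNoViableAlternative are hypothetical, and the two-rule "grammar" (lowercase words, skipped spaces) is invented for the example.

```ruby
class ToyNoViableAlternative < StandardError; end

# Returns [tokens, errors]; errors are collected instead of printed.
def toy_scan(source)
  index, tokens, errors = 0, [], []
  while index < source.length
    begin
      case source[index]
      when /[a-z]/
        start = index
        index += 1 while index < source.length && source[index] =~ /[a-z]/
        tokens << source[start...index]
      when ' '
        index += 1   # skipped, in the spirit of SKIP_TOKEN
      else
        raise ToyNoViableAlternative,
              "no viable alternative at character #{source[index].inspect}"
      end
    rescue ToyNoViableAlternative => e
      errors << e.message   # plays the role of report_error
      index += 1            # plays the role of recover: consume one character
    end
  end
  [tokens, errors]
end

tokens, errors = toy_scan("ab ?cd")
# tokens is ["ab", "cd"]
# errors is ["no viable alternative at character \"?\""]
```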

#recover(re) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1171

def recover( re )
  @input.consume
end

#report_error(e) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1137

def report_error( e )
  display_recognition_error( e )
end

#skip ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1050

def skip
  @state.token = SKIP_TOKEN
end

#source_name ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1066

def source_name
  @input.source_name
end

#text ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1128

def text
  @state.text and return @state.text
  @input.substring( @state.token_start_position, character_index - 1 )
end
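The `character_index - 1` in the fallback suggests that the stream's substring takes an inclusive end position; that reading is an assumption, not something this page states. Under that assumption, the behavior can be sketched on a plain String. toy_text is a hypothetical helper, with an inclusive range standing in for substring.

```ruby
# Hypothetical stand-in for Lexer#text operating on a plain String.
# override plays the role of @state.text set through #text=.
def toy_text(source, token_start, character_index, override = nil)
  # An explicitly assigned text wins; otherwise slice the input from the
  # token's first character up to the last character consumed so far.
  override || source[token_start..(character_index - 1)]
end

source = "let total"
toy_text(source, 4, 9)            # returns "total"
toy_text(source, 4, 9, "TOTAL")   # returns "TOTAL"
```

The current index points just past the last matched character, which is why one is subtracted before slicing.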

#text=(text) ⇒ Object



# File 'lib/antlr3/recognizers.rb', line 1133

def text=( text )
  @state.text = text
end