Class: ANTLR3::Lexer
Overview
Lexer
Lexer is the default superclass of all lexers generated by ANTLR. The class tailors the core functionality provided by Recognizer to the task of matching patterns in the text input and breaking the input into tokens.
About Lexers
A lexer’s job is to take input text and break it up into tokens – objects that encapsulate a piece of text, a type label (such as ID or INTEGER), and the position of the text within the input. Thus, a lexer is essentially a sophisticated iterator that steps through an input stream and produces a sequence of tokens. Sometimes a lexer is enough to carry out a task on its own, such as source-code highlighting or simple code analysis. Usually, however, the lexer converts text into tokens for use by a parser, which recognizes larger structures within the text.
ANTLR parsers have a variety of entry points specified by parser rules, each of which defines the structure of a specific type of sentence in a grammar. Lexers, however, are primarily intended to have a single entry point: the lexer looks at the characters starting at the current input position, decides whether the chunk of text matches one of a number of possible token-type definitions, wraps the chunk into a token with information on its type and location, and advances the input stream to the next position.
ANTLR Lexers and the Lexer API
ANTLR-generated lexers subclass this class unless specified otherwise within a grammar file. The generated class provides an implementation of each lexer rule as a method of the same name. The subclass also provides an implementation of the abstract method #token!, whose purpose is to multiplex the token-type definitions and predict which rule definition to execute to fetch a token. The primary method in the lexer API, #next_token, uses #token! to fetch the next token and drive the iteration.
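The division of labor between the driver method and the rule-multiplexing method can be sketched in plain Ruby. Everything below (the SketchLexer class, its rule methods, and the skip sentinel) is a simplified, hypothetical model of the scheme just described, not the actual ANTLR3 implementation:

```ruby
require 'strscan'

SKIP_TOKEN = Object.new.freeze   # sentinel for matched text that should not be emitted

class SketchLexer
  Token = Struct.new( :type, :text )

  def initialize( source )
    @scanner = StringScanner.new( source )
  end

  # the public driver: repeatedly ask token! for a token,
  # silently discarding skipped (hidden) matches
  def next_token
    loop do
      return nil if @scanner.eos?
      token = token!
      return token unless token.equal?( SKIP_TOKEN )
    end
  end

  # the multiplexer: peek at the upcoming text and dispatch to
  # whichever rule method should handle it
  def token!
    case
    when @scanner.check( /\s/    ) then white_space!
    when @scanner.check( /\d/    ) then integer!
    when @scanner.check( /[a-z]/ ) then id!
    else raise "no viable alternative at position #{ @scanner.pos }"
    end
  end

private

  def white_space!
    @scanner.scan( /\s+/ )
    SKIP_TOKEN                    # matched, but hidden from the consumer
  end

  def integer!
    Token.new( :INTEGER, @scanner.scan( /\d+/ ) )
  end

  def id!
    Token.new( :ID, @scanner.scan( /[a-z]+/ ) )
  end
end
```

Calling `next_token` on `SketchLexer.new( "abc 12" )` produces the ID token, then the INTEGER token (the whitespace match is swallowed by the driver loop), then nil at end of input.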
If the lexer is preparing tokens for use by an ANTLR-generated parser, it will generally be used to build a TokenStream object. The following code example demonstrates the typical setup for using ANTLR parsers and lexers in Ruby.
# in HypotheticalLexer.rb
module Hypothetical
  class Lexer < ANTLR3::Lexer
    # ... generated lexer rules ...
  end
end

# in HypotheticalParser.rb
module Hypothetical
  class Parser < ANTLR3::Parser
    # ... generated parser rules ...
  end
end

# the full pipeline: character stream -> lexer -> token stream -> parser
source = "some hypothetical source code"
input  = ANTLR3::StringStream.new( source, :file => 'blah-de-blah.hyp' )
lexer  = Hypothetical::Lexer.new( input )
tokens = ANTLR3::CommonTokenStream.new( lexer )
parser = Hypothetical::Parser.new( tokens )

# or, more concisely, since the lexer will cast a string to a StringStream
# and the parser will wrap a bare lexer in a CommonTokenStream:
lexer  = Hypothetical::Lexer.new( "some hypothetical source code", :file => 'blah-de-blah.hyp' )
parser = Hypothetical::Parser.new( lexer )
Constant Summary
Constants included from Constants
Constants::BUILT_IN_TOKEN_NAMES, Constants::DEFAULT, Constants::DOWN, Constants::EOF, Constants::EOF_TOKEN, Constants::EOR_TOKEN_TYPE, Constants::HIDDEN, Constants::INVALID, Constants::INVALID_NODE, Constants::INVALID_TOKEN, Constants::MEMO_RULE_FAILED, Constants::MEMO_RULE_UNKNOWN, Constants::MIN_TOKEN_TYPE, Constants::SKIP_TOKEN, Constants::UP
Instance Attribute Summary
Attributes inherited from Recognizer
#input, #state
Attributes included from TokenFactory
#token_class
Class Method Summary
Instance Method Summary
#each, #next, #to_stream
Methods inherited from Recognizer
Scope, #already_parsed_rule?, #antlr_version, #antlr_version_string, #backtrack, #backtracking?, #backtracking_level, #backtracking_level=, #begin_resync, #combine_follows, #compute_context_sensitive_rule_follow, #compute_error_recovery_set, #consume_until, debug?, define_return_scope, #display_recognition_error, #each_delegate, #emit_error_message, #end_resync, #error_header, generated_using, generic_return_scope, #grammar_file_name, imported_grammars, imports, master, master_grammars, masters, #memoize, #mismatch_is_missing_token?, #mismatch_is_unwanted_token?, #missing_symbol, #number_of_syntax_errors, profile?, #recover_from_mismatched_element, #recover_from_mismatched_set, #recover_from_mismatched_token, #reset, #resync, return_scope_members, #rule_memoization, rules, #syntactic_predicate?, #syntax_errors?, token_class, #token_error_display
Methods included from Error
EarlyExit, FailedPredicate, MismatchedNotSet, MismatchedRange, MismatchedSet, MismatchedToken, MismatchedTreeNode, MissingToken, NoViableAlternative, RewriteCardinalityError, RewriteEarlyExit, RewriteEmptyStream, UnwantedToken
Constructor Details
#initialize(input, options = {}) ⇒ Lexer
Returns a new instance of Lexer.
# File 'lib/antlr3/recognizers.rb', line 1014

def initialize( input, options = {} )
  super( options )
  @input = cast_input( input, options )
end
Class Method Details
.associated_parser ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1001

def self.associated_parser
  @associated_parser ||= begin
    @grammar_home and @grammar_home::Parser
  rescue NameError
    grammar_name = @grammar_home.name.split( "::" ).last
    begin
      require "#{ grammar_name }Parser"
      @grammar_home::Parser
    rescue LoadError, NameError
    end
  end
end
.default_rule ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 991

def self.default_rule
  @default_rule ||= :token!
end
.main(argv = ARGV, options = {}) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 995

def self.main( argv = ARGV, options = {} )
  if argv.is_a?( ::Hash ) then argv, options = ARGV, argv end
  main = ANTLR3::Main::LexerMain.new( self, options )
  block_given? ? yield( main ) : main.execute( argv )
end
Instance Method Details
#char_stream=(input) ⇒ Object
Also known as: input=
# File 'lib/antlr3/recognizers.rb', line 1060

def char_stream=( input )
  @input = nil
  reset()
  @input = input
end
#character_error_display(char) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1163

def character_error_display( char )
  case char
  when EOF then '<EOF>'
  when Integer then char.chr.inspect
  else char.inspect
  end
end
#character_index ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1124

def character_index
  @input.index
end
#column ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1120

def column
  @input.column
end
#current_symbol ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1019

def current_symbol
  nil
end
#emit(token = @state.token) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1070

def emit( token = @state.token )
  token ||= create_token
  @state.token = token
  return token
end
#error_message(e) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1141

def error_message( e )
  char = character_error_display( e.symbol ) rescue nil
  case e
  when Error::MismatchedToken
    expecting = character_error_display( e.expecting )
    "mismatched character #{ char }; expecting #{ expecting }"
  when Error::NoViableAlternative
    "no viable alternative at character #{ char }"
  when Error::EarlyExit
    "required ( ... )+ loop did not match anything at character #{ char }"
  when Error::MismatchedNotSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedSet
    "mismatched character %s; expecting set %p" % [ char, e.expecting ]
  when Error::MismatchedRange
    a = character_error_display( e.min )
    b = character_error_display( e.max )
    "mismatched character %s; expecting set %s..%s" % [ char, a, b ]
  else super
  end
end
#exhaust ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1056

def exhaust
  self.to_a
end
#line ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1116

def line
  @input.line
end
#match(expected) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1076

def match( expected )
  case expected
  when String
    expected.each_byte do |char|
      unless @input.peek == char
        @state.backtracking > 0 and raise BacktrackingFailed
        error = MismatchedToken( char )
        recover( error )
        raise error
      end
      @input.consume()
    end
  else
    unless @input.peek == expected
      @state.backtracking > 0 and raise BacktrackingFailed
      error = MismatchedToken( expected )
      recover( error )
      raise error
    end
    @input.consume
  end
  return true
end
#match_any ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1100

def match_any
  @input.consume
end
#match_range(min, max) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1104

def match_range( min, max )
  char = @input.peek
  if char.between?( min, max ) then @input.consume
  else
    @state.backtracking > 0 and raise BacktrackingFailed
    error = MismatchedRange( min.chr, max.chr )
    recover( error )
    raise( error )
  end
  return true
end
#next_token ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1023

def next_token
  loop do
    @state.token = nil
    @state.channel = DEFAULT_CHANNEL
    @state.token_start_position = @input.index
    @state.token_start_column = @input.column
    @state.token_start_line = @input.line
    @state.text = nil
    @input.peek == EOF and return EOF_TOKEN
    begin
      token!
      case token = @state.token
      when nil then return( emit )
      when SKIP_TOKEN then next
      else
        return token
      end
    rescue NoViableAlternative => re
      report_error( re )
      recover( re )
    rescue Error::RecognitionError => re
      report_error( re )
    end
  end
end
#recover(re) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1171

def recover( re )
  @input.consume
end
#report_error(e) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1137

def report_error( e )
  display_recognition_error( e )
end
#skip ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1050

def skip
  @state.token = SKIP_TOKEN
end
#source_name ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1066

def source_name
  @input.source_name
end
#text ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1128

def text
  @state.text and return @state.text
  @input.substring( @state.token_start_position, character_index - 1 )
end
#text=(text) ⇒ Object
# File 'lib/antlr3/recognizers.rb', line 1133

def text=( text )
  @state.text = text
end