Module: ANTLR3::Stream

Extended by:
ClassMacros
Includes:
Constants
Included in:
AST::TreeNodeStream, CharacterStream, TokenStream
Defined in:
lib/antlr3/streams.rb

Overview

ANTLR3 Streams

This documentation first covers the general concept of streams as used by ANTLR recognizers, and then discusses the specific ANTLR3::Stream module.

ANTLR Stream Classes

ANTLR recognizers need a way to walk through input data in a serialized IO-style fashion. They also need some book-keeping about the input to provide useful information to developers, such as the current line number and column. Furthermore, to implement backtracking and various error recovery techniques, recognizers need a way to record locations in the input at various points in the recognition process so that the input can later be restored to a prior state.

ANTLR bundles all of this functionality into a number of Stream classes, each designed to be used by recognizers for a specific recognition task. Most of the Stream hierarchy is implemented in antlr3/streams.rb, which is loaded by default when 'antlr3' is required.


Here's a brief overview of the various stream classes and their respective purpose:

StringStream

Similar to StringIO from the standard Ruby library, StringStream wraps raw String data in a Stream interface for use by ANTLR lexers.

FileStream

A subclass of StringStream, FileStream simply wraps data read from an IO or File object for use by lexers.

CommonTokenStream

The job of a TokenStream is to read lexer output and then provide ANTLR parsers with the means to sequentially walk through a series of tokens. CommonTokenStream is the default TokenStream implementation.

TokenRewriteStream

A subclass of CommonTokenStream, TokenRewriteStream gives rewriting parsers the ability to produce new output text from an input token sequence by managing rewrite "programs" on top of the stream.

CommonTreeNodeStream

Like CommonTokenStream, CommonTreeNodeStream feeds tokens to recognizers sequentially. However, the stream object serializes an Abstract Syntax Tree into a flat, one-dimensional sequence while preserving the two-dimensional shape of the tree using special UP and DOWN tokens. The sequence is primarily used by ANTLR Tree Parsers. Note: this class is not defined in antlr3/streams.rb, but in antlr3/tree.rb.
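As a rough illustration of how a tree can be flattened while keeping its shape, here is a minimal sketch. ToyNode, flatten_tree, and the :DOWN / :UP symbols are hypothetical names invented for this example; this is not the CommonTreeNodeStream implementation.

```ruby
# Hypothetical node type for illustration only: a name plus child nodes.
ToyNode = Struct.new(:name, :children)

# Serialize a tree into a flat array. A :DOWN marker is emitted before a
# node's children and an :UP marker after them, so the nesting can be
# recovered from the flat sequence.
def flatten_tree(node, out = [])
  out << node.name
  unless node.children.empty?
    out << :DOWN
    node.children.each { |child| flatten_tree(child, out) }
    out << :UP
  end
  out
end

# The expression tree for "1 + 2": a "+" node with two leaf children.
tree = ToyNode.new("+", [ToyNode.new("1", []), ToyNode.new("2", [])])
flatten_tree(tree)  # => ["+", :DOWN, "1", "2", :UP]
```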


The next few sections cover the most significant methods of all stream classes.

consume / look / peek

stream.consume is used to advance a stream one unit. StringStreams are advanced by one character and TokenStreams are advanced by one token.

stream.peek(k = 1) is used to quickly retrieve the object of interest to a recognizer at look-ahead position specified by k. For StringStreams, this is the integer value of the character k characters ahead of the stream cursor. For TokenStreams, this is the integer token type of the token k tokens ahead of the stream cursor.

stream.look(k = 1) is used to retrieve the full object of interest at look-ahead position specified by k. While peek provides the bare-minimum lightweight information that the recognizer needs, look provides the full object of concern in the stream. For StringStreams, this is a string object containing the single character k characters ahead of the stream cursor. For TokenStreams, this is the full token structure k tokens ahead of the stream cursor.
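The difference between consume, peek, and look can be sketched with a toy character stream. ToyCharStream and its EOF value are assumptions made for this example; the class only mimics the behavior described above and is not ANTLR3::StringStream.

```ruby
# Hypothetical toy stream, NOT the ANTLR3::StringStream implementation.
class ToyCharStream
  EOF = -1  # assumed end-of-input sentinel for this sketch

  def initialize(text)
    @chars = text.chars
    @position = 0
  end

  # Advance the cursor by one character (no-op at the end of input).
  def consume
    @position += 1 if @position < @chars.length
  end

  # Lightweight look-ahead: integer code of the character k ahead.
  def peek(k = 1)
    char = @chars[@position + k - 1]
    char ? char.ord : EOF
  end

  # Full look-ahead: the one-character string k ahead, or nil past the end.
  def look(k = 1)
    @chars[@position + k - 1]
  end
end

stream = ToyCharStream.new("ab")
stream.peek     # integer code of "a"
stream.look(2)  # the string "b"
stream.consume
stream.look     # the cursor moved, so this is now "b"
```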

Note: in most ANTLR runtime APIs for other languages, peek is implemented by some method with a name like LA(k) and look is implemented by some method with a name like LT(k). When writing this Ruby runtime API, I found this naming practice confusing, ambiguous, and un-Ruby-like. Thus, I chose peek and look to represent a quick look (peek) and a full-fledged look-ahead operation (look). If this causes confusion or any sort of compatibility strife for developers using this implementation, all apologies.

mark / rewind / release

marker = stream.mark causes the stream to record important information about the current stream state, place the data in an internal memory table, and return a memento, marker. The marker object is typically an integer key to the stream's internal memory table.

Used in tandem with stream.rewind(mark = last_marker), the marker can be used to restore the stream to an earlier state. Recognizers use this to perform tasks such as backtracking and error recovery.

stream.release(marker = last_marker) can be used to release an existing state marker from the memory table.
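The mark / rewind / release cycle can be sketched as follows. ToyMarkableStream is a hypothetical class that saves only the cursor position under an integer key; a real stream would likely also save book-keeping such as line and column.

```ruby
# Hypothetical toy stream illustrating markers; not the ANTLR3 implementation.
class ToyMarkableStream
  attr_reader :position

  def initialize(text)
    @chars = text.chars
    @position = 0
    @markers = {}      # internal memory table: marker => saved position
    @last_marker = 0
  end

  def consume
    @position += 1 if @position < @chars.length
  end

  # Record the current state and return an integer key for it.
  def mark
    @last_marker += 1
    @markers[@last_marker] = @position
    @last_marker
  end

  # Restore the state saved under the marker (defaults to the latest one).
  # Raises KeyError if the marker was already released.
  def rewind(marker = @last_marker)
    @position = @markers.fetch(marker)
  end

  # Drop the saved state for the marker from the memory table.
  def release(marker = @last_marker)
    @markers.delete(marker)
  end
end

stream = ToyMarkableStream.new("abc")
marker = stream.mark        # remember the state at position 0
2.times { stream.consume }  # speculatively read ahead
stream.rewind(marker)       # backtrack to the remembered state
stream.release(marker)      # the marker is no longer needed
```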

seek

stream.seek(position) moves the stream cursor to an absolute position within the stream, much like typical Ruby IO#seek-style methods. However, unlike IO#seek, which also supports relative seeking, ANTLR streams currently always use absolute position seeking.
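A minimal sketch of absolute seeking, using a hypothetical ToySeekableStream. Clamping the position to the valid range is an assumption of this example and is not taken from the ANTLR3 source.

```ruby
# Hypothetical toy stream; not the ANTLR3 implementation.
class ToySeekableStream
  attr_reader :position

  def initialize(text)
    @chars = text.chars
    @position = 0
  end

  # Jump to an absolute index; the argument is never relative to the cursor.
  # Clamping to 0..length is an assumption made for this sketch.
  def seek(position)
    @position = [[position, 0].max, @chars.length].min
  end

  # The one-character string k ahead of the cursor, or nil past the end.
  def look(k = 1)
    @chars[@position + k - 1]
  end
end

stream = ToySeekableStream.new("hello")
stream.seek(3)
stream.look     # "l"
stream.seek(1)  # absolute again: the cursor is at index 1, not 4
stream.look     # "e"
```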

The Stream Module

ANTLR3::Stream is an abstract-ish base mixin for all IO-like stream classes used by ANTLR recognizers.

The module doesn't do much on its own besides define arguably annoying "abstract" pseudo-methods that demand implementation when it is mixed in to a class that wants to be a Stream. Right now this exists as an artifact of porting the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is represented as an interface. In Ruby, however, objects are duck-typed and interfaces aren't that useful as programmatic entities -- in fact, it's mildly wasteful to have a module like this hanging out. Thus, I may axe it.

When mixed in, it does give the class #size and #source_name attribute methods.

Except in a small handful of places, most of the ANTLR runtime library uses duck-typing and not type checking on objects. This means that the methods which manipulate stream objects don't usually bother checking that the object is a Stream and assume that the object implements the proper stream interface. Thus, it is not strictly necessary that custom stream objects include ANTLR3::Stream, though it isn't a bad idea.
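The "abstract" pseudo-method pattern described above can be sketched like this. ToyStreamInterface and ToyArrayStream are hypothetical names; the sketch shows the general Ruby idiom and is not the source of ANTLR3::Stream or its ClassMacros helper.

```ruby
# Hypothetical mixin: every stream method raises until the including
# class overrides it. Not the actual ANTLR3::Stream source.
module ToyStreamInterface
  %w[consume peek look mark rewind release seek].each do |name|
    define_method(name) do |*_args|
      raise NotImplementedError, "#{self.class} must implement ##{name}"
    end
  end

  attr_reader :size             # total number of symbols in the stream
  attr_accessor :source_name    # identifying name, e.g. a file path
end

# A custom stream over an array that implements only part of the interface.
class ToyArrayStream
  include ToyStreamInterface

  def initialize(items, source_name = "(array)")
    @items = items
    @position = 0
    @size = items.length
    @source_name = source_name
  end

  def consume
    @position += 1 if @position < @size
  end

  def peek(k = 1)
    @items[@position + k - 1]
  end
end

stream = ToyArrayStream.new([10, 20])
stream.size  # 2
stream.peek  # 10
# stream.mark would raise NotImplementedError, since it was not overridden.
```

Because callers only send messages such as peek and consume, a class that defines those methods without including the mixin would work just as well in this sketch; the mixin merely turns a missing method into a more descriptive error.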

Constant Summary

Constants included from Constants

Constants::BUILT_IN_TOKEN_NAMES, Constants::DEFAULT, Constants::DOWN, Constants::EOF, Constants::EOF_TOKEN, Constants::EOR_TOKEN_TYPE, Constants::HIDDEN, Constants::INVALID_TOKEN, Constants::INVALID_TOKEN_TYPE, Constants::MEMO_RULE_FAILED, Constants::MEMO_RULE_UNKNOWN, Constants::MIN_TOKEN_TYPE, Constants::SKIP_TOKEN, Constants::UP

Instance Attribute Summary

Instance Attribute Details

#size ⇒ Object (readonly)

the total number of symbols in the stream



# File 'lib/antlr3/streams.rb', line 217

def size
  @size
end

#source_name ⇒ Object

indicates an identifying name for the stream -- usually the file path of the input



# File 'lib/antlr3/streams.rb', line 221

def source_name
  @source_name
end