Module: ANTLR3::Stream
- Extended by:
- ClassMacros
- Includes:
- Constants
- Included in:
- AST::TreeNodeStream, CharacterStream, TokenStream
- Defined in:
- lib/antlr3/streams.rb
Overview
ANTLR3 Streams
This documentation first covers the general concept of streams as used by ANTLR recognizers, and then discusses the specific ANTLR3::Stream module.
ANTLR Stream Classes
ANTLR recognizers need a way to walk through input data in a serialized IO-style fashion. They also need some book-keeping about the input to provide useful information to developers, such as current line number and column. Furthermore, to implement backtracking and various error recovery techniques, recognizers need a way to record various locations in the input at a number of points in the recognition process so the input state may be restored back to a prior state.
ANTLR bundles all of this functionality into a number of Stream classes, each designed to be used by recognizers for a specific recognition task. Most of the Stream hierarchy is implemented in antlr3/streams.rb, which is loaded by default when 'antlr3' is required.
Here’s a brief overview of the various stream classes and their respective purpose:
- StringStream
  Similar to StringIO from the standard Ruby library, StringStream wraps raw String data in a Stream interface for use by ANTLR lexers.
- FileStream
  A subclass of StringStream, FileStream simply wraps data read from an IO or File object for use by lexers.
- CommonTokenStream
  The job of a TokenStream is to read lexer output and then provide ANTLR parsers with the means to walk sequentially through a series of tokens. CommonTokenStream is the default TokenStream implementation.
- TokenRewriteStream
  A subclass of CommonTokenStream, TokenRewriteStream gives rewriting parsers the ability to produce new output text from an input token sequence by managing rewrite "programs" on top of the stream.
- CommonTreeNodeStream
  In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tokens to recognizers in a sequential fashion. However, the stream object serializes an Abstract Syntax Tree into a flat, one-dimensional sequence, preserving the two-dimensional shape of the tree with special UP and DOWN tokens. The sequence is primarily used by ANTLR Tree Parsers. Note: this class is defined in antlr3/tree.rb, not in antlr3/streams.rb.
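The UP/DOWN serialization described for CommonTreeNodeStream can be illustrated with a minimal sketch. The `Node` struct, the `flatten` helper, and the use of the symbols `:DOWN` and `:UP` are assumptions made for this example only; they are not the runtime's tree classes or its actual navigation tokens.

```ruby
# Hypothetical node type for illustration; not the runtime's tree class.
Node = Struct.new(:text, :children)

# Flatten a tree depth-first. A node with children is followed by :DOWN,
# its serialized children, and then :UP. Leaves are emitted alone.
def flatten(node, out = [])
  out << node.text
  unless node.children.empty?
    out << :DOWN
    node.children.each { |child| flatten(child, out) }
    out << :UP
  end
  out
end

# The tree for "1 + 2 * 3", i.e. (+ 1 (* 2 3))
tree = Node.new("+", [
  Node.new("1", []),
  Node.new("*", [Node.new("2", []), Node.new("3", [])])
])

p flatten(tree)
# prints ["+", :DOWN, "1", "*", :DOWN, "2", "3", :UP, :UP]
```

Because every subtree is bracketed by a DOWN/UP pair, a consumer reading the flat sequence can still tell where each subtree begins and ends.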
The next few sections cover the most significant methods of all stream classes.
consume / look / peek
stream.consume is used to advance a stream one unit. StringStreams are advanced by one character and TokenStreams are advanced by one token.
stream.peek(k = 1) is used to quickly retrieve the object of interest to a recognizer at the look-ahead position specified by k. For StringStreams, this is the integer value of the character k characters ahead of the stream cursor. For TokenStreams, this is the integer token type of the token k tokens ahead of the stream cursor.
stream.look(k = 1) is used to retrieve the full object of interest at the look-ahead position specified by k. While peek provides the bare-minimum lightweight information that the recognizer needs, look provides the full object of concern in the stream. For StringStreams, this is a string object containing the single character k characters ahead of the stream cursor. For TokenStreams, this is the full token structure k tokens ahead of the stream cursor.
Note: in most ANTLR runtime APIs for other languages, peek is implemented by some method with a name like LA(k) and look is implemented by some method with a name like LT(k). When writing this Ruby runtime API, I found this naming practice confusing, ambiguous, and un-Ruby-like. Thus, I chose peek and look to represent a quick-look (peek) and a full-fledged look-ahead operation (look). If this causes confusion or any sort of compatibility strife for developers using this implementation, all apologies.
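The difference between consume, peek, and look on a character stream can be sketched with a small stand-in class. `ToyCharStream` and its `EOF = -1` sentinel are assumptions made for this illustration; the sketch mirrors the behaviour described above rather than reproducing the runtime's StringStream.

```ruby
# ToyCharStream is a hypothetical stand-in, not a class from the runtime.
class ToyCharStream
  EOF = -1 # assumed end-of-input sentinel for this sketch

  def initialize(data)
    @data = data
    @position = 0
  end

  # Advance the cursor by one character, stopping at the end of input.
  def consume
    @position += 1 if @position < @data.length
  end

  # Integer code of the character k positions ahead (k = 1 is the next one).
  def peek(k = 1)
    char = @data[@position + k - 1]
    char ? char.ord : EOF
  end

  # One-character string k positions ahead, or nil past the end of input.
  def look(k = 1)
    @data[@position + k - 1]
  end
end

stream = ToyCharStream.new("ab")
stream.peek     # => 97, the integer value of "a"
stream.look     # => "a"
stream.look(2)  # => "b"
stream.consume
stream.peek     # => 98
stream.peek(2)  # => -1, past the end of input
```

A token stream would follow the same pattern, with peek returning an integer token type and look returning the token object itself.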
mark / rewind / release
marker = stream.mark causes the stream to record important information about the current stream state, place the data in an internal memory table, and return a memento, marker. The marker object is typically an integer key to the stream's internal memory table.
Used in tandem with stream.rewind(marker = last_marker), the marker can be used to restore the stream to an earlier state. This is used by recognizers to perform tasks such as backtracking and error recovery.
stream.release(marker = last_marker) can be used to release an existing state marker from the memory table.
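The mark / rewind / release cycle can be sketched as a memory table keyed by integers. `ToyMarkableStream` is a hypothetical class for this example; in particular, saving only the cursor position and leaving the marker in the table after a rewind are simplifying assumptions, not statements about the runtime's behaviour.

```ruby
# ToyMarkableStream is a hypothetical stand-in, not a class from the runtime.
class ToyMarkableStream
  attr_reader :position

  def initialize(data)
    @data = data
    @position = 0
    @markers = {}      # memory table: integer key => saved position
    @last_marker = nil
    @next_key = 0
  end

  def consume
    @position += 1 if @position < @data.length
  end

  # Save the current state and return an integer key for it.
  def mark
    @next_key += 1
    @markers[@next_key] = @position
    @last_marker = @next_key
  end

  # Restore the state saved under the given marker (KeyError if unknown).
  def rewind(marker = @last_marker)
    @position = @markers.fetch(marker)
  end

  # Drop the saved state for the given marker.
  def release(marker = @last_marker)
    @markers.delete(marker)
  end
end

stream = ToyMarkableStream.new("abcdef")
stream.consume              # position is now 1
marker = stream.mark        # remember position 1
3.times { stream.consume }  # speculative work moves the cursor to 4
stream.rewind(marker)       # back to position 1
stream.release(marker)      # the marker is no longer needed
```

A backtracking recognizer would follow the same shape: mark before attempting an alternative, rewind if the attempt fails, and release once the marker is no longer useful.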
seek
stream.seek(position) moves the stream cursor to an absolute position within the stream, much like Ruby's IO#seek. However, unlike IO#seek, which also accepts a whence argument for relative seeking, ANTLR streams currently always seek to an absolute position.
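For comparison, the following sketch shows relative and absolute seeking with Ruby's standard StringIO, followed by an absolute-only seek on a stand-in stream. `ToySeekableStream` is hypothetical, and clamping the target to the bounds of the input is an assumption of this example.

```ruby
require "stringio"

# Ruby IO-style seeking supports several modes through the whence argument.
io = StringIO.new("abcdef")
io.seek(2)                # absolute (IO::SEEK_SET is the default)
io.seek(2, IO::SEEK_CUR)  # relative: two more bytes forward
io.pos                    # => 4

# ToySeekableStream is a hypothetical stand-in with absolute seeking only.
class ToySeekableStream
  attr_reader :position

  def initialize(data)
    @data = data
    @position = 0
  end

  # Move the cursor to an absolute index, kept within the input bounds.
  def seek(position)
    @position = position.clamp(0, @data.length)
  end
end

stream = ToySeekableStream.new("abcdef")
stream.seek(4)
stream.seek(2)
stream.position  # => 2, not 6: each call is absolute
```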
The Stream Module
ANTLR3::Stream is an abstract-ish base mixin for all IO-like stream classes used by ANTLR recognizers.
The module doesn’t do much on its own besides define arguably annoying “abstract” pseudo-methods that demand implementation when it is mixed in to a class that wants to be a Stream. Right now this exists as an artifact of porting the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is represented as an interface. In Ruby, however, objects are duck-typed and interfaces aren’t that useful as programmatic entities – in fact, it’s mildly wasteful to have a module like this hanging out. Thus, I may axe it.
When mixed in, it does give the class the #size and #source_name attribute methods.
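One way such "abstract" pseudo-methods could be built is with a small class macro that defines placeholders raising NotImplementedError. The names `ToyClassMacros`, `ToyStream`, and `HalfFinishedStream` are hypothetical, and this is only a guess at the general technique; the runtime's own ClassMacros module may work differently.

```ruby
# Hypothetical sketch; the runtime's own ClassMacros may differ.
module ToyClassMacros
  # Define placeholder methods that raise until a class overrides them.
  def abstract(*names)
    names.each do |name|
      define_method(name) do |*_args|
        raise NotImplementedError, "#{self.class} must implement ##{name}"
      end
    end
  end
end

module ToyStream
  extend ToyClassMacros

  abstract :consume, :peek

  attr_reader :size            # read-only, as in the attribute summary
  attr_accessor :source_name
end

# A class that implements only part of the interface.
class HalfFinishedStream
  include ToyStream

  def initialize(data, source_name = "(string)")
    @data = data
    @size = data.length
    @source_name = source_name
  end

  def peek(k = 1)
    @data[k - 1]
  end
end

stream = HalfFinishedStream.new("abc")
stream.size         # => 3
stream.source_name  # => "(string)"
stream.peek         # => "a"

begin
  stream.consume
rescue NotImplementedError => e
  puts e.message    # prints "HalfFinishedStream must implement #consume"
end
```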
Except in a small handful of places, most of the ANTLR runtime library uses duck-typing and not type checking on objects. This means that the methods which manipulate stream objects don’t usually bother checking that the object is a Stream and assume that the object implements the proper stream interface. Thus, it is not strictly necessary that custom stream objects include ANTLR3::Stream, though it isn’t a bad idea.
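The duck-typing point can be shown with a consumer that never checks the class of its argument. `ArrayStream` and `drain` are hypothetical names for this sketch, and returning nil from peek at the end of input is an assumption of the example rather than the runtime's convention.

```ruby
# Hypothetical stream that does not include any stream module.
class ArrayStream
  def initialize(items)
    @items = items
    @index = 0
  end

  def peek(k = 1)
    @items[@index + k - 1]
  end

  def consume
    @index += 1
  end
end

# A consumer that relies only on the object responding to #peek and #consume.
def drain(stream)
  seen = []
  while (item = stream.peek)
    seen << item
    stream.consume
  end
  seen
end

p drain(ArrayStream.new([10, 20, 30]))  # prints [10, 20, 30]
```

Including a shared module would still be useful for documentation and for code that does perform explicit type checks.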
Constant Summary
Constants included from Constants
Constants::BUILT_IN_TOKEN_NAMES, Constants::DEFAULT, Constants::DOWN, Constants::EOF, Constants::EOF_TOKEN, Constants::EOR_TOKEN_TYPE, Constants::HIDDEN, Constants::INVALID, Constants::INVALID_NODE, Constants::INVALID_TOKEN, Constants::MEMO_RULE_FAILED, Constants::MEMO_RULE_UNKNOWN, Constants::MIN_TOKEN_TYPE, Constants::SKIP_TOKEN, Constants::UP
Instance Attribute Summary
- #size ⇒ Object (readonly)
  The total number of symbols in the stream.
- #source_name ⇒ Object
  An identifying name for the stream, usually the file path of the input.
Instance Method Summary
- #consume ⇒ Object
  Advances the stream one unit (such as a character or token).
- #index ⇒ Object
  Returns the current position of the stream.
- #look ⇒ Object
  look( k = 1 ) retrieves the full object of interest at the lookahead position specified by k (such as a character string or a token structure).
- #mark ⇒ Object
  Saves the current position for the purposes of backtracking and returns a value to pass to #rewind at a later time.
- #peek ⇒ Object
  peek( k = 1 ) quickly retrieves the object of interest to a recognizer at the lookahead position specified by k (such as the integer value of a character or an integer token type).
- #release ⇒ Object
  release( marker = last_marker ) clears the saved state information associated with the given marker value.
- #rewind ⇒ Object
  rewind( marker = last_marker ) restores the stream position using the state information previously saved by the given marker.
- #seek ⇒ Object
  seek( position ) moves the stream to the absolute index given by position.
Instance Attribute Details
#size ⇒ Object (readonly)
the total number of symbols in the stream
# File 'lib/antlr3/streams.rb', line 217
def size
  @size
end
#source_name ⇒ Object
indicates an identifying name for the stream – usually the file path of the input
# File 'lib/antlr3/streams.rb', line 221
def source_name
  @source_name
end
Instance Method Details
#consume ⇒ Object
Advances the stream one unit (such as a character or token).
# File 'lib/antlr3/streams.rb', line 173
abstract :consume
#index ⇒ Object
Returns the current position of the stream.
# File 'lib/antlr3/streams.rb', line 197
abstract :index
#look ⇒ Object
look( k = 1 ) retrieves the full object of interest at the lookahead position specified by k (such as a character string or a token structure).
# File 'lib/antlr3/streams.rb', line 186
abstract :look
#mark ⇒ Object
Saves the current position for the purposes of backtracking and returns a value to pass to #rewind at a later time.
# File 'lib/antlr3/streams.rb', line 192
abstract :mark
#peek ⇒ Object
peek( k = 1 ) quickly retrieves the object of interest to a recognizer at the lookahead position specified by k (such as the integer value of a character or an integer token type).
# File 'lib/antlr3/streams.rb', line 180
abstract :peek
#release ⇒ Object
release( marker = last_marker ) clears the saved state information associated with the given marker value.
# File 'lib/antlr3/streams.rb', line 208
abstract :release
#rewind ⇒ Object
rewind( marker = last_marker ) restores the stream position using the state information previously saved by the given marker.
# File 'lib/antlr3/streams.rb', line 203
abstract :rewind
#seek ⇒ Object
seek( position ) moves the stream to the absolute index given by position.
# File 'lib/antlr3/streams.rb', line 213
abstract :seek