Class: ANTLR3::TokenScheme
- Includes: TokenFactory
- Defined in: lib/antlr3/token.rb
Overview
TokenSchemes exist to handle the problem of defining token types as integer values while maintaining meaningful text names for the types. They are dynamically defined modules that map integer values to constants with token-type names.
Fundamentally, tokens exist to take a chunk of text and identify it as belonging to some category, like “VARIABLE” or “INTEGER”. In code, the category is represented by an integer: some arbitrary value that ANTLR chooses as it is creating the recognizer. The purpose of using an integer (instead of, say, a Ruby symbol) is that ANTLR’s decision logic often needs to test whether a token’s type falls within a range, which is not possible with symbols.
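As a rough illustration of the range test mentioned above, the sketch below uses made-up type values. The constants and the numeric_type? helper are hypothetical and are not part of ANTLR3.

```ruby
# Hypothetical type values; ANTLR assigns the real ones when it
# generates a recognizer.
INT   = 4
FLOAT = 5
ID    = 6

# With integer types, asking "is this token one of several adjacent
# types?" is a single range check.
def numeric_type?(type)
  (INT..FLOAT).cover?(type)
end

numeric_type?(INT)  # => true
numeric_type?(ID)   # => false
```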
The downside of token types being represented as integers is that a developer needs to be able to reference the unknown type value by name in action code. Furthermore, code that references the type by name and tokens that can be inspected with names in place of type values are more meaningful to a developer.
Since ANTLR requires token type names to follow capital-letter naming conventions, defining types as named constants of the recognizer class resolves the problem of referencing type values by name. Thus, a token type like “VARIABLE” can be represented by a number like 5 and referenced within code by VARIABLE. However, when a recognizer creates tokens, the name of the token’s type cannot be seen without using the data defined in the recognizer.
Of course, tokens could be defined with a name attribute that could be specified when tokens are created. However, doing so would make tokens take up more space than necessary, as well as making it difficult to change the type of a token while maintaining a correct name value.
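The drawback of a stored name can be sketched with a plain Struct. NamedToken and the two type constants are invented for this example and do not exist in ANTLR3.

```ruby
# Hypothetical token that stores its type name as a separate attribute.
NamedToken = Struct.new(:type, :name, :text)

VARIABLE = 5
KEYWORD  = 6

tok = NamedToken.new(VARIABLE, "VARIABLE", "while")
tok.type = KEYWORD   # reclassify the token...
tok.name             # => "VARIABLE"  (the stored name is now stale)
```

Every token also carries an extra string reference, which is the space cost the paragraph above refers to.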
TokenSchemes exist as a technique to manage token type referencing and name extraction. They:
- keep token type references clear and understandable in recognizer code
- permit access to a token’s type-name independently of recognizer objects
- allow multiple classes to share the same token information
Building Token Schemes
TokenScheme is a subclass of Module. Thus, it has the method TokenScheme.new(tk_class = nil) { ... module-level code ... }, which will evaluate the block in the context of the scheme (module), similarly to Module#module_eval. Before evaluating the block, .new will set up the module with the following actions:
- define a customized token class (more on that below)
- add a new constant, TOKEN_NAMES, which is a hash that maps types to names
- dynamically populate the new scheme module with a couple of instance methods
- include ANTLR3::Constants in the new scheme module
Because the TokenScheme class functions as a metaclass, its scoping behavior can be mildly confusing if you’re trying to get a handle on it for your own purposes. Remember that all of the instance methods of TokenScheme function as module-level methods of TokenScheme instances, à la attr_accessor and friends.
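The scoping can be seen in a stripped-down stand-in. MiniScheme below is an illustrative assumption, not the library’s class; it keeps only the Module subclassing and the block evaluation.

```ruby
# A minimal stand-in for TokenScheme, for illustration only.
class MiniScheme < Module
  def self.new(&body)
    # Class#new hands the block to Module#initialize, which evaluates
    # it with self set to the freshly created module.
    super() do
      const_set(:TOKEN_NAMES, {})
      body and module_eval(&body)
    end
  end

  # An instance method of MiniScheme, so every scheme module responds
  # to it as a module-level method.
  def define_token(name, value)
    const_set(name, value)
    self::TOKEN_NAMES[value] = name.to_s
    self
  end
end

Scheme = MiniScheme.new do
  define_token(:ID, 4)   # self is the new module here
end

Scheme::ID              # => 4
Scheme.class            # => MiniScheme
Scheme::TOKEN_NAMES[4]  # => "ID"
```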
TokenScheme#define_token(name_symbol, int_value) adds a constant definition name_symbol with the value int_value. It is essentially like Module#const_set, except it forbids constant overwriting (which would mess up recognizer code fairly badly) and adds the inverse type-to-name entry to its own TOKEN_NAMES table. TokenScheme#define_tokens is a convenience method for defining many types with a hash pairing names to values.
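A simplified re-creation of the overwrite guard is sketched below. The Types module and its error message are assumptions made for the example; the library’s own implementation appears under Instance Method Details.

```ruby
# Simplified overwrite guard, not the library's code.
module Types
  TOKEN_NAMES = {}

  def self.define_token(name, value)
    name = name.to_s
    if const_defined?(name, false)
      current = const_get(name)
      # redefinition is tolerated only when the value is unchanged
      current == value or
        raise NameError, "#{name} = #{value} conflicts with #{name} = #{current}"
    else
      const_set(name, value)
      TOKEN_NAMES[value] = name   # inverse type-to-name entry
    end
    self
  end

  def self.define_tokens(token_map)
    token_map.each { |name, value| define_token(name, value) }
    self
  end
end

Types.define_tokens(:INT => 4, :ID => 6)
Types.define_token(:ID, 6)      # same value again: accepted silently

begin
  Types.define_token(:ID, 9)    # different value: rejected
rescue NameError => e
  e.message                     # => "ID = 9 conflicts with ID = 6"
end
```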
TokenScheme#register_name(value, name_string) specifies a custom type-to-name definition. This is particularly useful for the anonymous tokens that ANTLR generates for literal strings in the grammar specification. For example, if you refer to the literal '=' in some parser rule in your grammar, ANTLR will add a lexer rule for the literal and give the token a name like T__x, where x is the type’s integer value. Since this is pretty meaningless to a developer, generated code should add a special name definition for type value x with the string "'='".
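The name-upgrade rule can be approximated with a plain Hash. This is a sketch under the assumption that anonymous names always have the form T__x; the helper below is a free-standing function, not the TokenScheme method.

```ruby
# Simplified version of the name-upgrade rule, using a plain Hash.
def register_name(token_names, type_value, name)
  anonymous = "T__#{type_value}"
  current   = token_names[type_value]
  if current.nil? || current == anonymous
    token_names[type_value] = name   # first name, or upgrade from T__x
  elsif name == anonymous || name == current
    current                          # never downgrade to T__x
  else
    raise NameError, "type #{type_value} is already named #{current}"
  end
end

names = { 5 => "T__5" }
register_name(names, 5, "'='")   # => "'='"
register_name(names, 5, "T__5")  # => "'='"  (downgrade ignored)
names[5]                         # => "'='"
```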
Sample TokenScheme Construction
TokenData = ANTLR3::TokenScheme.new do
  define_tokens(
    :INT  => 4,
    :ID   => 6,
    :T__5 => 5,
    :WS   => 7
  )
  # note the self:: scoping below is due to the fact that
  # Ruby lexically scopes constant names instead of
  # looking them up in the current scope
  register_name(self::T__5, "'='")
end

TokenData::ID           # => 6
TokenData::T__5         # => 5
TokenData.token_name(4) # => 'INT'
TokenData.token_name(5) # => "'='"

class ARecognizerOrSuch < ANTLR3::Parser
  include TokenData
  ID # => 6
end
Custom Token Classes and Relationship with Tokens
When a TokenScheme is created, it will define a subclass of ANTLR3::CommonToken and assign it to the constant name Token. This token class will both include and extend the scheme module. Since token schemes define the private instance method token_name(type), instances of the token class are now able to provide their type names. The Token method name uses the token_name method to provide the type name as if it were a simple attribute without storing the name itself.
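The effect on tokens can be sketched with two stand-ins. MiniTypes and MiniToken are hypothetical; the real counterparts are a TokenScheme instance and its CommonToken subclass.

```ruby
# Hypothetical stand-in for a scheme module.
module MiniTypes
  ID  = 4
  INT = 5
  TOKEN_NAMES = { ID => "ID", INT => "INT" }

  def token_name(type)
    TOKEN_NAMES[type]
  end
end

# Hypothetical stand-in for the scheme's Token class.
class MiniToken
  include MiniTypes
  attr_accessor :type, :text

  def initialize(type, text)
    @type = type
    @text = text
  end

  # Looks the name up on demand instead of storing it.
  def name
    token_name(type)
  end
end

tok = MiniToken.new(MiniTypes::ID, "x")
tok.name                   # => "ID"
tok.type = MiniTypes::INT
tok.name                   # => "INT"
```

Because the name is derived from the type, changing the type cannot leave a stale name behind.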
When a TokenScheme is included in a recognizer class, the class will now have the token types as named constants, a type-to-name map constant TOKEN_NAMES, and a grammar-specific subclass of ANTLR3::CommonToken assigned to the constant Token. Thus, when recognizers need to manufacture tokens, instead of using the generic CommonToken class, they can create tokens using the customized Token class provided by the token scheme.
If you need to use a token class other than CommonToken, you can pass that class as a parameter to TokenScheme.new; it will be used in place of the dynamically-created CommonToken subclass.
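How a supplied class gets wired into a scheme can be sketched as follows. The setter mirrors the #token_class= listing on this page, but MiniScheme and PositionedToken are hypothetical, and the sketch omits the warning suppression the library applies.

```ruby
# Stand-in for a scheme module; only the token-class wiring is shown.
module MiniScheme
  ID = 4

  def self.token_class=(klass)
    Class === klass or raise(TypeError, "token_class must be a Class")
    # include the scheme unless the class already has it as an ancestor
    klass < self or klass.send(:include, self)
    const_set(:Token, klass)
  end
end

# Hypothetical custom token class.
class PositionedToken
  attr_accessor :type, :line
end

MiniScheme.token_class = PositionedToken

MiniScheme::Token                     # => PositionedToken
PositionedToken.include?(MiniScheme)  # => true
PositionedToken::ID                   # => 4
```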
Constant Summary
- FETCH_KEY = proc { | h, v | h.index( v ) }
Instance Attribute Summary
- #types ⇒ Object (readonly): Returns the value of attribute types.
- #unused ⇒ Object (readonly): Returns the value of attribute unused.
Class Method Summary
- .build(*token_names) ⇒ Object
- .new(tk_class = nil, &body) ⇒ Object
Instance Method Summary
- #[](name_or_value) ⇒ Object
- #built_in_type?(type_value) ⇒ Boolean
- #define_token(name, value = nil) ⇒ Object
- #define_tokens(token_map = {}) ⇒ Object
- #register_name(type_value, name) ⇒ Object
- #register_names(*names) ⇒ Object
- #token_class ⇒ Object
- #token_class=(klass) ⇒ Object
- #token_defined?(name_or_value) ⇒ Boolean
Methods included from TokenFactory
Methods inherited from Module
Instance Attribute Details
#types ⇒ Object (readonly)
Returns the value of attribute types.
# File 'lib/antlr3/token.rb', line 563
def types
  @types
end
#unused ⇒ Object (readonly)
Returns the value of attribute unused.
# File 'lib/antlr3/token.rb', line 563
def unused
  @unused
end
Class Method Details
.build(*token_names) ⇒ Object
# File 'lib/antlr3/token.rb', line 539
def self.build( *token_names )
  token_names = [ token_names ].flatten!
  token_names.compact!
  token_names.uniq!
  tk_class = Class === token_names.first ? token_names.shift : nil
  value_maps, names = token_names.partition { |i| Hash === i }
  new( tk_class ) do
    for value_map in value_maps
      define_tokens( value_map )
    end
    for name in names
      define_token( name )
    end
  end
end
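The argument handling in .build can be tried in isolation. The helper below repeats the normalization steps from the listing; split_build_args and CustomToken are names invented for this sketch.

```ruby
# Splits a .build-style argument list into an optional token class,
# the hashes of explicit values, and the bare names.
def split_build_args(*token_names)
  token_names = [token_names].flatten
  token_names.compact!
  token_names.uniq!
  tk_class = Class === token_names.first ? token_names.shift : nil
  value_maps, names = token_names.partition { |i| Hash === i }
  [tk_class, value_maps, names]
end

CustomToken = Class.new   # hypothetical token class

tk_class, value_maps, names =
  split_build_args(CustomToken, :ID, [:INT, nil], { :WS => 9 }, :ID)

tk_class               # => CustomToken
names                  # => [:ID, :INT]
value_maps.first[:WS]  # => 9
```

Per the listing, each hash is then passed to define_tokens and each bare name to define_token, which picks the next unused type value.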
.new(tk_class = nil, &body) ⇒ Object
# File 'lib/antlr3/token.rb', line 511
def self.new( tk_class = nil, &body )
  super() do
    tk_class ||= Class.new( ::ANTLR3::CommonToken )
    self.token_class = tk_class
    const_set( :TOKEN_NAMES, ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.clone )
    @types = ::ANTLR3::Constants::BUILT_IN_TOKEN_NAMES.invert
    @unused = ::ANTLR3::Constants::MIN_TOKEN_TYPE
    scheme = self
    define_method( :token_scheme ) { scheme }
    define_method( :token_names ) { scheme::TOKEN_NAMES }
    define_method( :token_name ) do |type|
      begin
        token_names[ type ] or super
      rescue NoMethodError
        ::ANTLR3::CommonToken.token_name( type )
      end
    end
    module_function :token_name, :token_names
    include ANTLR3::Constants
    body and module_eval( &body )
  end
end
Instance Method Details
#[](name_or_value) ⇒ Object
# File 'lib/antlr3/token.rb', line 649
def []( name_or_value )
  case name_or_value
  when Integer then token_names.fetch( name_or_value, nil )
  else const_get( name_or_value.to_s ) rescue FETCH_KEY.call( token_names, name_or_value )
  end
end
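The two-way lookup can be sketched with plain hashes. This is an approximation, not the library’s code: it substitutes Hash#key for the Hash#index call in FETCH_KEY, since Hash#index was deprecated in Ruby 1.9 in favour of Hash#key and is absent from recent Ruby releases. The tables and the lookup helper are invented for the example.

```ruby
# Hypothetical tables standing in for a scheme's data.
TOKEN_NAMES = { 4 => "INT", 5 => "'='", 6 => "ID" }
TYPES       = { "INT" => 4, "T__5" => 5, "ID" => 6 }

# Integer in, name out; name in, integer out.
def lookup(name_or_value)
  case name_or_value
  when Integer then TOKEN_NAMES.fetch(name_or_value, nil)
  else
    key = name_or_value.to_s
    # fall back to a reverse search of the display names
    TYPES.fetch(key) { TOKEN_NAMES.key(key) }
  end
end

lookup(6)      # => "ID"
lookup(:INT)   # => 4
lookup("'='")  # => 5
lookup(99)     # => nil
```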
#built_in_type?(type_value) ⇒ Boolean
# File 'lib/antlr3/token.rb', line 638
def built_in_type?( type_value )
  Constants::BUILT_IN_TOKEN_NAMES.fetch( type_value, false ) and true
end
#define_token(name, value = nil) ⇒ Object
# File 'lib/antlr3/token.rb', line 572
def define_token( name, value = nil )
  name = name.to_s
  if current_value = @types[ name ]
    # token type has already been defined
    # raise an error unless value is the same as the current value
    value ||= current_value
    unless current_value == value
      raise NameError.new(
        "new token type definition ``#{ name } = #{ value }'' conflicts " <<
        "with existing type definition ``#{ name } = #{ current_value }''", name
      )
    end
  else
    value ||= @unused
    if name =~ /^[A-Z]\w*$/
      const_set( name, @types[ name ] = value )
    else
      constant = "T__#{ value }"
      const_set( constant, @types[ constant ] = value )
      @types[ name ] = value
    end
    register_name( value, name ) unless built_in_type?( value )
  end
  value >= @unused and @unused = value + 1
  return self
end
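The automatic numbering in define_token can be isolated in a small class. Numbering is a hypothetical helper that leaves out conflict checks and constant definition, and its starting value of 4 is an assumption about MIN_TOKEN_TYPE.

```ruby
# Tracks the next unused type value, as the @unused counter does.
class Numbering
  attr_reader :types, :unused

  def initialize(first_free = 4)   # assumed value of MIN_TOKEN_TYPE
    @types  = {}
    @unused = first_free
  end

  def define(name, value = nil)
    name = name.to_s
    value ||= @types[name] || @unused
    # names that are not valid constants get a T__<value> alias
    key = name =~ /\A[A-Z]\w*\z/ ? name : "T__#{value}"
    @types[key]  = value
    @types[name] = value
    @unused = value + 1 if value >= @unused
    value
  end
end

n = Numbering.new
n.define(:ID)       # => 4
n.define(:INT, 10)  # => 10
n.define("'='")     # => 11
n.types["T__11"]    # => 11
n.unused            # => 12
```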
#define_tokens(token_map = {}) ⇒ Object
# File 'lib/antlr3/token.rb', line 565
def define_tokens( token_map = {} )
  for token_name, token_value in token_map
    define_token( token_name, token_value )
  end
  return self
end
#register_name(type_value, name) ⇒ Object
# File 'lib/antlr3/token.rb', line 614
def register_name( type_value, name )
  name = name.to_s.freeze
  if token_names.has_key?( type_value )
    current_name = token_names[ type_value ]
    current_name == name and return name
    if current_name == "T__#{ type_value }"
      # only an anonymous name is registered -- upgrade the name to the full literal name
      token_names[ type_value ] = name
    elsif name == "T__#{ type_value }"
      # ignore name downgrade from literal to anonymous constant
      return current_name
    else
      error = NameError.new(
        "attempted assignment of token type #{ type_value }" <<
        " to name #{ name } conflicts with existing name #{ current_name }", name
      )
      raise error
    end
  else
    token_names[ type_value ] = name.to_s.freeze
  end
end
#register_names(*names) ⇒ Object
# File 'lib/antlr3/token.rb', line 601
def register_names( *names )
  if names.length == 1 and Hash === names.first
    names.first.each do |value, name|
      register_name( value, name )
    end
  else
    names.each_with_index do |name, i|
      type_value = Constants::MIN_TOKEN_TYPE + i
      register_name( type_value, name )
    end
  end
end
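The positional form of register_names pairs each name with consecutive type values. A minimal sketch, assuming MIN_TOKEN_TYPE is 4 (the constant is redefined locally and positional_names is a made-up helper):

```ruby
MIN_TOKEN_TYPE = 4   # assumption; the library reads this from ANTLR3::Constants

# Returns the value-to-name pairs that a positional call would register.
def positional_names(*names)
  names.each_with_index.map { |name, i| [MIN_TOKEN_TYPE + i, name.to_s] }.to_h
end

table = positional_names("ID", "INT", "'='")
table[4]  # => "ID"
table[6]  # => "'='"
```

Passing a single hash instead registers each value-to-name pair exactly as given.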
#token_class ⇒ Object
# File 'lib/antlr3/token.rb', line 656
def token_class
  self::Token
end
#token_class=(klass) ⇒ Object
# File 'lib/antlr3/token.rb', line 660
def token_class=( klass )
  Class === klass or raise( TypeError, "token_class must be a Class" )
  Util.silence_warnings do
    klass < self or klass.send( :include, self )
    const_set( :Token, klass )
  end
end
#token_defined?(name_or_value) ⇒ Boolean
# File 'lib/antlr3/token.rb', line 642
def token_defined?( name_or_value )
  case name_or_value
  when Integer then token_names.has_key?( name_or_value )
  else const_defined?( name_or_value.to_s )
  end
end