Class: Dhaka::Tokenizer
- Inherits: Object
  - Object
  - Dhaka::Tokenizer
- Defined in: lib/tokenizer/tokenizer.rb
Overview
This class contains a DSL for specifying tokenizers. Subclass it to implement tokenizers for specific grammars. Subclasses of this class may not be further subclassed.
Tokenizers are state machines that are specified largely by hand. Each state of a tokenizer is identified by a Ruby symbol. The constant Dhaka::TOKENIZER_IDLE_STATE is reserved for the idle state of the tokenizer, which is the state it starts in.
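To make the state-machine idea concrete, here is a minimal self-contained sketch that mirrors the method bodies listed further down this page. It does not use the Dhaka library: MiniTokenizer, MiniState, IDLE, NumberTokenizer and the for_characters helper are all hypothetical stand-ins, and the real state class may expose a different interface.

```ruby
# Hypothetical stand-in for the DSL described above; NOT the Dhaka library itself.
IDLE = :idle # plays the role of Dhaka::TOKENIZER_IDLE_STATE

# Assumed shape of a state: a table mapping characters to action blocks.
class MiniState
  attr_reader :actions

  def initialize
    @actions = {}
  end

  # Register the same action block for every character in chars.
  def for_characters(chars, &blk)
    chars.each { |c| @actions[c] = blk }
  end
end

class MiniTokenizer
  attr_accessor :accumulator
  attr_reader :tokens

  # One state table per tokenizer class, created lazily by name.
  def self.states
    @states ||= Hash.new { |hash, name| hash[name] = MiniState.new }
  end

  def self.for_state(name, &blk)
    states[name].instance_eval(&blk)
  end

  def self.tokenize(input)
    new(input).run
  end

  def initialize(input)
    @input = input
    @current_state = self.class.states[IDLE]
    @index = 0
    @tokens = []
  end

  def curr_char
    @input[@index]
  end

  def advance
    @index += 1
  end

  def switch_to(name)
    @current_state = self.class.states[name]
  end

  # Dispatch on the current character until the input is exhausted.
  def run
    while curr_char
      blk = @current_state.actions[curr_char]
      raise ArgumentError, "unexpected #{curr_char.inspect} at #{@index}" unless blk
      instance_eval(&blk)
    end
    tokens
  end
end

# A two-state tokenizer: skip spaces, collect runs of digits.
class NumberTokenizer < MiniTokenizer
  digits = ('0'..'9').to_a

  for_state IDLE do
    for_characters([' ']) { advance }
    for_characters(digits) do
      self.accumulator = ''
      switch_to :in_number
    end
  end

  for_state :in_number do
    for_characters(digits) do
      self.accumulator += curr_char
      advance
    end
    for_characters([' ']) do
      tokens << accumulator
      switch_to IDLE
    end
  end
end

result = NumberTokenizer.tokenize('12 345 ')
```

In this sketch the input ends with a space so that the last number is flushed before the loop stops. How the real library handles a token that is still open at the end of the input is not shown on this page.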
Instance Attribute Summary
- #accumulator ⇒ Object
  A slot that can be used to accumulate characters when processing multi-character tokens.
- #tokens ⇒ Object (readonly)
  The tokens shifted so far.
Class Method Summary
- .for_state(state_name, &blk) ⇒ Object
  Define the action for the state named state_name.
- .tokenize(input) ⇒ Object
  Tokenizes the string input and returns an array of Token objects.
Instance Method Summary
- #advance ⇒ Object
  Advance to the next character.
- #curr_char ⇒ Object
  The character currently being processed.
- #initialize(input) ⇒ Tokenizer (constructor)
  :nodoc:
- #run ⇒ Object
  :nodoc:
- #switch_to(state_name) ⇒ Object
  Change the active state of the tokenizer to the state identified by the symbol state_name.
Constructor Details
#initialize(input) ⇒ Tokenizer
:nodoc:
    # File 'lib/tokenizer/tokenizer.rb', line 66
    def initialize(input) #:nodoc:
      @input = input
      @current_state = self.class.states[TOKENIZER_IDLE_STATE]
      @curr_char_index = 0
      @tokens = []
    end
Instance Attribute Details
#accumulator ⇒ Object
A slot that can be used to accumulate characters when processing multi-character tokens.
    # File 'lib/tokenizer/tokenizer.rb', line 62
    def accumulator
      @accumulator
    end
#tokens ⇒ Object (readonly)
The tokens shifted so far.
    # File 'lib/tokenizer/tokenizer.rb', line 64
    def tokens
      @tokens
    end
Class Method Details
.for_state(state_name, &blk) ⇒ Object
Define the action for the state named state_name.
    # File 'lib/tokenizer/tokenizer.rb', line 52
    def self.for_state(state_name, &blk)
      states[state_name].instance_eval(&blk)
    end
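Because for_state passes the block to instance_eval on the state object, bare method calls inside the block are sent to that state and not to the tokenizer class. The sketch below isolates this mechanism. SketchState and its for_characters method are hypothetical; the real state class is not shown on this page.

```ruby
# Hypothetical state object, used only to illustrate instance_eval.
class SketchState
  attr_reader :actions

  def initialize
    @actions = {}
  end

  # Store the block as the action for each listed character.
  def for_characters(chars, &blk)
    chars.each { |c| @actions[c] = blk }
  end
end

state = SketchState.new

# Inside the block, self is `state`, so for_characters resolves to
# SketchState#for_characters without an explicit receiver.
state.instance_eval do
  for_characters(['a', 'b']) { :saw_letter }
end

registered = state.actions.keys.sort
outcome = state.actions['b'].call
```

This is what lets a tokenizer definition read as a declarative list of states and character rules while still being ordinary Ruby.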
.tokenize(input) ⇒ Object
Tokenizes the string input and returns an array of Token objects.
    # File 'lib/tokenizer/tokenizer.rb', line 57
    def self.tokenize(input)
      self.new(input).run
    end
Instance Method Details
#advance ⇒ Object
Advance to the next character.
    # File 'lib/tokenizer/tokenizer.rb', line 79
    def advance
      @curr_char_index += 1
    end
#curr_char ⇒ Object
The character currently being processed.
    # File 'lib/tokenizer/tokenizer.rb', line 74
    def curr_char
      @input[@curr_char_index] and @input[@curr_char_index].chr
    end
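The `and` guard in curr_char means the method returns nil once the index moves past the end of the input, which is the condition the run loop uses to stop. The call to chr matters on Ruby 1.8, where indexing a String with an integer returned a character code; on Ruby 1.9 and later the index already yields a one-character String. A small sketch of the same expression outside the class, with a hypothetical char_at lambda:

```ruby
input = 'ab'

# Same expression as curr_char, parameterised by index for illustration.
char_at = lambda { |index| input[index] and input[index].chr }

first = char_at.call(0)    # a one-character String
past_end = char_at.call(2) # nil, because the index is past the last character
```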
#run ⇒ Object
:nodoc:
    # File 'lib/tokenizer/tokenizer.rb', line 88
    def run #:nodoc:
      while curr_char
        blk = @current_state.actions[curr_char]
        raise UnrecognizedInputCharacterException.new(@input, @curr_char_index) unless blk
        instance_eval(&blk)
      end
      tokens
    end
#switch_to(state_name) ⇒ Object
Change the active state of the tokenizer to the state identified by the symbol state_name.
    # File 'lib/tokenizer/tokenizer.rb', line 84
    def switch_to state_name
      @current_state = self.class.states[state_name]
    end