Class: SrlRuby::Tokenizer
- Inherits: Object
- Defined in: lib/srl_ruby/tokenizer.rb
Overview
A tokenizer for the Simple Regex Language (SRL). Its responsibility is to break SRL input text into a sequence of token objects. The tokenizer should recognize:
- Keywords (e.g. as, capture, letter)
- Integer literals, including single digits
- String literals (quote-delimited)
- Single-character literals
- Delimiters: the parentheses '(' and ')'
- Separators: the comma (optional)
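As a rough illustration of how these token kinds can be told apart, here is a minimal sketch built on Ruby's `StringScanner` and several of the patterns listed under Constant Summary. It is not the gem's code: the `next_token` helper, the terminal names `INTEGER`, `DIGIT_LIT`, `STRING_LIT` and `LETTER_LIT`, the order in which patterns are tried, the case-insensitive keyword lookup and the reduced keyword table are all assumptions, and it returns `[terminal, lexeme]` pairs instead of `Rley::Lexical::Token` objects.

```ruby
require 'strscan'

# Patterns copied from the Constant Summary of SrlRuby::Tokenizer.
PATT_DIGIT_LIT = /[0-9]((?=\s|,|\))|$)/
PATT_INTEGER = /[0-9]{2,}((?=\s|,|\))|$)/
PATT_LETTER_LIT = /[a-zA-Z]((?=\s|,|\))|$)/
PATT_IDENTIFIER = /[a-zA-Z_][a-zA-Z0-9_]+/
PATT_STR_DBL_QUOTE = /"(?:\\"|[^"])*"/
PATT_WHITESPACE = /[ \t\f]+/
LEXEME2NAME = { '(' => 'LPAREN', ')' => 'RPAREN', ',' => 'COMMA' }.freeze

# Reduced keyword table, for illustration only (the real one is much larger).
KEYWORDS = %w[CAPTURE EXACTLY LETTER LITERALLY TIMES].to_h { |x| [x, x] }

# Hypothetical helper, not the gem's _next_token: returns [terminal, lexeme]
# for the token at the scanner position, or nil after skipping blanks.
# Newlines and single-quoted strings are left out for brevity.
def next_token(scanner)
  return nil if scanner.skip(PATT_WHITESPACE)

  if (lexeme = scanner.scan(/[(),]/))
    [LEXEME2NAME[lexeme], lexeme]
  elsif (lexeme = scanner.scan(PATT_INTEGER))
    ['INTEGER', lexeme]
  elsif (lexeme = scanner.scan(PATT_DIGIT_LIT))
    ['DIGIT_LIT', lexeme]
  elsif (lexeme = scanner.scan(PATT_STR_DBL_QUOTE))
    ['STRING_LIT', lexeme[1..-2]] # strip the enclosing quotes
  elsif (lexeme = scanner.scan(PATT_LETTER_LIT))
    ['LETTER_LIT', lexeme]
  elsif (lexeme = scanner.scan(PATT_IDENTIFIER))
    keyword = KEYWORDS[lexeme.upcase] # assumes case-insensitive keywords
    raise ArgumentError, "unknown word #{lexeme.inspect}" unless keyword

    [keyword, lexeme]
  else
    raise ArgumentError, "unexpected character at offset #{scanner.pos}"
  end
end

scanner = StringScanner.new('capture (literally "ab", exactly 12 times)')
found = []
found << next_token(scanner) until scanner.eos?
found.compact! # drop the nil entries produced by blanks
```

Trying `PATT_LETTER_LIT` before `PATT_IDENTIFIER` is harmless here, since the identifier pattern needs at least two characters and the letter pattern rejects a letter that is followed by another letter.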
Defined Under Namespace
Classes: ScanError
Constant Summary
- PATT_CHAR_CLASS = /[^,"\s]{2,}/
  Matches an SRL character class.
- PATT_DIGIT_LIT = /[0-9]((?=\s|,|\))|$)/
  Matches a single digit that is followed by whitespace, a comma, a closing parenthesis, or the end of the line.
- PATT_IDENTIFIER = /[a-zA-Z_][a-zA-Z0-9_]+/
  Matches an SRL identifier (at least two characters).
- PATT_INTEGER = /[0-9]{2,}((?=\s|,|\))|$)/
  Matches a decimal integer of two or more digits, with the same trailing condition as PATT_DIGIT_LIT.
- PATT_LETTER_LIT = /[a-zA-Z]((?=\s|,|\))|$)/
  Matches a single letter, with the same trailing condition as PATT_DIGIT_LIT.
- PATT_NEWLINE = /(?:\r\n)|\r|\n/
  Matches a newline in any of the common conventions (CRLF, CR, or LF).
- PATT_STR_DBL_QUOTE = /"(?:\\"|[^"])*"/
  Matches text enclosed in double quotes; an escaped quote (\") does not end the string.
- PATT_STR_SNGL_QUOTE = /'(?:\\'|[^'])*'/
  Matches text enclosed in single quotes; an escaped quote (\') does not end the string.
- PATT_WHITESPACE = /[ \t\f]+/
  Matches one or more SRL blanks (space, tab, or form feed).
- Lexeme2name = { '(' => 'LPAREN', ')' => 'RPAREN', ',' => 'COMMA' }.freeze
  Maps the special single characters to symbolic names.
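The literal patterns rely on a trailing lookahead, `((?=\s|,|\))|$)`: a digit or letter only counts as a literal when it is followed by whitespace, a comma, a closing parenthesis, or the end of the line. The stdlib-only check below copies several of the constants above and probes them with `StringScanner#scan`, which anchors each match at the scan position; `scan_start` is a hypothetical helper written for this example.

```ruby
require 'strscan'

# Patterns copied verbatim from the Constant Summary.
PATT_DIGIT_LIT = /[0-9]((?=\s|,|\))|$)/
PATT_INTEGER = /[0-9]{2,}((?=\s|,|\))|$)/
PATT_LETTER_LIT = /[a-zA-Z]((?=\s|,|\))|$)/
PATT_STR_DBL_QUOTE = /"(?:\\"|[^"])*"/
PATT_NEWLINE = /(?:\r\n)|\r|\n/

# Hypothetical helper: match pattern at the very start of text, or return nil.
def scan_start(pattern, text)
  StringScanner.new(text).scan(pattern)
end

digit_a = scan_start(PATT_DIGIT_LIT, '5 times')   # "5"
digit_b = scan_start(PATT_DIGIT_LIT, '12 times')  # nil: "1" is followed by a digit
digit_c = scan_start(PATT_DIGIT_LIT, '7')         # "7": $ accepts the end of input
integer = scan_start(PATT_INTEGER, '12 times')    # "12"
letter_a = scan_start(PATT_LETTER_LIT, 'x)')      # "x"
letter_b = scan_start(PATT_LETTER_LIT, 'xy')      # nil: part of a longer word
quoted = scan_start(PATT_STR_DBL_QUOTE, '"say \"hi\"" rest') # escaped quotes stay inside
breaks = "a\r\nb\rc\nd".scan(PATT_NEWLINE).size   # 3: CRLF counts as one newline
```

Listing `\r\n` as the first alternative of `PATT_NEWLINE` is what keeps a Windows line ending from being counted twice.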
- Keywords = %w[ ALL ALREADY AND ANY ANYTHING AS AT BACKSLASH BEGIN BETWEEN BY CAPTURE CARRIAGE CASE CHARACTER DIGIT EITHER END EXACTLY FOLLOWED FROM HAD IF INSENSITIVE LAZY LEAST LETTER LINE LITERALLY MORE MULTI MUST NEVER NEW NO NONE NOT NUMBER OF ONCE ONE OPTIONAL OR RAW RETURN STARTS TAB TIMES TO TWICE UNTIL UPPERCASE VERTICAL WHITESPACE WITH WORD ].to_h { |x| [x, x] }
  All the SRL keywords, in uppercase; each keyword maps to itself.
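`Keywords` is built with `Array#to_h` and a block (available since Ruby 2.6), so each uppercase keyword maps to itself and checking for a keyword is a plain hash lookup, just like mapping a delimiter through `Lexeme2name`. The sketch below uses abbreviated copies of both tables; the `keyword_for` helper, and the idea of upcasing a word before the lookup, are assumptions for illustration, since the code that consults `Keywords` is not shown on this page.

```ruby
# Abbreviated copies of the Keywords and Lexeme2name constants.
Keywords = %w[AS CAPTURE LETTER ONCE TIMES].to_h { |x| [x, x] }
Lexeme2name = { '(' => 'LPAREN', ')' => 'RPAREN', ',' => 'COMMA' }.freeze

# Hypothetical lookup helper: assumes a word is upcased before it is
# compared against the (uppercase) keyword table.
def keyword_for(word)
  Keywords[word.upcase]
end

hit = keyword_for('Capture')  # "CAPTURE"
miss = keyword_for('foo')     # nil: not an SRL keyword
paren = Lexeme2name['(']      # "LPAREN"
```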
Instance Attribute Summary
- #line_start ⇒ Integer (readonly)
  Offset of the start of the current line within the input.
- #lineno ⇒ Integer (readonly)
  Current line number.
- #scanner ⇒ StringScanner (readonly)
  Scanner over the input text.
Instance Method Summary
- #initialize(source) ⇒ Tokenizer (constructor)
  Constructor.
- #tokens ⇒ Array<Rley::Lexical::Token>
  Returns the sequence of tokens recognized from the input text given at initialization.
Constructor Details
#initialize(source) ⇒ Tokenizer
Constructor. Initialize a tokenizer for SRL.
    # File 'lib/srl_ruby/tokenizer.rb', line 129
    def initialize(source)
      @scanner = StringScanner.new(source)
      @lineno = 1
      @line_start = 0
    end
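The constructor only wraps the source text in a `StringScanner` (from Ruby's `strscan` standard library) and sets the line counters; no scanning happens yet. The snippet below shows the `StringScanner` behavior that a scanner-based tokenizer of this kind depends on: `scan` and `skip` only match at the current position and advance it on success, and `eos?` reports when the input is exhausted. The input string is an arbitrary example.

```ruby
require 'strscan'

scanner = StringScanner.new('letter once')
miss = scanner.scan(/once/)        # nil: scan only matches at the current position
word = scanner.scan(/[a-zA-Z]+/)   # "letter"; pos advances to 6
blanks = scanner.skip(/[ \t\f]+/)  # 1: skip returns the match length, not the text
rest = scanner.scan(/[a-zA-Z]+/)   # "once"
at_end = scanner.eos?              # true: the whole input was consumed
```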
Instance Attribute Details
#line_start ⇒ Integer (readonly)
Returns the offset of the start of the current line within the input.
    # File 'lib/srl_ruby/tokenizer.rb', line 54
    def line_start
      @line_start
    end
#lineno ⇒ Integer (readonly)
Returns the current line number.
    # File 'lib/srl_ruby/tokenizer.rb', line 51
    def lineno
      @lineno
    end
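Together, `lineno` and `line_start` are enough to report a 1-based line and column for a scanner offset: the column is the offset minus `line_start`, plus one. How the tokenizer updates the two counters is not shown on this page; the sketch below assumes that each newline matching `PATT_NEWLINE` increments `lineno` and moves `line_start` just past the newline. The `position_of` helper is hypothetical, and because `StringScanner#pos` counts bytes, its columns are only character columns for ASCII input.

```ruby
require 'strscan'

PATT_NEWLINE = /(?:\r\n)|\r|\n/

# Hypothetical helper: 1-based [line, column] of byte offset pos in source,
# maintaining a lineno / line_start pair like the tokenizer's attributes.
def position_of(source, pos)
  scanner = StringScanner.new(source)
  lineno = 1      # same initial value as in #initialize
  line_start = 0  # same initial value as in #initialize
  while scanner.pos < pos && !scanner.eos?
    if scanner.skip(PATT_NEWLINE)
      lineno += 1
      line_start = scanner.pos # the next line starts right after the newline
    else
      scanner.getch
    end
  end
  [lineno, pos - line_start + 1]
end

src = "begin with\r\n  capture (digit)"
location = position_of(src, 14) # [2, 3]: "capture" starts in column 3 of line 2
```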
#scanner ⇒ StringScanner (readonly)
    # File 'lib/srl_ruby/tokenizer.rb', line 48
    def scanner
      @scanner
    end
Instance Method Details
#tokens ⇒ Array<Rley::Lexical::Token>
Returns the sequence of tokens recognized from the input text given at initialization.
    # File 'lib/srl_ruby/tokenizer.rb', line 137
    def tokens
      tok_sequence = []
      until @scanner.eos?
        token = _next_token
        tok_sequence << token unless token.nil?
      end
      tok_sequence
    end
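Because `#tokens` loops until `@scanner.eos?` and never rewinds the scanner, each instance yields its token sequence only once: a second call finds the scanner at the end of input and returns an empty array. The sketch below reproduces that loop in a hypothetical `WordTokenizer` whose toy `_next_token` simply splits on whitespace; it stands in for the real `_next_token`, which is not documented on this page.

```ruby
require 'strscan'

# Hypothetical class that reuses the loop of #tokens with a toy _next_token.
class WordTokenizer
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Same loop structure as SrlRuby::Tokenizer#tokens.
  def tokens
    tok_sequence = []
    until @scanner.eos?
      token = _next_token
      tok_sequence << token unless token.nil? # blanks yield nil and are dropped
    end
    tok_sequence
  end

  private

  # Toy stand-in for the real _next_token: returns nil for a run of
  # whitespace, otherwise the next run of non-whitespace characters.
  def _next_token
    return nil if @scanner.skip(/\s+/)

    @scanner.scan(/\S+/)
  end
end

tokenizer = WordTokenizer.new('starts with letter')
first_pass = tokenizer.tokens  # ["starts", "with", "letter"]
second_pass = tokenizer.tokens # []: the scanner is already at end of input
```

If the tokens are needed again, creating a new tokenizer for the same source is the straightforward option.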