Class: SrlRuby::Tokenizer
- Inherits:
- Object (hierarchy: Object, SrlRuby::Tokenizer)
- Defined in:
- lib/srl_ruby/tokenizer.rb
Overview
A tokenizer for the Simple Regex Language (SRL). Responsibility: break the SRL input into a sequence of token objects. The tokenizer should recognize:
- Keywords: as, capture, letter
- Integer literals, including single digits
- String literals (quote delimited)
- Single-character literals
- Delimiters: parentheses '(' and ')'
- Separators: comma (optional)
Defined Under Namespace
Classes: ScanError
Constant Summary
- PATT_CHAR_CLASS =
/[^,"\s]{2,}/.freeze
- PATT_DIGIT_LIT =
/[0-9]((?=\s|,|\))|$)/.freeze
- PATT_IDENTIFIER =
/[a-zA-Z_][a-zA-Z0-9_]+/.freeze
- PATT_INTEGER =
An integer has 2..* digits
/[0-9]{2,}((?=\s|,|\))|$)/.freeze
- PATT_LETTER_LIT =
/[a-zA-Z]((?=\s|,|\))|$)/.freeze
- PATT_NEWLINE =
/(?:\r\n)|\r|\n/.freeze
- PATT_STR_DBL_QUOTE =
Double quotes literal?
/"(?:\\"|[^"])*"/.freeze
- PATT_STR_SNGL_QUOTE =
Single quotes literal?
/'(?:\\'|[^'])*'/.freeze
- PATT_WHITESPACE =
/[ \t\f]+/.freeze
- Lexeme2name =
{ '(' => 'LPAREN', ')' => 'RPAREN', ',' => 'COMMA' }.freeze
- Keywords =
Here are all the SRL keywords (in uppercase)
%w[ ALL ALREADY AND ANY ANYTHING AS AT BACKSLASH BEGIN BETWEEN BY CAPTURE CARRIAGE CASE CHARACTER DIGIT EITHER END EXACTLY FOLLOWED FROM HAD IF INSENSITIVE LAZY LEAST LETTER LINE LITERALLY MORE MULTI MUST NEVER NEW NO NONE NOT NUMBER OF ONCE ONE OPTIONAL OR RAW RETURN STARTS TAB TIMES TO TWICE UNTIL UPPERCASE VERTICAL WHITESPACE WITH WORD ].map { |x| [x, x] }.to_h
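The literal patterns above end with the group `((?=\s|,|\))|$)`, so a digit, integer, or letter literal only matches when it is followed by whitespace, a comma, a closing parenthesis, or the end of a line. The following is a minimal sketch of how that separates the lexeme categories. The `classify` helper and the symbols it returns are hypothetical and are not part of SrlRuby; only the four patterns are copied from the summary.

```ruby
# Patterns copied verbatim from the constant summary.
PATT_DIGIT_LIT  = /[0-9]((?=\s|,|\))|$)/.freeze
PATT_INTEGER    = /[0-9]{2,}((?=\s|,|\))|$)/.freeze
PATT_LETTER_LIT = /[a-zA-Z]((?=\s|,|\))|$)/.freeze
PATT_IDENTIFIER = /[a-zA-Z_][a-zA-Z0-9_]+/.freeze

# Hypothetical helper (assumption, not in the gem): report which
# pattern matches the complete lexeme.
def classify(lexeme)
  case lexeme
  when /\A#{PATT_INTEGER}\z/    then :integer     # two or more digits
  when /\A#{PATT_DIGIT_LIT}\z/  then :digit       # exactly one digit
  when /\A#{PATT_LETTER_LIT}\z/ then :letter      # exactly one letter
  when /\A#{PATT_IDENTIFIER}\z/ then :identifier  # two or more word characters
  else :unknown
  end
end

%w[42 7 a letter 4x].each do |lexeme|
  puts "#{lexeme.inspect} is classified as #{classify(lexeme)}"
end
```

Note that `PATT_IDENTIFIER` requires at least two characters, so a lone letter such as `a` can only match `PATT_LETTER_LIT`, while a mixed lexeme such as `4x` matches none of the four patterns.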
Instance Attribute Summary
- #line_start ⇒ Integer (readonly): Offset of start of current line within input.
- #lineno ⇒ Integer (readonly): Current line number.
- #scanner ⇒ StringScanner (readonly)
Instance Method Summary
- #initialize(source) ⇒ Tokenizer (constructor): Constructor.
- #tokens ⇒ Object
Constructor Details
#initialize(source) ⇒ Tokenizer
Constructor. Initialize a tokenizer for SRL.
# File 'lib/srl_ruby/tokenizer.rb', line 109
def initialize(source)
  @scanner = StringScanner.new(source)
  @lineno = 1
  @line_start = 0
end
Instance Attribute Details
#line_start ⇒ Integer (readonly)
Returns offset of start of current line within input.
# File 'lib/srl_ruby/tokenizer.rb', line 37
def line_start
  @line_start
end
#lineno ⇒ Integer (readonly)
Returns current line number.
# File 'lib/srl_ruby/tokenizer.rb', line 34
def lineno
  @lineno
end
#scanner ⇒ StringScanner (readonly)
# File 'lib/srl_ruby/tokenizer.rb', line 31
def scanner
  @scanner
end
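The three attributes are enough to report a line and column position for any point in the input. The sketch below shows one way `lineno` and `line_start` could be kept up to date while scanning. The `LineTracker` class, its `locate` method, and the column formula are illustrative assumptions; the page does not show how the gem itself updates these values.

```ruby
require 'strscan'

# Pattern copied from the constant summary.
PATT_NEWLINE = /(?:\r\n)|\r|\n/.freeze

# Hypothetical class for illustration only (not part of SrlRuby).
class LineTracker
  attr_reader :lineno, :line_start, :scanner

  def initialize(source)
    @scanner = StringScanner.new(source)
    @lineno = 1      # line numbers are 1-based
    @line_start = 0  # offset of the first character of the current line
  end

  # Scan forward to the first occurrence of `text` and return its
  # [line, column] position (both 1-based), or nil if it is absent.
  def locate(text)
    pattern = Regexp.new(Regexp.escape(text))
    until @scanner.eos?
      if @scanner.scan(PATT_NEWLINE)
        @lineno += 1
        @line_start = @scanner.pos # the next line begins right after the newline
      elsif @scanner.match?(pattern)
        # StringScanner#pos is a byte offset, so this column is exact
        # for ASCII input only.
        return [@lineno, @scanner.pos - @line_start + 1]
      else
        @scanner.getch
      end
    end
    nil
  end
end

p LineTracker.new("begin with\nletter twice").locate('twice')
```

Because `PATT_NEWLINE` tries `\r\n` before `\r` and `\n`, a Windows line ending is counted as a single line break.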
Instance Method Details
#tokens ⇒ Object
# File 'lib/srl_ruby/tokenizer.rb', line 115
def tokens
  tok_sequence = []
  until @scanner.eos?
    token = _next_token
    tok_sequence << token unless token.nil?
  end
  tok_sequence
end
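The loop in `#tokens` relies on a private `_next_token` method that this page does not show. A self-contained sketch of the same loop, with a stand-in `_next_token`, might look as follows. The `MiniTokenizer` class, the simplified patterns, the `INTEGER` token name, and the `[name, lexeme]` pair format are all assumptions made for illustration; the real gem produces token objects and recognizes more lexeme kinds.

```ruby
require 'strscan'

# Illustrative stand-in, not the SrlRuby implementation.
class MiniTokenizer
  # Mapping copied from the constant summary.
  Lexeme2name = { '(' => 'LPAREN', ')' => 'RPAREN', ',' => 'COMMA' }.freeze

  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Same loop as the listing above.
  def tokens
    tok_sequence = []
    until @scanner.eos?
      token = _next_token
      tok_sequence << token unless token.nil?
    end
    tok_sequence
  end

  private

  # Assumed behaviour: skip whitespace, then return a [name, lexeme]
  # pair, or nil when only whitespace was left in the input.
  def _next_token
    @scanner.skip(/\s+/)
    return nil if @scanner.eos?

    if (lexeme = @scanner.scan(/[(),]/))
      [Lexeme2name[lexeme], lexeme]
    elsif (lexeme = @scanner.scan(/[0-9]+/))
      ['INTEGER', lexeme] # hypothetical token name
    elsif (lexeme = @scanner.scan(/[a-zA-Z]+/))
      [lexeme.upcase, lexeme] # treat every word as a keyword
    else
      raise ArgumentError, "Unexpected character #{@scanner.getch.inspect}"
    end
  end
end

MiniTokenizer.new('capture (letter) twice, 3 ').tokens.each { |tok| p tok }
```

The trailing space in the sample input shows why the loop guards against `nil`: after the last lexeme the scanner is not yet at the end of the string, so `_next_token` is called once more and has nothing to return.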