Class: SrlRuby::Tokenizer

Inherits:
Object
Defined in:
lib/srl_ruby/tokenizer.rb

Overview

A tokenizer for the Simple Regex Language (SRL). Responsibility: break SRL input into a sequence of token objects. The tokenizer should recognize:

  • Keywords, for example: as, capture, letter
  • Integer literals, including single digits
  • String literals (quote-delimited)
  • Single-character literals
  • Delimiters: parentheses '(' and ')'
  • Separators: comma (optional)

Defined Under Namespace

Classes: ScanError

Constant Summary

PATT_CHAR_CLASS =
/[^,"\s]{2,}/.freeze
PATT_DIGIT_LIT =
/[0-9]((?=\s|,|\))|$)/.freeze
PATT_IDENTIFIER =
/[a-zA-Z_][a-zA-Z0-9_]+/.freeze
PATT_INTEGER =

An integer has two or more digits (a single digit is matched by PATT_DIGIT_LIT instead).

/[0-9]{2,}((?=\s|,|\))|$)/.freeze
PATT_LETTER_LIT =
/[a-zA-Z]((?=\s|,|\))|$)/.freeze
PATT_NEWLINE =
/(?:\r\n)|\r|\n/.freeze
PATT_STR_DBL_QUOTE =

A string literal delimited by double quotes; a backslash-escaped quote (\") may appear inside.

/"(?:\\"|[^"])*"/.freeze
PATT_STR_SNGL_QUOTE =

A string literal delimited by single quotes; a backslash-escaped quote (\') may appear inside.

/'(?:\\'|[^'])*'/.freeze
PATT_WHITESPACE =
/[ \t\f]+/.freeze
Lexeme2name =
{
  '(' => 'LPAREN',
  ')' => 'RPAREN',
  ',' => 'COMMA'
}.freeze
Keywords =

Here are all the SRL keywords (in uppercase)

%w[
  ALL
  ALREADY
  AND
  ANY
  ANYTHING
  AS
  AT
  BACKSLASH
  BEGIN
  BETWEEN
  BY
  CAPTURE
  CARRIAGE
  CASE
  CHARACTER
  DIGIT
  EITHER
  END
  EXACTLY
  FOLLOWED
  FROM
  HAD
  IF
  INSENSITIVE
  LAZY
  LEAST
  LETTER
  LINE
  LITERALLY
  MORE
  MULTI
  MUST
  NEVER
  NEW
  NO
  NONE
  NOT
  NUMBER
  OF
  ONCE
  ONE
  OPTIONAL
  OR
  RAW
  RETURN
  STARTS
  TAB
  TIMES
  TO
  TWICE
  UNTIL
  UPPERCASE
  VERTICAL
  WHITESPACE
  WITH
  WORD
].map { |x| [x, x] }.to_h
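
The constants above can be exercised in isolation. The following is a minimal sketch, not the library's implementation: the module name `LexemeDemo`, the `classify` helper, the order in which the patterns are tried, the terminal names `INTEGER`, `DIGIT_LIT`, `LETTER_LIT` and `IDENTIFIER`, and the reduced keyword list are all assumptions made for illustration. Only the patterns and the `Lexeme2name` table are copied from this page.

```ruby
require 'strscan'

# Hypothetical demo module; not part of srl_ruby.
module LexemeDemo
  # Patterns and punctuation table copied from the constant summary.
  PATT_DIGIT_LIT = /[0-9]((?=\s|,|\))|$)/.freeze
  PATT_INTEGER = /[0-9]{2,}((?=\s|,|\))|$)/.freeze
  PATT_IDENTIFIER = /[a-zA-Z_][a-zA-Z0-9_]+/.freeze
  PATT_LETTER_LIT = /[a-zA-Z]((?=\s|,|\))|$)/.freeze
  Lexeme2name = { '(' => 'LPAREN', ')' => 'RPAREN', ',' => 'COMMA' }.freeze
  # Only a handful of keywords, to keep the sketch short.
  Keywords = %w[AS CAPTURE DIGIT LETTER TIMES].map { |x| [x, x] }.to_h.freeze

  module_function

  # Return [terminal_name, lexeme] for the lexeme at the start of +text+,
  # or nil when nothing in this reduced rule set matches.
  def classify(text)
    scanner = StringScanner.new(text)
    if (lexeme = scanner.scan(/[(),]/))
      [Lexeme2name[lexeme], lexeme]
    elsif (lexeme = scanner.scan(PATT_INTEGER))
      ['INTEGER', lexeme]
    elsif (lexeme = scanner.scan(PATT_DIGIT_LIT))
      ['DIGIT_LIT', lexeme]
    elsif (lexeme = scanner.scan(PATT_LETTER_LIT))
      ['LETTER_LIT', lexeme]
    elsif (lexeme = scanner.scan(PATT_IDENTIFIER))
      # The key is upcased first, so keyword lookup ignores case.
      [Keywords.fetch(lexeme.upcase, 'IDENTIFIER'), lexeme]
    end
  end
end

p LexemeDemo.classify('letter from a to z')
p LexemeDemo.classify('a to z')
p LexemeDemo.classify('42 times')
p LexemeDemo.classify('7)')
```

The trailing lookahead in the digit, integer and letter patterns requires the lexeme to be followed by whitespace, a comma, a closing parenthesis or the end of the line, so a run such as `123abc` matches none of them. PATT_IDENTIFIER needs at least two characters, which is why a lone letter is covered by PATT_LETTER_LIT.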

Constructor Details

#initialize(source) ⇒ Tokenizer

Constructor. Initialize a tokenizer for SRL.

Parameters:

  • source (String)

    SRL text to tokenize.



# File 'lib/srl_ruby/tokenizer.rb', line 109

def initialize(source)
  @scanner = StringScanner.new(source)
  @lineno = 1
  @line_start = 0
end

Instance Attribute Details

#line_start ⇒ Integer (readonly)

Returns the offset of the start of the current line within the input.

Returns:

  • (Integer)

    offset of start of current line within input



# File 'lib/srl_ruby/tokenizer.rb', line 37

def line_start
  @line_start
end

#lineno ⇒ Integer (readonly)

Returns the current line number.

Returns:

  • (Integer)

    current line number



# File 'lib/srl_ruby/tokenizer.rb', line 34

def lineno
  @lineno
end

#scanner ⇒ StringScanner (readonly)

Returns:

  • (StringScanner)


# File 'lib/srl_ruby/tokenizer.rb', line 31

def scanner
  @scanner
end
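
How `lineno` and `line_start` are updated is not shown on this page. One plausible scheme, sketched below under that assumption, bumps the line counter and records the scanner position each time PATT_NEWLINE matches, so that a column can be derived as `scanner.pos - line_start + 1`. The class `LineTracker` and its `locate` method are hypothetical names introduced for illustration only.

```ruby
require 'strscan'

# Hypothetical stand-in; only the three attributes and their initial
# values mirror the tokenizer documented here.
class LineTracker
  PATT_NEWLINE = /(?:\r\n)|\r|\n/.freeze # copied from the constant summary

  attr_reader :scanner, :lineno, :line_start

  def initialize(source)
    @scanner = StringScanner.new(source)
    @lineno = 1
    @line_start = 0
  end

  # Advance to the first occurrence of +target+, keeping the line
  # bookkeeping up to date. Returns [line, column] (both 1-based),
  # or nil when the target is absent.
  def locate(target)
    pattern = /#{Regexp.escape(target)}/
    until scanner.eos?
      if scanner.scan(PATT_NEWLINE)
        @lineno += 1
        @line_start = scanner.pos # offset just past the line terminator
      elsif scanner.check(pattern)
        # StringScanner#pos is a byte offset, so the column is byte-based.
        return [lineno, scanner.pos - line_start + 1]
      else
        scanner.getch
      end
    end
    nil
  end
end

p LineTracker.new("begin with\nletter twice").locate('twice')
```

Because PATT_NEWLINE tries `\r\n` before a lone `\r` or `\n`, a Windows-style line ending counts as a single line break in this sketch.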

Instance Method Details

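#tokens ⇒ Object

Scans the input until the end of the source is reached and returns the resulting array of tokens. Calls to _next_token that return nil are skipped.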



# File 'lib/srl_ruby/tokenizer.rb', line 115

def tokens
  tok_sequence = []
  until @scanner.eos?
    token = _next_token
    tok_sequence << token unless token.nil?
  end

  tok_sequence
end
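
The listing above relies on `_next_token`, which is not documented on this page. The sketch below shows why the `unless token.nil?` guard matters, assuming a `_next_token` that skips blanks and returns nil when only blanks remain. `MiniTokenizer`, its `Token` struct and its two token kinds are hypothetical simplifications. Only the body of `tokens` is taken from the listing.

```ruby
require 'strscan'

# Hypothetical miniature tokenizer; not the srl_ruby implementation.
class MiniTokenizer
  Token = Struct.new(:terminal, :lexeme)

  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Same loop as in the listing above.
  def tokens
    tok_sequence = []
    until @scanner.eos?
      token = _next_token
      tok_sequence << token unless token.nil?
    end

    tok_sequence
  end

  private

  # Assumed behavior: skip whitespace, then return one token, or nil
  # when nothing but whitespace was left in the input.
  def _next_token
    @scanner.skip(/\s+/)
    return nil if @scanner.eos?

    if (lexeme = @scanner.scan(/[(),]/))
      Token.new(:punctuation, lexeme)
    elsif (lexeme = @scanner.scan(/[^\s(),]+/))
      Token.new(:word, lexeme)
    end
  end
end

p MiniTokenizer.new('capture (letter) twice  ').tokens.map(&:lexeme)
```

With trailing blanks in the input, the last call to `_next_token` consumes them and returns nil. The guard keeps that nil out of the result and the loop then ends because the scanner has reached the end of the string.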