Class: SrlRuby::Tokenizer

Inherits:
Object
Defined in:
lib/srl_ruby/tokenizer.rb

Overview

A tokenizer for the Simple Regex Language (SRL). Responsibility: break the input SRL text into a sequence of token objects. The tokenizer should recognize:

  • Keywords (for example: as, capture, letter; see Keywords for the full list)
  • Integer literals, including single digits
  • String literals (quote delimited)
  • Single-character literals
  • Delimiters: parentheses '(' and ')'
  • Separators: comma (optional)

Defined Under Namespace

Classes: ScanError

Constant Summary

PATT_CHAR_CLASS =

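Returns a Regexp that matches an SRL character class: a run of two or more characters, none of which is a comma, a double quote, or whitespace.

Returns:

  • (Regexp)

    Matches an SRL character class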

/[^,"\s]{2,}/
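A quick check of what PATT_CHAR_CLASS accepts. The constant is copied locally so the snippet runs on plain Ruby without the srl_ruby gem; the sample inputs are illustrative only.

```ruby
# Copied from the PATT_CHAR_CLASS constant documented above.
PATT_CHAR_CLASS = /[^,"\s]{2,}/

# String#[] with a Regexp returns the first match, or nil when there is none.
p 'a-z'[PATT_CHAR_CLASS]       # => "a-z"
p '0-9, rest'[PATT_CHAR_CLASS] # => "0-9" (the comma ends the run)
p 'x'[PATT_CHAR_CLASS]         # => nil   (one character is too short)
```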
PATT_DIGIT_LIT =

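Returns a Regexp that matches a single digit standing on its own, i.e. followed by whitespace, a comma, a closing parenthesis, or the end of the line.

Returns:

  • (Regexp)

    Matches a single digit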

/[0-9]((?=\s|,|\))|$)/
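The trailing lookahead only makes sense when the pattern is applied at the current scan position, which is how a StringScanner (held by the tokenizer, as the constructor shows) works. The sketch below contrasts anchored scanning with an unanchored search; the helper scan_digit is hypothetical.

```ruby
require 'strscan'

# Copied from the PATT_DIGIT_LIT constant documented above.
PATT_DIGIT_LIT = /[0-9]((?=\s|,|\))|$)/

# Hypothetical helper: StringScanner#scan only matches at the scan position.
def scan_digit(text)
  StringScanner.new(text).scan(PATT_DIGIT_LIT)
end

p scan_digit('5 times') # => "5"
p scan_digit('3)')      # => "3"
p scan_digit('7x')      # => nil (digit not followed by a terminator)
p scan_digit('12')      # => nil (the first digit is followed by another digit)

# An unanchored search behaves differently: it finds the trailing "2".
p '12'[PATT_DIGIT_LIT]  # => "2"
```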
PATT_IDENTIFIER =

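Returns a Regexp that matches an SRL identifier: a letter or underscore followed by one or more letters, digits, or underscores (so an identifier is at least two characters long).

Returns:

  • (Regexp)

    Matches an SRL identifier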

/[a-zA-Z_][a-zA-Z0-9_]+/
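The `+` quantifier means a lone letter is never an identifier. A minimal sketch with a StringScanner, using a locally copied constant and an illustrative input; note that keywords such as `as` match this pattern as well.

```ruby
require 'strscan'

# Copied from the PATT_IDENTIFIER constant documented above.
PATT_IDENTIFIER = /[a-zA-Z_][a-zA-Z0-9_]+/

scanner = StringScanner.new('capture_1 as x')
first = scanner.scan(PATT_IDENTIFIER)  # "capture_1"
scanner.skip(/\s+/)
second = scanner.scan(PATT_IDENTIFIER) # "as" (keywords match too)
scanner.skip(/\s+/)
third = scanner.scan(PATT_IDENTIFIER)  # nil ("x" has only one character)
p [first, second, third] # => ["capture_1", "as", nil]
```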
PATT_INTEGER =

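Returns a Regexp that matches a decimal integer literal. An integer has two or more digits and must be followed by whitespace, a comma, a closing parenthesis, or the end of the line.

Returns:

  • (Regexp)

    Matches a decimal integer with two or more digits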

/[0-9]{2,}((?=\s|,|\))|$)/
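A sketch of the two-or-more-digits rule, applied at the scan position like the digit example. The helper scan_integer is hypothetical and the constant is a local copy.

```ruby
require 'strscan'

# Copied from the PATT_INTEGER constant documented above.
PATT_INTEGER = /[0-9]{2,}((?=\s|,|\))|$)/

# Hypothetical helper: scan at the start of the text, as a StringScanner would.
def scan_integer(text)
  StringScanner.new(text).scan(PATT_INTEGER)
end

p scan_integer('42 times') # => "42"
p scan_integer('100)')     # => "100"
p scan_integer('7 times')  # => nil (a single digit is what PATT_DIGIT_LIT matches)
p scan_integer('12ab')     # => nil (the digits run into letters)
```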
PATT_LETTER_LIT =

Returns a Regexp that matches a single ASCII letter followed by whitespace, a comma, a closing parenthesis, or the end of the line.

Returns:

  • (Regexp)

    Matches a single letter.

/[a-zA-Z]((?=\s|,|\))|$)/
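A sketch showing that PATT_LETTER_LIT accepts a standalone letter (as in a range such as `a to z`) but rejects the first letter of a word. The helper scan_letter is hypothetical and the constant is a local copy.

```ruby
require 'strscan'

# Copied from the PATT_LETTER_LIT constant documented above.
PATT_LETTER_LIT = /[a-zA-Z]((?=\s|,|\))|$)/

# Hypothetical helper: scan at the start of the text.
def scan_letter(text)
  StringScanner.new(text).scan(PATT_LETTER_LIT)
end

p scan_letter('a to z') # => "a"
p scan_letter('Z)')     # => "Z"
p scan_letter('z')      # => "z" (end of input)
p scan_letter('as')     # => nil (a word, not a single letter)
```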
PATT_NEWLINE =

Returns a Regexp that matches one line break, whatever the platform convention: CRLF (\r\n), CR (\r), or LF (\n).

Returns:

  • (Regexp)

    Matches a new line (cross-platform)

/(?:\r\n)|\r|\n/
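Because `\r\n` is listed first in the alternation, a Windows line ending is consumed as one break rather than two. A minimal check with a locally copied constant and illustrative text:

```ruby
# Copied from the PATT_NEWLINE constant documented above.
PATT_NEWLINE = /(?:\r\n)|\r|\n/

text = "one\r\ntwo\rthree\nfour"
breaks = text.scan(PATT_NEWLINE)
p breaks.size              # => 3 ("\r\n" counts as a single break)
p text.split(PATT_NEWLINE) # => ["one", "two", "three", "four"]
```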
PATT_STR_DBL_QUOTE =

Returns a Regexp that matches text enclosed in double quotes. The text may contain backslash-escaped double quotes (\").

Returns:

  • (Regexp)

    Matches text enclosed in double quotes

/"(?:\\"|[^"])*"/
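A sketch of scanning a double-quoted string that contains escaped quotes. The constant is copied locally; the input is illustrative.

```ruby
require 'strscan'

# Copied from the PATT_STR_DBL_QUOTE constant documented above.
PATT_STR_DBL_QUOTE = /"(?:\\"|[^"])*"/

# In a single-quoted Ruby literal, \" stays a backslash followed by a quote.
scanner = StringScanner.new('"say \"hi\"" then')
lexeme = scanner.scan(PATT_STR_DBL_QUOTE)
puts lexeme    # prints: "say \"hi\""
p scanner.rest # => " then"
```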
PATT_STR_SNGL_QUOTE =

Returns a Regexp that matches text enclosed in single quotes. The text may contain backslash-escaped single quotes (\').

Returns:

  • (Regexp)

    Matches text enclosed in single quotes

/'(?:\\'|[^'])*'/
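The single-quote variant behaves the same way. In this sketch the input is written with `%q(...)` so the backslash before the inner quote survives in the Ruby literal; the constant is a local copy.

```ruby
require 'strscan'

# Copied from the PATT_STR_SNGL_QUOTE constant documented above.
PATT_STR_SNGL_QUOTE = /'(?:\\'|[^'])*'/

# %q(...) keeps \' as a backslash followed by a quote.
scanner = StringScanner.new(%q('it\'s' and more))
lexeme = scanner.scan(PATT_STR_SNGL_QUOTE)
puts lexeme    # prints: 'it\'s'
p scanner.rest # => " and more"
```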
PATT_WHITESPACE =

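Returns a Regexp that matches one or more SRL blanks: spaces, tabs, or form feeds. Line breaks are not included (see PATT_NEWLINE).

Returns:

  • (Regexp)

    Matches one or more SRL blanks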

/[ \t\f]+/
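A sketch showing that skipping blanks stops at a line break. StringScanner#skip returns the number of characters skipped, or nil on no match; the constant is a local copy.

```ruby
require 'strscan'

# Copied from the PATT_WHITESPACE constant documented above.
PATT_WHITESPACE = /[ \t\f]+/

scanner = StringScanner.new(" \t letter\nfrom")
skipped = scanner.skip(PATT_WHITESPACE) # 3 (space, tab, space)
word = scanner.scan(/\w+/)              # "letter"
after = scanner.skip(PATT_WHITESPACE)   # nil (a newline is not a blank)
p [skipped, word, after] # => [3, "letter", nil]
```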
Lexeme2name =

Returns the mapping from special single-character lexemes to their symbolic names.

Returns:

  • ({String => String})

    Mapping from special single characters to symbolic names.

{
  '(' => 'LPAREN',
  ')' => 'RPAREN',
  ',' => 'COMMA'
}.freeze
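A minimal sketch of using the mapping to name a delimiter lexeme. The helper delimiter_name is hypothetical; a nil result simply means the character is not one of the three special characters.

```ruby
# Copied from the Lexeme2name constant documented above.
Lexeme2name = {
  '(' => 'LPAREN',
  ')' => 'RPAREN',
  ',' => 'COMMA'
}.freeze

# Hypothetical helper: symbolic name of a single-character lexeme, or nil.
def delimiter_name(char)
  Lexeme2name[char]
end

p delimiter_name('(') # => "LPAREN"
p delimiter_name(',') # => "COMMA"
p delimiter_name('x') # => nil
```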
Keywords =

All SRL keywords, in uppercase. Each keyword is mapped to itself.

Returns:

  • ({String => String})
%w[
  ALL
  ALREADY
  AND
  ANY
  ANYTHING
  AS
  AT
  BACKSLASH
  BEGIN
  BETWEEN
  BY
  CAPTURE
  CARRIAGE
  CASE
  CHARACTER
  DIGIT
  EITHER
  END
  EXACTLY
  FOLLOWED
  FROM
  HAD
  IF
  INSENSITIVE
  LAZY
  LEAST
  LETTER
  LINE
  LITERALLY
  MORE
  MULTI
  MUST
  NEVER
  NEW
  NO
  NONE
  NOT
  NUMBER
  OF
  ONCE
  ONE
  OPTIONAL
  OR
  RAW
  RETURN
  STARTS
  TAB
  TIMES
  TO
  TWICE
  UNTIL
  UPPERCASE
  VERTICAL
  WHITESPACE
  WITH
  WORD
].to_h { |x| [x, x] }
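A sketch of telling keywords from plain identifiers with this table. It assumes keywords are matched case-insensitively, which the uppercase-only table suggests but this page does not state; the classify helper and the abridged keyword list are hypothetical. `Array#to_h` with a block requires Ruby 2.6 or later.

```ruby
# Abridged copy of the Keywords constant above (the real list is longer).
Keywords = %w[AS CAPTURE LETTER FROM TO].to_h { |x| [x, x] }

# Hypothetical helper; assumes case-insensitive keywords (hence the upcase).
def classify(lexeme)
  key = lexeme.upcase
  Keywords.include?(key) ? [:keyword, key] : [:identifier, lexeme]
end

p classify('capture') # => [:keyword, "CAPTURE"]
p classify('Letter')  # => [:keyword, "LETTER"]
p classify('name')    # => [:identifier, "name"]
```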

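Instance Attribute Summary

  • #line_start ⇒ Integer (readonly): offset of the start of the current line within the input.
  • #lineno ⇒ Integer (readonly): current line number.
  • #scanner ⇒ StringScanner (readonly)

Instance Method Summary

  • #initialize(source) ⇒ Tokenizer (constructor): initialize a tokenizer for SRL.
  • #tokens ⇒ Array<Rley::Lexical::Token>: the sequence of tokens recognized from the input text given at initialization.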

Constructor Details

#initialize(source) ⇒ Tokenizer

Constructor. Initialize a tokenizer for SRL.

Parameters:

  • source (String)

    SRL text to tokenize.



# File 'lib/srl_ruby/tokenizer.rb', line 129

def initialize(source)
  @scanner = StringScanner.new(source)
  @lineno = 1
  @line_start = 0
end
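The constructor wraps the source in a StringScanner, which comes from Ruby's strscan standard library, so code using it needs `require 'strscan'`. The class MiniTokenizer below is a hypothetical stand-in that only mirrors the initial state documented here, so it runs without the srl_ruby gem.

```ruby
require 'strscan' # StringScanner lives in Ruby's standard library

# Hypothetical stand-in mirroring the documented constructor and readers.
class MiniTokenizer
  attr_reader :scanner, :lineno, :line_start

  def initialize(source)
    @scanner = StringScanner.new(source)
    @lineno = 1     # lines are counted from 1
    @line_start = 0 # the first line starts at offset 0
  end
end

tok = MiniTokenizer.new('letter from a to z')
p [tok.lineno, tok.line_start, tok.scanner.pos] # => [1, 0, 0]
```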

Instance Attribute Details

#line_start ⇒ Integer (readonly)

Returns the offset of the start of the current line within the input.

Returns:

  • (Integer)

    the offset of the start of the current line within the input



# File 'lib/srl_ruby/tokenizer.rb', line 54

def line_start
  @line_start
end

#lineno ⇒ Integer (readonly)

Returns the current line number (1-based).

Returns:

  • (Integer)

    the current line number



# File 'lib/srl_ruby/tokenizer.rb', line 51

def lineno
  @lineno
end
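This page does not show how lineno and line_start are updated. The following is a hypothetical sketch of one way to keep them in sync with a StringScanner and derive a column number; LineTracker, skip_newline, and column are invented names, not srl_ruby methods. StringScanner#pos is a byte offset, which equals the character index for ASCII input.

```ruby
require 'strscan'

PATT_NEWLINE = /(?:\r\n)|\r|\n/

# Hypothetical position tracker; not taken from srl_ruby.
class LineTracker
  attr_reader :scanner, :lineno, :line_start

  def initialize(source)
    @scanner = StringScanner.new(source)
    @lineno = 1
    @line_start = 0
  end

  # Consumes one line break at the scan position and updates the bookkeeping.
  def skip_newline
    return false unless scanner.skip(PATT_NEWLINE)

    @lineno += 1
    @line_start = scanner.pos # the next line starts right after the break
    true
  end

  # 1-based column of the current scan position.
  def column
    scanner.pos - line_start + 1
  end
end

t = LineTracker.new("any\r\ndigit")
t.scanner.skip(/\w+/) # consume "any"
t.skip_newline        # consume "\r\n"
t.scanner.skip(/\w/)  # consume "d"
p [t.lineno, t.line_start, t.column] # => [2, 5, 2]
```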

#scanner ⇒ StringScanner (readonly)

Returns the StringScanner that walks through the source text given to the constructor.

Returns:

  • (StringScanner)


# File 'lib/srl_ruby/tokenizer.rb', line 48

def scanner
  @scanner
end

Instance Method Details

#tokens ⇒ Array<Rley::Lexical::Token>

Returns the sequence of tokens recognized from the input text given at initialization.

Returns:

  • (Array<Rley::Lexical::Token>)


# File 'lib/srl_ruby/tokenizer.rb', line 137

def tokens
  tok_sequence = []
  until @scanner.eos?
    token = _next_token
    tok_sequence << token unless token.nil?
  end

  tok_sequence
end
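The #tokens loop delegates all recognition to a private _next_token method that this page does not document. Below is a hypothetical, simplified tokenizer built from the patterns listed above to show how such a loop can work. Assumptions: a plain Struct replaces Rley::Lexical::Token, the keyword list is abridged, keywords are matched case-insensitively, terminal names such as LETTER_LIT and STRING_LIT are invented, ScanError's superclass is guessed, and the order of the checks is a guess rather than srl_ruby's actual implementation.

```ruby
require 'strscan'

# Stand-in for Rley::Lexical::Token (assumption: lexeme + terminal name only).
Token = Struct.new(:lexeme, :terminal)

# Hypothetical, simplified tokenizer reusing the documented patterns.
class MiniSrlTokenizer
  class ScanError < StandardError; end # superclass is an assumption

  PATT_WHITESPACE = /[ \t\f]+/
  PATT_NEWLINE = /(?:\r\n)|\r|\n/
  PATT_DIGIT_LIT = /[0-9]((?=\s|,|\))|$)/
  PATT_INTEGER = /[0-9]{2,}((?=\s|,|\))|$)/
  PATT_LETTER_LIT = /[a-zA-Z]((?=\s|,|\))|$)/
  PATT_IDENTIFIER = /[a-zA-Z_][a-zA-Z0-9_]+/
  PATT_STR_DBL_QUOTE = /"(?:\\"|[^"])*"/
  Lexeme2name = { '(' => 'LPAREN', ')' => 'RPAREN', ',' => 'COMMA' }.freeze
  Keywords = %w[AS CAPTURE LETTER FROM TO].to_h { |x| [x, x] } # abridged

  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Same loop shape as the documented #tokens method.
  def tokens
    tok_sequence = []
    until @scanner.eos?
      token = _next_token
      tok_sequence << token unless token.nil?
    end
    tok_sequence
  end

  private

  # Returns the next token, or nil when only a blank run or line break was consumed.
  def _next_token
    return nil if @scanner.skip(PATT_WHITESPACE) || @scanner.skip(PATT_NEWLINE)

    ch = @scanner.peek(1)
    if Lexeme2name.key?(ch)
      Token.new(@scanner.getch, Lexeme2name[ch])
    elsif (lexeme = @scanner.scan(PATT_INTEGER))
      Token.new(lexeme, 'INTEGER')
    elsif (lexeme = @scanner.scan(PATT_DIGIT_LIT))
      Token.new(lexeme, 'DIGIT_LIT')
    elsif (lexeme = @scanner.scan(PATT_LETTER_LIT))
      Token.new(lexeme, 'LETTER_LIT')
    elsif (lexeme = @scanner.scan(PATT_STR_DBL_QUOTE))
      Token.new(lexeme, 'STRING_LIT')
    elsif (lexeme = @scanner.scan(PATT_IDENTIFIER))
      Token.new(lexeme, Keywords[lexeme.upcase] || 'IDENTIFIER')
    else
      raise ScanError, "Unexpected character #{ch.inspect} at offset #{@scanner.pos}"
    end
  end
end

toks = MiniSrlTokenizer.new('capture (letter from a to z) as "name"').tokens
p toks.map(&:terminal)
# => ["CAPTURE", "LPAREN", "LETTER", "FROM", "LETTER_LIT", "TO", "LETTER_LIT", "RPAREN", "AS", "STRING_LIT"]
```

Skipped blanks and line breaks yield nil, which is why the loop filters out nil tokens instead of emitting them.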