Class: StructuredSearch::Lexer

Inherits:
Object
Defined in:
lib/structured_search/lexer.rb,
lib/structured_search/patterns.rb

Overview

Converts the input into a token stream that the syntax parser can consume.

Constant Summary

RESERVED =

SQL reserved words list

%w{
SELECT ALL DISTINCT FROM WHERE ASC DESC
}
PATTERNS =

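Ordered list of patterns. Each entry is a token key, a regex and an optional lambda that extracts the lexeme from the match. Every regex is anchored with \G so that it only matches at the current lexer offset.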

[

  [:WHITESPACE, /[\r\v\f\t ]+/],
  [:TERMINATOR, /[\r\n]/],

  # intern reserved words and their patterns
  *RESERVED.map { |rw| [rw.intern, /#{rw}(?=[^A-Za-z0-9_])/] },
  
  # match single / double quoted strings
  [:STRING, /(['"])(\\n|\\.|((?!\1)(?!\\)|.)*?((?!\1)(?!\\).)?)\1/, -> match { match[2] } ],
  [:L_PAREN,    /\(/],
  [:R_PAREN,    /\)/],
  [:L_BRACKET,  /\[/],
  [:R_BRACKET,  /\]/],
  [:L_BRACE,    /\{/],
  [:R_BRACE,    /\}/],
  [:PERCENT,    /%/ ],
  [:AMPERSAND,  /&/ ],
  [:ASTERISK,   /\*/],
  [:PLUS,       /\+/],
  [:MINUS,      /-/],
  [:COMMA,      /,/ ],
  [:PERIOD,     /\./],
  [:COLON,      /:/ ],
  [:SEMICOLON,  /;/ ],
  [:LEQ,        /<=/],
  [:EQUALS,     /=/ ],
  [:GEQ,        />=/],
  [:QUESTION,   /\?/],
  [:CIRCUMFLEX, /\^/],
  [:UNDERSCORE, /_/ ],
  [:PIPE,       /\|/]
].map { |pattern| [pattern[0], /\G#{pattern[1]}/m, pattern[2]] }
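
The anchoring done by the final map can be illustrated with a minimal sketch. The two-entry `table`, the `:NUMBER` token and the `match_at` helper are assumptions made for this example and are not part of the library; only the entry shape (key, regex, optional lambda) and the `\G` wrapping follow the listing above.

```ruby
# Hypothetical two-entry table in the same [key, regex, optional lambda] shape.
table = [
  [:NUMBER, /\d+/, ->(m) { m[0] }],
  [:PLUS,   /\+/]
].map { |key, regex, lexeme| [key, /\G#{regex}/m, lexeme] }

# Hypothetical helper: returns [key, lexeme, consumed length] for the first
# pattern that matches exactly at +offset+, or nil if none does.
def match_at(table, input, offset)
  table.each do |key, regex, lexeme|
    m = regex.match(input, offset)
    return [key, lexeme ? lexeme.call(m) : '', m[0].size] if m
  end
  nil
end

p match_at(table, "12+3", 0) # NUMBER matches "12" at offset 0
p match_at(table, "12+3", 2) # \G stops NUMBER from skipping ahead to "3"
p match_at(table, "x12", 0)  # nothing matches at offset 0, so nil
```

Because `\G` pins each regex to the offset passed to `Regexp#match`, a pattern can never skip over unrecognised input, which is what lets the lexer detect an unexpected character.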

Constructor Details

#initialize(input) ⇒ Lexer

Creates a new instance of the Lexer. Params:

input

The SQL input that will be parsed.



# File 'lib/structured_search/lexer.rb', line 32

def initialize(input)
  @input = input
  @column = 1
  @line = 1
  @lexer_offset = 0
end

Instance Attribute Details

#column ⇒ Object

Current column position within the current line (starts at 1).



# File 'lib/structured_search/lexer.rb', line 14

def column
  @column
end

#input ⇒ Object

Input string to parse.



# File 'lib/structured_search/lexer.rb', line 14

def input
  @input
end

#lexer_offset ⇒ Object

Current character position in the input string (starts at 0).



# File 'lib/structured_search/lexer.rb', line 14

def lexer_offset
  @lexer_offset
end

#line ⇒ Object

Current line number in the input (starts at 1).



# File 'lib/structured_search/lexer.rb', line 14

def line
  @line
end

Instance Method Details

#scan(is_peek = false) ⇒ Object

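Scans the input from the current offset and returns the first token that matches. Supports both read and peek operations, selected by the is_peek flag. Whitespace is skipped and never returned. Params:

is_peek

If true, the lexer returns the next token but remains in its current position; if false (the default), the token is consumed.

Returns:

token

A StructuredSearch::Token is returned to the caller, or nil once the input is exhausted. A LexicalError is raised if no pattern matches the remaining input.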



# File 'lib/structured_search/lexer.rb', line 47

def scan(is_peek = false)
  PATTERNS.each do |pattern|
    match = pattern[1].match(@input, @lexer_offset)
    if match
      token_data = { token:  pattern[0],
                     lexeme: pattern[2] ? pattern[2].call(match) : '',
                     line: @line, column: @column }
      token = Token.new(token_data)

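      # increment line and col position if a read op:
      if !is_peek
        tok_length = match[0].size
        newline_count = match[0].count("\n")
        @lexer_offset += tok_length
        @line += newline_count
        if newline_count > 0
          # 0 is truthy in Ruby, so test the count explicitly; the column
          # restarts just after the last newline in the match
          @column = tok_length - match[0].rindex("\n")
        else
          @column += tok_length
        end
      end

      # clear any whitespace
      if pattern[0] == :WHITESPACE
        # a read op has already advanced past the whitespace above; a peek
        # has not, so step over it here before rescanning
        if is_peek
          @lexer_offset += match[0].size
          @column += match[0].size
        end
        return scan(is_peek)
      else
        return token
      end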

    end
  end

  # have we underrun the input due to lex error?:
  if @lexer_offset < @input.size
    raise LexicalError, "Unexpected character \"#{@input[@lexer_offset]}\" at (Line #{@line}, Column #{@column})"
  end

  nil
end
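
The difference between a peek and a read can be seen in a small self-contained sketch. `MiniToken`, `MiniLexer`, its four patterns and the `advance` helper are hypothetical stand-ins written for this example; they mimic the shape of #scan under simplified assumptions and are not the library's implementation.

```ruby
# Hypothetical stand-ins: MiniToken and MiniLexer are not part of the library.
MiniToken = Struct.new(:token, :lexeme, :line, :column)

class MiniLexer
  PATTERNS = [
    [:WHITESPACE, /\G[ \t]+/],
    [:TERMINATOR, /\G\n/],
    [:WORD,       /\G[A-Za-z]+/, ->(m) { m[0] }],
    [:ASTERISK,   /\G\*/]
  ].freeze

  def initialize(input)
    @input = input
    @offset = 0
    @line = 1
    @column = 1
  end

  # Returns the next token; consumes it unless is_peek is true.
  def scan(is_peek = false)
    PATTERNS.each do |key, regex, lexeme|
      m = regex.match(@input, @offset)
      next unless m

      if key == :WHITESPACE
        # Whitespace is skipped for good, even while peeking.
        @offset += m[0].size
        @column += m[0].size
        return scan(is_peek)
      end

      token = MiniToken.new(key, lexeme ? lexeme.call(m) : '', @line, @column)
      advance(m[0]) unless is_peek
      return token
    end
    raise ArgumentError, "Unexpected character #{@input[@offset].inspect}" if @offset < @input.size
    nil
  end

  private

  # Moves the offset past +text+ and updates line and column.
  def advance(text)
    @offset += text.size
    if text.include?("\n")
      @line += text.count("\n")
      @column = text.size - text.rindex("\n")
    else
      @column += text.size
    end
  end
end

lexer = MiniLexer.new("select *\nfrom")
peeked = lexer.scan(true) # look ahead without consuming
read   = lexer.scan       # the same token, now consumed
p peeked == read
```

A parser would typically peek to decide which rule applies and then read to commit to it; the sketch keeps both paths in one method, as the listing above does.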

#state ⇒ Object

Returns the current state of the lexer as a hash with the keys :input, :column and :line. The lexer_offset is not part of the hash.



# File 'lib/structured_search/lexer.rb', line 18

def state
  { input: input, column: column, line: line }
end

#state=(state) ⇒ Object

Sets the state of the lexer from a hash with the keys :input, :column and :line, the same shape that #state returns.



# File 'lib/structured_search/lexer.rb', line 23

def state=(state)
  @input = state[:input]
  @column = state[:column]
  @line = state[:line]
end
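
One plausible use of the #state and #state= pair is to snapshot the lexer before a speculative parse and restore it afterwards. The sketch below assumes that use; the `Cursor` class and its `advance` method are hypothetical and only the two state methods mirror the listings above. Since the hash in the listings does not carry lexer_offset, restoring it would not by itself rewind the offset.

```ruby
# Hypothetical Cursor class; only state / state= follow the documented shape.
class Cursor
  attr_reader :input, :column, :line

  def initialize(input)
    @input = input
    @column = 1
    @line = 1
  end

  # Hypothetical helper that moves the column forward by n characters.
  def advance(n)
    @column += n
  end

  def state
    { input: input, column: column, line: line }
  end

  def state=(state)
    @input = state[:input]
    @column = state[:column]
    @line = state[:line]
  end
end

cursor = Cursor.new("SELECT *")
saved = cursor.state  # snapshot before a speculative step
cursor.advance(6)
cursor.state = saved  # roll back to the snapshot
p cursor.column
```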