Class: StructuredSearch::Lexer
- Inherits: Object
- Defined in: lib/structured_search/lexer.rb, lib/structured_search/patterns.rb
Overview
Converts the input into a token stream that the syntax parser can consume.
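To illustrate what "token stream" means here, the following is a minimal sketch of the same table-driven idea built on Ruby's standard StringScanner. This class does not use StringScanner; it matches \G-anchored regexes at an explicit offset instead. DEMO_PATTERNS and tokenize are hypothetical names, the five-entry table is a cut-down assumption rather than the real PATTERNS, and ArgumentError stands in for the gem's LexicalError.

```ruby
require 'strscan'

# Hypothetical, cut-down table in the same [key, regex] shape as PATTERNS.
DEMO_PATTERNS = [
  [:WHITESPACE, /[\r\v\f\t ]+/],
  [:SELECT,     /SELECT(?![A-Za-z0-9_])/],
  [:FROM,       /FROM(?![A-Za-z0-9_])/],
  [:ASTERISK,   /\*/],
  [:COMMA,      /,/]
].freeze

# Turns the whole input into [token_key, text] pairs, dropping whitespace.
# The first entry whose regex matches at the scanner position wins.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    key, = DEMO_PATTERNS.find { |_, regex| scanner.scan(regex) }
    raise ArgumentError, "unexpected #{scanner.peek(1).inspect} at #{scanner.pos}" unless key

    tokens << [key, scanner.matched] unless key == :WHITESPACE
  end
  tokens
end

tokenize("SELECT * FROM") # => [[:SELECT, "SELECT"], [:ASTERISK, "*"], [:FROM, "FROM"]]
```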
Constant Summary
- RESERVED =
SQL reserved words list
%w{ SELECT ALL DISTINCT FROM WHERE ASC DESC }
- PATTERNS =
Ordered list of token patterns. Each entry holds a token key, its regex and, optionally, a lambda that extracts the lexeme from the match. Entries are tried in order and the first one that matches at the current offset wins.
[
  [:WHITESPACE, /[\r\v\f\t ]+/],
  [:TERMINATOR, /[\r\n]/],
  # intern reserved words; the lookahead stops a keyword matching the start of a longer word
  *RESERVED.map { |rw| [rw.intern, /#{rw}(?![A-Za-z0-9_])/] },
  # single / double quoted strings: backslash escapes are skipped, the lexeme is the text between the quotes
  [:STRING, /(['"])((?:\\.|(?!\1)[^\\])*)\1/, -> match { match[2] }],
  [:L_PAREN, /\(/],   [:R_PAREN, /\)/],
  [:L_BRACKET, /\[/], [:R_BRACKET, /\]/],
  [:L_BRACE, /\{/],   [:R_BRACE, /\}/],
  [:PERCENT, /%/],    [:AMPERSAND, /&/],
  [:ASTERISK, /\*/],  [:PLUS, /\+/],
  [:MINUS, /-/],      [:COMMA, /,/],
  [:PERIOD, /\./],    [:COLON, /:/],
  [:SEMICOLON, /;/],  [:LEQ, /<=/],
  [:EQUALS, /=/],     [:GEQ, />=/],
  [:QUESTION, /\?/],  [:CIRCUMFLEX, /\^/],
  [:UNDERSCORE, /_/], [:PIPE, /\|/]
].map { |pattern| [pattern[0], /\G#{pattern[1]}/m, pattern[2]] } # \G: match only at the current offset
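To show how such a table is consulted, here is a sketch that compiles a few entries the same way (wrapping each regex as /\G#{regex}/m) and returns the first one that matches exactly at a given offset. TABLE and first_match are hypothetical names; the STRING and WHERE regexes are written to skip backslash escapes and to allow a keyword at the very end of the input. Unlike the lexer, which stores '' as the lexeme for entries without an extractor, the sketch returns the matched text.

```ruby
# Hypothetical subset of a pattern table, compiled like PATTERNS:
# each regex is wrapped in \G so it only matches at the given offset.
TABLE = [
  [:LEQ,    /<=/],
  [:EQUALS, /=/],
  [:STRING, /(['"])((?:\\.|(?!\1)[^\\])*)\1/, ->(match) { match[2] }],
  [:WHERE,  /WHERE(?![A-Za-z0-9_])/]
].map { |key, regex, extract| [key, /\G#{regex}/m, extract] }

# Hypothetical helper: [token_key, lexeme] for the first entry that
# matches exactly at `offset`, or nil if none does.
def first_match(input, offset)
  TABLE.each do |key, regex, extract|
    match = regex.match(input, offset)
    return [key, extract ? extract.call(match) : match[0]] if match
  end
  nil
end

first_match("a <= 1", 2)         # => [:LEQ, "<="]
first_match("a <= 1", 3)         # => [:EQUALS, "="]
first_match("a <= 1", 0)         # => nil ("<=" is not at offset 0)
first_match(%q{"say \"hi\""}, 0) # => [:STRING, 'say \"hi\"'] (escapes kept as written)
first_match("WHEREVER", 0)       # => nil (keyword lookahead fails)
```

Because of \G, the call at offset 0 returns nil even though "<=" occurs later in the string; without the anchor, Regexp#match(str, pos) would search forward and find it.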
Instance Attribute Summary
- #column ⇒ Object
Current column position (1-based) within the current line.
- #input ⇒ Object
Input string to parse.
- #lexer_offset ⇒ Object
Current character position (0-based) in the input string.
- #line ⇒ Object
Current line number (1-based).
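The position attributes are related: once the lexer has consumed input[0...lexer_offset], line and column should point at the character at lexer_offset. The helper below is hypothetical (not part of the gem) and computes that pair directly from the offset, which can be handy for checking the incremental bookkeeping that #scan performs.

```ruby
# Hypothetical helper: the 1-based [line, column] of the character at the
# 0-based `offset`, i.e. where #line and #column should point once
# input[0...offset] has been consumed.
def position_at(input, offset)
  consumed = input[0, offset]
  line = consumed.count("\n") + 1
  last_newline = consumed.rindex("\n")
  # columns restart after the most recent newline
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

sql = "SELECT *\nFROM"
position_at(sql, 0)  # => [1, 1]
position_at(sql, 7)  # => [1, 8]  ("*")
position_at(sql, 9)  # => [2, 1]  ("F")
position_at(sql, 13) # => [2, 5]  (end of input)
```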
Instance Method Summary
- #initialize(input) ⇒ Lexer (constructor)
Creates a new instance of the Lexer.
- #scan(is_peek = false) ⇒ Object
Matches the next token at the current position in the input and returns it.
- #state ⇒ Object
Returns the current state of the lexer: its input, current line and column.
- #state=(state) ⇒ Object
Sets the state of the lexer.
Constructor Details
#initialize(input) ⇒ Lexer
Creates a new instance of the Lexer.
Params:
- input: The SQL input that will be parsed.
# File 'lib/structured_search/lexer.rb', line 32
def initialize(input)
  @input = input
  @column = 1
  @line = 1
  @lexer_offset = 0
end
Instance Attribute Details
#column ⇒ Object
Current column position (1-based) within the current line.
# File 'lib/structured_search/lexer.rb', line 14
def column
  @column
end
#input ⇒ Object
Input string to parse.
# File 'lib/structured_search/lexer.rb', line 14
def input
  @input
end
#lexer_offset ⇒ Object
Current character position (0-based) in the input string.
# File 'lib/structured_search/lexer.rb', line 14
def lexer_offset
  @lexer_offset
end
#line ⇒ Object
Current line number (1-based).
# File 'lib/structured_search/lexer.rb', line 14
def line
  @line
end
Instance Method Details
#scan(is_peek = false) ⇒ Object
Matches the next token at the current position in the input and returns it. Supports both read and peek operations, selected by the peek flag.
Params:
- is_peek: Whether the lexer consumes the token or stays at its current position (false by default). Whitespace is skipped in either case.
Returns:
- token: A StructuredSearch::Token, or nil once the end of the input is reached. Raises LexicalError if no pattern matches before the end of the input.
# File 'lib/structured_search/lexer.rb', line 47
def scan(is_peek = false)
  PATTERNS.each do |pattern|
    match = pattern[1].match(@input, @lexer_offset)
    next unless match

    token_data = { token: pattern[0],
                   lexeme: pattern[2] ? pattern[2].call(match) : '',
                   line: @line, column: @column }
    token = Token.new(token_data)

    # advance past the match on a read op; whitespace is always consumed
    if !is_peek || pattern[0] == :WHITESPACE
      tok_length = match[0].size
      newline_count = match[0].count("\n")
      @lexer_offset += tok_length
      @line += newline_count
      if newline_count > 0
        # the column restarts after the last newline in the match
        @column = tok_length - match[0].rindex("\n")
      else
        @column += tok_length
      end
    end

    # skip whitespace and scan again for the next real token
    return pattern[0] == :WHITESPACE ? scan(is_peek) : token
  end

  # have we stopped short of the end of the input due to a lex error?
  if @lexer_offset < @input.size
    raise LexicalError, "Unexpected character \"#{@input[@lexer_offset]}\" at (Line #{@line}, Column #{@column})"
  end

  nil
end
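A self-contained sketch of the read/peek contract, assuming a three-entry pattern table and plain hashes in place of StructuredSearch::Token (PeekDemoLexer and advance are hypothetical names). A peek returns the next token without moving lexer_offset, the following read returns the same token and advances, and whitespace is consumed in both modes.

```ruby
# Hypothetical stand-in mirroring the read/peek contract of
# StructuredSearch::Lexer#scan; tokens are plain hashes, not Token objects.
class PeekDemoLexer
  # cut-down pattern table, compiled with \G like PATTERNS
  PATTERNS = [
    [:WHITESPACE, /[\r\v\f\t ]+/],
    [:TERMINATOR, /[\r\n]/],
    [:ASTERISK,   /\*/]
  ].map { |key, regex| [key, /\G#{regex}/m] }

  attr_reader :line, :column, :lexer_offset

  def initialize(input)
    @input = input
    @line = 1
    @column = 1
    @lexer_offset = 0
  end

  def scan(is_peek = false)
    PATTERNS.each do |key, regex|
      match = regex.match(@input, @lexer_offset)
      next unless match

      token = { token: key, line: @line, column: @column }
      # whitespace is always consumed; real tokens only on a read
      advance(match[0]) if key == :WHITESPACE || !is_peek
      return key == :WHITESPACE ? scan(is_peek) : token
    end
    nil # nothing matched (the real lexer raises LexicalError unless input is exhausted)
  end

  private

  # moves lexer_offset, line and column past the matched text
  def advance(text)
    newlines = text.count("\n")
    @lexer_offset += text.size
    @line += newlines
    @column = newlines > 0 ? text.size - text.rindex("\n") : @column + text.size
  end
end

lexer = PeekDemoLexer.new("*  *\n*")
lexer.scan(true) # => {token: :ASTERISK, line: 1, column: 1}; offset stays 0
lexer.scan       # => the same token; offset is now 1
lexer.scan       # => {token: :ASTERISK, line: 1, column: 4} (two spaces skipped)
lexer.scan       # => {token: :TERMINATOR, line: 1, column: 5}
lexer.scan       # => {token: :ASTERISK, line: 2, column: 1}
lexer.scan       # => nil
```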
#state ⇒ Object
Returns the current state of the lexer as a hash of its input, current line and current column.
# File 'lib/structured_search/lexer.rb', line 18
def state
  { input: input, column: column, line: line }
end
#state=(state) ⇒ Object
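Sets the state of the lexer from a hash with :input, :column and :line keys, as returned by #state. The lexer offset is not changed.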
# File 'lib/structured_search/lexer.rb', line 23
def state=(state)
  @input = state[:input]
  @column = state[:column]
  @line = state[:line]
end
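A sketch of the state round-trip, using a hypothetical stand-in class (StateDemo) whose #state and #state= bodies follow the listings for this class. The hash carries only input, line and column, so lexer_offset is neither saved nor restored by this pair.

```ruby
# Hypothetical stand-in with the same attributes and state accessors.
class StateDemo
  attr_reader :input, :column, :line, :lexer_offset

  def initialize(input)
    @input = input
    @column = 1
    @line = 1
    @lexer_offset = 0
  end

  # snapshot of input, column and line (no offset)
  def state
    { input: input, column: column, line: line }
  end

  # restores input, column and line from a snapshot
  def state=(state)
    @input = state[:input]
    @column = state[:column]
    @line = state[:line]
  end
end

demo = StateDemo.new("SELECT *")
saved = demo.state # => {input: "SELECT *", column: 1, line: 1}
demo.state = { input: "FROM", column: 4, line: 3 }
demo.line          # => 3
demo.state = saved
demo.state == saved # => true
```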