Class: SQLTree::Tokenizer
- Inherits:
- Object
- Includes:
- Enumerable
- Defined in:
- lib/sql_tree/tokenizer.rb
Overview
The SQLTree::Tokenizer class transforms a string or stream of characters into an enumeration of tokens that are more appropriate for the SQL parser to work with.
An example:
>> SQLTree::Tokenizer.new.tokenize('SELECT * FROM table')
=> [:select, :all, :from, Variable('table')]
The tokenize method returns an array of tokens, while the each_token method (aliased to each) yields every token one by one.
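The relationship between tokenize, each_token and each follows the usual Enumerable pattern. The sketch below illustrates that pattern with a hypothetical MiniTokenizer class that only splits on whitespace; it is not the SQLTree implementation and its token values are made up for the example.

```ruby
# Hypothetical stand-in for illustration; this is NOT SQLTree::Tokenizer.
class MiniTokenizer
  include Enumerable

  # Stores the input and collects every yielded token into an array
  # through Enumerable#entries, which relies on #each.
  def tokenize(string)
    @string = string
    entries
  end

  # Yields each whitespace-separated word as a lowercase symbol.
  def each_token
    @string.split.each { |word| yield word.downcase.to_sym }
  end
  alias each each_token
end

tokens = MiniTokenizer.new.tokenize('SELECT name FROM users')
```

Because each is an alias for each_token, every Enumerable method (entries, map, select and so on) works on the tokenizer once a string has been set.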
Constant Summary
- OPERATOR_CHARS =
A regular expression that matches all operator characters.
/\=|<|>|!|\-|\+|\/|\*|\%/
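As a quick check of what this expression accepts, the snippet below copies the constant and tests a few single characters against it. The sample characters are chosen for illustration only.

```ruby
# Copy of the constant shown above, so that the snippet runs on its own.
OPERATOR_CHARS = /\=|<|>|!|\-|\+|\/|\*|\%/

# Every operator character listed in the expression should match.
all_operators_match = %w[= < > ! - + / * %].all? { |c| OPERATOR_CHARS.match?(c) }

# Letters, digits, parentheses, commas and quotes should not match.
no_others_match = ['a', '1', '(', ',', "'"].none? { |c| OPERATOR_CHARS.match?(c) }
```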
Instance Attribute Summary
-
#keyword_queue ⇒ Object
readonly
The keyword queue, on which keywords are placed before they are yielded to the parser, to enable keyword combining (e.g. NOT LIKE).
Instance Method Summary
-
#current_char ⇒ Object
Returns the current character that is being tokenized.
-
#each_token(&block) ⇒ Object
(also: #each)
Iterator method that yields each token that is encountered in the SQL stream.
-
#empty_keyword_queue!(&block) ⇒ Object
This method ensures that every keyword currently in the queue is yielded.
-
#handle_token(token, &block) ⇒ Object
Combines several tokens into a single token if possible and yields the result, or yields every single token if they cannot be combined.
-
#initialize ⇒ Tokenizer
constructor
Creates a tokenizer with an empty keyword queue.
-
#next_char ⇒ Object
Returns the next character to tokenize, and moves the pointer of the current character one position forward.
-
#peek_char(lookahead = 1) ⇒ Object
Returns the next character to tokenize, but does not move the pointer of the current character forward.
-
#tokenize(string) ⇒ Object
Returns an array of tokens for the given string.
-
#tokenize_keyword(&block) ⇒ Object
Tokenizes a keyword in the code.
-
#tokenize_number(&block) ⇒ Object
Tokenizes a number (either an integer or float) in the SQL stream.
-
#tokenize_operator(&block) ⇒ Object
Tokenizes an operator in the SQL stream.
-
#tokenize_quoted_string(&block) ⇒ Object
Reads a quoted string token from the SQL stream.
-
#tokenize_quoted_variable(&block) ⇒ Object
Tokenize a quoted variable from the SQL stream.
Constructor Details
#initialize ⇒ Tokenizer
Creates a tokenizer with an empty keyword queue.

# File 'lib/sql_tree/tokenizer.rb', line 21

def initialize # :nodoc:
  @keyword_queue = []
end
Instance Attribute Details
#keyword_queue ⇒ Object (readonly)
The keyword queue, on which keywords are placed before they are yielded to the parser, to enable keyword combining (e.g. NOT LIKE).

# File 'lib/sql_tree/tokenizer.rb', line 19

def keyword_queue
  @keyword_queue
end
Instance Method Details
#current_char ⇒ Object
Returns the current character that is being tokenized.

# File 'lib/sql_tree/tokenizer.rb', line 34

def current_char
  @current_char
end
#each_token(&block) ⇒ Object Also known as: each
Iterator method that yields each token that is encountered in the SQL stream. These tokens are passed to the SQL parser to construct a syntax tree for the SQL query.
This method is aliased to :each so that the methods provided by Enumerable work on the tokenizer.

# File 'lib/sql_tree/tokenizer.rb', line 81

def each_token(&block) # :yields: SQLTree::Token
  while next_char
    case current_char
    when /^\s?$/; # whitespace, go to next character
    when '('; handle_token(SQLTree::Token::LPAREN, &block)
    when ')'; handle_token(SQLTree::Token::RPAREN, &block)
    when '.'; handle_token(SQLTree::Token::DOT, &block)
    when ','; handle_token(SQLTree::Token::COMMA, &block)
    when /\d/; tokenize_number(&block)
    when "'"; tokenize_quoted_string(&block)
    when OPERATOR_CHARS; tokenize_operator(&block)
    when /\w/; tokenize_keyword(&block)
    when '"'; tokenize_quoted_variable(&block) # TODO: allow MySQL quoting mode
    end
  end

  # Make sure to yield any tokens that are still stashed on the queue.
  empty_keyword_queue!(&block)
end
#empty_keyword_queue!(&block) ⇒ Object
This method ensures that every keyword currently in the queue is yielded. It gets called by handle_token when it knows for sure that the keywords on the queue cannot be combined into a single keyword.
block: the block to yield the tokens on the queue to.

# File 'lib/sql_tree/tokenizer.rb', line 71

def empty_keyword_queue!(&block) # :yields: SQLTree::Token
  block.call(@keyword_queue.shift) until @keyword_queue.empty?
end
#handle_token(token, &block) ⇒ Object
Combines several tokens into a single token if possible and yields the result, or yields every single token if they cannot be combined.
token: the token to yield or combine.
block: the block to yield tokens and combined tokens to.

# File 'lib/sql_tree/tokenizer.rb', line 57

def handle_token(token, &block) # :yields: SQLTree::Token
  if token.kind_of?(SQLTree::Token::Keyword)
    keyword_queue.push(token)
  else
    empty_keyword_queue!(&block)
    block.call(token)
  end
end
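The deferral of keywords can be seen in isolation with a small sketch. The Keyword and Other structs and the handle lambda below are hypothetical stand-ins, not SQLTree classes; the branch structure follows the handle_token listing above.

```ruby
# Hypothetical token types for illustration only.
Keyword = Struct.new(:name)
Other   = Struct.new(:value)

queue  = [] # plays the role of keyword_queue
output = [] # collects what would be yielded to the parser

handle = lambda do |token|
  if token.is_a?(Keyword)
    queue.push(token)                        # defer: a later keyword may combine with it
  else
    output << queue.shift until queue.empty? # flush pending keywords in order
    output << token
  end
end

handle.call(Keyword.new(:not))
handle.call(Keyword.new(:like))
queued_before_flush = queue.size # both keywords are still waiting here
handle.call(Other.new('abc%'))
```

Nothing reaches output until the first non-keyword token arrives, which is what gives the tokenizer a chance to look at adjacent keywords such as NOT LIKE together.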
#next_char ⇒ Object
Returns the next character to tokenize, and moves the pointer of the current character one position forward.
# File 'lib/sql_tree/tokenizer.rb', line 47

def next_char
  @current_char_pos += 1
  @current_char = @string[@current_char_pos, 1]
end
#peek_char(lookahead = 1) ⇒ Object
Returns the next character to tokenize, but does not move the pointer of the current character forward.
lookahead: how many positions forward to peek.

# File 'lib/sql_tree/tokenizer.rb', line 41

def peek_char(lookahead = 1)
  @string[@current_char_pos + lookahead, 1]
end
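Both next_char and peek_char rely on String#[] with a start index and a length of 1. The sketch below uses plain local variables (string and pos are illustrative names, not the tokenizer's instance variables) to show what that call returns inside the string, at its end, and past its end.

```ruby
string = 'ab'
pos = -1 # same starting position that tokenize uses

pos += 1
current  = string[pos, 1]     # character at the new position
peeked   = string[pos + 1, 1] # one position ahead; pos itself is unchanged
at_end   = string[2, 1]       # index equal to the length gives an empty string
past_end = string[3, 1]       # index beyond the length gives nil
```

Since an empty string is truthy in Ruby, a loop of the form `while next_char` runs one extra iteration with "" before it sees nil, which appears to be why the whitespace branch in each_token uses the pattern /^\s?$/.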
#tokenize(string) ⇒ Object
Returns an array of tokens for the given string.
string: the string to tokenize.

# File 'lib/sql_tree/tokenizer.rb', line 27

def tokenize(string)
  @string = string
  @current_char_pos = -1
  self.entries
end
#tokenize_keyword(&block) ⇒ Object
Tokenizes a keyword in the code. This can either be a reserved SQL keyword or a variable. Variables are passed on right away, whereas keywords are yielded with a delay, because they may need to be combined with other keywords in the handle_token method.

# File 'lib/sql_tree/tokenizer.rb', line 107

def tokenize_keyword(&block) # :yields: SQLTree::Token
  literal = current_char
  literal << next_char while /[\w]/ =~ peek_char

  if SQLTree::Token::KEYWORDS.include?(literal.upcase)
    handle_token(SQLTree::Token.const_get(literal.upcase), &block)
  else
    handle_token(SQLTree::Token::Variable.new(literal), &block)
  end
end
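The keyword-or-variable decision can be sketched without the rest of the tokenizer. The KEYWORDS list below is a hypothetical subset chosen for the example (the real SQLTree::Token::KEYWORDS may differ), and scan_word is an illustrative helper that returns plain arrays instead of token objects.

```ruby
# Hypothetical subset of reserved words, for illustration only.
KEYWORDS = %w[SELECT FROM WHERE NOT LIKE].freeze

# Reads the leading run of word characters and classifies it
# case-insensitively as a keyword or a variable.
def scan_word(input)
  literal = input[/\A\w+/]
  if KEYWORDS.include?(literal.upcase)
    [:keyword, literal.upcase]
  else
    [:variable, literal]
  end
end

keyword_result  = scan_word('from users')
variable_result = scan_word('users.id')
```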
#tokenize_number(&block) ⇒ Object
Tokenizes a number (either an integer or float) in the SQL stream. This method will yield the token after the last digit of the number has been encountered.
# File 'lib/sql_tree/tokenizer.rb', line 121

def tokenize_number(&block) # :yields: SQLTree::Token::Number
  number = current_char
  dot_encountered = false
  while /\d/ =~ peek_char || (peek_char == '.' && !dot_encountered)
    dot_encountered = true if peek_char == '.'
    number << next_char
  end

  if dot_encountered
    handle_token(SQLTree::Token::Number.new(number.to_f), &block)
  else
    handle_token(SQLTree::Token::Number.new(number.to_i), &block)
  end
end
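The scanning loop can be tried on its own with a small helper. scan_number below is a hypothetical function that follows the same loop as the listing above but works on a plain string and returns the numeric value instead of yielding a token.

```ruby
# Hypothetical helper: scans a number from the start of +input+ and
# returns an Integer, or a Float if a decimal point was consumed.
def scan_number(input)
  pos = 0
  number = input[0, 1].dup
  dot_encountered = false
  # Accept digits, plus at most one decimal point.
  while /\d/ =~ input[pos + 1, 1] || (input[pos + 1, 1] == '.' && !dot_encountered)
    dot_encountered = true if input[pos + 1, 1] == '.'
    pos += 1
    number << input[pos, 1]
  end
  dot_encountered ? number.to_f : number.to_i
end

integer_value  = scan_number('42 rest')
float_value    = scan_number('3.14)')
second_dot     = scan_number('1.2.3') # scanning stops at the second dot
negative_value = scan_number('-7')    # leading minus, as handed over by tokenize_operator
```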
#tokenize_operator(&block) ⇒ Object
Tokenizes an operator in the SQL stream. This method will yield the operator token when the last character of the token is encountered.
# File 'lib/sql_tree/tokenizer.rb', line 165

def tokenize_operator(&block) # :yields: SQLTree::Token
  operator = current_char
  if operator == '-' && /[\d\.]/ =~ peek_char
    tokenize_number(&block)
  else
    operator << next_char if SQLTree::Token::OPERATORS_HASH.has_key?(operator + peek_char)
    handle_token(SQLTree::Token.const_get(SQLTree::Token::OPERATORS_HASH[operator].to_s.upcase), &block)
  end
end
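The one-character lookahead used for two-character operators can be sketched as follows. The OPERATORS table and its symbol names are assumptions made for this example; the contents of the real SQLTree::Token::OPERATORS_HASH are not shown on this page and may differ.

```ruby
# Hypothetical operator table, for illustration only.
OPERATORS = {
  '='  => :eq,  '<'  => :lt,  '>'  => :gt,
  '<=' => :lte, '>=' => :gte, '!=' => :ne, '<>' => :ne
}.freeze

# Reads the operator at the start of +input+, extending it by one
# character when the two-character combination is a known operator.
def scan_operator(input)
  operator = input[0, 1].dup
  peek = input[1, 1].to_s
  operator << peek if OPERATORS.key?(operator + peek)
  OPERATORS[operator]
end

two_chars = scan_operator('<= 5')
one_char  = scan_operator('< 5')
at_end    = scan_operator('=')
```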
#tokenize_quoted_string(&block) ⇒ Object
Reads a quoted string token from the SQL stream. This method will yield an SQLTree::Token::String when the closing quote character is encountered.
# File 'lib/sql_tree/tokenizer.rb', line 139

def tokenize_quoted_string(&block) # :yields: SQLTree::Token::String
  string = ''
  until next_char.nil? || current_char == "'"
    string << (current_char == "\\" ? next_char : current_char)
  end
  handle_token(SQLTree::Token::String.new(string), &block)
end
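The backslash handling is easier to follow on a standalone string. read_quoted below is a hypothetical helper, not part of SQLTree, that applies the same rule: a backslash causes the following character to be taken literally, and reading stops at the first unescaped single quote or at the end of the input.

```ruby
# Hypothetical helper: +input+ must start with the opening single quote.
# Returns the unescaped contents of the quoted string.
def read_quoted(input)
  pos = 0
  result = String.new
  loop do
    pos += 1
    char = input[pos, 1]
    break if char.nil? || char == "'"
    if char == '\\'
      # Skip the backslash and take the next character literally.
      pos += 1
      char = input[pos, 1]
    end
    result << char.to_s
  end
  result
end

escaped      = read_quoted("'it\\'s' AND x")
unterminated = read_quoted("'abc")
```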
#tokenize_quoted_variable(&block) ⇒ Object
Tokenizes a quoted variable from the SQL stream. This method will yield an SQLTree::Token::Variable when the closing quote is found.
The actual quote character that is used depends on the DBMS. For now, only the more standard double quote is accepted.
# File 'lib/sql_tree/tokenizer.rb', line 152

def tokenize_quoted_variable(&block) # :yields: SQLTree::Token::Variable
  variable = ''
  until next_char.nil? || current_char == '"' # TODO: allow MySQL quoting mode
    variable << (current_char == "\\" ? next_char : current_char)
  end
  handle_token(SQLTree::Token::Variable.new(variable), &block)
end