Class: SQLTree::Tokenizer

Inherits:
Object
  • Object
Includes:
Enumerable
Defined in:
lib/sql_tree/tokenizer.rb

Overview

The SQLTree::Tokenizer class transforms a string or stream of characters into an enumeration of tokens that are more convenient for the SQL parser to work with.

An example:

>> SQLTree::Tokenizer.new.tokenize('SELECT * FROM table')
=> [:select, :all, :from, Variable('table')]

The tokenize method returns an array of tokens, while each_token (aliased to each) yields the tokens one by one.
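To illustrate why aliasing each_token to each matters, here is a minimal self-contained sketch. MiniTokenizer is a hypothetical stand-in, not part of SQLTree, and it only splits on whitespace. It shows how including Enumerable turns a yielding iterator into an array-returning API:

```ruby
# Hypothetical stand-in for illustration only; not the SQLTree implementation.
class MiniTokenizer
  include Enumerable

  def tokenize(string)
    @string = string
    entries # Enumerable#entries collects everything #each yields
  end

  # Yields one lowercased symbol per whitespace-separated word.
  def each_token
    @string.split.each { |word| yield word.downcase.to_sym }
  end
  alias each each_token
end

MiniTokenizer.new.tokenize('SELECT name FROM users')
# => [:select, :name, :from, :users]
```

Because each is defined, other Enumerable methods such as map or select also become available on the object once a string has been set.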

Constant Summary

OPERATOR_CHARS =

A regular expression that matches all operator characters.

/\=|<|>|!|\-|\+|\/|\*|\%/
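As a quick check of what this pattern accepts, the sketch below copies the regular expression into a standalone snippet. The operator_char? helper is a hypothetical name introduced only for this example:

```ruby
# Same pattern as the constant above, copied here so the snippet runs on its own.
OPERATOR_CHARS = /\=|<|>|!|\-|\+|\/|\*|\%/

# Hypothetical helper: true when the single character is an operator character.
def operator_char?(char)
  OPERATOR_CHARS.match?(char)
end

operator_char?('<') # => true
operator_char?('a') # => false
```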

Constructor Details

#initialize ⇒ Tokenizer

:nodoc:



# File 'lib/sql_tree/tokenizer.rb', line 21

def initialize # :nodoc:
  @keyword_queue = []
end

Instance Attribute Details

#keyword_queue ⇒ Object (readonly)

The keyword queue, on which keywords are placed before they are yielded to the parser, to enable keyword combining (e.g. NOT LIKE).



# File 'lib/sql_tree/tokenizer.rb', line 19

def keyword_queue
  @keyword_queue
end

Instance Method Details

#current_char ⇒ Object

Returns the character that is currently being tokenized.



# File 'lib/sql_tree/tokenizer.rb', line 34

def current_char
  @current_char
end

#each_token(&block) ⇒ Object Also known as: each

Iterator method that yields each token that is encountered in the SQL stream. These tokens are passed to the SQL parser to construct a syntax tree for the SQL query.

This method is aliased to each so that the Enumerable methods work on the tokenizer.



# File 'lib/sql_tree/tokenizer.rb', line 81

def each_token(&block) # :yields: SQLTree::Token
  while next_char
    case current_char
    when /^\s?$/;        # whitespace, go to next character
    when '(';            handle_token(SQLTree::Token::LPAREN, &block)
    when ')';            handle_token(SQLTree::Token::RPAREN, &block)
    when '.';            handle_token(SQLTree::Token::DOT, &block)
    when ',';            handle_token(SQLTree::Token::COMMA, &block)
    when /\d/;           tokenize_number(&block)
    when "'";            tokenize_quoted_string(&block)
    when OPERATOR_CHARS; tokenize_operator(&block)
    when /\w/;           tokenize_keyword(&block)
    when '"';            tokenize_quoted_variable(&block)     # TODO: allow MySQL quoting mode
    end
  end

  # Make sure to yield any tokens that are still stashed on the queue.
  empty_keyword_queue!(&block)
end
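The order of the branches in the case statement matters: digits also match /\w/, so the number branch has to come first, and the first pattern also matches the empty string. The following sketch names the branch each character would take. The char_class helper and its symbols are hypothetical, and the operator pattern is written as an equivalent character class rather than the OPERATOR_CHARS constant:

```ruby
# Hypothetical helper naming the branch a character would take in each_token.
def char_class(char)
  case char
  when /^\s?$/             then :whitespace # also matches the empty string
  when '(', ')', '.', ','  then :punctuation
  when /\d/                then :number     # must precede /\w/, which also matches digits
  when "'"                 then :quoted_string
  when /[=<>!\-+\/*%]/     then :operator
  when /\w/                then :word
  when '"'                 then :quoted_variable
  end
end

char_class('7') # => :number
char_class('a') # => :word
char_class('')  # => :whitespace
```

Characters that match no branch, such as '#', fall through the case statement and are skipped.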

#empty_keyword_queue!(&block) ⇒ Object

This method ensures that every keyword currently in the queue is yielded. It gets called by handle_token when it knows for sure that the keywords on the queue cannot be combined into a single keyword.

block

the block to yield the tokens on the queue to.



# File 'lib/sql_tree/tokenizer.rb', line 71

def empty_keyword_queue!(&block) # :yields: SQLTree::Token
  block.call(@keyword_queue.shift) until @keyword_queue.empty?
end

#handle_token(token, &block) ⇒ Object

Combines several tokens into a single token if possible and yields the result, or yields every single token if they cannot be combined.

token

the token to yield or combine

block

the block to yield tokens and combined tokens to.



# File 'lib/sql_tree/tokenizer.rb', line 57

def handle_token(token, &block) # :yields: SQLTree::Token
  if token.kind_of?(SQLTree::Token::Keyword)
    keyword_queue.push(token)
  else
    empty_keyword_queue!(&block)
    block.call(token)
  end
end
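The queueing behavior in the listing above can be tried in isolation. In this sketch, DemoKeyword, DemoOther and QueueDemo are hypothetical names introduced for illustration. It only reproduces the delay-and-flush logic shown in the listing, not any combining of keywords:

```ruby
# Hypothetical token types for illustration; SQLTree defines its own.
DemoKeyword = Struct.new(:name)
DemoOther   = Struct.new(:value)

class QueueDemo
  def initialize
    @keyword_queue = []
  end

  # Keywords wait in the queue; any other token flushes the queue first.
  def handle_token(token, &block)
    if token.is_a?(DemoKeyword)
      @keyword_queue.push(token)
    else
      flush(&block)
      block.call(token)
    end
  end

  # Yields queued keywords in the order they were pushed.
  def flush(&block)
    block.call(@keyword_queue.shift) until @keyword_queue.empty?
  end
end

seen = []
demo = QueueDemo.new
demo.handle_token(DemoKeyword.new('NOT'))  { |t| seen << t } # held back
demo.handle_token(DemoKeyword.new('LIKE')) { |t| seen << t } # held back
demo.handle_token(DemoOther.new('x'))      { |t| seen << t } # flushes NOT, LIKE, then x
```

Holding keywords back is what gives a later step the chance to see adjacent keywords such as NOT and LIKE together before they reach the parser.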

#next_char ⇒ Object

Returns the next character to tokenize, and moves the pointer of the current character one position forward.



# File 'lib/sql_tree/tokenizer.rb', line 47

def next_char
  @current_char_pos += 1
  @current_char = @string[@current_char_pos, 1]
end

#peek_char(lookahead = 1) ⇒ Object

Returns the next character to tokenize, but does not move the pointer of the current character forward.

lookahead

how many positions forward to peek.



# File 'lib/sql_tree/tokenizer.rb', line 41

def peek_char(lookahead = 1)
  @string[@current_char_pos + lookahead, 1]
end
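The interplay between next_char and peek_char can be seen in a small standalone cursor. CharCursor is a hypothetical class written for this example. Note that String#[] with a start and length returns an empty string when the start equals the string length, and nil only beyond that:

```ruby
# Hypothetical cursor class mirroring the next_char / peek_char pair above.
class CharCursor
  attr_reader :current_char

  def initialize(string)
    @string = string
    @pos = -1 # start before the first character, as tokenize does
  end

  # Advances one position and returns the new current character.
  # Returns "" exactly at the end of the string and nil one step later.
  def next_char
    @pos += 1
    @current_char = @string[@pos, 1]
  end

  # Looks ahead without moving the cursor.
  def peek_char(lookahead = 1)
    @string[@pos + lookahead, 1]
  end
end

cursor = CharCursor.new('ab')
cursor.peek_char # => "a"
cursor.next_char # => "a"
cursor.peek_char # => "b"
cursor.next_char # => "b"
cursor.next_char # => ""
cursor.next_char # => nil
```

The empty string returned at the end of the input is truthy in Ruby, so a `while next_char` loop runs one extra iteration with an empty current character before it stops.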

#tokenize(string) ⇒ Object

Returns an array of tokens for the given string.

string

the string to tokenize



# File 'lib/sql_tree/tokenizer.rb', line 27

def tokenize(string)
  @string = string
  @current_char_pos = -1
  self.entries
end

#tokenize_keyword(&block) ⇒ Object

Tokenizes a keyword in the code. This can be either a reserved SQL keyword or a variable. This method will yield variables directly. Keywords will be yielded with a delay, because they may need to be combined with other keywords in the handle_token method.



# File 'lib/sql_tree/tokenizer.rb', line 107

def tokenize_keyword(&block) # :yields: SQLTree::Token
  literal = current_char
  literal << next_char while /[\w]/ =~ peek_char

  if SQLTree::Token::KEYWORDS.include?(literal.upcase)
    handle_token(SQLTree::Token.const_get(literal.upcase), &block)
  else
    handle_token(SQLTree::Token::Variable.new(literal), &block)
  end
end
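The keyword-or-variable decision comes down to a case-insensitive lookup. The sketch below uses a hypothetical DEMO_KEYWORDS list and a hypothetical classify_word helper. The contents of the real SQLTree::Token::KEYWORDS constant are not shown in this section and may differ:

```ruby
# Hypothetical keyword list; SQLTree::Token::KEYWORDS is defined elsewhere.
DEMO_KEYWORDS = %w[SELECT FROM WHERE NOT LIKE].freeze

# Classifies a word: keywords are matched case-insensitively,
# anything else is kept verbatim as a variable name.
def classify_word(word)
  if DEMO_KEYWORDS.include?(word.upcase)
    [:keyword, word.downcase.to_sym]
  else
    [:variable, word]
  end
end

classify_word('select') # => [:keyword, :select]
classify_word('users')  # => [:variable, "users"]
```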

#tokenize_number(&block) ⇒ Object

Tokenizes a number (either an integer or float) in the SQL stream. This method will yield the token after the last digit of the number has been encountered.



# File 'lib/sql_tree/tokenizer.rb', line 121

def tokenize_number(&block) # :yields: SQLTree::Token::Number
  number = current_char
  dot_encountered = false
  while /\d/ =~ peek_char || (peek_char == '.' && !dot_encountered)
    dot_encountered = true if peek_char == '.'
    number << next_char
  end

  if dot_encountered
    handle_token(SQLTree::Token::Number.new(number.to_f), &block)
  else
    handle_token(SQLTree::Token::Number.new(number.to_i), &block)
  end
end
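The scanning rule (digits, plus at most one decimal point) can be sketched as a standalone function. scan_number is a hypothetical helper that works on a plain string instead of the tokenizer's cursor, and it leaves out the leading minus sign that tokenize_operator may hand over:

```ruby
# Hypothetical helper: reads a number from the start of `text` and returns
# an Integer or a Float, accepting at most one decimal point.
def scan_number(text)
  number = +''
  dot_seen = false
  text.each_char do |char|
    if char =~ /\d/
      number << char
    elsif char == '.' && !dot_seen
      dot_seen = true
      number << char
    else
      break # first character that cannot belong to the number
    end
  end
  dot_seen ? number.to_f : number.to_i
end

scan_number('42 AND')  # => 42
scan_number('3.14)')   # => 3.14
scan_number('1.2.3')   # => 1.2
```

A second dot ends the number, so the remaining input is left for the next token.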

#tokenize_operator(&block) ⇒ Object

Tokenizes an operator in the SQL stream. This method will yield the operator token when the last character of the token is encountered.



# File 'lib/sql_tree/tokenizer.rb', line 165

def tokenize_operator(&block) # :yields: SQLTree::Token
  operator = current_char
  if operator == '-' && /[\d\.]/ =~ peek_char
    tokenize_number(&block)
  else
    operator << next_char if SQLTree::Token::OPERATORS_HASH.has_key?(operator + peek_char)
    handle_token(SQLTree::Token.const_get(SQLTree::Token::OPERATORS_HASH[operator].to_s.upcase), &block)
  end
end
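The two-character lookahead prefers the longer operator when one exists. In the sketch below, DEMO_OPERATORS and read_operator are hypothetical. The real SQLTree::Token::OPERATORS_HASH is defined elsewhere and its contents are assumed here, not quoted:

```ruby
# Hypothetical operator table for illustration only.
DEMO_OPERATORS = {
  '='  => :eq,
  '<'  => :lt,
  '<=' => :lte,
  '<>' => :ne,
  '>'  => :gt,
  '>=' => :gte
}.freeze

# Returns [operator_symbol, characters_consumed] for the operator at `pos`.
def read_operator(string, pos)
  operator = string[pos, 1]
  two_chars = operator + string[pos + 1, 1].to_s
  # Prefer the two-character operator when the table knows it.
  operator = two_chars if DEMO_OPERATORS.key?(two_chars)
  [DEMO_OPERATORS[operator], operator.length]
end

read_operator('a <= b', 2) # => [:lte, 2]
read_operator('a < b', 2)  # => [:lt, 1]
```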

#tokenize_quoted_string(&block) ⇒ Object

Reads a quoted string token from the SQL stream. This method will yield an SQLTree::Token::String when the closing quote character is encountered.



# File 'lib/sql_tree/tokenizer.rb', line 139

def tokenize_quoted_string(&block) # :yields: SQLTree::Token::String
  string = ''
  until next_char.nil? || current_char == "'"
    string << (current_char == "\\" ? next_char : current_char)
  end
  handle_token(SQLTree::Token::String.new(string), &block)
end
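The backslash handling in the loop means that a character following a backslash is always taken literally, including a quote. The following sketch shows the same rule on a plain string. read_quoted is a hypothetical helper that expects its input to start just after the opening quote:

```ruby
# Hypothetical helper: `text` starts just after the opening quote.
# Returns the unescaped contents up to the closing single quote.
def read_quoted(text)
  result = +''
  chars = text.each_char
  loop do
    char = chars.next                 # StopIteration at the end of input ends `loop`
    break if char == "'"              # unescaped quote closes the string
    char = chars.next if char == '\\' # take the character after a backslash literally
    result << char
  end
  result
end

read_quoted("it\\'s ok' AND x") # => "it's ok"
read_quoted("plain' rest")      # => "plain"
```

As in the listing above, an unterminated string is not treated as an error here: the helper returns whatever was read before the input ran out.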

#tokenize_quoted_variable(&block) ⇒ Object

Tokenizes a quoted variable from the SQL stream. This method will yield an SQLTree::Token::Variable when the closing quote is found.

The actual quote character that is used depends on the DBMS. For now, only the more standard double quote is accepted.



# File 'lib/sql_tree/tokenizer.rb', line 152

def tokenize_quoted_variable(&block) # :yields: SQLTree::Token::Variable
  variable = ''
  until next_char.nil? || current_char == '"' # TODO: allow MySQL quoting mode
    variable << (current_char == "\\" ? next_char : current_char)
  end
  handle_token(SQLTree::Token::Variable.new(variable), &block)
end