Class: Rlex::Lexer
- Inherits: Object
- Defined in: lib/rlex/lexer.rb
Overview
Implements a simple lexer using a StringScanner.
The lexer was written for use with Racc, a Ruby variant of Yacc, but it has no code dependency on that project, so it can also be used on its own or with other packages.
- Ignored input takes precedence over rules and keywords, so if a prefix is matched by an ignore pattern, it is ignored even if it is also a keyword or is matched by a rule.
- The lexer is greedy, so if a prefix is matched by multiple rules or keywords, the lexer chooses the option that consumes the most input.
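The two policies above can be illustrated with a small standalone sketch built only on Ruby's standard strscan library. It is not Rlex's implementation: `scan_tokens`, the `[name, pattern]` rule pairs, and the "first defined rule wins a tie" behaviour are assumptions made for this example.

```ruby
require "strscan"

# Hypothetical helper, not Rlex's implementation. It illustrates the two
# documented policies:
#   1. ignore patterns are tried first and win outright;
#   2. among the rules, the longest match wins (ties go to the rule defined
#      first, which is an assumption of this sketch).
# Rules are [name, pattern] pairs.
def scan_tokens(input, ignored, rules)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    # Ignored input takes precedence: skip it before consulting any rule.
    # Zero-length matches are rejected so the loop always makes progress.
    next if ignored.any? { |pattern| scanner.skip(pattern).to_i > 0 }

    best_name = nil
    best_pattern = nil
    best_len = 0
    rules.each do |name, pattern|
      len = scanner.match?(pattern).to_i # match length at the current position, 0 if none
      next unless len > best_len

      best_name = name
      best_pattern = pattern
      best_len = len
    end
    raise "unexpected input <#{scanner.peek(5)}>" if best_name.nil?

    tokens << [best_name, scanner.scan(best_pattern)]
  end
  tokens
end

ignored = [/\s+/, /#[^\n]*/] # whitespace and comments
rules = [[:if, /if/], [:ident, /[a-z]+/], [:tag, /#[a-z]+/]]
p scan_tokens("if iffy #tag", ignored, rules)
# prints [[:if, "if"], [:ident, "iffy"]]
```

Here "iffy" becomes an `:ident` because that rule consumes more input than `:if`, and "#tag" never reaches the `:tag` rule because the comment pattern ignores it first.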
Instance Method Summary
- #ignore(pattern) ⇒ Regexp: Instructs the lexer to ignore input matched by the specified pattern.
- #initialize ⇒ Lexer (constructor): Initializes an empty Lexer.
- #keyword(name = nil, kword) ⇒ Symbol: Defines a static sequence of input as a keyword.
- #next_token ⇒ Token: Returns the next token matched from the remaining input.
- #rule(name, pattern) ⇒ Symbol: Defines a rule to match the specified pattern.
- #start(input) ⇒ String: Initializes the lexer with new input.
Constructor Details
#initialize ⇒ Lexer
Initializes an empty Lexer.
```ruby
# File 'lib/rlex/lexer.rb', line 43
def initialize
  @ignored = []
  @rules = []
  @keywords = {}
end
```
Instance Method Details
#ignore(pattern) ⇒ Regexp
Note: Ignored input takes precedence over rules and keywords, so if a prefix is matched by an ignore pattern, it is ignored even if it is also a keyword or is matched by a rule.
Instructs the lexer to ignore input matched by the specified pattern. If appropriate, call this multiple times to ignore several patterns.
```ruby
# File 'lib/rlex/lexer.rb', line 61
def ignore(pattern)
  @ignored << pattern
  return pattern
end
```
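Typical ignore patterns cover whitespace and comments. The sketch below shows how a list of such patterns can be applied at the current scan position with `StringScanner#skip`. `skip_ignored` is a hypothetical helper, not Rlex's private `ignore_prefix?`, whose source is not shown in this documentation.

```ruby
require "strscan"

# Hypothetical helper (not part of Rlex): skip every prefix matched by one of
# the ignore patterns and return the number of bytes skipped.
# StringScanner#skip only matches at the current scan position, so an ignore
# pattern never jumps ahead to text later in the input.
def skip_ignored(scanner, patterns)
  start = scanner.pos
  loop do
    # Stop once no pattern consumes at least one byte; rejecting zero-length
    # matches keeps a pattern such as /\s*/ from looping forever.
    progressed = patterns.any? { |pattern| scanner.skip(pattern).to_i > 0 }
    break unless progressed
  end
  scanner.pos - start
end

scanner = StringScanner.new("  # comment\n  x = 1")
puts skip_ignored(scanner, [/\s+/, /#[^\n]*/]) # prints 14
puts scanner.rest                              # prints x = 1
```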
#keyword(name = nil, kword) ⇒ Symbol
Note: For efficiency, use keywords instead of rules whenever the matched input is static.
Defines a static sequence of input as a keyword.
```ruby
# File 'lib/rlex/lexer.rb', line 101
def keyword(name = nil, kword)
  # @todo Validate the keyword name
  kword_str = kword.to_s
  name = kword.to_sym if name == nil
  # Escape the keyword so regexp metacharacters are matched literally
  pattern = Regexp.new(Regexp.escape kword_str)
  rule name, pattern
  @keywords[kword_str] = Token.new name.to_sym, kword_str
  return name.to_sym
end
```
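A keyword is registered both as a rule with an escaped, literal pattern and as an entry in the keyword table; `#next_token` then uses that table to replace the type of whatever rule matched the same text. The sketch below mirrors that flow outside the class. `KwToken`, `add_keyword`, and `token_type` are hypothetical names; a Struct stands in for Rlex's Token class, which is not documented here.

```ruby
# Hypothetical stand-in: a Struct with the two fields #keyword passes to Token.
KwToken = Struct.new(:type, :value)

# Hypothetical mirror of Lexer#keyword: register a literal pattern as a rule
# and remember the keyword text so later matches can be reclassified.
def add_keyword(rules, keywords, kword, name = nil)
  text = kword.to_s
  name = (name || text).to_sym
  # Regexp.escape makes metacharacters such as "+" or "." match literally.
  rules << [name, Regexp.new(Regexp.escape(text))]
  keywords[text] = KwToken.new(name, text)
  name
end

# Mirrors the lookup in #next_token: when the matched text is a keyword, the
# keyword's type replaces the name of the rule that matched it.
def token_type(keywords, rule_name, text)
  keyword = keywords[text]
  keyword ? keyword.type : rule_name
end

rules = [[:ident, /[a-z]+/]]
keywords = {}
add_keyword(rules, keywords, "if")       # returns :if
add_keyword(rules, keywords, "+", :plus) # returns :plus
p token_type(keywords, :ident, "if")     # prints :if
p token_type(keywords, :ident, "iffy")   # prints :ident
```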
#next_token ⇒ Token
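Returns the next token matched from the remaining input. If no input is left, or the lexer has not been initialized with #start, EOF_TOKEN is returned. If the remaining input matches no rule, a RuntimeError is raised whose message shows the beginning of the unmatched input.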
```ruby
# File 'lib/rlex/lexer.rb', line 135
def next_token
  return EOF_TOKEN if @scanner.nil? or @scanner.empty?
  # Skip an ignored prefix, then look for the next real token
  return next_token if ignore_prefix?
  rule = greediest_rule
  if rule
    prefix = fetch_prefix_and_update_pos(rule.pattern)
    # A keyword match overrides the type of the rule that matched it
    keyword = @keywords[prefix]
    type = keyword ? keyword.type : rule.name
    token = keyword ? keyword.value : prefix
    return Token.new(type, token, @line, @col - token.size)
  end
  raise "unexpected input <#{@scanner.peek(5)}>"
end
```
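Assuming `fetch_prefix_and_update_pos` advances `@col` past the matched prefix, `@col - token.size` is the column at which a single-line token starts. The sketch below illustrates that arithmetic. `advance_position` is a hypothetical stand-in for the unshown position bookkeeping; columns start at 0 because `#start` sets `@col = 0`.

```ruby
# Hypothetical sketch of the line/column bookkeeping that
# fetch_prefix_and_update_pos presumably performs (its source is not shown):
# advance (line, col) past +text+, where col counts the characters consumed
# on the current line.
def advance_position(line, col, text)
  newlines = text.count("\n")
  return [line, col + text.size] if newlines.zero?

  # After a newline the column restarts with the text following the last one.
  [line + newlines, text.size - text.rindex("\n") - 1]
end

line, col = 1, 0                               # the values #start assigns
line, col = advance_position(line, col, "foo") # => [1, 3]
start_col = col - "foo".size                   # => 0, where "foo" began
line, col = advance_position(line, col, " \n") # ignored input => [2, 0]
line, col = advance_position(line, col, "bar") # => [2, 3]
```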
#rule(name, pattern) ⇒ Symbol
Note: For efficiency, use keywords instead of rules whenever the matched input is static.
Defines a rule to match the specified pattern.
```ruby
# File 'lib/rlex/lexer.rb', line 79
def rule(name, pattern)
  # @todo Validate the rule name
  @rules << (Rule.new name.to_sym, pattern)
  return name.to_sym
end
```
#start(input) ⇒ String
Note: This resets the lexer with a new StringScanner, so any state related to previous input, including the line and column counters, is lost.
Initializes the lexer with new input.
```ruby
# File 'lib/rlex/lexer.rb', line 120
def start(input)
  @line = 1
  @col = 0
  @scanner = StringScanner.new input
  return input
end
```
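Because each StringScanner tracks its own position, replacing it discards everything consumed from the earlier input. The hypothetical `ResettableScanner` class below (not part of Rlex) mirrors only the state that `#start` resets, to show that a second call starts again from position 0, line 1, column 0.

```ruby
require "strscan"

# Hypothetical class (not Rlex) mirroring the state Lexer#start resets:
# line and column counters plus a fresh StringScanner for the new input.
class ResettableScanner
  attr_reader :line, :col

  def start(input)
    @line = 1
    @col = 0
    @scanner = StringScanner.new(input)
    input # like Lexer#start, return the input unchanged
  end

  def pos
    @scanner.pos
  end

  # Consume one run of word characters, advancing the column counter.
  def scan_word
    word = @scanner.scan(/\w+/)
    @col += word.size if word
    word
  end
end

s = ResettableScanner.new
s.start("abc def")
s.scan_word    # => "abc"
s.col          # => 3
s.start("xyz") # => "xyz"; the old position and counters are gone
[s.pos, s.col] # => [0, 0]
```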