Class: Minilex::Lexer
- Inherits: Object
  - Object
    - Minilex::Lexer
- Defined in: lib/minilex.rb
Instance Attribute Summary
- #pos ⇒ Object (readonly): Returns the value of attribute pos.
- #rules ⇒ Object (readonly): Returns the value of attribute rules.
- #scanner ⇒ Object (readonly): Returns the value of attribute scanner.
- #tokens ⇒ Object (readonly): Returns the value of attribute tokens.
Instance Method Summary
- #append_eos ⇒ Object: Makes the end-of-stream token.
- #append_token(id, value) ⇒ Object: Makes a token.
- #initialize(&block) ⇒ Lexer (constructor): Creates a Lexer instance.
- #lex(input) ⇒ Object: Runs the lexer on the given input.
- #match ⇒ Object (internal): Finds the matching rule.
- #skip(id, pattern) ⇒ Object: Defines patterns to ignore.
- #tok(id, pattern, processor = nil) ⇒ Object: Defines a token-matching rule.
- #update_pos(text) ⇒ Object (internal): Updates the position information.
Constructor Details
#initialize(&block) ⇒ Lexer
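Creates a Lexer instance. If a block is given, it is evaluated with instance_eval, so #skip and #tok can be called inside it without a receiver: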
    Expression = Minilex::Lexer.new do
      skip :whitespace, /\s+/
      tok :number, /\d+(?:\.\d+)?/
      tok :operator, /[\+\=\/\*]/
    end
You don’t have to pass a block. This also works:
    Expression = Minilex::Lexer.new
    Expression.skip :whitespace, /\s+/
    Expression.tok :number, /\d+(?:\.\d+)?/
    Expression.tok :operator, /[\+\=\/\*]/
    # File 'lib/minilex.rb', line 24
    def initialize(&block)
      @rules = []
      instance_eval &block if block
    end
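The block form and the explicit-receiver form are equivalent because of the `instance_eval` call in the constructor. The sketch below is a minimal stand-in that shows this. `RuleRecorder` is a hypothetical class that keeps only the rule-recording part of the constructor. Its simplified `tok` stores `[id, pattern]` pairs. It is not Minilex::Lexer itself.

```ruby
# Hypothetical stand-in: only the rule-recording part of the documented
# constructor, not Minilex::Lexer itself.
class RuleRecorder
  attr_reader :rules

  def initialize(&block)
    @rules = []
    # instance_eval runs the block with self set to the new object, so bare
    # calls such as `tok ...` inside the block reach #tok below.
    instance_eval(&block) if block
  end

  # Simplified rule definition (an assumption): records [id, pattern] pairs in order.
  def tok(id, pattern)
    rules << [id, pattern]
  end
end

# Block form: tok is called on the new instance via instance_eval.
with_block = RuleRecorder.new do
  tok :number, /\d+/
  tok :operator, /[+*]/
end

# Explicit form: the same calls made with an explicit receiver.
without_block = RuleRecorder.new
without_block.tok :number, /\d+/
without_block.tok :operator, /[+*]/
```

Because of instance_eval, `self` and any instance variables inside the block refer to the new lexer. They do not refer to the object where the block was written.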
Instance Attribute Details
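#pos ⇒ Object (readonly)
Returns the value of attribute pos: the current Pos (line and offset). #lex resets it to line 1, offset 0, and #update_pos advances it as text is consumed. It is nil until #lex has been called.
    # File 'lib/minilex.rb', line 8
    def pos
      @pos
    end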
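#rules ⇒ Object (readonly)
Returns the value of attribute rules: the Array of rules added by #tok and #skip, in the order they were defined. This is also the order in which #match tries them.
    # File 'lib/minilex.rb', line 8
    def rules
      @rules
    end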
#scanner ⇒ Object (readonly)
Returns the value of attribute scanner: the StringScanner that #lex creates over the current input.
    # File 'lib/minilex.rb', line 8
    def scanner
      @scanner
    end
#tokens ⇒ Object (readonly)
Returns the value of attribute tokens: the Array of tokens built by the most recent call to #lex.
    # File 'lib/minilex.rb', line 8
    def tokens
      @tokens
    end
Instance Method Details
#append_eos ⇒ Object
Makes the end-of-stream token.
Similar to `append_token`, but used to make the final token: appends [:eos] to the `tokens` array.
    # File 'lib/minilex.rb', line 90
    def append_eos
      tokens << [:eos]
    end
#append_token(id, value) ⇒ Object
Makes a token.
id    - the id of the matched rule
value - the value that was matched
Called when a rule is matched to build the resulting token.
Override this method if you’d like your tokens in a different form. You have access to the array of tokens via `tokens` and to the current token’s position information via `pos`.
By default, appends an Array of [id, value, line, offset] to `tokens`.
    # File 'lib/minilex.rb', line 82
    def append_token(id, value)
      tokens << [id, value, pos.line, pos.offset]
    end
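Overriding #append_token is how you change the token form. Below is a minimal sketch that produces Hashes instead of Arrays. To keep it runnable without the gem, `DefaultTokens` is a hypothetical stand-in for the lexer state that #append_token uses (`tokens` and `pos`). `Pos` is an assumed Struct of (line, offset). With Minilex itself, the override would go in a subclass of Minilex::Lexer.

```ruby
# Assumed Pos shape (line, offset); Minilex defines its own Pos.
Pos = Struct.new(:line, :offset)

# Hypothetical stand-in for the lexer state #append_token relies on.
class DefaultTokens
  attr_reader :tokens, :pos

  def initialize
    @tokens = []
    @pos = Pos.new(1, 0)
  end

  # Same body as the documented default: [id, value, line, offset].
  def append_token(id, value)
    tokens << [id, value, pos.line, pos.offset]
  end
end

# Override that builds Hashes instead of Arrays.
class HashTokens < DefaultTokens
  def append_token(id, value)
    tokens << { id: id, value: value, line: pos.line, offset: pos.offset }
  end
end

default = DefaultTokens.new
default.append_token(:number, "1")

hashed = HashTokens.new
hashed.append_token(:number, "1")
```

#append_eos still appends a bare [:eos]. If you change the token form, you may also want to override it so the final token matches.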
#lex(input) ⇒ Object
Runs the lexer on the given input.
Returns an Array of tokens, ending with the [:eos] token.
    # File 'lib/minilex.rb', line 52
    def lex(input)
      @tokens = []
      @pos = Pos.new(1, 0)
      @scanner = StringScanner.new(input)

      until scanner.eos?
        rule, text = match
        value = rule.processor ? send(rule.processor, text) : text
        append_token(rule.id, value) unless rule.skip
        update_pos(text)
      end

      append_eos
      tokens
    end
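The sketch below shows #lex end to end on the constructor's expression example. It is runnable without the gem. `SketchLexer` is a hypothetical class assembled from the source listings on this page. The field orders of `Rule` (id, pattern, processor, skip) and `Pos` (line, offset) are inferred from how those listings call them. `UnrecognizedInput` is simplified to a plain StandardError, whereas Minilex's own error is built from the scanner and position. Whitespace is consumed by the skip rule but produces no token, and each token records the position where it starts.

```ruby
require "strscan"

# Hypothetical stand-in for Minilex::Lexer, assembled from this page's listings.
# Rule/Pos field orders are inferred; UnrecognizedInput is simplified.
class SketchLexer
  Rule = Struct.new(:id, :pattern, :processor, :skip)
  Pos = Struct.new(:line, :offset)
  class UnrecognizedInput < StandardError; end

  attr_reader :pos, :rules, :scanner, :tokens

  def initialize(&block)
    @rules = []
    instance_eval(&block) if block
  end

  def tok(id, pattern, processor = nil)
    rules << Rule.new(id, pattern, processor)
  end

  def skip(id, pattern)
    rules << Rule.new(id, pattern, nil, true)
  end

  def lex(input)
    @tokens = []
    @pos = Pos.new(1, 0)
    @scanner = StringScanner.new(input)
    until scanner.eos?
      rule, text = match
      value = rule.processor ? send(rule.processor, text) : text
      append_token(rule.id, value) unless rule.skip # skip rules emit nothing
      update_pos(text)                              # but still advance pos
    end
    append_eos
    tokens
  end

  def append_token(id, value)
    tokens << [id, value, pos.line, pos.offset]
  end

  def append_eos
    tokens << [:eos]
  end

  # The first rule whose pattern matches at the scanner's position wins.
  def match
    rules.each do |rule|
      next unless (text = scanner.scan(rule.pattern))
      return [rule, text]
    end
    raise UnrecognizedInput, "no rule matches at line #{pos.line}, offset #{pos.offset}"
  end

  def update_pos(text)
    newlines = text.count("\n")
    pos.line += newlines
    if newlines > 0
      pos.offset = text.rpartition("\n")[2].length
    else
      pos.offset += text.length
    end
  end
end

Expression = SketchLexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

tokens = Expression.lex("1 + 2.5\n* 3")
# [[:number, "1", 1, 0], [:operator, "+", 1, 2], [:number, "2.5", 1, 4],
#  [:operator, "*", 2, 0], [:number, "3", 2, 2], [:eos]]
```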
#match ⇒ Object (internal)
Finds the matching rule.
Tries the rules in the order they were defined until one matches at the scanner’s current position. Raises an UnrecognizedInput error if none does.
Returns a 2-element Array of [rule, matched_text].
    # File 'lib/minilex.rb', line 101
    def match
      rules.each do |rule|
        next unless text = scanner.scan(rule.pattern)
        return [rule, text]
      end
      raise UnrecognizedInput.new(scanner, pos)
    end
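Because #match returns the first rule that matches, not the longest match, definition order matters. The minimal sketch below uses Ruby's StringScanner to illustrate this with a keyword rule and an identifier rule. `first_match` is a hypothetical helper that mirrors the loop in #match. Unlike #match, it returns nil instead of raising when nothing matches.

```ruby
require "strscan"

# Hypothetical helper mirroring the loop in #match: try each [id, pattern]
# pair in order at the scanner's current position and return the first hit.
# Unlike #match, it returns nil instead of raising when nothing matches.
def first_match(rules, input)
  scanner = StringScanner.new(input)
  rules.each do |id, pattern|
    text = scanner.scan(pattern)
    return [id, text] if text
  end
  nil
end

keyword_first = [[:keyword, /if\b/], [:ident, /[a-z]+/]]
ident_first   = [[:ident, /[a-z]+/], [:keyword, /if\b/]]

first_match(keyword_first, "if x") # => [:keyword, "if"]
first_match(ident_first, "if x")   # => [:ident, "if"]  (keyword rule never reached)
first_match(keyword_first, "iffy") # => [:ident, "iffy"]
first_match(keyword_first, "42")   # => nil
```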
#skip(id, pattern) ⇒ Object
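Defines patterns to ignore.
id      - an identifier (it’s nice to name things)
pattern - the Regexp to skip
Text matched by a skip rule is consumed and still advances the position, but no token is appended for it.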
    # File 'lib/minilex.rb', line 45
    def skip(id, pattern)
      rules << Rule.new(id, pattern, nil, true)
    end
#tok(id, pattern, processor = nil) ⇒ Object
Defines a token-matching rule.
id        - this token’s identifier
pattern   - a Regexp to match this token
processor - a Symbol naming a method on this Lexer instance, which is called with the matched text to produce the `value` for this token (defaults to nil, in which case the matched text itself is the value)
    # File 'lib/minilex.rb', line 37
    def tok(id, pattern, processor=nil)
      rules << Rule.new(id, pattern, processor)
    end
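A processor turns the matched text into a richer value, such as converting numeric text to an Integer or Float. The minimal sketch below isolates the dispatch line from #lex. `NumberLexerSketch` and its `value_for` helper are hypothetical. `NumberLexerSketch` stands in for a subclass of Minilex::Lexer that would declare `tok :number, /\d+(?:\.\d+)?/, :to_number`. `Rule` is an assumed Struct whose field order is inferred from the `Rule.new` calls above.

```ruby
# Assumed Rule shape (id, pattern, processor, skip), inferred from the
# Rule.new calls in #tok and #skip; Minilex defines its own Rule.
Rule = Struct.new(:id, :pattern, :processor, :skip)

# Hypothetical class standing in for a Minilex::Lexer subclass.
class NumberLexerSketch
  # A processor: receives the matched text and returns the token's value.
  def to_number(text)
    text.include?(".") ? text.to_f : text.to_i
  end

  # Hypothetical helper isolating the dispatch line from #lex:
  # call the named processor if the rule has one, else keep the raw text.
  def value_for(rule, text)
    rule.processor ? send(rule.processor, text) : text
  end
end

lexer = NumberLexerSketch.new
number = Rule.new(:number, /\d+(?:\.\d+)?/, :to_number)
operator = Rule.new(:operator, /[+*]/)

lexer.value_for(number, "42")  # => 42
lexer.value_for(number, "2.5") # => 2.5
lexer.value_for(operator, "+") # => "+"
```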
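#update_pos(text) ⇒ Object (internal)
Updates the position information.
text - the String that was matched by `match`
Inspects the matched text for newlines. If there are any, the line number advances by their count and the offset becomes the length of the text after the last newline; otherwise the offset advances by the length of the text.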
    # File 'lib/minilex.rb', line 115
    def update_pos(text)
      pos.line += newlines = text.count(?\n)
      if newlines > 0
        pos.offset = text.rpartition(?\n)[2].length
      else
        pos.offset += text.length
      end
    end
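The position rule is easiest to check on a couple of inputs. The minimal sketch below applies the same logic as #update_pos to a standalone position. `Pos` is an assumed Struct of (line, offset). The two-argument `update_pos(pos, text)` is a hypothetical free-function variant of the documented method.

```ruby
# Assumed Pos shape (line, offset); Minilex defines its own Pos.
Pos = Struct.new(:line, :offset)

# Hypothetical free-function variant of #update_pos with the same logic.
def update_pos(pos, text)
  newlines = text.count("\n")
  pos.line += newlines
  if newlines > 0
    # Offset restarts at the text after the last newline.
    pos.offset = text.rpartition("\n")[2].length
  else
    pos.offset += text.length
  end
  pos
end

pos = Pos.new(1, 0)
update_pos(pos, "abc")       # no newline: line 1, offset 3
update_pos(pos, "  \n\n  x") # two newlines: line 3, offset 3 ("  x")
```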