Class: Minilex::Lexer

Inherits:
Object
Defined in:
lib/minilex.rb

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(&block) ⇒ Lexer

Creates a Lexer instance

Expression = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

You don’t have to pass a block. This also works:

Expression = Minilex::Lexer.new
Expression.skip :whitespace, /\s+/
Expression.tok :number, /\d+(?:\.\d+)?/
Expression.tok :operator, /[\+\=\/\*]/
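Calling `lex` on the Expression lexer above returns an Array of `[id, value, line, offset]` tokens ending with `[:eos]`. The following self-contained sketch (stdlib `StringScanner` only) approximates the behavior documented on this page so the output can be seen without the gem installed; `Rule`, `Pos`, and `SketchLexer` are stand-ins here, not the gem's actual classes.

```ruby
require 'strscan'

# Assumed shapes for the gem's internal value objects.
Rule = Struct.new(:id, :pattern, :processor, :skip)
Pos  = Struct.new(:line, :offset)

class SketchLexer
  attr_reader :rules, :tokens, :pos

  def initialize(&block)
    @rules = []
    instance_eval(&block) if block
  end

  def skip(id, pattern)
    rules << Rule.new(id, pattern, nil, true)
  end

  def tok(id, pattern, processor = nil)
    rules << Rule.new(id, pattern, processor)
  end

  def lex(input)
    @tokens = []
    @pos = Pos.new(1, 0)
    scanner = StringScanner.new(input)

    until scanner.eos?
      rule = text = nil
      rules.each do |r|
        if (text = scanner.scan(r.pattern))
          rule = r
          break
        end
      end
      raise "unrecognized input at line #{pos.line}" unless rule
      value = rule.processor ? send(rule.processor, text) : text
      tokens << [rule.id, value, pos.line, pos.offset] unless rule.skip
      update_pos(text)
    end

    tokens << [:eos]
  end

  private

  # Track line/offset as documented for update_pos below.
  def update_pos(text)
    newlines = text.count("\n")
    pos.line += newlines
    if newlines > 0
      pos.offset = text.rpartition("\n")[2].length
    else
      pos.offset += text.length
    end
  end
end

expr = SketchLexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

p expr.lex("1 + 2")
# => [[:number, "1", 1, 0], [:operator, "+", 1, 2], [:number, "2", 1, 4], [:eos]]
```

Note the skipped whitespace still advances `offset`, so the `+` is reported at offset 2 and the `2` at offset 4.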


# File 'lib/minilex.rb', line 24

def initialize(&block)
  @rules = []
  instance_eval &block if block
end

Instance Attribute Details

#pos ⇒ Object (readonly)

Returns the value of attribute pos.



# File 'lib/minilex.rb', line 8

def pos
  @pos
end

#rules ⇒ Object (readonly)

Returns the value of attribute rules.



# File 'lib/minilex.rb', line 8

def rules
  @rules
end

#scanner ⇒ Object (readonly)

Returns the value of attribute scanner.



# File 'lib/minilex.rb', line 8

def scanner
  @scanner
end

#tokens ⇒ Object (readonly)

Returns the value of attribute tokens.



# File 'lib/minilex.rb', line 8

def tokens
  @tokens
end

Instance Method Details

#append_eos ⇒ Object

Makes the end-of-stream token

Similar to `append_token`, used to make the final token. Appends `[:eos]` to the `tokens` array.



# File 'lib/minilex.rb', line 90

def append_eos
  tokens << [:eos]
end

#append_token(id, value) ⇒ Object

Makes a token

id - the id of the matched rule
value - the value that was matched

Called when a rule is matched to build the resulting token.

Override this method if you’d like your tokens in a different form. You have access to the array of tokens via `tokens` and the current token’s position information via `pos`.

returns an Array of [id, value, line, offset]



# File 'lib/minilex.rb', line 82

def append_token(id, value)
  tokens << [id, value, pos.line, pos.offset]
end
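As the note above says, `append_token` can be overridden to change the token form. A minimal sketch, using a hypothetical stand-in base class rather than `Minilex::Lexer` itself (assumed here: `pos` is a Struct with `line` and `offset`):

```ruby
# Assumed shape for the position object.
Pos = Struct.new(:line, :offset)

# Stand-in exposing the documented tokens/pos readers and default append_token.
class BaseLexer
  attr_reader :tokens, :pos

  def initialize
    @tokens = []
    @pos = Pos.new(1, 0)
  end

  def append_token(id, value)
    tokens << [id, value, pos.line, pos.offset]
  end
end

class HashTokenLexer < BaseLexer
  # Emit Hash tokens instead of the default 4-element Arrays.
  def append_token(id, value)
    tokens << { id: id, value: value, line: pos.line, offset: pos.offset }
  end
end

lex = HashTokenLexer.new
lex.append_token(:number, "42")
p lex.tokens
# => [{:id=>:number, :value=>"42", :line=>1, :offset=>0}]
```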

#lex(input) ⇒ Object

Runs the lexer on the given input

returns an Array of tokens



# File 'lib/minilex.rb', line 52

def lex(input)
  @tokens = []
  @pos = Pos.new(1, 0)
  @scanner = StringScanner.new(input)

  until scanner.eos?
    rule, text = match
    value = rule.processor ? send(rule.processor, text) : text
    append_token(rule.id, value) unless rule.skip
    update_pos(text)
  end

  append_eos
  tokens
end

#match ⇒ Object

internal

Finds the matching rule

Tries the rules in definition order until there’s a match. Raises an UnrecognizedInput error if there isn’t one.

returns a 2-element Array of [rule, matched_text]

Raises:

(UnrecognizedInput)


# File 'lib/minilex.rb', line 101

def match
  rules.each do |rule|
    next unless text = scanner.scan(rule.pattern)
    return [rule, text]
  end
  raise UnrecognizedInput.new(scanner, pos)
end
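Because the rules are tried in definition order, rule order matters: an earlier rule can consume a prefix that a later rule would have matched whole. A self-contained sketch of the first-match-wins loop (the `match` helper here is a stand-in, not the gem's method):

```ruby
require 'strscan'

# Try (id, pattern) pairs in order; return the first rule that scans.
def match(rules, scanner)
  rules.each do |id, pattern|
    text = scanner.scan(pattern)
    return [id, text] if text
  end
  raise "unrecognized input"
end

rules = [[:keyword, /if/], [:ident, /[a-z]+/]]

p match(rules, StringScanner.new("iffy"))
# => [:keyword, "if"]   (the keyword rule fires first and splits "iffy")

p match(rules.reverse, StringScanner.new("iffy"))
# => [:ident, "iffy"]   (identifier rule first consumes the whole word)
```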

#skip(id, pattern) ⇒ Object

Defines patterns to ignore

id - an identifier; it’s nice to name things
pattern - the Regexp to skip



# File 'lib/minilex.rb', line 45

def skip(id, pattern)
  rules << Rule.new(id, pattern, nil, true)
end

#tok(id, pattern, processor = nil) ⇒ Object

Defines a token-matching rule

id - this token’s identifier
pattern - a Regexp to match this token
processor - a Symbol that references a method on this Lexer instance,
  which will be called to produce the `value` for this token
  (defaults to nil)


# File 'lib/minilex.rb', line 37

def tok(id, pattern, processor=nil)
  rules << Rule.new(id, pattern, processor)
end
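A processor is just a method name invoked via `send` with the matched text, as in `lex` above. A small sketch of that dispatch; `NumberLexer` and `to_number` are illustrative names, not part of the gem:

```ruby
class NumberLexer
  # A processor method: convert matched text to an Integer or Float.
  def to_number(text)
    text.include?(".") ? Float(text) : Integer(text)
  end

  # Mirrors the dispatch in lex: send to the processor, or keep raw text.
  def apply_processor(processor, text)
    processor ? send(processor, text) : text
  end
end

lex = NumberLexer.new
p lex.apply_processor(:to_number, "3.14")  # => 3.14 (a Float)
p lex.apply_processor(nil, "3.14")         # => "3.14" (raw matched text)
```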

#update_pos(text) ⇒ Object

internal

Updates the position information

text - the String that was matched by `match`

Inspects the matched text for newlines and updates the line number and offset accordingly.



# File 'lib/minilex.rb', line 115

def update_pos(text)
  pos.line += newlines = text.count(?\n)
  if newlines > 0
    pos.offset = text.rpartition(?\n)[2].length
  else
    pos.offset += text.length
  end
end
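The bookkeeping is easy to exercise on its own: with no newline the offset simply advances by the text length, while a match containing newlines resets the offset to the length of whatever follows the last newline. A standalone sketch (`Pos` as a Struct is an assumption):

```ruby
Pos = Struct.new(:line, :offset)

# Same logic as update_pos above, as a free function for demonstration.
def update_pos(pos, text)
  newlines = text.count("\n")
  pos.line += newlines
  if newlines > 0
    pos.offset = text.rpartition("\n")[2].length
  else
    pos.offset += text.length
  end
  pos
end

pos = Pos.new(1, 0)
update_pos(pos, "abc")    # same line: offset advances 0 -> 3
update_pos(pos, "\n\nxy") # two newlines: line 1 -> 3, offset = "xy".length
p pos
# => #<struct Pos line=3, offset=2>
```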