Class: Minilex::Lexer

Inherits:
Object
Defined in:
lib/minilex.rb

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(&block) ⇒ Lexer

Creates a Lexer instance

Expression = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

You don’t have to pass a block. This also works:

Expression = Minilex::Lexer.new
Expression.skip :whitespace, /\s+/
Expression.tok :number, /\d+(?:\.\d+)?/
Expression.tok :operator, /[\+\=\/\*]/
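Calling `lex` on the Expression lexer above returns an Array of `[id, value, line, offset]` tokens ending with `[:eos]`. The following self-contained sketch (stdlib `StringScanner` only) approximates the behavior documented on this page so the output can be seen without the gem installed; `Rule`, `Pos`, and `SketchLexer` are stand-ins here, not the gem's actual classes.

```ruby
require 'strscan'

# Assumed shapes for the gem's internal value objects.
Rule = Struct.new(:id, :pattern, :processor, :skip)
Pos  = Struct.new(:line, :offset)

class SketchLexer
  attr_reader :rules, :tokens, :pos

  def initialize(&block)
    @rules = []
    instance_eval(&block) if block
  end

  def skip(id, pattern)
    rules << Rule.new(id, pattern, nil, true)
  end

  def tok(id, pattern, processor = nil)
    rules << Rule.new(id, pattern, processor)
  end

  def lex(input)
    @tokens = []
    @pos = Pos.new(1, 0)
    scanner = StringScanner.new(input)

    until scanner.eos?
      rule = text = nil
      rules.each do |r|
        if (text = scanner.scan(r.pattern))
          rule = r
          break
        end
      end
      raise "unrecognized input at line #{pos.line}" unless rule
      value = rule.processor ? send(rule.processor, text) : text
      tokens << [rule.id, value, pos.line, pos.offset] unless rule.skip
      update_pos(text)
    end

    tokens << [:eos]
  end

  private

  # Track line/offset as documented for update_pos below.
  def update_pos(text)
    newlines = text.count("\n")
    pos.line += newlines
    if newlines > 0
      pos.offset = text.rpartition("\n")[2].length
    else
      pos.offset += text.length
    end
  end
end

expr = SketchLexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

p expr.lex("1 + 2")
# => [[:number, "1", 1, 0], [:operator, "+", 1, 2], [:number, "2", 1, 4], [:eos]]
```

Note the skipped whitespace still advances `offset`, so the `+` is reported at offset 2 and the `2` at offset 4.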


# File 'lib/minilex.rb', line 24

def initialize(&block)
  @rules = []
  instance_eval &block if block
end

Instance Attribute Details

#pos ⇒ Object (readonly)

Returns the value of attribute pos.



# File 'lib/minilex.rb', line 8

def pos
  @pos
end

#rules ⇒ Object (readonly)

Returns the value of attribute rules.



# File 'lib/minilex.rb', line 8

def rules
  @rules
end

#scanner ⇒ Object (readonly)

Returns the value of attribute scanner.



# File 'lib/minilex.rb', line 8

def scanner
  @scanner
end

#tokens ⇒ Object (readonly)

Returns the value of attribute tokens.



# File 'lib/minilex.rb', line 8

def tokens
  @tokens
end

Instance Method Details

#append_eos ⇒ Object

Makes the end-of-stream token

Similar to `append_token`, used to make the final token. Appends `[:eos]` to the `tokens` array.



# File 'lib/minilex.rb', line 90

def append_eos
  tokens << [:eos]
end

#append_token(id, value) ⇒ Object

Makes a token

id - the id of the matched rule
value - the value that was matched

Called when a rule is matched to build the resulting token.

Override this method if you’d like your tokens in a different form. You have access to the array of tokens via `tokens` and the current token’s position information via `pos`.

returns an Array of [id, value, line, offset]



# File 'lib/minilex.rb', line 82

def append_token(id, value)
  tokens << [id, value, pos.line, pos.offset]
end
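As the note above says, `append_token` can be overridden to change the token form. A minimal sketch, using a hypothetical stand-in base class rather than `Minilex::Lexer` itself (assumed here: `pos` is a Struct with `line` and `offset`):

```ruby
# Assumed shape for the position object.
Pos = Struct.new(:line, :offset)

# Stand-in exposing the documented tokens/pos readers and default append_token.
class BaseLexer
  attr_reader :tokens, :pos

  def initialize
    @tokens = []
    @pos = Pos.new(1, 0)
  end

  def append_token(id, value)
    tokens << [id, value, pos.line, pos.offset]
  end
end

class HashTokenLexer < BaseLexer
  # Emit Hash tokens instead of the default 4-element Arrays.
  def append_token(id, value)
    tokens << { id: id, value: value, line: pos.line, offset: pos.offset }
  end
end

lex = HashTokenLexer.new
lex.append_token(:number, "42")
p lex.tokens
# => [{:id=>:number, :value=>"42", :line=>1, :offset=>0}]
```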

#lex(input) ⇒ Object

Runs the lexer on the given input

returns an Array of tokens



# File 'lib/minilex.rb', line 52

def lex(input)
  @tokens = []
  @pos = Pos.new(1, 0)
  @scanner = StringScanner.new(input)

  until scanner.eos?
    rule, text = match
    value = rule.processor ? send(rule.processor, text) : text
    append_token(rule.id, value) unless rule.skip
    update_pos(text)
  end

  append_eos
  tokens
end

#match ⇒ Object

internal

Finds the matching rule

Tries the rules in definition order until there’s a match. Raises an UnrecognizedInput error if there isn’t one.

returns a 2-element Array of [rule, matched_text]

Raises:

(UnrecognizedInput)


# File 'lib/minilex.rb', line 101

def match
  rules.each do |rule|
    next unless text = scanner.scan(rule.pattern)
    return [rule, text]
  end
  raise UnrecognizedInput.new(scanner, pos)
end
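Because the rules are tried in definition order, rule order matters: an earlier rule can consume a prefix that a later rule would have matched whole. A self-contained sketch of the first-match-wins loop (the `match` helper here is a stand-in, not the gem's method):

```ruby
require 'strscan'

# Try (id, pattern) pairs in order; return the first rule that scans.
def match(rules, scanner)
  rules.each do |id, pattern|
    text = scanner.scan(pattern)
    return [id, text] if text
  end
  raise "unrecognized input"
end

rules = [[:keyword, /if/], [:ident, /[a-z]+/]]

p match(rules, StringScanner.new("iffy"))
# => [:keyword, "if"]   (the keyword rule fires first and splits "iffy")

p match(rules.reverse, StringScanner.new("iffy"))
# => [:ident, "iffy"]   (identifier rule first consumes the whole word)
```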

#skip(id, pattern) ⇒ Object

Defines patterns to ignore

id - an identifier; it’s nice to name things
pattern - the Regexp to skip



# File 'lib/minilex.rb', line 45

def skip(id, pattern)
  rules << Rule.new(id, pattern, nil, true)
end

#tok(id, pattern, processor = nil) ⇒ Object

Defines a token-matching rule

id - this token’s identifier
pattern - a Regexp to match this token
processor - a Symbol that references a method on this Lexer instance,
  which will be called to produce the `value` for this token
  (defaults to nil)


# File 'lib/minilex.rb', line 37

def tok(id, pattern, processor=nil)
  rules << Rule.new(id, pattern, processor)
end
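A processor is just a method name invoked via `send` with the matched text, as in `lex` above. A small sketch of that dispatch; `NumberLexer` and `to_number` are illustrative names, not part of the gem:

```ruby
class NumberLexer
  # A processor method: convert matched text to an Integer or Float.
  def to_number(text)
    text.include?(".") ? Float(text) : Integer(text)
  end

  # Mirrors the dispatch in lex: send to the processor, or keep raw text.
  def apply_processor(processor, text)
    processor ? send(processor, text) : text
  end
end

lex = NumberLexer.new
p lex.apply_processor(:to_number, "3.14")  # => 3.14 (a Float)
p lex.apply_processor(nil, "3.14")         # => "3.14" (raw matched text)
```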

#update_pos(text) ⇒ Object

internal

Updates the position information

text - the String that was matched by `match`

Inspects the matched text for newlines and updates the line number and offset accordingly.



# File 'lib/minilex.rb', line 115

def update_pos(text)
  pos.line += newlines = text.count(?\n)
  if newlines > 0
    pos.offset = text.rpartition(?\n)[2].length
  else
    pos.offset += text.length
  end
end
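The bookkeeping is easy to exercise on its own: with no newline the offset simply advances by the text length, while a match containing newlines resets the offset to the length of whatever follows the last newline. A standalone sketch (`Pos` as a Struct is an assumption):

```ruby
Pos = Struct.new(:line, :offset)

# Same logic as update_pos above, as a free function for demonstration.
def update_pos(pos, text)
  newlines = text.count("\n")
  pos.line += newlines
  if newlines > 0
    pos.offset = text.rpartition("\n")[2].length
  else
    pos.offset += text.length
  end
  pos
end

pos = Pos.new(1, 0)
update_pos(pos, "abc")    # same line: offset advances 0 -> 3
update_pos(pos, "\n\nxy") # two newlines: line 1 -> 3, offset = "xy".length
p pos
# => #<struct Pos line=3, offset=2>
```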