Module: Regexp::Lexer

Defined in:
lib/regexp_parser/lexer.rb

Overview

A very thin wrapper around the scanner that breaks quantified literal runs, collects the emitted tokens into an array, calculates their nesting depth, normalizes tokens for the parser, and checks whether they are implemented by the given syntax flavor.
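
For example (a minimal sketch; the output shown is my reading of the code below, with tokens mapped to type/token/text triples):

require 'regexp_parser'

tokens = Regexp::Lexer.scan(/ab+c/)
tokens.map { |t| [t.type, t.token, t.text] }
# => [[:literal,    :literal,     "a"],
#     [:literal,    :literal,     "b"],
#     [:quantifier, :one_or_more, "+"],
#     [:literal,    :literal,     "c"]]

Note how the literal run "ab" is broken in two so that the quantifier applies only to "b".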

Constant Summary

OPENING_TOKENS =
  [:capture, :options, :passive, :atomic, :named,
   :lookahead, :nlookahead, :lookbehind, :nlookbehind].freeze

CLOSING_TOKENS =
  [:close].freeze

Class Method Summary

Class Method Details

.ascend(type, token) ⇒ Object



# File 'lib/regexp_parser/lexer.rb', line 51

def self.ascend(type, token)
  # a closing token ends a group or assertion: step out one level
  if type == :group or type == :assertion
    @nesting -= 1 if CLOSING_TOKENS.include?(token)
  end

  # a closing bracket ends a character set (or a subset within one)
  if type == :set or type == :subset
    @set_nesting -= 1 if token == :close
  end
end

.break_literal(token) ⇒ Object

Called by scan to break a literal run that is longer than one character into two separate tokens when it is followed by a quantifier, so that the quantifier applies only to the last character of the run.



# File 'lib/regexp_parser/lexer.rb', line 73

def self.break_literal(token)
  text = token.text
  if text.scan(/./mu).length > 1   # the run is longer than one character
    lead = text.sub(/.\z/mu, "")   # everything but the last character
    last = text[/.\z/mu] || ''     # the last character

    # the scanner's offsets are byte-based on 1.9+
    if RUBY_VERSION >= '1.9'
      lead_length = lead.bytesize
      last_length = last.bytesize
    else
      lead_length = lead.length
      last_length = last.length
    end

    # replace the run with two literal tokens whose offsets still
    # line up with the original text
    @tokens.pop
    @tokens << Regexp::Token.new(:literal, :literal, lead, token.ts,
                                 (token.te - last_length), @nesting, @set_nesting)

    @tokens << Regexp::Token.new(:literal, :literal, last,
                                 (token.ts + lead_length),
                                 token.te, @nesting, @set_nesting)
  end
end
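
For instance, lexing 'ab+' should split the two-character run so that the offsets of the resulting literals still line up with the original text (a sketch; the values are my reading of the code above):

Regexp::Lexer.scan('ab+').map { |t| [t.token, t.text, t.ts, t.te] }
# => [[:literal,     "a", 0, 1],
#     [:literal,     "b", 1, 2],
#     [:one_or_more, "+", 2, 3]]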

.descend(type, token) ⇒ Object



# File 'lib/regexp_parser/lexer.rb', line 61

def self.descend(type, token)
  # an opening token starts a group or assertion: step in one level
  if type == :group or type == :assertion
    @nesting += 1 if OPENING_TOKENS.include?(token)
  end

  # an opening bracket starts a character set (or a subset within one)
  if type == :set or type == :subset
    @set_nesting += 1 if token == :open
  end
end
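
Together, descend and ascend assign each token its nesting depth. Since ascend runs before a token is built and descend after, both the opening and the closing token of a group sit at the outer depth (a sketch of the expected level values):

Regexp::Lexer.scan(/(ab)c/).map { |t| [t.token, t.text, t.level] }
# => [[:capture, "(",  0],
#     [:literal, "ab", 1],
#     [:close,   ")",  0],
#     [:literal, "c",  0]]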

.merge_literal(current) ⇒ Object

Called by scan to merge two consecutive literals. This happens when tokens get normalized (as in the case of posix/bre) and end up becoming literals.



# File 'lib/regexp_parser/lexer.rb', line 99

def self.merge_literal(current)
  last = @tokens.pop
  # the merged token is the value of this assignment and thus the
  # method's return value; scan uses it to replace `current`
  replace = Regexp::Token.new(:literal, :literal, last.text + current.text,
                              last.ts, current.te, @nesting, @set_nesting)
end
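
A hedged sketch of when this fires, assuming a 'posix/bre' syntax name is registered (as the description above suggests) and that it normalizes '+' to a literal, since BRE treats it as an ordinary character:

# assumes the 'posix/bre' flavor normalizes '+' to a literal
Regexp::Lexer.scan('a+', 'posix/bre').map { |t| [t.token, t.text] }
# => [[:literal, "a+"]]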

.scan(input, syntax = "ruby/#{RUBY_VERSION}", &block) ⇒ Object



# File 'lib/regexp_parser/lexer.rb', line 13

def self.scan(input, syntax = "ruby/#{RUBY_VERSION}", &block)
  syntax = Regexp::Syntax.new(syntax)

  @tokens = []
  @nesting, @set_nesting = 0, 0

  last = nil
  Regexp::Scanner.scan(input) do |type, token, text, ts, te|
    # map the raw token onto the given syntax flavor, and raise if
    # the flavor does not implement it
    type, token = *syntax.normalize(type, token)
    syntax.check! type, token

    ascend(type, token)

    # a quantifier binds only to the last character of a literal run,
    # so split the run before emitting the quantifier
    break_literal(last) if type == :quantifier and
      last and last.type == :literal

    current = Regexp::Token.new(type, token, text, ts, te,
                                @nesting, @set_nesting)

    # normalization can produce consecutive literals; fold them into one
    current = merge_literal(current) if type == :literal and
      last and last.type == :literal

    # link the tokens into a doubly-linked list
    last.next(current) if last
    current.previous(last) if last

    @tokens << current
    last = current

    descend(type, token)
  end

  if block_given?
    @tokens.each {|t| block.call(t)}
  else
    @tokens
  end
end
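
A usage sketch of the two calling conventions: without a block, scan returns the token array; with a block, it yields each token in order:

# array form
tokens = Regexp::Lexer.scan(/a+/)

# block form
Regexp::Lexer.scan(/a+/) do |token|
  puts [token.type, token.token, token.text].inspect
end
# [:literal, :literal, "a"]
# [:quantifier, :one_or_more, "+"]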