Class: Rouge::RegexLexer Abstract

Inherits:
Lexer
  • Object
Defined in:
lib/rouge/regex_lexer.rb

Overview

This class is abstract.

A stateful lexer that uses sets of regular expressions to tokenize a string. Most lexers are subclasses of RegexLexer.

Defined Under Namespace

Classes: Rule, State, StateDSL

Constant Summary

MAX_NULL_SCANS = 5

The number of successive scans permitted without consuming the input stream. If this is exceeded, the match fails.

Methods inherited from Lexer

aliases, all, analyze_text, assert_utf8!, #debug, default_options, demo, demo_file, desc, filenames, find, find_fancy, guess, guess_by_filename, guess_by_mimetype, guess_by_source, guesses, #initialize, #lex, lex, mimetypes, #option, #options, tag, #tag

Constructor Details

This class inherits a constructor from Rouge::Lexer

Class Method Details

.get_state(name) ⇒ Object


# File 'lib/rouge/regex_lexer.rb', line 136

def self.get_state(name)
  return name if name.is_a? State

  state = states[name.to_s]
  raise "unknown state: #{name}" unless state
  state.load!(self)
end
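
The lookup above accepts either a state object or a name, and normalizes names to strings before consulting the states hash. A minimal stand-alone sketch of that behavior, assuming hypothetical `NamedState` and `LookupDemo` stand-ins rather than Rouge's own classes (the lazy `load!` step is left out):

```ruby
# Hypothetical stand-ins; Rouge's State also holds rules and is loaded lazily.
NamedState = Struct.new(:name)

class LookupDemo
  def self.states
    @states ||= {}
  end

  def self.state(name)
    states[name.to_s] = NamedState.new(name.to_s)
  end

  # Accept either a state object or a name (symbol or string).
  def self.get_state(name)
    return name if name.is_a?(NamedState)

    states[name.to_s] or raise "unknown state: #{name}"
  end

  state :root
end

root        = LookupDemo.get_state(:root)
same        = LookupDemo.get_state('root')   # symbols and strings are equivalent
passthrough = LookupDemo.get_state(root)     # a state object is returned as-is
```

Because names are stored as strings, `:root` and `'root'` resolve to the same object, and an unknown name raises instead of returning nil.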

.start(&b) ⇒ Object

Specify an action to be run at the start of every fresh lex.

Examples:

start { puts "I'm lexing a new string!" }

# File 'lib/rouge/regex_lexer.rb', line 124

def self.start(&b)
  start_procs << b
end

.start_procs ⇒ Object

The routines to run at the beginning of a fresh lex.

# File 'lib/rouge/regex_lexer.rb', line 115

def self.start_procs
  @start_procs ||= InheritableList.new(superclass.start_procs)
end

.state(name, &b) ⇒ Object

Define a new state for this lexer with the given name. The block will be evaluated in the context of a StateDSL.


# File 'lib/rouge/regex_lexer.rb', line 130

def self.state(name, &b)
  name = name.to_s
  states[name] = State.new(name, &b)
end

.states ⇒ Object

The states hash for this lexer.

# File 'lib/rouge/regex_lexer.rb', line 109

def self.states
  @states ||= {}
end

Instance Method Details

#delegate(lexer, text = nil) ⇒ Object

Delegate the lex to another lexer. The #lex method will be called with `:continue` set to true, so that #reset! will not be called. In this way, a single lexer can be repeatedly delegated to while maintaining its own internal state stack.

Parameters:

  • lexer (#lex)

    The lexer or lexer class to delegate to

  • text (String) (defaults to: nil)

    The text to delegate. This defaults to the last matched string.


# File 'lib/rouge/regex_lexer.rb', line 300

def delegate(lexer, text=nil)
  debug { "    delegating to #{lexer.inspect}" }
  text ||= @last_match[0]

  lexer.lex(text, :continue => true) do |tok, val|
    debug { "    delegated token: #{tok.inspect}, #{val.inspect}" }
    token(tok, val)
  end
end
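
To illustrate the delegation pattern, here is a small stand-alone sketch. `WordLexer` and `OuterDemo` are hypothetical names, and the inner object only needs to respond to `lex(text, opts)` and yield `[token, value]` pairs; it does not model Rouge's actual `#lex` or `#reset!`.

```ruby
# Hypothetical inner lexer: yields [token, value] pairs for words and whitespace.
class WordLexer
  attr_reader :seen_options

  def lex(text, opts = {})
    @seen_options = opts   # a real lexer would skip its reset when :continue is set
    text.split(/(\s+)/).each do |piece|
      next if piece.empty?

      yield(piece.strip.empty? ? :Text : :Name, piece)
    end
  end
end

# Hypothetical outer lexer: re-emits every delegated token into its own output.
class OuterDemo
  attr_reader :out

  def initialize
    @out = []
  end

  def delegate(lexer, text)
    lexer.lex(text, :continue => true) { |tok, val| @out << [tok, val] }
  end
end

inner = WordLexer.new
outer = OuterDemo.new
outer.delegate(inner, 'a  b')
outer.delegate(inner, 'c')   # the same inner instance is reused across calls
```

The outer object never inspects the inner lexer's state; it only forwards tokens, which is why one inner instance can be delegated to repeatedly.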

#get_state(state_name) ⇒ Object


# File 'lib/rouge/regex_lexer.rb', line 145

def get_state(state_name)
  self.class.get_state(state_name)
end

#group(tok) ⇒ Object

Yield a token with the next matched group. Subsequent calls to this method will yield subsequent groups.


# File 'lib/rouge/regex_lexer.rb', line 287

def group(tok)
  token(tok, @last_match[@group_count += 1])
end
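
The group counter can be sketched in isolation with a `StringScanner`. `GroupDemo` below is a hypothetical collector, not a Rouge class; it resets the counter once per match, as #run_callback does, and then hands out capture groups in order.

```ruby
require 'strscan'

# Hypothetical collector that mimics the per-match group counter.
class GroupDemo
  attr_reader :tokens

  def initialize
    @tokens = []
  end

  def match(scanner, regex)
    scanner.scan(regex) or return false
    @group_count = 0        # reset for every new match
    @last_match = scanner
    yield self
    true
  end

  # Each call emits the next capture group of the last match.
  def group(tok)
    @tokens << [tok, @last_match[@group_count += 1]]
  end
end

demo = GroupDemo.new
demo.match(StringScanner.new('name = 42'), /(\w+)(\s*=\s*)(\d+)/) do |d|
  d.group :Name
  d.group :Operator
  d.group :Number
end
```

The order of the `group` calls has to follow the order of the capture groups in the regular expression, since the counter only moves forward.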

#in_state?(state_name) ⇒ Boolean

Check if `state_name` is in the state stack.

Returns:

  • (Boolean)

# File 'lib/rouge/regex_lexer.rb', line 347

def in_state?(state_name)
  stack.map(&:name).include? state_name.to_s
end

#pop!(times = 1) ⇒ Object

Pop the state stack. If a number is passed in, it will be popped that number of times.


# File 'lib/rouge/regex_lexer.rb', line 329

def pop!(times=1)
  raise 'empty stack!' if stack.empty?

  debug { "    popping stack: #{times}" }

  stack.pop(times)

  nil
end

#push(state_name = nil, &b) ⇒ Object

Push a state onto the stack. If no state name is given and you've passed a block, a state will be dynamically created using the StateDSL. If neither a name nor a block is given, the state currently on top of the stack is pushed again.


# File 'lib/rouge/regex_lexer.rb', line 313

def push(state_name=nil, &b)
  push_state = if state_name
    get_state(state_name)
  elsif block_given?
    State.new(b.inspect, &b).load!(self.class)
  else
    # use the top of the stack by default
    self.state
  end

  debug { "    pushing #{push_state.name}" }
  stack.push(push_state)
end
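
The stack operations (#push, #pop!, #in_state? and #state?) can be modeled with a plain array. The following is a simplified sketch under that assumption: `StackDemo` is a hypothetical class, states are represented by their names only, and the block form of #push is omitted.

```ruby
# Hypothetical model of the state stack; states are plain strings here.
class StackDemo
  def stack
    @stack ||= ['root']
  end

  def state
    stack.last or raise 'empty stack!'
  end

  def push(state_name = nil)
    # With no name, the current top state is pushed again.
    stack.push(state_name ? state_name.to_s : state)
  end

  def pop!(times = 1)
    raise 'empty stack!' if stack.empty?

    stack.pop(times)
    nil
  end

  def in_state?(name)
    stack.include?(name.to_s)
  end

  def state?(name)
    name.to_s == state
  end
end

demo = StackDemo.new
demo.push :string
demo.push                     # duplicates the top: root, string, string
after_push = demo.stack.dup
demo.pop!(2)                  # removes both string entries at once
```

Pushing the current state again is useful for nested constructs, where each closing delimiter pops exactly one level.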

#reset! ⇒ Object

Reset this lexer to its initial state. This runs all of the start_procs.


# File 'lib/rouge/regex_lexer.rb', line 166

def reset!
  @stack = nil

  self.class.start_procs.each do |pr|
    instance_eval(&pr)
  end
end
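
As a rough illustration of how start procs interact with a reset, the sketch below registers two blocks on a hypothetical `CountingLexer` class and runs them with `instance_eval`. It uses a plain array for the proc list, so inheritance from a superclass is not modeled.

```ruby
# Hypothetical stand-in for a lexer class; only the start-proc mechanism is shown.
class CountingLexer
  def self.start_procs
    @start_procs ||= []
  end

  def self.start(&block)
    start_procs << block
  end

  attr_reader :depth, :resets

  # Registered once when the class is defined, run on every reset.
  start { @depth = 0 }
  start { @resets = (@resets || 0) + 1 }

  def reset!
    @stack = nil
    # instance_eval makes the blocks see this instance's variables.
    self.class.start_procs.each { |pr| instance_eval(&pr) }
  end

  def nest!
    @depth += 1
  end
end

lexer = CountingLexer.new
lexer.reset!
lexer.nest!
lexer.nest!
depth_before = lexer.depth   # 2
lexer.reset!                 # start procs run again and clear the depth
```

Because the blocks are evaluated in the context of the instance, they are a natural place to initialize per-lex instance variables.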

#reset_stack ⇒ Object

Reset the stack back to `[:root]`.


# File 'lib/rouge/regex_lexer.rb', line 340

def reset_stack
  debug { '    resetting stack' }
  stack.clear
  stack.push get_state(:root)
end

#run_callback(stream, callback, &output_stream) ⇒ Object


# File 'lib/rouge/regex_lexer.rb', line 231

def run_callback(stream, callback, &output_stream)
  with_output_stream(output_stream) do
    @group_count = 0
    @last_match = stream
    instance_exec(stream, &callback)
    @last_match = nil
  end
end

#run_rule(rule, scanner, &b) ⇒ Object


# File 'lib/rouge/regex_lexer.rb', line 245

def run_rule(rule, scanner, &b)
  # XXX HACK XXX
  # StringScanner's implementation of ^ is b0rken.
  # see http://bugs.ruby-lang.org/issues/7092
  # TODO: this doesn't cover cases like /(a|^b)/, but it's
  # the most common, for now...
  return false if rule.beginning_of_line? && !scanner.beginning_of_line?

  if (@null_steps ||= 0) >= MAX_NULL_SCANS
    debug { "    too many scans without consuming the string!" }
    return false
  end

  scanner.scan(rule.re) or return false

  if scanner.matched_size.zero?
    @null_steps += 1
  else
    @null_steps = 0
  end

  true
end
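
The null-scan guard in the method above can be tried out on its own. This sketch assumes a hypothetical helper, `count_null_scans`, that repeatedly applies a regex able to match the empty string and counts how many scans are allowed before the guard refuses; the limit defaults to 5 to mirror MAX_NULL_SCANS.

```ruby
require 'strscan'

# Hypothetical helper: counts successful scans until the guard trips.
def count_null_scans(scanner, regex, attempts, max_null_scans: 5)
  null_steps = 0
  successes = 0

  attempts.times do
    break if null_steps >= max_null_scans   # too many scans without progress
    next unless scanner.scan(regex)

    if scanner.matched_size.zero?
      null_steps += 1                       # matched, but consumed nothing
    else
      null_steps = 0                        # progress resets the counter
    end
    successes += 1
  end

  successes
end

scanner = StringScanner.new('abc')
allowed = count_null_scans(scanner, /x*/, 20)   # /x*/ matches '' at every position
```

Without such a limit, a rule that matches the empty string and never changes state would loop forever, since the scanner position would not advance.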

#stack ⇒ Object

The state stack. This is initially the single state `[:root]`. It is an error for this stack to be empty.

# File 'lib/rouge/regex_lexer.rb', line 152

def stack
  @stack ||= [get_state(:root)]
end

#state ⇒ Object

The current state, i.e. the one on top of the state stack.

NB: if the state stack is empty, this will raise an error rather than returning nil.


# File 'lib/rouge/regex_lexer.rb', line 160

def state
  stack.last or raise 'empty stack!'
end

#state?(state_name) ⇒ Boolean

Check if `state_name` is the state on top of the state stack.

Returns:

  • (Boolean)

# File 'lib/rouge/regex_lexer.rb', line 352

def state?(state_name)
  state_name.to_s == state.name
end

#step(state, stream, &b) ⇒ Object

Runs one step of the lex. Rules in the current state are tried until one matches, at which point its callback is called.

Returns:

  • true if a rule matched and its callback was run

  • false otherwise.


# File 'lib/rouge/regex_lexer.rb', line 207

def step(state, stream, &b)
  state.rules.each do |rule|
    case rule
    when State
      debug { "  entering mixin #{rule.name}" }
      return true if step(rule, stream, &b)
      debug { "  exiting  mixin #{rule.name}" }
    when Rule
      debug { "  trying #{rule.inspect}" }

      if run_rule(rule, stream)
        debug { "    got #{stream[0].inspect}" }

        run_callback(stream, rule.callback, &b)

        return true
      end
    end
  end

  false
end
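
The mixin handling in this method is easier to follow in a reduced form. The sketch below uses hypothetical `StepRule` and `StepState` structs (Rouge's Rule and State carry more than this) and appends tokens to an array instead of running callbacks.

```ruby
require 'strscan'

# Hypothetical miniature types for illustration only.
StepRule  = Struct.new(:re, :token)
StepState = Struct.new(:name, :rules)

def mini_step(state, scanner, out)
  state.rules.each do |rule|
    case rule
    when StepState
      # A state nested in the rule list acts as a mixin: try its rules in place.
      return true if mini_step(rule, scanner, out)
    when StepRule
      if scanner.scan(rule.re)
        out << [rule.token, scanner.matched]
        return true
      end
    end
  end

  false
end

whitespace = StepState.new('whitespace', [StepRule.new(/\s+/, :Text)])
root       = StepState.new('root', [whitespace, StepRule.new(/\d+/, :Number)])

out     = []
scanner = StringScanner.new('  7!')
results = 3.times.map { mini_step(root, scanner, out) }
```

The third step returns false because neither the mixed-in whitespace rule nor the number rule matches `!`; in the real lexer that is the case where the caller falls back to an Error token.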

#stream_tokens(str, &b) ⇒ Object

This implements the lexer protocol, by yielding [token, value] pairs.

The process for lexing works as follows, until the stream is empty:

  1. We look at the state on top of the stack (which by default is `:root`).

  2. Each rule in that state is tried until one is successful. If one is found, that rule's callback is evaluated, which may yield tokens and manipulate the state stack. Otherwise, one character is consumed with an `'Error'` token. In either case, we continue at (1).


# File 'lib/rouge/regex_lexer.rb', line 186

def stream_tokens(str, &b)
  stream = StringScanner.new(str)

  until stream.eos?
    debug { "lexer: #{self.class.tag}" }
    debug { "stack: #{stack.map(&:name).inspect}" }
    debug { "stream: #{stream.peek(20).inspect}" }
    success = step(get_state(state), stream, &b)

    if !success
      debug { "    no match, yielding Error" }
      b.call(Token['Error'], stream.getch)
    end
  end
end
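
The two-step process described above can be sketched as a small stand-alone loop. This is a simplified model, not Rouge's implementation: `mini_lex` and its rule table are hypothetical, each rule is a `[regex, token, stack action]` triple, and callbacks are reduced to a fixed push or pop.

```ruby
require 'strscan'

# Hypothetical, simplified lexing loop over a two-state rule table.
def mini_lex(str)
  rules = {
    'root'   => [[/\s+/, :Text, nil], [/\w+/, :Name, nil], [/"/, :Str, :push_string]],
    'string' => [[/[^"]+/, :Str, nil], [/"/, :Str, :pop]]
  }
  stack   = ['root']
  scanner = StringScanner.new(str)
  tokens  = []

  until scanner.eos?
    # (1) take the state on top of the stack and try its rules in order
    rule = rules.fetch(stack.last).find { |re, _, _| scanner.scan(re) }

    if rule
      # (2) a rule matched: emit its token and apply its stack action
      _, tok, action = rule
      tokens << [tok, scanner.matched]
      stack.push('string') if action == :push_string
      stack.pop if action == :pop
    else
      # no rule matched: consume one character as an Error token
      tokens << [:Error, scanner.getch]
    end
  end

  tokens
end

tokens = mini_lex('a "b c" #')
```

Consuming a single character on failure guarantees that the loop always makes progress, so unexpected input produces Error tokens instead of hanging the lexer.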

#token(tok, val = :__absent__) ⇒ Object

Yield a token.

Parameters:

  • tok

    the token type

  • val (defaults to: :__absent__)

    (optional) the string value to yield. If absent, this defaults to the entire last match.


# File 'lib/rouge/regex_lexer.rb', line 276

def token(tok, val=:__absent__)
  val = @last_match[0] if val == :__absent__
  val ||= ''

  raise 'no output stream' unless @output_stream

  @output_stream << [Token[tok], val] unless val.empty?
end
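
The defaulting rules of this method (fall back to the whole match, treat nil as an empty string, drop empty values) can be checked with a reduced model. `TokenDemo` is a hypothetical class, and a plain array stands in for the last match.

```ruby
# Hypothetical emitter; @last_match is a plain array standing in for the scanner.
class TokenDemo
  attr_reader :out

  def initialize(last_match)
    @last_match = last_match
    @out = []
  end

  def token(tok, val = :__absent__)
    val = @last_match[0] if val == :__absent__
    val ||= ''
    @out << [tok, val] unless val.empty?
  end
end

demo = TokenDemo.new(['foo(', 'foo', '('])
demo.token :Name                # no value given: falls back to the whole match
demo.token :Punctuation, '('
demo.token :Text, nil           # nil becomes '' and is dropped
demo.token :Text, ''            # empty values are dropped as well
```

The `:__absent__` sentinel is what lets an explicit nil (for example, an optional capture group that did not participate in the match) be told apart from an omitted argument.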