Class: Rouge::RegexLexer Abstract

Inherits:
Lexer
  • Object
Defined in:
lib/rouge/regex_lexer.rb

Overview

This class is abstract.

A stateful lexer that uses sets of regular expressions to tokenize a string. Most lexers are subclasses of RegexLexer.

Defined Under Namespace

Classes: Rule, State, StateDSL

Constant Summary

MAX_NULL_SCANS = 5

The number of successive scans permitted without consuming the input stream. If this is exceeded, the match fails.

Constants included from Token::Tokens

Token::Tokens::Num, Token::Tokens::Str

Methods inherited from Lexer

aliases, all, analyze_text, assert_utf8!, #debug, default_options, demo, demo_file, desc, filenames, find, find_fancy, guess, guess_by_filename, guess_by_mimetype, guess_by_source, guesses, #initialize, lex, #lex, mimetypes, #option, #options, #tag, tag

Methods included from Token::Tokens

token

Constructor Details

This class inherits a constructor from Rouge::Lexer

Class Method Details

.append(name, &b) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 173

def self.append(name, &b)
  name = name.to_s
  dsl = state_definitions[name] or raise "no such state #{name.inspect}"
  replace_state(name, dsl.appended(&b))
end
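StateDSL#appended is not documented on this page. As a rough sketch, assuming a state definition can be modeled as an ordered list of rule-adding blocks, `append` installs a new definition with the extra block evaluated last and leaves the original definition untouched. `StateDefSketch` and its methods are hypothetical names, not Rouge's API.

```ruby
# Hypothetical model of a state definition: an immutable, ordered list of
# blocks, each of which adds rules to a collector when evaluated.
class StateDefSketch
  attr_reader :blocks

  def initialize(blocks)
    @blocks = blocks.freeze
  end

  # Assumed analogue of StateDSL#appended: returns a new definition whose
  # extra block runs after the existing ones; the receiver is not modified.
  def appended(&b)
    StateDefSketch.new(blocks + [b])
  end

  # Evaluate every block in order and return the collected rules.
  def rules
    collected = []
    blocks.each { |blk| blk.call(collected) }
    collected
  end
end

definitions = { "root" => StateDefSketch.new([proc { |r| r << :keyword }]) }
original = definitions["root"]

# Mirrors replace_state(name, dsl.appended(&b)).
definitions["root"] = original.appended { |r| r << :fallback }

p definitions["root"].rules # prints [:keyword, :fallback]
p original.rules            # prints [:keyword]
```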

.get_state(name) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 180

def self.get_state(name)
  return name if name.is_a? State

  name = name.to_s

  states[name] ||= begin
    defn = state_definitions[name] or raise "unknown state: #{name.inspect}"
    defn.to_state(self)
  end
end
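`get_state` compiles a state definition only on its first lookup and caches the result in `states`. If it is passed an already-compiled `State`, it returns that object unchanged. The sketch below mimics this shape with a plain Struct and an in-memory definitions table. `MiniLexer`, `DEFINITIONS`, and the `dup` standing in for `to_state` are assumptions made for illustration.

```ruby
# Stand-in for RegexLexer::State (hypothetical: just a name and its rules).
State = Struct.new(:name, :rules)

class MiniLexer
  # Hypothetical definitions table; in Rouge this role is played by
  # state_definitions, whose values are StateDSL objects.
  DEFINITIONS = { "root" => [:whitespace, :word] }.freeze

  def self.states
    @states ||= {}
  end

  def self.get_state(name)
    return name if name.is_a?(State) # already compiled: pass it through

    name = name.to_s
    states[name] ||= begin
      defn = DEFINITIONS[name] or raise "unknown state: #{name.inspect}"
      State.new(name, defn.dup) # stands in for defn.to_state(self)
    end
  end
end

root = MiniLexer.get_state(:root)
p root.equal?(MiniLexer.get_state("root")) # prints true (same cached object)
p MiniLexer.get_state(root).equal?(root)   # prints true (State passes through)
```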

.prepend(name, &b) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 167

def self.prepend(name, &b)
  name = name.to_s
  dsl = state_definitions[name] or raise "no such state #{name.inspect}"
  replace_state(name, dsl.prepended(&b))
end
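Because `#step` tries a state's rules in order and stops at the first match, the position of added rules matters. Prepending lets new rules take precedence over existing ones, for example a keyword rule placed ahead of a general identifier rule. The helper and token symbols below are hypothetical, and they use plain regexes instead of Rouge's `Rule` objects.

```ruby
require 'strscan'

# Hypothetical first-match-wins lookup over [regex, token] pairs,
# mirroring how step stops at the first rule that matches.
def first_match(rules, input)
  scanner = StringScanner.new(input)
  rules.each do |re, tok|
    return [tok, scanner.matched] if scanner.scan(re)
  end
  nil
end

base_rules = [[/[a-z]+/, :Name]]
prepended  = [[/end\b/, :Keyword]] + base_rules # new rule goes first
appended   = base_rules + [[/end\b/, :Keyword]] # new rule goes last

p first_match(prepended, "end") # prints [:Keyword, "end"]
p first_match(appended, "end")  # prints [:Name, "end"]
```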

.replace_state(name, new_defn) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 140

def self.replace_state(name, new_defn)
  states[name] = nil
  state_definitions[name] = new_defn
end
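`replace_state` sets the cached entry in `states` to `nil` instead of deleting it. That is enough because `get_state` uses `||=`, which treats a `nil` entry as missing and recompiles it on the next lookup. The following is a minimal sketch of that invalidation. The class name is hypothetical, and an array stands in for a compiled state.

```ruby
# Hypothetical cache mirroring states / state_definitions / get_state.
class StateCache
  attr_reader :compile_count

  def initialize(definitions)
    @definitions = definitions
    @states = {}
    @compile_count = 0
  end

  def get_state(name)
    @states[name] ||= begin
      @compile_count += 1          # count how often we "compile"
      @definitions.fetch(name).dup # stands in for defn.to_state(...)
    end
  end

  def replace_state(name, new_defn)
    @states[name] = nil # a nil entry makes the next ||= recompile
    @definitions[name] = new_defn
  end
end

cache = StateCache.new("root" => [:old_rule])
cache.get_state("root")
cache.get_state("root") # served from the cache
cache.replace_state("root", [:new_rule])
p cache.get_state("root") # prints [:new_rule]
p cache.compile_count     # prints 2
```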

.start(&b) ⇒ Object

Specify an action to be run at the start of every fresh lex.

Examples:

start { puts "I'm lexing a new string!" }

# File 'lib/rouge/regex_lexer.rb', line 156

def self.start(&b)
  start_procs << b
end
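Start blocks are stored on the class and later run with `instance_eval` (see `#reset!`). Inside a block, `self` is therefore the lexer instance, and any instance variables the block sets belong to that instance. Below is a stripped-down sketch of that mechanism. `StartDemo`, `#enter`, and `@depth` are hypothetical.

```ruby
class StartDemo
  def self.start_procs
    @start_procs ||= []
  end

  # Record a block to run at the beginning of every fresh lex.
  def self.start(&b)
    start_procs << b
  end

  # Hypothetical per-lex state that a start block initializes.
  start { @depth = 0 }

  attr_reader :depth

  def enter
    @depth += 1
  end

  # Same loop as RegexLexer#reset!: evaluate each block on this instance.
  def reset!
    self.class.start_procs.each { |pr| instance_eval(&pr) }
  end
end

lexer = StartDemo.new
lexer.reset!
lexer.enter
lexer.enter
p lexer.depth # prints 2
lexer.reset!
p lexer.depth # prints 0
```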

.start_procs ⇒ Object

The routines to run at the beginning of a fresh lex.

# File 'lib/rouge/regex_lexer.rb', line 147

def self.start_procs
  @start_procs ||= InheritableList.new(superclass.start_procs)
end
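`start_procs` is built on an `InheritableList` seeded with the superclass's list. A subclass therefore runs its parent's start blocks as well as its own, and adding a block to the subclass leaves the parent unchanged. `InheritableList` is not documented on this page. The sketch below is a hypothetical stand-in, and it assumes that parent entries come first.

```ruby
# Hypothetical stand-in for InheritableList: reads see the parent's items,
# then this list's own items; writes go only to this list.
class InheritableListSketch
  include Enumerable

  def initialize(parent = [])
    @parent = parent
    @own = []
  end

  def <<(item)
    @own << item
    self
  end

  def each(&b)
    return enum_for(:each) unless b

    @parent.each(&b)
    @own.each(&b)
    self
  end
end

base = InheritableListSketch.new
base << :base_start
child = InheritableListSketch.new(base)
child << :child_start

p child.to_a # prints [:base_start, :child_start]
p base.to_a  # prints [:base_start]
```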

.state(name, &b) ⇒ Object

Define a new state for this lexer with the given name. The block will be evaluated in the context of a StateDSL.

# File 'lib/rouge/regex_lexer.rb', line 162

def self.state(name, &b)
  name = name.to_s
  state_definitions[name] = StateDSL.new(name, &b)
end
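The block passed to `state` is evaluated in the context of a `StateDSL`, so the rule-defining calls inside it need no explicit receiver. The sketch below imitates this with `instance_eval`. `StateDSLSketch` and its `rule(regex, token)` method are simplified assumptions. The sketch defers evaluation until `to_rules`, loosely matching the later `to_state` call in `get_state`; whether the real StateDSL evaluates eagerly or lazily is not shown on this page.

```ruby
# Simplified, hypothetical imitation of StateDSL.
class StateDSLSketch
  attr_reader :name

  def initialize(name, &defn)
    @name = name
    @defn = defn # stored now, evaluated later
  end

  # Evaluate the stored block with self set to this object, so bare
  # `rule` calls inside it land on #rule below.
  def to_rules
    @rules = []
    instance_eval(&@defn)
    @rules
  end

  private

  def rule(re, tok)
    @rules << [re, tok]
  end
end

# Mirrors state_definitions[name] = StateDSL.new(name, &b)
state_definitions = {}
state_definitions["root"] = StateDSLSketch.new("root") do
  rule(/\s+/, :Text)
  rule(/\d+/, :Number)
end

p state_definitions["root"].to_rules.map(&:last) # prints [:Text, :Number]
```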

.state_definitions ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 135

def self.state_definitions
  @state_definitions ||= InheritableHash.new(superclass.state_definitions)
end
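Because `state_definitions` is an `InheritableHash` seeded with the superclass's hash, a subclass sees every inherited state definition and can redefine one without affecting its parent. By contrast, `states` (the compiled cache) is a plain per-class hash. The following hypothetical stand-in illustrates the lookup behavior, assuming that a class's own keys shadow inherited ones.

```ruby
# Hypothetical stand-in for InheritableHash: local keys shadow the
# parent's, and writes never reach the parent.
class InheritableHashSketch
  def initialize(parent = {})
    @parent = parent
    @own = {}
  end

  def [](key)
    @own.key?(key) ? @own[key] : @parent[key]
  end

  def []=(key, value)
    @own[key] = value
  end
end

parent_defs = InheritableHashSketch.new
parent_defs["root"] = :parent_root
parent_defs["string"] = :parent_string

child_defs = InheritableHashSketch.new(parent_defs)
child_defs["root"] = :child_root # redefine one state in the subclass

p child_defs["root"]   # prints :child_root
p child_defs["string"] # prints :parent_string (inherited)
p parent_defs["root"]  # prints :parent_root (unchanged)
```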

.states ⇒ Object

The states hash for this lexer: a per-class cache of compiled states, keyed by state name.

# File 'lib/rouge/regex_lexer.rb', line 131

def self.states
  @states ||= {}
end

Instance Method Details

#delegate(lexer, text = nil) ⇒ Object

Delegate the lex to another lexer. The #lex method will be called with `:continue` set to true, so that #reset! will not be called. In this way, a single lexer can be repeatedly delegated to while maintaining its own internal state stack.

Parameters:

  • lexer (#lex)

    The lexer or lexer class to delegate to.

  • text (String) (defaults to: nil)

    The text to delegate. This defaults to the last matched string.

# File 'lib/rouge/regex_lexer.rb', line 350

def delegate(lexer, text=nil)
  debug { "    delegating to #{lexer.inspect}" }
  text ||= @current_stream[0]

  lexer.lex(text, :continue => true) do |tok, val|
    debug { "    delegated token: #{tok.inspect}, #{val.inspect}" }
    yield_token(tok, val)
  end
end
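All `delegate` needs from its target is a `lex(text, opts)` method that yields `[token, value]` pairs. Either a lexer instance or a class that responds to `lex` therefore works. Below is a duck-typed sketch with a hypothetical `DigitLexer` as the target. The sketch accepts the `:continue => true` option but ignores it; per the description above, a real lexer uses it to skip `#reset!`.

```ruby
# Hypothetical target lexer: numbers become :Number, everything else :Other.
class DigitLexer
  def lex(text, opts = {})
    return enum_for(:lex, text, opts) unless block_given?

    text.scan(/\d+|\D/) do |piece|
      yield(piece.match?(/\A\d/) ? :Number : :Other, piece)
    end
  end
end

# Mirrors the body of RegexLexer#delegate, collecting instead of yielding.
def delegate_tokens(lexer, text)
  out = []
  lexer.lex(text, :continue => true) { |tok, val| out << [tok, val] }
  out
end

p delegate_tokens(DigitLexer.new, "a12") # prints [[:Other, "a"], [:Number, "12"]]
```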

#get_state(state_name) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 192

def get_state(state_name)
  self.class.get_state(state_name)
end

#goto(state_name) ⇒ Object

Replace the top of the stack with the given state.

# File 'lib/rouge/regex_lexer.rb', line 394

def goto(state_name)
  raise 'empty stack!' if stack.empty?
  stack[-1] = get_state(state_name)
end
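`goto` replaces the top entry in place, so the depth of the stack stays the same. Compare `push`, which adds an entry and grows the stack. The tiny illustration below uses symbols in place of `State` objects, an assumption made for brevity.

```ruby
stack = [:root, :string]

# Like push(:interp): the stack grows.
pushed = stack + [:interp]

# Like goto(:interp): replace the top, depth unchanged.
raise 'empty stack!' if stack.empty?
stack[-1] = :interp

p pushed # prints [:root, :string, :interp]
p stack  # prints [:root, :interp]
```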

#group(tok) ⇒ Object

Yield a token with the next matched group. Subsequent calls to this method will yield subsequent groups.

# File 'lib/rouge/regex_lexer.rb', line 331

def group(tok)
  yield_token(tok, @current_stream[@group_count += 1])
end

#groups(*tokens) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 335

def groups(*tokens)
  tokens.each_with_index do |tok, i|
    yield_token(tok, @current_stream[i+1])
  end
end
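Both helpers read capture groups from the scanner's last match. `groups` pairs its i-th token with group i + 1. `group`, by contrast, advances `@group_count`, which `run_callback` resets to 0 before each callback. The following sketch works against a bare `StringScanner` and uses hypothetical token symbols.

```ruby
require 'strscan'

scanner = StringScanner.new("x = 42")
scanner.scan(/(\w+)(\s*)(=)/)

# groups(:Name, :Text, :Operator): token i gets capture i + 1.
via_groups = [:Name, :Text, :Operator].each_with_index.map do |tok, i|
  [tok, scanner[i + 1]]
end

# group(tok) called repeatedly: a counter picks the next capture each time.
group_count = 0 # run_callback resets this before each callback
via_group = [:Name, :Text, :Operator].map do |tok|
  [tok, scanner[group_count += 1]]
end

p via_groups              # prints [[:Name, "x"], [:Text, " "], [:Operator, "="]]
p via_group == via_groups # prints true
```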

#in_state?(state_name) ⇒ Boolean

Check whether `state_name` is anywhere in the state stack.

Returns:

  • (Boolean)

# File 'lib/rouge/regex_lexer.rb', line 407

def in_state?(state_name)
  state_name = state_name.to_s
  stack.any? do |state|
    state.name == state_name
  end
end

#pop!(times = 1) ⇒ Object

Pop the state stack. If a number is passed in, the stack is popped that many times.

# File 'lib/rouge/regex_lexer.rb', line 383

def pop!(times=1)
  raise 'empty stack!' if stack.empty?

  debug { "    popping stack: #{times}" }

  stack.pop(times)

  nil
end
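`pop!` relies on `Array#pop(n)`, which removes up to `n` elements from the end of the array. The empty-stack check runs only before popping. Popping more states than the stack holds therefore leaves it empty without raising, and the next call to `#state` raises instead. Below is a standalone sketch of the same logic. The method name `pop_states!` is hypothetical.

```ruby
# Same logic as RegexLexer#pop!, operating on an explicit array.
def pop_states!(stack, times = 1)
  raise 'empty stack!' if stack.empty?

  stack.pop(times) # removes up to `times` entries from the end
  nil
end

stack = [:root, :string, :interp]
pop_states!(stack, 2)
p stack # prints [:root]

pop_states!(stack, 5) # more than are present: no error here...
p stack # prints []  (...but the stack is now empty)
```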

#push(state_name = nil, &b) ⇒ Object

Push a state onto the stack. If no state name is given and you’ve passed a block, a state will be dynamically created using the StateDSL. If neither is given, the current state (the top of the stack) is pushed again.

# File 'lib/rouge/regex_lexer.rb', line 367

def push(state_name=nil, &b)
  push_state = if state_name
    get_state(state_name)
  elsif block_given?
    StateDSL.new(b.inspect, &b).to_state(self.class)
  else
    # use the top of the stack by default
    self.state
  end

  debug { "    pushing #{push_state.name}" }
  stack.push(push_state)
end
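The following standalone sketch covers the three branches of `push`: a named state; an anonymous state built from a block and named after `b.inspect`, as in the source; and no argument at all, which pushes the current top again. `State` and the direct `State.new` calls are simplifications that stand in for `get_state` and `StateDSL#to_state`.

```ruby
State = Struct.new(:name)

class PushSketch
  attr_reader :stack

  def initialize
    @stack = [State.new("root")]
  end

  def push(state_name = nil, &b)
    push_state =
      if state_name
        State.new(state_name.to_s) # stands in for get_state(state_name)
      elsif block_given?
        State.new(b.inspect)       # anonymous state, named after the block
      else
        stack.last                 # default: the current top of the stack
      end
    stack.push(push_state)
  end
end

lexer = PushSketch.new
lexer.push(:string)
lexer.push
p lexer.stack.map(&:name) # prints ["root", "string", "string"]

lexer.push { :rules_would_go_here }
p lexer.stack.last.name.start_with?("#<Proc") # prints true
```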

#recurse(text = nil) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 360

def recurse(text=nil)
  delegate(self.class, text)
end

#reset! ⇒ Object

Reset this lexer to its initial state. This runs all of the start_procs.

# File 'lib/rouge/regex_lexer.rb', line 213

def reset!
  @stack = nil
  @current_stream = nil

  self.class.start_procs.each do |pr|
    instance_eval(&pr)
  end
end

#reset_stack ⇒ Object

Reset the stack back to `[:root]`.

# File 'lib/rouge/regex_lexer.rb', line 400

def reset_stack
  debug { '    resetting stack' }
  stack.clear
  stack.push get_state(:root)
end

#run_callback(stream, callback, &output_stream) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 281

def run_callback(stream, callback, &output_stream)
  with_output_stream(output_stream) do
    @group_count = 0
    instance_exec(stream, &callback)
  end
end

#run_rule(rule, scanner, &b) ⇒ Object

# File 'lib/rouge/regex_lexer.rb', line 293

def run_rule(rule, scanner, &b)
  # XXX HACK XXX
  # StringScanner's implementation of ^ is b0rken.
  # see http://bugs.ruby-lang.org/issues/7092
  # TODO: this doesn't cover cases like /(a|^b)/, but it's
  # the most common, for now...
  return false if rule.beginning_of_line? && !scanner.beginning_of_line?

  if (@null_steps ||= 0) >= MAX_NULL_SCANS
    debug { "    too many scans without consuming the string!" }
    return false
  end

  scanner.scan(rule.re) or return false

  if scanner.matched_size.zero?
    @null_steps += 1
  else
    @null_steps = 0
  end

  true
end
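The `@null_steps` counter guards against rules that can match the empty string, such as `/a*/`. Without the guard, such a rule could keep matching without ever advancing through the input. Below is a standalone sketch of the guard, leaving out the beginning-of-line workaround. `NullGuard` is a hypothetical name.

```ruby
require 'strscan'

MAX_NULL_SCANS = 5

class NullGuard
  def initialize
    @null_steps = 0
  end

  # Returns true if `re` matched; refuses once too many consecutive
  # zero-length matches have happened.
  def run_rule(re, scanner)
    return false if @null_steps >= MAX_NULL_SCANS

    scanner.scan(re) or return false

    # Count zero-length matches; any real progress resets the counter.
    @null_steps = scanner.matched_size.zero? ? @null_steps + 1 : 0
    true
  end
end

scanner = StringScanner.new("bbb")
guard = NullGuard.new
results = Array.new(7) { guard.run_rule(/a*/, scanner) }
p results     # prints [true, true, true, true, true, false, false]
p scanner.pos # prints 0
```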

#stack ⇒ Object

The state stack. This is initially the single state `[:root]`. It is an error for this stack to be empty.

# File 'lib/rouge/regex_lexer.rb', line 199

def stack
  @stack ||= [get_state(:root)]
end

#state ⇒ Object

The current state, i.e. the one on top of the state stack.

NB: if the state stack is empty, this raises an error rather than returning nil.

# File 'lib/rouge/regex_lexer.rb', line 207

def state
  stack.last or raise 'empty stack!'
end

#state?(state_name) ⇒ Boolean

Check whether `state_name` is the state on top of the state stack.

Returns:

  • (Boolean)

# File 'lib/rouge/regex_lexer.rb', line 415

def state?(state_name)
  state_name.to_s == state.name
end
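`in_state?` and `state?` differ only in how much of the stack they inspect. `in_state?` accepts a match anywhere in the stack, while `state?` checks only the top. The sketch below reproduces both comparisons on an explicit array. The `State` struct and the explicit `stack` parameter are assumptions made so the example runs on its own.

```ruby
State = Struct.new(:name)

# Same comparisons as RegexLexer#in_state? and #state?, on an explicit stack.
def in_state?(stack, state_name)
  name = state_name.to_s
  stack.any? { |state| state.name == name } # anywhere in the stack
end

def state?(stack, state_name)
  state_name.to_s == stack.last.name # only the top of the stack
end

stack = %w[root string interp].map { |n| State.new(n) }

p in_state?(stack, :string) # prints true
p state?(stack, :string)    # prints false
p state?(stack, :interp)    # prints true
```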

#step(state, stream, &b) ⇒ Object

Runs one step of the lex. Rules in the current state are tried in order until one matches, at which point its callback is called. A rule that is itself a State is treated as a mixin: its rules are tried in place.

Returns:

  • true if a rule matched and its callback was run

  • false otherwise.

# File 'lib/rouge/regex_lexer.rb', line 257

def step(state, stream, &b)
  state.rules.each do |rule|
    case rule
    when State
      debug { "  entering mixin #{rule.name}" }
      return true if step(rule, stream, &b)
      debug { "  exiting  mixin #{rule.name}" }
    when Rule
      debug { "  trying #{rule.inspect}" }

      if run_rule(rule, stream)
        debug { "    got #{stream[0].inspect}" }

        run_callback(stream, rule.callback, &b)

        return true
      end
    end
  end

  false
end
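Mixins are handled by recursion: when a rule is itself a state, `step` calls itself on that state and returns as soon as anything inside it matches. The sketch below follows the same control flow with hypothetical `Rule` and `StateSketch` structs. It records `[token, match]` pairs in place of running real callbacks.

```ruby
require 'strscan'

# Hypothetical stand-ins for RegexLexer::Rule and RegexLexer::State.
Rule = Struct.new(:re, :token)
StateSketch = Struct.new(:name, :rules)

# Same control flow as RegexLexer#step: a nested state is a mixin whose
# rules are tried in place; the first matching rule wins.
def step(state, scanner, out)
  state.rules.each do |rule|
    case rule
    when StateSketch
      return true if step(rule, scanner, out)
    when Rule
      if scanner.scan(rule.re)
        out << [rule.token, scanner[0]] # stands in for running the callback
        return true
      end
    end
  end
  false
end

whitespace = StateSketch.new("whitespace", [Rule.new(/\s+/, :Text)])
root = StateSketch.new("root", [whitespace, Rule.new(/\w+/, :Name)])

scanner = StringScanner.new(" foo")
out = []
step(root, scanner, out)   # matched by the mixin
step(root, scanner, out)   # matched by root's own rule
p out                      # prints [[:Text, " "], [:Name, "foo"]]
p step(root, scanner, out) # prints false (nothing left to match)
```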

#stream_tokens(str, &b) ⇒ Object

This implements the lexer protocol by yielding [token, value] pairs.

The process for lexing works as follows, until the stream is empty:

  1. We look at the state on top of the stack (initially the root state, since the stack starts as `[:root]`).

  2. Each rule in that state is tried until one is successful. If one is found, that rule’s callback is evaluated, which may yield tokens and manipulate the state stack. Otherwise, one character is consumed with an `Error` token. Either way, we continue at (1.)

# File 'lib/rouge/regex_lexer.rb', line 234

def stream_tokens(str, &b)
  stream = StringScanner.new(str)

  @current_stream = stream

  until stream.eos?
    debug { "lexer: #{self.class.tag}" }
    debug { "stack: #{stack.map(&:name).inspect}" }
    debug { "stream: #{stream.peek(20).inspect}" }
    success = step(get_state(state), stream, &b)

    if !success
      debug { "    no match, yielding Error" }
      b.call(Token::Tokens::Error, stream.getch)
    end
  end
end
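The loop described above can be sketched end to end with only the standard library. The state table format below is an assumption: each rule is a `[regex, token, action]` triple whose action can push or pop a state, and it replaces Rouge's rule callbacks. Symbols stand in for `Token::Tokens` values. As in `stream_tokens`, an input position that no rule matches is consumed one character at a time as an error token.

```ruby
require 'strscan'

# Hypothetical state table: each rule is [regex, token, action], where the
# action is nil, [:push, state] or [:pop].
STATES = {
  root: [
    [/"/, :Str, [:push, :string]],
    [/\s+/, :Text, nil],
    [/\w+/, :Name, nil]
  ],
  string: [
    [/"/, :Str, [:pop]],
    [/[^"]+/, :Str, nil]
  ]
}.freeze

def stream_tokens(str)
  tokens = []
  stack = [:root]
  scanner = StringScanner.new(str)

  until scanner.eos?
    # Try the top state's rules in order; find stops at the first match.
    rule = STATES.fetch(stack.last).find { |re, _tok, _action| scanner.scan(re) }

    if rule
      _re, tok, action = rule
      tokens << [tok, scanner[0]]
      case action&.first
      when :push then stack.push(action[1])
      when :pop then stack.pop
      end
    else
      # No rule matched: consume one character as an error token.
      tokens << [:Error, scanner.getch]
    end
  end

  tokens
end

p stream_tokens('say "hi" !')
# prints [[:Name, "say"], [:Text, " "], [:Str, "\""], [:Str, "hi"], [:Str, "\""], [:Text, " "], [:Error, "!"]]
```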

#token(tok, val = :__absent__) ⇒ Object

Yield a token.

Parameters:

  • tok

    the token type

  • val (defaults to: :__absent__)

    (optional) the string value to yield. If absent, this defaults to the entire last match.

# File 'lib/rouge/regex_lexer.rb', line 324

def token(tok, val=:__absent__)
  val = @current_stream[0] if val == :__absent__
  yield_token(tok, val)
end
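`token` uses the sentinel `:__absent__` as its default, so calling it with only a token type yields the entire last match. The minimal sketch below reproduces that defaulting. `TokenSketch` is hypothetical, and appending to `@out` stands in for `yield_token`.

```ruby
require 'strscan'

# Minimal sketch of RegexLexer#token's defaulting behavior.
class TokenSketch
  attr_reader :out

  def initialize(scanner)
    @current_stream = scanner
    @out = []
  end

  def token(tok, val = :__absent__)
    val = @current_stream[0] if val == :__absent__
    @out << [tok, val] # stands in for yield_token(tok, val)
  end
end

scanner = StringScanner.new("foo.bar")
scanner.scan(/\w+/)

t = TokenSketch.new(scanner)
t.token(:Name)       # value defaults to the last match, "foo"
t.token(:Punct, ".") # explicit value
p t.out # prints [[:Name, "foo"], [:Punct, "."]]
```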