Class: Rouge::RegexLexer Abstract

Inherits:
Lexer
  • Object
Defined in:
lib/rouge/regex_lexer.rb

Overview

This class is abstract.

A stateful lexer that uses sets of regular expressions to tokenize a string. Most lexers are subclasses of RegexLexer.

Defined Under Namespace

Classes: Rule, State, StateDSL

Constant Summary

MAX_NULL_SCANS = 5

The number of successive scans permitted without consuming the input stream. If this is exceeded, the match fails.

Constants included from Token::Tokens

Token::Tokens::Num, Token::Tokens::Str

Methods inherited from Lexer

aliases, all, analyze_text, assert_utf8!, #debug, default_options, demo, demo_file, desc, filenames, find, find_fancy, guess, guess_by_filename, guess_by_mimetype, guess_by_source, guesses, #initialize, #lex, lex, mimetypes, #option, #options, tag, #tag

Methods included from Token::Tokens

token

Constructor Details

This class inherits a constructor from Rouge::Lexer

Class Method Details

.get_state(name) ⇒ Object

Look up a state by name. A State instance is returned unchanged; any other name is converted to a string and looked up in the states hash, then loaded and returned. Raises an error if no state with that name exists.


# File 'lib/rouge/regex_lexer.rb', line 136

def self.get_state(name)
  return name if name.is_a? State

  state = states[name.to_s]
  raise "unknown state: #{name}" unless state
  state.load!(self)
end

.start(&b) ⇒ Object

Specify an action to be run every fresh lex.

Examples:

start { puts "I'm lexing a new string!" }

# File 'lib/rouge/regex_lexer.rb', line 124

def self.start(&b)
  start_procs << b
end
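
As a rough illustration of how registered start blocks can be replayed on every fresh lex, here is a minimal stand-alone sketch. `ToyLexer` and its `counter` attribute are hypothetical names invented for this example; the sketch does not use Rouge and ignores the inheritance of start procs from superclasses.

```ruby
# Toy model (not Rouge): collect start blocks at class level and
# replay them against the instance whenever the lexer is reset.
class ToyLexer
  def self.start_procs
    @start_procs ||= []
  end

  def self.start(&block)
    start_procs << block
  end

  attr_reader :counter

  def reset!
    # Each block runs with `self` set to this instance, so instance
    # variables assigned in the block belong to the lexer object.
    self.class.start_procs.each { |pr| instance_eval(&pr) }
  end

  # Registered once, executed on every reset!.
  start { @counter = 0 }
end

lexer = ToyLexer.new
lexer.reset!
```

Because the blocks are evaluated with `instance_eval`, a start block is a natural place to initialize per-lex instance variables.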

.start_procs ⇒ Object

The routines to run at the beginning of a fresh lex.

# File 'lib/rouge/regex_lexer.rb', line 115

def self.start_procs
  @start_procs ||= InheritableList.new(superclass.start_procs)
end

.state(name, &b) ⇒ Object

Define a new state for this lexer with the given name. The block will be evaluated in the context of a StateDSL.


# File 'lib/rouge/regex_lexer.rb', line 130

def self.state(name, &b)
  name = name.to_s
  states[name] = State.new(name, &b)
end
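
The class-level methods above store states under string keys and convert names with `to_s` on lookup, so a symbol and a string refer to the same state. The following stand-alone sketch models only that behavior; `ToyStateRegistry` is a hypothetical name, and the stored value is a plain block rather than a Rouge State.

```ruby
# Toy registry (not Rouge): states are stored under string keys, so
# symbol and string names resolve to the same entry.
class ToyStateRegistry
  def self.states
    @states ||= {}
  end

  def self.state(name, &block)
    states[name.to_s] = block
  end

  def self.get_state(name)
    states[name.to_s] or raise "unknown state: #{name}"
  end
end

ToyStateRegistry.state(:root) { :root_rules }
```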

.states ⇒ Object

The states hash for this lexer.

# File 'lib/rouge/regex_lexer.rb', line 109

def self.states
  @states ||= {}
end

Instance Method Details

#delegate(lexer, text = nil) ⇒ Object

Delegate the lex to another lexer. The #lex method will be called with `:continue` set to true, so that #reset! will not be called. In this way, a single lexer can be repeatedly delegated to while maintaining its own internal state stack.

Parameters:

  • lexer (#lex)

    The lexer or lexer class to delegate to

  • text (String) (defaults to: nil)

    The text to delegate. This defaults to the last matched string.


# File 'lib/rouge/regex_lexer.rb', line 303

def delegate(lexer, text=nil)
  debug { "    delegating to #{lexer.inspect}" }
  text ||= @current_stream[0]

  lexer.lex(text, :continue => true) do |tok, val|
    debug { "    delegated token: #{tok.inspect}, #{val.inspect}" }
    token(tok, val)
  end
end
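
To see why skipping the reset matters, consider a minimal stand-alone sketch of a delegated-to lexer that keeps state between calls. `ToyChildLexer` and its `depth` counter are hypothetical; the only behavior borrowed from the description above is that `:continue => true` suppresses the reset.

```ruby
# Toy model (not Rouge): a child lexer that tracks parenthesis depth
# across successive calls to #lex.
class ToyChildLexer
  attr_reader :depth

  def initialize
    @depth = 0
  end

  def reset!
    @depth = 0
  end

  # Yields [token, value] pairs. With :continue set, the state left
  # over from the previous call is kept.
  def lex(text, opts = {})
    reset! unless opts[:continue]
    text.each_char do |ch|
      @depth += 1 if ch == "("
      @depth -= 1 if ch == ")"
      yield :Char, ch
    end
  end
end

child = ToyChildLexer.new
child.lex("((", :continue => true) { |_tok, _val| }
child.lex(")", :continue => true) { |_tok, _val| }
```

After the two calls the child still remembers one unclosed parenthesis, which would be lost if each call started from a fresh state.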

#get_state(state_name) ⇒ Object


# File 'lib/rouge/regex_lexer.rb', line 145

def get_state(state_name)
  self.class.get_state(state_name)
end

#goto(state_name) ⇒ Object

Replace the head of the stack with the given state.


# File 'lib/rouge/regex_lexer.rb', line 343

def goto(state_name)
  raise 'empty stack!' if stack.empty?
  stack[-1] = get_state(state_name)
end

#group(tok) ⇒ Object

Yield a token with the next matched group. Subsequent calls to this method will yield subsequent groups.


# File 'lib/rouge/regex_lexer.rb', line 284

def group(tok)
  yield_token(tok, @current_stream[@group_count += 1])
end

#groups(*tokens) ⇒ Object

Yield one token per capture group of the last match: the first token type is paired with group 1, the second with group 2, and so on.


# File 'lib/rouge/regex_lexer.rb', line 288

def groups(*tokens)
  tokens.each_with_index do |tok, i|
    yield_token(tok, @current_stream[i+1])
  end
end
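
Both group helpers above index into the current StringScanner, where index 0 is the whole match and capture groups start at 1. The sketch below shows that indexing with the standard `strscan` library only; `pair_groups` and the token names are placeholders for illustration, not Rouge API.

```ruby
require 'strscan'

# Pair each token name with the corresponding capture group of the
# scanner's last match (group numbering starts at 1).
def pair_groups(scanner, tokens)
  tokens.each_with_index.map do |tok, i|
    [tok, scanner[i + 1]]
  end
end

scanner = StringScanner.new("name = value")
scanner.scan(/(\w+)(\s*=\s*)(\w+)/)

whole = scanner[0] # the entire match
pairs = pair_groups(scanner, [:Name, :Operator, :Value])
```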

#in_state?(state_name) ⇒ Boolean

Check if `state_name` is in the state stack.

Returns:

  • (Boolean)

# File 'lib/rouge/regex_lexer.rb', line 356

def in_state?(state_name)
  state_name = state_name.to_s
  stack.any? do |state|
    state.name == state_name.to_s
  end
end

#pop!(times = 1) ⇒ Object

Pop the state stack. If a number is passed in, it will be popped that number of times.


# File 'lib/rouge/regex_lexer.rb', line 332

def pop!(times=1)
  raise 'empty stack!' if stack.empty?

  debug { "    popping stack: #{times}" }

  stack.pop(times)

  nil
end

#push(state_name = nil, &b) ⇒ Object

Push a state onto the stack. If no state name is given and you've passed a block, a state will be dynamically created using the StateDSL.


# File 'lib/rouge/regex_lexer.rb', line 316

def push(state_name=nil, &b)
  push_state = if state_name
    get_state(state_name)
  elsif block_given?
    State.new(b.inspect, &b).load!(self.class)
  else
    # use the top of the stack by default
    self.state
  end

  debug { "    pushing #{push_state.name}" }
  stack.push(push_state)
end
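
The three stack operations documented above (goto, pop! and push) differ in how they change the stack depth. A minimal stand-alone model, using a plain array of state names instead of State objects, might look like this; `ToyStack` is a hypothetical name and block-defined states are left out.

```ruby
# Toy model (not Rouge): the state stack as a plain array of names.
class ToyStack
  attr_reader :stack

  def initialize
    @stack = [:root]
  end

  # Push a named state, or the current top again when no name is given.
  def push(name = nil)
    @stack.push(name || @stack.last)
  end

  # Remove `times` states from the top of the stack.
  def pop!(times = 1)
    raise 'empty stack!' if @stack.empty?
    @stack.pop(times)
    nil
  end

  # Swap the top state without changing the stack depth.
  def goto(name)
    raise 'empty stack!' if @stack.empty?
    @stack[-1] = name
  end
end
```

In this model, `goto` behaves like a pop followed by a push, while `push` with no argument duplicates the current state so that a later `pop!` returns to it.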

#reset! ⇒ Object

Reset this lexer to its initial state. This runs all of the start_procs.


# File 'lib/rouge/regex_lexer.rb', line 166

def reset!
  @stack = nil
  @current_stream = nil

  self.class.start_procs.each do |pr|
    instance_eval(&pr)
  end
end

#reset_stack ⇒ Object

Reset the stack back to `[:root]`.


# File 'lib/rouge/regex_lexer.rb', line 349

def reset_stack
  debug { '    resetting stack' }
  stack.clear
  stack.push get_state(:root)
end

#run_callback(stream, callback, &output_stream) ⇒ Object

Run a rule's callback in the context of this lexer, with the group counter used by #group reset to zero and tokens sent to the given output stream.


# File 'lib/rouge/regex_lexer.rb', line 234

def run_callback(stream, callback, &output_stream)
  with_output_stream(output_stream) do
    @group_count = 0
    instance_exec(stream, &callback)
  end
end

#run_rule(rule, scanner, &b) ⇒ Object

Try to match a single rule against the scanner. Returns true if the rule's regular expression matched, and false if it did not match or if MAX_NULL_SCANS successive matches have already consumed no input.


# File 'lib/rouge/regex_lexer.rb', line 246

def run_rule(rule, scanner, &b)
  # XXX HACK XXX
  # StringScanner's implementation of ^ is b0rken.
  # see http://bugs.ruby-lang.org/issues/7092
  # TODO: this doesn't cover cases like /(a|^b)/, but it's
  # the most common, for now...
  return false if rule.beginning_of_line? && !scanner.beginning_of_line?

  if (@null_steps ||= 0) >= MAX_NULL_SCANS
    debug { "    too many scans without consuming the string!" }
    return false
  end

  scanner.scan(rule.re) or return false

  if scanner.matched_size.zero?
    @null_steps += 1
  else
    @null_steps = 0
  end

  true
end
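
The null-scan guard in the method above can be exercised in isolation. The sketch below reproduces only the counting logic with the standard `strscan` library; `NullScanGuard` and `MAX_NULL_SCANS_SKETCH` are hypothetical names, and the beginning-of-line workaround is omitted.

```ruby
require 'strscan'

MAX_NULL_SCANS_SKETCH = 5 # same value as the documented constant

# Toy model (not Rouge): refuse to match once too many successive
# matches have consumed no input.
class NullScanGuard
  def initialize
    @null_steps = 0
  end

  # Returns true when the regex matched, false when it did not match
  # or when the null-scan limit has been reached.
  def try(scanner, re)
    return false if @null_steps >= MAX_NULL_SCANS_SKETCH

    scanner.scan(re) or return false

    if scanner.matched_size.zero?
      @null_steps += 1
    else
      @null_steps = 0
    end

    true
  end
end

guard = NullScanGuard.new
scanner = StringScanner.new("abc")
# /x*/ matches the empty string at position 0 every time.
results = Array.new(7) { guard.try(scanner, /x*/) }
```

In this model the first five empty matches succeed and later attempts are refused, which keeps a rule that can match the empty string from spinning forever at the same position.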

#stack ⇒ Object

The state stack. This is initially the single state `[:root]`. It is an error for this stack to be empty.

# File 'lib/rouge/regex_lexer.rb', line 152

def stack
  @stack ||= [get_state(:root)]
end

#state ⇒ Object

The current state, i.e. the one on top of the state stack.

NB: if the state stack is empty, this will raise an error rather than returning nil.


# File 'lib/rouge/regex_lexer.rb', line 160

def state
  stack.last or raise 'empty stack!'
end

#state?(state_name) ⇒ Boolean

Check if `state_name` is the state on top of the state stack.

Returns:

  • (Boolean)

# File 'lib/rouge/regex_lexer.rb', line 364

def state?(state_name)
  state_name.to_s == state.name
end
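
The difference between the two predicates, #in_state? and #state?, is easiest to see side by side: one searches the whole stack, the other looks only at the top. This stand-alone sketch uses a hypothetical `ToyState` struct and free functions in place of the lexer's own stack.

```ruby
# Toy model (not Rouge): states only need a name for this comparison.
ToyState = Struct.new(:name)

# True if any state on the stack has the given name.
def toy_in_state?(stack, state_name)
  stack.any? { |state| state.name == state_name.to_s }
end

# True only if the state on top of the stack has the given name.
def toy_state?(stack, state_name)
  stack.last.name == state_name.to_s
end

stack = [ToyState.new("root"), ToyState.new("string")]
```

With this stack, `:root` is somewhere in the stack but is not the current state, so the first predicate holds for it and the second does not.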

#step(state, stream, &b) ⇒ Object

Runs one step of the lex. Rules in the current state are tried until one matches, at which point its callback is called.

Returns:

  • true if a rule was tried successfully

  • false otherwise.


# File 'lib/rouge/regex_lexer.rb', line 210

def step(state, stream, &b)
  state.rules.each do |rule|
    case rule
    when State
      debug { "  entering mixin #{rule.name}" }
      return true if step(rule, stream, &b)
      debug { "  exiting  mixin #{rule.name}" }
    when Rule
      debug { "  trying #{rule.inspect}" }

      if run_rule(rule, stream)
        debug { "    got #{stream[0].inspect}" }

        run_callback(stream, rule.callback, &b)

        return true
      end
    end
  end

  false
end

#stream_tokens(str, &b) ⇒ Object

This implements the lexer protocol, by yielding [token, value] pairs.

The process for lexing works as follows, until the stream is empty:

  1. We look at the state on top of the stack (which by default is `[:root]`).

  2. Each rule in that state is tried until one is successful. If one is found, that rule's callback is evaluated - which may yield tokens and manipulate the state stack. Otherwise, one character is consumed with an `'Error'` token, and we continue at (1.)


# File 'lib/rouge/regex_lexer.rb', line 187

def stream_tokens(str, &b)
  stream = StringScanner.new(str)

  @current_stream = stream

  until stream.eos?
    debug { "lexer: #{self.class.tag}" }
    debug { "stack: #{stack.map(&:name).inspect}" }
    debug { "stream: #{stream.peek(20).inspect}" }
    success = step(get_state(state), stream, &b)

    if !success
      debug { "    no match, yielding Error" }
      b.call(Token::Tokens::Error, stream.getch)
    end
  end
end
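
The loop described above can be condensed into a small stand-alone sketch built on the standard `strscan` library. Everything here is hypothetical and simplified: `TOY_RULES`, `toy_lex` and the token names are invented for illustration, rules are plain arrays rather than Rule objects, and mixins and the null-scan guard are left out.

```ruby
require 'strscan'

# Hypothetical rule table: state name => list of [regex, token, action].
# An action receives the state stack and may push or pop states.
TOY_RULES = {
  root: [
    [/\d+/, :Num,  nil],
    [/"/,   :Str,  ->(stack) { stack.push(:string) }],
    [/\s+/, :Text, nil],
  ],
  string: [
    [/[^"]+/, :Str, nil],
    [/"/,     :Str, ->(stack) { stack.pop }],
  ],
}.freeze

def toy_lex(str)
  scanner = StringScanner.new(str)
  stack = [:root]
  tokens = []

  until scanner.eos?
    # Try each rule of the state on top of the stack, in order.
    rule = TOY_RULES.fetch(stack.last).find { |re, _, _| scanner.scan(re) }

    if rule
      _, tok, action = rule
      tokens << [tok, scanner[0]]
      action&.call(stack)
    else
      # No rule matched: consume one character as an error token.
      tokens << [:Error, scanner.getch]
    end
  end

  tokens
end
```

For example, `toy_lex('12 "hi" ?')` walks through the root state, enters and leaves the string state, and ends with a single `:Error` token for the `?` that no rule accepts.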

#token(tok, val = :__absent__) ⇒ Object

Yield a token.

Parameters:

  • tok

    the token type

  • val (defaults to: :__absent__)

    (optional) the string value to yield. If absent, this defaults to the entire last match.


# File 'lib/rouge/regex_lexer.rb', line 277

def token(tok, val=:__absent__)
  val = @current_stream[0] if val == :__absent__
  yield_token(tok, val)
end
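
The `:__absent__` default acts as a sentinel that distinguishes "no value passed" from an explicitly passed value such as nil. The following sketch isolates that pattern; `toy_token` is a hypothetical helper that takes the last match as a parameter instead of reading it from the scanner, and it returns the pair rather than yielding it.

```ruby
# Toy model (not Rouge): fall back to the last match only when the
# caller passed no value at all.
def toy_token(tok, last_match, val = :__absent__)
  val = last_match if val == :__absent__
  [tok, val]
end

default_pair  = toy_token(:Str, "abc")        # value taken from the last match
explicit_pair = toy_token(:Str, "abc", "a")   # caller-supplied value wins
nil_pair      = toy_token(:Str, "abc", nil)   # explicit nil is kept as nil
```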