Class: Rouge::RegexLexer Abstract
Overview
A stateful lexer that uses sets of regular expressions to tokenize a string. Most lexers are instances of RegexLexer.
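The core idea can be sketched with nothing but Ruby's stdlib StringScanner: a stack of named states, each holding an ordered list of regex rules, with unmatched input consumed one character at a time as an error token. This is an illustrative toy, not Rouge's actual API; the `STATES` table and `tokenize` helper are invented for this sketch.

```ruby
require 'strscan'

# Each state is an ordered list of [regex, token-name] rules.
# A rule in :root pushes :string; a rule in :string pops back out.
STATES = {
  root: [
    [/\s+/,   :Text],
    [/\d+/,   :Number],
    [/"/,     :StringStart], # enter the :string state
    [/\w+/,   :Name]
  ],
  string: [
    [/"/,     :StringEnd],   # leave the :string state
    [/[^"]+/, :String]
  ]
}.freeze

def tokenize(source)
  scanner = StringScanner.new(source)
  stack   = [:root]
  tokens  = []

  until scanner.eos?
    # Try the rules of the state on top of the stack, in order.
    matched = STATES[stack.last].any? do |re, tok|
      next false unless scanner.scan(re)

      tokens << [tok, scanner.matched]
      stack.push(:string) if tok == :StringStart
      stack.pop           if tok == :StringEnd
      true
    end

    # No rule matched: emit one character as an Error token and carry on.
    tokens << [:Error, scanner.getch] unless matched
  end

  tokens
end
```

For example, `tokenize('say "hi" 42')` yields a `:Name`, the quoted string broken into delimiter and contents tokens, and a `:Number`.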
Direct Known Subclasses
Lexers::C, Lexers::CSS, Lexers::Clojure, Lexers::Coffeescript, Lexers::CommonLisp, Lexers::Cpp, Lexers::Diff, Lexers::Factor, Lexers::Groovy, Lexers::HTML, Lexers::Haml, Lexers::Haskell, Lexers::JSON, Lexers::Java, Lexers::Javascript, Lexers::Make, Lexers::Markdown, Lexers::Perl, Lexers::Python, Lexers::Ruby, Lexers::SQL, Lexers::Sass, Lexers::Scheme, Lexers::Scss, Lexers::Shell, Lexers::Smalltalk, Lexers::TCL, Lexers::TeX, Lexers::VimL, Lexers::XML, Lexers::YAML, TemplateLexer
Defined Under Namespace
Classes: Rule, State, StateDSL
Constant Summary collapse
- MAX_NULL_SCANS = 5
  The number of successive scans permitted without consuming the input stream. If this is exceeded, the match fails.
Class Method Summary collapse
- .get_state(name) ⇒ Object
- .postprocess(toktype) {|tok, val| ... } ⇒ Object
  Specify a filter to be applied as the lexer yields tokens.
- .postprocesses ⇒ Object
  Where the postprocess blocks are stored.
- .start(&b) ⇒ Object
  Specify an action to be run on every fresh lex.
- .start_procs ⇒ Object
  The routines to run at the beginning of a fresh lex.
- .state(name, &b) ⇒ Object
  Define a new state for this lexer with the given name.
- .states ⇒ Object
  The states hash for this lexer.
Instance Method Summary collapse
- #delegate(lexer, text = nil) ⇒ Object
  Delegate the lex to another lexer.
- #get_state(state_name) ⇒ Object
- #group(tok) ⇒ Object
  Yield a token with the next matched group.
- #in_state?(state_name) ⇒ Boolean
  Check if `state_name` is in the state stack.
- #pop!(times = 1) ⇒ Object
  Pop the state stack.
- #push(state_name = nil, &b) ⇒ Object
  Push a state onto the stack.
- #reset! ⇒ Object
  Reset this lexer to its initial state.
- #reset_stack ⇒ Object
  Reset the stack back to `[:root]`.
- #run_callback(stream, callback, &output_stream) ⇒ Object
- #run_rule(rule, stream, &b) ⇒ Object
- #scan(scanner, re, &b) ⇒ Object
- #stack ⇒ Object
  The state stack.
- #state ⇒ Object
  The current state, i.e. the one on top of the state stack.
- #state?(state_name) ⇒ Boolean
  Check if `state_name` is the state on top of the state stack.
- #step(state, stream, &b) ⇒ Object
  Runs one step of the lex.
- #stream_tokens(stream, &b) ⇒ Object
  This implements the lexer protocol by yielding [token, value] pairs.
- #stream_without_postprocessing(stream, &b) ⇒ Object
- #token(tok, val = :__absent__) ⇒ Object
  Yield a token.
Methods inherited from Lexer
aliases, all, analyze_text, assert_utf8!, #debug, default_options, demo, demo_file, desc, filenames, find, find_fancy, guess, guess_by_filename, guess_by_mimetype, guess_by_source, #initialize, lex, #lex, mimetypes, #option, #options, register, #tag, tag
Constructor Details
This class inherits a constructor from Rouge::Lexer
Class Method Details
.get_state(name) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 147

def self.get_state(name)
  return name if name.is_a? State

  state = states[name.to_s]
  raise "unknown state: #{name}" unless state

  state.load!
end
.postprocess(toktype) {|tok, val| ... } ⇒ Object
Specify a filter to be applied as the lexer yields tokens.
# File 'lib/rouge/regex_lexer.rb', line 128

def self.postprocess(toktype, &b)
  postprocesses << [Token[toktype], b]
end
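The routing that postprocess sets up can be illustrated in isolation: filters are stored as [token-type, block] pairs, and each token the lexer yields is passed through the first filter whose type matches, which may rewrite it before it reaches the output. The `POSTPROCESSES` table and `emit` helper below are invented for this standalone sketch, not Rouge's API.

```ruby
# [token-type, filter] pairs; the filter decides what reaches the output.
POSTPROCESSES = [
  # illustrative filter: downcase the value of every :Name token
  [:Name, ->(tok, val, out) { out << [tok, val.downcase] }]
].freeze

# Route one token through the first matching filter, or pass it straight
# through when no filter is registered for its type.
def emit(tok, val, out)
  _, processor = POSTPROCESSES.find { |t, _| t == tok }

  if processor
    processor.call(tok, val, out)
  else
    out << [tok, val]
  end
end
```

Here `emit(:Name, "FOO", out)` appends `[:Name, "foo"]`, while token types without a filter pass through unchanged.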
.postprocesses ⇒ Object
Where the postprocess blocks are stored.
# File 'lib/rouge/regex_lexer.rb', line 134

def self.postprocesses
  @postprocesses ||= InheritableList.new(superclass.postprocesses)
end
.start(&b) ⇒ Object
Specify an action to be run on every fresh lex.
# File 'lib/rouge/regex_lexer.rb', line 116

def self.start(&b)
  start_procs << b
end
.start_procs ⇒ Object
The routines to run at the beginning of a fresh lex.
# File 'lib/rouge/regex_lexer.rb', line 107

def self.start_procs
  @start_procs ||= InheritableList.new(superclass.start_procs)
end
.state(name, &b) ⇒ Object
Define a new state for this lexer with the given name. The block will be evaluated in the context of a StateDSL.
# File 'lib/rouge/regex_lexer.rb', line 141

def self.state(name, &b)
  name = name.to_s
  states[name] = State.new(name, &b)
end
.states ⇒ Object
The states hash for this lexer.
# File 'lib/rouge/regex_lexer.rb', line 101

def self.states
  @states ||= {}
end
Instance Method Details
#delegate(lexer, text = nil) ⇒ Object
Delegate the lex to another lexer. The #lex method will be called with `:continue` set to true, so that #reset! will not be called. In this way, a single lexer can be repeatedly delegated to while maintaining its own internal state stack.
# File 'lib/rouge/regex_lexer.rb', line 333

def delegate(lexer, text=nil)
  debug { "    delegating to #{lexer.inspect}" }
  text ||= @last_match[0]

  lexer.lex(text, :continue => true) do |tok, val|
    debug { "    delegated token: #{tok.inspect}, #{val.inspect}" }
    token(tok, val)
  end
end
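Why `:continue => true` matters can be shown with a toy object that, like a delegated-to lexer, keeps its state stack between calls unless asked to start fresh. `ChunkLexer` and its paren-tracking behavior are invented for this sketch and are not Rouge's Lexer API.

```ruby
# A toy "lexer" whose only state is a stack tracking unclosed parentheses.
# Lexing with continue: true resumes from the previous call's stack, the
# way a repeatedly delegated-to lexer resumes mid-construct.
class ChunkLexer
  def initialize
    reset!
  end

  def reset!
    @stack = [:root]
  end

  # Returns the state left on top of the stack after consuming text.
  def lex(text, continue: false)
    reset! unless continue

    text.each_char do |c|
      @stack.push(:paren) if c == '('
      @stack.pop          if c == ')' && @stack.size > 1
    end

    @stack.last
  end
end
```

Feeding `"("` and then `")"` with `continue: true` returns the lexer to `:root`; without `continue`, the second chunk is lexed from a fresh stack and the stray `")"` is meaningless.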
#get_state(state_name) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 156

def get_state(state_name)
  self.class.get_state(state_name)
end
#group(tok) ⇒ Object
Yield a token with the next matched group. Subsequent calls to this method will yield subsequent groups.
# File 'lib/rouge/regex_lexer.rb', line 320

def group(tok)
  token(tok, @last_match[@group_count += 1])
end
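The walk through capture groups can be illustrated outside the lexer: after a regex with several captures matches, each successive call emits a token for the next capture, advancing a counter through match[1], match[2], and so on. The local names here (`last_match`, `emit_group`) are invented for this standalone sketch.

```ruby
# One match with three capture groups stands in for @last_match.
last_match  = "key=value".match(/(\w+)(=)(\w+)/)
group_count = 0
tokens      = []

# Each call consumes the next capture group, like #group does.
emit_group = lambda do |tok|
  group_count += 1
  tokens << [tok, last_match[group_count]]
end

emit_group.call(:Name)        # first capture:  "key"
emit_group.call(:Punctuation) # second capture: "="
emit_group.call(:String)      # third capture:  "value"
```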
#in_state?(state_name) ⇒ Boolean
Check if `state_name` is in the state stack.
# File 'lib/rouge/regex_lexer.rb', line 377

def in_state?(state_name)
  stack.map(&:name).include? state_name.to_s
end
#pop!(times = 1) ⇒ Object
Pop the state stack. If a number is passed in, it will be popped that number of times.
# File 'lib/rouge/regex_lexer.rb', line 362

def pop!(times=1)
  raise 'empty stack!' if stack.empty?

  debug { "    popping stack: #{times}" }
  times.times { stack.pop }
end
#push(state_name = nil, &b) ⇒ Object
Push a state onto the stack. If no state name is given and you’ve passed a block, a state will be dynamically created using the StateDSL.
# File 'lib/rouge/regex_lexer.rb', line 346

def push(state_name=nil, &b)
  push_state = if state_name
    get_state(state_name)
  elsif block_given?
    State.new(b.inspect, &b).load!
  else
    # use the top of the stack by default
    self.state
  end

  debug { "    pushing #{push_state.name}" }
  stack.push(push_state)
end
#reset! ⇒ Object
Reset this lexer to its initial state. This runs all of the start_procs.
# File 'lib/rouge/regex_lexer.rb', line 177

def reset!
  @stack = nil

  self.class.start_procs.each do |pr|
    instance_eval(&pr)
  end
end
#reset_stack ⇒ Object
Reset the stack back to `[:root]`.
# File 'lib/rouge/regex_lexer.rb', line 370

def reset_stack
  debug { '    resetting stack' }
  stack.clear
  stack.push get_state(:root)
end
#run_callback(stream, callback, &output_stream) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 258

def run_callback(stream, callback, &output_stream)
  with_output_stream(output_stream) do
    @group_count = 0
    @last_match = stream
    instance_exec(stream, &callback)
    @last_match = nil
  end
end
#run_rule(rule, stream, &b) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 240

def run_rule(rule, stream, &b)
  case rule
  when String
    debug { "  entering mixin #{rule}" }
    res = step(get_state(rule), stream, &b)
    debug { "  exiting mixin #{rule}" }
    res
  when Rule
    debug { "  trying #{rule.inspect}" }
    scan(stream, rule.re) do
      debug { "    got #{stream[0].inspect}" }
      run_callback(stream, rule.callback, &b)
    end
  end
end
#scan(scanner, re, &b) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 272

def scan(scanner, re, &b)
  # XXX HACK XXX
  # StringScanner's implementation of ^ is b0rken.
  # TODO: this doesn't cover cases like /(a|^b)/, but it's
  # the most common, for now...
  return false if re.source[0] == ?^ && !scanner.beginning_of_line?

  @null_steps ||= 0

  if @null_steps >= MAX_NULL_SCANS
    debug { "    too many scans without consuming the string!" }
    return false
  end

  scanner.scan(re)

  if scanner.matched?
    if scanner.matched_size == 0
      @null_steps += 1
    else
      @null_steps = 0
    end

    yield self
    return true
  end

  return false
end
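The MAX_NULL_SCANS guard exists because a regex like `/a*/` can match zero characters: a naive "scan until end of stream" loop would then spin forever without advancing. Counting consecutive zero-width matches and giving up past a small cap breaks the loop. The `count_scans` helper and `MAX_NULL` cap below are invented for this standalone illustration.

```ruby
require 'strscan'

MAX_NULL = 5

# Scan `re` repeatedly against `source`, bailing out after MAX_NULL
# consecutive zero-width matches; returns how many scans were attempted.
def count_scans(source, re)
  scanner    = StringScanner.new(source)
  null_steps = 0
  scans      = 0

  until scanner.eos?
    scanner.scan(re) or break
    scans += 1

    if scanner.matched_size == 0
      null_steps += 1
      break if null_steps >= MAX_NULL # without this, the loop never ends
    else
      null_steps = 0
    end
  end

  scans
end
```

Against `"bbb"`, `/a*/` matches the empty string on every attempt, so the guard stops the loop after five scans; against `"aaa"` it consumes the whole input in one scan.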
#stack ⇒ Object
The state stack. This is initially the single state `[:root]`. It is an error for this stack to be empty.
# File 'lib/rouge/regex_lexer.rb', line 163

def stack
  @stack ||= [get_state(:root)]
end
#state ⇒ Object
The current state, i.e. the one on top of the state stack.
NB: if the state stack is empty, this will raise an error rather than returning nil.
# File 'lib/rouge/regex_lexer.rb', line 171

def state
  stack.last or raise 'empty stack!'
end
#state?(state_name) ⇒ Boolean
Check if `state_name` is the state on top of the state stack.
# File 'lib/rouge/regex_lexer.rb', line 382

def state?(state_name)
  state_name.to_s == state.name
end
#step(state, stream, &b) ⇒ Object
Runs one step of the lex. Rules in the current state are tried until one matches, at which point its callback is called.
# File 'lib/rouge/regex_lexer.rb', line 231

def step(state, stream, &b)
  state.rules.each do |rule|
    return true if run_rule(rule, stream, &b)
  end

  false
end
#stream_tokens(stream, &b) ⇒ Object
This implements the lexer protocol, by yielding [token, value] pairs.
The process for lexing works as follows, until the stream is empty:
1. We look at the state on top of the stack (which by default is `[:root]`).
2. Each rule in that state is tried until one is successful. If one is found, that rule's callback is evaluated, which may yield tokens and manipulate the state stack. Otherwise, one character is consumed with an `'Error'` token, and we continue at (1).
# File 'lib/rouge/regex_lexer.rb', line 197

def stream_tokens(stream, &b)
  stream_without_postprocessing(stream) do |tok, val|
    _, processor = self.class.postprocesses.find { |t, _| t === tok }

    if processor
      with_output_stream(b) do
        instance_exec(tok, val, &processor)
      end
    else
      yield tok, val
    end
  end
end
#stream_without_postprocessing(stream, &b) ⇒ Object
# File 'lib/rouge/regex_lexer.rb', line 212

def stream_without_postprocessing(stream, &b)
  until stream.eos?
    debug { "lexer: #{self.class.tag}" }
    debug { "stack: #{stack.map(&:name).inspect}" }
    debug { "stream: #{stream.peek(20).inspect}" }

    success = step(get_state(state), stream, &b)

    if !success
      debug { "    no match, yielding Error" }
      b.call(Token['Error'], stream.getch)
    end
  end
end
#token(tok, val = :__absent__) ⇒ Object
Yield a token.
# File 'lib/rouge/regex_lexer.rb', line 309

def token(tok, val=:__absent__)
  val = @last_match[0] if val == :__absent__
  val ||= ''

  raise 'no output stream' unless @output_stream

  @output_stream << [Token[tok], val] unless val.empty?
end