Class: Rouge::Lexer (abstract)
- Inherits: Object
  - Object
    - Rouge::Lexer
- Defined in: lib/rouge/lexer.rb
Overview
A lexer transforms text into a stream of `[token, chunk]` pairs.
Direct Known Subclasses
Defined Under Namespace
Classes: AmbiguousGuess
Class Method Summary
- .aliases(*args) ⇒ Object
  Used to specify alternate names this lexer class may be found by.
- .all ⇒ Object
  A list of all lexers.
- .analyze_text(text) ⇒ Object (abstract)
  Return a number between 0 and 1 indicating the likelihood that the text given should be lexed with this lexer.
- .assert_utf8!(str) ⇒ Object
- .default_options(o = {}) ⇒ Object
- .demo(arg = :absent) ⇒ Object
  Specify or get a small demo string for this lexer.
- .demo_file(arg = :absent) ⇒ Object
  Specify or get the path name containing a small demo for this lexer (can be overridden by Lexer.demo).
- .desc(arg = :absent) ⇒ Object
  Specify or get this lexer’s description.
- .filenames(*fnames) ⇒ Object
  Specify a list of filename globs associated with this lexer.
- .find(name) ⇒ Object
  Given a string, return the correct lexer class.
- .find_fancy(str, code = nil) ⇒ Object
  Find a lexer, with fancy shiny features.
- .guess(info = {}) ⇒ Object
  Guess which lexer to use based on a hash of info.
- .guess_by_filename(fname) ⇒ Object
- .guess_by_mimetype(mt) ⇒ Object
- .guess_by_source(source) ⇒ Object
- .guesses(info = {}) ⇒ Object
  Guess which lexers could be used based on a hash of info; returns a (possibly empty) list.
- .lex(stream, opts = {}, &b) ⇒ Object
  Lexes `stream` with the given options.
- .mimetypes(*mts) ⇒ Object
  Specify a list of mimetypes associated with this lexer.
- .tag(t = nil) ⇒ Object
  Used to specify or get the canonical name of this lexer class.
Instance Method Summary
- #debug(&b) ⇒ Object
  Leave a debug message if the `:debug` option is set.
- #initialize(opts = {}) ⇒ Lexer (constructor)
  Create a new lexer with the given options.
- #lex(string, opts = {}, &b) ⇒ Object
  Given a string, yield `[token, chunk]` pairs.
- #option(k, v = :absent) ⇒ Object
  Get or specify one option for this lexer.
- #options(o = {}) ⇒ Object
  Get and/or specify the options for this lexer.
- #reset! ⇒ Object (abstract)
  Called after each lex is finished.
- #stream_tokens(stream, &b) ⇒ Object (abstract)
  Yield `[token, chunk]` pairs, given a prepared input stream.
- #tag ⇒ Object
  Delegated to Lexer.tag.
Constructor Details
#initialize(opts = {}) ⇒ Lexer
Create a new lexer with the given options. Individual lexers may specify extra options. The only option currently accepted by all lexers is `:debug`.
# File 'lib/rouge/lexer.rb', line 304
def initialize(opts={})
  options(opts)
end
Class Method Details
.aliases(*args) ⇒ Object
Used to specify alternate names this lexer class may be found by.
# File 'lib/rouge/lexer.rb', line 252
def aliases(*args)
  args.map!(&:to_s)
  args.each { |arg| Lexer.register(arg, self) }
  (@aliases ||= []).concat(args)
end
.all ⇒ Object
Returns a list of all lexers.
# File 'lib/rouge/lexer.rb', line 88
def all
  registry.values.uniq
end
.analyze_text(text) ⇒ Object
Return a number between 0 and 1 indicating the likelihood that the text given should be lexed with this lexer. The default implementation returns 0. Values under 0.5 will only be used to disambiguate filename or mimetype matches.
# File 'lib/rouge/lexer.rb', line 409
def self.analyze_text(text)
  0
end
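A subclass would normally override this hook. The sketch below is not Rouge code: `ToyShellLexer` and both of its heuristics are assumptions made up for illustration. It only shows how a score between 0 and 1 might be derived, with a shebang treated as a confident match and a stray `echo` as a weak hint.

```ruby
# Hypothetical stand-in for a lexer subclass; not part of Rouge.
class ToyShellLexer
  # Returns a confidence score between 0 and 1.
  def self.analyze_text(text)
    first_line = text.lines.first.to_s
    return 1 if first_line =~ /\A#!.*\b(?:ba|z)?sh\b/  # shebang naming a shell
    return 0.3 if text =~ /^\s*echo\s/                 # weak hint only
    0
  end
end

ToyShellLexer.analyze_text("#!/bin/bash\necho hi")  # confident match
ToyShellLexer.analyze_text("puts 'hi'")             # no match
```

Under the rule stated above, the 0.3 result would only matter once a filename or mimetype match has already narrowed the candidates.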
.assert_utf8!(str) ⇒ Object
# File 'lib/rouge/lexer.rb', line 279
def assert_utf8!(str)
  return if %w(US-ASCII UTF-8).include? str.encoding.name
  raise EncodingError.new(
    "Bad encoding: #{str.encoding.names.join(',')}. " +
    "Please convert your string to UTF-8."
  )
end
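Callers holding text in another encoding would need to convert it before lexing. A minimal sketch of one way to do that with Ruby's built-in `String#encode` follows; `ensure_utf8` is a hypothetical helper name, not something Rouge provides.

```ruby
# Hypothetical helper, not part of Rouge: returns a string whose encoding
# name would pass a US-ASCII / UTF-8 check like the one above.
def ensure_utf8(str)
  return str if %w(US-ASCII UTF-8).include?(str.encoding.name)
  str.encode('UTF-8')  # transcodes; raises if a character cannot be converted
end

# Build a Latin-1 string ("café" with a single-byte e-acute).
latin1 = "caf\xE9".dup.force_encoding('ISO-8859-1')
utf8   = ensure_utf8(latin1)
utf8.encoding.name  # "UTF-8"
```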
.default_options(o = {}) ⇒ Object
# File 'lib/rouge/lexer.rb', line 18
def default_options(o={})
  @default_options ||= {}
  @default_options.merge!(o)
  @default_options
end
.demo(arg = :absent) ⇒ Object
Specify or get a small demo string for this lexer.
# File 'lib/rouge/lexer.rb', line 81
def demo(arg=:absent)
  return @demo = arg unless arg == :absent

  @demo = File.read(demo_file)
end
.demo_file(arg = :absent) ⇒ Object
Specify or get the path name containing a small demo for this lexer (can be overridden by demo).
# File 'lib/rouge/lexer.rb', line 74
def demo_file(arg=:absent)
  return @demo_file = Pathname.new(arg) unless arg == :absent

  @demo_file = Pathname.new(__FILE__).dirname.join('demos', tag)
end
.desc(arg = :absent) ⇒ Object
Specify or get this lexer’s description.
# File 'lib/rouge/lexer.rb', line 64
def desc(arg=:absent)
  if arg == :absent
    @desc
  else
    @desc = arg
  end
end
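The `:absent` sentinel lets one method act as both getter and setter. The sketch below reproduces that idiom on made-up classes (`ToyBase` and `ToyJson` are assumptions, not Rouge classes) to show how it might read inside a class body.

```ruby
# Hypothetical base class illustrating the combined getter/setter idiom.
class ToyBase
  def self.desc(arg = :absent)
    return @desc if arg == :absent  # no argument: act as a getter
    @desc = arg                     # argument given: act as a setter
  end
end

class ToyJson < ToyBase
  desc 'JavaScript Object Notation'  # setter form, used in the class body
end

ToyJson.desc  # getter form
ToyBase.desc  # nil: the value is stored per class, not shared with the parent
```

Using a sentinel rather than `nil` as the default means a caller could still set the description to `nil` explicitly.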
.filenames(*fnames) ⇒ Object
Specify a list of filename globs associated with this lexer.
# File 'lib/rouge/lexer.rb', line 264
def filenames(*fnames)
  (@filenames ||= []).concat(fnames)
end
.find(name) ⇒ Object
Given a string, return the correct lexer class.
# File 'lib/rouge/lexer.rb', line 25
def find(name)
  registry[name.to_s]
end
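Lookup by name relies on each class registering its tag and aliases in a shared table. The following is a miniature of that pattern, not Rouge's implementation: `ToyRegistryLexer`, `ToyRuby`, and the `REGISTRY` constant are all hypothetical names.

```ruby
# Hypothetical miniature of a name-to-class registry; not Rouge's code.
class ToyRegistryLexer
  REGISTRY = {}

  def self.register(name, klass)
    REGISTRY[name.to_s] = klass
  end

  # With an argument, sets and registers the canonical name; without, reads it.
  def self.tag(t = nil)
    return @tag if t.nil?
    @tag = t.to_s
    ToyRegistryLexer.register(@tag, self)
  end

  # Registers every alternate name for the calling class.
  def self.aliases(*names)
    names.map!(&:to_s)
    names.each { |n| ToyRegistryLexer.register(n, self) }
    (@aliases ||= []).concat(names)
  end

  def self.find(name)
    REGISTRY[name.to_s]
  end

  # A class registered under several names appears once.
  def self.all
    REGISTRY.values.uniq
  end
end

class ToyRuby < ToyRegistryLexer
  tag 'ruby'
  aliases 'rb'
end

ToyRegistryLexer.find('rb')   # ToyRuby
ToyRegistryLexer.find(:ruby)  # ToyRuby, since names are stringified
```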
.find_fancy(str, code = nil) ⇒ Object
Find a lexer, with fancy shiny features.
- The string you pass can include CGI-style options:
  Lexer.find_fancy('erb?parent=tex')
- You can pass the special name 'guess' to have the lexer guessed for you, optionally with a second argument containing the code to guess from:
  Lexer.find_fancy('guess', "#!/bin/bash\necho Hello, world")
This is used in the Redcarpet plugin as well as Rouge’s own markdown lexer for highlighting internal code blocks.
# File 'lib/rouge/lexer.rb', line 43
def find_fancy(str, code=nil)
  name, opts = str ? str.split('?', 2) : [nil, '']

  # parse the options hash from a cgi-style string
  opts = CGI.parse(opts || '').map do |k, vals|
    [ k.to_sym, vals.empty? ? true : vals[0] ]
  end

  opts = Hash[opts]

  lexer_class = case name
  when 'guess', nil
    self.guess(:source => code, :mimetype => opts[:mimetype])
  when String
    self.find(name)
  end

  lexer_class && lexer_class.new(opts)
end
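To make the option-string format concrete, here is a rough sketch of the parsing step alone. It swaps in `URI.decode_www_form` from Ruby's standard library instead of `CGI.parse`, and `parse_lexer_spec` is a hypothetical helper name. Unlike the method above, this sketch also treats an explicitly empty value (`key=`) as a flag.

```ruby
require 'uri'

# Hypothetical helper: splits "name?key=value&flag" into a name and an
# options hash. Uses URI.decode_www_form rather than CGI.parse.
def parse_lexer_spec(str)
  name, query = str.to_s.split('?', 2)
  pairs = URI.decode_www_form(query || '').map do |key, value|
    [key.to_sym, value.empty? ? true : value]  # bare keys become true flags
  end
  [name, pairs.to_h]
end

parse_lexer_spec('erb?parent=tex')  # name "erb", options {:parent => "tex"}
parse_lexer_spec('ruby?debug')      # name "ruby", options {:debug => true}
```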
.guess(info = {}) ⇒ Object
Guess which lexer to use based on a hash of info.
# File 'lib/rouge/lexer.rb', line 141
def guess(info={})
  lexers = guesses(info)

  return Lexers::Text if lexers.empty?
  return lexers[0] if lexers.size == 1

  raise AmbiguousGuess.new(lexers)
end
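The method has three outcomes: a fallback when nothing matches, the single match, or an error when several candidates remain. The sketch below mirrors that shape with made-up names (`ToyAmbiguousGuess`, `pick_one`, and its `alternatives` reader are assumptions) and shows one possible way a caller might recover from the ambiguous case.

```ruby
# Hypothetical error carrying the competing candidates.
class ToyAmbiguousGuess < StandardError
  attr_reader :alternatives

  def initialize(alternatives)
    @alternatives = alternatives
    super("ambiguous guess: #{alternatives.join(', ')}")
  end
end

# Hypothetical helper with the same three outcomes as the method above.
def pick_one(candidates, fallback)
  return fallback if candidates.empty?
  return candidates.first if candidates.size == 1
  raise ToyAmbiguousGuess.new(candidates)
end

begin
  chosen = pick_one(%w[c cpp], 'text')
rescue ToyAmbiguousGuess => e
  chosen = e.alternatives.first  # one possible policy: take the first candidate
end
```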
.guess_by_filename(fname) ⇒ Object
# File 'lib/rouge/lexer.rb', line 154
def guess_by_filename(fname)
  guess :filename => fname
end
.guess_by_mimetype(mt) ⇒ Object
# File 'lib/rouge/lexer.rb', line 150
def guess_by_mimetype(mt)
  guess :mimetype => mt
end
.guess_by_source(source) ⇒ Object
# File 'lib/rouge/lexer.rb', line 158
def guess_by_source(source)
  guess :source => source
end
.guesses(info = {}) ⇒ Object
Guess which lexer to use based on a hash of info.
This accepts the same arguments as Lexer.guess, but will never raise an error. It returns a (possibly empty) list of potential lexers to use.
# File 'lib/rouge/lexer.rb', line 97
def guesses(info={})
  mimetype, filename, source = info.values_at(:mimetype, :filename, :source)
  lexers = registry.values.uniq
  total_size = lexers.size

  lexers = filter_by_mimetype(lexers, mimetype) if mimetype
  return lexers if lexers.size == 1

  lexers = filter_by_filename(lexers, filename) if filename
  return lexers if lexers.size == 1

  if source
    # If we're filtering against *all* lexers, we only use confident return
    # values from analyze_text.  But if we've filtered down already, we can trust
    # the analysis more.
    source_threshold = lexers.size < total_size ? 0 : 0.5
    return [best_by_source(lexers, source, source_threshold)].compact
  end

  []
end
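The narrowing order (mimetype, then filename, then source with a threshold) can be tried out on toy data. In this sketch everything is an assumption: the `Candidate` struct, the glob matching via `File.fnmatch?`, and the rule that the best score must be strictly above the threshold, since the real `filter_by_*` and `best_by_source` helpers are not shown on this page.

```ruby
# Hypothetical candidate record; :score stands in for analyze_text.
Candidate = Struct.new(:name, :mimetypes, :globs, :score)

def toy_guesses(all, mimetype: nil, filename: nil, source: nil)
  found = all
  found = found.select { |c| c.mimetypes.include?(mimetype) } if mimetype
  return found if found.size == 1

  if filename
    base = File.basename(filename)
    found = found.select { |c| c.globs.any? { |g| File.fnmatch?(g, base) } }
  end
  return found if found.size == 1

  if source
    # Demand a confident score only when nothing has narrowed the field yet.
    threshold = found.size < all.size ? 0 : 0.5
    best = found.max_by { |c| c.score.call(source) }
    return [] if best.nil? || best.score.call(source) <= threshold
    return [best]
  end

  []
end

TOY_CANDIDATES = [
  Candidate.new('c',    ['text/x-c'],    ['*.c', '*.h'], ->(s) { s.include?('#include') ? 0.6 : 0 }),
  Candidate.new('objc', ['text/x-objc'], ['*.m', '*.h'], ->(s) { s.include?('@interface') ? 0.9 : 0 }),
  Candidate.new('ruby', ['text/x-ruby'], ['*.rb'],       ->(s) { s.start_with?('#!/usr/bin/env ruby') ? 1 : 0 })
]

toy_guesses(TOY_CANDIDATES, filename: 'main.rb').map(&:name)
toy_guesses(TOY_CANDIDATES, filename: 'util.h', source: '@interface Foo').map(&:name)
```

In the second call the filename leaves two candidates, so the source score breaks the tie with a threshold of 0.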
.lex(stream, opts = {}, &b) ⇒ Object
Lexes `stream` with the given options. The lex is delegated to a new instance.
# File 'lib/rouge/lexer.rb', line 14
def lex(stream, opts={}, &b)
  new(opts).lex(stream, &b)
end
.mimetypes(*mts) ⇒ Object
Specify a list of mimetypes associated with this lexer.
# File 'lib/rouge/lexer.rb', line 274
def mimetypes(*mts)
  (@mimetypes ||= []).concat(mts)
end
.tag(t = nil) ⇒ Object
Used to specify or get the canonical name of this lexer class.
# File 'lib/rouge/lexer.rb', line 236
def tag(t=nil)
  return @tag if t.nil?

  @tag = t.to_s
  Lexer.register(@tag, self)
end
Instance Method Details
#debug(&b) ⇒ Object
Leave a debug message if the `:debug` option is set. The message is given as a block because some debug messages contain calculated information that is unnecessary for lexing in the real world.
# File 'lib/rouge/lexer.rb', line 330
def debug(&b)
  # This method is a hotspot, unfortunately.
  #
  # For performance reasons, the "debug" option of a lexer cannot
  # be changed once it has begun lexing.  This method will redefine
  # itself on the first call to a noop if "debug" is not set.
  if option(:debug)
    def self.debug; puts yield; end
  else
    def self.debug; end
  end

  debug(&b)
end
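The self-redefinition trick is easier to see in isolation. This sketch uses a made-up `ToyDebugger` class with a counter (both assumptions) to show that the option would be consulted only on the first call, and that the message block is never evaluated when debugging is off.

```ruby
# Hypothetical class showing a method that replaces itself on first use.
class ToyDebugger
  attr_reader :checks

  def initialize(debug)
    @debug = debug
    @checks = 0
  end

  def debug_enabled?
    @checks += 1  # count how often the option is consulted
    @debug
  end

  def debug(&b)
    if debug_enabled?
      def self.debug; puts yield; end  # singleton method shadows this one
    else
      def self.debug; end              # no-op from now on
    end
    debug(&b)  # dispatches to the freshly defined singleton method
  end
end

quiet = ToyDebugger.new(false)
3.times { quiet.debug { 'expensive message' } }
quiet.checks  # 1: the option was consulted only on the first call
```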
#lex(string, opts = {}, &b) ⇒ Object
Given a string, yield [token, chunk] pairs. If no block is given, an enumerator is returned.
# File 'lib/rouge/lexer.rb', line 357
def lex(string, opts={}, &b)
  return enum_for(:lex, string) unless block_given?

  Lexer.assert_utf8!(string)

  reset! unless opts[:continue]

  # consolidate consecutive tokens of the same type
  last_token = nil
  last_val = nil
  stream_tokens(string) do |tok, val|
    next if val.empty?

    if tok == last_token
      last_val << val
      next
    end

    b.call(last_token, last_val) if last_token
    last_token = tok
    last_val = val
  end

  b.call(last_token, last_val) if last_token
end
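The consolidation step merges adjacent chunks that share a token type and drops empty chunks. A standalone sketch of just that step follows; `consolidate` is a hypothetical helper, and unlike the method above it copies each chunk before appending to it.

```ruby
# Hypothetical helper: merges adjacent pairs that share a token type.
def consolidate(pairs)
  return enum_for(:consolidate, pairs) unless block_given?

  last_token = nil
  last_val = nil
  pairs.each do |tok, val|
    next if val.empty?        # empty chunks are skipped entirely
    if tok == last_token
      last_val << val         # same type: extend the pending chunk
      next
    end
    yield last_token, last_val if last_token
    last_token = tok
    last_val = val.dup        # copy so appending does not mutate the input
  end
  yield last_token, last_val if last_token
end

raw = [[:name, 'fo'], [:name, 'o'], [:text, ''], [:punct, '('], [:punct, ')']]
consolidate(raw).to_a  # [[:name, "foo"], [:punct, "()"]]
```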
#option(k, v = :absent) ⇒ Object
Get or specify one option for this lexer.
# File 'lib/rouge/lexer.rb', line 316
def option(k, v=:absent)
  if v == :absent
    options[k]
  else
    options({ k => v })
  end
end
#options(o = {}) ⇒ Object
Get and/or specify the options for this lexer.
# File 'lib/rouge/lexer.rb', line 309
def options(o={})
  (@options ||= {}).merge!(o)

  self.class.default_options.merge(@options)
end
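Instance options are layered over class-level defaults at read time. The sketch below reproduces that layering on made-up classes (`ToyOptLexer` and `ToyErb` are assumptions, as is the `:parent` default) so the precedence can be observed directly.

```ruby
# Hypothetical classes showing class-level defaults under instance options.
class ToyOptLexer
  def self.default_options(o = {})
    @default_options ||= {}
    @default_options.merge!(o)
    @default_options
  end

  def initialize(opts = {})
    options(opts)
  end

  # Stores any new options, then returns defaults overlaid with them.
  def options(o = {})
    (@options ||= {}).merge!(o)
    self.class.default_options.merge(@options)
  end

  def option(k, v = :absent)
    v == :absent ? options[k] : options({ k => v })
  end
end

class ToyErb < ToyOptLexer
  default_options :parent => 'html'
end

lexer = ToyErb.new(:debug => true)
lexer.option(:parent)         # "html", taken from the class default
lexer.option(:parent, 'tex')  # instance value now wins
lexer.option(:parent)         # "tex"
```

Because the merge uses the non-mutating `merge`, the class default itself stays unchanged.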
#reset! ⇒ Object
Called after each lex is finished. The default implementation is a noop.
# File 'lib/rouge/lexer.rb', line 349
def reset!
end
#stream_tokens(stream, &b) ⇒ Object
Yield `[token, chunk]` pairs, given a prepared input stream. This must be implemented.
# File 'lib/rouge/lexer.rb', line 395
def stream_tokens(stream, &b)
  raise 'abstract'
end
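As an illustration of what an implementation of this hook might look like, here is a toy base class and subclass built on `StringScanner` from Ruby's standard library. The class names and the token names (`:number`, `:space`, `:word`) are made up for the example and are not Rouge's.

```ruby
require 'strscan'

# Hypothetical base class with an abstract tokenizing hook.
class ToyBaseLexer
  def lex(string, &b)
    return enum_for(:lex, string) unless block_given?
    stream_tokens(string, &b)
  end

  def stream_tokens(stream, &b)
    raise 'abstract'
  end
end

# Hypothetical subclass: splits input into numbers, whitespace, and words.
class ToyWordLexer < ToyBaseLexer
  def stream_tokens(stream)
    scanner = StringScanner.new(stream)
    until scanner.eos?
      if (chunk = scanner.scan(/\d+/))
        yield :number, chunk
      elsif (chunk = scanner.scan(/\s+/))
        yield :space, chunk
      else
        yield :word, scanner.scan(/[^\s\d]+/)  # anything else up to a digit or space
      end
    end
  end
end

ToyWordLexer.new.lex('abc 42').to_a
# [[:word, "abc"], [:space, " "], [:number, "42"]]
```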
#tag ⇒ Object
Delegated to Lexer.tag.
# File 'lib/rouge/lexer.rb', line 384
def tag
  self.class.tag
end