Class: Rouge::Lexer Abstract
- Inherits: Object
  - Object
  - Rouge::Lexer
- Defined in: lib/rouge/lexer.rb
Overview
A lexer transforms text into a stream of `[token, chunk]` pairs.
Class Method Summary
-
.aliases(*args) ⇒ Object
Used to specify alternate names this lexer class may be found by.
-
.all ⇒ Object
A list of all lexers.
-
.analyze_text(text) ⇒ Object
abstract
Return a number between 0 and 1 indicating the likelihood that the text given should be lexed with this lexer.
- .assert_utf8!(str) ⇒ Object
- .default_options(o = {}) ⇒ Object
-
.demo(arg = :absent) ⇒ Object
Specify or get a small demo string for this lexer.
-
.demo_file(arg = :absent) ⇒ Object
Specify or get the path name containing a small demo for this lexer (can be overridden by Lexer.demo).
-
.desc(arg = :absent) ⇒ Object
Specify or get this lexer’s description.
-
.filenames(*fnames) ⇒ Object
Specify a list of filename globs associated with this lexer.
-
.find(name) ⇒ Object
Given a string, return the correct lexer class.
-
.find_fancy(str, code = nil) ⇒ Object
Find a lexer, with fancy shiny features.
-
.guess(info = {}) ⇒ Object
Guess which lexer to use based on a hash of info.
- .guess_by_filename(fname) ⇒ Object
- .guess_by_mimetype(mt) ⇒ Object
- .guess_by_source(source) ⇒ Object
-
.lex(stream, opts = {}, &b) ⇒ Object
Lexes `stream` with the given options.
-
.mimetypes(*mts) ⇒ Object
Specify a list of mimetypes associated with this lexer.
- .register(name, lexer) ⇒ Object
-
.tag(t = nil) ⇒ Object
Used to specify or get the canonical name of this lexer class.
Instance Method Summary
-
#debug(&b) ⇒ Object
Leave a debug message if the `:debug` option is set.
-
#initialize(opts = {}) ⇒ Lexer
constructor
Create a new lexer with the given options.
-
#lex(string, opts = {}, &b) ⇒ Object
Given a string, yield [token, chunk] pairs.
-
#option(k, v = :absent) ⇒ Object
Get or specify one option for this lexer.
-
#options(o = {}) ⇒ Object
Get and/or specify the options for this lexer.
-
#reset! ⇒ Object
abstract
Called after each lex is finished.
-
#stream_tokens(stream, &b) ⇒ Object
abstract
Yield `[token, chunk]` pairs, given a prepared input stream.
-
#tag ⇒ Object
Delegated to Lexer.tag.
Constructor Details
#initialize(opts = {}) ⇒ Lexer
Create a new lexer with the given options. Individual lexers may specify extra options. The only current globally accepted option is `:debug`.
    # File 'lib/rouge/lexer.rb', line 235
    def initialize(opts={})
      options(opts)
    end
Class Method Details
.aliases(*args) ⇒ Object
Used to specify alternate names this lexer class may be found by.
    # File 'lib/rouge/lexer.rb', line 183
    def aliases(*args)
      args.map!(&:to_s)
      args.each { |arg| Lexer.register(arg, self) }
      (@aliases ||= []).concat(args)
    end
.all ⇒ Object
Returns a list of all lexers.
    # File 'lib/rouge/lexer.rb', line 87
    def all
      registry.values.uniq
    end
.analyze_text(text) ⇒ Object
Return a number between 0 and 1 indicating the likelihood that the text given should be lexed with this lexer. The default implementation returns 0.
    # File 'lib/rouge/lexer.rb', line 328
    def self.analyze_text(text)
      0
    end
.assert_utf8!(str) ⇒ Object
    # File 'lib/rouge/lexer.rb', line 210
    def assert_utf8!(str)
      return if %w(US-ASCII UTF-8).include? str.encoding.name
      raise EncodingError.new(
        "Bad encoding: #{str.encoding.names.join(',')}. " +
        "Please convert your string to UTF-8."
      )
    end
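As a rough illustration of the check above, the following sketch reproduces the encoding test outside Rouge and shows one way to convert a string before lexing. The helper name `utf8_compatible?` is hypothetical and not part of the Rouge API; only core `String#encoding` and `String#encode` are used.

```ruby
# Hypothetical helper mirroring the test in assert_utf8!: a string passes
# when its encoding is named US-ASCII or UTF-8.
def utf8_compatible?(str)
  %w(US-ASCII UTF-8).include?(str.encoding.name)
end

utf16 = "echo hi".encode('UTF-16LE')  # this encoding fails the check
fixed = utf16.encode('UTF-8')         # transcode to UTF-8 before lexing

puts utf8_compatible?(utf16)
puts utf8_compatible?(fixed)
```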
.default_options(o = {}) ⇒ Object
    # File 'lib/rouge/lexer.rb', line 17
    def default_options(o={})
      @default_options ||= {}
      @default_options.merge!(o)
      @default_options
    end
.demo(arg = :absent) ⇒ Object
Specify or get a small demo string for this lexer.
    # File 'lib/rouge/lexer.rb', line 80
    def demo(arg=:absent)
      return @demo = arg unless arg == :absent

      @demo = File.read(demo_file)
    end
.demo_file(arg = :absent) ⇒ Object
Specify or get the path name containing a small demo for this lexer (can be overridden by demo).
    # File 'lib/rouge/lexer.rb', line 73
    def demo_file(arg=:absent)
      return @demo_file = Pathname.new(arg) unless arg == :absent

      @demo_file = Pathname.new(__FILE__).dirname.join('demos', tag)
    end
.desc(arg = :absent) ⇒ Object
Specify or get this lexer’s description.
    # File 'lib/rouge/lexer.rb', line 63
    def desc(arg=:absent)
      if arg == :absent
        @desc
      else
        @desc = arg
      end
    end
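The `:absent` default lets one method act as both getter and setter while still allowing `nil` to be stored explicitly. A minimal sketch of the idiom, using a stand-in class (`DescSketch` is hypothetical, not Rouge code):

```ruby
# Stand-in class illustrating the combined getter/setter idiom.
class DescSketch
  def self.desc(arg = :absent)
    if arg == :absent
      @desc          # no argument: behave as a getter
    else
      @desc = arg    # argument given (even nil): behave as a setter
    end
  end
end

DescSketch.desc "An illustrative lexer"
puts DescSketch.desc
```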
.filenames(*fnames) ⇒ Object
Specify a list of filename globs associated with this lexer.
    # File 'lib/rouge/lexer.rb', line 195
    def filenames(*fnames)
      (@filenames ||= []).concat(fnames)
    end
.find(name) ⇒ Object
Given a string, return the correct lexer class.
    # File 'lib/rouge/lexer.rb', line 24
    def find(name)
      registry[name.to_s]
    end
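The lookup is a plain hash access keyed by the stringified name, so a tag and each alias can point at the same class. The sketch below models that with a hypothetical `RegistrySketch` module and a placeholder class; it is an illustration of the idea, not Rouge's registry.

```ruby
# Hypothetical registry illustrating register/find/all.
module RegistrySketch
  @registry = {}

  class << self
    attr_reader :registry

    def register(name, lexer)
      registry[name.to_s] = lexer
    end

    def find(name)
      registry[name.to_s]
    end

    # Several names may map to one class, so duplicates are removed.
    def all
      registry.values.uniq
    end
  end
end

RubyLexerStub = Class.new  # placeholder standing in for a lexer class

RegistrySketch.register(:ruby, RubyLexerStub)
RegistrySketch.register('rb', RubyLexerStub)

puts RegistrySketch.find('rb') == RegistrySketch.find(:ruby)
puts RegistrySketch.all.size
```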
.find_fancy(str, code = nil) ⇒ Object
Find a lexer, with fancy shiny features.
- The string you pass can include CGI-style options:
    Lexer.find_fancy('erb?parent=tex')
- You can pass the special name 'guess' so we guess for you, and you can pass a second argument of the code to guess by:
    Lexer.find_fancy('guess', "#!/bin/bash\necho Hello, world")
This is used in the Redcarpet plugin as well as Rouge’s own markdown lexer for highlighting internal code blocks.
    # File 'lib/rouge/lexer.rb', line 42
    def find_fancy(str, code=nil)
      name, opts = str ? str.split('?', 2) : [nil, '']

      # parse the options hash from a cgi-style string
      opts = CGI.parse(opts || '').map do |k, vals|
        [ k.to_sym, vals.empty? ? true : vals[0] ]
      end

      opts = Hash[opts]

      lexer_class = case name
      when 'guess', nil
        self.guess(:source => code, :mimetype => opts[:mimetype])
      when String
        self.find(name)
      end

      lexer_class && lexer_class.new(opts)
    end
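To make the name/options split concrete, here is a simplified sketch of how a spec string such as 'erb?parent=tex' breaks down. `parse_spec` is a hypothetical helper; it uses a hand-rolled splitter rather than `CGI.parse`, so it does not URL-decode values and treats only a key with no `=` as `true`.

```ruby
# Hypothetical helper: split "name?key=value&flag" into a name and an
# options hash with symbol keys. A simplification of the parsing above.
def parse_spec(str)
  name, query = str ? str.split('?', 2) : [nil, '']
  opts = {}
  (query || '').split('&').each do |pair|
    key, value = pair.split('=', 2)
    next if key.nil? || key.empty?
    opts[key.to_sym] = value.nil? ? true : value
  end
  [name, opts]
end

p parse_spec('erb?parent=tex')
p parse_spec('ruby?debug')
```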
.guess(info = {}) ⇒ Object
Guess which lexer to use based on a hash of info.
    # File 'lib/rouge/lexer.rb', line 103
    def guess(info={})
      by_mimetype = guess_by_mimetype(info[:mimetype]) if info[:mimetype]
      return by_mimetype if by_mimetype

      by_filename = guess_by_filename(info[:filename]) if info[:filename]
      return by_filename if by_filename

      by_source = guess_by_source(info[:source]) if info[:source]
      return by_source if by_source

      # guessing failed, just parse it as text
      return Lexers::Text
    end
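The listing tries the mimetype first, then the filename, then the source, and falls back to plain text. A small sketch of that priority chain, with each strategy replaced by a hypothetical lambda that returns a symbol instead of a lexer class:

```ruby
# Hypothetical stand-in for the lookup order in .guess.
def guess_sketch(info, strategies, fallback)
  [:mimetype, :filename, :source].each do |key|
    next unless info[key]               # skip hints that were not supplied
    found = strategies.fetch(key).call(info[key])
    return found if found               # first successful strategy wins
  end
  fallback                              # nothing matched
end

# Illustrative strategies only; real lookups consult the lexer registry.
STRATEGIES = {
  mimetype: ->(mt)  { { 'text/x-ruby' => :ruby }[mt] },
  filename: ->(fn)  { fn.end_with?('.sh') ? :shell : nil },
  source:   ->(src) { src.start_with?('#!/bin/bash') ? :shell : nil }
}

p guess_sketch({ mimetype: 'text/x-ruby', filename: 'run.sh' }, STRATEGIES, :text)
p guess_sketch({ source: 'hello' }, STRATEGIES, :text)
```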
.guess_by_filename(fname) ⇒ Object
    # File 'lib/rouge/lexer.rb', line 123
    def guess_by_filename(fname)
      fname = File.basename(fname)
      registry.values.detect do |lexer|
        lexer.filenames.any? do |pattern|
          File.fnmatch?(pattern, fname, File::FNM_DOTMATCH)
        end
      end
    end
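The `File::FNM_DOTMATCH` flag matters for dotfiles: without it, a leading `*` does not match a name that starts with a period. The sketch below shows the same basename-plus-glob matching against a hypothetical table of globs (`GLOBS` and `lexer_name_for` are illustrative names, not Rouge's).

```ruby
# Hypothetical glob table; real globs come from each lexer's .filenames.
GLOBS = {
  'ruby'  => ['*.rb', 'Rakefile'],
  'shell' => ['*.sh', '*bashrc']
}

def lexer_name_for(path, globs = GLOBS)
  fname = File.basename(path)  # match on the file name, not the directory
  match = globs.detect do |_name, patterns|
    patterns.any? { |p| File.fnmatch?(p, fname, File::FNM_DOTMATCH) }
  end
  match && match.first
end

p lexer_name_for('/home/user/.bashrc')
p lexer_name_for('lib/rouge/lexer.rb')
```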
.guess_by_mimetype(mt) ⇒ Object
    # File 'lib/rouge/lexer.rb', line 117
    def guess_by_mimetype(mt)
      registry.values.detect do |lexer|
        lexer.mimetypes.include? mt
      end
    end
.guess_by_source(source) ⇒ Object
    # File 'lib/rouge/lexer.rb', line 132
    def guess_by_source(source)
      assert_utf8!(source)

      source = TextAnalyzer.new(source)

      best_result = 0
      best_match = nil
      registry.values.each do |lexer|
        result = lexer.analyze_text(source) || 0
        return lexer if result == 1

        if result > best_result
          best_match = lexer
          best_result = result
        end
      end

      best_match
    end
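The selection rule is: return immediately on a score of 1, otherwise keep the highest score above 0, and return nil if nothing scores. A sketch with hypothetical candidates (`Candidate` and `best_candidate` are illustrative, and the scorers are made up for the example):

```ruby
# Hypothetical candidate with a scoring lambda in place of analyze_text.
Candidate = Struct.new(:name, :scorer) do
  def analyze_text(text)
    scorer.call(text)
  end
end

def best_candidate(candidates, source)
  best_result = 0
  best_match = nil
  candidates.each do |candidate|
    result = candidate.analyze_text(source) || 0  # nil counts as 0
    return candidate if result == 1               # certain match: stop early

    if result > best_result
      best_match = candidate
      best_result = result
    end
  end
  best_match
end

CANDIDATES = [
  Candidate.new('ruby',  ->(t) { t.include?('def ') ? 0.5 : nil }),
  Candidate.new('shell', ->(t) { t.start_with?('#!/bin/bash') ? 1 : 0 }),
  Candidate.new('perl',  ->(t) { t.include?('$') ? 0.2 : 0 })
]

p best_candidate(CANDIDATES, "#!/bin/bash\necho $HOME").name
```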
.lex(stream, opts = {}, &b) ⇒ Object
Lexes `stream` with the given options. The lex is delegated to a new instance.
    # File 'lib/rouge/lexer.rb', line 13
    def lex(stream, opts={}, &b)
      new(opts).lex(stream, &b)
    end
.mimetypes(*mts) ⇒ Object
Specify a list of mimetypes associated with this lexer.
    # File 'lib/rouge/lexer.rb', line 205
    def mimetypes(*mts)
      (@mimetypes ||= []).concat(mts)
    end
.register(name, lexer) ⇒ Object
    # File 'lib/rouge/lexer.rb', line 153
    def register(name, lexer)
      registry[name.to_s] = lexer
    end
Instance Method Details
#debug(&b) ⇒ Object
Leave a debug message if the `:debug` option is set. The message is given as a block because some debug messages contain calculated information that is unnecessary for lexing in the real world.
    # File 'lib/rouge/lexer.rb', line 261
    def debug(&b)
      puts(b.call) if option :debug
    end
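Because the message is a block, it is only evaluated when debugging is on, so an expensive message costs nothing otherwise. A minimal sketch with a stand-in class (`DebugSketch` is hypothetical; it takes the flag directly rather than reading a lexer option):

```ruby
# Stand-in class showing lazy evaluation of the debug message.
class DebugSketch
  def initialize(debug)
    @debug = debug
  end

  def debug
    puts(yield) if @debug  # the block runs only when the flag is set
  end
end

quiet = DebugSketch.new(false)
loud  = DebugSketch.new(true)

quiet.debug { "state: #{(1..3).to_a.inspect}" }  # block is never called
loud.debug  { "state: #{(1..3).to_a.inspect}" }  # block is called and printed
```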
#lex(string, opts = {}, &b) ⇒ Object
Given a string, yield [token, chunk] pairs. If no block is given, an enumerator is returned.
    # File 'lib/rouge/lexer.rb', line 277
    def lex(string, opts={}, &b)
      return enum_for(:lex, string) unless block_given?

      Lexer.assert_utf8!(string)

      reset! unless opts[:continue]

      # consolidate consecutive tokens of the same type
      last_token = nil
      last_val = nil
      stream_tokens(StringScanner.new(string)) do |tok, val|
        next if val.empty?

        if tok == last_token
          last_val << val
          next
        end

        b.call(last_token, last_val) if last_token
        last_token = tok
        last_val = val
      end

      b.call(last_token, last_val) if last_token
    end
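The consolidation step can be tried in isolation. The sketch below applies the same merge rule to a plain array of pairs; `LexSketch.consolidate` is a hypothetical helper, and unlike the listing it copies each chunk with `dup` so the caller's strings are not mutated.

```ruby
module LexSketch
  # Merge consecutive pairs that share a token, skipping empty chunks.
  # Returns an enumerator when no block is given, as #lex does.
  def self.consolidate(pairs)
    return enum_for(:consolidate, pairs) unless block_given?

    last_token = nil
    last_val = nil
    pairs.each do |tok, val|
      next if val.empty?

      if tok == last_token
        last_val << val
        next
      end

      yield last_token, last_val if last_token
      last_token = tok
      last_val = val.dup  # copy so the input strings stay untouched
    end

    yield last_token, last_val if last_token
  end
end

raw = [[:keyword, "de"], [:keyword, "f"], [:space, ""], [:space, " "], [:name, "x"]]
LexSketch.consolidate(raw) { |tok, chunk| puts "#{tok}: #{chunk.inspect}" }
```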
#option(k, v = :absent) ⇒ Object
Get or specify one option for this lexer.
    # File 'lib/rouge/lexer.rb', line 247
    def option(k, v=:absent)
      if v == :absent
        options[k]
      else
        options({ k => v })
      end
    end
#options(o = {}) ⇒ Object
Get and/or specify the options for this lexer.
    # File 'lib/rouge/lexer.rb', line 240
    def options(o={})
      (@options ||= {}).merge!(o)

      self.class.default_options.merge(@options)
    end
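Instance options are layered over the class-level defaults on every read, and the defaults themselves are never modified by an instance. A sketch of that layering with a stand-in class (`OptionsSketch` is hypothetical and only mirrors the three methods involved):

```ruby
# Stand-in class mirroring default_options, options and option.
class OptionsSketch
  def self.default_options(o = {})
    @default_options ||= {}
    @default_options.merge!(o)
    @default_options
  end

  def initialize(opts = {})
    options(opts)
  end

  def options(o = {})
    (@options ||= {}).merge!(o)
    # Non-destructive merge: instance values win, defaults are untouched.
    self.class.default_options.merge(@options)
  end

  def option(k, v = :absent)
    v == :absent ? options[k] : options(k => v)
  end
end

OptionsSketch.default_options(debug: false, parent: 'html')
lexer = OptionsSketch.new(debug: true)
p lexer.options
```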
#reset! ⇒ Object
Called after each lex is finished. The default implementation is a no-op.
    # File 'lib/rouge/lexer.rb', line 269
    def reset!
    end
#stream_tokens(stream, &b) ⇒ Object
Yield `[token, chunk]` pairs, given a prepared input stream. This must be implemented.
    # File 'lib/rouge/lexer.rb', line 315
    def stream_tokens(stream, &b)
      raise 'abstract'
    end
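A subclass is expected to walk the prepared stream and yield pairs. As a rough sketch of what an implementation might look like, the stand-in class below scans a `StringScanner` and yields symbols as tokens; `WordLexerSketch` and its token names are hypothetical, and it does not inherit from Rouge::Lexer or use Rouge's token types.

```ruby
require 'strscan'

# Stand-in lexer: splits input into :word, :space and :other chunks.
class WordLexerSketch
  def stream_tokens(stream)
    until stream.eos?
      if (chunk = stream.scan(/\s+/))
        yield :space, chunk
      elsif (chunk = stream.scan(/\w+/))
        yield :word, chunk
      else
        yield :other, stream.getch  # consume one character so we always advance
      end
    end
  end
end

WordLexerSketch.new.stream_tokens(StringScanner.new("hi, you")) do |tok, chunk|
  puts "#{tok}: #{chunk.inspect}"
end
```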
#tag ⇒ Object
Delegated to Lexer.tag.
    # File 'lib/rouge/lexer.rb', line 304
    def tag
      self.class.tag
    end