Class: Rouge::Lexer Abstract

Inherits:
Object
Includes:
Token::Tokens
Defined in:
lib/rouge/lexer.rb

Overview

This class is abstract.

A lexer transforms text into a stream of `[token, chunk]` pairs.

Constant Summary

Constants included from Token::Tokens

Token::Tokens::Num, Token::Tokens::Str

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Token::Tokens

token

Constructor Details

#initialize(opts = {}) ⇒ Lexer

Create a new lexer with the given options. Individual lexers may specify extra options. The only globally accepted option at present is `:debug`.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :debug (Object)

    Prints debug information to stdout. The particular info depends on the lexer in question. In regex lexers, this will log the state stack at the beginning of each step, along with each regex tried and each stream consumed. Try it, it’s pretty useful.



# File 'lib/rouge/lexer.rb', line 272

def initialize(opts={})
  @options = {}
  opts.each { |k, v| @options[k.to_s] = v }

  @debug = Lexer.debug_enabled? && bool_option(:debug)
end
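
The key normalization in the constructor can be tried outside Rouge. This is a minimal sketch: the class name `SketchLexer` is hypothetical, and only the key-stringifying loop mirrors the code above.

```ruby
# Standalone sketch; SketchLexer is a hypothetical stand-in, not part of Rouge.
class SketchLexer
  attr_reader :options

  def initialize(opts = {})
    @options = {}
    # Symbol and string keys end up as the same string key.
    opts.each { |k, v| @options[k.to_s] = v }
  end
end

lexer = SketchLexer.new(:debug => true, 'parent' => 'html')
lexer.options # => {"debug"=>true, "parent"=>"html"}
```

Because keys are stored as strings, later lookups have to use `name.to_s`, which is what the option helpers further down do.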

Instance Attribute Details

#options ⇒ Object (readonly)

Returns the options this lexer was created with. Keys are stored as strings.



# File 'lib/rouge/lexer.rb', line 262

def options
  @options
end

Class Method Details

.aliases(*args) ⇒ Object

Used to specify alternate names this lexer class may be found by.

Examples:

class Erb < Lexer
  tag 'erb'
  aliases 'eruby', 'rhtml'
end

Lexer.find('eruby') # => Erb


# File 'lib/rouge/lexer.rb', line 219

def aliases(*args)
  args.map!(&:to_s)
  args.each { |arg| Lexer.register(arg, self) }
  (@aliases ||= []).concat(args)
end
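
How a tag and its aliases resolve to the same class can be sketched with a plain hash. `SketchRegistry` and `SketchErb` are hypothetical names, and the registry here only approximates whatever `Lexer.register` does internally.

```ruby
# Standalone sketch; SketchRegistry is a hypothetical stand-in for
# the registry that Lexer.register writes to.
module SketchRegistry
  @entries = {}

  def self.register(name, klass)
    @entries[name.to_s] = klass
  end

  def self.find(name)
    @entries[name.to_s]
  end
end

class SketchErb
  # Comparable to `tag 'erb'`.
  SketchRegistry.register('erb', self)
  # Comparable to `aliases 'eruby', 'rhtml'`.
  %w(eruby rhtml).each { |name| SketchRegistry.register(name, self) }
end

SketchRegistry.find('eruby') # => SketchErb
SketchRegistry.find(:rhtml)  # => SketchErb
SketchRegistry.find('nope')  # => nil
```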

.all ⇒ Object

Returns a list of all lexers.

Returns:

  • a list of all lexers.



# File 'lib/rouge/lexer.rb', line 116

def all
  registry.values.uniq
end

.analyze_text(text) ⇒ Object

This method is abstract.

Return a number between 0 and 1 indicating the likelihood that the text given should be lexed with this lexer. The default implementation returns 0. Values under 0.5 will only be used to disambiguate filename or mimetype matches.

Parameters:



# File 'lib/rouge/lexer.rb', line 436

def self.analyze_text(text)
  0
end

.assert_utf8!(str) ⇒ Object

Raises:

  • (EncodingError)


# File 'lib/rouge/lexer.rb', line 246

def assert_utf8!(str)
  return if %w(US-ASCII UTF-8 ASCII-8BIT).include? str.encoding.name
  raise EncodingError.new(
    "Bad encoding: #{str.encoding.names.join(',')}. " +
    "Please convert your string to UTF-8."
  )
end
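
A caller can avoid the `EncodingError` by converting input before lexing. The sketch below uses only core `String` methods; the helper name `lexable_encoding?` is hypothetical and checks just two of the encoding names listed above.

```ruby
# Standalone sketch; lexable_encoding? is a hypothetical helper.
def lexable_encoding?(str)
  %w(US-ASCII UTF-8).include?(str.encoding.name)
end

utf16 = String.new('puts 1', encoding: 'UTF-8').encode('UTF-16LE')
lexable_encoding?(utf16)        # => false

# Transcode before handing the string to a lexer.
converted = utf16.encode('UTF-8')
lexable_encoding?(converted)    # => true
```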

.debug_enabled? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rouge/lexer.rb', line 182

def debug_enabled?
  !!@debug_enabled
end

.demo(arg = :absent) ⇒ Object

Specify or get a small demo string for this lexer.



# File 'lib/rouge/lexer.rb', line 109

def demo(arg=:absent)
  return @demo = arg unless arg == :absent

  @demo = File.read(demo_file, encoding: 'utf-8')
end

.demo_file(arg = :absent) ⇒ Object

Specify or get the path name containing a small demo for this lexer (can be overridden by demo).



# File 'lib/rouge/lexer.rb', line 102

def demo_file(arg=:absent)
  return @demo_file = Pathname.new(arg) unless arg == :absent

  @demo_file = Pathname.new(__FILE__).dirname.join('demos', tag)
end

.desc(arg = :absent) ⇒ Object

Specify or get this lexer’s description.



# File 'lib/rouge/lexer.rb', line 84

def desc(arg=:absent)
  if arg == :absent
    @desc
  else
    @desc = arg
  end
end

.disable_debug! ⇒ Object



# File 'lib/rouge/lexer.rb', line 178

def disable_debug!
  @debug_enabled = false
end

.enable_debug! ⇒ Object



# File 'lib/rouge/lexer.rb', line 174

def enable_debug!
  @debug_enabled = true
end

.filenames(*fnames) ⇒ Object

Specify a list of filename globs associated with this lexer.

Examples:

class Ruby < Lexer
  filenames '*.rb', '*.ruby', 'Gemfile', 'Rakefile'
end


# File 'lib/rouge/lexer.rb', line 231

def filenames(*fnames)
  (@filenames ||= []).concat(fnames)
end

.find(name) ⇒ Object

Given a string, return the correct lexer class.



# File 'lib/rouge/lexer.rb', line 26

def find(name)
  registry[name.to_s]
end

.find_fancy(str, code = nil, additional_options = {}) ⇒ Object

Find a lexer, with fancy shiny features.

  • The string you pass can include CGI-style options

    Lexer.find_fancy('erb?parent=tex')
    
  • You can pass the special name ‘guess’ so we guess for you, and you can pass a second argument of the code to guess by

    Lexer.find_fancy('guess', "#!/bin/bash\necho Hello, world")
    

This is used in the Redcarpet plugin as well as Rouge’s own markdown lexer for highlighting internal code blocks.



# File 'lib/rouge/lexer.rb', line 44

def find_fancy(str, code=nil, additional_options={})
  if str && !str.include?('?') && str != 'guess'
    lexer_class = find(str)
    return lexer_class && lexer_class.new(additional_options)
  end

  name, opts = str ? str.split('?', 2) : [nil, '']

  # parse the options hash from a cgi-style string
  opts = CGI.parse(opts || '').map do |k, vals|
    val = case vals.size
    when 0 then true
    when 1 then vals[0]
    else vals
    end

    [ k.to_s, val ]
  end

  opts = additional_options.merge(Hash[opts])

  lexer_class = case name
  when 'guess', nil
    self.guess(:source => code, :mimetype => opts['mimetype'])
  when String
    self.find(name)
  end

  lexer_class && lexer_class.new(opts)
end
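
The way a CGI-style string is split into a lexer name and an options hash can be sketched without Rouge. This is an approximation under stated assumptions: `split_lexer_spec` is a hypothetical helper, and it parses the query by hand instead of calling `CGI.parse`, so it does no percent-decoding.

```ruby
# Standalone sketch; split_lexer_spec is hypothetical and replaces
# CGI.parse with a simplified hand-written parser (no percent-decoding).
def split_lexer_spec(str)
  name, query = str.split('?', 2)

  # Collect every value seen for each key.
  grouped = {}
  query.to_s.split('&').each do |pair|
    key, value = pair.split('=', 2)
    grouped[key] ||= []
    grouped[key] << value if value
  end

  # Same collapsing rule as find_fancy: no value means true,
  # one value is unwrapped, several values stay an array.
  opts = grouped.map do |key, vals|
    val = case vals.size
          when 0 then true
          when 1 then vals[0]
          else vals
          end
    [key, val]
  end.to_h

  [name, opts]
end

split_lexer_spec('erb?parent=tex') # => ["erb", {"parent"=>"tex"}]
split_lexer_spec('erb?debug')      # => ["erb", {"debug"=>true}]
split_lexer_spec('ruby')           # => ["ruby", {}]
```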

.guess(info = {}) ⇒ Object

Guess which lexer to use based on a hash of info.

Parameters:

  • info (Hash) (defaults to: {})

    a customizable set of options

Options Hash (info):

  • :mimetype (Object)

    A mimetype to guess by

  • :filename (Object)

    A filename to guess by

  • :source (Object)

    The source itself, which, if guessing by mimetype or filename fails, will be searched for shebangs, <!DOCTYPE …> tags, and other hints.

Raises:

  • (Guesser::Ambiguous)

    if more than one lexer matches the given info

See Also:



# File 'lib/rouge/lexer.rb', line 153

def guess(info={})
  lexers = guesses(info)

  return Lexers::PlainText if lexers.empty?
  return lexers[0] if lexers.size == 1

  raise Guesser::Ambiguous.new(lexers)
end
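
The three outcomes of `guess` (fallback, single match, ambiguity error) can be sketched in isolation. `AmbiguousGuess` and `resolve_guess` are hypothetical names standing in for `Guesser::Ambiguous` and the method body above; candidates are plain symbols here rather than lexer classes.

```ruby
# Standalone sketch; AmbiguousGuess and resolve_guess are hypothetical.
class AmbiguousGuess < StandardError
  attr_reader :alternatives

  def initialize(alternatives)
    @alternatives = alternatives
    super("Ambiguous guess: #{alternatives.join(', ')}")
  end
end

def resolve_guess(candidates, fallback)
  return fallback if candidates.empty?
  return candidates[0] if candidates.size == 1

  raise AmbiguousGuess.new(candidates)
end

resolve_guess([], :plaintext)      # => :plaintext
resolve_guess([:ruby], :plaintext) # => :ruby

begin
  resolve_guess([:c, :cpp], :plaintext)
rescue AmbiguousGuess => e
  e.alternatives                   # => [:c, :cpp]
end
```

Callers that prefer to pick a candidate themselves rather than rescue an error can use `guesses`, which returns the candidate list directly.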

.guess_by_filename(fname) ⇒ Object



# File 'lib/rouge/lexer.rb', line 166

def guess_by_filename(fname)
  guess :filename => fname
end

.guess_by_mimetype(mt) ⇒ Object



# File 'lib/rouge/lexer.rb', line 162

def guess_by_mimetype(mt)
  guess :mimetype => mt
end

.guess_by_source(source) ⇒ Object



# File 'lib/rouge/lexer.rb', line 170

def guess_by_source(source)
  guess :source => source
end

.guesses(info = {}) ⇒ Object

Guess which lexer to use based on a hash of info.

This accepts the same arguments as Lexer.guess, but will never raise an error. It returns a (possibly empty) list of potential lexers to use.



# File 'lib/rouge/lexer.rb', line 125

def guesses(info={})
  mimetype, filename, source = info.values_at(:mimetype, :filename, :source)
  custom_globs = info[:custom_globs]

  guessers = (info[:guessers] || []).dup

  guessers << Guessers::Mimetype.new(mimetype) if mimetype
  guessers << Guessers::GlobMapping.by_pairs(custom_globs, filename) if custom_globs && filename
  guessers << Guessers::Filename.new(filename) if filename
  guessers << Guessers::Modeline.new(source) if source
  guessers << Guessers::Source.new(source) if source

  Guesser.guess(guessers, Lexer.all)
end

.lex(stream, opts = {}, &b) ⇒ Object

Lexes `stream` with the given options. The lex is delegated to a new instance.

See Also:



# File 'lib/rouge/lexer.rb', line 21

def lex(stream, opts={}, &b)
  new(opts).lex(stream, &b)
end

.mimetypes(*mts) ⇒ Object

Specify a list of mimetypes associated with this lexer.

Examples:

class Html < Lexer
  mimetypes 'text/html', 'application/xhtml+xml'
end


# File 'lib/rouge/lexer.rb', line 241

def mimetypes(*mts)
  (@mimetypes ||= []).concat(mts)
end

.option(name, desc) ⇒ Object



# File 'lib/rouge/lexer.rb', line 96

def option(name, desc)
  option_docs[name.to_s] = desc
end

.option_docs ⇒ Object



# File 'lib/rouge/lexer.rb', line 92

def option_docs
  @option_docs ||= InheritableHash.new(superclass.option_docs)
end

.tag(t = nil) ⇒ Object

Used to specify or get the canonical name of this lexer class.

Examples:

class MyLexer < Lexer
  tag 'foo'
end

MyLexer.tag # => 'foo'

Lexer.find('foo') # => MyLexer


# File 'lib/rouge/lexer.rb', line 203

def tag(t=nil)
  return @tag if t.nil?

  @tag = t.to_s
  Lexer.register(@tag, self)
end

.title(t = nil) ⇒ Object

Specify or get this lexer’s title. Meant to be human-readable.



# File 'lib/rouge/lexer.rb', line 76

def title(t=nil)
  if t.nil?
    t = tag.capitalize
  end
  @title ||= t
end

Instance Method Details

#as_bool(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 279

def as_bool(val)
  case val
  when nil, false, 0, '0', 'off'
    false
  when Array
    val.empty? ? true : as_bool(val.last)
  else
    true
  end
end
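
The coercion rules are easiest to see with a few inputs. The sketch below copies the `case` from the method above into a hypothetical top-level helper, `coerce_bool`, so it runs without Rouge.

```ruby
# Standalone sketch; coerce_bool is a hypothetical copy of the logic above.
def coerce_bool(val)
  case val
  when nil, false, 0, '0', 'off'
    false
  when Array
    val.empty? ? true : coerce_bool(val.last)
  else
    true
  end
end

coerce_bool('off')        # => false
coerce_bool('0')          # => false
coerce_bool('on')         # => true
coerce_bool('false')      # => true, the string 'false' is not in the falsy list
coerce_bool([])           # => true, a key given with no value counts as set
coerce_bool(['1', 'off']) # => false, the last value wins
```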

#as_lexer(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 307

def as_lexer(val)
  return as_lexer(val.last) if val.is_a?(Array)
  return val.new(@options) if val.is_a?(Class) && val < Lexer

  case val
  when Lexer
    val
  when String
    lexer_class = Lexer.find(val)
    lexer_class && lexer_class.new(@options)
  end
end

#as_list(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 296

def as_list(val)
  case val
  when Array
    val.flat_map { |v| as_list(v) }
  when String
    val.split(',')
  else
    []
  end
end
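
A short sketch of how nested arrays and comma-separated strings flatten into one list. `coerce_list` is a hypothetical top-level copy of the logic above, written so it runs without Rouge.

```ruby
# Standalone sketch; coerce_list is a hypothetical copy of the logic above.
def coerce_list(val)
  case val
  when Array
    val.flat_map { |v| coerce_list(v) }
  when String
    val.split(',')
  else
    []
  end
end

coerce_list('a,b')                    # => ["a", "b"]
coerce_list(['a,b', ['c', 'd,e'], 3]) # => ["a", "b", "c", "d", "e"]
coerce_list(nil)                      # => []
```

Values that are neither strings nor arrays contribute nothing, so the `3` in the second call is dropped.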

#as_string(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 290

def as_string(val)
  return as_string(val.last) if val.is_a?(Array)

  val ? val.to_s : nil
end

#as_token(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 320

def as_token(val)
  return as_token(val.last) if val.is_a?(Array)
  case val
  when Token
    val
  else
    Token[val]
  end
end

#bool_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 330

def bool_option(name, &default)
  if @options.key?(name.to_s)
    as_bool(@options[name.to_s])
  else
    default ? default.call : false
  end
end

#hash_option(name, defaults, &val_cast) ⇒ Object



# File 'lib/rouge/lexer.rb', line 354

def hash_option(name, defaults, &val_cast)
  name = name.to_s
  out = defaults.dup

  base = @options.delete(name.to_s)
  base = {} unless base.is_a?(Hash)
  base.each { |k, v| out[k.to_s] = val_cast ? val_cast.call(v) : v }

  @options.keys.each do |key|
    next unless key =~ /(\w+)\[(\w+)\]/ and $1 == name
    value = @options.delete(key)

    out[$2] = val_cast ? val_cast.call(value) : value
  end

  out
end
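
The method merges three sources: the defaults, a hash stored under the option name, and flat keys of the form `name[key]`. The sketch below reproduces that merge on a plain hash; `extract_hash_option` is a hypothetical helper and omits the optional value cast.

```ruby
# Standalone sketch; extract_hash_option is hypothetical and skips val_cast.
def extract_hash_option(options, name, defaults)
  name = name.to_s
  out = defaults.dup

  # A hash stored directly under the option name.
  base = options.delete(name)
  base = {} unless base.is_a?(Hash)
  base.each { |k, v| out[k.to_s] = v }

  # Flat keys such as 'vars[b]' that belong to this option.
  options.keys.each do |key|
    m = /(\w+)\[(\w+)\]/.match(key)
    next unless m && m[1] == name

    out[m[2]] = options.delete(key)
  end

  out
end

opts = { 'vars' => { 'a' => '1' }, 'vars[b]' => '2', 'other[c]' => '3' }
extract_hash_option(opts, :vars, { 'z' => '0' })
# => {"z"=>"0", "a"=>"1", "b"=>"2"}
opts
# => {"other[c]"=>"3"}
```

As in the original, matching keys are deleted from the options hash, so a second call with the same name returns only the defaults.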

#lex(string, opts = {}, &b) ⇒ Object

Given a string, yield [token, chunk] pairs. If no block is given, an enumerator is returned.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :continue (Object)

    Continue the lex from the previous state (i.e. don’t call #reset!)



# File 'lib/rouge/lexer.rb', line 384

def lex(string, opts={}, &b)
  return enum_for(:lex, string, opts) unless block_given?

  Lexer.assert_utf8!(string)

  reset! unless opts[:continue]

  # consolidate consecutive tokens of the same type
  last_token = nil
  last_val = nil
  stream_tokens(string) do |tok, val|
    next if val.empty?

    if tok == last_token
      last_val << val
      next
    end

    b.call(last_token, last_val) if last_token
    last_token = tok
    last_val = val
  end

  b.call(last_token, last_val) if last_token
end
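
The consolidation step can be sketched on its own: adjacent pairs with the same token are merged and empty chunks are skipped. `TokenMerge` is a hypothetical module, and tokens are plain symbols here rather than Rouge token objects.

```ruby
# Standalone sketch; TokenMerge is hypothetical and uses symbols as tokens.
module TokenMerge
  def self.consolidate(pairs)
    return enum_for(:consolidate, pairs) unless block_given?

    last_token = nil
    last_val = nil
    pairs.each do |tok, val|
      next if val.empty?

      if tok == last_token
        last_val << val
        next
      end

      yield last_token, last_val if last_token
      last_token = tok
      last_val = val.dup # copy so the caller's strings are not mutated
    end

    yield last_token, last_val if last_token
  end
end

raw = [[:name, 'fo'], [:name, 'o'], [:text, ''], [:punct, '('], [:punct, ')']]
TokenMerge.consolidate(raw).to_a
# => [[:name, "foo"], [:punct, "()"]]
```

The sketch copies each chunk before appending to it, which the method above does not do.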

#lexer_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 342

def lexer_option(name, &default)
  as_lexer(@options.delete(name.to_s, &default))
end

#list_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 346

def list_option(name, &default)
  as_list(@options.delete(name.to_s, &default))
end

#reset! ⇒ Object

This method is abstract.

Resets the lexer's state. #lex calls this before lexing unless the `:continue` option is set. The default implementation is a no-op.



# File 'lib/rouge/lexer.rb', line 376

def reset!
end

#stream_tokens(stream, &b) ⇒ Object

This method is abstract.

Yield `[token, chunk]` pairs, given a prepared input stream. This must be implemented.

Parameters:

  • stream (StringScanner)

    the stream



# File 'lib/rouge/lexer.rb', line 422

def stream_tokens(stream, &b)
  raise 'abstract'
end
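
A minimal sketch of what an implementation might look like, using `StringScanner` from the standard library. `WordLexerSketch` is hypothetical and does not inherit from `Rouge::Lexer`; it emits symbols instead of Rouge tokens and builds its own scanner from a string.

```ruby
require 'strscan'

# Standalone sketch; WordLexerSketch is hypothetical and uses symbols as tokens.
class WordLexerSketch
  def stream_tokens(string)
    return enum_for(:stream_tokens, string) unless block_given?

    scanner = StringScanner.new(string)
    until scanner.eos?
      if (chunk = scanner.scan(/\s+/))
        yield :whitespace, chunk
      elsif (chunk = scanner.scan(/\d+/))
        yield :number, chunk
      else
        # Anything that is neither whitespace nor a digit.
        yield :text, scanner.scan(/[^\s\d]+/)
      end
    end
  end
end

WordLexerSketch.new.stream_tokens('abc 42').to_a
# => [[:text, "abc"], [:whitespace, " "], [:number, "42"]]
```

Each branch consumes at least one character, so the loop always reaches the end of the input.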

#string_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 338

def string_option(name, &default)
  as_string(@options.delete(name.to_s, &default))
end

#tag ⇒ Object

Delegated to the class-level .tag.



# File 'lib/rouge/lexer.rb', line 411

def tag
  self.class.tag
end

#token_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 350

def token_option(name, &default)
  as_token(@options.delete(name.to_s, &default))
end