Class: Rouge::Lexer Abstract

Inherits:
Object
Includes:
Token::Tokens
Defined in:
lib/rouge/lexer.rb

Overview

This class is abstract.

A lexer transforms text into a stream of [token, chunk] pairs.

Constant Summary

Constants included from Token::Tokens

Token::Tokens::Num, Token::Tokens::Str

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Token::Tokens

token

Constructor Details

#initialize(opts = {}) ⇒ Lexer

Create a new lexer with the given options. Individual lexers may specify extra options. The only current globally accepted option is :debug.

Options Hash (opts):

  • :debug (Object)

    Prints debug information to stdout. The particular info depends on the lexer in question. In regex lexers, this will log the state stack at the beginning of each step, along with each regex tried and each stream consumed. The option only takes effect once debugging has also been turned on globally with Lexer.enable_debug!. Try it, it's pretty useful.


# File 'lib/rouge/lexer.rb', line 309

def initialize(opts={})
  @options = {}
  opts.each { |k, v| @options[k.to_s] = v }

  @debug = Lexer.debug_enabled? && bool_option('debug')
end
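
The constructor above does two things worth seeing in isolation: it stores every option under a string key, and it only honors :debug when debugging has been switched on at the class level. The following is a minimal standalone sketch of that behavior, not Rouge's own code. The class name OptionBagSketch and the debug? reader are hypothetical, and the truthiness check is a simplification of bool_option.

```ruby
# Hypothetical stand-in for a lexer class; it does not load or use Rouge.
class OptionBagSketch
  class << self
    # Mirrors the class-level switch shown in enable_debug! / debug_enabled?.
    def enable_debug!
      @debug_enabled = true
    end

    def debug_enabled?
      (defined? @debug_enabled) ? true : false
    end
  end

  attr_reader :options

  def initialize(opts = {})
    @options = {}
    # Symbol and string keys both end up as strings.
    opts.each { |k, v| @options[k.to_s] = v }
    # Simplification: plain truthiness instead of the as_bool coercion.
    @debug = self.class.debug_enabled? && !!@options['debug']
  end

  def debug?
    @debug
  end
end

quiet = OptionBagSketch.new(debug: true, 'parent' => 'tex')
puts quiet.options.keys.inspect  # all keys are strings
puts quiet.debug?                # false: the global switch is still off

OptionBagSketch.enable_debug!
loud = OptionBagSketch.new(debug: true)
puts loud.debug?                 # true: both switches are on
```

Note that the flag is computed once, at construction time, so an instance created before the global switch is turned on stays quiet.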

Instance Attribute Details

#options ⇒ Object (readonly)

Returns the options hash this lexer was created with. Keys are stored as strings.


# File 'lib/rouge/lexer.rb', line 299

def options
  @options
end

Class Method Details

.aliases(*args) ⇒ Object

Used to specify alternate names this lexer class may be found by.

Examples:

class Erb < Lexer
  tag 'erb'
  aliases 'eruby', 'rhtml'
end

Lexer.find('eruby') # => Erb

# File 'lib/rouge/lexer.rb', line 247

def aliases(*args)
  args.map!(&:to_s)
  args.each { |arg| Lexer.register(arg, self) }
  (@aliases ||= []).concat(args)
end

.all ⇒ Object


# File 'lib/rouge/lexer.rb', line 128

def all
  @all ||= registry.values.uniq
end

.continue_lex(*a, &b) ⇒ Object

In case #continue_lex is called statically, we simply begin a new lex from the beginning, since there is no state.

# File 'lib/rouge/lexer.rb', line 30

def continue_lex(*a, &b)
  lex(*a, &b)
end

.debug_enabled? ⇒ Boolean


# File 'lib/rouge/lexer.rb', line 202

def debug_enabled?
  (defined? @debug_enabled) ? true : false
end

.demo(arg = :absent) ⇒ Object

Specify or get a small demo string for this lexer. When called without an argument, the demo is read from demo_file.


# File 'lib/rouge/lexer.rb', line 121

def demo(arg=:absent)
  return @demo = arg unless arg == :absent

  @demo = File.read(demo_file, mode: 'rt:bom|utf-8')
end

.demo_file(arg = :absent) ⇒ Object

Specify or get the path name containing a small demo for this lexer (can be overridden by demo).


# File 'lib/rouge/lexer.rb', line 114

def demo_file(arg=:absent)
  return @demo_file = Pathname.new(arg) unless arg == :absent

  @demo_file = Pathname.new(File.join(__dir__, 'demos', tag))
end

.desc(arg = :absent) ⇒ Object

Specify or get this lexer's description.


# File 'lib/rouge/lexer.rb', line 96

def desc(arg=:absent)
  if arg == :absent
    @desc
  else
    @desc = arg
  end
end

.detect?(text) ⇒ Boolean

This method is abstract.

Return true if there is an in-text indication (such as a shebang or DOCTYPE declaration) that this lexer should be used.


# File 'lib/rouge/lexer.rb', line 498

def self.detect?(text)
  false
end

.detectable? ⇒ Boolean

Determine if a lexer has a method named :detect? defined in its singleton class.


# File 'lib/rouge/lexer.rb', line 208

def detectable?
  @detectable ||= methods(false).include?(:detect?)
end

.disable_debug! ⇒ Object


# File 'lib/rouge/lexer.rb', line 198

def disable_debug!
  remove_instance_variable :@debug_enabled if defined? @debug_enabled
end

.enable_debug! ⇒ Object


# File 'lib/rouge/lexer.rb', line 194

def enable_debug!
  @debug_enabled = true
end

.filenames(*fnames) ⇒ Object

Specify a list of filename globs associated with this lexer.

If a filename glob is associated with more than one lexer, this can cause a Guesser::Ambiguous error to be raised in various guessing methods. These errors can be avoided by disambiguation. Filename globs are disambiguated in one of two ways. Either the lexer will define a self.detect? method (intended for use with shebangs and doctypes) or a manual rule will be specified in Guessers::Disambiguation.

Examples:

class Ruby < Lexer
  filenames '*.rb', '*.ruby', 'Gemfile', 'Rakefile'
end

# File 'lib/rouge/lexer.rb', line 266

def filenames(*fnames)
  (@filenames ||= []).concat(fnames)
end

.find(name) ⇒ Class<Rouge::Lexer>?

Given a name as a string, return the matching lexer class, or nil if no lexer is registered under that name.


# File 'lib/rouge/lexer.rb', line 37

def find(name)
  registry[name.to_s]
end

.find_fancy(str, code = nil, additional_options = {}) ⇒ Object

Find a lexer, with fancy shiny features.

  • The string you pass can include CGI-style options

    Lexer.find_fancy('erb?parent=tex')

  • You can pass the special name 'guess' to have the lexer guessed for you, with the code to guess by as the second argument

    Lexer.find_fancy('guess', "#!/bin/bash\necho Hello, world")

This is used in the Redcarpet plugin as well as Rouge's own markdown lexer for highlighting internal code blocks.


# File 'lib/rouge/lexer.rb', line 55

def find_fancy(str, code=nil, additional_options={})

  if str && !str.include?('?') && str != 'guess'
    lexer_class = find(str)
    return lexer_class && lexer_class.new(additional_options)
  end

  name, opts = str ? str.split('?', 2) : [nil, '']

  # parse the options hash from a cgi-style string
  opts = CGI.parse(opts || '').map do |k, vals|
    val = case vals.size
    when 0 then true
    when 1 then vals[0]
    else vals
    end

    [ k.to_s, val ]
  end

  opts = additional_options.merge(Hash[opts])

  lexer_class = case name
  when 'guess', nil
    self.guess(:source => code, :mimetype => opts['mimetype'])
  when String
    self.find(name)
  end

  lexer_class && lexer_class.new(opts)
end
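
To make the CGI-style option handling above concrete, here is a small standalone sketch of how a spec such as 'erb?parent=tex' splits into a name and an options hash. It is an illustration under assumptions, not the library's code: the module FancySpecSketch and the method split_spec are hypothetical, and URI.decode_www_form from the standard library is swapped in for CGI.parse. One known difference is that a bare key with no '=' is not turned into true here.

```ruby
require 'uri'

# Hypothetical helper module; it does not load or use Rouge.
module FancySpecSketch
  module_function

  # Splits "name?key=value&..." into [name, options].
  def split_spec(str)
    name, query = str.split('?', 2)

    # Group repeated keys so they can collapse to a scalar or stay a list.
    grouped = Hash.new { |hash, key| hash[key] = [] }
    URI.decode_www_form(query || '').each { |key, value| grouped[key] << value }

    # A single value is unwrapped; several values are kept as an array,
    # matching the `when 1` / `else` branches of the listing above.
    opts = grouped.map { |key, vals| [key, vals.size == 1 ? vals[0] : vals] }.to_h

    [name, opts]
  end
end

p FancySpecSketch.split_spec('erb?parent=tex')
p FancySpecSketch.split_spec('guess?mimetype=text/html&tag=a&tag=b')
p FancySpecSketch.split_spec('ruby')
```

In the listing above, options parsed from the string are merged over additional_options, so a key given in the spec string wins over the same key passed as a hash.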

.guess(info = {}, &fallback) ⇒ Class<Rouge::Lexer>

Guess which lexer to use based on a hash of info.

Options Hash (info):

  • :mimetype (Object)

    A mimetype to guess by

  • :filename (Object)

    A filename to guess by

  • :source (Object)

    The source itself, which, if guessing by mimetype or filename fails, will be searched for shebangs, <!DOCTYPE ...> tags, and other hints.

# File 'lib/rouge/lexer.rb', line 169

def guess(info={}, &fallback)
  lexers = guesses(info)

  return Lexers::PlainText if lexers.empty?
  return lexers[0] if lexers.size == 1

  if fallback
    fallback.call(lexers)
  else
    raise Guesser::Ambiguous.new(lexers)
  end
end
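
The three outcomes of the selection logic above (no candidate, exactly one, several) can be sketched without Rouge. Everything named here is hypothetical: GuessSketch, pick, the :plain_text default, and the local Ambiguous class stand in for Lexers::PlainText and Guesser::Ambiguous, and symbols stand in for lexer classes.

```ruby
# Hypothetical stand-in for the candidate selection in Lexer.guess.
module GuessSketch
  # Raised when several candidates remain and no fallback block was given.
  class Ambiguous < StandardError
    attr_reader :alternatives

    def initialize(alternatives)
      @alternatives = alternatives
      super("ambiguous guess: #{alternatives.join(', ')}")
    end
  end

  module_function

  def pick(candidates, default: :plain_text)
    return default if candidates.empty?          # nothing matched
    return candidates[0] if candidates.size == 1 # unambiguous
    return yield(candidates) if block_given?     # caller decides
    raise Ambiguous.new(candidates)
  end
end

p GuessSketch.pick([])
p GuessSketch.pick([:ruby])
p GuessSketch.pick([:c, :cpp]) { |alternatives| alternatives.last }

begin
  GuessSketch.pick([:c, :cpp])
rescue GuessSketch::Ambiguous => e
  puts e.message
end
```

Passing a block is the way to avoid the exception when a caller would rather settle ties itself, for example by taking the first alternative.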

.guess_by_filename(fname) ⇒ Object


# File 'lib/rouge/lexer.rb', line 186

def guess_by_filename(fname)
  guess :filename => fname
end

.guess_by_mimetype(mt) ⇒ Object


# File 'lib/rouge/lexer.rb', line 182

def guess_by_mimetype(mt)
  guess :mimetype => mt
end

.guess_by_source(source) ⇒ Object


# File 'lib/rouge/lexer.rb', line 190

def guess_by_source(source)
  guess :source => source
end

.guesses(info = {}) ⇒ Object

Guess which lexer to use based on a hash of info.

This accepts the same arguments as Lexer.guess, but will never raise an error. It returns a (possibly empty) list of potential lexers to use.


# File 'lib/rouge/lexer.rb', line 137

def guesses(info={})
  mimetype, filename, source = info.values_at(:mimetype, :filename, :source)
  custom_globs = info[:custom_globs]

  guessers = (info[:guessers] || []).dup

  guessers << Guessers::Mimetype.new(mimetype) if mimetype
  guessers << Guessers::GlobMapping.by_pairs(custom_globs, filename) if custom_globs && filename
  guessers << Guessers::Filename.new(filename) if filename
  guessers << Guessers::Modeline.new(source) if source
  guessers << Guessers::Source.new(source) if source
  guessers << Guessers::Disambiguation.new(filename, source) if source && filename

  Guesser.guess(guessers, Lexer.all)
end

.lex(stream, opts = {}, &b) ⇒ Object

Lexes stream with the given options. The lex is delegated to a new instance.

# File 'lib/rouge/lexer.rb', line 22

def lex(stream, opts={}, &b)
  new(opts).lex(stream, &b)
end

.mimetypes(*mts) ⇒ Object

Specify a list of mimetypes associated with this lexer.

Examples:

class Html < Lexer
  mimetypes 'text/html', 'application/xhtml+xml'
end

# File 'lib/rouge/lexer.rb', line 276

def mimetypes(*mts)
  (@mimetypes ||= []).concat(mts)
end

.option(name, desc) ⇒ Object


# File 'lib/rouge/lexer.rb', line 108

def option(name, desc)
  option_docs[name.to_s] = desc
end

.option_docs ⇒ Object


# File 'lib/rouge/lexer.rb', line 104

def option_docs
  @option_docs ||= InheritableHash.new(superclass.option_docs)
end

.tag(t = nil) ⇒ Object

Used to specify or get the canonical name of this lexer class.

Examples:

class MyLexer < Lexer
  tag 'foo'
end

MyLexer.tag # => 'foo'

Lexer.find('foo') # => MyLexer

# File 'lib/rouge/lexer.rb', line 231

def tag(t=nil)
  return @tag if t.nil?

  @tag = t.to_s
  Lexer.register(@tag, self)
end

.title(t = nil) ⇒ Object

Specify or get this lexer's title. Meant to be human-readable.


# File 'lib/rouge/lexer.rb', line 88

def title(t=nil)
  if t.nil?
    t = tag.capitalize
  end
  @title ||= t
end

Instance Method Details

#as_bool(val) ⇒ Object


# File 'lib/rouge/lexer.rb', line 316

def as_bool(val)
  case val
  when nil, false, 0, '0', 'off'
    false
  when Array
    val.empty? ? true : as_bool(val.last)
  else
    true
  end
end
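
The coercion rules in the listing above have a few consequences that are easy to miss, for instance that the string 'false' is treated as true. The sketch below copies the same case expression into a hypothetical module, BoolCoercionSketch, so it can be run without Rouge loaded.

```ruby
# Standalone copy of the coercion rules, wrapped in a hypothetical module.
module BoolCoercionSketch
  module_function

  def as_bool(val)
    case val
    when nil, false, 0, '0', 'off'
      false
    when Array
      # An empty array counts as "flag present"; otherwise the last value wins.
      val.empty? ? true : as_bool(val.last)
    else
      true
    end
  end
end

samples = [nil, 0, '0', 'off', 'false', 'no', '1', [], ['1', 'off']]
samples.each do |value|
  puts "#{value.inspect} -> #{BoolCoercionSketch.as_bool(value)}"
end
```

Only nil, false, 0, '0' and 'off' switch an option off. Any other value, including 'false' and 'no', switches it on.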

#as_lexer(val) ⇒ Object


# File 'lib/rouge/lexer.rb', line 344

def as_lexer(val)
  return as_lexer(val.last) if val.is_a?(Array)
  return val.new(@options) if val.is_a?(Class) && val < Lexer

  case val
  when Lexer
    val
  when String
    lexer_class = Lexer.find(val)
    lexer_class && lexer_class.new(@options)
  end
end

#as_list(val) ⇒ Object


# File 'lib/rouge/lexer.rb', line 333

def as_list(val)
  case val
  when Array
    val.flat_map { |v| as_list(v) }
  when String
    val.split(',')
  else
    []
  end
end
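
As a quick illustration of the list coercion above, the following sketch copies the same logic into a hypothetical module, ListCoercionSketch, and runs it on a few inputs. It assumes nothing beyond what the listing shows.

```ruby
# Standalone copy of the list coercion, wrapped in a hypothetical module.
module ListCoercionSketch
  module_function

  def as_list(val)
    case val
    when Array
      # Nested arrays and comma-separated strings are flattened together.
      val.flat_map { |v| as_list(v) }
    when String
      val.split(',')
    else
      []
    end
  end
end

p ListCoercionSketch.as_list('a,b')
p ListCoercionSketch.as_list(['a,b', ['c']])
p ListCoercionSketch.as_list('a, b')  # whitespace after the comma is kept
p ListCoercionSketch.as_list(:symbol) # anything else becomes an empty list
```

Because the split is on a bare comma, callers that accept user input may want to strip each element themselves.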

#as_string(val) ⇒ Object


# File 'lib/rouge/lexer.rb', line 327

def as_string(val)
  return as_string(val.last) if val.is_a?(Array)

  val ? val.to_s : nil
end

#as_token(val) ⇒ Object


# File 'lib/rouge/lexer.rb', line 357

def as_token(val)
  return as_token(val.last) if val.is_a?(Array)
  case val
  when Token
    val
  else
    Token[val]
  end
end

#bool_option(name, &default) ⇒ Object


# File 'lib/rouge/lexer.rb', line 367

def bool_option(name, &default)
  name_str = name.to_s

  if @options.key?(name_str)
    as_bool(@options[name_str])
  else
    default ? default.call : false
  end
end

#continue_lex(string, &b) ⇒ Object

Continue the lex from the current state without resetting.


# File 'lib/rouge/lexer.rb', line 452

def continue_lex(string, &b)
  return enum_for(:continue_lex, string, &b) unless block_given?

  # consolidate consecutive tokens of the same type
  last_token = nil
  last_val = nil
  stream_tokens(string) do |tok, val|
    next if val.empty?

    if tok == last_token
      last_val << val
      next
    end

    b.call(last_token, last_val) if last_token
    last_token = tok
    last_val = val
  end

  b.call(last_token, last_val) if last_token
end
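
The consolidation step in the listing above, which drops empty chunks and merges consecutive chunks that carry the same token, can be tried on plain data. This is a sketch under assumptions: the module TokenMergeSketch and the method consolidate are hypothetical, symbols stand in for Rouge tokens, and the values are duplicated before merging so the input is left untouched (the listing appends in place).

```ruby
# Hypothetical helper that mirrors the merging loop; it does not use Rouge.
module TokenMergeSketch
  module_function

  def consolidate(pairs)
    return enum_for(:consolidate, pairs) unless block_given?

    last_token = nil
    last_val = nil
    pairs.each do |tok, val|
      next if val.empty? # empty chunks are never emitted

      if tok == last_token
        last_val << val  # same token as before: extend the pending chunk
        next
      end

      yield last_token, last_val if last_token
      last_token = tok
      last_val = val.dup # copy, so the caller's strings are not mutated
    end

    yield last_token, last_val if last_token
  end
end

raw = [[:Name, 'foo'], [:Name, 'bar'], [:Text, ''], [:Text, ' '], [:Name, 'baz']]
merged = TokenMergeSketch.consolidate(raw).to_a
p merged
```

A chunk is only emitted once a different token arrives or the input ends, which is why the final yield after the loop is needed.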

#hash_option(name, defaults, &val_cast) ⇒ Object


# File 'lib/rouge/lexer.rb', line 393

def hash_option(name, defaults, &val_cast)
  name = name.to_s
  out = defaults.dup

  base = @options.delete(name.to_s)
  base = {} unless base.is_a?(Hash)
  base.each { |k, v| out[k.to_s] = val_cast ? val_cast.call(v) : v }

  @options.keys.each do |key|
    next unless key =~ /(\w+)\[(\w+)\]/ and $1 == name
    value = @options.delete(key)

    out[$2] = val_cast ? val_cast.call(value) : value
  end

  out
end
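
The listing above collects a hash-valued option from two places: a real Hash stored under the option name, and flat keys written as name[key]. The sketch below reproduces that on a plain hash. The module HashOptionSketch is hypothetical, and the options hash is passed in explicitly here, whereas the listing reads and mutates @options.

```ruby
# Hypothetical standalone version; `options` plays the role of @options.
module HashOptionSketch
  module_function

  def hash_option(options, name, defaults, &val_cast)
    name = name.to_s
    out = defaults.dup

    # 1. A Hash stored directly under the option name.
    base = options.delete(name)
    base = {} unless base.is_a?(Hash)
    base.each { |k, v| out[k.to_s] = val_cast ? val_cast.call(v) : v }

    # 2. Flat "name[key]" entries; matching ones are consumed.
    options.keys.each do |key|
      match = /(\w+)\[(\w+)\]/.match(key)
      next unless match && match[1] == name

      value = options.delete(key)
      out[match[2]] = val_cast ? val_cast.call(value) : value
    end

    out
  end
end

options = {
  'vars[indent]' => '2',
  'vars'         => { width: 80 },
  'other[x]'     => '1',
  'debug'        => '1',
}
vars = HashOptionSketch.hash_option(options, :vars, 'indent' => '4')
p vars    # defaults overridden by the flat key, plus the nested hash entry
p options # only the unrelated keys remain
```

Both sources are removed from the options hash as they are read, so asking for the same option a second time would return only the defaults.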

#lex(string, opts = nil, &b) ⇒ Object

Note:

The use of :continue => true has been deprecated. A warning is issued if run with $VERBOSE set to true.

Note:

The use of arbitrary opts has never been supported, but we previously ignored them with no error. We now warn unconditionally.

Given a string, yield [token, chunk] pairs. If no block is given, an enumerator is returned.

Options Hash (opts):

  • :continue (Object)

    Continue the lex from the previous state (i.e. don't call #reset!)


# File 'lib/rouge/lexer.rb', line 429

def lex(string, opts=nil, &b)
  if opts
    if (opts.keys - [:continue]).size > 0
      # improper use of options hash
      warn('Improper use of Lexer#lex - this method does not receive options.' +
           ' This will become an error in a future version.')
    end

    if opts[:continue]
      warn '`lex :continue => true` is deprecated, please use #continue_lex instead'
      return continue_lex(string, &b)
    end
  end

  return enum_for(:lex, string) unless block_given?

  Lexer.assert_utf8!(string)
  reset!

  continue_lex(string, &b)
end

#lexer_option(name, &default) ⇒ Object


# File 'lib/rouge/lexer.rb', line 381

def lexer_option(name, &default)
  as_lexer(@options.delete(name.to_s, &default))
end

#list_option(name, &default) ⇒ Object


# File 'lib/rouge/lexer.rb', line 385

def list_option(name, &default)
  as_list(@options.delete(name.to_s, &default))
end

#reset! ⇒ Object

This method is abstract.

Called by #lex before each fresh lex to clear any state left over from a previous run. The default implementation is a no-op.


# File 'lib/rouge/lexer.rb', line 415

def reset!
end

#stream_tokens(stream, &b) ⇒ Object

This method is abstract.

Yield [token, chunk] pairs, given a prepared input stream. This must be implemented.


# File 'lib/rouge/lexer.rb', line 486

def stream_tokens(stream, &b)
  raise 'abstract'
end
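
To show what implementing this abstract method involves, here is a minimal sketch that does not load Rouge. AbstractLexerSketch is a hypothetical, cut-down stand-in for the base class (it resets, then streams, without the consolidation step), WordLexerSketch is a hypothetical subclass, and symbols stand in for real tokens.

```ruby
# Hypothetical, simplified base class: reset, then hand off to stream_tokens.
class AbstractLexerSketch
  def lex(string, &b)
    return enum_for(:lex, string) unless block_given?

    reset!
    stream_tokens(string, &b)
  end

  # No-op by default; stateful subclasses would clear their state here.
  def reset!
  end

  def stream_tokens(stream, &b)
    raise 'abstract'
  end
end

# Hypothetical subclass: whitespace runs are :Text, everything else is :Name.
class WordLexerSketch < AbstractLexerSketch
  def stream_tokens(stream)
    stream.scan(/\s+|\S+/) do |chunk|
      token = chunk.strip.empty? ? :Text : :Name
      yield token, chunk
    end
  end
end

pairs = WordLexerSketch.new.lex('puts  hello').to_a
p pairs
```

The only contract the subclass has to meet is yielding a token and a chunk for every piece of the input, in order, so that the chunks concatenate back to the original string.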

#string_option(name, &default) ⇒ Object


# File 'lib/rouge/lexer.rb', line 377

def string_option(name, &default)
  as_string(@options.delete(name.to_s, &default))
end

#tag ⇒ Object

Delegates to the class-level .tag.


# File 'lib/rouge/lexer.rb', line 475

def tag
  self.class.tag
end

#token_option(name, &default) ⇒ Object


# File 'lib/rouge/lexer.rb', line 389

def token_option(name, &default)
  as_token(@options.delete(name.to_s, &default))
end