Class: Rouge::Lexer Abstract

Inherits:
Object
Includes:
Token::Tokens
Defined in:
lib/rouge/lexer.rb

Overview

This class is abstract.

A lexer transforms text into a stream of `[token, chunk]` pairs.

Direct Known Subclasses

Rouge::Lexers::PlainText, RegexLexer

Defined Under Namespace

Classes: AmbiguousGuess

Constant Summary

Constants included from Token::Tokens

Token::Tokens::Num, Token::Tokens::Str

Class Method Summary

Instance Method Summary

Methods included from Token::Tokens

token

Constructor Details

#initialize(opts = {}) ⇒ Lexer

Create a new lexer with the given options. Individual lexers may specify extra options. The only current globally accepted option is `:debug`.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :debug (Object)

    Prints debug information to stdout. The particular info depends on the lexer in question. In regex lexers, this will log the state stack at the beginning of each step, along with each regex tried and each stream consumed. Try it, it's pretty useful.


# File 'lib/rouge/lexer.rb', line 329

def initialize(opts={})
  options(opts)

  @debug = option(:debug)
end

Class Method Details

.aliases(*args) ⇒ Object

Used to specify alternate names this lexer class may be found by.

Examples:

class Erb < Lexer
  tag 'erb'
  aliases 'eruby', 'rhtml'
end

Lexer.find('eruby') # => Erb

# File 'lib/rouge/lexer.rb', line 277

def aliases(*args)
  args.map!(&:to_s)
  args.each { |arg| Lexer.register(arg, self) }
  (@aliases ||= []).concat(args)
end
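
The registration flow behind `tag`, `aliases`, `find`, and `all` can be sketched without Rouge itself. The `MiniLexer` and `MiniErb` classes and the `@registry` hash below are hypothetical stand-ins, not the library's internals; only the method bodies follow the listings on this page.

```ruby
# Hypothetical stand-in for the registry behind tag, aliases, find and all.
class MiniLexer
  @registry = {}

  class << self
    # Always read the registry stored on the base class, even from subclasses.
    def registry
      MiniLexer.instance_variable_get(:@registry)
    end

    def register(name, lexer)
      registry[name.to_s] = lexer
    end

    def tag(t = nil)
      return @tag if t.nil?

      @tag = t.to_s
      MiniLexer.register(@tag, self)
    end

    def aliases(*args)
      args.map!(&:to_s)
      args.each { |arg| MiniLexer.register(arg, self) }
      (@aliases ||= []).concat(args)
    end

    def find(name)
      registry[name.to_s]
    end

    # Several names can map to one class, hence the uniq.
    def all
      registry.values.uniq
    end
  end
end

class MiniErb < MiniLexer
  tag 'erb'
  aliases 'eruby', 'rhtml'
end

MiniLexer.find('eruby') # => MiniErb
MiniLexer.all           # => [MiniErb]
```

Because every name is stored as a string key pointing at the same class, looking up the tag or any alias returns the identical class object.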

.all ⇒ Object

Returns a list of all lexers.

Returns:

  • a list of all lexers.


# File 'lib/rouge/lexer.rb', line 101

def all
  registry.values.uniq
end

.analyze_text(text) ⇒ Object

This method is abstract.

Return a number between 0 and 1 indicating the likelihood that the text given should be lexed with this lexer. The default implementation returns 0. Values under 0.5 will only be used to disambiguate filename or mimetype matches.


# File 'lib/rouge/lexer.rb', line 432

def self.analyze_text(text)
  0
end
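
As an illustration of what an override might look like, here is a minimal sketch. The `ShellLike` class and its heuristics are hypothetical and chosen only to show scores on either side of the 0.5 threshold described above.

```ruby
# Hypothetical lexer-like class; only the scoring hook is sketched here.
class ShellLike
  # Returns a confidence between 0 and 1 for the given source text.
  def self.analyze_text(text)
    # A shebang is strong evidence, so report full confidence.
    return 1 if text.start_with?('#!/bin/bash', '#!/bin/sh')
    # A bare echo is weak evidence: below 0.5, so per the description above
    # it would only help break a tie between filename or mimetype matches.
    return 0.3 if text =~ /^\s*echo\s/

    0
  end
end

ShellLike.analyze_text("#!/bin/bash\necho hi") # => 1
ShellLike.analyze_text("puts 1")               # => 0
```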

.assert_utf8!(str) ⇒ Object

Raises:

  • (EncodingError)

# File 'lib/rouge/lexer.rb', line 304

def assert_utf8!(str)
  return if %w(US-ASCII UTF-8 ASCII-8BIT).include? str.encoding.name
  raise EncodingError.new(
    "Bad encoding: #{str.encoding.names.join(',')}. " +
    "Please convert your string to UTF-8."
  )
end
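
Callers holding text in another encoding can transcode it before lexing. The sketch below uses only core `String` methods; the `to_lexable` helper and the `ACCEPTED` constant are hypothetical names, and the accepted list is copied from the check above.

```ruby
# Encodings the check above lets through.
ACCEPTED = %w(US-ASCII UTF-8 ASCII-8BIT).freeze

# Hypothetical helper: returns a string that would pass the encoding check.
def to_lexable(str)
  return str if ACCEPTED.include?(str.encoding.name)

  # Transcode the bytes; this raises if the source contains invalid data.
  str.encode('UTF-8')
end

latin1 = "caf\xE9".dup.force_encoding('ISO-8859-1')
utf8 = to_lexable(latin1)
utf8.encoding.name # => "UTF-8"
```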

.default_options(o = {}) ⇒ Object


# File 'lib/rouge/lexer.rb', line 23

def default_options(o={})
  @default_options ||= {}
  @default_options.merge!(o)
  @default_options
end

.demo(arg = :absent) ⇒ Object

Specify or get a small demo string for this lexer.


# File 'lib/rouge/lexer.rb', line 94

def demo(arg=:absent)
  return @demo = arg unless arg == :absent

  @demo = File.read(demo_file, encoding: 'utf-8')
end

.demo_file(arg = :absent) ⇒ Object

Specify or get the path of the file containing a small demo for this lexer (can be overridden by demo).


# File 'lib/rouge/lexer.rb', line 87

def demo_file(arg=:absent)
  return @demo_file = Pathname.new(arg) unless arg == :absent

  @demo_file = Pathname.new(__FILE__).dirname.join('demos', tag)
end

.desc(arg = :absent) ⇒ Object

Specify or get this lexer's description.


# File 'lib/rouge/lexer.rb', line 77

def desc(arg=:absent)
  if arg == :absent
    @desc
  else
    @desc = arg
  end
end

.filenames(*fnames) ⇒ Object

Specify a list of filename globs associated with this lexer.

Examples:

class Ruby < Lexer
  filenames '*.rb', '*.ruby', 'Gemfile', 'Rakefile'
end

# File 'lib/rouge/lexer.rb', line 289

def filenames(*fnames)
  (@filenames ||= []).concat(fnames)
end

.find(name) ⇒ Object

Given a string, return the correct lexer class.


# File 'lib/rouge/lexer.rb', line 30

def find(name)
  registry[name.to_s]
end

.find_fancy(str, code = nil) ⇒ Object

Find a lexer, with fancy shiny features.

  • The string you pass can include CGI-style options

    Lexer.find_fancy('erb?parent=tex')
    
  • You can pass the special name 'guess' so we guess for you, and you can pass a second argument of the code to guess by

    Lexer.find_fancy('guess', "#!/bin/bash\necho Hello, world")
    

This is used in the Redcarpet plugin as well as Rouge's own markdown lexer for highlighting internal code blocks.


# File 'lib/rouge/lexer.rb', line 48

def find_fancy(str, code=nil)
  name, opts = str ? str.split('?', 2) : [nil, '']

  # parse the options hash from a cgi-style string
  opts = CGI.parse(opts || '').map do |k, vals|
    [ k.to_sym, vals.empty? ? true : vals[0] ]
  end

  opts = Hash[opts]

  lexer_class = case name
  when 'guess', nil
    self.guess(:source => code, :mimetype => opts[:mimetype])
  when String
    self.find(name)
  end

  lexer_class && lexer_class.new(opts)
end
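
To make the option-string handling concrete, the sketch below splits a name from its options in the same shape as the listing above. The `split_fancy` helper is hypothetical, and it deliberately uses plain `String#split` as a simplified stand-in for `CGI.parse`, so it performs no percent-decoding and keeps only one value per key.

```ruby
# Hypothetical helper mirroring how find_fancy splits its argument.
# Simplification: String#split replaces CGI.parse, so nothing is decoded.
def split_fancy(str)
  name, query = str ? str.split('?', 2) : [nil, '']

  opts = (query || '').split('&').map do |pair|
    key, value = pair.split('=', 2)
    # A key with no value acts as a boolean flag.
    [key.to_sym, value.nil? ? true : value]
  end

  [name, Hash[opts]]
end

split_fancy('erb?parent=tex') # => ["erb", {:parent=>"tex"}]
split_fancy('ruby?debug')     # => ["ruby", {:debug=>true}]
split_fancy('guess')          # => ["guess", {}]
```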

.guess(info = {}) ⇒ Object

Guess which lexer to use based on a hash of info.

Parameters:

  • info (Hash) (defaults to: {})

    a customizable set of options

Options Hash (info):

  • :mimetype (Object)

    A mimetype to guess by

  • :filename (Object)

    A filename to guess by

  • :source (Object)

    The source itself, which, if guessing by mimetype or filename fails, will be searched for shebangs, <!DOCTYPE …> tags, and other hints.

Raises:

  • (AmbiguousGuess)


# File 'lib/rouge/lexer.rb', line 156

def guess(info={})
  lexers = guesses(info)

  return Lexers::PlainText if lexers.empty?
  return lexers[0] if lexers.size == 1

  raise AmbiguousGuess.new(lexers)
end
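
The three outcomes of the decision rule above (fallback, single match, ambiguity) can be exercised in isolation. Everything in this sketch is a hypothetical stand-in: the `pick` helper, the symbols used as candidates, and the `alternatives` reader on the error class are assumptions for illustration.

```ruby
# Hypothetical error class; the alternatives reader is an assumption.
class AmbiguousGuess < StandardError
  attr_reader :alternatives

  def initialize(alternatives)
    @alternatives = alternatives
    super("ambiguous guess: #{alternatives.join(', ')}")
  end
end

# Same rule as the listing above, applied to a prepared candidate list.
def pick(candidates, fallback = :plaintext)
  return fallback if candidates.empty?
  return candidates[0] if candidates.size == 1

  raise AmbiguousGuess.new(candidates)
end

pick([])      # => :plaintext
pick([:ruby]) # => :ruby

begin
  pick([:c, :cpp])
rescue AmbiguousGuess => e
  e.alternatives # => [:c, :cpp]
end
```

A caller that prefers never to rescue can work with the list of candidates directly, which is what `.guesses` returns.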

.guess_by_filename(fname) ⇒ Object


# File 'lib/rouge/lexer.rb', line 169

def guess_by_filename(fname)
  guess :filename => fname
end

.guess_by_mimetype(mt) ⇒ Object


# File 'lib/rouge/lexer.rb', line 165

def guess_by_mimetype(mt)
  guess :mimetype => mt
end

.guess_by_source(source) ⇒ Object


# File 'lib/rouge/lexer.rb', line 173

def guess_by_source(source)
  guess :source => source
end

.guesses(info = {}) ⇒ Object

Guess which lexer to use based on a hash of info.

This accepts the same arguments as Lexer.guess, but never raises an error. It returns a (possibly empty) list of potential lexers to use.


# File 'lib/rouge/lexer.rb', line 110

def guesses(info={})
  mimetype, filename, source = info.values_at(:mimetype, :filename, :source)
  lexers = registry.values.uniq
  total_size = lexers.size

  lexers = filter_by_mimetype(lexers, mimetype) if mimetype
  return lexers if lexers.size == 1

  lexers = filter_by_filename(lexers, filename) if filename
  return lexers if lexers.size == 1

  if source
    # If we're filtering against *all* lexers, we only use confident return
    # values from analyze_text.  But if we've filtered down already, we can trust
    # the analysis more.
    source_threshold = lexers.size < total_size ? 0 : 0.5
    return [best_by_source(lexers, source, source_threshold)].compact
  elsif lexers.size < total_size
    return lexers
  else
    return []
  end
end
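
The threshold choice in the source branch can be sketched as follows. The body of `best_by_source` is not shown on this page, so the version here is an assumption (highest score strictly above the threshold wins); `Candidate`, `narrow`, and the scorer lambdas are likewise hypothetical.

```ruby
# Hypothetical candidate: a name plus a lambda standing in for analyze_text.
Candidate = Struct.new(:name, :scorer)

# Assumed behaviour: keep the highest-scoring candidate above the threshold.
def best_by_source(candidates, source, threshold)
  scored = candidates.map { |c| [c.scorer.call(source), c] }
  score, best = scored.max_by { |s, _| s }
  best if score && score > threshold
end

# Mirrors the source branch: trust weak scores only after earlier filtering.
def narrow(candidates, total_size, source)
  threshold = candidates.size < total_size ? 0 : 0.5
  [best_by_source(candidates, source, threshold)].compact
end

a = Candidate.new('a', ->(_src) { 0.3 })
b = Candidate.new('b', ->(_src) { 0.1 })

narrow([a, b], 2, 'text').map(&:name)  # => []
narrow([a, b], 10, 'text').map(&:name) # => ["a"]
```

With two candidates out of two, nothing has been filtered yet, so a score of 0.3 is not trusted. With two out of ten, the same score is enough to select a winner.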

.lex(stream, opts = {}, &b) ⇒ Object

Lexes `stream` with the given options. The lex is delegated to a new instance.


# File 'lib/rouge/lexer.rb', line 19

def lex(stream, opts={}, &b)
  new(opts).lex(stream, &b)
end

.mimetypes(*mts) ⇒ Object

Specify a list of mimetypes associated with this lexer.

Examples:

class Html < Lexer
  mimetypes 'text/html', 'application/xhtml+xml'
end

# File 'lib/rouge/lexer.rb', line 299

def mimetypes(*mts)
  (@mimetypes ||= []).concat(mts)
end

.tag(t = nil) ⇒ Object

Used to specify or get the canonical name of this lexer class.

Examples:

class MyLexer < Lexer
  tag 'foo'
end

MyLexer.tag # => 'foo'

Lexer.find('foo') # => MyLexer

# File 'lib/rouge/lexer.rb', line 261

def tag(t=nil)
  return @tag if t.nil?

  @tag = t.to_s
  Lexer.register(@tag, self)
end

.title(t = nil) ⇒ Object

Specify or get this lexer's title. Meant to be human-readable.


# File 'lib/rouge/lexer.rb', line 69

def title(t=nil)
  if t.nil?
    t = tag.capitalize
  end
  @title ||= t
end

Instance Method Details

#debug ⇒ Object

Deprecated.

Instead of `debug { "foo" }`, simply use `puts "foo" if @debug`.

Leave a debug message if the `:debug` option is set. The message is given as a block because some debug messages contain calculated information that is unnecessary for lexing in the real world.

Calls to this method should be guarded with `if @debug` for best performance when debugging is turned off.

Examples:

debug { "hello, world!" } if @debug

# File 'lib/rouge/lexer.rb', line 363

def debug
  warn "Lexer#debug is deprecated.  Simply puts if @debug instead."
  puts yield if @debug
end

#lex(string, opts = {}, &b) ⇒ Object

Given a string, yield [token, chunk] pairs. If no block is given, an enumerator is returned.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :continue (Object)

    Continue the lex from the previous state (i.e. don't call #reset!)


# File 'lib/rouge/lexer.rb', line 380

def lex(string, opts={}, &b)
  return enum_for(:lex, string, opts) unless block_given?

  Lexer.assert_utf8!(string)

  reset! unless opts[:continue]

  # consolidate consecutive tokens of the same type
  last_token = nil
  last_val = nil
  stream_tokens(string) do |tok, val|
    next if val.empty?

    if tok == last_token
      last_val << val
      next
    end

    b.call(last_token, last_val) if last_token
    last_token = tok
    last_val = val
  end

  b.call(last_token, last_val) if last_token
end
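
The consolidation step is easy to see on a small input. The sketch below applies the same rule (skip empty chunks, merge adjacent chunks that share a token) to a plain array. The `consolidate` helper is hypothetical, the tokens are plain symbols rather than Rouge tokens, and it returns an array instead of yielding.

```ruby
# Hypothetical helper: merges adjacent pairs that share a token,
# skipping empty chunks, and returns the merged pairs as an array.
def consolidate(pairs)
  out = []
  pairs.each do |tok, val|
    next if val.empty?

    if !out.empty? && out.last[0] == tok
      # Same token as the previous pair: extend its chunk in place.
      out.last[1] << val
    else
      # Copy the chunk so the caller's strings are never mutated.
      out << [tok, val.dup]
    end
  end
  out
end

raw = [[:Name, 'fo'], [:Name, 'o'], [:Text, ''], [:Text, ' '], [:Num, '42']]
consolidate(raw) # => [[:Name, "foo"], [:Text, " "], [:Num, "42"]]
```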

#option(k, v = :absent) ⇒ Object

Get or specify one option for this lexer.


# File 'lib/rouge/lexer.rb', line 343

def option(k, v=:absent)
  if v == :absent
    options[k]
  else
    options({ k => v })
  end
end

#options(o = {}) ⇒ Object

Get and/or specify the options for this lexer.


# File 'lib/rouge/lexer.rb', line 336

def options(o={})
  (@options ||= {}).merge!(o)

  self.class.default_options.merge(@options)
end
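
How instance options layer over class-level defaults can be sketched with a standalone class. `OptionDemo` is a hypothetical name; its method bodies are copied from the `.default_options`, `#initialize`, `#option`, and `#options` listings on this page, minus the `:debug` handling.

```ruby
# Hypothetical class reproducing only the option plumbing.
class OptionDemo
  def self.default_options(o = {})
    @default_options ||= {}
    @default_options.merge!(o)
    @default_options
  end

  def initialize(opts = {})
    options(opts)
  end

  # Instance options win over class defaults; defaults are left untouched.
  def options(o = {})
    (@options ||= {}).merge!(o)

    self.class.default_options.merge(@options)
  end

  def option(k, v = :absent)
    if v == :absent
      options[k]
    else
      options({ k => v })
    end
  end
end

OptionDemo.default_options(:parent => 'html', :debug => false)
lexer = OptionDemo.new(:debug => true)

lexer.option(:parent)               # => "html"
lexer.option(:debug)                # => true
lexer.option(:parent, 'tex')
lexer.option(:parent)               # => "tex"
OptionDemo.default_options[:parent] # => "html"
```

Since `merge` returns a new hash, setting an option on one instance does not leak into the class defaults or into other instances.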

#reset! ⇒ Object

This method is abstract.

Called at the start of each lex, unless the `:continue` option is set. The default implementation is a no-op.


# File 'lib/rouge/lexer.rb', line 372

def reset!
end

#stream_tokens(stream, &b) ⇒ Object

This method is abstract.

Yield `[token, chunk]` pairs, given a prepared input stream. This must be implemented.

Parameters:

  • stream (StringScanner)

    the stream


# File 'lib/rouge/lexer.rb', line 418

def stream_tokens(stream, &b)
  raise 'abstract'
end
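
A concrete implementation might look like the following minimal sketch, which assumes the stream is a `StringScanner` as stated in the parameter list above. The `WordNumberTokens` class is hypothetical and does not inherit from `Rouge::Lexer`; its tokens are plain symbols rather than Rouge tokens.

```ruby
require 'strscan'

# Hypothetical tokenizer showing the shape of a stream_tokens implementation.
class WordNumberTokens
  def stream_tokens(stream)
    until stream.eos?
      if (chunk = stream.scan(/\d+/))
        yield :Num, chunk
      elsif (chunk = stream.scan(/[A-Za-z_]+/))
        yield :Name, chunk
      else
        # Anything else is emitted one character at a time.
        yield :Text, stream.getch
      end
    end
  end
end

pairs = []
scanner = StringScanner.new('x1 = 42')
WordNumberTokens.new.stream_tokens(scanner) { |tok, chunk| pairs << [tok, chunk] }
pairs.first # => [:Name, "x"]
pairs.last  # => [:Num, "42"]
```

The adjacent `:Text` pairs this produces for `' = '` are left unmerged on purpose; per the `#lex` listing above, consolidating consecutive tokens of the same type happens there.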

#tag ⇒ Object

Delegated to the class-level tag.


# File 'lib/rouge/lexer.rb', line 407

def tag
  self.class.tag
end