Class: Rouge::Lexer Abstract

Inherits:
Object
Defined in:
lib/rouge/lexer.rb

Overview

This class is abstract.

A lexer transforms text into a stream of `[token, chunk]` pairs.

Direct Known Subclasses

Rouge::Lexers::Text, RegexLexer


Constructor Details

#initialize(opts = {}) ⇒ Lexer

Create a new lexer with the given options. Individual lexers may specify extra options. The only option currently accepted by all lexers is `:debug`.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :debug (Object)

    Prints debug information to stdout. The particular info depends on the lexer in question. In regex lexers, this will log the state stack at the beginning of each step, along with each regex tried and each stream consumed. Try it, it’s pretty useful.



# File 'lib/rouge/lexer.rb', line 235

def initialize(opts={})
  options(opts)
end

Class Method Details

.aliases(*args) ⇒ Object

Used to specify alternate names this lexer class may be found by.

Examples:

class Erb < Lexer
  tag 'erb'
  aliases 'eruby', 'rhtml'
end

Lexer.find('eruby') # => Erb


# File 'lib/rouge/lexer.rb', line 183

def aliases(*args)
  args.map!(&:to_s)
  args.each { |arg| Lexer.register(arg, self) }
  (@aliases ||= []).concat(args)
end

.all ⇒ Object

Returns a list of all lexers.

Returns:

  • a list of all lexers.



# File 'lib/rouge/lexer.rb', line 87

def all
  registry.values.uniq
end

.analyze_text(text) ⇒ Object

This method is abstract.

Return a number between 0 and 1 indicating the likelihood that the text given should be lexed with this lexer. The default implementation returns 0.




# File 'lib/rouge/lexer.rb', line 328

def self.analyze_text(text)
  0
end

.assert_utf8!(str) ⇒ Object

Raises:

  • (EncodingError) if the string's encoding is neither US-ASCII nor UTF-8


# File 'lib/rouge/lexer.rb', line 210

def assert_utf8!(str)
  return if %w(US-ASCII UTF-8).include? str.encoding.name
  raise EncodingError.new(
    "Bad encoding: #{str.encoding.names.join(',')}. " +
    "Please convert your string to UTF-8."
  )
end
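
As a rough illustration of which strings pass this check, here is a standalone sketch that does not load Rouge. The helper name `utf8_compatible?` is hypothetical; it applies the same encoding test but returns a boolean instead of raising.

```ruby
# Hypothetical helper (not part of Rouge): applies the same encoding
# test as assert_utf8!, but returns true/false instead of raising.
def utf8_compatible?(str)
  %w(US-ASCII UTF-8).include?(str.encoding.name)
end

utf8   = "hello".encode("UTF-8")
ascii  = "hello".encode("US-ASCII")
latin1 = "hello".encode("ISO-8859-1")

puts utf8_compatible?(utf8)    # => true
puts utf8_compatible?(ascii)   # => true
puts utf8_compatible?(latin1)  # => false

# Converting first is the remedy the error message asks for.
puts utf8_compatible?(latin1.encode("UTF-8"))  # => true
```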

.default_options(o = {}) ⇒ Object



# File 'lib/rouge/lexer.rb', line 17

def default_options(o={})
  @default_options ||= {}
  @default_options.merge!(o)
  @default_options
end
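
The listing above accumulates defaults by merging into a class-level hash. The sketch below reproduces that pattern on a hypothetical stand-in class (`SketchLexer` is not a Rouge class) to show how repeated calls behave.

```ruby
# Hypothetical stand-in class (not Rouge::Lexer) reproducing the
# accumulate-by-merge pattern of default_options.
class SketchLexer
  def self.default_options(o = {})
    @default_options ||= {}
    @default_options.merge!(o)
    @default_options
  end
end

SketchLexer.default_options(:debug => false)
SketchLexer.default_options(:parent => 'html')  # adds to the earlier call
SketchLexer.default_options(:debug => true)     # a later value replaces an earlier one

p SketchLexer.default_options
```

Because the hash lives in a class-level instance variable, a subclass of `SketchLexer` starts with its own empty hash in this sketch.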

.demo(arg = :absent) ⇒ Object

Specify or get a small demo string for this lexer.



# File 'lib/rouge/lexer.rb', line 80

def demo(arg=:absent)
  return @demo = arg unless arg == :absent

  @demo = File.read(demo_file)
end

.demo_file(arg = :absent) ⇒ Object

Specify or get the path of the file containing a small demo for this lexer (can be overridden by demo).



# File 'lib/rouge/lexer.rb', line 73

def demo_file(arg=:absent)
  return @demo_file = Pathname.new(arg) unless arg == :absent

  @demo_file = Pathname.new(__FILE__).dirname.join('demos', tag)
end

.desc(arg = :absent) ⇒ Object

Specify or get this lexer’s description.



# File 'lib/rouge/lexer.rb', line 63

def desc(arg=:absent)
  if arg == :absent
    @desc
  else
    @desc = arg
  end
end

.filenames(*fnames) ⇒ Object

Specify a list of filename globs associated with this lexer.

Examples:

class Ruby < Lexer
  filenames '*.rb', '*.ruby', 'Gemfile', 'Rakefile'
end


# File 'lib/rouge/lexer.rb', line 195

def filenames(*fnames)
  (@filenames ||= []).concat(fnames)
end

.find(name) ⇒ Object

Given a string, return the correct lexer class.



# File 'lib/rouge/lexer.rb', line 24

def find(name)
  registry[name.to_s]
end

.find_fancy(str, code = nil) ⇒ Object

Find a lexer, with fancy shiny features.

  • The string you pass can include CGI-style options

    Lexer.find_fancy('erb?parent=tex')
    
  • You can pass the special name ‘guess’ to have the lexer guessed for you, optionally passing the code to guess by as a second argument

    Lexer.find_fancy('guess', "#!/bin/bash\necho Hello, world")
    

This is used in the Redcarpet plugin as well as Rouge’s own markdown lexer for highlighting internal code blocks.



# File 'lib/rouge/lexer.rb', line 42

def find_fancy(str, code=nil)
  name, opts = str ? str.split('?', 2) : [nil, '']

  # parse the options hash from a cgi-style string
  opts = CGI.parse(opts || '').map do |k, vals|
    [ k.to_sym, vals.empty? ? true : vals[0] ]
  end

  opts = Hash[opts]

  lexer_class = case name
  when 'guess', nil
    self.guess(:source => code, :mimetype => opts[:mimetype])
  when String
    self.find(name)
  end

  lexer_class && lexer_class.new(opts)
end
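
To make the CGI-style option string concrete, the following sketch splits a spec into a name and an options hash. It is a simplified stand-in, not Rouge's code: the helper name `split_lexer_spec` is hypothetical, and it uses plain `String#split` where the listing uses `CGI.parse`, so it does no URL-unescaping.

```ruby
# Hypothetical helper (not part of Rouge): splits "name?key=val&flag"
# along the lines of find_fancy, using String#split in place of
# CGI.parse. Unlike CGI.parse, it does not URL-unescape anything.
def split_lexer_spec(str)
  name, query = str ? str.split('?', 2) : [nil, '']

  opts = {}
  (query || '').split('&').each do |pair|
    key, value = pair.split('=', 2)
    next if key.nil? || key.empty?
    # a bare key with no "=value" part becomes a true flag
    opts[key.to_sym] = value.nil? ? true : value
  end

  [name, opts]
end

p split_lexer_spec('erb?parent=tex')
p split_lexer_spec('ruby?debug')
p split_lexer_spec('guess')
```

The first call yields the name 'erb' with a :parent option of 'tex', the second yields a :debug flag set to true, and the third yields an empty options hash.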

.guess(info = {}) ⇒ Object

Guess which lexer to use based on a hash of info.

Parameters:

  • info (Hash) (defaults to: {})

    a customizable set of options

Options Hash (info):

  • :mimetype (Object)

    A mimetype to guess by

  • :filename (Object)

    A filename to guess by

  • :source (Object)

    The source itself, which, if guessing by mimetype or filename fails, will be searched for shebangs, <!DOCTYPE …> tags, and other hints.




# File 'lib/rouge/lexer.rb', line 103

def guess(info={})
  by_mimetype = guess_by_mimetype(info[:mimetype]) if info[:mimetype]
  return by_mimetype if by_mimetype

  by_filename = guess_by_filename(info[:filename]) if info[:filename]
  return by_filename if by_filename

  by_source = guess_by_source(info[:source]) if info[:source]
  return by_source if by_source

  # guessing failed, just parse it as text
  return Lexers::Text
end
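
The order of precedence in the listing (mimetype, then filename, then a plain-text fallback) can be sketched without Rouge as follows. The registry `SKETCH_LEXERS` and the method `sketch_guess` are hypothetical names, symbols stand in for lexer classes, and source-based guessing is left out.

```ruby
# Hypothetical mini-registry (not Rouge's): each entry lists the
# mimetypes and filename globs it claims.
SKETCH_LEXERS = {
  :html => { :mimetypes => ['text/html'],   :filenames => ['*.html'] },
  :ruby => { :mimetypes => ['text/x-ruby'], :filenames => ['*.rb', 'Rakefile'] },
}

# Follows the order used by guess: mimetype first, then filename, then
# a fallback. Source-based guessing is omitted from this sketch.
def sketch_guess(info = {})
  if info[:mimetype]
    hit = SKETCH_LEXERS.find { |_, l| l[:mimetypes].include?(info[:mimetype]) }
    return hit.first if hit
  end

  if info[:filename]
    base = File.basename(info[:filename])
    hit = SKETCH_LEXERS.find do |_, l|
      l[:filenames].any? { |glob| File.fnmatch?(glob, base, File::FNM_DOTMATCH) }
    end
    return hit.first if hit
  end

  :text # nothing matched, so fall back to plain text
end

p sketch_guess(:mimetype => 'text/html', :filename => 'app.rb')  # => :html
p sketch_guess(:filename => 'lib/tasks/Rakefile')                # => :ruby
p sketch_guess(:filename => 'notes.txt')                         # => :text
```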

.guess_by_filename(fname) ⇒ Object



# File 'lib/rouge/lexer.rb', line 123

def guess_by_filename(fname)
  fname = File.basename(fname)
  registry.values.detect do |lexer|
    lexer.filenames.any? do |pattern|
      File.fnmatch?(pattern, fname, File::FNM_DOTMATCH)
    end
  end
end
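
The matching step relies on `File.basename` and `File.fnmatch?` from Ruby's core library. The sketch below isolates it in a hypothetical helper, `glob_match?`, which is not part of Rouge, to show how directory parts and leading dots are handled.

```ruby
# Hypothetical helper (not part of Rouge): true if any glob matches the
# basename of path, using the same flag as guess_by_filename.
def glob_match?(globs, path)
  base = File.basename(path)
  globs.any? { |glob| File.fnmatch?(glob, base, File::FNM_DOTMATCH) }
end

globs = ['*.rb', 'Gemfile']
p glob_match?(globs, '/srv/app/Gemfile')    # => true
p glob_match?(globs, 'lib/rouge/lexer.rb')  # => true
p glob_match?(globs, 'README.md')           # => false

# FNM_DOTMATCH lets "*" match a name that starts with a dot.
p glob_match?(['*'], '.profile')            # => true
p File.fnmatch?('*', '.profile')            # => false
```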

.guess_by_mimetype(mt) ⇒ Object



# File 'lib/rouge/lexer.rb', line 117

def guess_by_mimetype(mt)
  registry.values.detect do |lexer|
    lexer.mimetypes.include? mt
  end
end

.guess_by_source(source) ⇒ Object



# File 'lib/rouge/lexer.rb', line 132

def guess_by_source(source)
  assert_utf8!(source)

  source = TextAnalyzer.new(source)

  best_result = 0
  best_match = nil
  registry.values.each do |lexer|
    result = lexer.analyze_text(source) || 0
    return lexer if result == 1

    if result > best_result
      best_match = lexer
      best_result = result
    end
  end

  best_match
end
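
The selection loop above keeps the highest score seen and returns early on a score of exactly 1. Here is a minimal standalone sketch of that logic. The name `best_scorer` and the three scoring lambdas are hypothetical stand-ins for the `analyze_text` implementations of real lexers.

```ruby
# Hypothetical helper (not part of Rouge): picks the candidate with the
# highest score, returning early on a score of 1, as the loop in
# guess_by_source does. scorers maps a name to a callable that returns
# a number between 0 and 1, or nil.
def best_scorer(scorers, text)
  best_result = 0
  best_match = nil

  scorers.each do |name, scorer|
    result = scorer.call(text) || 0
    return name if result == 1

    if result > best_result
      best_match = name
      best_result = result
    end
  end

  best_match
end

scorers = {
  :shell => lambda { |t| t.start_with?('#!/bin/bash') ? 1 : 0 },
  :html  => lambda { |t| t.include?('<html') ? 0.8 : nil },
  :xml   => lambda { |t| t.include?('<') ? 0.3 : 0 },
}

p best_scorer(scorers, "#!/bin/bash\necho hi")  # => :shell
p best_scorer(scorers, "<html><body>")          # => :html
p best_scorer(scorers, "plain words")           # => nil
```

When every candidate scores 0, the result is nil, which is why the caller in the `guess` listing falls back to the plain-text lexer.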

.lex(stream, opts = {}, &b) ⇒ Object

Lexes `stream` with the given options. The lex is delegated to a new instance.




# File 'lib/rouge/lexer.rb', line 13

def lex(stream, opts={}, &b)
  new(opts).lex(stream, &b)
end

.mimetypes(*mts) ⇒ Object

Specify a list of mimetypes associated with this lexer.

Examples:

class Html < Lexer
  mimetypes 'text/html', 'application/xhtml+xml'
end


# File 'lib/rouge/lexer.rb', line 205

def mimetypes(*mts)
  (@mimetypes ||= []).concat(mts)
end

.register(name, lexer) ⇒ Object



# File 'lib/rouge/lexer.rb', line 153

def register(name, lexer)
  registry[name.to_s] = lexer
end

.tag(t = nil) ⇒ Object

Used to specify or get the canonical name of this lexer class.

Examples:

class MyLexer < Lexer
  tag 'foo'
end

MyLexer.tag # => 'foo'

Lexer.find('foo') # => MyLexer


# File 'lib/rouge/lexer.rb', line 167

def tag(t=nil)
  return @tag if t.nil?

  @tag = t.to_s
  Lexer.register(@tag, self)
end
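
The way `tag`, `aliases`, `register`, `find`, and `all` cooperate through one shared registry can be sketched on stand-in classes. `SketchBase`, `SketchErb`, and the `REGISTRY` constant are hypothetical. This page does not show how Rouge stores its registry, so a plain Hash is assumed here.

```ruby
# Hypothetical stand-in (not Rouge::Lexer): a shared registry keyed by
# string, filled by tag and aliases and read by find and all.
class SketchBase
  REGISTRY = {} # assumption: a plain Hash

  def self.register(name, lexer)
    REGISTRY[name.to_s] = lexer
  end

  def self.find(name)
    REGISTRY[name.to_s]
  end

  def self.all
    REGISTRY.values.uniq
  end

  def self.tag(t = nil)
    return @tag if t.nil?

    @tag = t.to_s
    SketchBase.register(@tag, self)
  end

  def self.aliases(*args)
    args.map!(&:to_s)
    args.each { |arg| SketchBase.register(arg, self) }
    (@aliases ||= []).concat(args)
  end
end

class SketchErb < SketchBase
  tag 'erb'
  aliases 'eruby', 'rhtml'
end

p SketchBase.find('eruby') # => SketchErb
p SketchBase.find(:erb)    # => SketchErb
p SketchBase.all           # => [SketchErb]
```

The class is registered under three names, so `all` needs `uniq` to list it only once.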

Instance Method Details

#debug(&b) ⇒ Object

Leave a debug message if the `:debug` option is set. The message is given as a block because some debug messages contain calculated information that is unnecessary for lexing in the real world.

Examples:

debug { "hello, world!" }


# File 'lib/rouge/lexer.rb', line 261

def debug(&b)
  puts(b.call) if option :debug
end
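
The point of passing the message as a block is that it is only evaluated when debugging is on. The sketch below shows this on a hypothetical stand-in class (`SketchDebug` is not a Rouge class) by counting how often the block runs.

```ruby
# Hypothetical stand-in (not Rouge::Lexer): shows that the block passed
# to debug only runs when the :debug option is set.
class SketchDebug
  def initialize(opts = {})
    @opts = opts
  end

  def debug(&b)
    puts(b.call) if @opts[:debug]
  end
end

calls = 0
SketchDebug.new.debug { calls += 1; "never built" }
SketchDebug.new(:debug => true).debug { calls += 1; "state stack: [:root]" }
puts calls # => 1
```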

#lex(string, opts = {}, &b) ⇒ Object

Given a string, yield [token, chunk] pairs. If no block is given, an enumerator is returned.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :continue (Object)

    Continue the lex from the previous state (i.e. don’t call #reset!)



# File 'lib/rouge/lexer.rb', line 277

def lex(string, opts={}, &b)
  return enum_for(:lex, string) unless block_given?

  Lexer.assert_utf8!(string)

  reset! unless opts[:continue]

  # consolidate consecutive tokens of the same type
  last_token = nil
  last_val = nil
  stream_tokens(StringScanner.new(string)) do |tok, val|
    next if val.empty?

    if tok == last_token
      last_val << val
      next
    end

    b.call(last_token, last_val) if last_token
    last_token = tok
    last_val = val
  end

  b.call(last_token, last_val) if last_token
end
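
The consolidation step in the listing merges consecutive chunks that share a token and drops empty chunks. A minimal standalone sketch of that step, working on an array of pairs instead of a token stream, might look like this. The helper name `consolidate` is hypothetical.

```ruby
# Hypothetical helper (not part of Rouge): merges consecutive pairs that
# share a token and skips empty chunks, as the loop in #lex does.
def consolidate(pairs)
  out = []
  last_token = nil
  last_val = nil

  pairs.each do |tok, val|
    next if val.empty?

    if tok == last_token
      last_val << val
      next
    end

    out << [last_token, last_val] if last_token
    last_token = tok
    last_val = val.dup # copy so the caller's strings are not mutated
  end

  out << [last_token, last_val] if last_token
  out
end

raw = [
  [:text, "foo"], [:text, " "],
  [:name, "bar"], [:name, ""], [:name, "baz"],
  [:text, "\n"],
]
p consolidate(raw)
# => [[:text, "foo "], [:name, "barbaz"], [:text, "\n"]]
```

Unlike the listing, this sketch duplicates each first chunk before appending to it, so the input strings are left untouched.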

#option(k, v = :absent) ⇒ Object

Get or specify one option for this lexer.



# File 'lib/rouge/lexer.rb', line 247

def option(k, v=:absent)
  if v == :absent
    options[k]
  else
    options({ k => v })
  end
end

#options(o = {}) ⇒ Object

Get and/or specify the options for this lexer.



# File 'lib/rouge/lexer.rb', line 240

def options(o={})
  (@options ||= {}).merge!(o)

  self.class.default_options.merge(@options)
end
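
The interplay between class-level defaults and per-instance options in `#option` and `#options` can be sketched as follows. `SketchOptions` is a hypothetical stand-in with fixed defaults, not a Rouge class. The instance methods follow the listings.

```ruby
# Hypothetical stand-in (not Rouge::Lexer) with fixed class defaults.
class SketchOptions
  def self.default_options
    { :debug => false, :parent => 'html' }
  end

  def initialize(opts = {})
    options(opts)
  end

  # Merges o into the instance options, then layers them over the defaults.
  def options(o = {})
    (@options ||= {}).merge!(o)
    self.class.default_options.merge(@options)
  end

  def option(k, v = :absent)
    if v == :absent
      options[k]
    else
      options({ k => v })
    end
  end
end

lexer = SketchOptions.new(:debug => true)
p lexer.option(:debug)   # => true
p lexer.option(:parent)  # => "html"
lexer.option(:parent, 'tex')
p lexer.option(:parent)  # => "tex"
```

Instance options take precedence over the class defaults because they are merged last.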

#reset! ⇒ Object

This method is abstract.

Resets the lexer's state. In the #lex listing, it is called at the start of each lex unless the :continue option is given. The default implementation is a no-op.



# File 'lib/rouge/lexer.rb', line 269

def reset!
end

#stream_tokens(stream, &b) ⇒ Object

This method is abstract.

Yield `[token, chunk]` pairs, given a prepared input stream. This must be implemented.

Parameters:

  • stream (StringScanner)

    the stream



# File 'lib/rouge/lexer.rb', line 315

def stream_tokens(stream, &b)
  raise 'abstract'
end
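
As a rough idea of what an implementation of this contract could look like, here is a toy lexer built on Ruby's `StringScanner`. It is a sketch only: `WordLexer` is hypothetical and does not inherit from Rouge::Lexer, and the tokens are made-up symbols, not Rouge token types.

```ruby
require 'strscan'

# Hypothetical toy lexer (not a Rouge class): it only illustrates the
# contract of stream_tokens, which receives a StringScanner and yields
# [token, chunk] pairs.
class WordLexer
  def stream_tokens(stream)
    until stream.eos?
      if (chunk = stream.scan(/\s+/))
        yield :space, chunk
      elsif (chunk = stream.scan(/\w+/))
        yield :word, chunk
      else
        # consume one character so the loop always advances
        yield :other, stream.getch
      end
    end
  end
end

pairs = []
WordLexer.new.stream_tokens(StringScanner.new("hi there!")) do |tok, chunk|
  pairs << [tok, chunk]
end
p pairs
# => [[:word, "hi"], [:space, " "], [:word, "there"], [:other, "!"]]
```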

#tag ⇒ Object

Delegated to the class-level tag method.



# File 'lib/rouge/lexer.rb', line 304

def tag
  self.class.tag
end