Class: CodeRay::Scanners::Scanner

Inherits:
StringScanner
  • Object
Extended by:
Plugin
Includes:
Enumerable
Defined in:
lib/coderay/scanner.rb

Overview

Scanner

The base class for all Scanners.

It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.

It is also Enumerable, so you can use it like an Array of Tokens:

require 'coderay'

c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"

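# Tokens are stored as a flat list (text, kind, text, kind, ...),
# so pair them up before destructuring.
for text, kind in c_scanner.each_slice(2)
  print text if kind == :operator
end

# prints: (*==)++;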

OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
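The Enumerable behaviour comes from nothing more than including Enumerable and defining #each. The following standalone sketch imitates that design without CodeRay. WordScanner is a hypothetical toy scanner (words, numbers, whitespace only) that yields [text, kind] pairs; it is not CodeRay's implementation.

```ruby
require 'strscan'

# Hypothetical scanner (not part of CodeRay) that copies the Scanner design:
# a StringScanner subclass that includes Enumerable and defines #each.
class WordScanner < StringScanner
  include Enumerable

  # Yields each token as a [text, kind] pair, rescanning from the start.
  def each
    return to_enum(:each) unless block_given?
    reset
    until eos?
      if (text = scan(/\d+/))
        yield [text, :integer]
      elsif (text = scan(/\w+/))
        yield [text, :ident]
      elsif (text = scan(/\s+/))
        yield [text, :space]
      else
        yield [getch, :error]
      end
    end
  end
end

scanner = WordScanner.new("answer 42 is 42")
p scanner.select { |_text, kind| kind == :integer }.map(&:first)  # => ["42", "42"]
p scanner.any? { |text, _kind| text == "is" }                     # => true
```

Every Enumerable method (map, find, sort_by, count, ...) then works on the scanner without further code.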

Constant Summary

ScanError = Class.new StandardError

Raised if a Scanner fails while scanning.
DEFAULT_OPTIONS = { }

The default options for all scanner classes.

Subclasses can define their own DEFAULT_OPTIONS constant; #initialize merges the given options into self.class::DEFAULT_OPTIONS.
KINDS_NOT_LOC = [:comment, :doctype, :docstring]

Token kinds that are not counted as lines of code.

Instance Attribute Summary

Attributes included from Plugin

#plugin_id

Class Method Summary

Instance Method Summary

Methods included from Plugin

aliases, plugin_host, register_for, title

Constructor Details

#initialize(code = '', options = {}) ⇒ Scanner

Create a new Scanner.

  • code is the input String and is handled by the superclass StringScanner. It is normalized with .normalize before being passed on.

  • options is a Hash with Symbols as keys. It is merged with the default options of the class, so you can override default options here.

If options[:tokens] is given, that object receives the tokens; otherwise, a new Tokens object is used.



# File 'lib/coderay/scanner.rb', line 143

def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end
  
  @options = self.class::DEFAULT_OPTIONS.merge options
  
  super self.class.normalize(code)
  
  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=
  
  setup
end
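The constructor combines an abstract-base guard with per-class defaults looked up through self.class::DEFAULT_OPTIONS, followed by a setup hook. A minimal sketch of that pattern, using hypothetical names (BaseScanner, CScannerSketch, and the invented options :tab_width and :keep_comments) that are not part of CodeRay; the :tokens handling is left out:

```ruby
require 'strscan'

# Hypothetical abstract base class mirroring the shape of Scanner#initialize.
class BaseScanner < StringScanner
  DEFAULT_OPTIONS = {}.freeze

  attr_reader :options

  def initialize(code = '', options = {})
    # Refuse to instantiate the abstract base class directly.
    if self.class == BaseScanner
      raise NotImplementedError, "use a subclass of BaseScanner"
    end
    # self.class::DEFAULT_OPTIONS resolves to the subclass's own constant
    # (or the inherited one); options passed by the caller win.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
    super(code)
    setup
  end

  private

  # Hook for subclasses; does nothing by default.
  def setup
  end
end

# Hypothetical subclass with its own defaults.
class CScannerSketch < BaseScanner
  DEFAULT_OPTIONS = { tab_width: 8, keep_comments: true }.freeze
end

s = CScannerSketch.new("int x;", tab_width: 4)
p s.options  # tab_width overridden to 4, keep_comments taken from the defaults
```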

Instance Attribute Details

#state ⇒ Object

Returns the value of attribute state.



# File 'lib/coderay/scanner.rb', line 62

def state
  @state
end

Class Method Details

.encoding(name = 'UTF-8') ⇒ Object

The encoding used internally by this scanner.



# File 'lib/coderay/scanner.rb', line 89

def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
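Because the result is memoized with ||= in a class-level instance variable, the name argument only matters on the first call; later calls return the cached Encoding whatever they pass. Each subclass has its own cache, since class-level instance variables are not shared with subclasses. A small sketch reproducing this with a hypothetical EncodingDemo class (the defined?(Encoding.find) guard for very old Rubies is dropped):

```ruby
# Hypothetical class reproducing the memoized class-level encoding lookup.
class EncodingDemo
  def self.encoding(name = 'UTF-8')
    # ||= keeps the first lookup; arguments of later calls are ignored.
    @encoding ||= Encoding.find(name)
  end
end

first  = EncodingDemo.encoding                # looks up UTF-8
second = EncodingDemo.encoding('ASCII-8BIT')  # still the cached UTF-8
puts first   # UTF-8
puts second  # UTF-8
```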

.file_extension(extension = lang) ⇒ Object

The typical filename suffix for this scanner’s language.



# File 'lib/coderay/scanner.rb', line 84

def file_extension extension = lang
  @file_extension ||= extension.to_s
end

.lang ⇒ Object

The language name of this Scanner class, which is equal to its Plugin ID.



# File 'lib/coderay/scanner.rb', line 94

def lang
  @plugin_id
end
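Both .lang and .file_extension read class-level state: .lang returns @plugin_id, which a scanner sets by registering through the Plugin module, and .file_extension falls back to that ID. The sketch below imitates this with a hypothetical PluginSketch module whose register_for only stores the ID; CodeRay's real Plugin#register_for is not reproduced here.

```ruby
# Hypothetical stand-in for CodeRay's Plugin module: it only records the ID.
module PluginSketch
  attr_reader :plugin_id

  def register_for(id)
    @plugin_id = id
  end
end

class ScannerSketch
  extend PluginSketch

  # The language name is simply the plugin ID.
  def self.lang
    @plugin_id
  end

  # Defaults to the language name; memoized on the first call.
  def self.file_extension(extension = lang)
    @file_extension ||= extension.to_s
  end
end

class RubySketch < ScannerSketch
  register_for :ruby
  file_extension 'rb'   # explicit suffix, differs from the plugin ID
end

class JSONSketch < ScannerSketch
  register_for :json    # suffix falls back to "json"
end

puts RubySketch.lang            # ruby
puts RubySketch.file_extension  # rb
puts JSONSketch.file_extension  # json
```

Note that register_for has to run before the first file_extension call; otherwise the default would be computed from a nil ID.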

.normalize(code) ⇒ Object

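Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Non-String input is converted with to_s first. Always returns a new object, except when given an empty String, which is returned as-is.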



# File 'lib/coderay/scanner.rb', line 69

def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?
  
  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
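The helpers encode_with_encoding and to_unix are private and not shown on this page. As an assumption about what they amount to, the following standalone sketch performs the documented steps with core String methods only (String#scrub needs Ruby 2.1+): convert to text, replace invalid or undefined characters, and turn \r\n and lone \r into \n. The name normalize_sketch and the "?" placeholder are choices made for this example, not CodeRay's.

```ruby
# Hypothetical helper; the real .normalize delegates to private helpers
# (encode_with_encoding / to_unix) whose code is not shown here.
def normalize_sketch(code, encoding = Encoding::UTF_8)
  code = code.to_s unless code.is_a?(String)
  return code if code.empty?

  code =
    if code.encoding == encoding
      # Same encoding: only repair invalid byte sequences.
      code.scrub('?')
    else
      # Different encoding: transcode, substituting bad characters.
      code.encode(encoding, invalid: :replace, undef: :replace, replace: '?')
    end

  # Convert Windows (\r\n) and old Mac (\r) line endings to UNIX ones.
  code.gsub(/\r\n?/, "\n")
end

p normalize_sketch("a\r\nb\rc")  # => "a\nb\nc"
p normalize_sketch(:sym)         # => "sym"
```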

Instance Method Details

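#binary_string ⇒ Object

The string in binary encoding. If the string contains no multibyte characters, it is returned as-is; otherwise a binary-encoded copy is made and cached.

To be used with #pos, which is the index of the byte the scanner will scan next.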



# File 'lib/coderay/scanner.rb', line 236

def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
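StringScanner#pos counts bytes, not characters, which is why #line and #column index into a binary version of the string. A short demonstration with Ruby's standard strscan library:

```ruby
require 'strscan'

text = "héllo\nwörld"            # é and ö take two bytes each in UTF-8
scanner = StringScanner.new(text)
scanner.scan(/h./)                # consumes "hé"

puts scanner.pos                  # 3: a byte offset, not a character count
puts text.size                    # 11 characters
puts text.bytesize                # 13 bytes

# Same rule as #binary_string: copy and relabel only multibyte strings.
binary = text.bytesize != text.size ? text.dup.force_encoding('binary') : text
consumed = binary[0, scanner.pos] # byte-indexed slice of what was scanned
puts consumed.bytesize            # 3
```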

#column(pos = self.pos) ⇒ Object

The current column position of the scanner, starting with 1. See also: #line. Like #pos, the column is counted in bytes, so a multibyte character advances it by more than one.



# File 'lib/coderay/scanner.rb', line 227

def column pos = self.pos
  return 1 if pos <= 0
  pos - (binary_string.rindex(?\n, pos - 1) || -1)
end
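The column is pos minus the index of the last newline before pos (or minus -1 when there is none). Here is the same formula as a standalone function (the name column_at is hypothetical), with a few worked positions:

```ruby
# Standalone copy of the #column formula; pos is a byte offset.
def column_at(binary_string, pos)
  return 1 if pos <= 0
  # rindex searches backwards from pos - 1 for the previous newline.
  pos - (binary_string.rindex("\n", pos - 1) || -1)
end

s = "ab\ncd"
puts column_at(s, 0)  # 1  ('a')
puts column_at(s, 1)  # 2  ('b')
puts column_at(s, 3)  # 1  ('c', first byte after the newline)
puts column_at(s, 4)  # 2  ('d')
```

On "é!".b, the "!" is at byte offset 2 and therefore in column 3, even though it is the second character.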

#each(&block) ⇒ Object

Traverse the tokens: yields each element of #tokens, tokenizing the code first if that has not happened yet.



# File 'lib/coderay/scanner.rb', line 210

def each &block
  tokens.each(&block)
end

#file_extension ⇒ Object

The default file extension for this scanner.



# File 'lib/coderay/scanner.rb', line 178

def file_extension
  self.class.file_extension
end

#lang ⇒ Object

The Plugin ID for this scanner.



# File 'lib/coderay/scanner.rb', line 173

def lang
  self.class.lang
end

#line(pos = self.pos) ⇒ Object

The current line position of the scanner, starting with 1. See also: #column.

Beware, this is implemented inefficiently: every call counts the newlines from the start of the string up to pos. It should be used for debugging only.



# File 'lib/coderay/scanner.rb', line 220

def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
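Because #line rescans everything before pos on each call, calling it once per token is quadratic in the input size. If line numbers are needed often, one common alternative (not something CodeRay provides) is to record the newline byte offsets once and binary-search them (Array#bsearch_index needs Ruby 2.3+). A sketch with a hypothetical LineIndex class, checked against the one-line formula above:

```ruby
# Hypothetical helper, not part of CodeRay: answers the same question as
# #line, but in O(log n) per lookup after a single O(n) pass.
class LineIndex
  def initialize(string)
    bytes = string.b                  # work on byte offsets, like #line
    @newlines = []
    i = -1
    while (i = bytes.index("\n", i + 1))
      @newlines << i
    end
  end

  # Same contract as #line: 1-based line of the byte at pos.
  def line(pos)
    return 1 if pos <= 0
    # First newline at or after pos; its index == newlines before pos.
    before = @newlines.bsearch_index { |nl| nl >= pos } || @newlines.size
    before + 1
  end
end

text  = "a\nbc\n\nd"
index = LineIndex.new(text)
naive = ->(pos) { pos <= 0 ? 1 : text.b[0...pos].count("\n") + 1 }
matches = (0..text.bytesize).all? { |pos| index.line(pos) == naive.(pos) }
puts matches  # true
```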

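#reset ⇒ Object

Resets the scanner by calling StringScanner#reset, which moves the scan pointer back to the beginning, and then reset_instance. Subclasses should redefine reset_instance instead of this method.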



# File 'lib/coderay/scanner.rb', line 160

def reset
  super
  reset_instance
end

#string=(code) ⇒ Object

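Sets a new string to be scanned. The code is normalized with .normalize before it is handed to StringScanner#string=, and reset_instance is called afterwards.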



# File 'lib/coderay/scanner.rb', line 166

def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end
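Both #reset and #string= delegate to StringScanner and then call reset_instance, so per-scan state lives in one hook. This template-method pattern can be sketched with a hypothetical HookedScanner whose reset_instance clears an invented token counter; the normalization step of #string= is omitted:

```ruby
require 'strscan'

# Hypothetical scanner illustrating the reset_instance hook.
class HookedScanner < StringScanner
  attr_reader :token_count

  def initialize(code)
    super(code)
    reset_instance
  end

  # Scans one whitespace-separated word and counts it.
  def next_word
    word = scan(/\S+/)
    skip(/\s+/)
    @token_count += 1 if word
    word
  end

  # Like #reset: rewind StringScanner, then clear our own state.
  def reset
    super
    reset_instance
  end

  # Like #string= (minus normalization): swap the input, clear state.
  def string=(code)
    super(code)
    reset_instance
  end

  private

  # Subclasses override this hook instead of reset / string=.
  def reset_instance
    @token_count = 0
  end
end

s = HookedScanner.new("one two")
2.times { s.next_word }
s.reset
puts [s.pos, s.token_count].inspect  # [0, 0]
```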

#tokenize(source = nil, options = {}) ⇒ Object

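Scans the code and returns all tokens in a Tokens object. Options given here are merged over the scanner's options. If source is an Array, the returned tokens are split into parts matching the sizes of its elements.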



# File 'lib/coderay/scanner.rb', line 183

def tokenize source = nil, options = {}
  options = @options.merge(options)
  
  set_tokens_from_options options
  set_string_from_source source
  
  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end
  
  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end

#tokens ⇒ Object

Returns the cached result of #tokenize, calling #tokenize only if nothing has been cached yet.



# File 'lib/coderay/scanner.rb', line 205

def tokens
  @cached_tokens ||= tokenize
end
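#tokens memoizes with ||=, and #tokenize also writes @cached_tokens, so an explicit #tokenize call refreshes the cache that #tokens and #each later read. The sketch below shows that interplay plus the error wrapping in #tokenize, using a hypothetical CachingScanner whose scan_tokens just splits on whitespace. Its error handling is a simplified stand-in: it raises its own ScanError with the scanner class and initial state in the message instead of calling CodeRay's raise_inspect.

```ruby
# Hypothetical class, not CodeRay: tokenize/tokens caching and error
# wrapping. Tokens form a flat array: text, kind, text, kind, ...
class CachingScanner
  ScanError = Class.new(StandardError)

  attr_reader :state, :tokenize_calls

  def initialize(code)
    @code = code
    @state = :initial
    @tokenize_calls = 0
  end

  def tokenize(source = nil)
    @code = source if source
    @tokenize_calls += 1
    tokens = []
    begin
      scan_tokens(tokens)
    rescue => e
      # Re-raise with context, keeping the original backtrace.
      message = format("Error in %s#scan_tokens, initial state was: %p", self.class, state)
      raise ScanError, "#{message}: #{e.message}", e.backtrace
    end
    @cached_tokens = tokens  # tokenize always refreshes the cache
  end

  # First call tokenizes; later calls reuse the cached result.
  def tokens
    @cached_tokens ||= tokenize
  end

  private

  def scan_tokens(tokens)
    raise ArgumentError, "tab characters not supported" if @code.include?("\t")
    @code.split.each { |word| tokens.push(word, :word) }
  end
end

s = CachingScanner.new("a b")
s.tokens
s.tokens
puts s.tokenize_calls  # 1
```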