Class: CodeRay::Scanners::Scanner
- Inherits:
- StringScanner
- Inheritance chain: CodeRay::Scanners::Scanner < StringScanner < Object
- Extended by:
- Plugin
- Includes:
- Enumerable
- Defined in:
- lib/coderay/scanner.rb
Overview
Scanner
The base class for all Scanners.
It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
require 'coderay'
c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
for text, kind in c_scanner
  puts text if kind == :operator
end
# prints (one token per line): ( * == ) ++ ;
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
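The Enumerable behaviour described above can be pictured with a toy StringScanner subclass. This is only a sketch built on the standard library: TinyScanner and its three token kinds are hypothetical and much cruder than any real CodeRay scanner.

```ruby
require 'strscan'

# Hypothetical toy scanner, NOT part of CodeRay: it only knows
# whitespace, word characters, and "everything else".
class TinyScanner < StringScanner
  include Enumerable

  # Yield [text, kind] pairs; Enumerable builds map, find, etc. on top.
  def each
    return enum_for(:each) unless block_given?

    reset
    until eos?
      if (text = scan(/\s+/))
        yield text, :space
      elsif (text = scan(/\w+/))
        yield text, :ident
      else
        yield getch, :operator
      end
    end
  end
end

scanner = TinyScanner.new("if (*p == '{') nest++;")

operators  = scanner.select { |_text, kind| kind == :operator }.map(&:first)
has_ident  = scanner.any? { |_text, kind| kind == :ident }
first_word = scanner.find { |_text, kind| kind == :ident }
by_length  = scanner.sort_by { |text, _kind| text.size }
```

Because each restarts from the beginning on every call, the Enumerable helpers can be used repeatedly on the same scanner object.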
Direct Known Subclasses
C, CPlusPlus, CSS, Clojure, Debug, Delphi, Diff, ERB, HAML, HTML, JSON, Java, JavaScript, Python, Raydebug, Ruby, SQL, Text, YAML
Constant Summary
- ScanError = Class.new StandardError
  Raised if a Scanner fails while scanning.
- DEFAULT_OPTIONS = { }
  The default options for all scanner classes.
  Define @default_options for subclasses.
- KINDS_NOT_LOC = [:comment, :doctype, :docstring]
Instance Attribute Summary
- #state ⇒ Object
  Returns the value of attribute state.
Attributes included from Plugin
Class Method Summary
- .encoding(name = 'UTF-8') ⇒ Object
  The encoding used internally by this scanner.
- .file_extension(extension = lang) ⇒ Object
  The typical filename suffix for this scanner’s language.
- .lang ⇒ Object
  The lang of this Scanner class, which is equal to its Plugin ID.
- .normalize(code) ⇒ Object
  Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders.
Instance Method Summary
- #binary_string ⇒ Object
  The string in binary encoding.
- #column(pos = self.pos) ⇒ Object
  The current column position of the scanner, starting with 1.
- #each(&block) ⇒ Object
  Traverse the tokens.
- #file_extension ⇒ Object
  The default file extension for this scanner.
- #initialize(code = '', options = {}) ⇒ Scanner (constructor)
  Create a new Scanner.
- #lang ⇒ Object
  The Plugin ID for this scanner.
- #line(pos = self.pos) ⇒ Object
  The current line position of the scanner, starting with 1.
- #reset ⇒ Object
  Resets the scanner.
- #string=(code) ⇒ Object
  Set a new string to be scanned.
- #tokenize(source = nil, options = {}) ⇒ Object
  Scans the code and returns all tokens in a Tokens object.
- #tokens ⇒ Object
  Returns the cached result of tokenize, running it on first use.
Methods included from Plugin
aliases, plugin_host, register_for, title
Constructor Details
#initialize(code = '', options = {}) ⇒ Scanner
Create a new Scanner.
- code is the input String and is handled by the superclass StringScanner.
- options is a Hash with Symbols as keys. It is merged with the default options of the class (you can overwrite default options here).
If options[:tokens] is given, tokens are collected into that object. Else, a Tokens object is used.
# File 'lib/coderay/scanner.rb', line 143
def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end

  @options = self.class::DEFAULT_OPTIONS.merge options

  super self.class.normalize(code)

  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=

  setup
end
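The constructor's two patterns, an abstract base class and per-class default options merged with the caller's options, can be sketched in isolation. The class names and option keys below (BaseScannerSketch, DemoScannerSketch, :tab_width, :keep_spaces) are hypothetical and only illustrate the idea; a plain Array stands in for the Tokens object.

```ruby
# Hypothetical stand-in for the abstract base class.
class BaseScannerSketch
  DEFAULT_OPTIONS = {}.freeze

  attr_reader :options, :tokens

  def initialize(code = '', options = {})
    # The base class itself refuses to be instantiated.
    if self.class == BaseScannerSketch
      raise NotImplementedError, 'use a subclass'
    end

    @code = code
    # self.class:: looks up the constant on the concrete subclass,
    # so each subclass can supply its own defaults.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
    # A caller-supplied collector wins; otherwise start a fresh one.
    @tokens = options[:tokens] || []
  end
end

# Hypothetical subclass with its own defaults.
class DemoScannerSketch < BaseScannerSketch
  DEFAULT_OPTIONS = { keep_spaces: true, tab_width: 8 }.freeze
end

demo = DemoScannerSketch.new('x = 1', { tab_width: 2 })
merged = demo.options
```

Values passed by the caller override the class defaults, and defaults that are not mentioned survive the merge.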
Instance Attribute Details
#state ⇒ Object
Returns the value of attribute state.
# File 'lib/coderay/scanner.rb', line 62
def state
  @state
end
Class Method Details
.encoding(name = 'UTF-8') ⇒ Object
The encoding used internally by this scanner.
# File 'lib/coderay/scanner.rb', line 89
def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
.file_extension(extension = lang) ⇒ Object
The typical filename suffix for this scanner’s language.
# File 'lib/coderay/scanner.rb', line 84
def file_extension extension = lang
  @file_extension ||= extension.to_s
end
.lang ⇒ Object
The lang of this Scanner class, which is equal to its Plugin ID.
# File 'lib/coderay/scanner.rb', line 94
def lang
  @plugin_id
end
.normalize(code) ⇒ Object
Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.
# File 'lib/coderay/scanner.rb', line 69
def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?

  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
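The effect of normalization can be approximated with core String methods. The sketch below is not CodeRay's encode_with_encoding or to_unix; normalize_sketch is a hypothetical helper that assumes the input is meant to be UTF-8 and uses '?' as the placeholder.

```ruby
# Hypothetical helper: approximate the documented behaviour of .normalize.
def normalize_sketch(code)
  text = code.to_s.dup                 # never touch the caller's object
  text.force_encoding(Encoding::UTF_8) # assumption: input is meant to be UTF-8
  text = text.scrub('?')               # replace invalid byte sequences
  text.gsub(/\r\n?/, "\n")             # CRLF and lone CR become LF
end

unix   = normalize_sketch("a\r\nb\rc\n")
fixed  = normalize_sketch("ok\xFF".b)
symbol = normalize_sketch(:name)
```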
Instance Method Details
#binary_string ⇒ Object
The string in binary encoding.
To be used with #pos, which is the index of the byte the scanner will scan next.
# File 'lib/coderay/scanner.rb', line 243
def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
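Why a binary copy is needed can be seen with a plain StringScanner: its pos counts bytes, so it only lines up with string indices when the string is treated as bytes. A small sketch using only the standard library:

```ruby
require 'strscan'

# "\u00E9!" is two characters but three bytes in UTF-8.
scanner = StringScanner.new("\u00E9!")
scanner.scan(/\u00E9/)

byte_pos  = scanner.pos                 # byte offset after the first character
next_byte = scanner.string.b[byte_pos]  # index the binary copy with the byte offset
next_char = scanner.string[byte_pos]    # character indexing runs past the end
```

Indexing the binary copy with pos finds the next byte to be scanned, while indexing the original string with the same number does not.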
#column(pos = self.pos) ⇒ Object
The current column position of the scanner, starting with 1. See also: #line.
# File 'lib/coderay/scanner.rb', line 234
def column pos = self.pos
  return 1 if pos <= 0
  pos - (binary_string.rindex(?\n, pos - 1) || -1)
end
#each(&block) ⇒ Object
Traverse the tokens.
# File 'lib/coderay/scanner.rb', line 217
def each &block
  tokens.each(&block)
end
#file_extension ⇒ Object
The default file extension for this scanner.
# File 'lib/coderay/scanner.rb', line 178
def file_extension
  self.class.file_extension
end
#lang ⇒ Object
The Plugin ID for this scanner.
# File 'lib/coderay/scanner.rb', line 173
def lang
  self.class.lang
end
#line(pos = self.pos) ⇒ Object
The current line position of the scanner, starting with 1. See also: #column.
Beware, this is implemented inefficiently. It should be used for debugging only.
# File 'lib/coderay/scanner.rb', line 227
def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
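The line and column arithmetic can be tried outside a scanner. The helpers below are hypothetical free functions (line_at, column_at) that take the source and a byte position explicitly; they mirror the formulas in #line and #column.

```ruby
# Hypothetical helper: 1-based line of the byte at +pos+.
def line_at(source, pos)
  return 1 if pos <= 0
  # Every newline before pos starts a new line.
  source.b[0...pos].count("\n") + 1
end

# Hypothetical helper: 1-based column (in bytes) of the byte at +pos+.
def column_at(source, pos)
  return 1 if pos <= 0
  # Distance from the last newline before pos; -1 means "no newline yet".
  pos - (source.b.rindex("\n", pos - 1) || -1)
end

source = "ab\ncde\n"
d_line   = line_at(source, 4)    # byte 4 is "d"
d_column = column_at(source, 4)
b_column = column_at(source, 1)  # byte 1 is "b", still on the first line
```

line_at copies and counts the whole prefix on every call, which is the inefficiency the warning above refers to.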
#reset ⇒ Object
Resets the scanner. Subclasses should redefine the reset_instance method instead of this one.
# File 'lib/coderay/scanner.rb', line 160
def reset
  super
  reset_instance
end
#string=(code) ⇒ Object
Set a new string to be scanned.
# File 'lib/coderay/scanner.rb', line 166
def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end
#tokenize(source = nil, options = {}) ⇒ Object
Scans the code and returns all tokens in a Tokens object.
# File 'lib/coderay/scanner.rb', line 183
def tokenize source = nil, options = {}
  options = @options.merge(options)
  @tokens = options[:tokens] || @tokens || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=
  case source
  when Array
    self.string = self.class.normalize(source.join)
  when nil
    reset
  else
    self.string = self.class.normalize(source)
  end

  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end

  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end
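When source is an Array, the parts are joined, scanned as one string, and the result is split back by part size. The sketch below shows one way such a split could work on plain [text, kind] pairs. split_pairs_into_parts is a hypothetical helper, not the Tokens#split_into_parts implementation, and it ignores any grouping tokens a real Tokens object may contain.

```ruby
# Hypothetical helper: cut a flat list of [text, kind] pairs into
# consecutive parts whose text lengths match +sizes+.
def split_pairs_into_parts(tokens, *sizes)
  queue = tokens.dup
  sizes.map do |size|
    part = []
    while size > 0 && (token = queue.shift)
      text, kind = token
      if text.size <= size
        part << [text, kind]
        size -= text.size
      else
        # The token straddles a boundary: keep the head, requeue the tail.
        part << [text[0, size], kind]
        queue.unshift([text[size..-1], kind])
        size = 0
      end
    end
    part
  end
end

sources = ['fo', 'o+bar']            # scanned together as "foo+bar"
tokens  = [['foo', :ident], ['+', :operator], ['bar', :ident]]
parts   = split_pairs_into_parts(tokens, *sources.map(&:size))
```

A token that crosses a part boundary is cut in two and keeps its kind on both sides.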
#tokens ⇒ Object
Returns the cached result of tokenize, running it on first use.
# File 'lib/coderay/scanner.rb', line 212
def tokens
  @cached_tokens ||= tokenize
end
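The ||= memoization used by #tokens can be observed with a counter. CachingSketch below is a hypothetical class whose tokenize returns a fixed list; it only demonstrates that the expensive call runs once.

```ruby
# Hypothetical class: counts how often the "expensive" scan runs.
class CachingSketch
  attr_reader :runs

  def initialize
    @runs = 0
  end

  def tokenize
    @runs += 1
    @cached_tokens = [['x', :ident]]
  end

  # Scan on first use, then hand back the stored result.
  def tokens
    @cached_tokens ||= tokenize
  end
end

sketch = CachingSketch.new
first  = sketch.tokens
second = sketch.tokens
```

Calling tokenize explicitly replaces the cache, so a later tokens call returns the newest result.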