Class: CodeRay::Scanners::Scanner
- Inherits:
- StringScanner
- Object
- StringScanner
- CodeRay::Scanners::Scanner
- Extended by:
- Plugin
- Includes:
- Enumerable
- Defined in:
- lib/coderay/scanner.rb
Overview
Scanner
The base class for all Scanners.
It is a subclass of Ruby’s great StringScanner, which makes it easy to access the scanning methods inside.
It is also Enumerable, so you can use it like an Array of Tokens:
require 'coderay'

c_scanner = CodeRay::Scanners[:c].new "if (*p == '{') nest++;"
for text, kind in c_scanner
  puts text if kind == :operator
end
# prints: (*==)++;
OK, this is a very simple example :) You can also use map, any?, find and even sort_by, if you want.
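The pattern described above, a StringScanner subclass that mixes in Enumerable, can be sketched without CodeRay itself. ToyScanner and its three token kinds are hypothetical and exist only for illustration; real scanners produce much richer tokens.

```ruby
require 'strscan'

# Hypothetical toy scanner (not a CodeRay class). It shows how defining
# #each on a StringScanner subclass unlocks the Enumerable methods.
class ToyScanner < StringScanner
  include Enumerable

  # Yields [text, kind] pairs for words, whitespace and everything else.
  def each
    return enum_for(:each) unless block_given?
    reset # start from the beginning on every traversal
    until eos?
      if (text = scan(/\w+/))
        yield [text, :ident]
      elsif (text = scan(/\s+/))
        yield [text, :space]
      else
        yield [getch, :operator]
      end
    end
  end
end

toy = ToyScanner.new("nest++ if p")
operators   = toy.select { |_text, kind| kind == :operator }.map(&:first).join
has_space   = toy.any? { |_text, kind| kind == :space }
first_ident = toy.find { |_text, kind| kind == :ident }
```

Only #each has to be written by hand; select, any? and find come from Enumerable.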
Direct Known Subclasses
C, CPlusPlus, CSS, Clojure, Debug, Delphi, Diff, ERB, Go, HAML, HTML, JSON, Java, JavaScript, Lua, PHP, Python, Raydebug, Ruby, SQL, Taskpaper, Text, YAML
Constant Summary
- ScanError =
Raised if a Scanner fails while scanning
Class.new StandardError
- DEFAULT_OPTIONS =
The default options for all scanner classes.
Define @default_options for subclasses.
{ }
- KINDS_NOT_LOC =
[:comment, :doctype, :docstring]
Instance Attribute Summary
- #state ⇒ Object
Returns the value of attribute state.
Attributes included from Plugin
Class Method Summary
- .encoding(name = 'UTF-8') ⇒ Object
The encoding used internally by this scanner.
- .file_extension(extension = lang) ⇒ Object
The typical filename suffix for this scanner’s language.
- .lang ⇒ Object
The lang of this Scanner class, which is equal to its Plugin ID.
- .normalize(code) ⇒ Object
Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders.
Instance Method Summary
- #binary_string ⇒ Object
The string in binary encoding.
- #column(pos = self.pos) ⇒ Object
The current column position of the scanner, starting with 1.
- #each(&block) ⇒ Object
Traverses the tokens.
- #file_extension ⇒ Object
The default file extension for this scanner.
- #initialize(code = '', options = {}) ⇒ Scanner
constructor
Creates a new Scanner.
- #lang ⇒ Object
The Plugin ID for this scanner.
- #line(pos = self.pos) ⇒ Object
The current line position of the scanner, starting with 1.
- #reset ⇒ Object
Resets the scanner.
- #string=(code) ⇒ Object
Sets a new string to be scanned.
- #tokenize(source = nil, options = {}) ⇒ Object
Scans the code and returns all tokens in a Tokens object.
- #tokens ⇒ Object
Caches the result of tokenize.
Methods included from Plugin
aliases, plugin_host, register_for, title
Constructor Details
#initialize(code = '', options = {}) ⇒ Scanner
Create a new Scanner.
- code is the input String and is handled by the superclass StringScanner.
- options is a Hash with Symbols as keys. It is merged with the default options of the class (you can overwrite default options here).
If options[:tokens] is given, it is used to collect the tokens. Else, a Tokens object is used.
# File 'lib/coderay/scanner.rb', line 143
def initialize code = '', options = {}
  if self.class == Scanner
    raise NotImplementedError, "I am only the basic Scanner class. I can't scan anything. :( Use my subclasses."
  end

  @options = self.class::DEFAULT_OPTIONS.merge options

  super self.class.normalize(code)

  @tokens = options[:tokens] || Tokens.new
  @tokens.scanner = self if @tokens.respond_to? :scanner=

  setup
end
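How the per-class defaults combine with the options passed to the constructor can be seen in a small stand-alone sketch. ToyBase, ToyChild and the option keys :tab_width and :strict are hypothetical names chosen for illustration; they are not CodeRay options.

```ruby
# Hypothetical stand-in classes; only the merge pattern mirrors the constructor.
class ToyBase
  DEFAULT_OPTIONS = { tab_width: 8, strict: false }.freeze

  attr_reader :options

  def initialize(options = {})
    # self.class:: looks the constant up on the concrete class, so a
    # subclass that defines its own DEFAULT_OPTIONS gets its own defaults.
    @options = self.class::DEFAULT_OPTIONS.merge(options)
  end
end

class ToyChild < ToyBase
  DEFAULT_OPTIONS = { tab_width: 2, strict: false }.freeze
end

base_options  = ToyBase.new.options
child_options = ToyChild.new(strict: true).options
```

Keys given by the caller win over the defaults, and keys the caller leaves out fall back to the class defaults.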
Instance Attribute Details
#state ⇒ Object
Returns the value of attribute state.
# File 'lib/coderay/scanner.rb', line 62
def state
  @state
end
Class Method Details
.encoding(name = 'UTF-8') ⇒ Object
The encoding used internally by this scanner.
# File 'lib/coderay/scanner.rb', line 89
def encoding name = 'UTF-8'
  @encoding ||= defined?(Encoding.find) && Encoding.find(name)
end
.file_extension(extension = lang) ⇒ Object
The typical filename suffix for this scanner’s language.
# File 'lib/coderay/scanner.rb', line 84
def file_extension extension = lang
  @file_extension ||= extension.to_s
end
.lang ⇒ Object
The lang of this Scanner class, which is equal to its Plugin ID.
# File 'lib/coderay/scanner.rb', line 94
def lang
  @plugin_id
end
.normalize(code) ⇒ Object
Normalizes the given code into a string with UNIX newlines, in the scanner’s internal encoding, with invalid and undefined characters replaced by placeholders. Always returns a new object.
# File 'lib/coderay/scanner.rb', line 69
def normalize code
  # original = code
  code = code.to_s unless code.is_a? ::String
  return code if code.empty?

  if code.respond_to? :encoding
    code = encode_with_encoding code, self.encoding
  else
    code = to_unix code
  end
  # code = code.dup if code.eql? original
  code
end
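The effect of normalization can be approximated with core String methods. This is a rough sketch, not CodeRay's implementation: toy_normalize is a hypothetical helper, it assumes UTF-8 input, and it uses String#scrub for the placeholder replacement instead of the library's encode_with_encoding helper, which is not shown on this page.

```ruby
# Hypothetical helper approximating what .normalize is documented to do.
def toy_normalize(code)
  code = code.to_s unless code.is_a?(String)
  return code if code.empty?

  # scrub replaces invalid byte sequences (U+FFFD for UTF-8 strings);
  # gsub turns Windows (\r\n) and old Mac (\r) line endings into \n.
  code.scrub.gsub(/\r\n?/, "\n")
end

unix_text = toy_normalize("a\r\nb\rc")
scrubbed  = toy_normalize("ok\xFF")
from_int  = toy_normalize(42)
```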
Instance Method Details
#binary_string ⇒ Object
The string in binary encoding.
To be used with #pos, which is the index of the byte the scanner will scan next.
# File 'lib/coderay/scanner.rb', line 236
def binary_string
  @binary_string ||=
    if string.respond_to?(:bytesize) && string.bytesize != string.size
      #:nocov:
      string.dup.force_encoding('binary')
      #:nocov:
    else
      string
    end
end
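Why a binary copy is needed becomes visible as soon as the input contains a multi-byte character: StringScanner#pos counts bytes, while String indexing counts characters. The following sketch uses only the standard library; the sample text is arbitrary.

```ruby
require 'strscan'

scanner = StringScanner.new("\u00e9 = 1") # "é = 1"; é takes two bytes in UTF-8
scanner.getch                             # consume the first character

byte_pos = scanner.pos      # position in bytes
char_pos = scanner.charpos  # position in characters

text   = scanner.string
binary = text.dup.force_encoding('binary') # same bytes, one "character" per byte

rest_by_bytes = binary[byte_pos..-1] # correct remainder
rest_by_chars = text[byte_pos..-1]   # skips one character too many
```

Indexing the binary copy with pos gives the text the scanner will read next; indexing the original string with the same number does not.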
#column(pos = self.pos) ⇒ Object
The current column position of the scanner, starting with 1. See also: #line.
# File 'lib/coderay/scanner.rb', line 227
def column pos = self.pos
  return 1 if pos <= 0
  pos - (binary_string.rindex(?\n, pos - 1) || -1)
end
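The column arithmetic can be tried on a plain String. toy_column is a hypothetical helper that repeats the rindex expression above; the sample text is ASCII-only, so byte and character positions coincide.

```ruby
# Hypothetical helper: distance from pos back to the previous newline.
# With no newline before pos, rindex returns nil and -1 stands in for
# "one before the start", which makes the first column 1.
def toy_column(text, pos)
  return 1 if pos <= 0
  pos - (text.rindex("\n", pos - 1) || -1)
end

text = "ab\ncde"
columns = [0, 1, 2, 3, 5].map { |pos| toy_column(text, pos) }
```

Position 3 is the first character after the newline, so its column is 1 again.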
#each(&block) ⇒ Object
Traverse the tokens.
# File 'lib/coderay/scanner.rb', line 210
def each &block
  tokens.each(&block)
end
#file_extension ⇒ Object
The default file extension for this scanner.
# File 'lib/coderay/scanner.rb', line 178
def file_extension
  self.class.file_extension
end
#lang ⇒ Object
The Plugin ID for this scanner.
# File 'lib/coderay/scanner.rb', line 173
def lang
  self.class.lang
end
#line(pos = self.pos) ⇒ Object
The current line position of the scanner, starting with 1. See also: #column.
Beware, this is implemented inefficiently. It should be used for debugging only.
# File 'lib/coderay/scanner.rb', line 220
def line pos = self.pos
  return 1 if pos <= 0
  binary_string[0...pos].count("\n") + 1
end
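The line number is simply the count of newlines before the position, plus one. The sketch below repeats that expression in a hypothetical helper, toy_line, on an ASCII-only sample string. Every call slices and scans the text from the start, which is the inefficiency the note above warns about.

```ruby
# Hypothetical helper: count the newlines in text[0...pos], then add 1.
def toy_line(text, pos)
  return 1 if pos <= 0
  text[0...pos].count("\n") + 1
end

text = "ab\ncde\nf"
lines = [0, 2, 3, 6, 7].map { |pos| toy_line(text, pos) }
```

A position sitting on a newline character still belongs to the line that the newline ends.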
#reset ⇒ Object
Resets the scanner. Subclasses should redefine the reset_instance method instead of this one.
# File 'lib/coderay/scanner.rb', line 160
def reset
  super
  reset_instance
end
#string=(code) ⇒ Object
Sets a new string to be scanned.
# File 'lib/coderay/scanner.rb', line 166
def string= code
  code = self.class.normalize(code)
  super code
  reset_instance
end
#tokenize(source = nil, options = {}) ⇒ Object
Scans the code and returns all tokens in a Tokens object.
# File 'lib/coderay/scanner.rb', line 183
def tokenize source = nil, options = {}
  options = @options.merge(options)

  set_string_from_source source

  begin
    scan_tokens @tokens, options
  rescue => e
    message = "Error in %s#scan_tokens, initial state was: %p" % [self.class, defined?(state) && state]
    raise_inspect e.message, @tokens, message, 30, e.backtrace
  end

  @cached_tokens = @tokens
  if source.is_a? Array
    @tokens.split_into_parts(*source.map { |part| part.size })
  else
    @tokens
  end
end
#tokens ⇒ Object
Caches the result of tokenize.
# File 'lib/coderay/scanner.rb', line 205
def tokens
  @cached_tokens ||= tokenize
end
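The caching relies on Ruby's ||= idiom: the right-hand side runs only while the instance variable is still nil or false. A minimal sketch with a hypothetical ToyCache class, whose tokenize merely counts how often it is called:

```ruby
# Hypothetical class; only the ||= memoization mirrors #tokens.
class ToyCache
  attr_reader :runs

  def initialize
    @runs = 0
  end

  # Stand-in for an expensive scan; records each invocation.
  def tokenize
    @runs += 1
    [["x", :ident]]
  end

  # The first call runs tokenize; later calls return the stored result.
  def tokens
    @cached_tokens ||= tokenize
  end
end

cache  = ToyCache.new
first  = cache.tokens
second = cache.tokens
```

Both calls return the very same object, and tokenize has run once.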