Class: EBNF::LL1::Lexer
- Inherits: Object
- Includes: Unescape, Enumerable
- Defined in: lib/ebnf/ll1/lexer.rb
Overview
A lexical analyzer.
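A minimal usage sketch (the terminal names and patterns here are illustrative, not part of the class; tokens are assumed to expose #type and #value via the Token class):

  require 'ebnf/ll1/lexer'

  # Illustrative terminal definitions: each entry is [type, regexp];
  # an optional third element supplies per-terminal options.
  terminals = [
    [:NUMBER, /\d+/],
    [:WORD,   /[a-z]+/]
  ]

  EBNF::LL1::Lexer.tokenize("hello 42", terminals, whitespace: /\s+/) do |lexer|
    lexer.each_token do |token|
      puts "#{token.type}: #{token.value}"
    end
  end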
Defined Under Namespace
Classes: Error, Terminal, Token
Constant Summary
Constants included from Unescape
Unescape::ECHAR, Unescape::ESCAPE_CHAR4, Unescape::ESCAPE_CHAR8, Unescape::ESCAPE_CHARS, Unescape::UCHAR
Instance Attribute Summary collapse
- #input ⇒ String
  The current input string being processed.
- #options ⇒ Hash (readonly)
  Any additional options for the lexer.
- #whitespace ⇒ Regexp (readonly)
  Defines whitespace, including comments; otherwise whitespace must be explicit in terminals.
Class Method Summary collapse
- .tokenize(input, terminals, **options) {|lexer| ... } ⇒ Lexer
  Tokenizes the given `input` string or stream.
Instance Method Summary collapse
- #each_token {|token| ... } ⇒ Enumerator (also: #each)
  Enumerates each token in the input string.
- #first(*types) ⇒ Token
  Returns the first token in the input stream.
- #initialize(input = nil, terminals = nil, **options) ⇒ Lexer (constructor)
  Initializes a new lexer instance.
- #lineno ⇒ Integer
  The current line number (one-based).
- #recover(*types) ⇒ Token
  Skips input until a token is matched.
- #shift ⇒ Token
  Returns the first token and shifts to the next.
- #valid? ⇒ Boolean
  Returns `true` if the input string is lexically valid.
Methods included from Unescape
unescape, unescape_codepoints, unescape_string
Constructor Details
#initialize(input = nil, terminals = nil, **options) ⇒ Lexer
Initializes a new lexer instance.
# File 'lib/ebnf/ll1/lexer.rb', line 70

def initialize(input = nil, terminals = nil, **options)
  @options = options.dup
  @whitespace = @options[:whitespace]
  @terminals = terminals.map do |term|
    if term.is_a?(Array) && term.length == 3
      # Last element is options
      Terminal.new(term[0], term[1], **term[2])
    elsif term.is_a?(Array)
      Terminal.new(*term)
    else
      term
    end
  end

  raise Error, "Terminal patterns not defined" unless @terminals && @terminals.length > 0

  @scanner = Scanner.new(input, **options)
end
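As a sketch of the accepted terminal forms (names and patterns illustrative), terminals may be given either as Terminal instances or as arrays that the constructor converts:

  terminals = [
    EBNF::LL1::Lexer::Terminal.new(:NUMBER, /\d+/),   # explicit Terminal instance
    [:WORD, /[a-z]+/]                                 # [type, regexp] array form
  ]
  lexer = EBNF::LL1::Lexer.new("abc 123", terminals, whitespace: /\s+/)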
Instance Attribute Details
#input ⇒ String
The current input string being processed.
# File 'lib/ebnf/ll1/lexer.rb', line 99

def input
  @input
end
#options ⇒ Hash (readonly)
Any additional options for the lexer.
# File 'lib/ebnf/ll1/lexer.rb', line 93

def options
  @options
end
#whitespace ⇒ Regexp (readonly)
Defines whitespace, including comments; otherwise whitespace must be explicit in terminals.
# File 'lib/ebnf/ll1/lexer.rb', line 39

def whitespace
  @whitespace
end
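For example, a whitespace pattern that also swallows '#'-style comments can be supplied at construction time (the pattern and terminals below are illustrative):

  ws = %r{(?:\s|#[^\n]*)+}m
  lexer = EBNF::LL1::Lexer.new("a # comment\nb", [[:WORD, /[a-z]+/]], whitespace: ws)
  lexer.whitespace  #=> the regexp passed above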
Class Method Details
.tokenize(input, terminals, **options) {|lexer| ... } ⇒ Lexer
Tokenizes the given `input` string or stream.
# File 'lib/ebnf/ll1/lexer.rb', line 53

def self.tokenize(input, terminals, **options, &block)
  lexer = self.new(input, terminals, **options)
  block_given? ? block.call(lexer) : lexer
end
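Both calling styles, sketched with an illustrative terminal set:

  # Without a block, the new lexer itself is returned:
  lexer = EBNF::LL1::Lexer.tokenize("a b", [[:WORD, /[a-z]+/]], whitespace: /\s+/)

  # With a block, the new lexer is yielded instead:
  EBNF::LL1::Lexer.tokenize("a b", [[:WORD, /[a-z]+/]], whitespace: /\s+/) do |lex|
    lex.each_token { |token| p token }
  end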
Instance Method Details
#each_token {|token| ... } ⇒ Enumerator Also known as: each
Enumerates each token in the input string.
# File 'lib/ebnf/ll1/lexer.rb', line 122

def each_token(&block)
  if block_given?
    while token = shift
      yield token
    end
  end
  enum_for(:each_token)
end
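A usage sketch (terminals illustrative; tokens are assumed to expose #type and #value):

  lexer = EBNF::LL1::Lexer.new("a b c", [[:WORD, /[a-z]+/]], whitespace: /\s+/)
  lexer.each_token { |token| puts "#{token.type}: #{token.value}" }

  # Without a block, an Enumerator over the tokens is returned:
  # lexer.each_token.to_a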
#first(*types) ⇒ Token
Returns the first token in the input stream.
# File 'lib/ebnf/ll1/lexer.rb', line 137

def first(*types)
  return nil unless scanner

  @first ||= begin
    {} while !scanner.eos? && skip_whitespace
    return nil if scanner.eos?

    token = match_token(*types)

    if token.nil?
      lexme = (scanner.rest.split(@whitespace || /\s/).first rescue nil) || scanner.rest
      raise Error.new("Invalid token #{lexme[0..100].inspect}",
                      input: scanner.rest[0..100], token: lexme, lineno: lineno)
    end

    token
  end
rescue ArgumentError, Encoding::CompatibilityError => e
  raise Error.new(e.message,
                  input: (scanner.rest[0..100] rescue '??'), token: lexme, lineno: lineno)
rescue Error
  raise
rescue
  STDERR.puts "Expected ArgumentError, got #{$!.class}"
  raise
end
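Unlike #shift, #first does not consume the token, so repeated calls return the same token (a sketch with illustrative terminals, assuming Token#value):

  lexer = EBNF::LL1::Lexer.new("a b", [[:WORD, /[a-z]+/]], whitespace: /\s+/)
  lexer.first.value  #=> "a"
  lexer.first.value  #=> "a"  (still not consumed)
  lexer.shift.value  #=> "a"  (consumed)
  lexer.first.value  #=> "b"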
#lineno ⇒ Integer
The current line number (one-based).
# File 'lib/ebnf/ll1/lexer.rb', line 196

def lineno
  scanner.lineno
end
#recover(*types) ⇒ Token
Skips input until a token is matched.
# File 'lib/ebnf/ll1/lexer.rb', line 179

def recover(*types)
  until scanner.eos? || tok = match_token(*types)
    if scanner.skip_until(@whitespace || /\s+/m).nil? # Skip past current "token"
      # No whitespace at the end, must be at end of string
      scanner.terminate
    else
      skip_whitespace
    end
  end
  scanner.unscan if tok
  first
end
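A sketch of one possible recovery loop (terminals illustrative): on a lexical error, #recover skips the offending input and leaves the lexer positioned at the next recognizable token.

  lexer  = EBNF::LL1::Lexer.new("a ? b", [[:WORD, /[a-z]+/]], whitespace: /\s+/)
  tokens = []
  begin
    while token = lexer.shift
      tokens << token.value
    end
  rescue EBNF::LL1::Lexer::Error
    lexer.recover   # skip past the unrecognized "?" and continue
    retry
  end
  tokens  #=> ["a", "b"]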
#shift ⇒ Token
Returns the first token and shifts to the next.
# File 'lib/ebnf/ll1/lexer.rb', line 168

def shift
  cur = first
  @first = nil
  cur
end
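A typical peek-then-consume loop, pairing #first with #shift (a sketch; terminals and token attributes as assumed above):

  terminals = [[:NUMBER, /\d+/], [:WORD, /[a-z]+/]]
  lexer = EBNF::LL1::Lexer.new("1 x", terminals, whitespace: /\s+/)
  while (token = lexer.first)
    case token.type
    when :NUMBER then puts "number #{lexer.shift.value}"
    when :WORD   then puts "word #{lexer.shift.value}"
    end
  end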
#valid? ⇒ Boolean
Returns `true` if the input string is lexically valid.
To be considered valid, the input string must contain more than zero terminals, and must not contain any invalid terminals.
# File 'lib/ebnf/ll1/lexer.rb', line 108

def valid?
  begin
    !count.zero?
  rescue Error
    false
  end
end
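A sketch (terminals illustrative); note that counting tokens reads through the input stream:

  terminals = [[:WORD, /[a-z]+/]]
  EBNF::LL1::Lexer.new("abc def", terminals, whitespace: /\s+/).valid?  #=> true
  EBNF::LL1::Lexer.new("abc 123", terminals, whitespace: /\s+/).valid?  #=> false (invalid terminal)
  EBNF::LL1::Lexer.new("",        terminals, whitespace: /\s+/).valid?  #=> false (zero terminals)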