Class: ORB::Tokenizer2
- Inherits: Object
- Includes: Patterns
- Defined in: lib/orb/tokenizer2.rb
Overview
Tokenizer2 is a streaming, non-recursive tokenizer for ORB templates.
It scans the source sequentially and emits tokens as it passes over the input. During scanning, it keeps track of the current state and the list of tokens emitted so far. Any consumption of the source, whether by buffering or by skipping, moves the cursor. The cursor position is used to track the current line and column in the virtual source document, and generated tokens are annotated with the position at which they were found in that document.
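As a minimal usage sketch, assuming a require path inferred from the file location and a plain-HTML template string (the exact ORB expression syntax is not documented here); only Tokenizer2.new, #tokenize, #tokens, and #errors are taken from this page:

require "orb"

# Illustrative input; any HTML-ish template source works the same way.
source = "<p>Hello, <strong>world</strong>!</p>"

tokenizer = ORB::Tokenizer2.new(source)
tokens = tokenizer.tokenize

# Each token carries the position where it was found in the virtual source
# document, so inspecting one shows its line/column metadata.
tokens.each { |token| puts token.inspect }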
Constant Summary
- IGNORED_BODY_TAGS =
  Tags whose body content should be ignored
  %w[script style].freeze
- VOID_ELEMENTS =
  Tags that are self-closing per the HTML5 spec (see the membership sketch after this constant list)
  %w[area base br col command embed hr img input keygen link meta param source track wbr].freeze
Constants included from Patterns
Patterns::ATTRIBUTE_ASSIGN, Patterns::ATTRIBUTE_NAME, Patterns::BLANK, Patterns::BLOCK_CLOSE, Patterns::BLOCK_NAME_CHARS, Patterns::BLOCK_OPEN, Patterns::BRACE_CLOSE, Patterns::BRACE_OPEN, Patterns::CONTROL_EXPRESSION_END, Patterns::CONTROL_EXPRESSION_START, Patterns::CR, Patterns::CRLF, Patterns::DOUBLE_QUOTE, Patterns::END_TAG_END, Patterns::END_TAG_END_VERBATIM, Patterns::END_TAG_START, Patterns::NEWLINE, Patterns::OTHER, Patterns::PRINTING_EXPRESSION_END, Patterns::PRINTING_EXPRESSION_START, Patterns::PRIVATE_COMMENT_END, Patterns::PRIVATE_COMMENT_START, Patterns::PUBLIC_COMMENT_END, Patterns::PUBLIC_COMMENT_START, Patterns::SINGLE_QUOTE, Patterns::SPACE_CHARS, Patterns::SPLAT_START, Patterns::START_TAG_END, Patterns::START_TAG_END_SELF_CLOSING, Patterns::START_TAG_END_VERBATIM, Patterns::START_TAG_START, Patterns::TAG_NAME, Patterns::UNQUOTED_VALUE, Patterns::UNQUOTED_VALUE_INVALID_CHARS
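Assuming IGNORED_BODY_TAGS and VOID_ELEMENTS are not marked private_constant, they can be consulted directly:

ORB::Tokenizer2::VOID_ELEMENTS.include?("br")          # => true
ORB::Tokenizer2::IGNORED_BODY_TAGS.include?("script")  # => true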
Instance Attribute Summary
- #errors ⇒ Object (readonly)
  Returns the value of attribute errors.
- #tokens ⇒ Object (readonly)
  Returns the value of attribute tokens.
Instance Method Summary
- #initialize(source, options = {}) ⇒ Tokenizer2 (constructor)
  A new instance of Tokenizer2.
- #tokenize ⇒ Object (also: #tokenize!)
  Main entry point.
Constructor Details
#initialize(source, options = {}) ⇒ Tokenizer2
Returns a new instance of Tokenizer2.
# File 'lib/orb/tokenizer2.rb', line 26

def initialize(source, options = {})
  @source = StringScanner.new(source)
  @raise_errors = options.fetch(:raise_errors, true)

  # Streaming Tokenizer State
  @cursor = 0
  @column = 1
  @line = 1
  @errors = []
  @tokens = []
  @attributes = []
  @braces = []
  @state = :initial
  @buffer = StringIO.new
end
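A short sketch of the only documented option, :raise_errors. The error-collection behavior described in the comment is an assumption based on the @raise_errors and @errors state above, not a documented guarantee:

# Assumption: with raise_errors: false, problems found while scanning are
# collected on #errors rather than raised.
tokenizer = ORB::Tokenizer2.new("<div class='unterminated", raise_errors: false)
tokenizer.tokenize
tokenizer.errors # => array of any errors encountered (empty if the input was clean)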
Instance Attribute Details
#errors ⇒ Object (readonly)
Returns the value of attribute errors.
# File 'lib/orb/tokenizer2.rb', line 24

def errors
  @errors
end
#tokens ⇒ Object (readonly)
Returns the value of attribute tokens.
# File 'lib/orb/tokenizer2.rb', line 24

def tokens
  @tokens
end
Instance Method Details
#tokenize ⇒ Object (also known as: tokenize!)
Main entry point.
# File 'lib/orb/tokenizer2.rb', line 43

def tokenize
  next_token until @source.eos?

  # Consume remaining buffer
  buffer_to_text_token

  # Return the tokens
  @tokens
end