Class: ORB::Tokenizer2

Inherits:
Object
Includes:
Patterns
Defined in:
lib/orb/tokenizer2.rb

Overview

Tokenizer2 is a streaming, non-recursive tokenizer for ORB templates.

It scans the source sequentially and emits tokens as it passes over the input. During scanning, it tracks the current state and the list of tokens emitted so far. Any consumption of the source, whether by buffering or by skipping, advances the cursor, and the cursor position determines the current line and column in the virtual source document. Generated tokens are annotated with the position at which they were found in the virtual document.
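The cursor/line/column bookkeeping described above can be sketched in isolation. This is an illustrative standalone snippet, not Tokenizer2's actual internals:

```ruby
require "strscan"

# Minimal illustration of tracking line and column while consuming a
# source one character at a time with StringScanner.
scanner = StringScanner.new("ab\ncd")
line = 1
column = 1

until scanner.eos?
  char = scanner.getch   # consume one character, moving the cursor
  if char == "\n"
    line += 1            # newline: advance to the next line
    column = 1           # and reset the column
  else
    column += 1
  end
end

puts "line=#{line} column=#{column}"  # => line=2 column=3
```

A real tokenizer consumes whole runs at once rather than single characters, but the position arithmetic is the same: every consumed span updates the line/column pair that annotates the next emitted token.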

Constant Summary

IGNORED_BODY_TAGS =

Tags whose body content is ignored during tokenization

%w[script style].freeze
VOID_ELEMENTS =

Tags that are self-closing per the HTML5 spec

%w[area base br col command embed hr img input keygen link meta param source track wbr].freeze
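For illustration, a tag-name check against these lists might look like the following. This is a standalone sketch; the predicate names are hypothetical and the real tokenizer's dispatch is more involved:

```ruby
IGNORED_BODY_TAGS = %w[script style].freeze
VOID_ELEMENTS = %w[area base br col command embed hr img input keygen link meta param source track wbr].freeze

# A void element never takes a closing tag, so a tokenizer can emit it
# immediately without expecting a matching end tag.
def void_element?(tag_name)
  VOID_ELEMENTS.include?(tag_name.downcase)
end

# The body of these tags is not scanned for nested markup.
def ignored_body?(tag_name)
  IGNORED_BODY_TAGS.include?(tag_name.downcase)
end

puts void_element?("BR")      # => true
puts ignored_body?("script")  # => true
puts void_element?("div")     # => false
```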

Constants included from Patterns

Patterns::ATTRIBUTE_ASSIGN, Patterns::ATTRIBUTE_NAME, Patterns::BLANK, Patterns::BLOCK_CLOSE, Patterns::BLOCK_NAME_CHARS, Patterns::BLOCK_OPEN, Patterns::BRACE_CLOSE, Patterns::BRACE_OPEN, Patterns::CONTROL_EXPRESSION_END, Patterns::CONTROL_EXPRESSION_START, Patterns::CR, Patterns::CRLF, Patterns::DOUBLE_QUOTE, Patterns::END_TAG_END, Patterns::END_TAG_END_VERBATIM, Patterns::END_TAG_START, Patterns::NEWLINE, Patterns::OTHER, Patterns::PRINTING_EXPRESSION_END, Patterns::PRINTING_EXPRESSION_START, Patterns::PRIVATE_COMMENT_END, Patterns::PRIVATE_COMMENT_START, Patterns::PUBLIC_COMMENT_END, Patterns::PUBLIC_COMMENT_START, Patterns::SINGLE_QUOTE, Patterns::SPACE_CHARS, Patterns::SPLAT_START, Patterns::START_TAG_END, Patterns::START_TAG_END_SELF_CLOSING, Patterns::START_TAG_END_VERBATIM, Patterns::START_TAG_START, Patterns::TAG_NAME, Patterns::UNQUOTED_VALUE, Patterns::UNQUOTED_VALUE_INVALID_CHARS

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(source, options = {}) ⇒ Tokenizer2

Returns a new instance of Tokenizer2.



# File 'lib/orb/tokenizer2.rb', line 26

def initialize(source, options = {})
  @source = StringScanner.new(source)
  @raise_errors = options.fetch(:raise_errors, true)

  # Streaming Tokenizer State
  @cursor = 0
  @column = 1
  @line = 1
  @errors = []
  @tokens = []
  @attributes = []
  @braces = []
  @state = :initial
  @buffer = StringIO.new
end

Instance Attribute Details

#errors ⇒ Object (readonly)

Returns the value of attribute errors.



# File 'lib/orb/tokenizer2.rb', line 24

def errors
  @errors
end

#tokens ⇒ Object (readonly)

Returns the value of attribute tokens.



# File 'lib/orb/tokenizer2.rb', line 24

def tokens
  @tokens
end

Instance Method Details

#tokenize ⇒ Object Also known as: tokenize!

Main entry point. Scans the source to completion and returns the accumulated tokens.



# File 'lib/orb/tokenizer2.rb', line 43

def tokenize
  next_token until @source.eos?

  # Consume remaining buffer
  buffer_to_text_token

  # Return the tokens
  @tokens
end
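The drive loop above (scan until end-of-source, then flush any buffered text) can be mimicked in a toy tokenizer. This is a standalone sketch with an invented token format, not ORB's actual tokens:

```ruby
require "strscan"

# Toy streaming tokenizer: splits "{...}" expressions from plain text,
# mirroring the "next_token until eos, then flush the buffer" shape.
class ToyTokenizer
  attr_reader :tokens

  def initialize(source)
    @source = StringScanner.new(source)
    @tokens = []
    @buffer = +""
  end

  def tokenize
    next_token until @source.eos?
    buffer_to_text_token   # consume any remaining buffered text
    @tokens
  end

  private

  def next_token
    if (expr = @source.scan(/\{[^}]*\}/))
      buffer_to_text_token            # flush text seen before the expression
      @tokens << [:expression, expr]
    else
      @buffer << @source.getch        # plain text: buffer one character
    end
  end

  def buffer_to_text_token
    return if @buffer.empty?
    @tokens << [:text, @buffer]
    @buffer = +""
  end
end

p ToyTokenizer.new("hi {name}!").tokenize
# => [[:text, "hi "], [:expression, "{name}"], [:text, "!"]]
```

Note that the final `buffer_to_text_token` call is what guarantees trailing text after the last recognized construct still becomes a token, which is why Tokenizer2 performs the same flush after its scan loop.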