Class: Mani::Tokenizer

Inherits:
Object
  • Object
Defined in:
lib/mani/tokenizer.rb

Overview

This class tokenizes strings, splitting them into static text and sequences delimited by {{ and }}.

Constant Summary

ESCAPE_CHARACTER =

The escape character

'%'
SEQUENCE_OPEN_DELIMITER =

The delimiter signifying the start of a sequence

'{{'
SEQUENCE_CLOSE_DELIMITER =

The delimiter signifying the end of a sequence

'}}'
LITERAL_OPEN_DELIMITER =

The escaped form of the opening delimiter (%{{), which stands for a literal {{ rather than the start of a sequence

ESCAPE_CHARACTER + SEQUENCE_OPEN_DELIMITER
LITERAL_CLOSE_DELIMITER =

The escaped form of the closing delimiter (%}}), which stands for a literal }} rather than the end of a sequence

ESCAPE_CHARACTER + SEQUENCE_CLOSE_DELIMITER
SEQUENCE_OPEN =

The pattern to match the start of a sequence

/
  # find opening delimiter at beginning of string...
  ^#{SEQUENCE_OPEN_DELIMITER}
  # ...or elsewhere in the string, provided it's not preceded by
  # ESCAPE_CHARACTER
  |[^#{ESCAPE_CHARACTER}]#{SEQUENCE_OPEN_DELIMITER}
/x
SEQUENCE_CLOSE =

The pattern to match the end of a sequence

/
  # find closing delimiter at beginning of string...
  ^#{SEQUENCE_CLOSE_DELIMITER}
  # ...or elsewhere in the string, provided it's not preceded by
  # ESCAPE_CHARACTER
  |[^#{ESCAPE_CHARACTER}]#{SEQUENCE_CLOSE_DELIMITER}
/x
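
As a rough illustration of how SEQUENCE_OPEN behaves, the sketch below rebuilds the same pattern from local variables (escape_character and open_delimiter are stand-ins for the constants above, introduced here so the snippet runs without the gem). SEQUENCE_CLOSE follows the same shape with the closing delimiter.

```ruby
# Local stand-ins for ESCAPE_CHARACTER and SEQUENCE_OPEN_DELIMITER
# (assumed copies, not loaded from the gem).
escape_character = '%'
open_delimiter = '{{'

sequence_open = /
  ^#{open_delimiter}
  |[^#{escape_character}]#{open_delimiter}
/x

at_start = sequence_open.match('{{TAB}}')     # delimiter at the very start
mid_text = sequence_open.match('a {{TAB}}')   # delimiter after ordinary text
escaped  = sequence_open.match('a %{{TAB}}')  # delimiter preceded by '%'

p at_start[0]  # "{{"
p mid_text[0]  # " {{"
p escaped      # nil
```

Note that the second alternative also consumes the character before the delimiter, so the matched text is " {{" rather than "{{".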

Class Method Details

.get_tokens(text) ⇒ Array

Retrieves the tokens comprising the supplied text.

Parameters:

  • text (String)

    The text

Returns:

  • (Array)

    The tokens, each a [type, value] pair whose type is :static or :sequence


# File 'lib/mani/tokenizer.rb', line 43

def self.get_tokens(text)
  tokenize StringScanner.new(text), []
end
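
To show the kind of output get_tokens produces, here is a minimal standalone sketch. It copies the constants and methods listed on this page into a hypothetical TokenizerSketch module (the module name is an assumption made for illustration, and the patterns are written on one line without comments), so it can run without the gem installed.

```ruby
require 'strscan'

# Hypothetical stand-in for Mani::Tokenizer, copied from the listings on
# this page so the example is self-contained.
module TokenizerSketch
  ESCAPE_CHARACTER = '%'
  SEQUENCE_OPEN_DELIMITER = '{{'
  SEQUENCE_CLOSE_DELIMITER = '}}'
  LITERAL_OPEN_DELIMITER = ESCAPE_CHARACTER + SEQUENCE_OPEN_DELIMITER
  LITERAL_CLOSE_DELIMITER = ESCAPE_CHARACTER + SEQUENCE_CLOSE_DELIMITER
  SEQUENCE_OPEN = /^#{SEQUENCE_OPEN_DELIMITER}|[^#{ESCAPE_CHARACTER}]#{SEQUENCE_OPEN_DELIMITER}/
  SEQUENCE_CLOSE = /^#{SEQUENCE_CLOSE_DELIMITER}|[^#{ESCAPE_CHARACTER}]#{SEQUENCE_CLOSE_DELIMITER}/

  def self.get_tokens(text)
    tokenize StringScanner.new(text), []
  end

  def self.strip_comment_delimiters(text)
    text
      .gsub(LITERAL_OPEN_DELIMITER, SEQUENCE_OPEN_DELIMITER)
      .gsub(LITERAL_CLOSE_DELIMITER, SEQUENCE_CLOSE_DELIMITER)
  end

  def self.tokenize(scanner, tokens)
    match = scanner.scan_until SEQUENCE_OPEN
    unless match
      static = strip_comment_delimiters scanner.rest
      tokens.concat [[:static, static]] unless static.empty?
      return tokens
    end

    if scanner.check_until SEQUENCE_CLOSE
      static = strip_comment_delimiters match.chomp(SEQUENCE_OPEN_DELIMITER)
      tokens.concat [[:static, static]] unless static.empty?

      match = scanner.scan_until SEQUENCE_CLOSE
      match.chomp! SEQUENCE_CLOSE_DELIMITER

      sequence = strip_comment_delimiters match
      tokens.concat [[:sequence, sequence]] unless sequence.empty?

      tokenize scanner, tokens
    else
      static = strip_comment_delimiters(match + scanner.rest)
      tokens.concat [[:static, static]]
    end
  end
end

plain    = TokenizerSketch.get_tokens('Hello {{ENTER}} world')
escaped  = TokenizerSketch.get_tokens('Type %{{ literally')
unclosed = TokenizerSketch.get_tokens('abc {{def')

p plain     # [[:static, "Hello "], [:sequence, "ENTER"], [:static, " world"]]
p escaped   # [[:static, "Type {{ literally"]]
p unclosed  # [[:static, "abc {{def"]]
```

An escaped delimiter never starts a sequence, and an opening delimiter with no matching close is kept as static text.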

.strip_comment_delimiters(text) ⇒ String

Replaces the escaped delimiters (%{{ and %}}) in the supplied text with their literal forms ({{ and }}).

Parameters:

  • text (String)

    The text

Returns:

  • (String)


# File 'lib/mani/tokenizer.rb', line 51

def self.strip_comment_delimiters(text)
  text
    .gsub(LITERAL_OPEN_DELIMITER, SEQUENCE_OPEN_DELIMITER)
    .gsub(LITERAL_CLOSE_DELIMITER, SEQUENCE_CLOSE_DELIMITER)
end
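
The effect of the two gsub calls can be seen in isolation with a short sketch. The local variables literal_open and literal_close are stand-ins for LITERAL_OPEN_DELIMITER and LITERAL_CLOSE_DELIMITER, and the sample text is made up for illustration.

```ruby
# Stand-ins for LITERAL_OPEN_DELIMITER and LITERAL_CLOSE_DELIMITER.
literal_open = '%{{'
literal_close = '%}}'

text = 'Press %{{ENTER%}} to continue'

# String patterns are matched literally by gsub, so no regex escaping is needed.
unescaped = text.gsub(literal_open, '{{').gsub(literal_close, '}}')

p unescaped  # "Press {{ENTER}} to continue"
```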

.tokenize(scanner, tokens) ⇒ Array

Recursively scans the string within the supplied scanner to produce a list of tokens.

Parameters:

  • scanner (StringScanner)

    The string scanner

  • tokens (Array)

    The tokens collected so far; new tokens are appended to this array

Returns:

  • (Array)


# File 'lib/mani/tokenizer.rb', line 63

def self.tokenize(scanner, tokens)
  match = scanner.scan_until SEQUENCE_OPEN
  unless match
    static = strip_comment_delimiters scanner.rest
    tokens.concat [[:static, static]] unless static.empty?
    return tokens
  end

  if scanner.check_until SEQUENCE_CLOSE
    static = strip_comment_delimiters match.chomp(SEQUENCE_OPEN_DELIMITER)
    tokens.concat [[:static, static]] unless static.empty?

    match = scanner.scan_until SEQUENCE_CLOSE
    match.chomp! SEQUENCE_CLOSE_DELIMITER

    sequence = strip_comment_delimiters match
    tokens.concat [[:sequence, sequence]] unless sequence.empty?

    tokenize scanner, tokens
  else
    static = strip_comment_delimiters(match + scanner.rest)
    tokens.concat [[:static, static]]
  end
end
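
The method relies on the difference between StringScanner#scan_until, which advances the scan pointer, and StringScanner#check_until, which only looks ahead. The walk-through below is a sketch using compact patterns assumed to be equivalent to SEQUENCE_OPEN and SEQUENCE_CLOSE (braces escaped, comments removed) and a made-up input string.

```ruby
require 'strscan'

# Compact equivalents of SEQUENCE_OPEN and SEQUENCE_CLOSE (assumed copies).
sequence_open  = /^\{\{|[^%]\{\{/
sequence_close = /^\}\}|[^%]\}\}/

scanner = StringScanner.new('say {{hi}} now')

# Step 1: consume everything up to and including the opening delimiter.
prefix = scanner.scan_until(sequence_open)
p prefix               # "say {{"

# Step 2: look ahead for a closing delimiter without moving the pointer.
peeked = scanner.check_until(sequence_close)
pos_after_check = scanner.pos
p peeked               # "hi}}"
p pos_after_check      # 6

# Step 3: a close exists, so consume the sequence body and its delimiter.
body = scanner.scan_until(sequence_close)
p body                 # "hi}}"
p body.chomp('}}')     # "hi"

# Step 4: whatever remains is handed to the next recursive call.
p scanner.rest         # " now"
```

Checking for the close before consuming it lets the method fall back to treating an unterminated {{ as static text.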