Class: AsciiMath::Tokenizer

Inherits:
Object
Defined in:
lib/asciimath/parser.rb

Overview

Internal: Splits an ASCIIMath expression into a sequence of tokens. Each token is represented as a Hash containing the keys :value and :type. The :value key is used to store the text associated with each token. The :type key indicates the semantics of the token. The value for :type will be one of the following symbols:

  • :symbol - a symbolic name or a bit of text without any further semantics

  • :text - a bit of arbitrary text

  • :number - a number

  • :operator - a mathematical operator symbol

  • :unary - a unary operator (e.g., sqrt, text, …)

  • :infix - an infix operator (e.g., /, _, ^, …)

  • :binary - a binary operator (e.g., frac, root, …)

  • :eof - indicates no more tokens are available

Constant Summary

WHITESPACE = /\s+/
NUMBER = /[0-9]+(?:\.[0-9]+)?/
QUOTED_TEXT = /"[^"]*"/
TEX_TEXT = /text\([^)]*\)/
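
As a rough illustration of what these patterns accept, the sketch below copies the four regular expressions into a hypothetical module, `TokenizerPatterns` (an assumption for this example, not part of the library), and applies them with a `StringScanner`. Two things follow directly from the patterns as listed: NUMBER carries no sign, and QUOTED_TEXT has no way to escape an embedded double quote.

```ruby
require 'strscan'

# Hypothetical namespace; the patterns are copied from the constants listed above.
module TokenizerPatterns
  WHITESPACE  = /\s+/
  NUMBER      = /[0-9]+(?:\.[0-9]+)?/
  QUOTED_TEXT = /"[^"]*"/
  TEX_TEXT    = /text\([^)]*\)/

  # Tries NUMBER, QUOTED_TEXT and TEX_TEXT in that order, skipping
  # whitespace before each attempt, and returns the matched strings.
  def self.demo(input)
    scanner = StringScanner.new(input)
    [NUMBER, QUOTED_TEXT, TEX_TEXT].map do |pattern|
      scanner.scan(WHITESPACE) # returns nil and does nothing if there is no whitespace
      scanner.scan(pattern)    # returns nil if the pattern does not match here
    end
  end
end

matches = TokenizerPatterns.demo('3.14 "hello" text(a b)')
# matches is ['3.14', '"hello"', 'text(a b)']
```

Because `StringScanner#scan` only matches at the current scan position, NUMBER on its own does not match an input such as `-3`.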

Constructor Details

#initialize(string, symbols) ⇒ Tokenizer

Public: Initializes an ASCIIMath tokenizer.

string - The ASCIIMath expression to tokenize
symbols - The symbol table to use while tokenizing

# File 'lib/asciimath/parser.rb', line 58

def initialize(string, symbols)
  @string = StringScanner.new(string)
  @symbols = symbols
  lookahead = @symbols.keys.map { |k| k.length }.max
  @symbol_regexp = /((?:\\[\s0-9]|[^\s0-9]){1,#{lookahead}})/
  @push_back = nil
end
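
The effect of `lookahead` on `@symbol_regexp` can be sketched in isolation. The helper names `build_symbol_regexp` and `symbol_candidate` and the three-entry symbol table are assumptions made for this example; only the two lines that compute the bound and build the pattern are taken from the constructor above.

```ruby
require 'strscan'

# Hypothetical helper: mirrors the two lines of #initialize that derive
# the symbol pattern from the longest key in the symbol table.
def build_symbol_regexp(symbols)
  lookahead = symbols.keys.map { |k| k.length }.max
  /((?:\\[\s0-9]|[^\s0-9]){1,#{lookahead}})/
end

# Hypothetical helper: returns the text the pattern would match at the
# start of the input. StringScanner#check does not advance the scanner.
def symbol_candidate(input, symbols)
  StringScanner.new(input).check(build_symbol_regexp(symbols))
end

# Made-up symbol table; only the key lengths matter here.
symbols = { '+' => :plus, '<=' => :le, 'alpha' => :alpha }

symbol_candidate('alphabet 12', symbols) # => "alpha" (capped at 5 characters)
symbol_candidate('x2', symbols)          # => "x" (an unescaped digit ends the candidate)
```

The pattern only bounds how much text is considered. Matching that candidate against the symbol table happens in `read_symbol`, which this section does not show.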

Instance Method Details

#next_token ⇒ Object

Public: Read the next token from the ASCIIMath expression and move the tokenizer ahead by one token.

Returns the next token as a Hash

# File 'lib/asciimath/parser.rb', line 70

def next_token
  if @push_back
    t = @push_back
    @push_back = nil
    return t
  end

  @string.scan(WHITESPACE)

  return {:value => nil, :type => :eof} if @string.eos?

  case @string.peek(1)
    when '"'
      read_quoted_text
    when 't'
      case @string.peek(5)
        when 'text('
          read_tex_text
        else
          read_symbol
      end
    when '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
      read_number || read_symbol
    else
      read_symbol
  end
end
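
The dispatch in `next_token` depends only on the first character after any whitespace, or on the first five characters when that character is `t`. The sketch below reproduces that decision as a standalone function so it can be tried without the rest of the parser. The name `reader_for` and the returned symbols are assumptions made for this example. They name the private reader that would be called, not anything the library returns.

```ruby
require 'strscan'

# Hypothetical helper: reports which reader next_token would choose for
# the start of the given input, following the case statement shown above.
def reader_for(input)
  scanner = StringScanner.new(input)
  scanner.scan(/\s+/)                 # leading whitespace is skipped first
  return :eof if scanner.eos?

  case scanner.peek(1)
  when '"'
    :read_quoted_text
  when 't'
    # 'text(' is five characters long, hence peek(5)
    scanner.peek(5) == 'text(' ? :read_tex_text : :read_symbol
  when '-', /\A[0-9]\z/
    :read_number_or_symbol            # read_number first, read_symbol as fallback
  else
    :read_symbol
  end
end

reader_for('"hi"')    # => :read_quoted_text
reader_for('text(a)') # => :read_tex_text
reader_for('theta')   # => :read_symbol
reader_for('-3')      # => :read_number_or_symbol
reader_for('   ')     # => :eof
```

Note that a leading `-` is routed to the number reader even though the NUMBER constant has no sign. The fallback to `read_symbol` covers the case where `read_number` returns a falsy value.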

#push_back(token) ⇒ Object

Public: Pushes the given token back to the tokenizer. A subsequent call to next_token will return the given token rather than generating a new one. At most one token can be pushed back.

token - The token to push back

# File 'lib/asciimath/parser.rb', line 103

def push_back(token)
  @push_back = token unless token[:type] == :eof
end
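
A common use of a single push-back slot is one-token lookahead. The sketch below shows that pattern against a stand-in class, `CannedTokenizer`, which returns pre-built tokens and copies the push-back logic shown above. Both `CannedTokenizer` and `peek_token` are assumptions made for this example and are not part of the library.

```ruby
# Hypothetical stand-in for the tokenizer: same next_token/push_back
# contract as documented above, but fed from a fixed list of tokens.
class CannedTokenizer
  def initialize(tokens)
    @tokens = tokens.dup
    @push_back = nil
  end

  def next_token
    if @push_back
      t = @push_back
      @push_back = nil
      return t
    end
    @tokens.shift || {:value => nil, :type => :eof}
  end

  def push_back(token)
    @push_back = token unless token[:type] == :eof
  end
end

# Hypothetical helper: looks at the next token without consuming it.
def peek_token(tokenizer)
  token = tokenizer.next_token
  tokenizer.push_back(token)
  token
end

tokenizer = CannedTokenizer.new([
  {:value => '1', :type => :number},
  {:value => '+', :type => :operator}
])

peek_token(tokenizer)[:value] # => "1"
tokenizer.next_token[:value]  # => "1" (the pushed-back token is returned again)
tokenizer.next_token[:value]  # => "+"
tokenizer.next_token[:type]   # => :eof
```

Since there is only one slot, a second `push_back` before the next read overwrites the first token. Pushing back an `:eof` token is ignored.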