Class: AsciiMath::Tokenizer

Inherits:
Object
Defined in:
lib/asciimath/parser.rb

Overview

Internal: Splits an ASCIIMath expression into a sequence of tokens. Each token is represented as a Hash containing the keys :value and :type. The :value key is used to store the text associated with each token. The :type key indicates the semantics of the token. The value for :type will be one of the following symbols:

  • :identifier a symbolic name or a bit of text without any further semantics

  • :text a bit of arbitrary text

  • :number a number

  • :operator a mathematical operator symbol

  • :unary a unary operator (e.g., sqrt, text, …)

  • :font a unary font command (e.g., bb, cc, …)

  • :infix an infix operator (e.g., /, _, ^, …)

  • :binary a binary operator (e.g., frac, root, …)

  • :accent an accent character

  • :eof indicates no more tokens are available

Each token type may also have an :underover modifier. When present and set to true, sub- and superscript expressions associated with the token will be rendered as under- and overscripts (below and above the token) rather than as sub- and superscripts.

:accent tokens additionally have a :position value, which is set to either :over or :under. This determines whether the accent is rendered over or under the expression to which it applies.
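To make the token shape concrete, here is a minimal sketch of hand-written token Hashes and a consumer that branches on :type. The token values and the describe helper are hypothetical illustrations, not part of the library; the actual :value stored by the tokenizer may differ.

```ruby
# Hypothetical tokens, written by hand to illustrate the Hash shape
# described above (they are not produced by the real tokenizer here).
sqrt_token = { :value => 'sqrt', :type => :unary }
sum_token  = { :value => 'sum', :type => :operator, :underover => true }
eof_token  = { :value => nil, :type => :eof }

# Hypothetical helper: branches on :type and checks the optional
# :underover modifier.
def describe(token)
  case token[:type]
  when :eof then 'end of input'
  else
    layout = token[:underover] ? 'under/over' : 'sub/super'
    "#{token[:type]} #{token[:value].inspect} with #{layout} scripts"
  end
end
```

A consumer written this way only needs to look at :type first and can treat :underover as absent (nil) unless it is explicitly set to true.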

Constant Summary

WHITESPACE =
/^\s+/
NUMBER =
/-?[0-9]+(?:\.[0-9]+)?/
QUOTED_TEXT =
/"[^"]*"/
TEX_TEXT =
/text\([^)]*\)/
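The following sketch shows how these patterns behave when applied with a StringScanner, which is what the constructor wraps around the input. The constants are copied from the summary above; the sample input and the way the scans are chained are assumptions made for illustration, since the private read_* helpers are not shown on this page.

```ruby
require 'strscan'

# Patterns copied from the constant summary above.
WHITESPACE  = /^\s+/
NUMBER      = /-?[0-9]+(?:\.[0-9]+)?/
QUOTED_TEXT = /"[^"]*"/
TEX_TEXT    = /text\([^)]*\)/

# Sample input chosen for this sketch only.
scanner = StringScanner.new('  -3.5 "abc" text(xyz)')

scanner.scan(WHITESPACE)            # skip leading blanks
number = scanner.scan(NUMBER)       # signed decimal number
scanner.scan(WHITESPACE)
quoted = scanner.scan(QUOTED_TEXT)  # double-quoted text, quotes included
scanner.scan(WHITESPACE)
tex = scanner.scan(TEX_TEXT)        # text(...) form, wrapper included

# A lone minus sign is not a number: scan returns nil, so a caller
# can fall back to another reader.
lone_minus = StringScanner.new('-x').scan(NUMBER)
```

Note that StringScanner#scan only matches at the current scan position, so each pattern consumes input strictly from left to right.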

Constructor Details

#initialize(string, symbols) ⇒ Tokenizer

Public: Initializes an ASCIIMath tokenizer.

string - The ASCIIMath expression to tokenize
symbols - The symbol table to use while tokenizing

# File 'lib/asciimath/parser.rb', line 65

def initialize(string, symbols)
  @string = StringScanner.new(string)
  @symbols = symbols
  lookahead = @symbols.keys.map { |k| k.length }.max
  @symbol_regexp = /([^\s0-9]{1,#{lookahead}})/
  @push_back = nil
end
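The constructor sizes the symbol pattern from the longest key in the symbol table. The sketch below reproduces that computation outside the class; the tiny symbol table and sample strings are hypothetical and stand in for whatever table the caller actually supplies.

```ruby
# Hypothetical, tiny symbol table; the real table is passed in by the caller.
symbols = { 'sqrt' => {}, '+' => {}, 'alpha' => {} }

# Length of the longest key bounds how many characters a symbol can span.
lookahead = symbols.keys.map { |k| k.length }.max

# Same construction as in the constructor: a run of 1..lookahead
# characters that are neither whitespace nor digits.
symbol_regexp = /([^\s0-9]{1,#{lookahead}})/

candidate = 'alphabet 12'[symbol_regexp, 1]  # greedy, capped at lookahead
short     = 'x2'[symbol_regexp, 1]           # stops before the digit
```

Because the pattern is capped at the longest key length, the candidate text handed to the symbol lookup is never longer than any symbol that could possibly match.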

Instance Method Details

#next_token ⇒ Object

Public: Read the next token from the ASCIIMath expression and move the tokenizer ahead by one token.

Returns the next token as a Hash

# File 'lib/asciimath/parser.rb', line 77

def next_token
  if @push_back
    t = @push_back
    @push_back = nil
    return t
  end

  @string.scan(WHITESPACE)

  return {:value => nil, :type => :eof} if @string.eos?

  case @string.peek(1)
  when '"'
    read_quoted_text
  when 't'
    case @string.peek(5)
    when 'text('
      read_tex_text
    else
      read_symbol
    end
  when '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
    read_number || read_symbol
  else
    read_symbol
  end
end
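A typical caller reads tokens in a loop until it sees the :eof token. The sketch below shows that loop against a hypothetical stand-in class, CannedTokenizer, which simply replays a fixed list of tokens; it exists only so the example runs without the library and does not reproduce the real scanning logic.

```ruby
# Collects tokens from any object that responds to next_token,
# stopping at the first :eof token (which is not included).
def drain(tokenizer)
  tokens = []
  loop do
    token = tokenizer.next_token
    break if token[:type] == :eof
    tokens << token
  end
  tokens
end

# Hypothetical stand-in that replays canned tokens and then reports
# :eof forever, mirroring the documented end-of-input token.
class CannedTokenizer
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def next_token
    @tokens.shift || { :value => nil, :type => :eof }
  end
end

canned = [
  { :value => 'x', :type => :identifier },
  { :value => '2', :type => :number }
]
drained = drain(CannedTokenizer.new(canned))
```

With the real class, the same drain method should work unchanged, since it relies only on next_token and the :eof token type.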

#push_back(token) ⇒ Object

Public: Pushes the given token back to the tokenizer. A subsequent call to next_token will return the given token rather than generating a new one. At most one token can be pushed back.

token - The token to push back

# File 'lib/asciimath/parser.rb', line 110

def push_back(token)
  @push_back = token unless token[:type] == :eof
end
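Since only one token can be pushed back, push_back is mainly useful for single-token lookahead. The sketch below builds a peek helper on top of it. PushBackDemo is a hypothetical stand-in that copies the push-back logic shown above but replays canned tokens instead of scanning a string; peek_token is likewise an illustrative helper, not a library method.

```ruby
# Hypothetical stand-in: same push-back handling as the methods above,
# but the tokens come from a fixed list rather than a scanner.
class PushBackDemo
  def initialize(tokens)
    @tokens = tokens.dup
    @push_back = nil
  end

  def next_token
    if @push_back
      t = @push_back
      @push_back = nil
      return t
    end
    @tokens.shift || { :value => nil, :type => :eof }
  end

  def push_back(token)
    # :eof is never stored; the next call regenerates it anyway.
    @push_back = token unless token[:type] == :eof
  end
end

# Illustrative helper: look at the next token without consuming it.
def peek_token(tokenizer)
  token = tokenizer.next_token
  tokenizer.push_back(token)
  token
end

demo = PushBackDemo.new([{ :value => 'a', :type => :identifier }])
peeked = peek_token(demo)    # token is read, then pushed back
again  = demo.next_token     # the same token is returned again
at_end = peek_token(demo)    # :eof is returned but not stored
```

Pushing back a second token before the first is read again would overwrite the first, which is why the documentation limits callers to one pushed-back token at a time.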