Class: StyleScript::Rewriter

Inherits:

Object

Object
StyleScript::Rewriter

show all

Defined in:: lib/style_script/rewriter.rb

Overview

In order to keep the grammar simple, the stream of tokens that the Lexer emits is rewritten by the Rewriter, smoothing out ambiguities, mis-nested indentation, and single-line flavors of expressions.

Constant Summary collapse

BALANCED_PAIRS = Tokens that must be balanced.

[['(', ')'], ['[', ']'], ['{', '}'], [:INDENT, :OUTDENT],
[:PARAM_START, :PARAM_END], [:CALL_START, :CALL_END], [:INDEX_START, :INDEX_END]]

EXPRESSION_START = Tokens that signal the start of a balanced pair.

BALANCED_PAIRS.map {|pair| pair.first }

EXPRESSION_TAIL = Tokens that signal the end of a balanced pair.

BALANCED_PAIRS.map {|pair| pair.last }

EXPRESSION_CLOSE = Tokens that indicate the close of a clause of an expression.

[:CATCH, :WHEN, :ELSE, :FINALLY] + EXPRESSION_TAIL

IMPLICIT_FUNC = Tokens pairs that, in immediate succession, indicate an implicit call.

[:IDENTIFIER, :SUPER, ')', :CALL_END, ']', :INDEX_END]

IMPLICIT_END =

[:IF, :UNLESS, :FOR, :WHILE, "\n", :OUTDENT]

IMPLICIT_CALL =

[:IDENTIFIER, :NUMBER, :STRING, :JS, :REGEX, :NEW, :PARAM_START,
:TRY, :DELETE, :TYPEOF, :SWITCH,
:TRUE, :FALSE, :YES, :NO, :ON, :OFF, '!', '!!', :NOT,
'@', ':>', '=>', '[', '(', '{']

INVERSES = The inverse mappings of token pairs we’re trying to fix up.

BALANCED_PAIRS.inject({}) do |memo, pair|
  memo[pair.first] = pair.last
  memo[pair.last]  = pair.first
  memo
end

SINGLE_LINERS = Single-line flavors of block expressions that have unclosed endings. The grammar can’t disambiguate them, so we insert the implicit indentation.

[:ELSE, ":>", "=>", :TRY, :FINALLY, :THEN]

SINGLE_CLOSERS =

["\n", :CATCH, :FINALLY, :ELSE, :OUTDENT, :LEADING_WHEN, :PARAM_START]

Instance Method Summary collapse

#add_implicit_indentation ⇒ Object

Because our grammar is LALR(1), it can’t handle some single-line expressions that lack ending delimiters.
#add_implicit_parentheses ⇒ Object

Methods may be optionally called without parentheses, for simple cases.
#adjust_comments ⇒ Object

Massage newlines and indentations so that comments don’t have to be correctly indented, or appear on their own line.
#close_open_calls_and_indexes ⇒ Object

We’ve tagged the opening parenthesis of a method call, and the opening bracket of an indexing operation.
#ensure_balance(*pairs) ⇒ Object

Ensure that all listed pairs of tokens are correctly balanced throughout the course of the token stream.
#move_commas_outside_outdents ⇒ Object

Make sure that we don’t accidentally break trailing commas, which need to go on the outside of expression closers.
#remove_leading_newlines ⇒ Object

Leading newlines would introduce an ambiguity in the grammar, so we dispatch them here.
#remove_mid_expression_newlines ⇒ Object

Some blocks occur in the middle of expressions – when we’re expecting this, remove their trailing newlines.
#rewrite(tokens) ⇒ Object

Rewrite the token stream in multiple passes, one logical filter at a time.
#rewrite_closing_parens ⇒ Object

We’d like to support syntax like this: el.click((event) :> el.hide()) In order to accomplish this, move outdents that follow closing parens inwards, safely.
#scan_tokens ⇒ Object

Rewrite the token stream, looking one token ahead and behind.

Instance Method Details

#add_implicit_indentation ⇒ `Object`

Because our grammar is LALR(1), it can’t handle some single-line expressions that lack ending delimiters. Use the lexer to add the implicit blocks, so it doesn’t need to. ‘)’ can close a single-line block, but we need to make sure it’s balanced.

# File 'lib/style_script/rewriter.rb', line 182

def add_implicit_indentation
  scan_tokens do |prev, token, post, i|
    next 1 unless SINGLE_LINERS.include?(token[0]) && post[0] != :INDENT &&
      !(token[0] == :ELSE && post[0] == :IF) # Elsifs shouldn't get blocks.
    starter = token[0]
    line = token[1].line
    @tokens.insert(i + 1, [:INDENT, Value.new(2, line)])
    idx = i + 1
    parens = 0
    loop do
      idx += 1
      tok = @tokens[idx]
      if (!tok || SINGLE_CLOSERS.include?(tok[0]) ||
          (tok[0] == ')' && parens == 0)) &&
          !(starter == :ELSE && tok[0] == :ELSE)
        insertion = @tokens[idx - 1][0] == "," ? idx - 1 : idx
        @tokens.insert(insertion, [:OUTDENT, Value.new(2, line)])
        break
      end
      parens += 1 if tok[0] == '('
      parens -= 1 if tok[0] == ')'
    end
    next 1 unless token[0] == :THEN
    @tokens.delete_at(i)
    next 0
  end
end

#add_implicit_parentheses ⇒ `Object`

Methods may be optionally called without parentheses, for simple cases. Insert the implicit parentheses here, so that the parser doesn’t have to deal with them.

# File 'lib/style_script/rewriter.rb', line 157

def add_implicit_parentheses
  stack = [0]
  scan_tokens do |prev, token, post, i|
    stack.push(0) if token[0] == :INDENT
    if token[0] == :OUTDENT
      last = stack.pop
      stack[-1] += last
    end
    if stack.last > 0 && (IMPLICIT_END.include?(token[0]) || post.nil?)
      idx = token[0] == :OUTDENT ? i + 1 : i
      stack.last.times { @tokens.insert(idx, [:CALL_END, Value.new(')', token[1].line)]) }
      size, stack[-1] = stack[-1] + 1, 0
      next size
    end
    next 1 unless IMPLICIT_FUNC.include?(prev[0]) && IMPLICIT_CALL.include?(token[0])
    @tokens.insert(i, [:CALL_START, Value.new('(', token[1].line)])
    stack[-1] += 1
    next 2
  end
end

#adjust_comments ⇒ `Object`

Massage newlines and indentations so that comments don’t have to be correctly indented, or appear on their own line.

# File 'lib/style_script/rewriter.rb', line 73

def adjust_comments
  scan_tokens do |prev, token, post, i|
    next 1 unless token[0] == :COMMENT
    before, after = @tokens[i - 2], @tokens[i + 2]
    if before && after &&
        ((before[0] == :INDENT && after[0] == :OUTDENT) ||
        (before[0] == :OUTDENT && after[0] == :INDENT)) &&
        before[1] == after[1]
      @tokens.delete_at(i + 2)
      @tokens.delete_at(i - 2)
      next 0
    elsif prev[0] == "\n" && [:INDENT].include?(after[0])
      @tokens.delete_at(i + 2)
      @tokens[i - 1] = after
      next 1
    elsif !["\n", :INDENT, :OUTDENT].include?(prev[0])
      @tokens.insert(i, ["\n", Value.new("\n", token[1].line)])
      next 2
    else
      next 1
    end
  end
end

#close_open_calls_and_indexes ⇒ `Object`

We’ve tagged the opening parenthesis of a method call, and the opening bracket of an indexing operation. Match them with their close.

# File 'lib/style_script/rewriter.rb', line 127

def close_open_calls_and_indexes
  parens, brackets = [0], [0]
  scan_tokens do |prev, token, post, i|
    case token[0]
    when :CALL_START  then parens.push(0)
    when :INDEX_START then brackets.push(0)
    when '('          then parens[-1] += 1
    when '['          then brackets[-1] += 1
    when ')'
      if parens.last == 0
        parens.pop
        token[0] = :CALL_END
      else
        parens[-1] -= 1
      end
    when ']'
      if brackets.last == 0
        brackets.pop
        token[0] = :INDEX_END
      else
        brackets[-1] -= 1
      end
    end
    next 1
  end
end

#ensure_balance(*pairs) ⇒ `Object`

Ensure that all listed pairs of tokens are correctly balanced throughout the course of the token stream.

Raises:

(ParseError)

# File 'lib/style_script/rewriter.rb', line 212

def ensure_balance(*pairs)
  puts "\nbefore ensure_balance: #{@tokens.inspect}" if ENV['VERBOSE']
  levels, lines = Hash.new(0), Hash.new
  scan_tokens do |prev, token, post, i|
    pairs.each do |pair|
      open, close = *pair
      levels[open] += 1 if token[0] == open
      levels[open] -= 1 if token[0] == close
      lines[token[0]] = token[1].line
      raise ParseError.new(token[0], token[1], nil) if levels[open] < 0
    end
    next 1
  end
  unclosed = levels.detect {|k, v| v > 0 }
  sym = unclosed && unclosed[0]
  raise ParseError.new(sym, Value.new(sym, lines[sym]), nil, "unclosed '#{sym}'") if unclosed
end

#move_commas_outside_outdents ⇒ `Object`

Make sure that we don’t accidentally break trailing commas, which need to go on the outside of expression closers.

# File 'lib/style_script/rewriter.rb', line 115

def move_commas_outside_outdents
  scan_tokens do |prev, token, post, i|
    if token[0] == :OUTDENT && prev[0] == ','
      @tokens.delete_at(i)
      @tokens.insert(i - 1, token)
    end
    next 1
  end
end

#remove_leading_newlines ⇒ `Object`

Leading newlines would introduce an ambiguity in the grammar, so we dispatch them here.



99
100
101

# File 'lib/style_script/rewriter.rb', line 99

def remove_leading_newlines
  @tokens.shift if @tokens[0][0] == "\n"
end

#remove_mid_expression_newlines ⇒ `Object`

Some blocks occur in the middle of expressions – when we’re expecting this, remove their trailing newlines.

# File 'lib/style_script/rewriter.rb', line 105

def remove_mid_expression_newlines
  scan_tokens do |prev, token, post, i|
    next 1 unless post && EXPRESSION_CLOSE.include?(post[0]) && token[0] == "\n"
    @tokens.delete_at(i)
    next 0
  end
end

#rewrite(tokens) ⇒ `Object`

Rewrite the token stream in multiple passes, one logical filter at a time. This could certainly be changed into a single pass through the stream, with a big ol’ efficient switch, but it’s much nicer like this.

# File 'lib/style_script/rewriter.rb', line 44

def rewrite(tokens)
  @tokens = tokens
  adjust_comments
  remove_leading_newlines
  remove_mid_expression_newlines
  move_commas_outside_outdents
  close_open_calls_and_indexes
  add_implicit_parentheses
  add_implicit_indentation
  ensure_balance(*BALANCED_PAIRS)
  rewrite_closing_parens
  @tokens
end

#rewrite_closing_parens ⇒ `Object`

We’d like to support syntax like this:

el.click((event) :>
  el.hide())

In order to accomplish this, move outdents that follow closing parens inwards, safely. The steps to accomplish this are:

Check that all paired tokens are balanced and in order.
Rewrite the stream with a stack: if you see an ‘(’ or INDENT, add it to the stack. If you see an ‘)’ or OUTDENT, pop the stack and replace it with the inverse of what we’ve just popped.
Keep track of “debt” for tokens that we fake, to make sure we end up balanced in the end.

# File 'lib/style_script/rewriter.rb', line 243

def rewrite_closing_parens
  verbose = ENV['VERBOSE']
  stack, debt = [], Hash.new(0)
  stack_stats = lambda { "stack: #{stack.inspect} debt: #{debt.inspect}\n\n" }
  puts "rewrite_closing_original: #{@tokens.inspect}" if verbose
  scan_tokens do |prev, token, post, i|
    tag, inv = token[0], INVERSES[token[0]]
    # Push openers onto the stack.
    if EXPRESSION_START.include?(tag)
      stack.push(token)
      puts "pushing #{tag} #{stack_stats[]}" if verbose
      next 1
    # The end of an expression, check stack and debt for a pair.
    elsif EXPRESSION_TAIL.include?(tag)
      puts @tokens[i..-1].inspect if verbose
      # If the tag is already in our debt, swallow it.
      if debt[inv] > 0
        debt[inv] -= 1
        @tokens.delete_at(i)
        puts "tag in debt #{tag} #{stack_stats[]}" if verbose
        next 0
      else
        # Pop the stack of open delimiters.
        match = stack.pop
        mtag  = match[0]
        # Continue onwards if it's the expected tag.
        if tag == INVERSES[mtag]
          puts "expected tag #{tag} #{stack_stats[]}" if verbose
          next 1
        else
          # Unexpected close, insert correct close, adding to the debt.
          debt[mtag] += 1
          puts "unexpected #{tag}, replacing with #{INVERSES[mtag]} #{stack_stats[]}" if verbose
          val = mtag == :INDENT ? match[1] : INVERSES[mtag]
          @tokens.insert(i, [INVERSES[mtag], Value.new(val, token[1].line)])
          next 1
        end
      end
    else
      # Uninteresting token:
      next 1
    end
  end
end

#scan_tokens ⇒ `Object`

Rewrite the token stream, looking one token ahead and behind. Allow the return value of the block to tell us how many tokens to move forwards (or backwards) in the stream, to make sure we don’t miss anything as the stream changes length under our feet.

# File 'lib/style_script/rewriter.rb', line 62

def scan_tokens
  i = 0
  loop do
    break unless @tokens[i]
    move = yield(@tokens[i - 1], @tokens[i], @tokens[i + 1], i)
    i += move
  end
end

Class: StyleScript::Rewriter

Overview

Constant Summary collapse

Instance Method Summary collapse

Instance Method Details

#add_implicit_indentation ⇒ Object

#add_implicit_parentheses ⇒ Object

#adjust_comments ⇒ Object

#close_open_calls_and_indexes ⇒ Object

#ensure_balance(*pairs) ⇒ Object

#move_commas_outside_outdents ⇒ Object

#remove_leading_newlines ⇒ Object

#remove_mid_expression_newlines ⇒ Object

#rewrite(tokens) ⇒ Object

#rewrite_closing_parens ⇒ Object

#scan_tokens ⇒ Object