Module: Polytexnic::Preprocessor::Polytex

Includes: Literal

Included in: Polytexnic::Preprocessor

Defined in: lib/polytexnic/preprocessors/polytex.rb

Constant Summary

Constants included from Literal

Literal::CODE_INCLUSION_REGEX, Literal::LANG_REGEX

Instance Method Summary

#cache_code_environments ⇒ Object

Caches Markdown code environments.

#cache_image_locations(text, cache) ⇒ Object

Caches the locations of images to be passed through the pipeline.

#cache_latex_literal(markdown, cache) ⇒ Object

Caches literal LaTeX environments.

#cache_math(text, cache) ⇒ Object

Caches math.

#cache_raw_latex(markdown, cache) ⇒ Object

Caches raw LaTeX commands to be passed through the pipeline.

#convert_code_inclusion(text, cache) ⇒ Object

Adds support for <<(path/to/code) inclusion.

#convert_includegraphics(text) ⇒ Object

Converts \includegraphics to \image.

#convert_tt(text) ⇒ Object

Converts {\tt …} to \kode{…}. This effectively converts `inline code`, which kramdown sets as {\tt inline code}, to PolyTeX’s native \kode command, which in turn allows inline code to be separately styled.

#restore_hashed_content(text, cache) ⇒ Object

Restores raw code from the cache.

#restore_math(text, cache) ⇒ Object

Restores the Markdown math.

#to_polytex ⇒ Object

Converts Markdown to PolyTeX.

Methods included from Literal

#cache_display_inline_math, #cache_display_math, #cache_inline_math, #cache_literal, #cache_literal_environments, #cache_unicode, #code_salt, #element, #equation_element, #hyperrefs, #literal_types, #math_environments

Instance Method Details

#cache_code_environments ⇒ `Object`

Caches Markdown code environments. Included are indented environments, Leanpub-style indented environments, and GitHub-style code fencing.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 136

def cache_code_environments
  output = []
  lines = @source.split("\n")
  indentation = ' ' * 4
  while (line = lines.shift)
    if line =~ /\{lang="(.*?)"\}/
      language = $1
      code = []
      while (line = lines.shift) && line.match(/^#{indentation}(.*)$/) do
        code << $1
      end
      code = code.join("\n")
      key = digest(code)
      code_cache[key] = [code, language]
      output << key
      output << line
    elsif line =~ /^```\s*$/        # basic code fences
      while (line = lines.shift) && !line.match(/^```\s*$/)
        output << indentation + line
      end
      output << "\n"
    elsif line =~ /^```(\w+)(,\s*options:.*)?$/  # highlighted fences
      language = $1
      options  = $2
      code = []
      while (line = lines.shift) && !line.match(/^```\s*$/) do
        code << line
      end
      code = code.join("\n")
      data = [code, language, false, options]
      key = digest(data.join("--"))
      code_cache[key] = data
      output << key
    else
      output << line
    end
  end
  output.join("\n")
end
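
As a rough illustration of the highlighted-fence branch above, the following sketch caches only fences of the form ```lang. It is not the module's implementation: `cache_highlighted_fences` is a hypothetical name, the SHA1-based `digest` is an assumed stand-in for the gem's own helper (which is not shown in this listing), and options parsing and indented environments are left out.

```ruby
require 'digest'

# Assumed stand-in for the module's digest helper.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

# Hypothetical, simplified version: handles only highlighted fences.
def cache_highlighted_fences(source, code_cache)
  output = []
  lines = source.split("\n")
  while (line = lines.shift)
    if line =~ /^```(\w+)\s*$/
      language = $1
      code = []
      # Collect lines until the closing fence.
      while (line = lines.shift) && !line.match(/^```\s*$/)
        code << line
      end
      code = code.join("\n")
      key = digest([code, language].join('--'))
      code_cache[key] = [code, language]
      output << key   # the key replaces the whole fenced block
    else
      output << line
    end
  end
  output.join("\n")
end

markdown = "Intro\n```ruby\nputs 'hi'\n```\nOutro"
code_cache = {}
cleaned = cache_highlighted_fences(markdown, code_cache)
```

After the call, `cleaned` has three lines (the intro, the key, and the outro), and `code_cache` maps the key to the code and its language, so the code itself never passes through the Markdown converter.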

#cache_image_locations(text, cache) ⇒ `Object`

Caches the locations of images to be passed through the pipeline. This works around a Kramdown bug, which fails to convert images properly when their location includes a URL.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 115

def cache_image_locations(text, cache)
  # Matches '![Image caption](/path/to/image)'
  text.gsub!(/^\s*(!\[.*?\])\((.*?)\)/) do
    key = digest($2)
    cache[key] = $2
    "\n#{$1}(#{key})"
  end
end
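
A short usage sketch of the method above. The SHA1-based `digest` is an assumed stand-in for the gem's helper, and the URL is a made-up example.

```ruby
require 'digest'

# Assumed stand-in for the module's digest helper.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

# Same body as the listing above.
def cache_image_locations(text, cache)
  # Matches '![Image caption](/path/to/image)'
  text.gsub!(/^\s*(!\[.*?\])\((.*?)\)/) do
    key = digest($2)
    cache[key] = $2
    "\n#{$1}(#{key})"
  end
end

url = 'https://example.com/images/logo.png'   # hypothetical location
text = "Intro\n\n![Logo](#{url})\n"
image_cache = {}
cache_image_locations(text, image_cache)
key = image_cache.keys.first
```

The caption is left in place and only the location is swapped for its key, so the converter never sees the URL. The original location can be put back later with `restore_hashed_content`.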

#cache_latex_literal(markdown, cache) ⇒ `Object`

Caches literal LaTeX environments.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 65

def cache_latex_literal(markdown, cache)
  Polytexnic::Literal.literal_types.each do |literal|
    regex = /(\\begin\{#{Regexp.escape(literal)}\}
            .*?
            \\end\{#{Regexp.escape(literal)}\})
            /xm
    markdown.gsub!(regex) do
      key = digest($1)
      cache[key] = $1
      key
    end
  end
end

#cache_math(text, cache) ⇒ `Object`

Caches math. Leanpub uses the notation {$$}…{/$$} for both inline and block math, with the only difference being the presence of newlines:

{$$} x^2 {/$$}  % inline

and

{$$}
x^2             % block
{/$$}

I personally hate this notation and convention, so we also support LaTeX-style \( x \) and \[ x^2 - 2 = 0 \] notation.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 202

def cache_math(text, cache)
  text.gsub!(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/) do
    key = digest($1 || $2)
    cache[[:block, key]] = $1 || $2
    key
  end
  text.gsub!(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/) do
    key = digest($1 || $2)
    cache[[:inline, key]] = $1 || $2
    key
  end
end
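
To see how the two passes separate block from inline math, here is a small usage sketch. The SHA1-based `digest` is an assumed stand-in for the gem's helper; the method body is copied from the listing above.

```ruby
require 'digest'

# Assumed stand-in for the module's digest helper.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

# Same body as the listing above.
def cache_math(text, cache)
  text.gsub!(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/) do
    key = digest($1 || $2)
    cache[[:block, key]] = $1 || $2
    key
  end
  text.gsub!(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/) do
    key = digest($1 || $2)
    cache[[:inline, key]] = $1 || $2
    key
  end
end

text = [
  'Inline {$$}x^2{/$$} and LaTeX-style \( a + b \).',
  '{$$}',
  'y = mx + b',
  '{/$$}'
].join("\n")

math_cache = {}
cache_math(text, math_cache)
kinds = math_cache.keys.map(&:first)   # one :block entry, then two :inline
```

The block pass runs first, so a `{$$}` immediately followed by a newline is claimed as block math before the inline pattern can see it. Note that the patterns as printed carry no `m` flag, so `.` does not match a newline and a body spanning several lines would not be matched by them.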

#cache_raw_latex(markdown, cache) ⇒ `Object`

Caches raw LaTeX commands to be passed through the pipeline.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 80

def cache_raw_latex(markdown, cache)
  command_regex = /(
                    ^[ \t]*\\\w+.*\}[ \t]*$ # Command on line with arg
                    |
                    ~\\ref\{.*?\}     # reference with a tie
                    |
                    ~\\eqref\{.*?\}   # eq reference with a tie
                    |
                    \\[^\s]+\{.*?\}   # command with one arg
                    |
                    \\\w+             # normal command
                    |
                    \\-               # hyphenation
                    |
                    \\[ %&$\#@]        # space or special character
                    )
                  /x
  markdown.gsub!(command_regex) do
    content = $1
    puts content.inspect if debug?
    key = digest(content)
    cache[key] = content

    if content =~ /\{table\}|\\caption\{/
      # Pad tables & captions with newlines for kramdown compatibility.
      "\n#{key}\n"
    else
      key
    end
  end
end
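
The alternatives in `command_regex` are easier to follow on a concrete line. The sketch below copies the method from the listing above; the SHA1-based `digest` and the always-false `debug?` are assumed stand-ins for helpers defined elsewhere in the gem.

```ruby
require 'digest'

# Assumed stand-ins for helpers defined elsewhere in the gem.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

def debug?
  false
end

# Same body as the listing above.
def cache_raw_latex(markdown, cache)
  command_regex = /(
                    ^[ \t]*\\\w+.*\}[ \t]*$ # Command on line with arg
                    |
                    ~\\ref\{.*?\}     # reference with a tie
                    |
                    ~\\eqref\{.*?\}   # eq reference with a tie
                    |
                    \\[^\s]+\{.*?\}   # command with one arg
                    |
                    \\\w+             # normal command
                    |
                    \\-               # hyphenation
                    |
                    \\[ %&$\#@]        # space or special character
                    )
                  /x
  markdown.gsub!(command_regex) do
    content = $1
    puts content.inspect if debug?
    key = digest(content)
    cache[key] = content

    if content =~ /\{table\}|\\caption\{/
      # Pad tables & captions with newlines for kramdown compatibility.
      "\n#{key}\n"
    else
      key
    end
  end
end

text = 'See Section~\ref{sec:intro}: 50\% of \emph{cases}.'.dup
latex_cache = {}
cache_raw_latex(text, latex_cache)
cached = latex_cache.values
```

Here `cached` holds the tied reference, the escaped percent sign, and the one-argument command, in that order, and `text` no longer contains a backslash.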

#convert_code_inclusion(text, cache) ⇒ `Object`

Adds support for <<(path/to/code) inclusion.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 56

def convert_code_inclusion(text, cache)
  text.gsub!(/^\s*(<<\(.*?\))/) do
    key = digest($1)
    cache[key] = "%= #{$1}"  # reduce to a previously solved case
    key
  end
end
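
A brief usage sketch, assuming a SHA1-based `digest` in place of the gem's helper and a made-up file path:

```ruby
require 'digest'

# Assumed stand-in for the module's digest helper.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

# Same body as the listing above.
def convert_code_inclusion(text, cache)
  text.gsub!(/^\s*(<<\(.*?\))/) do
    key = digest($1)
    cache[key] = "%= #{$1}"  # reduce to a previously solved case
    key
  end
end

# The path is hypothetical.
text = "See the listing:\n\n<<(chapters/code/hello.rb)\n"
inclusion_cache = {}
convert_code_inclusion(text, inclusion_cache)
```

The cached value is the inclusion with `%= ` prepended, which is what later gets restored into the PolyTeX output. Because the pattern begins with `^\s*`, the blank line in front of the inclusion is consumed along with it.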

#convert_includegraphics(text) ⇒ `Object`

Converts \includegraphics to \image. The reason is that raw \includegraphics is almost always too wide in the PDF. Instead, we use the custom-defined \image command, which is specifically designed to fix this issue.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 180

def convert_includegraphics(text)
  text.gsub!('\includegraphics', '\image')
end

#convert_tt(text) ⇒ `Object`

Converts {\tt …} to \kode{…}. This effectively converts `inline code`, which kramdown sets as {\tt inline code}, to PolyTeX’s native \kode command, which in turn allows inline code to be separately styled.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 188

def convert_tt(text)
  text.gsub!(/\{\\tt (.*?)\}/, '\kode{\1}')
end
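
A minimal usage sketch of the conversion above; the input sentence is made up.

```ruby
# Same body as the listing above.
def convert_tt(text)
  text.gsub!(/\{\\tt (.*?)\}/, '\kode{\1}')
end

text = 'Call {\tt puts} or {\tt print}.'.dup
convert_tt(text)
# text now reads: Call \kode{puts} or \kode{print}.
```

Because `(.*?)` is non-greedy, each match stops at the first closing brace, so the two spans are converted independently.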

#restore_hashed_content(text, cache) ⇒ `Object`

Restores raw code from the cache.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 125

def restore_hashed_content(text, cache)
  cache.each do |key, value|
    # Because of the way backslashes get interpolated, we need to add
    # some extra ones to cover all the cases of hashed LaTeX.
    text.gsub!(key, value.gsub(/\\/, '\\\\\\'))
  end
end
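
The comment about backslashes refers to how `gsub` treats a replacement string: a doubled backslash is read as an escape for a single one. The sketch below, with a hypothetical key and value, compares a naive substitution with the padding used above and with a block-based alternative (the block form is not what the listing uses).

```ruby
key   = 'a1b2c3'              # hypothetical cache key
value = 'first \\\\ second'   # LaTeX line break: two literal backslashes
text  = "x #{key} y"

# Naive substitution: gsub unescapes the replacement string,
# so the two backslashes collapse into one.
naive = text.gsub(key, value)

# The approach in the listing above: pad the backslashes first so that
# gsub's unescaping gives back the original value.
padded = text.gsub(key, value.gsub(/\\/, '\\\\\\'))

# Alternative: a block's return value is inserted verbatim.
blocked = text.gsub(key) { value }
```

Here `naive` ends up with a single backslash, while `padded` and `blocked` both keep the two backslashes of the original value.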

#restore_math(text, cache) ⇒ `Object`

Restores the Markdown math. This is easy because we’re running everything through our LaTeX pipeline.

# File 'lib/polytexnic/preprocessors/polytex.rb', line 218

def restore_math(text, cache)
  cache.each do |(kind, key), value|
    case kind
    when :inline
      open  = '\('
      close =  '\)'
    when :block
      open  = '\[' + "\n"
      close = "\n" + '\]'
    end
    text.gsub!(key, open + value + close)
  end
end
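
A small usage sketch of the restore step, using a hand-built cache with hypothetical keys in place of real digests:

```ruby
# Same body as the listing above.
def restore_math(text, cache)
  cache.each do |(kind, key), value|
    case kind
    when :inline
      open  = '\('
      close =  '\)'
    when :block
      open  = '\[' + "\n"
      close = "\n" + '\]'
    end
    text.gsub!(key, open + value + close)
  end
end

# Hypothetical keys; real ones would be digests of the math content.
math_cache = {
  [:inline, 'KEY1'] => 'x^2',
  [:block,  'KEY2'] => 'y = mx + b'
}
text = 'Inline KEY1 and block: KEY2'.dup
restore_math(text, math_cache)
```

Both Leanpub-style and LaTeX-style input come back out in LaTeX notation, since the cache records only whether each entry was inline or block.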

#to_polytex ⇒ `Object`

Converts Markdown to PolyTeX. We adopt a unified approach: rather than convert “Markdown” (I use the term loosely*) directly to HTML, we convert it to PolyTeX and then run everything through the PolyTeX pipeline. Happily, kramdown comes equipped with a `to_latex` method that does most of the heavy lifting. The output isn’t as clean as that produced by Pandoc (our previous choice), but it comes with significant advantages: (1) It’s written in Ruby, available as a gem, so its use eliminates an external dependency. (2) It’s the foundation for the “Markdown” interpreter used by Leanpub, so by using it ourselves we ensure greater compatibility with Leanpub books.

<rant>The number of mutually incompatible markup languages going by the name “Markdown” is truly mind-boggling. Most of them add things to John Gruber’s original Markdown language in an ever-expanding attempt to bolt on the functionality needed to write longer documents. At this point, I fear that “Markdown” has become little more than a marketing term.</rant>

# File 'lib/polytexnic/preprocessors/polytex.rb', line 25

def to_polytex
  require 'kramdown'
  cache = {}
  math_cache = {}
  cleaned_markdown = cache_code_environments
  puts cleaned_markdown if debug?
  cleaned_markdown.tap do |markdown|
    convert_code_inclusion(markdown, cache)
    cache_latex_literal(markdown, cache)
    cache_raw_latex(markdown, cache)
    cache_image_locations(markdown, cache)
    puts markdown if debug?
    cache_math(markdown, math_cache)
  end
  puts cleaned_markdown if debug?
  # Override the header ordering, which starts with 'section' by default.
  lh = 'chapter,section,subsection,subsubsection,paragraph,subparagraph'
  kramdown = Kramdown::Document.new(cleaned_markdown, latex_headers: lh)
  puts kramdown.inspect if debug?
  puts kramdown.to_html if debug?
  puts kramdown.to_latex if debug?
  @source = kramdown.to_latex.tap do |polytex|
              remove_comments(polytex)
              convert_includegraphics(polytex)
              convert_tt(polytex)
              restore_math(polytex, math_cache)
              restore_hashed_content(polytex, cache)
            end
end
