Module: Polytexnic::Preprocessor::Polytex

Includes:
Literal
Included in:
Polytexnic::Preprocessor
Defined in:
lib/polytexnic/preprocessors/polytex.rb

Constant Summary

Constants included from Literal

Literal::CODE_INCLUSION_REGEX, Literal::LANG_REGEX

Instance Method Summary

Methods included from Literal

#cache_display_inline_math, #cache_display_math, #cache_inline_math, #cache_literal, #cache_literal_environments, #cache_unicode, #code_salt, #element, #equation_element, #hyperrefs, #literal_types, #math_environments

Instance Method Details

#cache_code_environments ⇒ Object

Caches Markdown code environments. Included are indented environments, Leanpub-style indented environments, and GitHub-style code fencing.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 106

def cache_code_environments
  output = []
  lines = @source.split("\n")
  indentation = ' ' * 4
  while (line = lines.shift)
    if line =~ /\{lang="(.*?)"\}/
      language = $1
      code = []
      while (line = lines.shift) && line.match(/^#{indentation}(.*)$/) do
        code << $1
      end
      code = code.join("\n")
      key = digest(code)
      code_cache[key] = [code, language]
      output << key
      output << line
    elsif line =~ /^```\s*$/        # basic code fences
      while (line = lines.shift) && !line.match(/^```\s*$/)
        output << indentation + line
      end
      output << "\n"
    elsif line =~ /^```(\w+)\s*$/   # syntax-highlighted code fences
      language = $1
      code = []
      while (line = lines.shift) && !line.match(/^```\s*$/) do
        code << line
      end
      code = code.join("\n")
      key = digest(code)
      code_cache[key] = [code, language]
      output << key
    else
      output << line
    end
  end
  output.join("\n")
end
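
The syntax-highlighted fence branch can be tried in isolation. The following is a minimal sketch, not the module's own code: `digest` is a hypothetical stand-in (SHA-1 is an assumption), `cache_fenced_code` is an illustrative name, and the fence marker is built from single backticks only to keep the example easy to embed.

```ruby
require 'digest'

# Hypothetical stand-in for the module's `digest` helper (SHA-1 is assumed).
def digest(string)
  Digest::SHA1.hexdigest(string)
end

FENCE = '`' * 3  # the three-backtick fence marker

# Replaces each language-tagged fenced block with a cache key and stores
# [code, language] under that key, as the third branch above does.
def cache_fenced_code(source, code_cache)
  output = []
  lines = source.split("\n")
  while (line = lines.shift)
    if line =~ /^#{FENCE}(\w+)\s*$/
      language = $1
      code = []
      while (line = lines.shift) && !line.match(/^#{FENCE}\s*$/)
        code << line
      end
      code = code.join("\n")
      key = digest(code)
      code_cache[key] = [code, language]
      output << key
    else
      output << line
    end
  end
  output.join("\n")
end

code_cache = {}
source = "Intro\n#{FENCE}ruby\nputs 'hi'\n#{FENCE}\nOutro"
cleaned = cache_fenced_code(source, code_cache)
```

After the call, `cleaned` holds the prose with the fenced block replaced by its key, and `code_cache` maps that key to the code and its language.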

#cache_latex_literal(markdown, cache) ⇒ Object

Caches literal LaTeX environments.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 53

def cache_latex_literal(markdown, cache)
  Polytexnic::Literal.literal_types.each do |literal|
    regex = /(\\begin\{#{Regexp.escape(literal)}\}
            .*?
            \\end\{#{Regexp.escape(literal)}\})
            /xm
    markdown.gsub!(regex) do
      key = digest($1)
      cache[key] = $1
      key
    end
  end
end
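
A short sketch of the effect, under stated assumptions: `LITERAL_TYPES` is an assumed subset standing in for `Polytexnic::Literal.literal_types`, `digest` is a hypothetical SHA-1 stand-in, and `cache_literal_sketch` is an illustrative name.

```ruby
require 'digest'

# Hypothetical stand-in for the module's `digest` helper.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

# Assumed subset of literal environment names, for illustration only.
LITERAL_TYPES = %w[verbatim equation].freeze

# Swaps every \begin{type}...\end{type} span for a cache key, in place.
def cache_literal_sketch(markdown, cache)
  LITERAL_TYPES.each do |literal|
    regex = /(\\begin\{#{Regexp.escape(literal)}\}
            .*?
            \\end\{#{Regexp.escape(literal)}\})
            /xm
    markdown.gsub!(regex) do
      key = digest($1)
      cache[key] = $1
      key
    end
  end
end

markdown = <<~'MARKDOWN'.dup
  Before
  \begin{verbatim}
  *not emphasis*
  \end{verbatim}
  After
MARKDOWN

cache = {}
cache_literal_sketch(markdown, cache)
```

Because the whole environment is replaced by an opaque key, a later Markdown pass never sees the asterisks inside it.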

#cache_math(text, cache) ⇒ Object

Caches math. Leanpub uses the notation {$$}…{/$$} for both inline and block math, with the only difference being the presence of newlines:

{$$} x^2 {/$$}  % inline

and

{$$}
x^2             % block
{/$$}

I personally hate this notation and convention, so we also support LaTeX-style \( x \) and \[ x^2 - 2 = 0 \] notation.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 154

def cache_math(text, cache)
  text.gsub!(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/) do
    key = digest($1 || $2)
    cache[[:block, key]] = $1 || $2
    key
  end
  text.gsub!(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/) do
    key = digest($1 || $2)
    cache[[:inline, key]] = $1 || $2
    key
  end
end
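
To see what ends up in the cache, the method can be run outside the module. This sketch copies the two substitutions as listed and supplies a hypothetical `digest` (SHA-1 is an assumption); the input mixes a Leanpub-style block with LaTeX-style inline math.

```ruby
require 'digest'

# Hypothetical stand-in for the module's `digest` helper.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

# Same substitutions as the listing: block math first, then inline math.
def cache_math(text, cache)
  text.gsub!(/(?:\{\$\$\}\n(.*?)\n\{\/\$\$\}|\\\[(.*?)\\\])/) do
    key = digest($1 || $2)
    cache[[:block, key]] = $1 || $2
    key
  end
  text.gsub!(/(?:\{\$\$\}(.*?)\{\/\$\$\}|\\\((.*?)\\\))/) do
    key = digest($1 || $2)
    cache[[:inline, key]] = $1 || $2
    key
  end
end

text = <<~'MARKDOWN'.dup
  Inline \( x \) and block:
  {$$}
  x^2
  {/$$}
MARKDOWN

cache = {}
cache_math(text, cache)
kinds = cache.keys.map(&:first)  # the kind recorded for each cached formula
```

The block pattern runs first, so a formula delimited by newlines is recorded as `:block` before the inline pattern has a chance to match it. The cached inline value keeps its surrounding spaces.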

#cache_raw_latex(markdown, cache) ⇒ Object

Caches raw LaTeX commands to be passed through the pipeline.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 68

def cache_raw_latex(markdown, cache)
  command_regex = /(
                    \s*\\.*\n         # Command on a single line
                    |
                    ~\\ref\{.*?\}     # reference with a tie
                    |
                    ~\\eqref\{.*?\}   # eq reference with a tie
                    |
                    \\\w+\{.*?\}      # command with one arg
                    |
                    \\\w+             # normal command
                    |
                    \\[ %&$#@]        # space or special character
                    )
                  /x
  markdown.gsub!(command_regex) do
    key = digest($1)
    cache[key] = $1
    key
  end
end
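
The alternation is easier to follow on a reduced pattern. The sketch below keeps only three of the alternatives (tied reference, one-argument command, bare command) and deliberately omits the single-line and special-character cases, so it is an illustration of the idea rather than a drop-in replacement. `digest`, `SIMPLE_COMMAND_REGEX`, and `cache_commands_sketch` are hypothetical names.

```ruby
require 'digest'

# Hypothetical stand-in for the module's `digest` helper.
def digest(string)
  Digest::SHA1.hexdigest(string)
end

# Reduced subset of the command pattern, tried left to right.
SIMPLE_COMMAND_REGEX = /(
  ~\\ref\{.*?\}     # reference with a tie
  |
  \\\w+\{.*?\}      # command with one argument
  |
  \\\w+             # bare command
  )/x

# Replaces each matched command with a key and remembers the original.
def cache_commands_sketch(markdown, cache)
  markdown.gsub!(SIMPLE_COMMAND_REGEX) do
    key = digest($1)
    cache[key] = $1
    key
  end
end

markdown = 'See Section~\ref{sec:intro} for \emph{details} and \ldots'.dup
cache = {}
cache_commands_sketch(markdown, cache)
cached_commands = cache.values
```

Here `cached_commands` lists the three commands in the order they appear, and `markdown` contains only plain words and keys. Order matters in the alternation: the one-argument form must come before the bare form, or `\emph` would be cached without its argument.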

#convert_code_inclusion(text) ⇒ Object

Adds support for <<(path/to/code) inclusion. Yes, this is a bit of a hack, but it works.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 45

def convert_code_inclusion(text)
  text.gsub!(/^\s*<<(\(.*?\))/) { "<!-- inclusion= <<#{$1}-->" }
end

#restore_inclusion(text) ⇒ Object

Restores the code inclusions commented out by #convert_code_inclusion, rewriting each marker as a PolyTeX %= inclusion line.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 48

def restore_inclusion(text)
  text.gsub(/% <!-- inclusion= (.*?)-->/) { "%= #{$1}" }
end
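
The two inclusion methods form a round trip. In this sketch the path `chapters/hello.rb` is hypothetical, and the conversion step in the middle is simulated by prefixing the comment with `% `, which is an assumption about how the LaTeX converter emits HTML comments (it is the form the regex in `restore_inclusion` expects).

```ruby
# Both methods are copied from the listings above.
def convert_code_inclusion(text)
  text.gsub!(/^\s*<<(\(.*?\))/) { "<!-- inclusion= <<#{$1}-->" }
end

def restore_inclusion(text)
  text.gsub(/% <!-- inclusion= (.*?)-->/) { "%= #{$1}" }
end

markdown = '<<(chapters/hello.rb)'.dup   # hypothetical path
convert_code_inclusion(markdown)         # hides the inclusion in an HTML comment

# Simulated conversion step (assumption): the comment comes back as a LaTeX comment.
latex = "% " + markdown

restored = restore_inclusion(latex)
puts restored  # prints: %= <<(chapters/hello.rb)
```

Note the asymmetry: `convert_code_inclusion` mutates its argument with `gsub!`, while `restore_inclusion` returns a new string.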

#restore_math(text, cache) ⇒ Object

Restores the Markdown math. This is easy because we’re running everything through our LaTeX pipeline.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 170

def restore_math(text, cache)
  cache.each do |(kind, key), value|
    case kind
    when :inline
      open  = '\('
      close =  '\)'
    when :block
      open  = '\[' + "\n"
      close = "\n" + '\]'
    end
    text.gsub!(key, open + value + close)
  end
  text
end
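
Whatever notation the author used, restored math comes back with LaTeX delimiters. The sketch below is a compact variant, not the method above: `DELIMITERS` and `restore_math_sketch` are hypothetical names, the keys are readable placeholders instead of digests, and the substitution uses the block form of `gsub!` so the cached value is inserted verbatim.

```ruby
# Delimiters per kind of math; mirrors the case statement in the listing.
DELIMITERS = {
  inline: ['\(', '\)'],
  block:  ["\\[\n", "\n\\]"]
}.freeze

# Replaces each key with its cached formula wrapped in LaTeX delimiters.
def restore_math_sketch(text, cache)
  cache.each do |(kind, key), value|
    opening, closing = DELIMITERS.fetch(kind)  # raises KeyError on an unknown kind
    text.gsub!(key) { opening + value + closing }
  end
  text
end

cache = {
  [:inline, 'KEY1'] => 'x',
  [:block,  'KEY2'] => 'x^2 - 2 = 0'
}
restored = restore_math_sketch('Solve KEY2 for KEY1.'.dup, cache)
puts restored
```

The destructuring block parameter `|(kind, key), value|` works because each cache key is a two-element array, which is how `cache_math` stores its entries.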

#restore_raw_latex(text, cache) ⇒ Object

Restores raw LaTeX from the cache.



# File 'lib/polytexnic/preprocessors/polytex.rb', line 91

def restore_raw_latex(text, cache)
  cache.each do |key, value|
    if value == '\&'
      # Bizarrely, the default code doesn't work for '\&'.
      # I actually suspect it may be a bug in Ruby. This hacks around it.
      text.gsub!(key, value.sub(/\\/, '\\\\\\'))
    else
      text.gsub!(key, value)
    end
  end
end
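
The special case for '\&' is consistent with documented `String#gsub` behavior rather than a Ruby bug: in a replacement string, `\&` is a back-reference to the entire match, so the key is replaced by itself. The sketch below compares the plain string form, the block form (whose result is inserted verbatim), and the escaping workaround from the listing; `KEY` is a hypothetical placeholder for a digest.

```ruby
key   = 'KEY'
value = '\&'   # two characters: a backslash and an ampersand

# String replacement: '\&' is interpreted as "the whole match".
as_string = 'a KEY b'.gsub(key, value)

# Block replacement: the block's return value is inserted as-is.
as_block = 'a KEY b'.gsub(key) { value }

# The workaround from the listing: double the backslash first.
escaped = 'a KEY b'.gsub(key, value.sub(/\\/, '\\\\\\'))

puts as_string  # prints: a KEY b
puts as_block   # prints: a \& b
puts escaped    # prints: a \& b
```

If this reading is right, the block form would make the special case unnecessary, since it bypasses back-reference processing for every cached value.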

#to_polytex ⇒ Object

Converts Markdown to PolyTeX. We adopt a unified approach: rather than convert “Markdown” (I use the term loosely*) directly to HTML, we convert it to PolyTeX and then run everything through the PolyTeX pipeline. Happily, kramdown comes equipped with a ‘to_latex’ method that does most of the heavy lifting. The output isn’t as clean as that produced by Pandoc (our previous choice), but it comes with significant advantages: (1) It’s written in Ruby, available as a gem, so its use eliminates an external dependency. (2) It’s the foundation for the “Markdown” interpreter used by Leanpub, so by using it ourselves we ensure greater compatibility with Leanpub books.

* <rant>The number of mutually incompatible markup languages going by the name “Markdown” is truly mind-boggling. Most of them add things to John Gruber’s original Markdown language in an ever-expanding attempt to bolt on the functionality needed to write longer documents. At this point, I fear that “Markdown” has become little more than a marketing term.</rant>



# File 'lib/polytexnic/preprocessors/polytex.rb', line 25

def to_polytex
  require 'kramdown'
  cache = {}
  math_cache = {}
  cleaned_markdown = cache_code_environments
  cleaned_markdown.tap do |markdown|
    convert_code_inclusion(markdown)
    cache_latex_literal(markdown, cache)
    cache_raw_latex(markdown, cache)
    cache_math(markdown, math_cache)
  end
  # Override the header ordering, which starts with 'section' by default.
  lh = 'chapter,section,subsection,subsubsection,paragraph,subparagraph'
  kramdown = Kramdown::Document.new(cleaned_markdown, latex_headers: lh)
  @source = restore_inclusion(restore_math(kramdown.to_latex, math_cache))
  restore_raw_latex(@source, cache)
end
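
The cache, convert, restore pattern behind this method can be sketched without kramdown. In the example below, `toy_to_latex` is a deliberately naive stand-in for the real converter (an assumption for illustration only) and `convert_with_cache` is a hypothetical name. It shows why math is hidden before conversion: the converter would otherwise treat asterisks inside a formula as emphasis.

```ruby
require 'digest'

# Toy stand-in for the Markdown-to-LaTeX converter: only handles *emphasis*.
def toy_to_latex(markdown)
  markdown.gsub(/\*(.+?)\*/) { "\\emph{#{$1}}" }
end

# Hide inline math behind digest keys, convert, then put the math back.
def convert_with_cache(markdown)
  cache = {}
  protected_text = markdown.gsub(/\\\((.*?)\\\)/) do
    key = Digest::SHA1.hexdigest($1)
    cache[key] = $1
    key
  end
  latex = toy_to_latex(protected_text)
  cache.each { |key, value| latex = latex.gsub(key) { "\\(#{value}\\)" } }
  latex
end

input = 'The product \(a*b*c\) is *important*.'

unprotected = toy_to_latex(input)          # the formula gets mangled
protected_result = convert_with_cache(input)  # the formula survives

puts unprotected       # prints: The product \(a\emph{b}c\) is \emph{important}.
puts protected_result  # prints: The product \(a*b*c\) is \emph{important}.
```

A hex digest works as a placeholder because it contains no characters that the converter treats as markup.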