Class: RDoc::Markup::ToLaTeX

Inherits:
Formatter
  • Object
Includes:
Text
Defined in:
lib/rdoc/markup/to_latex.rb

Overview

This is an RDoc converter/formatter that turns RDoc markup into LaTeX code. It’s intended for use with the RDoc::Generator::Papyrus class, but you can also use it on its own (note, however, that this class absolutely depends on RDoc’s parser). To use it, first instantiate this class and then call the #convert method on it with the text you want to convert:

f = RDoc::Markup::ToLaTeX.new
f.convert("A *bold* and +typed+ text.")

Should result in:

A \textbf{bold} and \verb~typed~ text.

If for any reason you just want to escape LaTeX control characters, you may do so by calling the #escape method. See its documentation for an example.

Some parts of this class are heavily inspired by RDoc’s own code, namely:

  • ::new

  • #handle_special_TIDYLINK

See each method’s descriptions for more details.

How to write a formatter

RDoc offers an easy-to-adapt visitor pattern for creating new formatters. “Easy” only to a certain extent: as soon as you get into inline formatting, RDoc’s documentation lacks some serious information. Nevertheless, I’ll describe the process of formatting here, even if I reiterate some of the concepts mentioned in the documentation for class RDoc::Markup::Formatter.

First, you have to derive your class from RDoc::Markup::Formatter and then, somewhat obscurely, include the RDoc::Text module, because this one is responsible for parsing inline markup.

Assuming you already wrote a generator that makes use of your formatter (without a generator, writing a formatter is a somewhat pointless undertaking, as no one instantiates the class), I’ll continue with how RDoc interacts with your formatter class.

So, somewhere in your generator you call the ::new method of your formatter (preferably inside the YourGenerator#formatter method, but I assume you know this as it belongs to writing generators and not formatters). Ensure that this method takes at least one argument called markup, which defaults to nil! So far I haven’t found out what it’s for, but the RDoc::Markup::Formatter::new method expects it, so we should obey. All other arguments are up to you; just ensure that you call super inside your initialize method with the markup parameter as its sole argument.

The next task for your initialize is to tell RDoc how to cope with the three inline formatting sequences: bold, italic and teletyped text. Call the add_tag method inherited from the Formatter class and pass it one of the following symbols along with how you want to transform the given sequence:

  • :BOLD

  • :EM

  • :TT

If you want to add so-called “specials” to your formatter (and you’re likely to, as hyperlinks are such specials), you have to dig around in RDoc’s own formatters, namely RDoc::Markup::ToHtml, and find out that there’s an instance variable called @markup that allows you to do this. Call its add_special method with a regular expression that finds your special and a name for it as a symbol. RDoc itself uses the following specials:

@markup.add_special(/((link:|https?:|mailto:|ftp:|www\.)\S+\w)/, :HYPERLINK)
@markup.add_special(/(((\{.*?\})|\b\S+?)\[\S+?\.\S+?\])/, :TIDYLINK)

If you add a special, you have to provide a handle_special_YOURSPECIAL method in your formatter, where YOURSPECIAL corresponds to the symbol you previously passed to the add_special method. This method gets passed an RDoc::Markup::Special object, of which you just need to know the text method that retrieves the text your regular expression matched. Apply whatever you want, and return a string that RDoc will incorporate into the result.

During the formatting process RDoc calls various methods on your formatter; the full list can be seen in the documentation for the class RDoc::Markup::Formatter. Note that those methods should not return a string; in fact, RDoc ignores their return values. You are expected to keep track of your formatted text yourself, e.g. by creating an instance variable @result in your initialize method and filling it with text in the methods called by RDoc.

When everything has been processed, RDoc calls the end_accepting method on your formatter instance. Its return value is expected to be the complete result, so if you used a string instance variable @result as I recommended above, you should return its value from that method.
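The accumulate-and-return contract described above can be sketched without RDoc at all. The following is only a stand-in, not RDoc's real API: ToyParagraph, ToyBlankLine, ToyDocument and ToyFormatter are hypothetical names invented for this illustration, and a real formatter would derive from RDoc::Markup::Formatter instead.

```ruby
# Hypothetical stand-ins for RDoc's document nodes. Each node knows
# which visitor method it has to call on the formatter.
class ToyParagraph
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def accept(visitor)
    visitor.accept_paragraph(self)
  end
end

class ToyBlankLine
  def accept(visitor)
    visitor.accept_blank_line(self)
  end
end

class ToyDocument
  def initialize(*parts)
    @parts = parts
  end

  # Call order as described above: start_accepting first, one accept_*
  # call per node, and end_accepting last. Only the return value of
  # end_accepting is used.
  def accept(visitor)
    visitor.start_accepting
    @parts.each { |part| part.accept(visitor) }
    visitor.end_accepting
  end
end

# Hypothetical formatter that collects everything in @result.
class ToyFormatter
  def start_accepting
    @result = String.new
  end

  def accept_paragraph(par)
    @result << par.text << "\n"
    nil # the return value is ignored by the caller
  end

  def accept_blank_line(_line)
    @result.chomp!
    @result << "\n\n"
    nil
  end

  def end_accepting
    @result
  end
end

doc = ToyDocument.new(ToyParagraph.new("First."), ToyBlankLine.new, ToyParagraph.new("Second."))
output = doc.accept(ToyFormatter.new)
print output
```

The accept_* methods return nil on purpose to underline that only end_accepting's return value matters.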

Inline formatting

This isn’t as hard as I suggested earlier, but you have to know what to do; otherwise you’ll be stuck with paragraphs being treated as a whole, without any inline formatting happening. To get inline formatting, you have to define a method that initiates the inline formatting process. In RDoc’s HTML formatter this method is RDoc::Markup::ToHtml#to_html, so you may choose a name fitting that naming scheme (I did for this formatter as well, but the to_latex method is private). You then call this method inside your accept_paragraph method with the paragraph’s text as its argument. The content of the method cannot be known unless you dig around in RDoc’s formatter sources. It’s the following:

convert_flow(@am.flow(paragraph_text_here))

So, what does this do? It uses the superclass’s (undocumented) instance variable @am, which is an instance of RDoc::Markup::AttributeManager and is responsible for keeping track of which inline text attributes to apply where. It has this magic method called flow, which takes one argument: the text of the paragraph you want to format. It tokenizes the paragraph into little pieces, some RDoc tokens and plain strings, and returns them as an array (yes, this was the inline parsing process). We then take that token array and pass it directly to the convert_flow method (inherited from the Formatter class), which knows how to handle the token sequence and comes back to your formatter instance each time it wants to format something, bold or teletyped text for instance (remember? You defined that with add_tag).

If you want to format plain text without any special markup as well (I had to for the LaTeX formatter, because for LaTeX several characters have to be escaped even in nonformatted text, e.g. the underscore), you have to provide the method convert_string. It gets passed all strings that don’t have any markup applied; its return value ends up in the final result.
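To make the idea of a token flow more tangible, here is a toy model of it. It is deliberately not RDoc's implementation: the token shapes ([:on, tag] and [:off, tag] arrays), the TOY_TAGS table and the toy_* methods are assumptions made up for this sketch, and RDoc's real token classes look different.

```ruby
# Toy tag table, playing the role of what add_tag registers.
TOY_TAGS = {
  BOLD: ["\\textbf{", "}"],
  EM:   ["\\textit{", "}"],
  TT:   ["\\verb~", "~"]
}.freeze

# Stand-in for convert_string: called for every plain string in the flow.
# Only the underscore is escaped here to keep the sketch short.
def toy_convert_string(str)
  str.gsub("_") { "\\textunderscore{}" }
end

# Stand-in for convert_flow: walks the token array and emits either the
# converted plain text or the opening/closing sequence of a tag.
def toy_convert_flow(tokens)
  tokens.map do |token|
    case token
    when String
      toy_convert_string(token)
    when Array
      state, tag = token
      opening, closing = TOY_TAGS.fetch(tag)
      state == :on ? opening : closing
    end
  end.join
end

# Roughly what a tokenizer could produce for "A *bold* and snake_case text."
tokens = ["A ", [:on, :BOLD], "bold", [:off, :BOLD], " and snake_case text."]
latex = toy_convert_flow(tokens)
puts latex
```

The point of the sketch is the division of labour: the tokenizer decides where attributes start and stop, while the formatter only supplies the tag sequences and the plain-string conversion.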

Direct Known Subclasses

ToLaTeX_Crossref

Constant Summary

LIST_TYPE2LATEX =

Maps RDoc’s list types to the corresponding LaTeX ones.

{
  :BULLET => ["\\begin{itemize}", "\\end{itemize}"],
  :NUMBER => ["\\begin{enumerate}", "\\end{enumerate}"],
  :LABEL  => ["\\begin{description}", "\\end{description}"],
  :UALPHA => ["\\begin{ualphaenum}", "\\end{ualphaenum}"],
  :LALPHA => ["\\begin{lalphaenum}", "\\end{lalphaenum}"],
  :NOTE   => ["\\begin{description}", "\\end{description}"]
}.freeze
LATEX_HEADINGS =

LaTeX heading commands. 0 is nil as there is no zeroth heading.

[nil,                         #Dummy, no hash needed with this
"\\section{%s}",             #h1
"\\subsection{%s}",          #h2
"\\subsubsection{%s}",       #h3
"\\subsubsubsection{%s}",    #h4
"\\microsection*{%s}",       #h5
"\\paragraph*{%s.} ",        #h6
"\\subparagraph*{%s}",       #Needed??
"%s", "%s", "%s", "%s", "%s", "%s"].freeze
LATEX_OPT_HEADINGS =

LaTeX heading commands for headings with an optional argument to change the TOC entry. These only go up to level 4, because deeper headings don’t show up in the TOC at all.

[nil,
"\\section[%s]{%s}",
"\\subsection[%s]{%s}",
"\\subsubsection[%s]{%s}",
"\\subsubsubsection[%s]{%s}"
]
LATEX_SPECIAL_CHARS =

Characters that need to be escaped for LaTeX and their corresponding escape sequences. Note the order is important, otherwise some things (especially \ and {}) are escaped twice.

{
  /\\/    => "\\textbackslash{}",
  /\$/    => "\\$",
  /#/     => "\\#",
  /%/     => "\\%",
  /\^/    => "\\^",
  /&/     => "\\\\&", #In a gsub replacement string, \& stands for the whole match, so the backslash has to be doubled
  /(?<!textbackslash){/  => "\\{",
  /(?<!textbackslash{)}/ => "\\}",
  /_/     => "\\textunderscore{}",
  /\.{3}/ => "\\ldots{}",
  /~/     => "\\~",
  /©/     => "\\copyright{}",
  /LaTeX/ => "\\LaTeX{}"
}.freeze
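Why the /&/ entry needs a doubled backslash: in the replacement string of String#gsub, the sequence \& is a back-reference to the whole match, so replacing & by \& simply re-inserts the ampersand. The following standalone snippet (the sample text is made up) shows the three variants side by side.

```ruby
text = "Fish & Chips"

# "\\&" is the two characters \& and is read by gsub as "the whole match".
unchanged = text.gsub(/&/, "\\&")

# "\\\\&" is the three characters \\& : an escaped backslash, then a plain &.
escaped = text.gsub(/&/, "\\\\&")

# The block form does not interpret backslash sequences at all.
escaped_via_block = text.gsub(/&/) { "\\&" }

puts unchanged         # Fish & Chips
puts escaped           # Fish \& Chips
puts escaped_via_block # Fish \& Chips
```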

Constructor Details

#initialize(heading_level = 0, inputencoding = "UTF-8", markup = nil) ⇒ ToLaTeX

Instantiates this formatter.

Parameters

heading_level

Minimum heading level. Useful for context-based headings; a value of 1 indicates that all requested level 2 headings are turned into level 3 ones, a value of 2 would turn them into level 4 ones.

markup

Parameter expected by the superclass. TODO: What for?

Return value

A new instance of this class.

Example

f = RDoc::Markup::ToLaTeX.new
puts f.convert("Some *bold* text") #=> Some \textbf{bold} text

Remarks

Some lines of this method have their origin in the RDoc project. See the code for more details.



# File 'lib/rdoc/markup/to_latex.rb', line 234

def initialize(heading_level = 0, inputencoding = "UTF-8", markup = nil)
  super(markup)
  
  @heading_level = heading_level
  @inputencoding = inputencoding
  @result = ""
  @list_in_progress = nil
  
  #Copied from RDoc 3.8, adds link capabilities
  @markup.add_special(/((link:|https?:|mailto:|ftp:|www\.)\S+\w)/, :HYPERLINK)
  @markup.add_special(/(((\{.*?\})|\b\S+?)\[\S+?\.\S+?\])/, :TIDYLINK)

  #Add definitions for inline markup
  add_tag(:BOLD, "\\textbf{", "}")
  add_tag(:TT, "\\verb~", "~")
  add_tag(:EM, "\\textit{", "}")
end

Instance Attribute Details

#heading_level ⇒ Object (readonly)

Level relative to which headings are produced by this formatter. E.g., if this is 1 and the user requests a level 2 heading, they actually get a level 3 one.



# File 'lib/rdoc/markup/to_latex.rb', line 211

def heading_level
  @heading_level
end

#list_in_progress ⇒ Object (readonly)

The innermost type of list we’re currently in, or nil if no list is being processed at the moment.



# File 'lib/rdoc/markup/to_latex.rb', line 217

def list_in_progress
  @list_in_progress
end

#result ⇒ Object (readonly) Also known as: res

Contains everything processed so far as a string.



# File 'lib/rdoc/markup/to_latex.rb', line 213

def result
  @result
end

Instance Method Details

#accept_blank_line(line) ⇒ Object

Terminates a paragraph by inserting two newlines.



# File 'lib/rdoc/markup/to_latex.rb', line 319

def accept_blank_line(line)
  @result.chomp!
  @result << "\n\n"
end

#accept_heading(head) ⇒ Object

Adds a fitting section, subsection, etc. for the heading.



# File 'lib/rdoc/markup/to_latex.rb', line 325

def accept_heading(head)
  #Verbatim text inside headings is one of LaTeX’s ways to hell.
  #We need to take special care of this by means of fancyvrb’s
  #\SaveVerb command plus suppressing the verbatim inside the TOC.
  hsh = save_verbs(enc(head.text))
  
  if hsh[:save_verbs].empty?
    @result << sprintf(LATEX_HEADINGS[@heading_level + head.level], hsh[:save_inline]) << "\n"
  else #OK, some fool must have verbatim in the heading...
    @result << hsh[:save_verbs]
    heading = LATEX_OPT_HEADINGS[@heading_level + head.level]
    if heading
      @result << sprintf(heading, hsh[:plain_inline], hsh[:save_inline]) << "\n"
    else #Heading not in TOC
      @result << sprintf(LATEX_HEADINGS[@heading_level + head.level], hsh[:save_inline]) << "\n"
    end
  end
end
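How the heading offset and the format templates interact can be tried out in isolation. The sketch below uses a shortened copy of the heading table; HEADINGS and heading_for are names chosen for this illustration only, and the explicit ArgumentError is an addition of the sketch (the method above does not check for a missing template).

```ruby
# Shortened copy of the heading table; index 0 is unused.
HEADINGS = [
  nil,
  "\\section{%s}",
  "\\subsection{%s}",
  "\\subsubsection{%s}",
  "\\subsubsubsection{%s}"
].freeze

# offset plays the role of @heading_level, level that of head.level.
def heading_for(offset, level, title)
  template = HEADINGS[offset + level]
  raise ArgumentError, "no heading template for level #{offset + level}" unless template
  format(template, title)
end

puts heading_for(0, 2, "Usage") # \subsection{Usage}
puts heading_for(1, 2, "Usage") # \subsubsection{Usage}
```

With an offset of 1 the requested level 2 heading lands on the level 3 template, which is exactly the shift the heading_level parameter describes.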

#accept_list_end(list) ⇒ Object

Adds \end{list_type}.



# File 'lib/rdoc/markup/to_latex.rb', line 289

def accept_list_end(list)
  @result << LIST_TYPE2LATEX[list.type][1] << "\n"
  @list_in_progress = nil
end

#accept_list_item_end(item) ⇒ Object

Adds a terminating \n for a list item if this is necessary (usually the newline is automatically created by processing the list paragraph).



# File 'lib/rdoc/markup/to_latex.rb', line 314

def accept_list_item_end(item)
  @result << "\n" unless @result.end_with?("\n")
end

#accept_list_item_start(item) ⇒ Object

Adds \item, with the item’s label as optional argument if it has one.



# File 'lib/rdoc/markup/to_latex.rb', line 295

def accept_list_item_start(item)
  if item.label
    #Verbatim inside list labels is dangerous!
    hsh = save_verbs(enc(item.label))
    @result << hsh[:save_verbs]
    
    if @list_in_progress == :NOTE
      @result << "\\item[#{hsh[:save_inline]}:] " #Newline done by ending paragraph
    else
      @result << "\\item[#{hsh[:save_inline]}] " #Newline done by ending paragraph
    end
  else
    @result << "\\item " #Newline done by ending method
  end
end

#accept_list_start(list) ⇒ Object

Adds \begin{list_type}.



# File 'lib/rdoc/markup/to_latex.rb', line 283

def accept_list_start(list)
  @list_in_progress = list.type
  @result << LIST_TYPE2LATEX[list.type][0] << "\n"
end
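The four list callbacks above always fire in the same sequence: list start, then item start, item content and item end for each item, then list end. As a rough illustration, that sequence can be flattened into one method. LIST_ENVS is a reduced copy of the list type table, and render_list, with its items given as [label, text] pairs, is a hypothetical helper that does not exist in this class.

```ruby
# Reduced copy of the list type table.
LIST_ENVS = {
  BULLET: ["\\begin{itemize}", "\\end{itemize}"],
  LABEL:  ["\\begin{description}", "\\end{description}"],
  NOTE:   ["\\begin{description}", "\\end{description}"]
}.freeze

# items is an array of [label_or_nil, text] pairs.
def render_list(type, items)
  out = String.new
  out << LIST_ENVS[type][0] << "\n"            # list start
  items.each do |label, text|
    if label
      suffix = type == :NOTE ? ":" : ""        # note lists get a trailing colon
      out << "\\item[#{label}#{suffix}] "      # item start with label
    else
      out << "\\item "                         # item start without label
    end
    out << text
    out << "\n" unless out.end_with?("\n")     # item end
  end
  out << LIST_ENVS[type][1] << "\n"            # list end
end

puts render_list(:BULLET, [[nil, "one"], [nil, "two"]])
puts render_list(:NOTE, [["Ruby", "A language."]])
```

Unlike the real callbacks, this sketch neither escapes the label nor protects verbatim text inside it.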

#accept_paragraph(par) ⇒ Object

Adds par’s text plus newline to the result.



# File 'lib/rdoc/markup/to_latex.rb', line 267

def accept_paragraph(par)
  @result << to_latex(enc(par.text)) << "\n"
end

#accept_raw(raw) ⇒ Object

Writes the raw thing as-is into the document.



# File 'lib/rdoc/markup/to_latex.rb', line 345

def accept_raw(raw)
  @result << raw.parts.join("\n")
end

#accept_rule(rule) ⇒ Object

Adds a rule. The rule’s height is rule.weight pt, the rule’s width \textwidth.



# File 'lib/rdoc/markup/to_latex.rb', line 278

def accept_rule(rule)
  @result << "\\par\\noindent\\rule{\\textwidth}{" << rule.weight.to_s << "pt}\\par\n"
end

#accept_verbatim(ver) ⇒ Object

Puts ver’s text between \begin{Verbatim} and \end{Verbatim}.



# File 'lib/rdoc/markup/to_latex.rb', line 272

def accept_verbatim(ver)
  @result << "\\begin{Verbatim}\n" << enc(ver.text).chomp << "\n\\end{Verbatim}\n"
end

#convert_string(str) ⇒ Object

Called for each plaintext string in a paragraph by the #convert_flow method called in #to_latex.



# File 'lib/rdoc/markup/to_latex.rb', line 356

def convert_string(str)
  if in_tt?
    enc(str)
  else
    escape(enc(str))
  end
end

#end_accepting ⇒ Object

Last method called. Supposed to return the result string.



# File 'lib/rdoc/markup/to_latex.rb', line 262

def end_accepting
  @result
end

#escape(str) ⇒ Object

Escapes all LaTeX control characters from a string.

Parameter

str

The string whose LaTeX control characters should be escaped.

Return value

A new string with many backslashes. :-)

Example

f = RDoc::Markup::ToLaTeX.new
str = "I paid 20$ to buy the_item #15."
puts f.escape(str) #=> I paid 20\$ to buy the\textunderscore{}item \#15.


# File 'lib/rdoc/markup/to_latex.rb', line 386

def escape(str)
  result = str.dup
  LATEX_SPECIAL_CHARS.each_pair do |regexp, escape_seq|
    result.gsub!(regexp, escape_seq)
  end
  result
end
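Because the rules are applied one after another to the same string, their order decides whether the output of an earlier rule gets escaped again by a later one. The following sketch runs a reduced rule set in both orders. BACKSLASH_FIRST, BRACES_FIRST and apply_rules are names made up for this demonstration, and the braces in the patterns are escaped here for clarity.

```ruby
# The same three rules in two different orders.
BACKSLASH_FIRST = {
  /\\/                     => "\\textbackslash{}",
  /(?<!textbackslash)\{/   => "\\{",
  /(?<!textbackslash\{)\}/ => "\\}"
}.freeze

BRACES_FIRST = {
  /(?<!textbackslash)\{/   => "\\{",
  /(?<!textbackslash\{)\}/ => "\\}",
  /\\/                     => "\\textbackslash{}"
}.freeze

# Applies every rule in hash order to the result of the previous one.
def apply_rules(str, rules)
  rules.reduce(str) { |acc, (regexp, replacement)| acc.gsub(regexp, replacement) }
end

input = 'a\b{c}'
good = apply_rules(input, BACKSLASH_FIRST)
bad  = apply_rules(input, BRACES_FIRST)

puts good # a\textbackslash{}b\{c\}
puts bad  # a\textbackslash{}b\textbackslash{}{c\textbackslash{}}
```

In the second order the backslashes introduced by the brace rules are themselves turned into \textbackslash{}, which is the double escaping the constant's description warns about.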

#handle_special_HYPERLINK(special) ⇒ Object

Handles raw hyperlinks.



# File 'lib/rdoc/markup/to_latex.rb', line 350

def handle_special_HYPERLINK(special)
  make_url(special.text)
end

#handle_special_TIDYLINK(special) ⇒ Object

Method copied from RDoc project and slightly modified.

Handles hyperlinks of form {text}[url] and text[url].



# File 'lib/rdoc/markup/to_latex.rb', line 367

def handle_special_TIDYLINK(special)
  text = enc(special.text)

  return escape(text) unless text =~ /\{(.*?)\}\[(.*?)\]/ or text =~ /(\S+)\[(.*?)\]/

  label = $1
  url   = $2
  make_url url, escape(label)
end
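The two regular expressions in the method above can be exercised on their own. Since make_url is not shown in this section, the sketch stops at splitting the link into label and URL; split_tidylink is a hypothetical helper name and the URLs are sample values.

```ruby
# Returns [label, url] for a tidy link, or nil if text is not one.
# The braced form is tried first, then the single-word form.
def split_tidylink(text)
  match = text.match(/\{(.*?)\}\[(.*?)\]/) || text.match(/(\S+)\[(.*?)\]/)
  match && [match[1], match[2]]
end

p split_tidylink("{Ruby home}[http://www.ruby-lang.org]") # ["Ruby home", "http://www.ruby-lang.org"]
p split_tidylink("Ruby[http://www.ruby-lang.org]")        # ["Ruby", "http://www.ruby-lang.org"]
p split_tidylink("no link here")                          # nil
```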

#start_accepting ⇒ Object

First method called.



# File 'lib/rdoc/markup/to_latex.rb', line 257

def start_accepting
  @result = ""
end