Quarto

Yet another ebook generation toolchain.

"Quarto" is a bookbinding term for a sheet folded into four leaves, and this is my fourth attempt at an ebook toolchain.

Requirements

Quarto depends on several external programs which you will need to install before using it.

  • Pandoc
  • Pygments
  • xmllint
  • PrinceXML

Installation/Usage

  1. Install the gem (gem install quarto)
  2. Create a Rakefile in your book project root.
  3. Add require "quarto/tasks" to the top of the Rakefile.
  4. Run rake -T to see the available tasks.

Concepts

Quarto is a set of Rake tasks backed by a Ruby library, which in turn relies heavily on Nokogiri and a number of external tools.
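As a rough illustration of what "a set of Rake tasks" means in practice, the sketch below wires up a small dependency chain and then asks Rake for the final product. The task names (sections, codex, master, pdf) are hypothetical and chosen only for illustration; they are not necessarily the names Quarto defines. Run rake -T to see the real ones.

```ruby
require "rake"

# Records the order in which the tasks actually run.
RUN_ORDER = []

# Hypothetical task names, for illustration only.
Rake::Task.define_task(:sections) { RUN_ORDER << :sections }
Rake::Task.define_task(codex: :sections) { RUN_ORDER << :codex }
Rake::Task.define_task(master: :codex) { RUN_ORDER << :master }
Rake::Task.define_task(pdf: :master) { RUN_ORDER << :pdf }

# Asking for the end product makes Rake walk the chain backwards and
# run each prerequisite before the task that depends on it.
Rake::Task[:pdf].invoke

p RUN_ORDER # => [:sections, :codex, :master, :pdf]
```

In a real Rakefile the same chain would normally be written with the task and file helpers, and file tasks would additionally skip any step whose output is already newer than its inputs.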

XHTML5 is king

The central philosophy of Quarto is to do as much work as possible with XHTML5 files. All input formats (e.g. Markdown) are first converted to XHTML5 before any other work is done. Then various transformations occur. Finally, at the end of the line, an XHTML5 "master" file is converted to various deliverable formats such as PDF.

The reason for this philosophy is simple: Nokogiri makes it really easy to perform arbitrary semantic transformations on XHTML documents, without a lot of tedious text munging. The more of the work that is done on DOM trees, the easier it becomes.
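To give a feel for the kind of DOM-level transformation described above, here is a minimal sketch that adds an id attribute to every h1 in a small XHTML document. It uses REXML, which ships with standard Ruby installations, rather than Nokogiri, purely so that it runs without extra gems. The helper names (walk_elements, add_heading_ids) and the id scheme are assumptions made for illustration and are not part of Quarto.

```ruby
require "rexml/document"

SAMPLE_XHTML = <<~XML
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Sample</title></head>
    <body>
      <h1>Getting Started</h1>
      <p>Some text.</p>
      <h1>Next Steps</h1>
    </body>
  </html>
XML

# Visit every element below +node+, depth first.
def walk_elements(node, &block)
  node.elements.each do |child|
    block.call(child)
    walk_elements(child, &block)
  end
end

# Hypothetical transformation: give each h1 an id derived from its text.
def add_heading_ids(xhtml)
  doc = REXML::Document.new(xhtml)
  walk_elements(doc) do |el|
    next unless el.name == "h1"
    slug = el.text.to_s.strip.downcase.gsub(/[^a-z0-9]+/, "-")
    el.add_attribute("id", slug)
  end
  doc
end

doc = add_heading_ids(SAMPLE_XHTML)
puts doc
```

With Nokogiri the same idea would typically be expressed through CSS or XPath selectors. Either way, the change is made on elements in a tree rather than by pattern-matching on raw markup.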

The assembly line

Quarto is a set of Rake tasks, so execution normally starts with an end product and works backwards through the dependency chain to figure out what needs to be done to produce that product. However, it's probably easier to understand everything Quarto does by viewing it as an assembly line starting with source files and ending with deliverables. Here are the steps along the way.

Note that all files generated by Quarto are placed in a build subdirectory of your project's root. It will be created if needed.

  1. Source files. These are manuscript files in supported source formats (currently only Markdown). They might be in the root of your project, or in subdirectories.
  2. Source files are exported into export files in build/export. Export files are HTML, produced using whatever tool is appropriate for the input format. E.g. pandoc is used to export Markdown source files to HTML equivalents.
  3. The export files are then normalized into XHTML section files (in build/sections). During normalization, any idiosyncrasies in the HTML produced by the export tool are dealt with.
  4. A spine file is generated. This XHTML file will be used to tie together all of the section files. The body of this file contains references to (but not the content of) all of the section files. It also contains stylesheets and other metadata.
  5. The spine file is then expanded into an XHTML codex file. This file contains the body content of all of the section files. Only body content is taken from the section files; everything else is ignored. From this point forward, all operations are done on monolithic files rather than on partial files corresponding to the original sources.
  6. The codex file is searched for source code listings. Each listing is extracted as text into a listing file (in build/listings). Listing files are named based on the SHA1 of the listing and its language, e.g. build/listings/3361c5f02e08bd44bde2d42633a2c9be201f7ec4.rb. Using the SHA1 in naming is an optimization which ensures that only changed code listings ever need to be re-highlighted (see the next step). During this step a skeleton file is also created. This XHTML file mirrors the codex file, except that all of the source code listings have been replaced with references to highlight files (see the next step).
  7. The next step is to perform source code highlighting on the listing files, using Pygments. This produces highlight files, which are HTML files in the build/highlights directory. They are named based on the SHA1 of the corresponding code listing.
  8. The skeleton file and the highlight files are then stitched back together into a master file. This XHTML file is the "gold standard" from which all deliverables will be generated.
  9. Deliverable files suitable for distributing to end-users, such as PDF or EPUB files, are produced from the master file.
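The SHA1-based naming in steps 6 and 7 can be sketched as follows. This is only an illustration of the idea, not Quarto's actual code. The helper names, the language-to-extension table, and the assumption that the digest covers only the listing text (with the language supplying the file extension) are all mine.

```ruby
require "digest/sha1"

# Assumed mapping from language name to file extension (illustrative only).
EXTENSIONS = { "ruby" => "rb", "python" => "py" }.freeze

# Hypothetical helper: where the raw text of a listing would be stored.
def listing_path(code, language, dir: "build/listings")
  ext = EXTENSIONS.fetch(language, "txt")
  File.join(dir, "#{Digest::SHA1.hexdigest(code)}.#{ext}")
end

# Hypothetical helper: where the highlighted HTML for a listing would go.
def highlight_path(code, dir: "build/highlights")
  File.join(dir, "#{Digest::SHA1.hexdigest(code)}.html")
end

# A listing needs highlighting only if no highlight file exists for its digest.
def needs_highlighting?(code, existing_files)
  !existing_files.include?(highlight_path(code))
end

original = "puts 'hello'\n"
edited   = "puts 'goodbye'\n"
existing = [highlight_path(original)] # pretend this was highlighted earlier

puts listing_path(original, "ruby")
puts needs_highlighting?(original, existing) # false: same text, same digest
puts needs_highlighting?(edited, existing)   # true: changed text, new digest
```

Because the file name is derived from the content, an unchanged listing always maps to the same path, so editing one listing in a book does not force every other listing through the highlighter again.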

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new pull request