
This gem processes Asciidoctor documents and outputs an XML representation of the document, intended as a document model for ISO International Standards. The XML representation can then be processed in turn to generate PDF or Microsoft Word output (via DocBook).

The document model is intended to introduce rigour into the ISO standards authoring process; the existing Microsoft Word template from ISO does not support such rigour down to the element level. The ISO International Standard format is prescribed in ISO/IEC DIR 2 "Principles and rules for the structure and drafting of ISO and IEC documents", to a level that is amenable to an explicit document model. A formal document model would allow checking for consistency in format and content, and expedite authoring and quality control of ISO standards.

The document model ("ISO XML") is under development, but it already contains all the markup needed to render the "Rice document", the ISO’s model document of an international standard. It is expressed as a Relax NG Compact schema; actual validation occurs against its full Relax NG counterpart. A UML representation of the document model is given below. Note that the document model is currently still in the exploratory phase, and will likely be changing significantly.

Asciidoctor has been selected as the authoring tool to generate the document model representation of ISO standards. It is a document formatting tool like Markdown and DocBook, which combines the relative ease of use of the former (using relatively lightweight markup) with the rigour and expressiveness of the latter (it has a well-defined syntax, and was in fact initially developed as a DocBook document authoring tool). Asciidoctor has built-in capability to output Text, DocBook and HTML, so it can be used to preview the file as it is being authored.

Note that in order to generate output close to what is intended, the Asciidoc document includes a fair amount of formatting instructions (e.g. disabling section numbering where appropriate, the titling of Appendixes as Annexes), as well as ISO boilerplate text, and predefined section headers (sections are recognised by fixed titles such as Normative References). Authoring ISO standards in this fashion assumes that users will be populating an Asciidoc template, and not removing needed formatting instructions.

Features not visible in HTML preview

The gem uses built-in Asciidoc formatting as much as possible, so that users can retain the ability to preview documents; for Terms and Definitions clauses, which have a good deal of explicit structure, macros have been introduced for semantic markup (admitted terms, deprecated terms, etc). The default HTML output of an Asciidoc-formatted ISO document is quite close to the intended final output, with the following exceptions:

  • Terms and Definitions: each term is marked up as an unnumbered subclause; the semantic markup of alternate and other terms is not rendered visually.

  • Formulas: Asciidoctor has no provision for the automated numbering of isolated block formulas ("stem"), and its default HTML processor does not display the number assigned to a block formula, although it does provide automated numbering of examples. The encoding of formulas may change in future versions, although the final numbering is meant to be provided by downstream tools processing the ISO XML output.

  • Missing elements: The document model does not yet include Asciidoc elements that do not occur in the Rice document: in particular, source code, definition lists (except when used as keys for formulas or figures), examples (as distinct from figures; examples within Terms and Definitions are catered for), sidebars (as distinct from warnings), and quotes.

  • Markup: Some connecting text which is used to convey markup structure is left out: in particular, DEPRECATED and SOURCE (replaced by formatting macros).

  • Tables: Table footnotes are treated like all other footnotes: they are rendered at the bottom of the document, rather than the bottom of the table, and they are not numbered separately.

  • Crossreferences: Footnoted crossreferences are indicated with the reference text fn in isolation, or fn: as a prefix to the reference text. The default HTML processor leaves these as is: if no reference text is given, only fn will be displayed (though it will still hyperlink to the right reference).

  • References: The convention for references is that ISO documents are cited without brackets by ISO number, and optionally year, whether they are normative or in the bibliography (e.g. ISO 20483:2013); while all other references are cited by bracketed number in the bibliography (e.g. [1]). The default HTML processor treats all references the same, and will bracket them (e.g. [ISO 20483:2013]). For the same reason, ISO references listed in the bibliography will be listed under an ISO reference, rather than a bracketed number.

  • References: References are rendered the same way wherever they are cited, since citation rendering is automated. For that reason, if reference is to be made to both an undated and a dated version of an ISO reference, these need to be explicitly listed as separate references. (This is not done in the Rice model document, which lists ISO 6646, but under Terms and Definitions cites the dated ISO 6646:2011.)

  • References: ISO references that are undated but published have their date indicated under the ISO standards format in an explanatory footnote. Because of constraints introduced by Asciidoctor, that explanation is instead given in square brackets in Asciidoc format.

  • Annexes: Subheadings cannot preserve subsection numbering, while also appearing inline with their text (e.g. Rice document, Annex B.2): they appear as headings in separate lines.

  • Annexes: Crossreferences to Annex subclauses are automatically prefixed with Clause rather than Annex or nothing.

  • Metadata: Document metadata such as document numbers, technical committees and title wording are not rendered in the default HTML output.

  • Patent Notice: Patent notices are treated and rendered as a subsection of the introduction, with an explicit subheading.

  • Numbering: The numbering of figures and tables is sequential in the default HTML processor: it does not include the Clause or Annex number. Thus: Figure 1, not Figure A.1.

  • Notes: There is no automatic note numbering by the default HTML processor.

  • Keys: Keys to formulas and figures are expected to be marked up as definition lists consistently, rather than as inline prose.

  • Figures: Simple figures are marked up as images, figures containing subfigures as examples. Numbering by the default HTML processor may be inconsistent. Subfigures are automatically numbered as independent figures.

  • Markup: The default HTML processor does not support CSS extensions such as small caps or strike through, though these can be marked up as CSS classes through custom macros in Asciidoc: a custom CSS stylesheet will be needed to render them.

TODO: May need to only encode figures as examples.

Document Attributes

The gem also relies on Asciidoc document attributes to provide necessary metadata about the document. These include:

:docnumber:

The ISO document number (mandatory)

:tc-docnumber:

The document number assigned by the technical committee

:ref-docnumber:

The reference document number (appearing in page headers)

:partnumber:

The ISO document part number

:edition:

The document edition

:revdate:

The date the document was last updated

:copyright-year:

The year to be stated as the year in which copyright for the document was issued

:title-intro-en:

The introductory component of the English title of the document

:title-main-en:

The main component of the English title of the document (mandatory). The first line of the Asciidoc document, which contains the title introduced with =, is ignored.

:title-part-en:

The English title of the document part

:title-intro-fr:

The introductory component of the French title of the document. (This document template presupposes authoring in English; a different template will be needed for French, including French titles of document components such as annexes.)

:title-main-fr:

The main component of the French title of the document (mandatory).

:title-part-fr:

The French title of the document part

:doctype:

The document type (see ISO deliverables: The different types of ISO publications) (mandatory). The permitted types are: international-standard, technical-specification, technical-report, publicly-available-specification, international-workshop-agreement, guide.

:docstage:

The stage code for the document status (see International harmonized stage codes)

:docsubstage:

The substage code for the document status (see International harmonized stage codes)

:secretariat:

The national body acting as the secretariat for the document in the drafting stage

:technical-committee-number:

The number of the relevant ISO technical committee

:technical-committee:

The name of the relevant ISO technical committee (mandatory)

:subcommittee-number:

The number of the relevant ISO subcommittee

:subcommittee:

The name of the relevant ISO subcommittee

:workgroup-number:

The number of the relevant ISO workgroup

:workgroup:

The name of the relevant ISO workgroup

:language:

The language of the document (en or fr) (mandatory)
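As a rough illustration of the attributes listed above, the following sketch writes a document header that sets only the attributes marked mandatory. The file name `sample-header.adoc` and every value are placeholders chosen for this example, not real ISO data, and the resulting file is not claimed to be a complete ISO document that would pass validation.

```shell
# Write a minimal Asciidoc header setting only the attributes marked
# "mandatory" in the list above. File name and values are placeholders.
cat > sample-header.adoc <<'EOF'
= Placeholder title (this first line is ignored by the gem)
:docnumber: 00000
:title-main-en: Placeholder main title
:title-main-fr: Titre principal fictif
:doctype: international-standard
:technical-committee: Placeholder committee
:language: en

== Scope

Placeholder text.
EOF

# List the attribute names that the header sets.
grep -o '^:[a-z-]*:' sample-header.adoc
```

Optional attributes such as :partnumber: or :edition: would be added to the same header block, one per line, with no blank lines between them.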

The gem translates the document into ISO XML format, and then validates its output against the ISO XML document model. Validation errors are reported to the console against the XML output; they are intended to help users check that they have provided all necessary components of the document.

The attribute :draft:, if present, includes review notes in the XML output; these are otherwise suppressed.

Usage

$ asciidoctor a.adoc  # HTML output of Asciidoc file
$ asciidoctor -b iso -r 'asciidoctor-iso' a.adoc  # ISO XML output
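Since :draft: is an ordinary document attribute, it can presumably also be set from the command line with Asciidoctor's standard `-a` option rather than in the document header. The sketch below wraps the ISO XML command from above in a helper; `build_draft` is a hypothetical name used only for this example, and it assumes that `asciidoctor`, the `asciidoctor-iso` gem, and a file `a.adoc` are all available.

```shell
# Hypothetical helper: build ISO XML with the draft attribute set,
# so that review notes are included in the output.
build_draft() {
  asciidoctor -b iso -r 'asciidoctor-iso' -a draft "$1"
}

# Only attempt the conversion when the tool and the input file exist.
if command -v asciidoctor >/dev/null 2>&1 && [ -f a.adoc ]; then
  build_draft a.adoc || echo "conversion failed; check the console output" >&2
else
  echo "skipping: asciidoctor or a.adoc not found" >&2
fi
```

Omitting `-a draft` gives the plain ISO XML command shown above, with review notes suppressed.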

Document model

[Figures: UML representation of the document model (grammar1, grammar2, grammar3, grammar4)]

Examples

The gem has been tested to date against the "Rice document", the ISO’s model document of an international standard. This repository includes: