TeXLogParser

This small Ruby gem eases many pains around digesting logs from (La)TeX engines. Used as a command-line program or library, it converts (La)TeX logs into human- or machine-readable forms.

Disclaimer: Due to the nature of (La)TeX logs, parsing is inherently heuristic.

Installation

On any system with working Ruby (≥ 2.3), installation is as simple as this:

[sudo] gem install tex_log_parser

The usual options and update mechanisms of RubyGems apply; please refer to its documentation for details.

Usage

There are two ways to parse logs: with the command-line program and via the underlying Ruby API.

Command-line Interface

By default, texlogparser reads from stdin and writes to stdout. That is, you can use it like so:

pdflatex -interaction=nonstopmode example.tex | texlogparser

This adds so little runtime overhead that there are few reasons not to use it. Note that the original log file will still be written to example.log, so no information is lost.

Important: Without nonstopmode, pdflatex et al. stop on errors to interact with the user; texlogparser is not prepared to play the middle man for that and will block.

You can also read from and/or write to files:

texlogparser -i example.log                          # From file, to stdout
texlogparser -i example.log -o example.simple.log    # From and to file
cat example.log | texlogparser -o example.simple.log # From stdin, to file

If you want to use the output programmatically, you may want to add option -f json. It does just what it sounds like.
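As a rough sketch of such programmatic use, the following Ruby snippet counts messages per level. It assumes that the JSON output is an array of objects that each carry a "level" key; that shape and the sample data are assumptions made for illustration, so check the actual output of your installed version first.

```ruby
require 'json'

# Counts messages per level in JSON text as produced by `texlogparser -f json`.
# ASSUMPTION: the output is a JSON array of objects with a "level" key.
def count_by_level(json_text)
  counts = Hash.new(0)
  JSON.parse(json_text).each { |message| counts[message['level']] += 1 }
  counts
end

# Hypothetical sample data; in practice, read the real output instead, e.g.
#   json_text = `texlogparser -i example.log -f json`
SAMPLE_JSON = <<~JSON
  [
    { "level": "error",   "message": "Undefined control sequence." },
    { "level": "warning", "message": "Reference undefined." },
    { "level": "warning", "message": "Citation undefined." }
  ]
JSON

puts count_by_level(SAMPLE_JSON)
```

Counting by level is a cheap way to decide, for example, whether a build script should fail.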

Ruby API

The interface is rather narrow; your main entry point is class TexLogParser. Calling parse on an instance returns a list of Message objects.

Here is a minimal yet complete example:

require 'tex_log_parser'

log = File.readlines('example.log')
parser = TexLogParser.new(log)
puts parser.parse[0]
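A common next step is to keep only the serious messages. The sketch below assumes that each Message object responds to level (with a symbol such as :error or :warning) and to message; these accessor names are assumptions, so please check the API documentation. A stand-in Struct replaces real parser output so that the snippet runs without the gem.

```ruby
# Stand-in for the gem's Message objects, for illustration only.
# ASSUMPTION: real messages respond to `level` (a symbol) and `message`.
FakeMessage = Struct.new(:level, :message)

# Returns only the messages whose level is among the given ones.
def select_levels(messages, levels = [:error, :warning])
  messages.select { |m| levels.include?(m.level) }
end

# With the gem installed, this list would come from TexLogParser.new(log).parse
messages = [
  FakeMessage.new(:info, 'Some package note'),
  FakeMessage.new(:warning, 'Overfull \\hbox'),
  FakeMessage.new(:error, 'Undefined control sequence.')
]

select_levels(messages).each { |m| puts "#{m.level}: #{m.message}" }
```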

Recommendations

Here are some tips on how to generate logs that do not trip up parsing unnecessarily:

Use the -file-line-error option of pdflatex et al. to get higher accuracy regarding source files and lines.
Increase the maximum line length of the log as much as possible. The engines hard-wrap long lines, and such inopportune linebreaks can split messages and file paths.
Avoid parentheses and whitespace in file paths.
The shell output of the initial run of pdflatex et al. on a new file can contain output of subprograms, and be complicated in other ways as well. It is therefore more robust to use the log file as written to disk, or the output or log file produced by a subsequent run. (Don't worry, real errors will stick around!)
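The tips above can be combined in a small wrapper. The following is a sketch under stated assumptions, not part of the gem: all helper names are hypothetical, and setting max_print_line through the environment is an assumption that holds for TeX Live-style distributions; other distributions may need a different mechanism.

```ruby
# Command line following the recommendations: never stop for interaction,
# and report errors in file:line:error style.
def latex_command(tex_file, engine = 'pdflatex')
  [engine, '-interaction=nonstopmode', '-file-line-error', tex_file]
end

# ASSUMPTION: the distribution honours max_print_line from the environment;
# the value 1000 is an arbitrary large choice.
LATEX_ENV = { 'max_print_line' => '1000' }.freeze

# The engine names the log after the document and writes it to the
# working directory.
def log_path_for(tex_file)
  "#{File.basename(tex_file, '.*')}.log"
end

# Compiles the document, then parses the log file from disk rather than
# the shell output. Needs the gem and a TeX engine to be installed.
def compile_and_parse(tex_file)
  require 'tex_log_parser'
  system(LATEX_ENV, *latex_command(tex_file), out: File::NULL)
  TexLogParser.new(File.readlines(log_path_for(tex_file))).parse
end

puts latex_command('example.tex').join(' ')
```

With everything installed, compile_and_parse('example.tex') would return the parsed messages of that run.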

Contributing

For bug reports and feature requests, the usual rules apply: search for existing issues; join the discussion or create a new one; be specific and nice; expect nothing.

That aside, there are two groups of experts whose help would be much appreciated: (La)TeX gourmets and Ruby developers.

TeXians

Please report any logs that are parsed incorrectly, be it because whole messages are not found or because not all details are extracted correctly.

Reports that provide the following information will be the most useful:

Full failing log of a minimal example (ideally with source document).
The engine(s) you use, e.g. pdflatex, xelatex, or lualatex.
Expected number of error, warning, and info messages (the latter optional).
Expected message with
- log line numbers (where the message starts and ends),
- level of the message (error, warning, or info), and
- which source file (and lines) it references.
Advanced: In case of wrong source files, run texlogparser -d on the log and note on which lines it changes file scopes in wrong ways.

If you also know a little Ruby, please consider translating those data into a (failing) test and opening a pull request.

Some preemptive notes:

Issues around messages below warning level have low priority.
Problems caused by inopportune linebreaks are probably out of scope.

Bonus: Convince as many package maintainers as possible to use the same standardized, robust way of writing to the log.

Rubyists

Any feedback about the quality of the code and the usefulness of the documentation would be much appreciated. Particular areas of interest include:

Is the API designed in useful ways?
Does the documentation cover all your questions?
Is the gem structured properly?
What can be improved to encourage code contributions?
Does the CLI script have problems on any platform?

Contributors

egreg and David Carlisle provided helpful test cases and insight in LaTeX Stack Exchange chat.