Peregrin

A library for inspecting Zhooks, Ochooks and EPUB ebooks, and converting between them.

Invented by Inventive Labs. Released under the MIT license.

More info: http://ochook.org/peregrin

Requirements

Ruby, at least 1.8.x.

ImageMagick must be installed. Specifically, the 'convert' utility that ImageMagick provides must be somewhere in your PATH.
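A quick way to confirm the ImageMagick requirement from Ruby is sketched below. The helper name executable_in_path? is hypothetical and not part of Peregrin; it simply walks the directories of a PATH-style string. It ignores Windows executable extensions such as .exe, so treat it as a rough check only.

```ruby
# Hypothetical helper (not part of Peregrin): reports whether an executable
# with the given name exists in any directory of a PATH-style string.
def executable_in_path?(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Warn early rather than failing later during a conversion.
unless executable_in_path?('convert')
  warn "ImageMagick's 'convert' utility was not found in PATH"
end
```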

Required Ruby gems:

  • zipruby
  • nokogiri
  • mime-types

Peregrin from the command-line

You can use Peregrin to inspect a Zhook, Ochook or EPUB file from the command-line. It will perform very basic validation of the file and output an analysis.

$ peregrin strunk.epub
[EPUB]

Cover
  images/cover.png

Components [10]
  cover.xml
  title.xml
  about.xml
  main0.xml
  main1.xml
  main2.xml
  main3.xml
  main4.xml
  main5.xml
  main6.xml

Resources [2]
  css/main.css
  images/cover.png

Chapters
- Title
- About
- Chapter 1 - Introductory
- Chapter 2 - Elementary Rules of Usage
- Chapter 3 - Elementary Principles of Composition
- Chapter 4 - A Few Matters of Form
- Chapter 5 - Words and Expressions Commonly Misused
- Chapter 6 - Words Commonly Misspelled

Properties [5]
  title: The Elements of Style
  identifier: urn:uuid:6f82990c-9394-11df-920d-001cc0a62c0b
  language: en
  creator: William Strunk Jr.
  subject: Non-Fiction

Note that file type detection is quite naive: it uses only the path extension, and if the extension is neither .zhook nor .epub, it assumes the path is an Ochook directory.
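The detection rule just described can be sketched in a few lines. This is an illustration of the stated behaviour, not Peregrin's actual source; the method name guess_format and the returned symbols are assumptions, as is the case-sensitive comparison.

```ruby
# Hypothetical helper mirroring the extension-based detection described
# above. The name and return values are illustrative, not Peregrin's API.
def guess_format(path)
  case File.extname(path)   # assumption: the extension is matched as-is
  when '.zhook' then :zhook
  when '.epub'  then :epub
  else               :ochook   # anything else is taken to be an Ochook directory
  end
end

guess_format('strunk.epub')    # => :epub
guess_format('strunk.zhook')   # => :zhook
guess_format('books/strunk')   # => :ochook
```

One consequence of this rule is that a mistyped extension (say, strunk.epb) would be treated as an Ochook directory rather than rejected outright.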

You can also use Peregrin to convert from one format to another. Just provide two paths to the utility; it will convert from the first to the second.

$ peregrin strunk.epub strunk.zhook
[Zhook]
Cover
  cover.png

Components [1]
  index.html

Resources [2]
  css/main.css
  cover.png

Chapters
- Title
- About
- Chapter 1 - Introductory
- Chapter 2 - Elementary Rules of Usage
- Chapter 3 - Elementary Principles of Composition
- Chapter 4 - A Few Matters of Form
- Chapter 5 - Words and Expressions Commonly Misused
- Chapter 6 - Words Commonly Misspelled

Properties [5]
  title: The Elements of Style
  identifier: urn:uuid:6f82990c-9394-11df-920d-001cc0a62c0b
  language: en
  creator: William Strunk Jr.
  subject: Non-Fiction

Library usage

The three formats are represented in the Peregrin::Epub, Peregrin::Zhook and Peregrin::Ochook classes. Each format class responds to the following methods:

  • validate(path)
  • read(path) - creates an instance of the class from the path
  • new(book) - creates an instance of the class from a Peregrin::Book

Each instance of a format class responds to the following methods:

  • write(path)
  • to_book(options) - returns a Peregrin::Book object

Here's what a conversion routine might look like:

zhook = Peregrin::Zhook.read('foo.zhook')
epub = Peregrin::Epub.new(zhook.to_book(:componentize => true))
epub.write('foo.epub')
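Because every format class responds to the same methods, the routine above can be generalised. The sketch below is a hypothetical helper (convert_book is not provided by Peregrin) that takes the source and target classes as arguments. It assumes that validate signals a problem by raising, which the text above does not state.

```ruby
# Hypothetical helper: converts between any two format classes that respond
# to the methods listed above (validate, read, new) and whose instances
# respond to to_book and write.
def convert_book(source_format, target_format, source_path, target_path, options = {})
  source_format.validate(source_path)   # assumption: raises if the file is invalid
  book = source_format.read(source_path).to_book(options)
  target_format.new(book).write(target_path)
  book                                   # hand back the intermediate book
end

# With Peregrin loaded, a call might look like this:
#
#   convert_book(Peregrin::Zhook, Peregrin::Epub,
#                'foo.zhook', 'foo.epub', :componentize => true)
```

Passing the classes in, rather than hard-coding them, keeps the helper independent of any one pair of formats.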

Peregrin::Book

Between the three supported formats, there is an abstracted concept of "book" data, which holds the following information:

  • components - an array of Components that make up the linear content
  • chapters - an array of Chapters (with title, src and children)
  • properties - an array of Property metadata tuples (key/value + attributes)
  • resources - an array of Resources contained in the ebook, other than components
  • cover - the Resource that should be used as the cover of the ebook
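To make the shape of this data concrete, here is a rough model built from plain Ruby Structs. It is only an illustration of the five fields listed above: the BookSketch module and its field names are assumptions, and the real Peregrin classes may well differ. The sample values are borrowed from the command-line output shown earlier.

```ruby
# Illustrative stand-ins only; these are NOT Peregrin's classes.
module BookSketch
  Resource  = Struct.new(:src)
  Component = Struct.new(:src)
  Chapter   = Struct.new(:title, :src, :children)
  Property  = Struct.new(:key, :value, :attributes)
  Book      = Struct.new(:components, :chapters, :properties, :resources, :cover)
end

cover = BookSketch::Resource.new('images/cover.png')

book = BookSketch::Book.new(
  [BookSketch::Component.new('cover.xml'), BookSketch::Component.new('title.xml')],
  [BookSketch::Chapter.new('Title', 'title.xml', [])],   # src value is illustrative
  [BookSketch::Property.new('title', 'The Elements of Style', {})],
  [BookSketch::Resource.new('css/main.css'), cover],
  cover
)

puts book.properties.first.value   # prints "The Elements of Style"
puts book.cover.src                # prints "images/cover.png"
```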

The shape of this data will probably change as Peregrin develops, so that the Book interchange object retains all relevant information about an ebook without loss. For the moment, it is being kept as simple as possible.

Peregrin?

All this rhyming on "ook" put me in mind of the Took family. There is no deeper meaning.

History

  • 1.2.0
    • Metadata files like OPF, OCX now first-class citizens called 'blueprints'
    • Page progression direction from EPUB3 (@nono)
    • Fixed-layout attributes for components (@nono)
  • 1.1.4
    • Basic EPUB3 and EPUB fixed-layout read support (@klacointe)