EPUB Parser



gem install epub-parser


As a library

require 'epub/parser'

book = EPUB::Parser.parse('book.epub')
book.metadata.titles # => Array of EPUB::Publication::Package::Metadata::Title. Main title, subtitle, etc...
book.metadata.title # => Title string including all titles
book.metadata.creators # => Creators(authors)
book.each_page_on_spine do |page|
  page.media_type # => "application/xhtml+xml"
  page.entry_name # => "OPS/nav.xhtml" entry name in EPUB package(zip archive)
  page.read # => raw content document
  page.content_document.nokogiri # => Nokogiri::XML::Document. The same as Nokogiri.XML(page.read)
  # do something more
  #    :
end

See the documentation's Home page or the API Documentation for more info.
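
The iteration shown above can be wrapped in a small helper. The following is a minimal sketch, not part of EPUB Parser: spine_summary is a hypothetical name, and it assumes only that the book responds to each_page_on_spine and that each page responds to entry_name and media_type, as in the example above.

```ruby
# Hypothetical helper (not part of EPUB Parser): collects the entry name and
# media type of every page on the spine into an array of hashes.
# Assumes `book` responds to #each_page_on_spine as in the example above.
def spine_summary(book)
  pages = []
  book.each_page_on_spine do |page|
    pages << { entry_name: page.entry_name, media_type: page.media_type }
  end
  pages
end

# Usage with a real book (requires the epub-parser gem and an EPUB file):
#   require 'epub/parser'
#   book = EPUB::Parser.parse('book.epub')
#   spine_summary(book).each { |page| puts "#{page[:entry_name]} (#{page[:media_type]})" }
```

Returning plain hashes keeps the result easy to print or serialize without holding on to the page objects.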

epubinfo command-line tool

The epubinfo tool extracts and shows the metadata of a specified EPUB book.

$ epubinfo ~/Documents/Books/build_awesome_command_line_applications_in_ruby.epub
Title:              Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
Identifiers:        978-1-934356-91-3
Titles:             Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
Languages:          en
Creators:           David Bryant Copeland
Publishers:         The Pragmatic Bookshelf, LLC (338304)
Rights:             Copyright © 2012 Pragmatic Programmers, LLC
Subjects:           Pragmatic Bookshelf
Unique identifier:  978-1-934356-91-3
Epub version:       2.0

See Epubinfo for more info.
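
If you want to use this output from a script, the "Label: value" lines can be split into a Hash. This is only a sketch: parse_epubinfo_output is a hypothetical helper, and the line format is assumed from the sample output above rather than from any documented contract of epubinfo.

```ruby
# Hypothetical helper: turns "Label:   value" lines, like those in the sample
# epubinfo output above, into a Hash. The format is assumed from that sample.
def parse_epubinfo_output(text)
  text.each_line.with_object({}) do |line, info|
    label, value = line.chomp.split(/:\s+/, 2)
    next if value.nil? # skip blank lines and lines without a "label: value" form
    info[label] = value
  end
end

sample = <<~OUTPUT
  Title:              Build Awesome Command-Line Applications in Ruby (for KITAITI MAKOTO)
  Languages:          en
  Creators:           David Bryant Copeland
  Epub version:       2.0
OUTPUT

info = parse_epubinfo_output(sample)
puts info["Creators"]     # prints "David Bryant Copeland"
puts info["Epub version"] # prints "2.0"
```

In a real script the text would come from running the command, for example with backticks around an epubinfo invocation.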

epub-open command-line tool

The epub-open tool provides an interactive shell (IRB) that helps you explore an EPUB book.

epub-open path/to/book.epub

IRB starts with self set to the EPUB book, so you can call the methods of EPUB directly.

title
=> "Title of the book"
metadata.creators
=> [Author 1, Author 2, ...]
resources.first.properties
=> #<Set: {"nav"}> # This shows that the first resource of this book is the nav document
nav = resources.first
=> ...
nav.href
=> #<Addressable::URI:0x15ce350 URI:nav.xhtml>
nav.media_type
=> "application/xhtml+xml"
puts nav.read
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
=> nil
exit # Enter "exit" to end the session

See EpubOpen for more info.
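
To see what "self is the book" means in practice, plain Ruby's instance_eval gives a similar effect. This is an illustration of the idea only, not how epub-open is implemented; in_book_session and the stand-in book object are hypothetical.

```ruby
# Hypothetical helper: evaluates the block with self set to `book`, so the
# book's methods can be called without a receiver, as in an epub-open session.
def in_book_session(book, &block)
  book.instance_eval(&block)
end

# A stand-in object; a real book would come from EPUB::Parser.parse.
demo_book = Struct.new(:title).new("Title of the book")

puts in_book_session(demo_book) { title } # prints "Title of the book"
```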


Documentation is available on the homepage.

If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the rubygems-yardoc gem is required):

$ gem install epub-parser
$ gem yardoc epub-parser
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented
YARD documentation is generated to:

The command prints the path to the generated documentation at the end (/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc in this example).

Alternatively, you can generate the documentation with YARD from a clone of the repository:

$ git clone https://github.com/KitaitiMakoto/epub-parser.git
$ cd epub-parser
$ bundle install --path=deps
$ bundle exec rake doc:yard
Files:          33
Modules:        20 (   20 undocumented)
Classes:        45 (   44 undocumented)
Constants:      31 (   31 undocumented)
Methods:       292 (   88 undocumented)
52.84% documented

The documentation will then be available in the doc directory.


Requirements

  • Ruby 2.1.0 or later
  • patch command to install Nokogiri
  • C compiler to compile Zip/Ruby and Nokogiri

Similar efforts

  • gepub - a generic EPUB library for Ruby
  • epubinfo - Extracts metadata information from EPUB files. Supports EPUB2 and EPUB3 formats.
  • ReVIEW - ReVIEW is an easy-to-use digital publishing system for books and ebooks.
  • epzip - epzip is EPUB packing tool. It's just only doing 'zip.' :)
  • eeepub - EeePub is a Ruby ePub generator
  • epub-maker - This library supports making and editing EPUB books based on this EPUB Parser library

If you know of other gems, please tell me or send a pull request.



Recent changes

  • Bug fix for EPUB::CFI::Location#<=>
  • Change default physical container adapter from EPUB::OCF::PhysicalContainer::ZipRuby to EPUB::OCF::PhysicalContainer::ArchiveZip
  • Add EPUB::CFI::Step#element? and #character_data?
  • Change attribute name: EPUB::CFI::Step#step -> EPUB::CFI::Step#value, EPUB::CFI::CharacterOffset#offset -> EPUB::CFI::CharacterOffset#value
  • Show modified on epubinfo command


  • Change the name of physical container adapter for file system: :File -> :UnpackedDirectory
  • Add EPUB::Publication::Package::Manifest::Item#full_path
  • Make #href= accept a String
  • Implement EPUB::CFI and EPUB::Parser::CFI
  • Remove nokogumbo from dependencies. It omits head and body elements
  • Remove Cucumber and Cucumber features
  • Add EPUB::Publication::Package::Metadata#modified and EPUB::Book::Features#modified
  • Add EPUB::Book::Features#release_identifier


  • [BUGFIX]Item#entry_name returns normalized IRI


  • Remove deprecated EPUB::Constants::MediaType::UnsupportedError. Use UnsupportedMediaType instead.
  • Make it possible to use archive-zip gem to extract contents from EPUB package
  • Add warning about default physical container adapter change
  • Make it possible to extract contents from the web via EPUB::OCF::PhysicalContainer::UnpackedURI. See ExtractContentsFromWeb for details.

See CHANGELOG for older changelogs and details.


Todo

  • EPUB 3.0.1
  • Multiple rootfiles
  • Help features for epub-open tool
  • Vocabulary Association Mechanisms
  • Implementing navigation document and so on
  • Media Overlays
  • Content Document
  • Digital Signature
  • Using SAX on parsing
  • Abstraction of XML parser (making it possible to use REXML, the standard XML library bundled with Ruby)
  • Handle encodings other than UTF-8


Done

  • Simple inspect for epub-open tool
  • Using zip library instead of unzip command, which has a security issue
  • Modify methods around fallback to see bindings element in the package
  • Content Document(only for Navigation Documents)
  • Fixed Layout
  • Vocabulary Association Mechanisms(only for itemref)
  • Archive library abstraction
  • Extracting and organizing common behavior from some classes to modules


This library is distributed under the terms of the MIT License. See MIT-LICENSE file for more info.