Aggregate Contents From the Web
Since version 0.2.1, EPUB Parser can parse unpacked (unzipped) EPUB files on the web and aggregate the contents of those books.
Let's get the contents of the pretty comic Page Blanche from IDPF's GitHub repository: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche
We can treat the URI https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/ as the root directory of the book, because the EPUB Open Container Format (OCF) container.xml file can be fetched from META-INF/container.xml under it.
Note: Don't forget the trailing slash at the end of the URI.
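Why the trailing slash matters can be illustrated with Ruby's standard URI library. This sketch assumes that paths inside the book are resolved against the root URI by ordinary relative-reference rules, as a browser would do; it shows general URI behavior and is not taken from EPUB Parser's source. The helper name resolve_in_book is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: resolve a path inside the book against the root URI
# using standard relative-reference resolution.
def resolve_in_book(root_uri, path)
  URI.join(root_uri, path).to_s
end

base = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche'

# With the trailing slash, "page-blanche" is treated as a directory.
puts resolve_in_book(base + '/', 'META-INF/container.xml')
# prints https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/META-INF/container.xml

# Without it, "page-blanche" is treated as a file name and is replaced.
puts resolve_in_book(base, 'META-INF/container.xml')
# prints https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/META-INF/container.xml
```

Without the slash, the last path segment is dropped during resolution, so the container file would be looked up one directory too high.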
EPUB Parser can treat the URI as an EPUB book file path and parse the book's contents from it like this:
    require 'epub/parser'

    uri = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
    epub = EPUB::Parser.parse(uri, container_adapter: :UnpackedURI)
The trick is to set container_adapter to :UnpackedURI. This makes it possible to parse an EPUB book on the web. Now we can work with EPUB books as usual!
As an example, I will show you a script that downloads all the files of a specified EPUB book to a local directory (the source code is available in the repository as examples/aggregate-contents-from-web.rb).
    $ ruby examples/aggregate-contents-from-web.rb https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
    Started downloading EPUB contents...
    from: https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
    to: /tmp/epub-parser20150703-13148-ghdtfq
    Making mimetype file...
    Downloading META-INF/container.xml ...
    Downloading EPUB/package.opf ...
    Downloading EPUB/Style/style.css ...
    Downloading EPUB/Navigation/nav.xhtml ...
    Downloading EPUB/Navigation/toc.ncx ...
    Downloading EPUB/Content/cover.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_000.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_001.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_002.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_003.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_004.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_005.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_006.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_007.xhtml ...
    Downloading EPUB/Content/PageBlanche_Page_008.xhtml ...
    Downloading EPUB/Image/cover.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_001.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_002.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_003.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_004.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_005.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_006.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_007.jpg ...
    Downloading EPUB/Image/PageBlanche_Page_008.jpg ...
    /tmp/epub-parser20150703-13148-ghdtfq
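The core of such a script might look roughly like the following. This is a hypothetical sketch, not the code in examples/aggregate-contents-from-web.rb. It assumes the list of entry names has already been collected (for example, the rootfile path from META-INF/container.xml plus the manifest items of EPUB/package.opf), and it takes the download function as a parameter so the logic can be exercised without network access. The method name aggregate_contents is made up for illustration.

```ruby
require 'uri'
require 'fileutils'
require 'tmpdir'

# Hypothetical sketch: copy an unpacked EPUB at root_uri into a fresh
# temporary directory and return that directory's path.
# entry_names: paths relative to the book root, e.g. "EPUB/package.opf".
# fetch:       any callable that takes a URI and returns the file's bytes.
def aggregate_contents(root_uri, entry_names, fetch:)
  dir = Dir.mktmpdir('epub-parser')

  # The OCF mimetype file has fixed content, so write it instead of downloading.
  File.write(File.join(dir, 'mimetype'), 'application/epub+zip')

  entry_names.each do |name|
    target = File.join(dir, name)
    FileUtils.mkdir_p(File.dirname(target)) # create e.g. EPUB/Content/
    File.binwrite(target, fetch.call(URI.join(root_uri, name)))
  end

  dir
end
```

To download for real, you could pass fetch: ->(uri) { Net::HTTP.get(uri) } after require 'net/http'. Using Dir.mktmpdir with an 'epub-parser' prefix yields directory names of the same shape as the /tmp path printed at the end of the output.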
The last line of the output is the path to the directory the contents were downloaded to. We can repackage it as an EPUB file. Let's use the epzip utility to do that easily:
    $ epzip /tmp/epub-parser20150703-13148-ghdtfq ./page-blanche.epub
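The main packaging constraint to keep in mind is that OCF requires the mimetype file to be the first entry of the ZIP archive, stored without compression. A hypothetical helper that builds Info-ZIP zip commands honoring that rule might look like this. It is not how epzip is implemented, and the flags assume the Info-ZIP zip command-line tool.

```ruby
# Hypothetical sketch (not epzip's implementation): build the `zip`
# commands that package an unpacked EPUB directory.
# -X0   : store mimetype uncompressed, without extra file attributes
# -Xr9D : add the rest recursively, max compression, no directory entries
def epub_zip_commands(dir, output)
  output = File.expand_path(output) # resolve before we chdir into dir
  rest = Dir.children(dir).reject { |name| name == 'mimetype' }.sort
  [
    ['zip', '-X0', output, 'mimetype'], # must be the first entry
    ['zip', '-Xr9D', output, *rest]
  ]
end
```

To run them, compute the commands first, then execute them from inside the directory: cmds = epub_zip_commands(dir, 'page-blanche.epub'); Dir.chdir(dir) { cmds.each { |cmd| system(*cmd, exception: true) } }. Delete any existing output file beforehand, since zip updates an existing archive in place.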
The epub-open command may also accept a URI as an EPUB book.