Class: EpubTools::XHTMLExtractor

Inherits:
Object
Includes:
Loggable
Defined in:
lib/epub_tools/xhtml_extractor.rb

Overview

Extracts text .xhtml files from EPUB archives, excluding nav.xhtml.

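Instance Method Summary

  • #initialize(options = {}) ⇒ XHTMLExtractor

    Initializes the class

  • #run ⇒ Array<String>

    Runs the extraction process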

Methods included from Loggable

#log

Constructor Details

#initialize(options = {}) ⇒ XHTMLExtractor

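Creates an extractor for the given source and target directories. Both paths are expanded to absolute paths, and the target directory is created if it does not already exist.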

Parameters:

  • options (Hash) (defaults to: {})

    Configuration options

Options Hash (options):

  • :source_dir (String)

    Directory containing source .epub files (required)

  • :target_dir (String)

    Directory where .xhtml files will be extracted (required)

  • :verbose (Boolean)

    Whether to print progress to STDOUT (default: false)



# File 'lib/epub_tools/xhtml_extractor.rb', line 14

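def initialize(options = {})
  # fetch raises KeyError when a required option is missing
  @source_dir = File.expand_path(options.fetch(:source_dir))
  @target_dir = File.expand_path(options.fetch(:target_dir))
  @verbose = options[:verbose] || false
  # Make sure the output directory exists before any extraction runs
  FileUtils.mkdir_p(@target_dir)
end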
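
The option handling above can be illustrated in isolation. The following is a minimal sketch, not part of the library: parse_options is a hypothetical helper that mirrors the constructor's use of Hash#fetch for required keys and a fallback for :verbose, and the directory names are made up for the example.

```ruby
# Hypothetical helper (not part of EpubTools) that mirrors how the
# constructor reads its options hash.
def parse_options(options = {})
  {
    # Required: Hash#fetch raises KeyError if the key is absent
    source_dir: File.expand_path(options.fetch(:source_dir)),
    target_dir: File.expand_path(options.fetch(:target_dir)),
    # Optional: falls back to false when not given
    verbose: options[:verbose] || false
  }
end

# Both required options supplied (directory names are illustrative)
parsed = parse_options(source_dir: 'books', target_dir: 'out')

# Omitting a required option raises KeyError
missing_error = begin
  parse_options(target_dir: 'out')
  nil
rescue KeyError => e
  e.message
end
```

Under this sketch, a caller who forgets :source_dir or :target_dir gets an immediate KeyError naming the missing key, rather than a later failure on a nil path.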

Instance Method Details

#run ⇒ Array<String>

Runs the extraction process for every EPUB file and collects the paths of the extracted files

Returns:

  • (Array<String>)

    Paths to all extracted XHTML files



# File 'lib/epub_tools/xhtml_extractor.rb', line 23

def run
  all_extracted_files = []
  epub_files.each do |epub_path|
    extracted = extract_xhtmls_from(epub_path)
    # Skip archives for which extraction returned nil
    all_extracted_files.concat(extracted) if extracted
  end
  all_extracted_files
end
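
The accumulation pattern in #run can be sketched on its own. This is an illustration under stated assumptions: the per_epub_results hash stands in for the results of the private extract_xhtmls_from calls, which are not shown on this page, and every file name in it is invented.

```ruby
# Hypothetical per-archive results: an array of extracted paths, or nil
# when an archive yielded nothing. All names are illustrative.
per_epub_results = {
  'first.epub'  => ['out/first_ch1.xhtml', 'out/first_ch2.xhtml'],
  'broken.epub' => nil,
  'second.epub' => ['out/second_ch1.xhtml']
}

all_extracted_files = []
per_epub_results.each do |_epub_path, extracted|
  # Same guard as in #run: nil results are skipped, arrays are appended
  all_extracted_files.concat(extracted) if extracted
end
```

Because nil results are skipped, the returned array is always flat and contains only strings, so callers can iterate over it without nil checks.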