Class: EpubTools::StyleFinder

Inherits:
Object
Includes:
Loggable
Defined in:
lib/epub_tools/style_finder.rb

Overview

Finds the CSS classes used for bold and italic text in Google Docs-generated EPUBs. Used by XHTMLCleaner and SplitChapters.

Instance Method Summary

Methods included from Loggable

#log

Constructor Details

#initialize(options = {}) ⇒ StyleFinder

Creates a new StyleFinder for the given XHTML file.

Parameters:

  • options (Hash) (defaults to: {})

    Configuration options

Options Hash (options):

  • :file_path (String)

    XHTML file to be analyzed (required)

  • :output_path (String)

    Path to write the YAML file (default: 'text_style_classes.yaml')

  • :verbose (Boolean)

    Whether to print progress to STDOUT (default: false)

Raises:

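  • (ArgumentError)

    If the file at :file_path does not exist

  • (KeyError)

    If :file_path is not given (raised by options.fetch)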


# File 'lib/epub_tools/style_finder.rb', line 17

def initialize(options = {})
  @file_path = options.fetch(:file_path)
  @output_path = options[:output_path] || 'text_style_classes.yaml'
  @verbose = options[:verbose] || false
  raise ArgumentError, "File does not exist: #{@file_path}" unless File.exist?(@file_path)
end
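The constructor's option handling can be tried in isolation. The sketch below is a minimal stand-in, not the library class: StyleFinderSketch and error_class_for are hypothetical names, and the constructor body is copied from the listing above so the example runs without the epub_tools gem installed.

```ruby
require 'tempfile'

# Hypothetical stand-in that reuses the constructor body shown above.
class StyleFinderSketch
  attr_reader :file_path, :output_path, :verbose

  def initialize(options = {})
    @file_path = options.fetch(:file_path) # KeyError when the key is absent
    @output_path = options[:output_path] || 'text_style_classes.yaml'
    @verbose = options[:verbose] || false
    raise ArgumentError, "File does not exist: #{@file_path}" unless File.exist?(@file_path)
  end
end

# Hypothetical helper: returns the class of the error raised, or nil on success.
def error_class_for(options)
  StyleFinderSketch.new(options)
  nil
rescue KeyError, ArgumentError => e
  e.class
end

missing_key  = error_class_for({})
missing_file = error_class_for({ file_path: '/no/such/dir/chapter.xhtml' })

# With an existing file and no other options, the defaults apply.
defaults = Tempfile.create(['chapter', '.xhtml']) do |f|
  finder = StyleFinderSketch.new(file_path: f.path)
  [finder.output_path, finder.verbose]
end

puts missing_key.inspect
puts missing_file.inspect
puts defaults.inspect
```

Because :file_path is read with fetch, omitting it fails before the existence check runs, so callers that rescue only ArgumentError will not catch a missing key.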

Instance Method Details

#run ⇒ Hash

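Runs the finder: collects the italic and bold class names from the file's style blocks, writes them to the output path as YAML, and returns them.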

Returns:

  • (Hash)

    Data containing the extracted style classes (italics and bolds)



# File 'lib/epub_tools/style_finder.rb', line 26

def run
  doc = Nokogiri::HTML(File.read(@file_path))
  style_blocks = doc.xpath('//style').map(&:text).join("\n")

  italics = extract_classes(style_blocks, /font-style\s*:\s*italic/)
  bolds   = extract_classes(style_blocks, /font-weight\s*:\s*700/)

  print_summary(italics, bolds) if @verbose

  data = {
    'italics' => italics,
    'bolds' => bolds
  }
  File.write(@output_path, data.to_yaml)
  data
end
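The private extract_classes helper is not shown on this page. As a rough illustration of the kind of matching #run relies on, the sketch below uses a hypothetical extract_classes_sketch that scans CSS text for class rules whose body matches a declaration pattern. It is an assumption about the approach, not the library's implementation, and it uses only the standard library (no Nokogiri). The sample CSS is made up for the example.

```ruby
require 'yaml'

# Hypothetical stand-in for the private extract_classes helper.
# Returns the names of class selectors whose rule body matches the pattern.
def extract_classes_sketch(css, declaration_pattern)
  css.scan(/\.([\w-]+)\s*\{([^}]*)\}/)
     .select { |_name, body| body.match?(declaration_pattern) }
     .map(&:first)
     .uniq
end

# Made-up sample in the style of Google Docs exports (short generated class names).
css = <<~CSS
  .c1 { font-style: italic; color: #000 }
  .c2 { font-weight: 700 }
  .c3 { font-weight: 400 }
  .c4 { font-style:italic; font-weight:700 }
CSS

# Same two patterns that #run passes to extract_classes.
italics = extract_classes_sketch(css, /font-style\s*:\s*italic/)
bolds   = extract_classes_sketch(css, /font-weight\s*:\s*700/)

data = { 'italics' => italics, 'bolds' => bolds }
yaml_text = data.to_yaml

puts yaml_text
```

Note that the bold pattern in #run matches only the numeric weight 700, so a rule written as font-weight: bold would not be picked up by it.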