Class: IiifPrint::SplitPdfs::DerivativeRodeoSplitter

Inherits:
Object
Defined in:
lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb

Overview

This class wraps the DerivativeRodeo::Generators::PdfSplitGenerator to find preprocessed images, or split a PDF if there are no preprocessed images.

We have already attached the original file to the file_set. We want to convert that attached original file into an input_uri (e.g. “file://path/to/original-file”, the location of the PDF we have written to Fedora).

Constructor Details

#initialize(filename, file_set:, output_tmp_dir: Dir.tmpdir) ⇒ DerivativeRodeoSplitter

Returns a new instance of DerivativeRodeoSplitter.

Parameters:

  • filename (String)

    path to the original file. Note that we use #filename to derive #input_uri

  • file_set (FileSet)

    the container for the original file and its derivatives.

  • output_tmp_dir (String) (defaults to: Dir.tmpdir)

    where we will be writing things. In using ‘Dir.mktmpdir’ we’re creating a subdirectory on ‘Dir.tmpdir’



# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 31

def initialize(filename, file_set:, output_tmp_dir: Dir.tmpdir)
  @filename = filename
  @file_set = file_set

  @input_uri = "file://#{filename}"

  # We are writing the images to a local location that CarrierWave can upload.  This is a
  # local file, internal to IiifPrint; it looks like SpaceStone/DerivativeRodeo lingo, but
  # that's just a convenience.
  output_template_path = File.join(output_tmp_dir, '{{ dir_parts[-1..-1] }}', '{{ filename }}')

  @output_location_template = "file://#{output_template_path}"
end
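
To make the two URIs concrete, here is a minimal sketch that reproduces only the string building from the constructor. The helper name `build_uris` and the example paths are assumptions for illustration; they are not part of IiifPrint. The `{{ ... }}` placeholders are left as literal text, since interpolating them is DerivativeRodeo's job.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of IiifPrint) that mirrors only the string
# building done in #initialize.
def build_uris(filename, output_tmp_dir: Dir.tmpdir)
  input_uri = "file://#{filename}"

  # The {{ ... }} placeholders stay as literal text at this point.
  output_template_path = File.join(output_tmp_dir, '{{ dir_parts[-1..-1] }}', '{{ filename }}')

  { input_uri: input_uri, output_location_template: "file://#{output_template_path}" }
end

# Example paths are made up for the sketch.
uris = build_uris('/tmp/example/original.pdf', output_tmp_dir: '/tmp/out')
puts uris[:input_uri]
puts uris[:output_location_template]
```

Because the filename is an absolute path, the resulting URI has three slashes after `file:`.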

Instance Attribute Details

#file_set ⇒ Object (readonly)

Returns the value of attribute file_set.



# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 45

def file_set
  @file_set
end

#filename ⇒ Object (readonly)

Returns the value of attribute filename.



# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 45

def filename
  @filename
end

#input_uri ⇒ String (readonly)

This is where, in “Fedora”, we have the original file. It is not the original file in the pre-processing location but the long-term location of the file in the application that mounts IIIF Print.

Returns:

  • (String)


# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 53

def input_uri
  @input_uri
end

#output_location_template ⇒ String (readonly)

This is the location where we’re going to write the derivatives that will “go into Fedora”. It is a local location, one that IIIF Print’s mounting application can read directly with “File.read”.

Returns:

  • (String)


# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 61

def output_location_template
  @output_location_template
end

Class Method Details

.call(filename, file_set:) ⇒ Array<String>

Returns paths to images split from each page of PDF file.

Parameters:

  • filename (String)

    the local path to the PDF file

  • file_set (FileSet)

    file set containing the PDF file to split

Returns:

  • (Array<String>)

    paths to images split from each page of PDF file



# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 20

def self.call(filename, file_set:)
  new(filename, file_set: file_set).split_files
end

Instance Method Details

#handle_original_file_not_in_derivative_rodeo ⇒ String

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

When the file does not exist in the pre-processed location (e.g. “SpaceStone”) we need to ensure that we have something locally. We copy the FileSet#import_url to the #input_uri location.

Returns:

  • (String)

Raises:

  • (DerivativeRodeo::Errors::FileMissingError)

    when the input_uri does not exist



# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 123

def handle_original_file_not_in_derivative_rodeo
  # A quick short-circuit.  Don't attempt to copy.  Likely already covered by the DerivativeRodeo::Generators::CopyGenerator
  return input_uri if rodeo_conformant_uri_exists?(input_uri)

  message = "#{self.class}##{__method__} found #{file_set.class}#import_url of #{file_set.import_url.inspect} to exist.  " \
            "Perhaps there was a problem in SpaceStone downloading the file?  " \
            "Regardless, we'll use DerivativeRodeo::Generators::CopyGenerator to ensure #{input_uri.inspect} exists.  " \
            "However, we'll almost certainly be generating child pages locally."
  Rails.logger.info(message)

  # This ensures that we have a copy of the file_set.import_url at the input_uri location;
  # we likely have this.
  DerivativeRodeo::Generators::CopyGenerator.new(
    input_uris: [file_set.import_url],
    output_location_template: input_uri
  ).generated_uris.first
end
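
The short-circuit-then-copy shape of this method can be sketched with local files only. This is an illustration under stated assumptions: `ensure_local_copy` is a hypothetical helper, and it uses plain `FileUtils` in place of DerivativeRodeo::Generators::CopyGenerator, which works on URIs and whose source (FileSet#import_url) may be remote.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for the copy step; local paths only.
def ensure_local_copy(source_path, target_path)
  # Short-circuit: if the target is already there, do not copy again.
  return target_path if File.exist?(target_path)

  FileUtils.mkdir_p(File.dirname(target_path))
  FileUtils.cp(source_path, target_path)
  target_path
end

Dir.mktmpdir do |dir|
  source = File.join(dir, 'import.pdf')
  File.write(source, 'example bytes')

  target = File.join(dir, 'originals', 'original.pdf')
  puts ensure_local_copy(source, target) # copies on the first call
  puts ensure_local_copy(source, target) # returns early on the second call
end
```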

#preprocessed_location_template ⇒ String

Note:

The preprocessed_location_template should end in ‘.pdf’. The DerivativeRodeo::BaseGenerator::PdfSplitGenerator#derive_preprocessed_template_from will coerce the template into one that represents the split pages.

Where can we find the file that represents the pre-processing template? In this case, it is the original PDF file.

The logic handles a case where SpaceStone successfully fetched the file to then perform processing.

For example, SpaceStone::Serverless will pre-process derivatives and write them into an S3 bucket that we then use for IIIF Print.



# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 82

def preprocessed_location_template
  return @preprocessed_location_template if defined?(@preprocessed_location_template)

  derivative_rodeo_candidate = IiifPrint::DerivativeRodeoService.derivative_rodeo_uri(file_set: file_set, filename: filename)

  @preprocessed_location_template =
    if derivative_rodeo_candidate.blank?
      message = "#{self.class}##{__method__} could not establish derivative_rodeo_candidate for " \
                "#{file_set.class} ID=#{file_set&.id} #to_param=#{file_set&.to_param} with filename #{filename.inspect}.  " \
                "Move along little buddy."
      Rails.logger.debug(message)
      nil
    elsif rodeo_conformant_uri_exists?(derivative_rodeo_candidate)
      Rails.logger.debug("#{self.class}##{__method__} found existing file at location #{derivative_rodeo_candidate}.  High five partner!")
      derivative_rodeo_candidate
    elsif file_set.import_url
      message = "#{self.class}##{__method__} did not find #{derivative_rodeo_candidate.inspect} to exist.  " \
                "Moving on to check the #{file_set.class}#import_url of #{file_set.import_url.inspect}"
      Rails.logger.warn(message)
      handle_original_file_not_in_derivative_rodeo
    else
      message = "#{self.class}##{__method__} could not find an existing file at #{derivative_rodeo_candidate} " \
                "nor a remote_url for #{file_set.class} ID=#{file_set.id} #to_param=#{file_set&.to_param}.  " \
                "Returning `nil' as we have no possible preprocess.  " \
                "Maybe the input_uri #{input_uri.inspect} will be adequate."
      Rails.logger.warn(message)
      nil
    end
end
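
The method above combines two ideas: a four-way fallback and memoization via `defined?`, which caches a `nil` result where `||=` would recompute it. The following is a minimal sketch of both, assuming a stand-in class; `TemplateResolver`, its keyword arguments, and the `copied-from:` return value are hypothetical and only replace the real lookups so the example runs without IiifPrint or DerivativeRodeo.

```ruby
# Hypothetical stand-in: the lookups are passed in as plain values.
class TemplateResolver
  attr_reader :lookups

  def initialize(candidate:, exists:, import_url:)
    @candidate = candidate   # stands in for the derivative_rodeo_uri result
    @exists = exists         # stands in for rodeo_conformant_uri_exists?
    @import_url = import_url # stands in for file_set.import_url
    @lookups = 0
  end

  def preprocessed_location_template
    # defined? is true once the variable has been assigned, even to nil.
    return @template if defined?(@template)

    @lookups += 1
    @template =
      if @candidate.nil? || @candidate.empty?
        nil                          # no candidate could be established
      elsif @exists
        @candidate                   # pre-processed file is present
      elsif @import_url
        "copied-from:#{@import_url}" # stands in for the copy fallback
      end                            # otherwise nil: nothing to use
  end
end

resolver = TemplateResolver.new(candidate: nil, exists: false, import_url: nil)
2.times { resolver.preprocessed_location_template }
puts resolver.lookups # the nil result was computed only once
```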

#split_files ⇒ Array<String>

Returns the paths to each of the images split off from the PDF.

Returns:

  • (Array<String>)

    the paths to each of the images split off from the PDF.



# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 149

def split_files
  DerivativeRodeo::Generators::PdfSplitGenerator.new(
    input_uris: [input_uri],
    output_location_template: output_location_template,
    preprocessed_location_template: preprocessed_location_template
  ).generated_files.map(&:file_path)
rescue => e
  message = "#{self.class}##{__method__} encountered `#{e.class}' “#{e}” for " \
            "input_uri: #{input_uri.inspect}, " \
            "output_location_template: #{output_location_template.inspect}, and " \
            "preprocessed_location_template: #{preprocessed_location_template.inspect}."
  exception = RuntimeError.new(message)
  exception.set_backtrace(e.backtrace)
  raise exception
end
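
The rescue clause above wraps any failure in a RuntimeError that carries the three URIs while keeping the original backtrace. A minimal sketch of that pattern in isolation follows; `with_context` and the example URI are hypothetical names used only for this illustration.

```ruby
# Hypothetical helper showing the wrap-and-re-raise pattern on its own.
def with_context(input_uri)
  yield
rescue => e
  message = "encountered `#{e.class}' \"#{e}\" for input_uri: #{input_uri.inspect}"
  exception = RuntimeError.new(message)
  # Keep the original backtrace so the real failure site is still visible.
  exception.set_backtrace(e.backtrace)
  raise exception
end

begin
  with_context('file:///tmp/missing.pdf') { raise ArgumentError, 'no such file' }
rescue RuntimeError => e
  puts e.message
end
```

One trade-off of this design is that callers can no longer rescue the original exception class; they see a RuntimeError whose message names the original class.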