Class: IiifPrint::SplitPdfs::DerivativeRodeoSplitter
- Inherits: Object
- Defined in: lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb
Overview
This class wraps the DerivativeRodeo::Generators::PdfSplitGenerator to find preprocessed images, or split a PDF if there are no preprocessed images.
We have already attached the original file to the file_set. We want to refer to that attached original file by an input_uri (e.g. “file://path/to/original-file”, i.e. the PDF we have written to Fedora).
Instance Attribute Summary
- #file_set ⇒ Object (readonly): Returns the value of attribute file_set.
- #filename ⇒ Object (readonly): Returns the value of attribute filename.
- #input_uri ⇒ String (readonly): This is where, in “Fedora”, we have the original file.
- #output_location_template ⇒ String (readonly): This is the location where we’re going to write the derivatives that will “go into Fedora”; it is a local location, one that IIIF Print’s mounting application can read directly with “File.read”.
Class Method Summary
- .call(filename, file_set:) ⇒ Array<String>: Paths to images split from each page of the PDF file.
Instance Method Summary
- #handle_original_file_not_in_derivative_rodeo ⇒ String (private): When the file does not exist in the pre-processed location (e.g. “SpaceStone”), we need to ensure that we have something locally.
- #initialize(filename, file_set:, output_tmp_dir: Dir.tmpdir) ⇒ DerivativeRodeoSplitter (constructor): A new instance of DerivativeRodeoSplitter.
- #preprocessed_location_template ⇒ String: Where to find the file that represents the pre-processing template.
- #split_files ⇒ Array<String>: The paths to each of the images split off from the PDF.
Constructor Details
#initialize(filename, file_set:, output_tmp_dir: Dir.tmpdir) ⇒ DerivativeRodeoSplitter
Returns a new instance of DerivativeRodeoSplitter.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 31
def initialize(filename, file_set:, output_tmp_dir: Dir.tmpdir)
  @filename = filename
  @file_set = file_set

  @input_uri = "file://#{filename}"

  # We are writing the images to a local location that CarrierWave can upload. This is a
  # local file, internal to IiifPrint; it looks like SpaceStone/DerivativeRodeo lingo, but
  # that's just a convenience.
  output_template_path = File.join(output_tmp_dir, '{{ dir_parts[-1..-1] }}', '{{ filename }}')

  @output_location_template = "file://#{output_template_path}"
end
```
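To make the constructor's string building concrete, here is a standalone sketch that reproduces it without loading IiifPrint or DerivativeRodeo. The helper name `splitter_uris` and the example paths are hypothetical. The `{{ ... }}` placeholders are left unexpanded, as in the constructor; DerivativeRodeo fills them in later. The last lines only illustrate what the Ruby slice `[-1..-1]` selects, assuming `dir_parts` means the input's directory components (an assumption about DerivativeRodeo's template renderer, not confirmed here).

```ruby
require "tmpdir"

# Hypothetical helper mirroring the string building in #initialize.
def splitter_uris(filename, output_tmp_dir: Dir.tmpdir)
  # The original file is addressed with a file:// URI.
  input_uri = "file://#{filename}"
  # Placeholders stay unexpanded; DerivativeRodeo resolves them later.
  output_template_path = File.join(output_tmp_dir, "{{ dir_parts[-1..-1] }}", "{{ filename }}")
  { input_uri: input_uri, output_location_template: "file://#{output_template_path}" }
end

uris = splitter_uris("/var/hyrax/uploads/abc123/report.pdf", output_tmp_dir: "/tmp/split")
puts uris[:input_uri]
# => file:///var/hyrax/uploads/abc123/report.pdf
puts uris[:output_location_template]
# => file:///tmp/split/{{ dir_parts[-1..-1] }}/{{ filename }}

# Illustration only: a [-1..-1] range slice keeps just the last directory
# component, as a one-element array.
dir_parts = File.dirname("/var/hyrax/uploads/abc123/report.pdf").split("/").reject(&:empty?)
p dir_parts[-1..-1]
# => ["abc123"]
```

Under that assumption, split pages land in a directory named after the original file's parent directory, inside `output_tmp_dir`.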
Instance Attribute Details
#file_set ⇒ Object (readonly)
Returns the value of attribute file_set.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 45
def file_set
  @file_set
end
```
#filename ⇒ Object (readonly)
Returns the value of attribute filename.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 45
def filename
  @filename
end
```
#input_uri ⇒ String (readonly)
This is where, in “Fedora”, we have the original file. This is not the original file in the pre-processing location but instead the long-term location of the file in the application that mounts IIIF Print.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 53
def input_uri
  @input_uri
end
```
#output_location_template ⇒ String (readonly)
This is the location where we’re going to write the derivatives that will “go into Fedora”; it is a local location, one that IIIF Print’s mounting application can read directly with “File.read”.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 61
def output_location_template
  @output_location_template
end
```
Class Method Details
.call(filename, file_set:) ⇒ Array<String>
Returns paths to images split from each page of PDF file.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 20
def self.call(filename, file_set:)
  new(filename, file_set: file_set).split_files
end
```
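`.call` is a thin convenience wrapper: it builds an instance and returns `#split_files`. The sketch below shows the same delegation with a hypothetical stand-in class, `FakeSplitter`, so it runs without IiifPrint or DerivativeRodeo. The hash passed as `file_set:` and the invented page paths are placeholders, not real FileSet or splitter behavior.

```ruby
# Hypothetical stand-in that mirrors the `.call` convenience method.
class FakeSplitter
  # Build an instance and return its #split_files result in one step.
  def self.call(filename, file_set:)
    new(filename, file_set: file_set).split_files
  end

  def initialize(filename, file_set:)
    @filename = filename
    @file_set = file_set
  end

  # Stand-in for real PDF splitting: it just fabricates one path per page.
  def split_files
    base = File.basename(@filename, ".pdf")
    (1..@file_set.fetch(:page_count)).map { |n| "/tmp/#{base}-#{n}.tiff" }
  end
end

pages = FakeSplitter.call("/uploads/report.pdf", file_set: { page_count: 3 })
p pages
# => ["/tmp/report-1.tiff", "/tmp/report-2.tiff", "/tmp/report-3.tiff"]
```

Callers that only need the split page paths can use the class method and never hold on to the instance.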
Instance Method Details
#handle_original_file_not_in_derivative_rodeo ⇒ String
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
When the file does not exist in the pre-processed location (e.g. “SpaceStone”), we need to ensure that we have something locally. We copy the FileSet#import_url to the #input_uri location.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 123
def handle_original_file_not_in_derivative_rodeo
  # A quick short-circuit. Don't attempt to copy. Likely already covered by the DerivativeRodeo::Generators::CopyGenerator
  return input_uri if rodeo_conformant_uri_exists?(input_uri)

  message = "#{self.class}##{__method__} found #{file_set.class}#import_url of #{file_set.import_url.inspect} to exist. " \
            "Perhaps there was a problem in SpaceStone downloading the file? " \
            "Regardless, we'll use DerivativeRodeo::Generators::CopyGenerator to ensure #{input_uri.inspect} exists. " \
            "However, we'll almost certainly be generating child pages locally."
  Rails.logger.info(message)

  # This ensures that we have a copy of the file_set.import_url at the input_uri location;
  # we likely have this.
  DerivativeRodeo::Generators::CopyGenerator.new(
    input_uris: [file_set.import_url],
    output_location_template: input_uri
  ).generated_uris.first
end
```
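The method follows a "short-circuit, else copy" pattern. Here is a minimal sketch of that pattern, assuming both locations are `file://` URIs. `FileUtils.cp` stands in for `DerivativeRodeo::Generators::CopyGenerator`, which also handles other URI schemes. The helpers `local_path` and `ensure_input` are hypothetical.

```ruby
require "fileutils"
require "tmpdir"
require "uri"

# Hypothetical helper: map a file:// URI to a local filesystem path.
def local_path(uri)
  URI(uri).path
end

# Sketch of the short-circuit-then-copy pattern (file:// URIs only).
def ensure_input(input_uri:, import_url:)
  target = local_path(input_uri)
  # Short-circuit: the file is already where we need it, so don't copy.
  return input_uri if File.exist?(target)

  FileUtils.mkdir_p(File.dirname(target))
  FileUtils.cp(local_path(import_url), target)
  input_uri
end

# Usage: copy a stand-in "remote" PDF into the expected input location.
dir = Dir.mktmpdir
source = File.join(dir, "remote", "original.pdf")
FileUtils.mkdir_p(File.dirname(source))
File.write(source, "%PDF-1.4 stub")

input_uri = "file://#{File.join(dir, 'local', 'original.pdf')}"
result = ensure_input(input_uri: input_uri, import_url: "file://#{source}")
puts File.read(local_path(result)) # prints "%PDF-1.4 stub"
```

Returning the input URI in both branches gives callers a single value to pass on, whether or not a copy was made.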
#preprocessed_location_template ⇒ String
The preprocessed_location_template should end in `.pdf`. The DerivativeRodeo::BaseGenerator::PdfSplitGenerator#derive_preprocessed_template_from will coerce the template into one that represents the split pages.
Where can we find the file that represents the pre-processing template? In this case, it is the original PDF file.
The logic handles a case where SpaceStone successfully fetched the file to then perform processing.
For example, SpaceStone::Serverless will pre-process derivatives and write them into an S3 bucket that we then use for IIIF Print.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 82
def preprocessed_location_template
  return @preprocessed_location_template if defined?(@preprocessed_location_template)

  derivative_rodeo_candidate = IiifPrint::DerivativeRodeoService.derivative_rodeo_uri(file_set: file_set, filename: filename)

  @preprocessed_location_template =
    if derivative_rodeo_candidate.blank?
      message = "#{self.class}##{__method__} could not establish derivative_rodeo_candidate for " \
                "#{file_set.class} ID=#{file_set&.id} #to_param=#{file_set&.to_param} with filename #{filename.inspect}. " \
                "Move along little buddy."
      Rails.logger.debug(message)
      nil
    elsif rodeo_conformant_uri_exists?(derivative_rodeo_candidate)
      Rails.logger.debug("#{self.class}##{__method__} found existing file at location #{derivative_rodeo_candidate}. High five partner!")
      derivative_rodeo_candidate
    elsif file_set.import_url
      message = "#{self.class}##{__method__} did not find #{derivative_rodeo_candidate.inspect} to exist. " \
                "Moving on to check the #{file_set.class}#import_url of #{file_set.import_url.inspect}"
      Rails.logger.warn(message)
      handle_original_file_not_in_derivative_rodeo
    else
      message = "#{self.class}##{__method__} could not find an existing file at #{derivative_rodeo_candidate} " \
                "nor a remote_url for #{file_set.class} ID=#{file_set.id} #to_param=#{file_set&.to_param}. " \
                "Returning `nil' as we have no possible preprocess. " \
                "Maybe the input_uri #{input_uri.inspect} will be adequate."
      Rails.logger.warn(message)
      nil
    end
end
```
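The method is a memoized four-way decision: no candidate, candidate exists, fall back to the import_url, or give up. The sketch below isolates that decision so it can run without Hyrax, IiifPrint, or DerivativeRodeo. The class `PreprocessedTemplateResolver` is hypothetical. Its collaborators are injected as plain values and lambdas, and `nil?`/`empty?` replace ActiveSupport's `blank?`.

```ruby
# Hypothetical resolver mirroring the branches of #preprocessed_location_template.
class PreprocessedTemplateResolver
  def initialize(candidate:, exists:, import_url:, copy_original:)
    @candidate = candidate         # derived preprocessed URI, or nil/"" if none
    @exists = exists               # lambda: does this URI exist?
    @import_url = import_url       # stand-in for FileSet#import_url (may be nil)
    @copy_original = copy_original # stand-in for #handle_original_file_not_in_derivative_rodeo
  end

  def template
    # Memoize even a nil result, as the original does with defined?.
    return @template if defined?(@template)

    @template =
      if @candidate.nil? || @candidate.empty?
        nil                 # no candidate could be derived
      elsif @exists.call(@candidate)
        @candidate          # preprocessed file found; use it
      elsif @import_url
        @copy_original.call # fall back to copying the original file
      else
        nil                 # no candidate file and no import_url to fall back on
      end
  end
end

copies = 0
resolver = PreprocessedTemplateResolver.new(
  candidate: "s3://bucket/abc123/original.pdf",
  exists: ->(_uri) { false },
  import_url: "https://example.org/original.pdf",
  copy_original: -> { copies += 1; "file:///tmp/local/original.pdf" }
)
resolver.template
resolver.template
puts copies # prints 1: the second call returns the memoized value
```

Using `defined?` rather than `||=` matters here: a `nil` result is cached too, so the lookups and log messages are not repeated on every call.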
#split_files ⇒ Array<String>
Returns the paths to each of the images split off from the PDF.
```ruby
# File 'lib/iiif_print/split_pdfs/derivative_rodeo_splitter.rb', line 149
def split_files
  DerivativeRodeo::Generators::PdfSplitGenerator.new(
    input_uris: [input_uri],
    output_location_template: output_location_template,
    preprocessed_location_template: preprocessed_location_template
  ).generated_files.map(&:file_path)
rescue => e
  message = "#{self.class}##{__method__} encountered `#{e.class}' “#{e}” for " \
            "input_uri: #{input_uri.inspect}, " \
            "output_location_template: #{output_location_template.inspect}, and " \
            "preprocessed_location_template: #{preprocessed_location_template.inspect}."
  exception = RuntimeError.new(message)
  exception.set_backtrace(e.backtrace)
  raise exception
end
```
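`#split_files` re-raises any failure as a `RuntimeError` whose message carries the inputs, while keeping the original backtrace. A minimal sketch of that pattern follows. `split_with_context` is hypothetical, and the block it yields to stands in for the `PdfSplitGenerator` call.

```ruby
# Hypothetical method showing the wrap-and-re-raise pattern of #split_files.
def split_with_context(input_uri:, output_location_template:)
  yield
rescue => e
  message = "#{__method__} encountered #{e.class} (#{e.message}) for " \
            "input_uri: #{input_uri.inspect} and " \
            "output_location_template: #{output_location_template.inspect}."
  exception = RuntimeError.new(message)
  # Keep the original backtrace so the error still points at the real failure.
  exception.set_backtrace(e.backtrace)
  raise exception
end

original = ArgumentError.new("corrupt PDF")
begin
  split_with_context(input_uri: "file:///tmp/in.pdf",
                     output_location_template: "file:///tmp/out/{{ filename }}") do
    raise original
  end
rescue RuntimeError => wrapped
  puts wrapped.message
end
```

Because the new exception is raised while the original is being handled, Ruby also records the original as `wrapped.cause`, so neither the context nor the root error is lost.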