Class: IiifPrint::DerivativeRodeoService
- Inherits: Object
  - Object
    - IiifPrint::DerivativeRodeoService
- Defined in: app/services/iiif_print/derivative_rodeo_service.rb
Overview
This class implements the interface of a Hyrax::DerivativeService.
That means three important methods are:
- #valid?
- #create_derivatives
- #cleanup_derivatives
The object is initialized with a FileSet.
It is a companion to PluggableDerivativeService.
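To make the duck-typed interface concrete (an object built from a FileSet that answers #valid?, #create_derivatives and #cleanup_derivatives), here is a minimal plain-Ruby sketch. Everything in it is hypothetical: SketchFileSet, ImageOnlyDerivativeService, NullDerivativeService and service_for are illustrative names, not Hyrax or IIIF Print classes. The service_for helper only loosely mirrors the "first valid service wins" idea; it is not Hyrax's actual selection code.

```ruby
# Hypothetical stand-in for a FileSet: only an id and a mime type.
SketchFileSet = Struct.new(:id, :mime_type)

# Hypothetical service implementing the three-method interface.
class ImageOnlyDerivativeService
  attr_reader :file_set

  def initialize(file_set)
    @file_set = file_set
  end

  # Claim only image FileSets.
  def valid?
    file_set.mime_type.start_with?("image/")
  end

  # Pretend to create a thumbnail; returns a description instead of a file.
  def create_derivatives(filename)
    ["thumbnail for #{filename}"]
  end

  # Nothing was written to disk, so there is nothing to remove.
  def cleanup_derivatives
    []
  end
end

# Hypothetical fallback that accepts any FileSet and does nothing.
class NullDerivativeService
  def initialize(file_set)
    @file_set = file_set
  end

  def valid?
    true
  end

  def create_derivatives(_filename)
    []
  end

  def cleanup_derivatives
    []
  end
end

# The first candidate whose #valid? is true wins; otherwise use the fallback.
def service_for(file_set, services:, fallback:)
  services.map { |klass| klass.new(file_set) }.find(&:valid?) || fallback.new(file_set)
end
```

With a TIFF FileSet, service_for returns an ImageOnlyDerivativeService; with a PDF it falls back to NullDerivativeService.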
Class Attributes
- #named_derivatives_and_generators_by_type ⇒ Hash<Symbol, #constantize>
  The named derivative and its associated generator.
- #named_derivatives_and_generators_filter ⇒ #call
  A callable that takes three named parameters: :file_set, :filename, and :named_derivatives_and_generators.
- #parent_work_identifier_property_name ⇒ String
  The property that holds the unique identifier of the parent work as it went through the SpaceStone pre-process.
- #preprocessed_location_adapter_name ⇒ String
  The name of a DerivativeRodeo storage location; this must be registered with DerivativeRodeo::StorageLocations::BaseLocation.
Instance Attribute Summary
- #file_set ⇒ Object (readonly)
  Returns the value of attribute file_set.
Class Method Summary
- .derivative_rodeo_preprocessed_directory_for(file_set:, filename:) ⇒ String, NilClass
  By convention, we’re putting the files of a work in a “directory” that is based on some identifying value of the work (e.g. an object’s AARK ID).
- .derivative_rodeo_uri(file_set:, filename: nil, extension: nil, adapter_name: preprocessed_location_adapter_name) ⇒ String, NilClass
  This method encodes some existing assumptions about the URI based on implementations for Adventist.
- .get_ancestor(filename: nil, file_set:) ⇒ Object
  Figure out the ancestor type and ancestor.
Instance Method Summary
- #cleanup_derivatives ⇒ Object
  We need to clean up the derivatives that we created.
- #create_derivatives(filename) ⇒ Object
  Runs each (filtered) named derivative generator for the given file. The file_set.class.*_mime_types are carried over from Hyrax.
- #initialize(file_set) ⇒ DerivativeRodeoService (constructor)
  Initializes the service with the given FileSet.
- #named_derivatives_and_generators ⇒ Hash<Symbol, String>
  The named derivative types and their corresponding generators.
- #valid? ⇒ Boolean
Constructor Details
#initialize(file_set) ⇒ DerivativeRodeoService
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 232

def initialize(file_set)
  @file_set = file_set
end
```
Instance Attribute Details
#file_set ⇒ Object (readonly)
Returns the value of attribute file_set.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 236

def file_set
  @file_set
end
```
#named_derivatives_and_generators_by_type ⇒ Hash<Symbol, #constantize>
It could be nice to have a registry for the DerivativeRodeo::Generators, but that is a wish for another day.
Returns the named derivative and its associated generator. The “name” is important for Hyrax or IIIF Print implementations. The generator is one that exists in the DerivativeRodeo.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 57

class_attribute(:named_derivatives_and_generators_by_type, default: {
  pdf: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator"
  },
  image: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator",
    json: "DerivativeRodeo::Generators::WordCoordinatesGenerator",
    xml: "DerivativeRodeo::Generators::AltoGenerator",
    txt: "DerivativeRodeo::Generators::PlainTextGenerator"
  }
})
```
#named_derivatives_and_generators_filter ⇒ #call
Returns a callable that takes three named parameters:
- :file_set is a FileSet
- :filename is a String
- :named_derivatives_and_generators is an entry from #named_derivatives_and_generators_by_type, as pulled from #named_derivatives_and_generators
The lambda is responsible for filtering out any named generators that should not be run. It should return a data structure similar to the provided :named_derivatives_and_generators.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 97

class_attribute(:named_derivatives_and_generators_filter,
                default: ->(file_set:, filename:, named_derivatives_and_generators:) { named_derivatives_and_generators })
```
#parent_work_identifier_property_name ⇒ String
The default of 'aark_id' is a quick hack for Adventist. By exposing it as a configuration value, my hope is that this becomes easier to configure.
Returns the property we use to identify the unique identifier of the parent work as it went through the SpaceStone pre-process.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 28

class_attribute :parent_work_identifier_property_name, default: 'aark_id'
```
#preprocessed_location_adapter_name ⇒ String
Returns the name of a DerivativeRodeo storage location; this must be registered with DerivativeRodeo::StorageLocations::BaseLocation.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 35

class_attribute :preprocessed_location_adapter_name, default: 's3'
```
Class Method Details
.derivative_rodeo_preprocessed_directory_for(file_set:, filename:) ⇒ String, NilClass
You may find yourself wanting to override this method. Please do if you find a better way to do this.
By convention, we’re putting the files of a work in a “directory” that is based on some identifying value of the work (e.g. an object’s AARK ID).
Because we split PDFs (see SplitPdfs::DerivativeRodeoSplitter), we need to consider two cases: we may be working on the PDF itself (whose FileSet is directly associated with the work), or on one of the pages ripped from the PDF (whose FileSet’s work is a child work that will be related to the original work).
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 203

def self.derivative_rodeo_preprocessed_directory_for(file_set:, filename:)
  ancestor, ancestor_type = get_ancestor(filename: filename, file_set: file_set)

  # Why might we not have an ancestor? In the case of grandparent_for, we may not yet have run
  # the create relationships job. We could sneak a peek in the table to maybe glean some insight.
  # However, read further into the `else` clause to see the novel approach.
  # rubocop:disable Style/GuardClause
  if ancestor
    # `name` rather than `self.class`: inside a class method, self is already the class.
    message = "#{name}.#{__method__} #{file_set.class} ID=#{file_set.id} and filename: #{filename.inspect} " \
              "has #{ancestor_type} of #{ancestor.class} ID=#{ancestor.id}"
    Rails.logger.info(message)
    parent_work_identifier = ancestor.public_send(parent_work_identifier_property_name)
    return parent_work_identifier if parent_work_identifier.present?

    Rails.logger.warn("Expected #{ancestor.class} ID=#{ancestor.id} (#{ancestor_type} of #{file_set.class} ID=#{file_set.id}) " \
                      "to have a present #{parent_work_identifier_property_name.inspect}")
    nil
  else
    # HACK: This makes critical assumptions about how we're creating the title for the file_set;
    # but we don't have much to fall back on. Consider making this a configurable function. Or
    # perhaps this entire method should be more configurable.
    # TODO: Revisit this implementation.
    candidate = file_set.title.first.split(".").first
    return candidate if candidate.present?

    nil
  end
  # rubocop:enable Style/GuardClause
end
```
.derivative_rodeo_uri(file_set:, filename: nil, extension: nil, adapter_name: preprocessed_location_adapter_name) ⇒ String, NilClass
This method encodes some existing assumptions about the URI based on implementations for Adventist. Those are reasonable assumptions, but time will tell how reasonable.
By convention, this method returns the output_location of the SpaceStone::Serverless processing. We might know the original location that SpaceStone::Serverless processed, but relying on that seems tenuous.
In other words: where, by convention, would SpaceStone have written the original file, and where would it have written that original file’s derivatives?
TODO: We also need to account for PDF splitting.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 143

def self.derivative_rodeo_uri(file_set:, filename: nil, extension: nil, adapter_name: preprocessed_location_adapter_name)
  # TODO: This is a hack that knows about the inner workings of Hydra::Works, but for
  # expediency, I'm using it. See
  # https://github.com/samvera/hydra-works/blob/c9b9dd0cf11de671920ba0a7161db68ccf9b7f6d/lib/hydra/works/services/add_file_to_file_set.rb#L49-L53
  filename ||= Hydra::Works::DetermineOriginalName.call(file_set.original_file)

  dirname = derivative_rodeo_preprocessed_directory_for(file_set: file_set, filename: filename)
  return nil unless dirname

  # The aforementioned filename and the following basename and extension are here to allow for
  # us to take an original file and see if we've pre-processed the derivative file. In the
  # pre-processed derivative case, that would mean we have a different extension than the
  # original.
  extension ||= File.extname(filename)
  extension = ".#{extension}" unless extension.start_with?(".")

  # We want to strip off the extension of the given filename.
  basename = File.basename(filename, File.extname(filename))

  # TODO: What kinds of exceptions might we raise if the location is not configured? Do we need
  # to "validate" it in another step?
  location = DerivativeRodeo::StorageLocations::BaseLocation.load_location(adapter_name)

  File.join(location.adapter_prefix, dirname, "#{basename}#{extension}")
end
```
.get_ancestor(filename: nil, file_set:) ⇒ Object
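Figure out the ancestor and its type. Returns a two-element array, [ancestor, ancestor_type], where ancestor_type is :grandparent for a page split from a PDF and :parent otherwise.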
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 174

def self.get_ancestor(filename: nil, file_set:)
  # In the case of a page split from a PDF, we need to know the grandparent's identifier to
  # find the file(s) in the DerivativeRodeo.
  if DerivativeRodeo::Generators::PdfSplitGenerator.filename_for_a_derived_page_from_a_pdf?(filename: filename)
    [IiifPrint.grandparent_for(file_set), :grandparent]
  else
    [IiifPrint.parent_for(file_set), :parent]
  end
end
```
Instance Method Details
#cleanup_derivatives ⇒ Object
Due to the configurability and plasticity of the named derivatives, it is possible that when we created the derivatives, we had a different configuration (e.g. were we to create derivatives again, we might get a different set of files). So we must ask ourselves: is it important to clean up all derivatives (even ones that may not be in scope for this service), or only those presently in scope? I favor removing all of them, in part because of the nature of the valid derivative service.
We need to clean up the derivatives that we created.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 287

def cleanup_derivatives
  ## Were we to only delete the derivatives that this service presently creates, this would be
  ## that code:
  #
  # named_derivatives_and_generators.keys.each do |named_derivative|
  #   path = absolute_derivative_path_for(named_derivative)
  #   FileUtils.rm_f(path) if File.exist?(path)
  # end

  ## Instead, let's clean it all up.
  Hyrax::DerivativePath.derivatives_for_reference(file_set).each do |path|
    FileUtils.rm_f(path) if File.exist?(path)
  end
end
```
#create_derivatives(filename) ⇒ Object
We write derivatives to the #absolute_derivative_path_for and should likewise clean them up when deleted.
The file_set.class.*_mime_types are carried over from Hyrax.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 265

def create_derivatives(filename)
  named_derivatives_and_generators_filter
    .call(file_set: file_set, filename: filename, named_derivatives_and_generators: named_derivatives_and_generators)
    .flat_map do |named_derivative, generator_name|
      lasso_up_some_derivatives(
        named_derivative: named_derivative,
        generator_name: generator_name,
        filename: filename
      )
    end
end
```
#named_derivatives_and_generators ⇒ Hash<Symbol, String>
Returns the named derivative types and their corresponding generators.
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 110

def named_derivatives_and_generators
  @named_derivatives_and_generators ||=
    if file_set.class.pdf_mime_types.include?(mime_type)
      named_derivatives_and_generators_by_type.fetch(:pdf).deep_dup
    elsif file_set.class.image_mime_types.include?(mime_type)
      named_derivatives_and_generators_by_type.fetch(:image).deep_dup
    else
      raise UnexpectedMimeTypeError.new(file_set: file_set, mime_type: mime_type)
    end
end
```
#valid? ⇒ Boolean
```ruby
# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 242

def valid?
  if in_the_rodeo?
    Rails.logger.info("Using the DerivativeRodeo for FileSet ID=#{file_set.id} with mime_type of #{mime_type}")
    true
  else
    Rails.logger.info("Skipping the DerivativeRodeo for FileSet ID=#{file_set.id} with mime_type of #{mime_type}")
    false
  end
end
```