Class: IiifPrint::DerivativeRodeoService

Inherits:
Object
Defined in:
app/services/iiif_print/derivative_rodeo_service.rb

Overview

This class implements the interface of a Hyrax::DerivativeService.

That means the three important methods are:

  • #valid?
  • #create_derivatives
  • #cleanup_derivatives

And the object initializes with a FileSet.

It is a companion to PluggableDerivativeService.
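
A minimal sketch of the shape this interface implies, assuming only the three methods documented on this page (#valid?, #create_derivatives, #cleanup_derivatives) and a constructor that takes a file set. ExampleDerivativeService and ExampleFileSet are hypothetical stand-ins, not part of IiifPrint or Hyrax.

```ruby
# Hypothetical stand-in for a FileSet; the real object comes from Hyrax.
ExampleFileSet = Struct.new(:id, :mime_type)

# Hypothetical service showing the three methods a derivative service responds to.
class ExampleDerivativeService
  attr_reader :file_set, :created

  def initialize(file_set)
    @file_set = file_set
    @created = []
  end

  # Whether this service should handle the given file set.
  def valid?
    file_set.mime_type.to_s.start_with?("image/")
  end

  # Records the name of the derivative that would be created for the file.
  def create_derivatives(filename)
    @created << "#{File.basename(filename, '.*')}-thumbnail.jpeg"
  end

  # Forgets everything this service recorded.
  def cleanup_derivatives
    @created.clear
  end
end

service = ExampleDerivativeService.new(ExampleFileSet.new("fs-1", "image/tiff"))
service.create_derivatives("page-1.tiff") if service.valid?
```

The real service writes files to disk instead of recording names; the sketch only illustrates which messages a caller sends and in what order.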


Constructor Details

#initialize(file_set) ⇒ DerivativeRodeoService

# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 232

def initialize(file_set)
  @file_set = file_set
end

Instance Attribute Details

#file_set ⇒ Object (readonly)

Returns the value of attribute file_set.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 236

def file_set
  @file_set
end

#named_derivatives_and_generators_by_type ⇒ Hash<Symbol, #constantize>

TODO:

Could be nice to have a registry for the DerivativeRodeo::Generators; but that’s a tomorrow wish.

Returns the named derivative and its associated generator. The “name” is important for Hyrax or IIIF Print implementations. The generator is one that exists in the DerivativeRodeo.

Examples:

# In this case there are two changes:
#   1. Do not use the DerivativeRodeo to process PDFs; instead fallback to another
#      applicable service.
#   2. For Images, we will use the DerivativeRodeo but will only generate the thumbnail.
#      We will skip the JSON, XML, and TXT for an image.
#
# NOTE: Changing the behavior in this way may create broken assumptions in Hyrax.
IiifPrint::DerivativeRodeoService.named_derivatives_and_generators_by_type =
   { image: { thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator" } }

Returns:

  • (Hash<Symbol, #constantize>)

    the named derivative and its associated generator. The “name” is important for Hyrax or IIIF Print implementations. The generator is one that exists in the DerivativeRodeo.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 57

class_attribute(:named_derivatives_and_generators_by_type, default: {
  pdf: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator"
  },
  image: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator",
    json: "DerivativeRodeo::Generators::WordCoordinatesGenerator",
    xml: "DerivativeRodeo::Generators::AltoGenerator",
    txt: "DerivativeRodeo::Generators::PlainTextGenerator"
  }
})
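
The values in this hash are class names stored as strings, which is why the return type is described as responding to #constantize. A minimal sketch of that lookup, assuming plain Ruby without ActiveSupport: Object.const_get stands in for String#constantize, and the ExampleGenerators classes are hypothetical placeholders for the DerivativeRodeo generators.

```ruby
# Hypothetical generator classes standing in for DerivativeRodeo::Generators::*.
module ExampleGenerators
  class ThumbnailGenerator; end
  class PlainTextGenerator; end
end

by_type = {
  image: {
    thumbnail: "ExampleGenerators::ThumbnailGenerator",
    txt: "ExampleGenerators::PlainTextGenerator"
  }
}

# Object.const_get resolves a namespaced constant name to the class itself;
# ActiveSupport's String#constantize plays the same role in a Rails application.
resolved = by_type.fetch(:image).transform_values { |name| Object.const_get(name) }
```

Storing names instead of classes keeps the configuration loadable before the generator classes themselves are loaded.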

#named_derivatives_and_generators_filter ⇒ #call

Returns a callable that accepts three keyword parameters: :file_set, :filename, and :named_derivatives_and_generators.

The lambda is responsible for filtering any named generators that should or should not be run. It should return a data structure similar to the provided :named_derivatives_and_generators.

Examples:

# The following configured filter will skip thumbnail generation for any files that
# end in '.tn.jpg'
IiifPrint::DerivativeRodeoService.named_derivatives_and_generators_filter =
  ->(file_set:, filename:, named_derivatives_and_generators:) do
    named_derivatives_and_generators.reject do |named_derivative, generators|
      named_derivative == :thumbnail && filename.downcase.ends_with?('.tn.jpg')
    end
  end

Returns:

  • (#call)

    a callable that accepts three keyword parameters: :file_set, :filename, and :named_derivatives_and_generators.

    The lambda is responsible for filtering any named generators that should or should not be run. It should return a data structure similar to the provided :named_derivatives_and_generators.

# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 97

class_attribute(:named_derivatives_and_generators_filter,
default: ->(file_set:, filename:, named_derivatives_and_generators:) { named_derivatives_and_generators })
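
To see what a filter returns, here is a minimal sketch that calls one directly. It has the same shape as the documented example but uses core Ruby's String#end_with? so it runs without ActiveSupport. The generator names and filenames are made-up values, and nil is passed for file_set because this particular filter ignores it.

```ruby
# Same shape as the documented filter: three keyword parameters, returns a Hash.
skip_thumbnails_for_tn_files = lambda do |file_set:, filename:, named_derivatives_and_generators:|
  named_derivatives_and_generators.reject do |named_derivative, _generator|
    named_derivative == :thumbnail && filename.downcase.end_with?(".tn.jpg")
  end
end

# Made-up candidates; real values are DerivativeRodeo generator class names.
candidates = { thumbnail: "ThumbnailGenerator", txt: "PlainTextGenerator" }

kept_for_thumbnail_file = skip_thumbnails_for_tn_files.call(
  file_set: nil, filename: "Page-1.TN.jpg", named_derivatives_and_generators: candidates
)
kept_for_regular_file = skip_thumbnails_for_tn_files.call(
  file_set: nil, filename: "page-1.jpg", named_derivatives_and_generators: candidates
)
```

For the ".tn.jpg" file only the :txt entry survives; for the regular file the candidates come back unchanged.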

#parent_work_identifier_property_name ⇒ String

TODO:

The default of :aark_id is a quick hack for adventist. By exposing a configuration value, my hope is that this becomes easier to configure.

Returns the property we use to identify the unique identifier of the parent work as it went through the SpaceStone pre-process.

Returns:

  • (String)

    the property we use to identify the unique identifier of the parent work as it went through the SpaceStone pre-process.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 28

class_attribute :parent_work_identifier_property_name, default: 'aark_id'

#preprocessed_location_adapter_name ⇒ String

Returns the name of a DerivativeRodeo storage location; this must be registered with DerivativeRodeo::StorageLocations::BaseLocation.

Returns:

  • (String)

    the name of a DerivativeRodeo storage location; this must be registered with DerivativeRodeo::StorageLocations::BaseLocation.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 35

class_attribute :preprocessed_location_adapter_name, default: 's3'

Class Method Details

.derivative_rodeo_preprocessed_directory_for(file_set:, filename:) ⇒ String, NilClass

Note:

You may find yourself wanting to override this method. Please do if you find a better way to do this.

By convention, we’re putting the files of a work in a “directory” that is based on some identifying value (e.g. an object’s AARK ID) of the work.

Because we split PDFs (see SplitPdfs::DerivativeRodeoSplitter), we need to consider two cases: either we are working on the PDF itself (and that FileSet is directly associated with the work), or we are working on one of the pages ripped from the PDF (and the FileSet’s work is a to-be-related child work of the original work).

Parameters:

  • file_set (FileSet)
  • filename (String)

Returns:

  • (String)

    the dirname (without any “/” we hope)

  • (NilClass)

    when we cannot infer a URI from the object.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 203

def self.derivative_rodeo_preprocessed_directory_for(file_set:, filename:)
  ancestor, ancestor_type = get_ancestor(filename: filename, file_set: file_set)

  # Why might we not have an ancestor?  In the case of grandparent_for, we may not yet have run
  # the create relationships job.  We could sneak a peek in the table to maybe glean some insight.
  # However, read further the `else` clause to see the novel approach.
  # rubocop:disable Style/GuardClause
  if ancestor
    message = "#{self}.#{__method__} #{file_set.class} ID=#{file_set.id} and filename: #{filename.inspect} " \
              "has #{ancestor_type} of #{ancestor.class} ID=#{ancestor.id}"
    Rails.logger.info(message)
    parent_work_identifier = ancestor.public_send(parent_work_identifier_property_name)
    return parent_work_identifier if parent_work_identifier.present?
    Rails.logger.warn("Expected #{ancestor.class} ID=#{ancestor.id} (#{ancestor_type} of #{file_set.class} ID=#{file_set.id}) " \
                      "to have a present #{parent_work_identifier_property_name.inspect}")
    nil
  else
    # HACK: This makes critical assumptions about how we're creating the title for the file_set;
    # but we don't have much to fall-back on.  Consider making this a configurable function.  Or
    # perhaps this entire method should be more configurable.
    # TODO: Revisit this implementation.
    candidate = file_set.title.first.split(".").first
    return candidate if candidate.present?
    nil
  end
  # rubocop:enable Style/GuardClause
end
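
The fallback in the `else` branch can be sketched in isolation as follows. This is an illustration under assumptions, not the library's code: directory_candidate_from_title is a hypothetical helper, it takes the array of titles directly in place of a FileSet, and it adds a `to_s` so that an empty title list yields nil instead of raising.

```ruby
# Mirrors the `else` branch: take the first title and keep what precedes the first ".".
# Returns nil when nothing usable is left.
def directory_candidate_from_title(titles)
  candidate = titles.first.to_s.split(".").first
  (candidate.nil? || candidate.empty?) ? nil : candidate
end

from_simple_title = directory_candidate_from_title(["abc123.pdf"])
from_dotted_title = directory_candidate_from_title(["abc123.pages.pdf"])
from_missing_title = directory_candidate_from_title([])
```

Note that a title containing several dots is cut at the first one, which is the kind of assumption the HACK comment in the source warns about.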

.derivative_rodeo_uri(file_set:, filename: nil, extension: nil, adapter_name: preprocessed_location_adapter_name) ⇒ String, NilClass

This method encodes some existing assumptions about the URI based on implementations for Adventist. Those are reasonable assumptions but time will tell how reasonable.

By convention, this method is returning output_location of the SpaceStone::Serverless processing. We might know the original location that SpaceStone::Serverless processed, but that seems to be a tenuous assumption.

In other words, where would SpaceStone, by convention, have written the original file and by convention written that original file’s derivatives.

TODO: We also need to account for PDF splitting


Parameters:

  • file_set (FileSet)
  • filename (String) (defaults to: nil)
  • extension (String) (defaults to: nil)
  • adapter_name (String) (defaults to: preprocessed_location_adapter_name)

    Added as a parameter to make testing just a bit easier. See #preprocessed_location_adapter_name

Returns:

  • (String)

    when we have a possible candidate.

  • (NilClass)

    when we could not derive a candidate.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 143

def self.derivative_rodeo_uri(file_set:, filename: nil, extension: nil, adapter_name: preprocessed_location_adapter_name)
  # TODO: This is a hack that knows about the inner workings of Hydra::Works, but for
  # expediency, I'm using it.  See
  # https://github.com/samvera/hydra-works/blob/c9b9dd0cf11de671920ba0a7161db68ccf9b7f6d/lib/hydra/works/services/add_file_to_file_set.rb#L49-L53
  filename ||= Hydra::Works::DetermineOriginalName.call(file_set.original_file)

  dirname = derivative_rodeo_preprocessed_directory_for(file_set: file_set, filename: filename)
  return nil unless dirname

  # The aforementioned filename and the following basename and extension are here to allow for
  # us to take an original file and see if we've pre-processed the derivative file.  In the
  # pre-processed derivative case, that would mean we have a different extension than the
  # original.
  extension ||= File.extname(filename)
  extension = ".#{extension}" unless extension.start_with?(".")

  # We want to strip off the extension of the given filename.
  basename = File.basename(filename, File.extname(filename))

  # TODO: What kinds of exceptions might we raise if the location is not configured?  Do we need
  # to "validate" it in another step.
  location = DerivativeRodeo::StorageLocations::BaseLocation.load_location(adapter_name)

  File.join(location.adapter_prefix, dirname, "#{basename}#{extension}")
end
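
The path assembly at the end of this method relies only on Ruby's File helpers, so it can be sketched without Hyrax or the DerivativeRodeo. In the sketch below, example_preprocessed_uri is a hypothetical helper, the `prefix:` argument stands in for `location.adapter_prefix`, and the bucket and directory names are invented.

```ruby
# Hypothetical helper; prefix stands in for location.adapter_prefix.
def example_preprocessed_uri(prefix:, dirname:, filename:, extension: nil)
  # Default to the original file's extension, and normalize "xml" to ".xml".
  extension ||= File.extname(filename)
  extension = ".#{extension}" unless extension.start_with?(".")

  # Strip the original extension so a different one can be swapped in.
  basename = File.basename(filename, File.extname(filename))

  File.join(prefix, dirname, "#{basename}#{extension}")
end

original_uri = example_preprocessed_uri(
  prefix: "s3://example-bucket", dirname: "abc123", filename: "page-1.tiff"
)
alto_uri = example_preprocessed_uri(
  prefix: "s3://example-bucket", dirname: "abc123", filename: "page-1.tiff", extension: "xml"
)
```

Passing `extension:` is how a caller asks where a pre-processed derivative of the original file would live, as opposed to the original itself.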

.get_ancestor(filename: nil, file_set:) ⇒ Object

Figure out the ancestor type and ancestor



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 174

def self.get_ancestor(filename: nil, file_set:)
  # In the case of a page split from a PDF, we need to know the grandparent's identifier to
  # find the file(s) in the DerivativeRodeo.
  if DerivativeRodeo::Generators::PdfSplitGenerator.filename_for_a_derived_page_from_a_pdf?(filename: filename)
    [IiifPrint.grandparent_for(file_set), :grandparent]
  else
    [IiifPrint.parent_for(file_set), :parent]
  end
end

Instance Method Details

#cleanup_derivatives ⇒ Object

Note:

Due to the configurability and plasticity of the named derivatives, it is possible that when we created the derivatives, we had a different configuration (e.g. were we to create derivatives again, we might get a set of different files). So we must ask ourselves, is it important to clean up all derivatives (even ones that may not be in scope for this service) or to clean up only those presently in scope? I am favoring removing all of them. In part because of the nature of the valid derivative service.

We need to clean up the derivatives that we created.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 287

def cleanup_derivatives
  ## Were we to only delete the derivatives that this service presently creates, this would be
  ## that code:
  #
  # named_derivatives_and_generators.keys.each do |named_derivative|
  #   path = absolute_derivative_path_for(named_derivative)
  #   FileUtils.rm_f(path) if File.exist?(path)
  # end

  ## Instead, let's clean it all up.
  Hyrax::DerivativePath.derivatives_for_reference(file_set).each do |path|
    FileUtils.rm_f(path) if File.exist?(path)
  end
end
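
The "clean it all up" loop can be tried against a temporary directory. This is a sketch under assumptions: the file names are invented, and the list of paths stands in for what Hyrax::DerivativePath.derivatives_for_reference would return.

```ruby
require "fileutils"
require "tmpdir"

remaining = nil
Dir.mktmpdir do |dir|
  # Invented derivative files standing in for the paths Hyrax would report.
  paths = %w[fs-1-thumbnail.jpeg fs-1-txt.txt].map { |name| File.join(dir, name) }
  paths.each { |path| File.write(path, "derivative") }

  # Same removal loop as in #cleanup_derivatives.
  paths.each { |path| FileUtils.rm_f(path) if File.exist?(path) }

  remaining = Dir.children(dir)
end
```

FileUtils.rm_f already ignores files that do not exist, so the File.exist? guard is defensive and not strictly required.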

#create_derivatives(filename) ⇒ Object

Note:

We write derivatives to the #absolute_derivative_path_for and should likewise clean them up when deleted.

The file_set.class.*_mime_types are carried over from Hyrax.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 265

def create_derivatives(filename)
  named_derivatives_and_generators_filter
    .call(file_set: file_set, filename: filename, named_derivatives_and_generators: named_derivatives_and_generators)
    .flat_map do |named_derivative, generator_name|
    lasso_up_some_derivatives(
      named_derivative: named_derivative,
      generator_name: generator_name,
      filename: filename
    )
  end
end

#named_derivatives_and_generators ⇒ Hash<Symbol, String>

Returns the named derivative types and their corresponding generators.

Returns:

  • (Hash<Symbol, String>)

    the named derivative types and their corresponding generators.

Raises:

  • (UnexpectedMimeTypeError)

    when the file set’s mime type is neither a PDF nor an image mime type.



# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 110

def named_derivatives_and_generators
  @named_derivatives_and_generators ||=
    if file_set.class.pdf_mime_types.include?(mime_type)
      named_derivatives_and_generators_by_type.fetch(:pdf).deep_dup
    elsif file_set.class.image_mime_types.include?(mime_type)
      named_derivatives_and_generators_by_type.fetch(:image).deep_dup
    else
      raise UnexpectedMimeTypeError.new(file_set: file_set, mime_type: mime_type)
    end
end
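
The mime-type dispatch above can be sketched in plain Ruby. Everything named here is hypothetical: the mime type lists stand in for file_set.class.pdf_mime_types and file_set.class.image_mime_types, ExampleUnexpectedMimeTypeError stands in for UnexpectedMimeTypeError, and `dup` replaces ActiveSupport's `deep_dup` (sufficient here because the inner hashes hold only strings that the sketch never mutates).

```ruby
# Hypothetical error class standing in for UnexpectedMimeTypeError.
class ExampleUnexpectedMimeTypeError < StandardError; end

# Invented lists; the real ones come from the FileSet class.
EXAMPLE_PDF_MIME_TYPES = ["application/pdf"].freeze
EXAMPLE_IMAGE_MIME_TYPES = ["image/tiff", "image/jpeg", "image/png"].freeze

EXAMPLE_BY_TYPE = {
  pdf: { thumbnail: "ThumbnailGenerator" },
  image: { thumbnail: "ThumbnailGenerator", txt: "PlainTextGenerator" }
}.freeze

# Picks the generator set for a mime type, or raises when the type is unknown.
def example_generators_for(mime_type)
  if EXAMPLE_PDF_MIME_TYPES.include?(mime_type)
    EXAMPLE_BY_TYPE.fetch(:pdf).dup
  elsif EXAMPLE_IMAGE_MIME_TYPES.include?(mime_type)
    EXAMPLE_BY_TYPE.fetch(:image).dup
  else
    raise ExampleUnexpectedMimeTypeError, "unexpected mime type #{mime_type.inspect}"
  end
end

pdf_generators = example_generators_for("application/pdf")
image_generators = example_generators_for("image/tiff")
```

Returning a copy means a caller can prune the result (for example through the filter) without altering the class-level configuration.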

#valid? ⇒ Boolean

Returns:

  • (Boolean)

# File 'app/services/iiif_print/derivative_rodeo_service.rb', line 242

def valid?
  if in_the_rodeo?
    Rails.logger.info("Using the DerivativeRodeo for FileSet ID=#{file_set.id} with mime_type of #{mime_type}")
    true
  else
    Rails.logger.info("Skipping the DerivativeRodeo for FileSet ID=#{file_set.id} with mime_type of #{mime_type}")
    false
  end
end