Class: IiifPrint::TextFormatsFromALTOService

Inherits:
BaseDerivativeService
Defined in:
lib/iiif_print/text_formats_from_alto_service.rb

Overview

Plugin to make text format derivatives (JSON, plain text) from ALTO, using
either an existing ALTO derivative or an impending attachment.
NOTE: to keep this from conflicting with TextExtractionDerivativeService,
      this class should be invoked by it, not by PluggableDerivativeService.

Instance Attribute Summary

Attributes inherited from BaseDerivativeService

#file_set, #master_format

Instance Method Summary

Methods inherited from BaseDerivativeService

#convert_cmd, #derivative_path_factory, #identify, #im_convert, #initialize, #jp2_convert, #jp2_to_intermediate, #load_destpath, #mime_type, #mime_type_for, #one_bit?, #prepare_path, #use_color?, #valid?

Constructor Details

This class inherits a constructor from IiifPrint::BaseDerivativeService

Instance Method Details

#alto ⇒ Object



# File 'lib/iiif_print/text_formats_from_alto_service.rb', line 51

def alto
  path = alto_path
  File.read(path, encoding: 'UTF-8') unless path.nil?
end

#alto_path ⇒ Object



# File 'lib/iiif_print/text_formats_from_alto_service.rb', line 41

def alto_path
  # check first for existing, non-empty derivative data:
  path = derivative_path_factory.derivative_path_for_reference(
    @file_set,
    'xml'
  )
  return path if nonempty_file?(path)
  incoming_alto_path
end
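The lookup order in #alto_path (existing non-empty derivative first, then an incoming attachment) can be tried outside the service with a small sketch. The helper name `resolve_alto_path` is hypothetical and not part of IiifPrint; it takes both candidate paths as plain arguments rather than asking the path factory or the database for them.

```ruby
require 'tmpdir'

# Same emptiness check the service uses, repeated so the sketch runs standalone.
def nonempty_file?(path)
  return false if path.nil?
  return false unless File.exist?(path)
  !File.size(path).zero?
end

# Hypothetical helper (not in the library): prefers the existing derivative,
# falls back to the incoming attachment, and returns nil if neither is usable.
def resolve_alto_path(existing_derivative, incoming_attachment)
  return existing_derivative if nonempty_file?(existing_derivative)
  incoming_attachment if nonempty_file?(incoming_attachment)
end

Dir.mktmpdir do |dir|
  derivative = File.join(dir, 'derivative.xml')
  incoming = File.join(dir, 'incoming.xml')
  File.write(incoming, '<alto/>')

  # No derivative on disk yet, so the incoming file is chosen.
  puts resolve_alto_path(derivative, incoming) == incoming

  # Once a non-empty derivative exists, it takes precedence.
  File.write(derivative, '<alto/>')
  puts resolve_alto_path(derivative, incoming) == derivative
end
```

Because both branches go through the same emptiness check, a zero-byte derivative left behind by a failed run does not mask a valid incoming file.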

#cleanup_derivatives(*args) ⇒ Object



# File 'lib/iiif_print/text_formats_from_alto_service.rb', line 74

def cleanup_derivatives(*args)
  # do nothing here; IiifPrint::TextExtractionDerivativeService
  # has this job instead for cleaning ALTO, JSON, TXT.
end

#create_derivatives(_filename) ⇒ Object



# File 'lib/iiif_print/text_formats_from_alto_service.rb', line 56

def create_derivatives(_filename)
  # As this plugin makes derivatives of a derivative, _filename is ignored.
  source_file = alto
  return if source_file.nil?
  # Image width and height from the characterized primary file help ensure
  # proper scaling:
  file = @file_set.original_file
  width = file.nil? ? nil : file.width[0].to_i
  height = file.nil? ? nil : file.height[0].to_i
  # AltoReader is responsible for transcoding; this class just saves the result.
  reader = IiifPrint::TextExtraction::AltoReader.new(
    source_file,
    width,
    height
  )
  save_derivative('json', reader.json)
  save_derivative('txt', reader.text)
end
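The dimension handling in #create_derivatives reads the first element of a multi-valued property and converts it with `to_i`. A minimal sketch of that expression in isolation follows; `CharacterizedFile` and `dimensions_for` are hypothetical stand-ins for illustration, not IiifPrint or Hyrax names.

```ruby
# Hypothetical stand-in for a characterized file whose width and height
# are arrays of strings.
CharacterizedFile = Struct.new(:width, :height)

# Mirrors the two ternary expressions used in #create_derivatives.
def dimensions_for(file)
  width = file.nil? ? nil : file.width[0].to_i
  height = file.nil? ? nil : file.height[0].to_i
  [width, height]
end

p dimensions_for(CharacterizedFile.new(['2400'], ['3600']))
p dimensions_for(nil)
p dimensions_for(CharacterizedFile.new([], []))
```

One thing this makes visible: when the file exists but its width or height array is empty, `[][0]` is `nil` and `nil.to_i` is `0`, so the reader would receive 0 rather than `nil`. Whether AltoReader treats 0 and `nil` the same way is not stated on this page.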

#incoming_alto_path ⇒ Object

If there is no derivative yet, there might be one in transit from an ingest, so check for that and use its source if applicable.


# File 'lib/iiif_print/text_formats_from_alto_service.rb', line 33

def incoming_alto_path
  path = IiifPrint::DerivativeAttachment.where(
    fileset_id: @file_set.id,
    destination_name: 'xml'
  ).pluck(:path).uniq.first
  path if nonempty_file?(path)
end

#nonempty_file?(path) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/iiif_print/text_formats_from_alto_service.rb', line 25

def nonempty_file?(path)
  return false if path.nil?
  return false unless File.exist?(path)
  !File.size(path).zero?
end
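The three cases the predicate distinguishes (nil path, missing file, zero-byte file) can be checked with a standalone copy of the method. This is only a sketch for experimentation; the temporary files are illustrative.

```ruby
require 'tempfile'

# Standalone copy of the predicate shown above.
def nonempty_file?(path)
  return false if path.nil?
  return false unless File.exist?(path)
  !File.size(path).zero?
end

empty = Tempfile.new('empty')
empty.close

full = Tempfile.new('full')
full.write('<alto/>')
full.close

puts nonempty_file?(nil)                      # nil path
puts nonempty_file?("#{empty.path}.missing")  # path that does not exist
puts nonempty_file?(empty.path)               # zero-byte file
puts nonempty_file?(full.path)                # file with content
```

Only the last call returns true; the first three return false.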

#save_derivative(destination, data) ⇒ Object



# File 'lib/iiif_print/text_formats_from_alto_service.rb', line 9

def save_derivative(destination, data)
  mime_type = mime_type_for(destination)
  # Load/prepare base of "pairtree" dir structure for extension, fileset
  prepare_path(destination)
  # Resolve the full save path for this destination (extension):
  save_path = derivative_path_factory.derivative_path_for_reference(
    @file_set,
    destination
  )
  # Write data as UTF-8 encoded text
  File.open(save_path, "w:UTF-8") do |f|
    f.write(data)
    IiifPrint.copy_derivatives_from_data_store(stream: data, directives: { url: file_set.id.to_s, container: 'extracted_text', mime_type: mime_type })
  end
end
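The write in #save_derivative uses the `"w:UTF-8"` mode string, and #alto reads with `encoding: 'UTF-8'`. A minimal round-trip sketch of that pairing, using hypothetical helper names (`write_utf8`, `read_utf8`) and a temporary directory in place of the derivative path:

```ruby
require 'tmpdir'

# Hypothetical helper: writes text with an explicit UTF-8 external encoding,
# as the File.open call in #save_derivative does.
def write_utf8(path, data)
  File.open(path, 'w:UTF-8') { |f| f.write(data) }
end

# Hypothetical helper: reads the file back the way #alto does.
def read_utf8(path)
  File.read(path, encoding: 'UTF-8')
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'page.txt')
  write_utf8(path, 'Zürich Tagblatt')
  text = read_utf8(path)
  puts text.encoding
  puts text == 'Zürich Tagblatt'
end
```

Setting the encoding explicitly on both sides keeps non-ASCII OCR text intact regardless of the process's default external encoding.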