Class: Darlingtonia::HyraxRecordImporter

Inherits:
RecordImporter
Defined in:
lib/darlingtonia/hyrax_record_importer.rb

Constant Summary

DEFAULT_CREATOR_KEY =

TODO: Get this from Hyrax config

'[email protected]'

Instance Attribute Summary

Attributes inherited from RecordImporter

#error_stream, #info_stream

Constructor Details

#initialize(error_stream: Darlingtonia.config.default_error_stream, info_stream: Darlingtonia.config.default_info_stream, attributes: {}) ⇒ HyraxRecordImporter

Returns a new instance of HyraxRecordImporter.

Examples:

attributes: { collection_id: '123',
              depositor_id: '456',
              batch_id: '789',
              deduplication_field: 'legacy_id'
            }

Parameters:

  • attributes (Hash) (defaults to: {})

    Attributes that come from the UI or importer rather than from the CSV/mapper. These are useful for logging and tracking the output of an import job for a given collection, user, or batch. If a deduplication_field is provided, the system will look for existing works with that field and matching value and will update the record instead of creating a new record.



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 47

def initialize(error_stream: Darlingtonia.config.default_error_stream,
               info_stream: Darlingtonia.config.default_info_stream,
               attributes: {})
  self.collection_id = attributes[:collection_id]
  self.batch_id = attributes[:batch_id]
  self.deduplication_field = attributes[:deduplication_field]
  set_depositor(attributes[:depositor_id])
  @success_count = 0
  @failure_count = 0
  super(error_stream: error_stream, info_stream: info_stream)
end

Instance Attribute Details

#batch_id ⇒ String

Returns an id number associated with the process that kicked off this import run.

Returns:

  • (String)

    an id number associated with the process that kicked off this import run



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 18

def batch_id
  @batch_id
end

#collection_id ⇒ String

Returns the Fedora ID for a Collection.

Returns:

  • (String)

    the Fedora ID for a Collection



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 14

def collection_id
  @collection_id
end

#deduplication_field ⇒ String

If this is set, look for records with a match in this field and update the metadata instead of creating a new record. This will NOT re-import file attachments.

Returns:

  • (String)

    the field used to match existing records



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 23

def deduplication_field
  @deduplication_field
end

#depositor ⇒ User

Returns:

  • (User)


# File 'lib/darlingtonia/hyrax_record_importer.rb', line 10

def depositor
  @depositor
end

#failure_count ⇒ Integer

Returns the number of records this importer has failed to create.

Returns:

  • (Integer)

    the number of records this importer has failed to create



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 31

def failure_count
  @failure_count
end

#success_count ⇒ Integer

Returns the number of records this importer has successfully created.

Returns:

  • (Integer)

    the number of records this importer has successfully created



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 27

def success_count
  @success_count
end

Instance Method Details

#based_near_attributes(based_near) ⇒ Hash

When submitting location data (a.k.a. the “based near” attribute) via the UI, Hyrax expects to receive a `based_near_attributes` hash in a specific format. We need to take geonames URLs as provided by the customer and transform them to mimic what the Hyrax UI would ordinarily produce. These will get turned into Hyrax::ControlledVocabularies::Location objects upon ingest. The expected hash looks like this:

"based_near_attributes"=>
  {
    "0"=> {
            "id"=>"http://sws.geonames.org/5667009/", "_destroy"=>""
          },
    "1"=> {
            "id"=>"http://sws.geonames.org/6252001/", "_destroy"=>""
          },
}

Returns:

  • (Hash)

    a “based_near_attributes” hash in the format shown above



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 160

def based_near_attributes(based_near)
  original_geonames_uris = based_near
  return if original_geonames_uris.empty?
  based_near_attributes = {}
  original_geonames_uris.each_with_index do |uri, i|
    based_near_attributes[i.to_s] = { 'id' => uri_to_sws(uri), "_destroy" => "" }
  end
  based_near_attributes
end
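
The transformation described above can be tried outside Hyrax with a small standalone sketch. The method names `to_sws` and `build_based_near_attributes` and the sample URLs are made up for this illustration; they are not part of Darlingtonia.

```ruby
require 'uri'

# Hypothetical stand-in for #uri_to_sws: keep only the numeric geonames id.
def to_sws(uri)
  geonames_number = URI(uri).path.split('/')[1]
  "http://sws.geonames.org/#{geonames_number}/"
end

# Hypothetical stand-in for #based_near_attributes: key each location by its
# position (as a string), the way a nested-attributes form submission would.
def build_based_near_attributes(uris)
  return if uris.empty?
  attributes = {}
  uris.each_with_index do |uri, i|
    attributes[i.to_s] = { 'id' => to_sws(uri), '_destroy' => '' }
  end
  attributes
end

result = build_based_near_attributes(
  ['http://www.geonames.org/5667009/montana.html',
   'http://www.geonames.org/6252001/united-states.html']
)
```

Note that an empty list yields nil rather than an empty hash, which matches the early return in the listing above.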

#create_upload_files(record) ⇒ Array

Create a Hyrax::UploadedFile for each file attachment.

TODO: What if we can’t find the file?
TODO: How do we specify where the files can be found?

Parameters:

  • record (ImportRecord)

Returns:

  • (Array)

    an array of Hyrax::UploadedFile ids



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 116

def create_upload_files(record)
  return unless record.mapper.respond_to?(:files)
  files_to_attach = record.mapper.files
  return [] if files_to_attach.nil? || files_to_attach.empty?

  uploaded_file_ids = []
  files_to_attach.each do |filename|
    file = File.open(find_file_path(filename))
    uploaded_file = Hyrax::UploadedFile.create(user: @depositor, file: file)
    uploaded_file_ids << uploaded_file.id
    file.close
  end
  uploaded_file_ids
end

#file_attachments_path ⇒ Object

The path on disk where file attachments can be found



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 107

def file_attachments_path
  ENV['IMPORT_PATH'] || '/opt/data'
end

#find_existing_record(record) ⇒ ActiveFedora::Base

Search for any existing records that match on the deduplication_field

Parameters:

  • record (ImportRecord)

Returns:

  • (ActiveFedora::Base)


# File 'lib/darlingtonia/hyrax_record_importer.rb', line 73

def find_existing_record(record)
  return unless deduplication_field
  return unless record.respond_to?(deduplication_field)
  return if record.mapper.send(deduplication_field).nil?
  return if record.mapper.send(deduplication_field).empty?
  existing_records = import_type.where("#{deduplication_field}": record.mapper.send(deduplication_field).to_s)
  raise "More than one record matches deduplication_field #{deduplication_field} with value #{record.mapper.send(deduplication_field)}" if existing_records.count > 1
  existing_records&.first
end
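
To make the matching rules concrete, here is a minimal sketch that runs against an in-memory list instead of Fedora. `Work`, `REPOSITORY`, and `find_existing` are hypothetical names for this example only; the real method queries the class returned by #import_type.

```ruby
# Hypothetical in-memory stand-in for the repository of existing works.
Work = Struct.new(:id, :legacy_id)
REPOSITORY = [
  Work.new('w1', 'abc'),
  Work.new('w2', 'def'),
  Work.new('w3', 'def')
].freeze

# Mirrors the guard clauses above: no field or no value means no lookup,
# a single match is returned, and an ambiguous match raises.
def find_existing(field, value)
  return if field.nil? || value.nil? || value.to_s.empty?
  matches = REPOSITORY.select { |work| work.public_send(field).to_s == value.to_s }
  raise "More than one record matches #{field} with value #{value}" if matches.count > 1
  matches.first
end

single    = find_existing(:legacy_id, 'abc')  # exactly one match
none      = find_existing(:legacy_id, 'zzz')  # no match, so nil
ambiguous = begin
  find_existing(:legacy_id, 'def')
rescue RuntimeError => e
  e.message
end
```

Raising on an ambiguous match stops the importer from updating an arbitrary record when the deduplication field is not actually unique.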

#find_file_path(filename) ⇒ String

Within the directory specified by ENV['IMPORT_PATH'], find the first instance of a file matching the given filename. If there is no matching file, raise an exception.

Parameters:

  • filename (String)

Returns:

  • (String)

    a full pathname to the found file



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 137

def find_file_path(filename)
  filepath = Dir.glob("#{ENV['IMPORT_PATH']}/**/#{filename}").first
  raise "Cannot find file #{filename}... Are you sure it has been uploaded and that the filename matches?" if filepath.nil?
  filepath
end
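
The recursive lookup can be exercised in isolation with a temporary directory. This sketch uses a hypothetical `find_under` helper that takes the root directory as an argument instead of reading it from the environment; the directory and file names are invented for the example.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical variant of #find_file_path with an explicit root directory.
# The "**" segment makes Dir.glob descend into subdirectories at any depth.
def find_under(root, filename)
  filepath = Dir.glob(File.join(root, '**', filename)).first
  raise "Cannot find file #{filename}" if filepath.nil?
  filepath
end

found = nil
missing_error = nil
Dir.mktmpdir do |root|
  nested = File.join(root, 'batch_1', 'images')
  FileUtils.mkdir_p(nested)
  FileUtils.touch(File.join(nested, 'cat.jpg'))

  found = find_under(root, 'cat.jpg')
  begin
    find_under(root, 'dog.jpg')
  rescue RuntimeError => e
    missing_error = e.message
  end
end
```

As listed, #find_file_path interpolates ENV['IMPORT_PATH'] directly rather than calling #file_attachments_path, so the '/opt/data' fallback does not apply to this lookup.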

#import(record:) ⇒ void

This method returns an undefined value.

Parameters:

  • record (ImportRecord)


# File 'lib/darlingtonia/hyrax_record_importer.rb', line 87

def import(record:)
  existing_record = find_existing_record(record)
  create_for(record: record) unless existing_record
  update_for(existing_record: existing_record, update_record: record) if existing_record
rescue Faraday::ConnectionFailed, Ldp::HttpError => e
  error_stream << e
rescue RuntimeError => e
  error_stream << e
  raise e
end
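
The two rescue clauses treat failures differently: connection-level errors are logged and swallowed, while a RuntimeError is logged and then re-raised. A minimal sketch of that pattern follows, assuming a made-up `FakeConnectionError` in place of Faraday::ConnectionFailed and Ldp::HttpError, and a plain array in place of the error stream.

```ruby
# Hypothetical stand-in for the connection-level errors rescued by #import.
class FakeConnectionError < StandardError; end

# Runs the given block with the same two-tier error handling as #import.
def guarded_import(error_stream)
  yield
rescue FakeConnectionError => e
  error_stream << e.message # logged; the caller never sees the exception
rescue RuntimeError => e
  error_stream << e.message # logged, then passed on to the caller
  raise
end

errors = []
guarded_import(errors) { raise FakeConnectionError, 'connection refused' }

reraised = begin
  guarded_import(errors) { raise 'duplicate match' }
  false
rescue RuntimeError
  true
end
```

In this sketch both failures end up in `errors`, but only the second one interrupts the caller.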

#import_type ⇒ Object

TODO: You should be able to specify the import type in the import



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 99

def import_type
  raise 'No curation_concern found for import' unless
    defined?(Hyrax) && Hyrax&.config&.curation_concerns&.any?

  Hyrax.config.curation_concerns.first
end

#set_depositor(user_key) ⇒ Object

“depositor” is a required field for Hyrax. If it hasn’t been set, set it to the Hyrax default batch user.



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 62

def set_depositor(user_key)
  user = ::User.find_by_user_key(user_key) if user_key
  user ||= ::User.find(user_key) if user_key
  user ||= ::User.find_or_create_system_user(DEFAULT_CREATOR_KEY)
  self.depositor = user
end
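
The lookup is a fallback chain: try the value as a user key, then as an id, then fall back to the system user. The sketch below imitates that chain with plain hashes; `USERS_BY_KEY`, `USERS_BY_ID`, `SYSTEM_USER`, and `resolve_depositor` are hypothetical names, and the sample keys are invented.

```ruby
# Hypothetical lookup tables standing in for the ::User model.
USERS_BY_KEY = { 'jane@example.org' => 'Jane' }.freeze
USERS_BY_ID  = { '456' => 'Raj' }.freeze
SYSTEM_USER  = 'batch user'

# Each step only runs if the previous one produced nothing.
def resolve_depositor(user_key)
  user = USERS_BY_KEY[user_key] if user_key
  user ||= USERS_BY_ID[user_key] if user_key
  user || SYSTEM_USER
end

by_key   = resolve_depositor('jane@example.org')
by_id    = resolve_depositor('456')
fallback = resolve_depositor(nil)
```

One difference from this sketch: if ::User is a standard ActiveRecord model, `find` raises ActiveRecord::RecordNotFound for an unknown id instead of returning nil, so a key that matches nothing would not reach the system-user fallback.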

#uri_to_sws(uri) ⇒ String

Take a user-facing geonames URI and return an sws URI of the form Hyrax expects (e.g., “http://sws.geonames.org/6252001/”)

Parameters:

  • uri (String)

Returns:

  • (String)

    an sws style geonames uri



# File 'lib/darlingtonia/hyrax_record_importer.rb', line 175

def uri_to_sws(uri)
  uri = URI(uri)
  geonames_number = uri.path.split('/')[1]
  "http://sws.geonames.org/#{geonames_number}/"
end
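
A short sketch of how the path split behaves for two input shapes, using a hypothetical standalone copy named `sws_uri_for`. The sample URLs are illustrative.

```ruby
require 'uri'

# Hypothetical standalone copy of the conversion for use outside Hyrax.
def sws_uri_for(uri)
  # For ".../6252001/united-states.html" the path is
  # "/6252001/united-states.html"; splitting on "/" gives
  # ["", "6252001", "united-states.html"], so index 1 is the geonames id.
  geonames_number = URI(uri).path.split('/')[1]
  "http://sws.geonames.org/#{geonames_number}/"
end

from_browser = sws_uri_for('https://www.geonames.org/6252001/united-states.html')
already_sws  = sws_uri_for('http://sws.geonames.org/6252001/')
```

Because only the first path segment is used, a URI that is already in sws form passes through unchanged.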