Module: Export::Dwca
- Defined in:
  lib/export/dwca.rb,
  lib/export/dwca/data.rb
Defined Under Namespace
Modules: GbifProfile
Classes: Data
Constant Summary
- INDEX_VERSION =
Each version records a date on which the indexing changed significantly enough that all or most of the index should be regenerated. To add a version, use `Time.now` in IRB.
[
  '2021-10-12 17:00:00.000000 -0500', # First major refactor
  '2021-10-15 17:00:00.000000 -0500', # Minor Excludes footprintWKT, and references to GeographicArea in gazetteer; new form of media links
  '2021-11-04 17:00:00.000000 -0500', # Minor Removes '|', fixes some mappings
  '2021-11-08 13:00:00.000000 -0500', # PENDING: Minor Adds depth mappings
  '2021-11-30 13:00:00.000000 -0500', # Fix inverted long,lat
  '2022-01-21 16:30:00.000000 -0500', # basisOfRecord can now be FossilSpecimen; occurrenceId exporting; adds redundant time fields
  '2022-03-31 16:30:00.000000 -0500', # collectionCode, occurrenceRemarks and various small fixes
  '2022-04-28 16:30:00.000000 -0500', # add dwcOccurrenceStatus
  '2022-09-28 16:30:00.000000 -0500', # add phylum, class, order, higherClassification
  '2023-04-03 16:30:00.000000 -0500', # add associatedTaxa; updating InternalAttributes is now reflected in index
  '2023-12-14 16:30:00.000000 -0500'  # add verbatimLabel
].freeze
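A minimal sketch of producing a timestamp in the same layout as the entries above, since a bare `Time.now` in IRB may print a different precision depending on the Ruby version. The helper name `index_version_stamp` is hypothetical and not part of `Export::Dwca`.

```ruby
require 'time'

# Hypothetical helper (not part of Export::Dwca): format a Time in the
# layout used by the INDEX_VERSION entries, "YYYY-MM-DD HH:MM:SS.uuuuuu +ZZZZ".
def index_version_stamp(time = Time.now)
  # %6N is fractional seconds to six digits, %z is the numeric UTC offset.
  time.strftime('%Y-%m-%d %H:%M:%S.%6N %z')
end

# Reproduce the most recent entry from a known instant.
stamp = index_version_stamp(Time.new(2023, 12, 14, 16, 30, 0, '-05:00'))

# The string parses back to the same instant, so entries can be compared as times.
parsed = Time.parse(stamp)
```

Calling `index_version_stamp` with no argument gives a string for the current moment that can be pasted in as a new last entry.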
Class Method Summary
- .build_index_async(klass, record_scope, predicate_extensions: {}) ⇒ Object
When we re-index a large set of data then we run it in the background.
- .download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}) ⇒ Download
The download object containing the archive.
- .index_metadata(klass, record_scope) ⇒ Object
Class Method Details
.build_index_async(klass, record_scope, predicate_extensions: {}) ⇒ Object
When we re-index a large set of data, we run it in the background. To determine when the job is done, we poll for the last record to be indexed.
# File 'lib/export/dwca.rb', line 69

def self.build_index_async(klass, record_scope, predicate_extensions: {} )
  s = record_scope.order(:id)
  ::DwcaCreateIndexJob.perform_later(klass.to_s, sql_scope: s.to_sql)
  index_metadata(klass, s)
end
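The polling idea described above can be sketched in plain Ruby. This is an illustration under stated assumptions, not the application's implementation: `index_complete?` and the `indexed_at` lookup (a hash from global id string to the time that record's index row was last written) are hypothetical stand-ins for whatever database check the caller performs. It relies on the scope being ordered by id, so the last sampled record is the last one the job reaches.

```ruby
# Hypothetical sketch: decide whether a background re-index has finished by
# checking only the last record in the scope.
#
# metadata   - a hash shaped like the index metadata: :total, :start_time, :sample
# indexed_at - assumed lookup of global id string => Time the index row was written
def index_complete?(metadata, indexed_at)
  last_id = metadata[:sample].last
  return true if last_id.nil? # empty scope, nothing to index

  stamp = indexed_at[last_id]
  # Done only if the last record was indexed after this run started.
  !stamp.nil? && stamp >= metadata[:start_time]
end

start = Time.utc(2024, 1, 1, 12, 0, 0)
metadata = {
  total: 3,
  start_time: start,
  sample: ['gid://app/Rec/1', 'gid://app/Rec/2', 'gid://app/Rec/3']
}

pending  = index_complete?(metadata, { 'gid://app/Rec/3' => start - 60 })
finished = index_complete?(metadata, { 'gid://app/Rec/3' => start + 60 })
```

Checking a single record keeps each poll cheap; the intermediate sample ids could be checked the same way to estimate progress.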
.download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {}) ⇒ Download
Returns the download object containing the archive.
# File 'lib/export/dwca.rb', line 35

def self.download_async(record_scope, request = nil, extension_scopes: {}, predicate_extensions: {})
  name = "dwc-a_#{DateTime.now}.zip"

  download = ::Download::DwcArchive.create!(
    name: "DwC Archive generated at #{Time.now.utc}.",
    description: 'A Darwin Core archive.',
    filename: name,
    request:,
    expires: 2.days.from_now,
    total_records: record_scope.size # Was having problems with count() TODO: increment after when extensions are allowed.
  )

  # Note we pass a string with the record scope
  ::DwcaCreateDownloadJob.perform_later(
    download,
    core_scope: record_scope.to_sql,
    extension_scopes:,
    predicate_extensions:,
  )

  download
end
.index_metadata(klass, record_scope) ⇒ Object
# File 'lib/export/dwca.rb', line 75

def self.index_metadata(klass, record_scope)
  a = record_scope.first&.to_global_id&.to_s # TODO: this should be UUID?
  b = record_scope.last&.to_global_id&.to_s # TODO: this should be UUID?

  t = record_scope.size # was having problems with count

  metadata = {
    total: t,
    start_time: Time.zone.now,
    sample: [a, b].compact
  }

  if b && (t > 2)
    max = 9
    max = t if t < 9

    ids = klass
      .select('*')
      .from("(select id, type, ROW_NUMBER() OVER (ORDER BY id ASC) rn from (#{record_scope.to_sql}) b ) a")
      .where("a.rn % ((SELECT COUNT(*) FROM (#{record_scope.to_sql}) c) / #{max}) = 0")
      .limit(max)
      .collect{|o| o.to_global_id.to_s}

    metadata[:sample].insert(1, *ids)
  end

  metadata[:sample].uniq!
  metadata
end
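The sampling in the SQL above takes every n-th row by row number, where n is the record count divided by the sample size (at most 9), and places those rows between the first and last record. A plain-Ruby approximation follows. The function `sample_ids` is hypothetical, it works on bare integer ids rather than global id strings, and it assumes the database returns the rows in row-number order, which the query does not state explicitly.

```ruby
# Hypothetical pure-Ruby approximation of the evenly spaced sample.
def sample_ids(ids, max = 9)
  sorted = ids.sort
  total  = sorted.size
  sample = [sorted.first, sorted.last].compact # first and last record, if any

  if total > 2
    max  = total if total < max
    step = total / max # integer division, as in the SQL

    picked = sorted.each_with_index
                   .select { |_id, i| ((i + 1) % step).zero? } # row numbers are 1-based
                   .map(&:first)
                   .first(max) # mirrors LIMIT max

    sample.insert(1, *picked) # place the spaced rows between first and last
  end

  sample.uniq
end

spaced = sample_ids((1..20).to_a) # step is 20 / 9 = 2, so every second row
small  = sample_ids([5, 6, 7])    # fewer than 9 records: every row is picked
```

With 20 records the step is 2, so the ninth pick is row 18 and the last record is appended separately; when the last row also falls on a step boundary, `uniq` drops the duplicate.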