Module: Export::Coldp::Files::Taxon

Defined in:
lib/export/coldp/files/taxon.rb

Overview

Concepts not mapped:

`namePhrase` - e.g. `sensu lato`; this would come from OTU#name

Notes

  • ColDP importer has a normalizing step that recognizes some names no longer point to any OTU

  • CoLDP cannot handle assertions that a name currently treated as invalid was used as a valid name for a previously valid concept, i.e. CoL does not track alternative past concept hierarchies

TODO: create map of all possible CoLDP used IRIs and ability to populate project with them automatically

Constant Summary

IRI_MAP =
{
  extinct: 'https://api.checklistbank.org/datapackage#Taxon.extinct',                         # 1,0
  temporal_range_end: 'https://api.checklistbank.org/datapackage#Taxon.temporal_range_end',   # from https://api.checklistbank.org/vocab/geotime
  temporal_range_start: 'https://api.checklistbank.org/datapackage#Taxon.temporal_range_start', # from https://api.checklistbank.org/vocab/geotime
  lifezone: 'https://api.checklistbank.org/datapackage#Taxon.lifezone',                       # from https://api.checklistbank.org/vocab/lifezone
}
SKIPPED_RANKS =
%w{
  NomenclaturalRank::Iczn::SpeciesGroup::Superspecies
  NomenclaturalRank::Iczn::SpeciesGroup::Supersuperspecies
}

Class Method Summary

Class Method Details

.according_to_date(otu) ⇒ Object

Potentially derived from a Confidence level, e.g. a confidence_validated_at value (the last time this confidence level was deemed OK).


# File 'lib/export/coldp/files/taxon.rb', line 86

def self.according_to_date(otu)
  # a) Dynamic - !! most recent updated_at stamp for *any* OTU tied data -> this is a big grind: if so add cached_touched_on_date to Otu
  # b) modify Confidence level to include date
  # c) review what SFs does in their model
  nil
end

.according_to_id(otu) ⇒ Object

A reference to the publication of the person who established the taxonomic concept.

TW has a plurality of sources that reference this concept; it's a straightforward map.
It is somewhat unclear how/whether CoL will use this concept.


# File 'lib/export/coldp/files/taxon.rb', line 79

def self.according_to_id(otu)
  nil
end

.generate(otus, project_members, root_otu_id = nil, reference_csv = nil, prefer_unlabelled_otus: true) ⇒ Object



# File 'lib/export/coldp/files/taxon.rb', line 113

def self.generate(otus, project_members, root_otu_id = nil, reference_csv = nil, prefer_unlabelled_otus: true)

  # Until we have RC5 articulations we are simplifying handling the fact
  # that one taxon name can be used for many OTUs. Track to see that
  # an OTU with a given taxon name does not already exist
  #   `taxon_name_id: nil`  - uniquify via Ruby hash keys
  observed_taxon_name_ids = { }

  # TODO: optional Taxon.alternativeID field allows inclusion of external identifiers: https://github.com/CatalogueOfLife/coldp#alternativeid-1 https://github.com/CatalogueOfLife/coldp#identifiers
  #   e.g., gbif:2704179,col:6W3C4,BOLD:AAJ2287,wikidata:Q157571

  CSV.generate(col_sep: "\t") do |csv|

    csv << %w{
      ID
      parentID
      nameID
      namePhrase
      provisional
      accordingToID
      scrutinizer
      scrutinizerID
      scrutinizerDate
      referenceID
      extinct
      temporalRangeStart
      temporalRangeEnd
      environment
      link
      remarks
      modified
      modifiedBy
    }

    taxon_remarks_vocab_id = Predicate.find_by(uri: 'https://github.com/catalogueoflife/coldp#Taxon.remarks',
                                               project_id: otus[0]&.project_id)&.id
    name_phrase_vocab_id = Predicate.find_by(uri: 'https://github.com/catalogueoflife/coldp#Taxon.namePhrase',
                                               project_id: otus[0]&.project_id)&.id

    otus.each do |o|
      # !! When a name is a synonym (combination), but that combination has no OTU
      # !! then the parent of the name in the taxon table is nil
      # !! Handle this edge case (probably resolved now)

      # TODO: alter way parent is set to conform to CoLDP status
      #   For OTUs with combinations we might have to change the parenthood?!

      parent_id = nil
      if root_otu_id != o.id
        if pid = o.parent_otu_id(skip_ranks: SKIPPED_RANKS, prefer_unlabelled_otus: prefer_unlabelled_otus)
          parent_id = pid
        else
          puts 'WARNING no parent!!'
          # there is no OTU parent for the hierarchy, at present we just flat skip this OTU
          # Curators can use the create OTUs for valid ids to resolve this data issue
          next
        end
      end

      # TODO: This was excluding OTUs that were being excluded downstream previously
      # This should never happen now since parent ambiguity is caught above!
      # can be removed in theory
      # TODO: remove once RC5 better modelled
      # Values are always nil, so test for the key rather than the value
      next if observed_taxon_name_ids.key?(o.taxon_name_id)
      observed_taxon_name_ids[o.taxon_name_id] = nil

      # TODO: Use o.coordinate_otus to summarize across different instances of the OTU

      sources = o.sources
      source = o.source

      parent_id = (root_otu_id == o.id ? nil : parent_id )

      csv << [
        o.id,                                                                # ID (Taxon)
        parent_id,                                                           # parentID (Taxon)
        o.taxon_name.id,                                                     # nameID (Name)
        name_phrase(o, name_phrase_vocab_id),                                # namePhrase
        provisional(o),                                                      # provisional
        according_to_id(o),                                                  # accordingToID
        scrutinizer(o),                                                      # scrutinizer
        scrutinizer_id(o),                                                   # scrutinizerID
        scrutinizer_date(o),                                                 # scrutinizerDate
        reference_id(sources),                                               # referenceID
        predicate_value(o, :extinct),                                        # extinct
        predicate_value(o, :temporal_range_start),                           # temporalRangeStart
        predicate_value(o, :temporal_range_end),                             # temporalRangeEnd
        predicate_value(o, :lifezone),                                       # environment (formerly named lifezone)
        link(o),                                                             # link
        Export::Coldp.sanitize_remarks(remarks(o, taxon_remarks_vocab_id)),  # remarks
        Export::Coldp.modified(o[:updated_at]),                              # modified
        Export::Coldp.modified_by(o[:updated_by_id], project_members)        # modifiedBy
      ]

      Export::Coldp::Files::Reference.add_reference_rows(sources, reference_csv, project_members) if reference_csv
    end
  end
end
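
The overall shape of `.generate` can be sketched without Rails: a tab-separated table with a header row and one row per OTU, skipping non-root OTUs that have no parent and OTUs whose taxon name has already been seen. The `SketchOtu` struct, the `sketch_taxon_tsv` helper, and the reduced three-column layout are assumptions made for illustration; they are not part of TaxonWorks.

```ruby
require 'csv'
require 'set'

# Hypothetical stand-in for an Otu record; not the TaxonWorks model.
SketchOtu = Struct.new(:id, :taxon_name_id, :parent_id)

# Build a reduced Taxon table (ID, parentID, nameID only).
def sketch_taxon_tsv(otus, root_otu_id = nil)
  seen_name_ids = Set.new

  CSV.generate(col_sep: "\t") do |csv|
    csv << %w[ID parentID nameID]

    otus.each do |o|
      is_root = (o.id == root_otu_id)
      # Non-root OTUs without a resolvable parent are skipped entirely.
      next if !is_root && o.parent_id.nil?
      # Set#add? returns nil when the name id was already recorded.
      next unless seen_name_ids.add?(o.taxon_name_id)

      csv << [o.id, (is_root ? nil : o.parent_id), o.taxon_name_id]
    end
  end
end

otus = [
  SketchOtu.new(1, 10, nil), # root
  SketchOtu.new(2, 20, 1),
  SketchOtu.new(3, 20, 1),   # same taxon name as OTU 2, skipped
  SketchOtu.new(4, 30, nil)  # no parent and not the root, skipped
]

puts sketch_taxon_tsv(otus, 1)
```

Only OTUs 1 and 2 produce rows here; the root row carries an empty `parentID` field.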


.link(otu) ⇒ Object

# File 'lib/export/coldp/files/taxon.rb', line 93

def self.link(otu)
  # API or public interface
end

.name_phrase(otu, vocab_id) ⇒ Object

Name phrase is for appended phrases like sensu stricto and sensu lato



# File 'lib/export/coldp/files/taxon.rb', line 48

def self.name_phrase(otu, vocab_id)
  da = DataAttribute.find_by(type: 'InternalAttribute',
                             controlled_vocabulary_term_id: vocab_id,
                             attribute_subject_id: otu.id)
  da&.value
end

.predicate_value(otu, predicate) ⇒ Object

Parameters:

  • predicate (Symbol)

    a key from IRI_MAP



# File 'lib/export/coldp/files/taxon.rb', line 27

def self.predicate_value(otu, predicate)
  return nil unless IRI_MAP[predicate]
  otu.data_attributes.joins(:predicate).where(controlled_vocabulary_terms: {uri: IRI_MAP[predicate]}).first&.value
end
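
The lookup in `.predicate_value` resolves a symbol to an IRI and then returns the value of the first data attribute whose predicate carries that IRI. A minimal in-memory sketch of that idea follows, assuming a hypothetical `SketchAttribute` struct and a reduced `SKETCH_IRI_MAP` in place of the ActiveRecord query; neither name exists in TaxonWorks.

```ruby
# Hypothetical stand-in for a data attribute: its predicate URI and stored value.
SketchAttribute = Struct.new(:predicate_uri, :value)

# Reduced copy of the IRI map, for illustration only.
SKETCH_IRI_MAP = {
  extinct: 'https://api.checklistbank.org/datapackage#Taxon.extinct',
  lifezone: 'https://api.checklistbank.org/datapackage#Taxon.lifezone'
}.freeze

# Return the value of the first attribute matching the mapped IRI,
# or nil when the key is unmapped or no attribute matches.
def sketch_predicate_value(attributes, predicate)
  uri = SKETCH_IRI_MAP[predicate]
  return nil unless uri

  attributes.find { |a| a.predicate_uri == uri }&.value
end

attributes = [SketchAttribute.new(SKETCH_IRI_MAP[:extinct], '1')]

p sketch_predicate_value(attributes, :extinct)    # "1"
p sketch_predicate_value(attributes, :lifezone)   # nil, no such attribute
p sketch_predicate_value(attributes, :not_mapped) # nil, key not in the map
```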

.provisional(otu) ⇒ Object

Returns [Boolean, nil].

TODO: capture the reason the name is treated as provisional in TW


# File 'lib/export/coldp/files/taxon.rb', line 34

def self.provisional(otu)
  # nomen dubium
  # incertae sedis
  # unresolved homonym, without replacement
  #
  #
  #
  # * if two OTUs for same name are in OTU set then both have to be provisional
  # * misapplication (?)
  nil
end

.reference_id(sources) ⇒ Object

“supporting the taxonomic concept”. Potentially all other Citations tied to the Otu; what exactly supports a concept?



# File 'lib/export/coldp/files/taxon.rb', line 107

def self.reference_id(sources)
  i = sources.pluck(:id)
  return i.join(',') if i.any?
  nil
end
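
The return convention of `.reference_id` (a comma-delimited string of ids, or `nil` when there are no sources) can be shown on a plain array. The helper name `sketch_reference_id` is hypothetical, and the array of integers stands in for `sources.pluck(:id)`.

```ruby
# Hypothetical helper: join ids with commas, or return nil for an empty list.
def sketch_reference_id(source_ids)
  return nil if source_ids.empty?

  source_ids.join(',')
end

p sketch_reference_id([12, 7]) # "12,7"
p sketch_reference_id([])      # nil
```

Returning `nil` rather than an empty string leaves the `referenceID` field empty in the generated row.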

.remarks(otu, taxon_remarks_vocab) ⇒ Object



# File 'lib/export/coldp/files/taxon.rb', line 97

def self.remarks(otu, taxon_remarks_vocab)
  # Query once, then join the values; nil when there are no remarks
  values = otu.data_attributes.where(controlled_vocabulary_term_id: taxon_remarks_vocab).pluck(:value)
  values.empty? ? nil : values.join('|')
end
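
Multiple remarks on one OTU are collapsed into a single pipe-delimited field. A small sketch of that convention on a plain array, where the hypothetical `sketch_remarks` helper stands in for the data attribute query:

```ruby
# Hypothetical helper: join remark values with '|', or return nil when there are none.
def sketch_remarks(values)
  values.empty? ? nil : values.join('|')
end

p sketch_remarks(['Type lost', 'See revision']) # "Type lost|See revision"
p sketch_remarks([])                            # nil
```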

.scrutinizer(otu) ⇒ Object

The scrutinizer concept is unused at present. We’re looking for the canonical implementation of it before we implement/extrapolate from data here.

* crawl attribution for inference on higher/lower
* UI/methods to assign/spam/visualize throughout
* project preference (!! should project preferences have reference ids? !!)

“According to” is the curator responsible for this OTU, given as a comma-delimited list of curators. We could also look at time-stamp data to detect “staleness” of an OTU concept.



# File 'lib/export/coldp/files/taxon.rb', line 63

def self.scrutinizer(otu)
  nil
end

.scrutinizer_date(otu) ⇒ Object



# File 'lib/export/coldp/files/taxon.rb', line 72

def self.scrutinizer_date(otu)
  nil
end

.scrutinizer_id(otu) ⇒ Object

ORCID version of .scrutinizer



# File 'lib/export/coldp/files/taxon.rb', line 68

def self.scrutinizer_id(otu)
  nil
end