Class: Uc3DmpId::Comparator
- Inherits:
- Object
- Object
- Uc3DmpId::Comparator
- Defined in:
- lib/uc3-dmp-id/comparator.rb
Overview
Class that compares incoming data from an external source to the DMPs it was initialized with. It determines whether they are likely related and assigns a confidence rating.
Constant Summary
- MSG_MISSING_DMPS =
'No DMPs were defined. Expected an Array of OpenSearch documents!'
- STOP_WORDS =
%w[a an and if of or the then they].freeze
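This page does not show how STOP_WORDS is consumed. A list like this is commonly used to drop filler words before comparing text such as titles, so the sketch below illustrates that idea only. It is an assumption, not the gem's implementation, and `significant_words` is a hypothetical helper name.

```ruby
# Same word list as the constant documented above.
STOP_WORDS = %w[a an and if of or the then they].freeze

# Hypothetical helper (not part of the gem): lowercase the text, split it
# into alphanumeric words and drop anything listed in STOP_WORDS.
def significant_words(text)
  text.to_s.downcase.scan(/[a-z0-9]+/).reject { |word| STOP_WORDS.include?(word) }
end

significant_words('The Example Data Management Plan of a College')
# => ["example", "data", "management", "plan", "college"]
```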
Instance Attribute Summary
- #dmps ⇒ Object
The Array of DMP documents supplied to the constructor as :dmps. Note from the source file: see the bottom of this file for a hard-coded crosswalk between Crossref funder ids and ROR ids. Some APIs do not fully support ROR for funder ids, so we need to be able to reference both.
- #logger ⇒ Object
The optional logger supplied to the constructor as :logger. The same source note about the Crossref funder id to ROR id crosswalk is attached to this attribute.
Instance Method Summary
- #compare(hash:) ⇒ Object
Compare the incoming hash with the DMP details that were gathered during initialization.
- #initialize(**args) ⇒ Comparator (constructor)
Expects an Array of OpenSearch documents as :dmps in the :args.
Constructor Details
#initialize(**args) ⇒ Comparator
Expects an Array of OpenSearch documents as :dmps in the :args. An optional :logger may also be supplied. Raises ComparatorError if :dmps is missing or empty.
# File 'lib/uc3-dmp-id/comparator.rb', line 22

def initialize(**args)
  @logger = args[:logger]
  @details_hash = {}
  @dmps = args.fetch(:dmps, [])

  @logger&.debug(message: 'Comparator DMPs', details: @dmps)
  raise ComparatorError, MSG_MISSING_DMPS if @dmps.empty?
end
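As a rough illustration of the constructor contract, the sketch below uses a stand-in class that copies the checks from the constructor listing: :dmps defaults to an empty Array, and an empty Array raises ComparatorError. `SketchComparator` is a hypothetical name, and `ComparatorError` is defined locally as an assumption so the snippet runs without the gem installed.

```ruby
# Stand-in definitions so the sketch runs without the uc3-dmp-id gem.
# ComparatorError and SketchComparator are assumptions for illustration only.
class ComparatorError < StandardError; end

class SketchComparator
  MSG_MISSING_DMPS = 'No DMPs were defined. Expected an Array of OpenSearch documents!'

  attr_reader :dmps, :logger

  def initialize(**args)
    @logger = args[:logger]        # optional; nil disables the debug output
    @dmps = args.fetch(:dmps, [])  # defaults to an empty Array
    raise ComparatorError, MSG_MISSING_DMPS if @dmps.empty?
  end
end

# A non-empty Array of documents is accepted.
comparator = SketchComparator.new(dmps: [{ 'dmp_id' => 'doi.org/11.22222/A1B2c3po' }])
puts comparator.dmps.length

# Omitting :dmps (or passing an empty Array) raises the error.
begin
  SketchComparator.new
rescue ComparatorError => e
  puts e.message
end
```

Callers that may legitimately have no DMPs to compare against would need to rescue this error or check the Array before constructing the comparator.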
Instance Attribute Details
#dmps ⇒ Object
The Array of DMP documents supplied to the constructor as :dmps. Note from the source file: see the bottom of this file for a hard-coded crosswalk between Crossref funder ids and ROR ids. Some APIs do not fully support ROR for funder ids, so we need to be able to reference both.

# File 'lib/uc3-dmp-id/comparator.rb', line 19

def dmps
  @dmps
end
#logger ⇒ Object
The optional logger supplied to the constructor as :logger. Note from the source file: see the bottom of this file for a hard-coded crosswalk between Crossref funder ids and ROR ids. Some APIs do not fully support ROR for funder ids, so we need to be able to reference both.

# File 'lib/uc3-dmp-id/comparator.rb', line 19

def logger
  @logger
end
Instance Method Details
#compare(hash:) ⇒ Object
Compare the incoming hash with the DMP details that were gathered during initialization.
The incoming Hash should match the documents found in OpenSearch. For example:
{
  "people": ["john doe", "[email protected]"],
  "people_ids": ["https://orcid.org/0000-0000-0000-ZZZZ"],
  "affiliations": ["example college"],
  "affiliation_ids": ["https://ror.org/00000zzzz"],
  "funder_ids": ["https://doi.org/10.13039/00000000000"],
  "funders": ["example funder (example.gov)"],
  "funder_opportunity_ids": ["485yt8325ty"],
  "grant_ids": [],
  "funding_status": "planned",
  "dmp_id": "doi.org/11.22222/A1B2c3po",
  "title": "example data management plan",
  "visibility": "private",
  "featured": 0,
  "description": "the example project abstract",
  "project_start": "2022-01-03",
  "project_end": "2024-12-23",
  "created": "2023-08-07",
  "modified": "2023-08-07",
  "registered": "2023-08-07"
}
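The example document is written in JSON-style notation, and compare reads it with String keys (for example `hash['title']`). It returns an empty Array when the argument is not a Hash or has no 'title'. A Ruby Hash built with Symbol keys would therefore fail that guard. The sketch below shows one way to normalize keys before calling compare; `comparable_hash?` is a hypothetical helper that only mirrors the guard clause in the source listing.

```ruby
# Hypothetical helper mirroring the guard at the top of #compare:
# the input must be a Hash with a non-nil 'title' under a String key.
def comparable_hash?(hash)
  hash.is_a?(Hash) && !hash['title'].nil?
end

symbol_keyed = { title: 'example data management plan', grant_ids: [] }

# Symbol keys are not seen by hash['title'], so the guard rejects this input.
comparable_hash?(symbol_keyed) # => false

# Converting the top-level keys to Strings makes the same data acceptable.
string_keyed = symbol_keyed.transform_keys(&:to_s)
comparable_hash?(string_keyed) # => true
```

Documents parsed with `JSON.parse` already use String keys by default, so this step matters mainly for hashes built by hand in Ruby.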
# File 'lib/uc3-dmp-id/comparator.rb', line 57

def compare(hash:)
  scoring = []
  return scoring unless hash.is_a?(Hash) && !hash['title'].nil?

  @dmps.each do |dmp|
    @logger&.debug(message: 'Incoming external work', details: hash)
    # Compare the grant ids. If we have a match return the response immediately since that is
    # a very positive match!
    response = { confidence: 'None', score: 0, notes: [] }
    response = _grants_match?(array: hash.fetch('grant_ids', []), dmp:, response:)
    scoring << response if response[:confidence] != 'None'
    next if response[:confidence] != 'None'

    # Compare the people involved, their affiliations and any funding opportunity numbers
    response = _opportunities_match?(array: hash.fetch('funder_opportunity_ids', []), dmp:, response:)
    response = _orcids_match?(array: hash.fetch('people_ids', []), dmp:, response:)
    response = _last_name_match?(hash:, dmp:, response:)
    response = _affiliation_match?(hash:, dmp:, response:)

    # Only process the following if we had some matching people, affiliations or opportunity nbrs
    response = _repository_match?(hash:, dmp:, response:) if response[:score].positive?
    response = _text_match?(type: 'title', text: hash['title'], dmp:, response:) if response[:score].positive?
    response = _text_match?(type: 'abstract', text: hash['description'], dmp:, response:) if response[:score].positive?
    # If the score is less than 3 then we have no confidence that it is a match
    # next if response[:score] <= 2

    # Set the confidence level based on the score
    response[:dmp_id] = "DMP##{dmp['dmp_id']}"
    response[:confidence] = if response[:score] > 10
                              'High'
                            else
                              (response[:score] > 5 ? 'Medium' : 'Low')
                            end
    @logger&.debug(message: "Found a match!", details: { dmp: dmp, analysis: response })
    scoring << response
  end

  # TODO: introduce a tie-breaker here (maybe the closest to the project_end date)
  scoring.compact.sort { |a, b| b[:score] <=> a[:score] }&.first
end
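The tail of compare can be summarized as two steps: map each numeric score to a confidence label (above 10 is 'High', above 5 is 'Medium', anything else is 'Low'), then return the single highest-scoring response. The sketch below restates that logic in isolation. `confidence_for` and `best_match` are hypothetical helper names, the sample scores are made up, and `max_by` is swapped in for the listing's sort-then-first; the two agree whenever the top score is not tied.

```ruby
# Hypothetical helper: same thresholds as the listing above.
def confidence_for(score)
  if score > 10
    'High'
  elsif score > 5
    'Medium'
  else
    'Low'
  end
end

# Hypothetical helper: pick the response with the highest score,
# or nil when there are no responses.
def best_match(responses)
  responses.compact.max_by { |response| response[:score] }
end

# Made-up scores, one response per DMP.
responses = [3, 12, 7].map do |score|
  { score: score, confidence: confidence_for(score), notes: [] }
end

best = best_match(responses)
puts "#{best[:score]} #{best[:confidence]}" # prints "12 High"
```

Note the boundaries: a score of exactly 10 is 'Medium' and a score of exactly 5 is 'Low', because both comparisons are strict. Because compare returns a single response (or nil when nothing was scored) rather than the full list, callers that need every candidate match would have to collect the scores themselves.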