Class: Krikri::Enrichments::LanguageToLexvo

Inherits:
Object
Includes:
Audumbla::FieldEnrichment
Defined in:
lib/krikri/enrichments/language_to_lexvo.rb

Overview

Converts text fields and/or providedLabels to ISO 639-3 URIs (Lexvo)

Transforms text values matching either (English) labels or language codes from Lexvo into DPLA::MAP::Controlled::Language resources with skos:exactMatch of the appropriate Lexvo URIs. Original string values are retained as dpla:providedLabel.

Currently supports language codes in ISO 639-3, but may be extended to match other two- and three-letter codes in Lexvo with ISO 639-3 URIs.

If no matches are found, returns a bnode with the input value as providedLabel.
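
The matching order described above (code first, then label, with the input always kept as the provided label) can be sketched in plain Ruby. This is only an illustration under stated assumptions: `VOCAB`, `lexvo_uri_for`, and `enrich` are hypothetical names, and the two-entry table stands in for the full vocabulary that the real enrichment reads from `RDF::ISO_639_3`.

```ruby
# Hypothetical stand-in for the Lexvo vocabulary: ISO 639-3 code => English labels.
VOCAB = {
  fin: ['Finnish'],
  eng: ['English']
}.freeze

LEXVO_BASE = 'http://lexvo.org/id/iso639-3/'.freeze

# Returns a Lexvo URI string for a code or label, or nil when nothing matches.
def lexvo_uri_for(text)
  needle = text.to_s.downcase
  # Stage 1: treat the input as a three-letter code.
  code = VOCAB.keys.find { |c| c.to_s == needle }
  # Stage 2: fall back to a case-insensitive comparison against the labels.
  code ||= VOCAB.find { |_, labels| labels.map(&:downcase).include?(needle) }&.first
  code && "#{LEXVO_BASE}#{code}"
end

# The input is always retained as the provided label, matched or not.
def enrich(text)
  { provided_label: text, exact_match: lexvo_uri_for(text) }
end
```

In this sketch an unmatched input such as 'moomin' still produces a result, with only the provided label filled in, which mirrors the "bnode with the input value as providedLabel" rule.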

If passed an ActiveTriples::Resource, the enrichment will:

- Perform the above text matching on any present `providedLabel`s,
  returning the original node if no results are found.  If multiple
  values are provided and multiple matches found, they will be
  deduplicated.
- Leave DPLA::MAP::Controlled::Language objects that are not bnodes
  unaltered.
- Remove any values which are not either bnodes or members of
  DPLA::MAP::Controlled::Language.

Label matches are cached within the enrichment instance.

Examples:

matching string values

iso = LanguageToLexvo.new
iso.enrich_value('fin')     # matches 'http://lexvo.org/id/iso639-3/fin'
iso.enrich_value('finnish') # matches 'http://lexvo.org/id/iso639-3/fin'
iso.enrich_value('eng')     # matches 'http://lexvo.org/id/iso639-3/eng'
iso.enrich_value('english') # matches 'http://lexvo.org/id/iso639-3/eng'
iso.enrich_value('English') # matches 'http://lexvo.org/id/iso639-3/eng'

matching node values

iso = LanguageToLexvo.new
lang = DPLA::MAP::Controlled::Language.new
lang.providedLabel = 'eng'
iso.enrich_value(lang)     # matches ['http://lexvo.org/id/iso639-3/eng']
lang.providedLabel = 'fin', 'eng'
iso.enrich_value(lang)     # matches ['http://lexvo.org/id/iso639-3/fin',
                           #          'http://lexvo.org/id/iso639-3/eng']

Constant Summary

TERMS =
RDF::ISO_639_3.to_a
QNAMES =
TERMS.map { |t| t.qname[1] }.freeze
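
As a rough illustration of what `QNAMES` holds: a qualified name is a two-element pair of vocabulary prefix and local name, so taking index 1 leaves just the language code as a symbol. The pairs below are made up for the sketch, and the `:iso_639_3` prefix in particular is an assumption rather than something this page confirms.

```ruby
# Hypothetical qname pairs, shaped like [vocabulary_prefix, local_name].
qnames = [[:iso_639_3, :fin], [:iso_639_3, :eng]]

# Keep only the local name, as `t.qname[1]` does for each term.
codes = qnames.map { |q| q[1] }.freeze

# Code matching then reduces to a membership test on downcased symbols.
known = codes.include?('FIN'.downcase.to_sym)
```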

Instance Method Details

#enrich_literal(label) ⇒ ActiveTriples::Resource

Runs the enrichment over a string.

Parameters:

  • label (#to_s)

    the string to match

Returns:

  • (ActiveTriples::Resource)

    a blank node with a `dpla:providedLabel` of `label` and a `skos:exactMatch` of the matching Lexvo language, if any



# File 'lib/krikri/enrichments/language_to_lexvo.rb', line 106

def enrich_literal(label)
  node = DPLA::MAP::Controlled::Language.new()
  node.providedLabel = label

  match = match_iso(label.to_s)
  match = match_label(label.to_s) if match.node?

  # if match is still a node, we didn't find anything
  return node if match.node?

  node.exactMatch = match 
  node.prefLabel = RDF::ISO_639_3[match.rdf_subject.qname[1]].label.last
  node
end

#enrich_node(value) ⇒ Array<ActiveTriples::Resource>, ActiveTriples::Resource

Runs the enrichment over a specific node, accepting an `ActiveTriples::Resource` with one or more provided labels and returning a new node with a Lexvo match for each label. If the resource has no provided labels, it is returned unchanged.

Parameters:

  • value (ActiveTriples::Resource)

    a resource with a `dpla:providedLabel`

Returns:

  • (Array<ActiveTriples::Resource>, ActiveTriples::Resource)


# File 'lib/krikri/enrichments/language_to_lexvo.rb', line 93

def enrich_node(value)
  labels = value.get_values(RDF::DPLA.providedLabel)
  return value if labels.empty?
  labels.map { |label| enrich_literal(label) }
end
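
The shape of the return value (the original resource when there are no labels, otherwise one result per label) can be shown without the RDF stack. In this minimal sketch `LabelledNode` and `enrich_labels` are hypothetical stand-ins, and the block passed in plays the role of `enrich_literal`.

```ruby
# Hypothetical stand-in for a resource carrying provided labels.
LabelledNode = Struct.new(:provided_labels)

# Returns the node itself when it has no labels; otherwise returns an
# array holding one enriched result per label, in the original order.
def enrich_labels(node, &enricher)
  labels = node.provided_labels
  return node if labels.empty?
  labels.map { |label| enricher.call(label) }
end

empty  = LabelledNode.new([])
filled = LabelledNode.new(%w[fin eng])

unchanged = enrich_labels(empty)  { |l| l.upcase }
enriched  = enrich_labels(filled) { |l| l.upcase }
```

This is why the documented return type is either a single resource or an array of resources: callers need to handle both.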

#enrich_value(value) ⇒ DPLA::MAP::Controlled::Language?

Runs the enrichment against a value. Matches literal values, as well as blank-node Language values that carry a provided label.

Examples:

with a matching value

lang = enrich_value('finnish')
#=> #<DPLA::MAP::Controlled::Language:0x3f(default)>
lang.providedLabel
#=> ['finnish']
lang.exactMatch.map(&:to_term)
#=> [#<RDF::Vocabulary::Term:0x9b URI:http://lexvo.org/id/iso639-3/fin>]

with no match

lang = enrich_value('moomin')
#=> #<DPLA::MAP::Controlled::Language:0x3f(default)>
lang.providedLabel
#=> ['moomin']
lang.exactMatch
#=> []

Parameters:

  • value (ActiveTriples::Resource, #to_s)

Returns:

  • (DPLA::MAP::Controlled::Language, nil)

    a resource representing the language match.



# File 'lib/krikri/enrichments/language_to_lexvo.rb', line 77

def enrich_value(value)
  return enrich_node(value) if value.is_a?(ActiveTriples::Resource) &&
    value.node?
  return value if value.is_a?(DPLA::MAP::Controlled::Language)
  return nil if value.is_a?(ActiveTriples::Resource)
  enrich_literal(value)
end
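
The order of the guard clauses matters: blank nodes are checked first, so a blank-node Language is re-enriched rather than kept. A minimal sketch of that dispatch, using hypothetical stand-in classes (`Resource`, `Language`, and `classify` are not part of the library), might look like this:

```ruby
# Stands in for ActiveTriples::Resource; `node?` reports whether it is a bnode.
class Resource
  def initialize(bnode)
    @bnode = bnode
  end

  def node?
    @bnode
  end
end

# Stands in for DPLA::MAP::Controlled::Language.
class Language < Resource; end

# Names the branch that the guard clauses would take for a given value.
def classify(value)
  return :enrich_labels if value.is_a?(Resource) && value.node?
  return :keep          if value.is_a?(Language)
  return :drop          if value.is_a?(Resource)
  :enrich_literal
end
```

Here `:drop` corresponds to the `nil` return for resources that are neither blank nodes nor Language objects, and anything that is not a resource at all falls through to literal matching.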

#match_iso(code) ⇒ DPLA::MAP::Controlled::Language

Converts a string or symbol for a three-letter language code to an `ActiveTriples::Resource`.

Parameters:

  • code (#to_sym)

    a three letter iso code

Returns:

  • (DPLA::MAP::Controlled::Language)


# File 'lib/krikri/enrichments/language_to_lexvo.rb', line 127

def match_iso(code)
  match = QNAMES.find { |c| c == code.downcase.to_sym }
  from_sym(match)
end
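
The lookup above normalizes the input inside the `find` block and scans the code list linearly. As an alternative sketch, not the library's implementation, the same case-insensitive check can normalize once and use a `Set`; `CODES` and `known_code` are hypothetical names, and the three codes are an arbitrary subset.

```ruby
require 'set'

# Hypothetical subset of ISO 639-3 codes, stored as symbols.
CODES = Set[:fin, :eng, :swe].freeze

# Returns the normalized code symbol when it is known, otherwise nil.
def known_code(code)
  sym = code.to_s.downcase.to_sym  # normalize once, outside any loop
  CODES.include?(sym) ? sym : nil
end
```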

#match_label(label) ⇒ DPLA::MAP::Controlled::Language

Converts a string or symbol for a language label to an `ActiveTriples::Resource`.

Matched values are cached in an instance variable `@lang_cache` to avoid multiple traversals through the vocabulary term labels.

Parameters:

  • label (#to_sym)

    a string to match against a language label

Returns:

  • (DPLA::MAP::Controlled::Language)


# File 'lib/krikri/enrichments/language_to_lexvo.rb', line 141

def match_label(label)
  @lang_cache ||= {}
  return @lang_cache[label] if @lang_cache.keys.include? label

  match = TERMS.find do |t|
    Array(t.label).map(&:downcase).include? label.downcase
  end
  
  # Caches and returns the label match
  @lang_cache[label] = from_sym(match)
end
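
Checking for the key, rather than for a truthy cached value, means that failed lookups are remembered too. The sketch below illustrates that pattern in plain Ruby; `LabelMatcher` and its `scans` counter are hypothetical and exist only to make the cache's effect observable.

```ruby
# Illustrative per-instance label cache; not the library's class.
class LabelMatcher
  attr_reader :scans

  def initialize(labels_by_code)
    @labels_by_code = labels_by_code
    @cache = {}
    @scans = 0
  end

  # Returns the matching code symbol, or nil when no label matches.
  def match(label)
    # key? (not a truthiness check) lets cached misses short-circuit too.
    return @cache[label] if @cache.key?(label)

    @scans += 1
    needle = label.downcase
    hit = @labels_by_code.find do |_, labels|
      labels.any? { |l| l.downcase == needle }
    end
    @cache[label] = hit&.first
  end
end
```

Note that the cache is keyed by the exact input string, so in this sketch 'Finnish' and 'finnish' are cached separately even though both resolve to the same code.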