Class: Stanford::Mods::Record

Inherits: Mods::Record

Stanford::Mods::Record < Mods::Record < Object

Defined in: lib/stanford-mods/geo_spatial.rb,
lib/stanford-mods.rb,
lib/stanford-mods/name.rb,
lib/stanford-mods/origin_info.rb,
lib/stanford-mods/searchworks.rb,
lib/stanford-mods/physical_location.rb,
lib/stanford-mods/searchworks_subjects.rb

Overview

NON-SearchWorks specific wranglings of MODS cartographics metadata

Constant Summary

COLLECTOR_ROLE_URI =

'http://id.loc.gov/vocabulary/relators/col'

Class Method Summary

.date_is_approximate?(date_element) ⇒ Boolean

NOTE: legal values for MODS date elements with attribute qualifier are ‘approximate’, ‘inferred’ or ‘questionable’.
.earliest_year_int(date_el_array) ⇒ Object

get earliest parseable year (as an Integer) from the passed date elements.
.earliest_year_str(date_el_array) ⇒ Object

get earliest parseable year (as a String) from the passed date elements.
.keyDate(elements) ⇒ Nokogiri::XML::Element, nil

given a set of date elements, return the single element with attribute keyDate=“yes” or return nil if no elements have attribute keyDate=“yes”, or if multiple elements have keyDate=“yes”.
.remove_approximate(nodeset) ⇒ Array<Nokogiri::XML::Element>

remove Elements from NodeSet if they have a qualifier attribute of ‘approximate’ or ‘questionable’.

Instance Method Summary

#additional_authors_w_dates ⇒ Object

All names, in display form, except the main author. Names are in the display_value_w_date form; see Mods::Record.name in nom_terminology for details on the display_value algorithm.
#box ⇒ Object

Returns the box number (single valued; might be something like 35A). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
#catkey ⇒ String

Value with the numeric catkey in it, or nil if none exists.
#collectors_w_dates ⇒ Object

Array of Strings, each containing the computed display value of a personal name with the role of Collector (see mods gem nom_terminology for display value algorithm).
#coordinates ⇒ Object
#date_created_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>

return /originInfo/dateCreated elements in MODS records.
#date_issued_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>

return /originInfo/dateIssued elements in MODS records.
#druid ⇒ Object
#druid=(new_druid) ⇒ Object
#era_facet ⇒ Array<String>

subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
#folder ⇒ Object

Returns the folder number (single valued). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
#format ⇒ Array[String] (deprecated)

Deprecated: kept for backwards compatibility, but not part of the SW UI redesign work of Summer 2014.
#format_main ⇒ Array[String]

select one or more format values from the controlled vocabulary per JVine Summer 2014 searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index.
#geographic_facet ⇒ Array<String>

geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
#geographic_search ⇒ Array<String>

Values are the contents of: subject/geographic subject/hierarchicalGeographic subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
#includes_marc_relator_collector_role?(role_node) ⇒ Boolean

True if there is a MARC relator collector role assigned.
#location ⇒ Object

Returns the entire contents of physicalLocation (single valued), but only if it has series, accession, box or folder data. The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
#main_author_w_date ⇒ String

the first encountered <mods><name> element with marcrelator flavor role of ‘Creator’ or ‘Author’.
#main_author_w_date_test ⇒ Object
#non_collector_person_authors ⇒ Object

FIXME: this is broken if there are multiple role codes and some of them are not marcrelator.
#place ⇒ Object

Old date parsing method used downstream of the gem; will be deprecated/replaced with new date parsing methods.
#point_bbox ⇒ Object
#pub_date_display ⇒ String

For the date display only, the first place to look is in the dates without encoding=marc array.
#pub_date_facet ⇒ Array[String]

Values for the pub date facet.
#pub_date_facet_single_value(ignore_approximate = false) ⇒ String

Returns a single string intended for facet use for pub date. Prefers dateIssued (any) before dateCreated (any) before dateCaptured (any); looks for a keyDate and uses it if there is one, otherwise picks the earliest date.
#pub_date_sort ⇒ Object

creates a date suitable for sorting.
#pub_year_int(ignore_approximate = false) ⇒ Integer

Returns the pub year as an Integer. Prefers dateIssued (any) before dateCreated (any) before dateCaptured (any); looks for a keyDate and uses it if there is one, otherwise picks the earliest date.
#pub_year_sort_str(ignore_approximate = false) ⇒ String (deprecated)

Deprecated: use pub_year_int.
#series ⇒ Object

Returns the series/accession ‘number’ (single valued). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
#subject_all_search ⇒ Array<String>

Values are the contents of: all subject subelements except subject/cartographic, plus the top-level genre element.
#subject_other_search ⇒ Array<String>

Values are the contents of: subject/name subject/occupation - no subelements subject/titleInfo.
#subject_other_subvy_search ⇒ Array<String>

Values are the contents of: subject/temporal subject/genre.
#sw_addl_authors ⇒ Array<String>

Values for author_7xx_search field.
#sw_addl_titles ⇒ Array<String>

All full titles except those that match sw_short_title.
#sw_corporate_authors ⇒ Array<String>

Values for author_corp_display.
#sw_full_title ⇒ String

Value for title_245_search, title_full_display.
#sw_full_title_without_commas ⇒ Object (deprecated)

Deprecated: use sw_title_display instead.
#sw_genre ⇒ Array[String]

return values for the genre facet in SearchWorks.
#sw_geographic_search(sep = ' ') ⇒ Array<String>

Values are the contents of: subject/geographic subject/hierarchicalGeographic subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
#sw_impersonal_authors ⇒ Array<String>

return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’.
#sw_language_facet ⇒ Object

include languages known to SearchWorks; try to error correct when possible (e.g. when ISO-639 disagrees with MARC standard).
#sw_logger ⇒ Object

—- PUBLICATION (place, year) —- see origin_info.rb (as all this information comes from top level originInfo element) —- end PUBLICATION (place, year) —-.
#sw_main_author ⇒ String

Value for author_1xx_search field.
#sw_meeting_authors ⇒ Array<String>

Values for author_meeting_display.
#sw_person_authors ⇒ Array<String>

Values for author_person_facet, author_person_display.
#sw_short_title ⇒ String

Value for title_245a_search field.
#sw_sort_author ⇒ String

Returns a sortable version of the main_author: main_author + sorting title which is the mods approximation of the value created for a marc record.
#sw_sort_title ⇒ String

Returns a sortable version of the main title.
#sw_subject_names(sep = ', ') ⇒ Array<String>

Values are the contents of: subject/name/namePart. "Values from namePart subelements should be concatenated in the order they appear (e.g. 'Shakespeare, William, 1564-1616')".
#sw_subject_titles(sep = ' ') ⇒ Array<String>

Values are the contents of: subject/titleInfo/(subelements).
#sw_title_display ⇒ String

like sw_full_title without trailing ,/;:.
#topic_facet ⇒ Array<String>

Values are the contents of: subject/topic subject/name subject/title subject/occupation with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
#topic_search ⇒ Array<String>

Values are the contents of: mods/genre mods/subject/topic.
#year_facet_str(date_el_array) ⇒ String

given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
#year_int(date_el_array) ⇒ Integer

given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
#year_sort_str(date_el_array) ⇒ String

given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.

Class Method Details

.date_is_approximate?(date_element) ⇒ `Boolean`

NOTE: legal values for MODS date elements with attribute qualifier are

'approximate', 'inferred' or 'questionable'

# File 'lib/stanford-mods/origin_info.rb', line 123

def self.date_is_approximate?(date_element)
  qualifier = date_element["qualifier"] if date_element.respond_to?('[]')
  qualifier == 'approximate' || qualifier == 'questionable'
end
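
A minimal sketch of how this check behaves, assuming plain Ruby hashes as stand-ins for Nokogiri elements (both respond to `[]`). The method is copied out as a standalone function and the sample data is hypothetical. Note that 'inferred' is a legal qualifier but is not treated as approximate.

```ruby
# Standalone copy of the check; in the gem it is a class method
# on Stanford::Mods::Record.
def date_is_approximate?(date_element)
  qualifier = date_element['qualifier'] if date_element.respond_to?('[]')
  qualifier == 'approximate' || qualifier == 'questionable'
end

# Hashes stand in for Nokogiri::XML::Element (assumption for illustration).
approx   = { 'qualifier' => 'approximate' }
inferred = { 'qualifier' => 'inferred' }
plain    = {}

date_is_approximate?(approx)   # => true
date_is_approximate?(inferred) # => false
date_is_approximate?(plain)    # => false
```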

.earliest_year_int(date_el_array) ⇒ `Object`

get earliest parseable year (as an Integer) from the passed date elements




# File 'lib/stanford-mods/origin_info.rb', line 133

def self.earliest_year_int(date_el_array)
  earliest_year(date_el_array, :year_int_from_date_str)
end

.earliest_year_str(date_el_array) ⇒ `Object`

get earliest parseable year (as a String) from the passed date elements




# File 'lib/stanford-mods/origin_info.rb', line 142

def self.earliest_year_str(date_el_array)
  earliest_year(date_el_array, :sortable_year_string_from_date_str)
end

.keyDate(elements) ⇒ `Nokogiri::XML::Element`, `nil`

given a set of date elements, return the single element with attribute keyDate=“yes”

or return nil if no elements have attribute keyDate="yes", or if multiple elements have keyDate="yes"

# File 'lib/stanford-mods/origin_info.rb', line 105

def self.keyDate(elements)
  keyDates = elements.select { |node| node["keyDate"] == 'yes' }
  keyDates.first if keyDates.size == 1
end
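
The single-keyDate rule can be sketched as follows, again assuming hashes in place of Nokogiri elements. The `key_date` name and the sample data are hypothetical, chosen only to make the snippet self-contained.

```ruby
# Standalone sketch of the keyDate selection rule.
def key_date(elements)
  key_dates = elements.select { |node| node['keyDate'] == 'yes' }
  # Only an unambiguous (single) keyDate is returned; otherwise nil.
  key_dates.first if key_dates.size == 1
end

one_key  = [{ 'keyDate' => 'yes', 'text' => '1901' }, { 'text' => '1905' }]
two_keys = [{ 'keyDate' => 'yes' }, { 'keyDate' => 'yes' }]
no_keys  = [{ 'text' => '1901' }]

key_date(one_key)  # => {"keyDate"=>"yes", "text"=>"1901"}
key_date(two_keys) # => nil
key_date(no_keys)  # => nil
```

Returning nil for multiple keyDates lets callers fall back to the earliest parseable date rather than picking one arbitrarily.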

.remove_approximate(nodeset) ⇒ `Array<Nokogiri::XML::Element>`

remove Elements from NodeSet if they have a qualifier attribute of ‘approximate’ or ‘questionable’




# File 'lib/stanford-mods/origin_info.rb', line 114

def self.remove_approximate(nodeset)
  nodeset.select { |node| node unless date_is_approximate?(node) }
end

Instance Method Details

#additional_authors_w_dates ⇒ `Object`

all names, in display form, except the main_author

names will be the display_value_w_date form
see Mods::Record.name  in nom_terminology for details on the display_value algorithm

# File 'lib/stanford-mods/name.rb', line 39

def additional_authors_w_dates
  results = []
  @mods_ng_xml.plain_name.each { |n|
    results << n.display_value_w_date
  }
  results.delete(main_author_w_date)
  results
end

#box ⇒ `Object`

return box number (note: single valued and might be something like 35A)

data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them

TODO: should it be hierarchical series/box/folder?

# File 'lib/stanford-mods/physical_location.rb', line 13

def box
  #   _location.physicalLocation should find top level and relatedItem
  box_num = @mods_ng_xml._location.physicalLocation.map do |node|
    val = node.text
    # note that this will also find Flatbox or Flat-box
    match_data = val.match(/Box ?:? ?([^,|(Folder)]+)/i)
    match_data[1].strip if match_data.present?
  end.compact

  # There should only be one box
  box_num.first
end

#catkey ⇒ `String`

# File 'lib/stanford-mods/searchworks.rb', line 359

def catkey
  catkey = self.term_values([:record_info, :recordIdentifier])
  if catkey && catkey.length > 0
    return catkey.first.tr('a', '') # ensure catkey is numeric only
  end
  nil
end
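
As an illustration of the `tr('a', '')` step, here is a sketch that takes the recordIdentifier values as a plain array. The `numeric_catkey` name and the sample identifier are hypothetical. `String#tr` with an empty replacement deletes the listed characters, so a leading 'a' prefix is dropped.

```ruby
# Sketch: strip the 'a' prefix from the first recordIdentifier value.
def numeric_catkey(record_identifiers)
  return nil if record_identifiers.nil? || record_identifiers.empty?
  record_identifiers.first.tr('a', '') # removes every 'a' character
end

numeric_catkey(['a6780453']) # => "6780453"
numeric_catkey([])           # => nil
```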

#collectors_w_dates ⇒ `Object`

# File 'lib/stanford-mods/name.rb', line 64

def collectors_w_dates
  result = []
  @mods_ng_xml.personal_name.each do |n|
    next if n.role.size.zero?
    n.role.each { |r|
      result << n.display_value_w_date if includes_marc_relator_collector_role?(r)
    }
  end
  result unless result.empty?
end

#coordinates ⇒ `Object`




# File 'lib/stanford-mods/geo_spatial.rb', line 9

def coordinates
  Array(@mods_ng_xml.subject.cartographics.coordinates).map(&:text)
end

#date_created_elements(ignore_approximate = false) ⇒ `Array<Nokogiri::XML::Element>`

return /originInfo/dateCreated elements in MODS records

# File 'lib/stanford-mods/origin_info.rb', line 85

def date_created_elements(ignore_approximate=false)
  date_created_nodeset = @mods_ng_xml.origin_info.dateCreated
  return self.class.remove_approximate(date_created_nodeset) if ignore_approximate
  date_created_nodeset.to_a
end

#date_issued_elements(ignore_approximate = false) ⇒ `Array<Nokogiri::XML::Element>`

return /originInfo/dateIssued elements in MODS records

# File 'lib/stanford-mods/origin_info.rb', line 95

def date_issued_elements(ignore_approximate=false)
  date_issued_nodeset = @mods_ng_xml.origin_info.dateIssued
  return self.class.remove_approximate(date_issued_nodeset) if ignore_approximate
  date_issued_nodeset.to_a
end

#druid ⇒ `Object`




# File 'lib/stanford-mods/searchworks.rb', line 371

def druid
  @druid ? @druid : 'Unknown item'
end

#druid=(new_druid) ⇒ `Object`




# File 'lib/stanford-mods/searchworks.rb', line 367

def druid=(new_druid)
  @druid = new_druid
end

#era_facet ⇒ `Array<String>`

subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed




# File 'lib/stanford-mods/searchworks_subjects.rb', line 104

def era_facet
  subject_temporal.map { |val| val.sub(/[\\,;]$/, '').strip } unless !subject_temporal
end
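
The trailing-punctuation cleanup can be tried in isolation. This sketch applies the same `sub` and `strip` calls to an array of plain strings; the `strip_trailing_punct` name and the sample values are hypothetical.

```ruby
# Sketch: remove one trailing comma, semicolon or backslash, then trim whitespace.
def strip_trailing_punct(values)
  values.map { |val| val.sub(/[\\,;]$/, '').strip }
end

strip_trailing_punct(['17th century;', 'Middle Ages ,', '1900-1950\\'])
# => ["17th century", "Middle Ages", "1900-1950"]
```

Because the pattern is anchored at the end of the string, punctuation followed by trailing whitespace (for example `'1950, '`) is left in place and only the whitespace is trimmed.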

#folder ⇒ `Object`

returns folder number (note: single valued)

data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them

TODO: should it be hierarchical series/box/folder?

# File 'lib/stanford-mods/physical_location.rb', line 30

def folder
  #   _location.physicalLocation should find top level and relatedItem
  folder_num = @mods_ng_xml._location.physicalLocation.map do |node|
    val = node.text

    match_data = if val =~ /\|/
                   # we assume the data is pipe-delimited, and may contain commas within values
                   val.match(/Folder ?:? ?([^|]+)/)
                 else
                   # the data should be comma-delimited, and may not contain commas within values
                   val.match(/Folder ?:? ?([^,]+)/)
                 end

    match_data[1].strip if match_data.present?
  end.compact

  # There should be one folder
  folder_num.first
end
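
To make the two delimiter cases concrete, the sketch below runs the same regular expressions against plain strings. The `folder_from` name and the sample location strings are hypothetical, and a plain truthiness check replaces ActiveSupport's `present?` so the snippet has no dependencies.

```ruby
# Sketch of the folder extraction for a single physicalLocation string.
def folder_from(text)
  match_data = if text =~ /\|/
                 # pipe-delimited: the value may itself contain commas
                 text.match(/Folder ?:? ?([^|]+)/)
               else
                 # comma-delimited: the value ends at the next comma
                 text.match(/Folder ?:? ?([^,]+)/)
               end
  match_data[1].strip if match_data
end

folder_from('Series 1 | Box 2 | Folder 3, part A') # => "3, part A"
folder_from('Series 1, Box 2, Folder 3')           # => "3"
folder_from('Box 2')                               # => nil
```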

#format ⇒ `Array[String]`

Deprecated.

kept for backwards compatibility but not part of SW UI redesign work Summer 2014

select one or more format values from the controlled vocabulary here:

http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format&rows=0&facet.sort=index

Deprecated: no longer used in SW, Revs or Spotlight as of Jan 2016.

# File 'lib/stanford-mods/searchworks.rb', line 227

def format
  val = []
  types = self.term_values(:typeOfResource)
  if types
    genres = self.term_values(:genre)
    issuance = self.term_values([:origin_info,:issuance])
    types.each do |type|
      case type
        when 'cartographic'
          val << 'Map/Globe'
        when 'mixed material'
          val << 'Manuscript/Archive'
        when 'moving image'
          val << 'Video'
        when 'notated music'
          val << 'Music - Score'
        when 'software, multimedia'
          val << 'Computer File'
        when 'sound recording-musical'
          val << 'Music - Recording'
        when 'sound recording-nonmusical', 'sound recording'
          val << 'Sound Recording'
        when 'still image'
          val << 'Image'
        when 'text'
          val << 'Book' if issuance && issuance.include?('monographic')
          book_genres = ['book chapter', 'Book chapter', 'Book Chapter',
            'issue brief', 'Issue brief', 'Issue Brief',
            'librettos', 'Librettos',
            'project report', 'Project report', 'Project Report',
            'technical report', 'Technical report', 'Technical Report',
            'working paper', 'Working paper', 'Working Paper']
          val << 'Book' if genres && !(genres & book_genres).empty?
          conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
          val << 'Conference Proceedings' if genres && !(genres & conf_pub).empty?
          val << 'Journal/Periodical' if issuance && issuance.include?('continuing')
          article = ['article', 'Article']
          val << 'Journal/Periodical' if genres && !(genres & article).empty?
          stu_proj_rpt = ['student project report', 'Student project report', 'Student Project report', 'Student Project Report']
          val << 'Other' if genres && !(genres & stu_proj_rpt).empty?
          thesis = ['thesis', 'Thesis']
          val << 'Thesis' if genres && !(genres & thesis).empty?
        when 'three dimensional object'
          val << 'Other'
      end
    end
  end
  val.uniq
end

#format_main ⇒ `Array[String]`

select one or more format values from the controlled vocabulary per JVine Summer 2014

http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index

# File 'lib/stanford-mods/searchworks.rb', line 280

def format_main
  val = []
  types = self.term_values(:typeOfResource)
  article_genres = ['article', 'Article',
    'book chapter', 'Book chapter', 'Book Chapter',
    'issue brief', 'Issue brief', 'Issue Brief',
    'project report', 'Project report', 'Project Report',
    'student project report', 'Student project report', 'Student Project report', 'Student Project Report',
    'technical report', 'Technical report', 'Technical Report',
    'working paper', 'Working paper', 'Working Paper'
  ]
  book_genres = ['conference publication', 'Conference publication', 'Conference Publication',
    'instruction', 'Instruction',
    'librettos', 'Librettos',
    'thesis', 'Thesis'
  ]
  if types
    genres = self.term_values(:genre)
    issuance = self.term_values([:origin_info, :issuance])
    types.each do |type|
      case type
        when 'cartographic'
          val << 'Map'
        when 'mixed material'
          val << 'Archive/Manuscript'
        when 'moving image'
          val << 'Video'
        when 'notated music'
          val << 'Music score'
        when 'software, multimedia'
          if genres && (genres.include?('dataset') || genres.include?('Dataset'))
            val << 'Dataset'
          else
            val << 'Software/Multimedia'
          end
        when 'sound recording-musical'
          val << 'Music recording'
        when 'sound recording-nonmusical', 'sound recording'
          val << 'Sound recording'
        when 'still image'
          val << 'Image'
        when 'text'
          val << 'Book' if genres && !(genres & article_genres).empty?
          val << 'Book' if issuance && issuance.include?('monographic')
          val << 'Book' if genres && !(genres & book_genres).empty?
          val << 'Journal/Periodical' if issuance && issuance.include?('continuing')
        when 'three dimensional object'
          val << 'Object'
      end
    end
  end
  val.uniq
end

#geographic_facet ⇒ `Array<String>`

geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed




# File 'lib/stanford-mods/searchworks_subjects.rb', line 98

def geographic_facet
  geographic_search.map { |val| val.sub(/[\\,;]$/, '').strip } unless !geographic_search
end

#geographic_search ⇒ `Array<String>`

Values are the contents of:

subject/geographic
subject/hierarchicalGeographic
subject/geographicCode  (only include the translated value if it isn't already present from other mods geo fields)

# File 'lib/stanford-mods/searchworks_subjects.rb', line 113

def geographic_search
  @geographic_search ||= begin
    result = self.sw_geographic_search

    # TODO:  this should go into stanford-mods ... but then we have to set that gem up with a Logger
    # print a message for any unrecognized encodings
    xvals = self.subject.geographicCode.translated_value
    codes = self.term_values([:subject, :geographicCode])
    if codes && codes.size > xvals.size
      self.subject.geographicCode.each { |n|
        if n.authority != 'marcgac' && n.authority != 'marccountry'
          sw_logger.info("#{druid} has subject geographicCode element with untranslated encoding (#{n.authority}): #{n.to_xml}")
        end
      }
    end

    # FIXME:  stanford-mods should be returning [], not nil ...
    return nil if !result || result.empty?
    result
  end
end

#includes_marc_relator_collector_role?(role_node) ⇒ `Boolean`

# File 'lib/stanford-mods/name.rb', line 79

def includes_marc_relator_collector_role?(role_node)
  (role_node.authority.include?('marcrelator') && role_node.value.include?('Collector')) ||
  role_node.roleTerm.valueURI.first == COLLECTOR_ROLE_URI
end

#location ⇒ `Object`

return entire contents of physicalLocation (note: single valued)

but only if it has series, accession, box or folder data
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them

TODO: should it be hierarchical series/box/folder?

# File 'lib/stanford-mods/physical_location.rb', line 55

def location
  #   _location.physicalLocation should find top level and relatedItem
  loc = @mods_ng_xml._location.physicalLocation.map do |node|
    node.text if node.text.match(/.*(Series)|(Accession)|(Folder)|(Box).*/i)
  end.compact

  # There should only be one location
  loc.first
end

#main_author_w_date ⇒ `String`

The first encountered <mods><name> element with a marcrelator flavor role of ‘Creator’ or ‘Author’. If there is no marcrelator ‘Creator’ or ‘Author’, the first name without a role. If there is no name without a role, then nil. See Mods::Record.name in nom_terminology for details on the display_value algorithm.

# File 'lib/stanford-mods/name.rb', line 16

def main_author_w_date
  result = nil
  first_wo_role = nil
  @mods_ng_xml.plain_name.each { |n|
    if n.role.size == 0
      first_wo_role ||= n
    end
    n.role.each { |r|
      if r.authority.include?('marcrelator') &&
            (r.value.include?('Creator') || r.value.include?('Author'))
        result ||= n.display_value_w_date
      end
    }
  }
  if !result && first_wo_role
    result = first_wo_role.display_value_w_date
  end
  result
end

#main_author_w_date_test ⇒ `Object`

# File 'lib/stanford-mods/searchworks.rb', line 100

def main_author_w_date_test
  result = nil
  first_wo_role = nil
  self.plain_name.each { |n|
    if n.role.size == 0
      first_wo_role ||= n
    end
    n.role.each { |r|
      if r.authority.include?('marcrelator') &&
        (r.value.include?('Creator') || r.value.include?('Author'))
        result ||= n.display_value_w_date
      end
    }
  }
  if !result && first_wo_role
    result = first_wo_role.display_value_w_date
  end
  result
end

#non_collector_person_authors ⇒ `Object`

FIXME: this is broken if there are multiple role codes and some of them are not marcrelator

# File 'lib/stanford-mods/name.rb', line 51

def non_collector_person_authors
  result = []
  @mods_ng_xml.personal_name.map do |n|
    next if n.role.size.zero?
    n.role.each { |r|
      result << n.display_value_w_date unless includes_marc_relator_collector_role?(r)
    }
  end
  result unless result.empty?
end

#place ⇒ `Object`

Old date parsing method used downstream of the gem; will be deprecated/replaced with new date parsing methods.

# File 'lib/stanford-mods/origin_info.rb', line 198

def place
  vals = self.term_values([:origin_info, :place, :placeTerm])
  vals
end

#point_bbox ⇒ `Object`

# File 'lib/stanford-mods/geo_spatial.rb', line 13

def point_bbox
  coordinates.map do |n|
    matches = n.match(/^\(?([^)]+)\)?\.?$/)

    if matches
      coord_to_bbox(matches[1])
    else
      coord_to_bbox(n)
    end
  end.compact
end
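
The regular expression in this method only unwraps optional surrounding parentheses and a trailing period before the value is handed to coord_to_bbox (not shown in this listing). A sketch of just that unwrapping step, with a hypothetical `unwrap_coordinates` name and a made-up coordinate string:

```ruby
# Sketch: strip optional enclosing parentheses and a trailing period.
def unwrap_coordinates(str)
  matches = str.match(/^\(?([^)]+)\)?\.?$/)
  matches ? matches[1] : str
end

unwrap_coordinates('(W 80--E 80/N 80--S 80).') # => "W 80--E 80/N 80--S 80"
unwrap_coordinates('W 80--E 80/N 80--S 80')    # => "W 80--E 80/N 80--S 80"
```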

#pub_date_display ⇒ `String`

For the date display only, the first place to look is in the dates without encoding=marc array. If there are no such dates, select the first date in the dates_marc_encoding array. Otherwise return nil.

Deprecated: DO NOT USE. This is no longer used in SW, Revs or Spotlight as of Jan 2016.

# File 'lib/stanford-mods/origin_info.rb', line 244

def pub_date_display
  return dates_no_marc_encoding.first unless dates_no_marc_encoding.empty?
  return dates_marc_encoding.first unless dates_marc_encoding.empty?
  nil
end

#pub_date_facet ⇒ `Array[String]`

Values for the pub date facet. This is less strict than the 4 year date requirements for pub_date.

Jan 2016: used to populate the Solr pub_date field for Spotlight and SearchWorks.

Spotlight:  pub_date field should be replaced by pub_year_w_approx_isi and pub_year_no_approx_isi
SearchWorks:  pub_date field used for display in search results and show view; for sorting nearby-on-shelf
   these could be done with more appropriate fields/methods (pub_year_int for sorting;  new pub year methods to populate field)

TODO: this should probably be deprecated in favor of pub_date_facet_single_value; it needs head-to-head testing against pub_date_facet_single_value first.

# File 'lib/stanford-mods/origin_info.rb', line 211

def pub_date_facet
  if pub_date
    if pub_date.start_with?('-')
      return (pub_date.to_i + 1000).to_s + ' B.C.'
    end
    if pub_date.include? '--'
      cent = pub_date[0, 2].to_i
      cent += 1
      cent = cent.to_s + 'th century'
      return cent
    else
      return pub_date
    end
  end
  nil
end
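
The branching above can be exercised without a MODS record by passing the pub_date string in directly. This is a sketch under that assumption; the `facet_value` name and the sample inputs are hypothetical.

```ruby
# Sketch of the facet formatting for a given pub_date string.
def facet_value(pub_date)
  return nil unless pub_date
  # negative values are rendered as B.C. years
  return (pub_date.to_i + 1000).to_s + ' B.C.' if pub_date.start_with?('-')
  # 'NN--' is rendered as a century
  return (pub_date[0, 2].to_i + 1).to_s + 'th century' if pub_date.include?('--')
  pub_date
end

facet_value('-199') # => "801 B.C."
facet_value('18--') # => "19th century"
facet_value('1945') # => "1945"
```

The 'th' suffix is applied unconditionally, so an input such as `'20--'` comes out as "21th century".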

#pub_date_facet_single_value(ignore_approximate = false) ⇒ `String`

return a single string intended for facet use for pub date; prefer dateIssued (any) before dateCreated (any) before dateCaptured (any)

look for a keyDate and use it if there is one;  otherwise pick earliest date




# File 'lib/stanford-mods/origin_info.rb', line 21

def pub_date_facet_single_value(ignore_approximate = false)
  single_pub_year(ignore_approximate, :year_facet_str)
end

#pub_date_sort ⇒ `Object`

creates a date suitable for sorting. Guaranteed to be 4 digits or nil.

Deprecated: use pub_year_int, or pub_year_sort_str if you must have a string (why?)

# File 'lib/stanford-mods/origin_info.rb', line 230

def pub_date_sort
  if pub_date
    pd = pub_date
    pd = '0' + pd if pd.length == 3
    pd = pd.gsub('--', '00')
  end
  fail "pub_date_sort was about to return a non 4 digit value #{pd}!" if pd && pd.length != 4
  pd
end
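
A standalone sketch of the same padding and substitution rules, taking the pub_date string as an argument. The `sortable_date` name and the inputs are hypothetical.

```ruby
# Sketch: normalize a pub_date string to 4 characters for sorting.
def sortable_date(pub_date)
  pd = pub_date
  if pd
    pd = '0' + pd if pd.length == 3 # left-pad 3-digit years
    pd = pd.gsub('--', '00')        # unknown year-in-century becomes 00
  end
  raise "pub_date_sort was about to return a non 4 digit value #{pd}!" if pd && pd.length != 4
  pd
end

sortable_date('945')  # => "0945"
sortable_date('18--') # => "1800"
sortable_date(nil)    # => nil
```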

#pub_year_int(ignore_approximate = false) ⇒ `Integer`

return pub year as an Integer; prefer dateIssued (any) before dateCreated (any) before dateCaptured (any)

look for a keyDate and use it if there is one;  otherwise pick earliest date




# File 'lib/stanford-mods/origin_info.rb', line 32

def pub_year_int(ignore_approximate = false)
  single_pub_year(ignore_approximate, :year_int)
end

#pub_year_sort_str(ignore_approximate = false) ⇒ `String`

Deprecated.

use pub_year_int

return a single string intended for lexical sorting for pub date; prefer dateIssued (any) before dateCreated (any) before dateCaptured (any)

look for a keyDate and use it if there is one;  otherwise pick earliest date




# File 'lib/stanford-mods/origin_info.rb', line 44

def pub_year_sort_str(ignore_approximate = false)
  single_pub_year(ignore_approximate, :year_sort_str)
end

#series ⇒ `Object`

return series/accession ‘number’ (note: single valued)

data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them

TODO: should it be hierarchical series/box/folder?

# File 'lib/stanford-mods/physical_location.rb', line 69

def series
  #   _location.physicalLocation should find top level and relatedItem
  series_num = @mods_ng_xml._location.physicalLocation.map do |node|
    val = node.text
    # feigenbaum uses 'Accession'
    match_data = val.match(/(?:(?:Series)|(?:Accession)):? ([^,|]+)/i)
    match_data[1].strip if match_data.present?
  end.compact

  # There should be only one series
  series_num.first
end

#subject_all_search ⇒ `Array<String>`

Values are the contents of:

all subject subelements except subject/cartographic, plus the top-level genre element

# File 'lib/stanford-mods/searchworks_subjects.rb', line 171

def subject_all_search
  vals = topic_search ? Array.new(topic_search) : []
  vals.concat(geographic_search) if geographic_search
  vals.concat(subject_other_search) if subject_other_search
  vals.concat(subject_other_subvy_search) if subject_other_subvy_search
  vals.empty? ? nil : vals
end

#subject_other_search ⇒ `Array<String>`

Values are the contents of:

subject/name
subject/occupation  - no subelements
subject/titleInfo

# File 'lib/stanford-mods/searchworks_subjects.rb', line 140

def subject_other_search
  @subject_other_search ||= begin
    vals = subject_occupations ? Array.new(subject_occupations) : []
    vals.concat(subject_names) if subject_names
    vals.concat(subject_titles) if subject_titles
    vals.empty? ? nil : vals
  end
end

#subject_other_subvy_search ⇒ `Array<String>`

Values are the contents of:

subject/temporal
subject/genre

# File 'lib/stanford-mods/searchworks_subjects.rb', line 153

def subject_other_subvy_search
  @subject_other_subvy_search ||= begin
    vals = subject_temporal ? Array.new(subject_temporal) : []
    gvals = self.term_values([:subject, :genre])
    vals.concat(gvals) if gvals

    # print a message for any temporal encodings
    self.subject.temporal.each { |n|
      sw_logger.info("#{druid} has subject temporal element with untranslated encoding: #{n.to_xml}") if !n.encoding.empty?
    }

    vals.empty? ? nil : vals
  end
end

#sw_addl_authors ⇒ `Array<String>`




# File 'lib/stanford-mods/searchworks.rb', line 64

def sw_addl_authors
  additional_authors_w_dates
end

#sw_addl_titles ⇒ `Array<String>`

this includes all full titles except those that match sw_short_title




# File 'lib/stanford-mods/searchworks.rb', line 180

def sw_addl_titles
  full_titles.select { |s| s !~ Regexp.new(Regexp.escape(sw_short_title)) }
end
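
The filter relies on `Regexp.escape` so that regex metacharacters in the short title are matched literally. A sketch with the titles passed in as arguments; the `additional_titles` name and the sample titles are hypothetical.

```ruby
# Sketch: keep only the full titles that do not contain the short title.
def additional_titles(full_titles, short_title)
  pattern = Regexp.new(Regexp.escape(short_title))
  full_titles.reject { |t| t =~ pattern }
end

titles = ['C++ primer : a guide.', 'Programming in C.']
additional_titles(titles, 'C++ primer') # => ["Programming in C."]
```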

#sw_corporate_authors ⇒ `Array<String>`

# File 'lib/stanford-mods/searchworks.rb', line 80

def sw_corporate_authors
  val = @mods_ng_xml.plain_name.select {|n| n.type_at == 'corporate'}.map { |n| n.display_value_w_date }
  val
end

#sw_full_title ⇒ `String`

# File 'lib/stanford-mods/searchworks.rb', line 130

def sw_full_title
  outer_nodes = @mods_ng_xml.title_info
  outer_node = outer_nodes ? outer_nodes.first : nil
  if outer_node
    nonSort = outer_node.nonSort.text.strip.empty? ? nil : outer_node.nonSort.text.strip
    title   = outer_node.title.text.strip.empty?   ? nil : outer_node.title.text.strip
    preSubTitle = nonSort ? [nonSort, title].compact.join(" ") : title
    preSubTitle.sub!(/:$/, '') if preSubTitle # remove trailing colon

    subTitle = outer_node.subTitle.text.strip
    preParts = subTitle.empty? ? preSubTitle : preSubTitle + " : " + subTitle
    preParts.sub!(/\.$/, '') if preParts # remove trailing period

    partName   = outer_node.partName.text.strip   unless outer_node.partName.text.strip.empty?
    partNumber = outer_node.partNumber.text.strip unless outer_node.partNumber.text.strip.empty?
    partNumber.sub!(/,$/, '') if partNumber # remove trailing comma
    if partNumber && partName
      parts = partNumber + ", " + partName
    elsif partNumber
      parts = partNumber
    elsif partName
      parts = partName
    end
    parts.sub!(/\.$/, '') if parts

    result = parts ? preParts + ". " + parts : preParts
    result += "." if !result.match(/[[:punct:]]$/)
    result.strip!
    result = nil if result.empty?
    result
  else
    nil
  end
end
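
The assembly rules in sw_full_title can be restated on plain strings. The sketch below is a simplified illustration, not the library's code: `assemble_title` and its keyword arguments are hypothetical names, and nil handling is reduced to the common cases.

```ruby
# Simplified, hypothetical restatement of the sw_full_title assembly rules,
# operating on plain strings rather than Nokogiri nodes.
def assemble_title(title:, non_sort: nil, sub_title: nil, part_number: nil, part_name: nil)
  main = [non_sort, title].compact.join(' ').sub(/:$/, '')   # drop trailing colon
  main = "#{main} : #{sub_title}" if sub_title && !sub_title.empty?
  main = main.sub(/\.$/, '')                                 # drop trailing period

  number = part_number&.sub(/,$/, '')                        # drop trailing comma
  parts = [number, part_name].compact.join(', ').sub(/\.$/, '')

  result = parts.empty? ? main : "#{main}. #{parts}"
  result += '.' unless result.match?(/[[:punct:]]$/)         # ensure final punctuation
  result.strip
end

puts assemble_title(non_sort: 'The', title: 'Olympics', sub_title: 'a history')
puts assemble_title(title: 'Annals', part_number: 'Part 1,', part_name: 'Supplement')
```

With these inputs the sketch prints "The Olympics : a history." and "Annals. Part 1, Supplement.".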

#sw_full_title_without_commas ⇒ `Object`

Deprecated. Use sw_title_display instead.

Returns sw_full_title with a trailing comma removed.

# File 'lib/stanford-mods/searchworks.rb', line 202

def sw_full_title_without_commas
  result = self.sw_full_title
  result.sub!(/,$/, '') if result
  result
end

#sw_genre ⇒ `Array[String]`

return values for the genre facet in SearchWorks

# File 'lib/stanford-mods/searchworks.rb', line 336

def sw_genre
  val = []
  genres = self.term_values(:genre)
  if genres
    val << genres.map(&:capitalize)
    val.flatten! if !val.empty?
    if genres.include?('thesis') || genres.include?('Thesis')
      val << 'Thesis/Dissertation'
      val.delete 'Thesis'
    end
    conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
    if !(genres & conf_pub).empty?
      types = self.term_values(:typeOfResource)
      if types && types.include?('text')
        val << 'Conference proceedings'
        val.delete 'Conference publication'
      end
    end
  end
  val.uniq
end
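
The genre mapping above depends only on two lists of strings, so it can be sketched in isolation. `genre_facet` is a hypothetical helper that takes the genre and typeOfResource values as plain arrays; it follows the same steps as sw_genre.

```ruby
# Hypothetical standalone version of the sw_genre mapping.
def genre_facet(genres, types)
  val = genres.map(&:capitalize)
  if genres.include?('thesis') || genres.include?('Thesis')
    val << 'Thesis/Dissertation'
    val.delete('Thesis')
  end
  conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
  # Conference publications are relabelled only when the resource type is text.
  if !(genres & conf_pub).empty? && types.include?('text')
    val << 'Conference proceedings'
    val.delete('Conference publication')
  end
  val.uniq
end

p genre_facet(['thesis', 'Technical report'], ['text'])
p genre_facet(['Conference Publication'], ['text'])
p genre_facet(['conference publication'], ['still image'])
```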

#sw_geographic_search(sep = ' ') ⇒ `Array<String>`

Values are the contents of:

subject/geographic
subject/hierarchicalGeographic
subject/geographicCode  (only include the translated value if it isn't already present from other mods geo fields)

# File 'lib/stanford-mods/searchworks_subjects.rb', line 16

def sw_geographic_search(sep = ' ')
  result = term_values([:subject, :geographic]) || []

  # hierarchicalGeographic has sub elements
  @mods_ng_xml.subject.hierarchicalGeographic.each { |hg_node|
    hg_vals = []
    hg_node.element_children.each { |e|
      hg_vals << e.text unless e.text.empty?
    }
    result << hg_vals.join(sep) unless hg_vals.empty?
  }

  trans_code_vals = @mods_ng_xml.subject.geographicCode.translated_value
  if trans_code_vals
    trans_code_vals.each { |val|
      result << val if !result.include?(val)
    }
  end

  result
end
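
The merge order matters here: translated geographicCode values are appended only if no earlier source already supplied them. A minimal sketch on plain arrays, where `geographic_values` and its arguments are hypothetical stand-ins for the three MODS sources:

```ruby
# Hypothetical standalone version of the merge performed by sw_geographic_search.
# geographic:       values from subject/geographic
# hierarchical:     one array of child-element texts per hierarchicalGeographic node
# translated_codes: translated values from subject/geographicCode
def geographic_values(geographic, hierarchical, translated_codes, sep = ' ')
  result = geographic.dup
  hierarchical.each do |children|
    vals = children.reject(&:empty?)
    result << vals.join(sep) unless vals.empty?
  end
  translated_codes.each { |val| result << val unless result.include?(val) }
  result
end

p geographic_values(['California'], [['United States', '', 'Oregon']], ['California', 'Nevada'])
```

Here "California" from the code list is skipped because subject/geographic already supplied it.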

#sw_impersonal_authors ⇒ `Array<String>`

return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’



# File 'lib/stanford-mods/searchworks.rb', line 75

def sw_impersonal_authors
  @mods_ng_xml.plain_name.select {|n| n.type_at != 'personal'}.map { |n| n.display_value_w_date }
end

#sw_language_facet ⇒ `Object`

Includes languages known to SearchWorks; tries to error-correct when possible (e.g. when ISO-639 disagrees with the MARC standard).

# File 'lib/stanford-mods/searchworks.rb', line 14

def sw_language_facet
  result = []
  @mods_ng_xml.language.each { |n|
    # get languageTerm codes and add their translations to the result
    n.code_term.each { |ct|
      if ct.authority.match(/^iso639/)
        begin
          vals = ct.text.split(/[,|\ ]/).reject {|x| x.strip.length == 0 }
          vals.each do |v|
            iso639_val = ISO_639.find(v.strip).english_name
            if SEARCHWORKS_LANGUAGES.has_value?(iso639_val)
              result << iso639_val
            else
              result << SEARCHWORKS_LANGUAGES[v.strip]
            end
          end
        rescue
          # TODO:  this should be written to a logger
          p "Couldn't find english name for #{ct.text}"
        end
      else
        vals = ct.text.split(/[,|\ ]/).reject {|x| x.strip.length == 0 }
        vals.each do |v|
          result << SEARCHWORKS_LANGUAGES[v.strip]
        end
      end
    }
    # add languageTerm text values
    n.text_term.each { |tt|
      val = tt.text.strip
      result << val if val.length > 0 && SEARCHWORKS_LANGUAGES.has_value?(val)
    }

    # add language values that aren't in languageTerm subelement
    if n.languageTerm.size == 0
      result << n.text if SEARCHWORKS_LANGUAGES.has_value?(n.text)
    end
  }
  result.uniq
end
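
A languageTerm code element may hold several codes separated by commas, pipes or spaces, which is why the method splits on `/[,|\ ]/`. The sketch below shows that splitting step alone. `LANGUAGES` is a tiny hypothetical stand-in for SEARCHWORKS_LANGUAGES, and unlike the method above the sketch drops codes it cannot translate.

```ruby
# Hypothetical, tiny stand-in for SEARCHWORKS_LANGUAGES (code => display name).
LANGUAGES = { 'eng' => 'English', 'fre' => 'French', 'ger' => 'German' }.freeze

# Split a code string on comma, pipe or space, then translate each code.
def languages_from_code_text(text)
  codes = text.split(/[,|\ ]/).reject { |x| x.strip.empty? }
  codes.map { |code| LANGUAGES[code.strip] }.compact.uniq
end

p languages_from_code_text('eng, fre|ger eng')
```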

#sw_logger ⇒ `Object`

Logger used for SearchWorks messages; defaults to a Logger writing to STDOUT.

Publication (place, year) methods are in origin_info.rb, as all of that information comes from the top-level originInfo element.



# File 'lib/stanford-mods/searchworks.rb', line 218

def sw_logger
  @logger ||= Logger.new(STDOUT)
end

#sw_main_author ⇒ `String`



# File 'lib/stanford-mods/searchworks.rb', line 59

def sw_main_author
  main_author_w_date
end

#sw_meeting_authors ⇒ `Array<String>`



# File 'lib/stanford-mods/searchworks.rb', line 86

def sw_meeting_authors
  @mods_ng_xml.plain_name.select {|n| n.type_at == 'conference'}.map { |n| n.display_value_w_date }
end

#sw_person_authors ⇒ `Array<String>`



# File 'lib/stanford-mods/searchworks.rb', line 69

def sw_person_authors
  personal_names_w_dates
end

#sw_short_title ⇒ `String`



# File 'lib/stanford-mods/searchworks.rb', line 125

def sw_short_title
  short_titles ? short_titles.first : nil
end

#sw_sort_author ⇒ `String`

Returns a sortable version of the main_author:

main_author + sorting title

which is the mods approximation of the value created for a marc record

# File 'lib/stanford-mods/searchworks.rb', line 94

def sw_sort_author
  #  substitute java Character.MAX_CODE_POINT for nil main_author so missing main authors sort last
  val = '' + (main_author_w_date ? main_author_w_date : "\u{10FFFF} ") + ( sort_title ? sort_title : '')
  val.gsub(/[[:punct:]]*/, '').strip
end
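
The effect of the U+10FFFF placeholder is easiest to see by sorting a few keys. This is a sketch with a hypothetical `sort_author` helper that takes the author and sort title as arguments instead of reading them from a record.

```ruby
# Hypothetical standalone version of sw_sort_author.
def sort_author(main_author, sort_title)
  # A missing author becomes the highest code point, so the key sorts last.
  val = (main_author || "\u{10FFFF} ") + (sort_title || '')
  val.gsub(/[[:punct:]]*/, '').strip
end

keys = [
  sort_author(nil, 'Zebra tales'),
  sort_author('Smith, John, 1901-1980', 'Apples'),
  sort_author('Adams, Ann', 'Pears')
]
p keys.sort
```

Note that author and title are concatenated without a separator, so "Adams, Ann" and "Pears" yield the key "Adams AnnPears".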

#sw_sort_title ⇒ `String`

Returns a sortable version of the main title

# File 'lib/stanford-mods/searchworks.rb', line 186

def sw_sort_title
  # get nonSort piece
  outer_nodes = @mods_ng_xml.title_info
  outer_node = outer_nodes ? outer_nodes.first : nil
  if outer_node
    nonSort = outer_node.nonSort.text.strip.empty? ? nil : outer_node.nonSort.text.strip
  end

  val = '' + ( sw_full_title ? sw_full_title : '')
  val.sub!(Regexp.new("^" + Regexp.escape(nonSort)), '') if nonSort
  val.gsub!(/[[:punct:]]*/, '').strip
  val.squeeze(" ").strip
end
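
The same steps on plain strings, as a sketch: `sort_title` is a hypothetical helper that receives the full title and the nonSort text directly.

```ruby
# Hypothetical standalone version of the sw_sort_title steps.
def sort_title(full_title, non_sort = nil)
  val = full_title.dup
  # Remove the leading nonSort text (e.g. an article) if there is one.
  val = val.sub(/\A#{Regexp.escape(non_sort)}/, '') if non_sort
  # Drop punctuation, then collapse the runs of spaces this leaves behind.
  val.gsub(/[[:punct:]]*/, '').squeeze(' ').strip
end

p sort_title('The Olympics : a history.', 'The')
```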

#sw_subject_names(sep = ', ') ⇒ `Array<String>`

Values are the contents of:

 subject/name/namePart

Values from namePart subelements are concatenated in the order they appear (e.g. "Shakespeare, William, 1564-1616").

# File 'lib/stanford-mods/searchworks_subjects.rb', line 43

def sw_subject_names(sep = ', ')
  result = []
  @mods_ng_xml.subject.name_el.select { |n_el| n_el.namePart }.each { |name_el_w_np|
    parts = name_el_w_np.namePart.map { |npn| npn.text unless npn.text.empty? }.compact
    result << parts.join(sep).strip unless parts.empty?
  }
  result
end
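
For a single subject name, the concatenation amounts to dropping empty parts and joining the rest. A minimal sketch, where `join_name_parts` is a hypothetical helper working on an array of namePart texts:

```ruby
# Hypothetical helper: join the non-empty namePart texts of one subject name.
def join_name_parts(parts, sep = ', ')
  kept = parts.reject(&:empty?)
  kept.empty? ? nil : kept.join(sep).strip
end

p join_name_parts(['Shakespeare, William', '', '1564-1616'])
```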

#sw_subject_titles(sep = ' ') ⇒ `Array<String>`

Values are the contents of:

subject/titleInfo/(subelements)

# File 'lib/stanford-mods/searchworks_subjects.rb', line 56

def sw_subject_titles(sep = ' ')
  result = []
  @mods_ng_xml.subject.titleInfo.each { |ti_el|
    parts = ti_el.element_children.map { |el| el.text unless el.text.empty? }.compact
    result << parts.join(sep).strip unless parts.empty?
  }
  result
end

#sw_title_display ⇒ `String`

Like sw_full_title, but with any trailing comma, slash, backslash, semicolon, colon or period removed. The spec comes from solrmarc-sw sw_index.properties:

title_display = custom, removeTrailingPunct(245abdefghijklmnopqrstuvwxyz, [\\\\,/;:], ([A-Za-z]{4}|[0-9]{3}|\\)|\\,))

# File 'lib/stanford-mods/searchworks.rb', line 169

def sw_title_display
  result = sw_full_title ? sw_full_title : nil
  if result
    result.sub!(/[\.,;:\/\\]+$/, '')
    result.strip!
  end
  result
end
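
The trailing-punctuation pattern can be checked on a few strings. This sketch wraps the same regular expression in a hypothetical `strip_trailing_punct` helper.

```ruby
# Hypothetical helper using the same pattern as sw_title_display.
def strip_trailing_punct(title)
  title.sub(/[\.,;:\/\\]+$/, '').strip
end

p strip_trailing_punct('The Olympics : a history.')
p strip_trailing_punct('Annals /')
p strip_trailing_punct('Who?')
```

Only the listed characters are removed, so a trailing question mark is kept.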

#topic_facet ⇒ `Array<String>`

Values are the contents of:

 subject/topic
 subject/name
 subject/title
 subject/occupation
with trailing comma, semicolon, and backslash (and any preceding spaces) removed

# File 'lib/stanford-mods/searchworks_subjects.rb', line 84

def topic_facet
  vals = subject_topics ? Array.new(subject_topics) : []
  vals.concat(subject_names) if subject_names
  vals.concat(subject_titles) if subject_titles
  vals.concat(subject_occupations) if subject_occupations
  vals.map! { |val|
    v = val.sub(/[\\,;]$/, '')
    v.strip
  }
  vals.empty? ? nil : vals
end
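
The cleanup applied to each topic value removes one trailing backslash, comma or semicolon and then strips whitespace. A small sketch, with `clean_topic` as a hypothetical name for that step:

```ruby
# Hypothetical helper for the per-value cleanup in topic_facet.
def clean_topic(val)
  val.sub(/[\\,;]$/, '').strip
end

p clean_topic('Dogs ;')
p clean_topic('Cats,')
p clean_topic('Birds\\')
p clean_topic('Fish.')
```

A trailing period is left in place, since it is not one of the three characters in the pattern.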

#topic_search ⇒ `Array<String>`

Values are the contents of:

mods/genre
mods/subject/topic

# File 'lib/stanford-mods/searchworks_subjects.rb', line 69

def topic_search
  @topic_search ||= begin
    vals = self.term_values(:genre) || []
    vals.concat(subject_topics) if subject_topics
    vals.empty? ? nil : vals
  end
end

#year_facet_str(date_el_array) ⇒ `String`

given the passed date elements, look for a single keyDate and use it if there is one;

otherwise pick earliest parseable date

# File 'lib/stanford-mods/origin_info.rb', line 52

def year_facet_str(date_el_array)
  result = date_parsing_result(date_el_array, :facet_string_from_date_str)
  return result if result
  _ignore, orig_str_to_parse = self.class.earliest_year_str(date_el_array)
  DateParsing.facet_string_from_date_str(orig_str_to_parse) if orig_str_to_parse
end

#year_int(date_el_array) ⇒ `Integer`

given the passed date elements, look for a single keyDate and use it if there is one;

otherwise pick earliest parseable date

# File 'lib/stanford-mods/origin_info.rb', line 63

def year_int(date_el_array)
  result = date_parsing_result(date_el_array, :year_int_from_date_str)
  return result if result
  year_int, _ignore = self.class.earliest_year_int(date_el_array)
  year_int if year_int
end

#year_sort_str(date_el_array) ⇒ `String`

given the passed date elements, look for a single keyDate and use it if there is one;

otherwise pick earliest parseable date

# File 'lib/stanford-mods/origin_info.rb', line 74

def year_sort_str(date_el_array)
  result = date_parsing_result(date_el_array, :sortable_year_string_from_date_str)
  return result if result
  sortable_str, _ignore = self.class.earliest_year_str(date_el_array)
  sortable_str if sortable_str
end
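
The three year methods share one selection rule: prefer a single keyDate, otherwise fall back to the earliest parseable date. The sketch below shows that rule on plain hashes. It is an illustration under assumptions: `pick_year` and the `:year` / `:key_date` keys are hypothetical, and real date parsing is left out.

```ruby
# Hypothetical date records: :year is an already parsed Integer (or nil),
# :key_date mirrors the keyDate="yes" attribute.
def pick_year(dates)
  keyed = dates.select { |d| d[:key_date] }
  return keyed.first[:year] if keyed.size == 1   # exactly one keyDate wins
  dates.map { |d| d[:year] }.compact.min         # otherwise the earliest parseable year
end

p pick_year([{ year: 1950, key_date: true }, { year: 1900 }])
p pick_year([{ year: 1950, key_date: true }, { year: 1900, key_date: true }])
p pick_year([])
```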
