Class: Stanford::Mods::Record
- Inherits:
- Mods::Record
- Ancestors: Stanford::Mods::Record < Mods::Record < Object
- Defined in:
- lib/stanford-mods/geo_spatial.rb,
lib/stanford-mods.rb,
lib/stanford-mods/name.rb,
lib/stanford-mods/origin_info.rb,
lib/stanford-mods/searchworks.rb,
lib/stanford-mods/physical_location.rb,
lib/stanford-mods/searchworks_subjects.rb
Overview
Parsing MODS //location/physicalLocation for series, box, and folder for Special Collections. This is not used by Searchworks, otherwise it would have been in the searchworks.rb file. Note: mods_ng_xml._location.physicalLocation should find both top level and relatedItem elements. Each method here expects to find at most ONE matching element. Subsequent potential matches are ignored.
Constant Summary
- COLLECTOR_ROLE_URI =
'http://id.loc.gov/vocabulary/relators/col'.freeze
- GMLNS =
'http://www.opengis.net/gml/3.2/'.freeze
Instance Attribute Summary
- #druid ⇒ Object
- #logger ⇒ Object (also: #sw_logger)
Class Method Summary
-
.date_is_approximate?(date_element) ⇒ Boolean
NOTE: legal values for MODS date elements with attribute qualifier are ‘approximate’, ‘inferred’ or ‘questionable’.
-
.earliest_year_int(date_el_array) ⇒ Object
get earliest parseable year (as an Integer) from the passed date elements.
-
.earliest_year_str(date_el_array) ⇒ Object
get earliest parseable year (as a String) from the passed date elements.
-
.keyDate(elements) ⇒ Nokogiri::XML::Element?
given a set of date elements, return the single element with attribute keyDate=“yes” or return nil if no elements have attribute keyDate=“yes”, or if multiple elements have keyDate=“yes”.
-
.remove_approximate(nodeset) ⇒ Array<Nokogiri::XML::Element>
remove Elements from NodeSet if they have a qualifier attribute of ‘approximate’ or ‘questionable’.
Instance Method Summary
-
#additional_authors_w_dates ⇒ Object
All names, in display form, except the main_author. Names will be in the display_value_w_date form; see Mods::Record.name in nom_terminology for details on the display_value algorithm.
-
#box ⇒ String
data in location/physicalLocation or in relatedItem/location/physicalLocation so use _location to get the data from either one of them.
-
#catkey ⇒ String
Value with the numeric catkey in it, or nil if none exists.
-
#collectors_w_dates ⇒ Object
Array of Strings, each containing the computed display value of a personal name with the role of Collector (see mods gem nom_terminology for display value algorithm).
-
#coordinates ⇒ Array{String}
Subject cartographic coordinates values.
-
#coordinates_as_bbox ⇒ Array{String}
(also: #point_bbox)
4-part space-delimited strings, like “-16.0 -15.0 28.0 13.0”.
-
#coordinates_as_envelope ⇒ Array{String}
Values suitable for solr SRPT fields, like “ENVELOPE(-16.0, 28.0, 13.0, -15.0)”.
-
#coordinates_objects ⇒ Array{Stanford::Mods::Coordinate}
Valid coordinates as objects.
-
#date_created_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateCreated elements in MODS records.
-
#date_issued_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateIssued elements in MODS records.
-
#era_facet ⇒ Array<String>
subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#first_title_info_node ⇒ Nokogiri::XML::Node
The first titleInfo node if present, else nil.
-
#folder ⇒ String
data in location/physicalLocation or in relatedItem/location/physicalLocation so use _location to get the data from either one of them.
-
#format ⇒ Array[String]
Deprecated. Kept for backwards compatibility, but not part of the SW UI redesign work of Summer 2014.
-
#format_main ⇒ Array[String]
Select one or more format values from the controlled vocabulary per JVine Summer 2014 (searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index). See github.com/sul-dlss/stanford-mods/issues/66: for geodata, the resource type should be only Map and not include Software, multimedia.
-
#geo_extensions_as_envelope ⇒ Array{String}
Values suitable for solr SRPT fields, like “ENVELOPE(-16.0, 28.0, 13.0, -15.0)”.
-
#geo_extensions_point_data ⇒ Array{String}
Values suitable for solr SRPT fields, like “-16.0 28.0”.
-
#geographic_facet ⇒ Array<String>
geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#geographic_search ⇒ Array<String>
Values are the contents of: subject/geographic, subject/hierarchicalGeographic, and subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
-
#imprint_display_str ⇒ String
Single String containing imprint information for display.
-
#includes_marc_relator_collector_role?(role_node) ⇒ Boolean
True if there is a MARC relator collector role assigned.
-
#main_author_w_date ⇒ String
the first encountered <mods><name> element with marcrelator flavor role of ‘Creator’ or ‘Author’.
- #main_author_w_date_test ⇒ Object
-
#non_collector_person_authors ⇒ Object
FIXME: this is broken if there are multiple role codes and some of them are not marcrelator.
-
#nonSort_title ⇒ String
The nonSort text portion of the titleInfo node as a string (if non-empty, else nil).
-
#physical_location_str ⇒ String
The physicalLocation text, but only if it has series, accession, box or folder data. The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one of them.
-
#place ⇒ Object
Old date parsing methods used downstream of gem; will be deprecated/replaced with new date parsing methods.
-
#present_title_info_nodes ⇒ Nokogiri::XML::NodeSet
Title_info nodes, rejecting ones that just have blank text values.
-
#pub_date_display ⇒ String
Deprecated. DO NOT USE: this is no longer used in SW, Revs or Spotlight as of Jan 2016.
-
#pub_date_facet ⇒ String
Values for the pub date facet.
-
#pub_date_sort ⇒ Object
Deprecated. Use pub_year_int, or pub_year_sort_str if you must have a string (why?).
-
#pub_year_display_str(ignore_approximate = false) ⇒ Object
Return a single string intended for display of pub year; for 0 < year < 1000, add an A.D. suffix.
-
#pub_year_int(ignore_approximate = false) ⇒ Integer
Return pub year as an Integer. Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any); look for a keyDate and use it if there is one, otherwise pick the earliest date.
-
#pub_year_sort_str(ignore_approximate = false) ⇒ String
Deprecated. Use pub_year_int.
-
#series ⇒ String
data in location/physicalLocation or in relatedItem/location/physicalLocation so use _location to get the data from either one of them.
-
#subject_all_search ⇒ Array<String>
Values are the contents of: all subject subelements except subject/cartographic, plus the genre top level element.
-
#subject_other_search ⇒ Array<String>
Values are the contents of: subject/name, subject/occupation (no subelements), and subject/titleInfo.
-
#subject_other_subvy_search ⇒ Array<String>
Values are the contents of: subject/temporal and subject/genre.
-
#sw_addl_authors ⇒ Array<String>
Values for author_7xx_search field.
-
#sw_addl_titles ⇒ Array<String>
this includes all titles except.
-
#sw_corporate_authors ⇒ Array<String>
Values for author_corp_display.
-
#sw_full_title ⇒ String
Value for title_245_search, title_full_display.
-
#sw_full_title_without_commas ⇒ Object
Deprecated in favor of sw_title_display.
-
#sw_genre ⇒ Array[String]
github.com/sul-dlss/stanford-mods/issues/66 Limit genre values to Government document, Conference proceedings, Technical report and Thesis/Dissertation.
-
#sw_geographic_search(sep = ' ') ⇒ Array<String>
Values are the contents of: subject/geographic, subject/hierarchicalGeographic, and subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
-
#sw_impersonal_authors ⇒ Array<String>
return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’.
-
#sw_language_facet ⇒ Object
Include languages known to SearchWorks; try to error-correct when possible (e.g. when ISO-639 disagrees with the MARC standard).
-
#sw_main_author ⇒ String
Value for author_1xx_search field.
-
#sw_meeting_authors ⇒ Array<String>
Values for author_meeting_display.
-
#sw_person_authors ⇒ Array<String>
Values for author_person_facet, author_person_display.
-
#sw_short_title ⇒ String
Value for title_245a_search field.
-
#sw_sort_author ⇒ String
Returns a sortable version of the main_author: main_author + sorting title which is the mods approximation of the value created for a marc record.
-
#sw_sort_title ⇒ String
Returns a sortable version of the main title.
-
#sw_subject_names(sep = ', ') ⇒ Array<String>
Values are the contents of: subject/name/namePart. “Values from namePart subelements should be concatenated in the order they appear (e.g. ‘Shakespeare, William, 1564-1616’)”.
-
#sw_subject_titles(sep = ' ') ⇒ Array<String>
Values are the contents of: subject/titleInfo/(subelements).
-
#sw_title_display ⇒ String
like sw_full_title without trailing ,/;:.
-
#title ⇒ String
The text of the titleInfo node as a string (if non-empty, else nil).
-
#topic_facet ⇒ Array<String>
Values are the contents of: subject/topic, subject/name, subject/title, and subject/occupation, with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#topic_search ⇒ Array<String>
Values are the contents of: mods/genre and mods/subject/topic.
-
#year_display_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
-
#year_int(date_el_array) ⇒ Integer
given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
-
#year_sort_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
Instance Attribute Details
#druid ⇒ Object
# File 'lib/stanford-mods/searchworks.rb', line 14
def druid
  @druid || 'Unknown item'
end
#logger ⇒ Object Also known as: sw_logger
# File 'lib/stanford-mods/searchworks.rb', line 18
def logger
  @logger ||= Logger.new(STDOUT)
end
Class Method Details
.date_is_approximate?(date_element) ⇒ Boolean
NOTE: legal values for MODS date elements with attribute qualifier are
'approximate', 'inferred' or 'questionable'
# File 'lib/stanford-mods/origin_info.rb', line 153
def self.date_is_approximate?(date_element)
  qualifier = date_element["qualifier"] if date_element.respond_to?('[]')
  qualifier == 'approximate' || qualifier == 'questionable'
end
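The qualifier check only needs an object that responds to `[]`, so it can be tried without Nokogiri. The following is a minimal sketch, assuming plain Hashes as stand-ins for date elements (the stand-ins and the top-level method are illustrative, not part of the gem). It shows that 'inferred', although a legal qualifier value, is not treated as approximate.

```ruby
# Standalone copy of the qualifier logic; Hashes stand in for
# Nokogiri elements (an assumption made for illustration only).
def date_is_approximate?(date_element)
  qualifier = date_element['qualifier'] if date_element.respond_to?('[]')
  qualifier == 'approximate' || qualifier == 'questionable'
end

approximate = { 'qualifier' => 'approximate' }
inferred    = { 'qualifier' => 'inferred' }
unqualified = {}

puts date_is_approximate?(approximate) # true
puts date_is_approximate?(inferred)    # false
puts date_is_approximate?(unqualified) # false
puts date_is_approximate?(nil)         # false (nil does not respond to [])
```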
.earliest_year_int(date_el_array) ⇒ Object
get earliest parseable year (as an Integer) from the passed date elements
# File 'lib/stanford-mods/origin_info.rb', line 163
def self.earliest_year_int(date_el_array)
  earliest_year(date_el_array, :year_int_from_date_str)
end
.earliest_year_str(date_el_array) ⇒ Object
get earliest parseable year (as a String) from the passed date elements
# File 'lib/stanford-mods/origin_info.rb', line 172
def self.earliest_year_str(date_el_array)
  earliest_year(date_el_array, :sortable_year_string_from_date_str)
end
.keyDate(elements) ⇒ Nokogiri::XML::Element?
given a set of date elements, return the single element with attribute keyDate=“yes”
or return nil if no elements have attribute keyDate="yes", or if multiple elements have keyDate="yes"
# File 'lib/stanford-mods/origin_info.rb', line 135
def self.keyDate(elements)
  keyDates = elements.select { |node| node["keyDate"] == 'yes' }
  keyDates.first if keyDates.size == 1
end
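The "exactly one keyDate" rule is easy to miss, so here is a small sketch of it. It assumes Hashes in place of Nokogiri elements, and the method name `key_date` is a hypothetical stand-in for the class method shown above.

```ruby
# Hypothetical stand-in for the keyDate selection: return the element
# only when exactly one element carries keyDate="yes".
def key_date(elements)
  key_dates = elements.select { |node| node['keyDate'] == 'yes' }
  key_dates.first if key_dates.size == 1
end

issued  = { 'keyDate' => 'yes', 'text' => '1945' }
created = { 'text' => '1940' }
second  = { 'keyDate' => 'yes', 'text' => '1950' }

p key_date([issued, created]) # the single keyDate element
p key_date([created])         # nil: no keyDate present
p key_date([issued, second])  # nil: ambiguous, two keyDates
```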
.remove_approximate(nodeset) ⇒ Array<Nokogiri::XML::Element>
remove Elements from NodeSet if they have a qualifier attribute of ‘approximate’ or ‘questionable’
# File 'lib/stanford-mods/origin_info.rb', line 144
def self.remove_approximate(nodeset)
  nodeset.select { |node| node unless date_is_approximate?(node) }
end
Instance Method Details
#additional_authors_w_dates ⇒ Object
all names, in display form, except the main_author
names will be the display_value_w_date form
see Mods::Record.name in nom_terminology for details on the display_value algorithm
# File 'lib/stanford-mods/name.rb', line 32
def additional_authors_w_dates
  results = []
  mods_ng_xml.plain_name.each { |n|
    results << n.display_value_w_date
  }
  results.delete()
  results
end
#box ⇒ String
should it be hierarchical series/box/folder?
data in location/physicalLocation or in relatedItem/location/physicalLocation so use _location to get the data from either one of them
# File 'lib/stanford-mods/physical_location.rb', line 15
def box
  mods_ng_xml._location.physicalLocation.each do |node|
    match_data = node.text.match(/Box ?:? ?([^,|(Folder)]+)/i) # note that this will also find Flatbox or Flat-box
    return match_data[1].strip if match_data.present?
  end
  nil
end
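To see what the box pattern captures, the regular expression can be applied to plain strings. This is a sketch only: `box_from` is a hypothetical helper, the sample strings are invented, and the ActiveSupport `present?` call is replaced with a plain truthiness check so the example runs on the standard library alone. Note that `[^,|(Folder)]` is a character class, so it excludes the individual characters listed (including the letters f, o, l, d, e, r, in either case) rather than the word "Folder".

```ruby
# Same pattern as in the listing above, applied to plain strings.
BOX_PATTERN = /Box ?:? ?([^,|(Folder)]+)/i

# Hypothetical helper: returns the captured box value, or nil.
def box_from(text)
  match_data = text.match(BOX_PATTERN)
  match_data[1].strip if match_data
end

p box_from('Series 2, Box 14, Folder 3')                 # "14"
p box_from('Call Number: SC0340 | Box: 42A | Folder: 7') # "42A"
p box_from('Flatbox 3')                                  # "3"
p box_from('Box 5 oversize')                             # "5" (capture stops at the "o")
p box_from('no container info')                          # nil
```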
#catkey ⇒ String
Returns value with the numeric catkey in it, or nil if none exists.
# File 'lib/stanford-mods/searchworks.rb', line 366
def catkey
  catkey = term_values([:record_info, :recordIdentifier])
  return nil unless catkey && !catkey.empty?
  catkey.first.tr('a', '') # ensure catkey is numeric only
end
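The catkey cleanup relies on `String#tr` with an empty replacement, which deletes the listed characters. A minimal sketch, assuming the recordIdentifier values arrive as an Array of Strings (`numeric_catkey` and the sample identifier are hypothetical):

```ruby
# Hypothetical helper mirroring the catkey logic on a plain Array.
def numeric_catkey(record_identifiers)
  return nil unless record_identifiers && !record_identifiers.empty?
  # tr with an empty second argument removes every 'a'
  record_identifiers.first.tr('a', '')
end

p numeric_catkey(['a6780453']) # "6780453"
p numeric_catkey([])           # nil
p numeric_catkey(nil)          # nil
```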
#collectors_w_dates ⇒ Object
Returns Array of Strings, each containing the computed display value of a personal name with the role of Collector (see mods gem nom_terminology for display value algorithm).
# File 'lib/stanford-mods/name.rb', line 57
def collectors_w_dates
  result = []
  mods_ng_xml.personal_name.each do |n|
    next if n.role.size.zero?
    n.role.each { |r|
      result << n.display_value_w_date if includes_marc_relator_collector_role?(r)
    }
  end
  result unless result.empty?
end
#coordinates ⇒ Array{String}
Returns subject cartographic coordinates values.
# File 'lib/stanford-mods/geo_spatial.rb', line 11
def coordinates
  Array(mods_ng_xml.subject.cartographics.coordinates).map(&:text)
end
#coordinates_as_bbox ⇒ Array{String} Also known as: point_bbox
Returns 4-part space-delimited strings, like “-16.0 -15.0 28.0 13.0”.
# File 'lib/stanford-mods/geo_spatial.rb', line 62
def coordinates_as_bbox
  coordinates_objects.map(&:as_bbox).compact
end
#coordinates_as_envelope ⇒ Array{String}
Returns values suitable for solr SRPT fields, like “ENVELOPE(-16.0, 28.0, 13.0, -15.0)”.
# File 'lib/stanford-mods/geo_spatial.rb', line 57
def coordinates_as_envelope
  coordinates_objects.map(&:as_envelope).compact
end
#coordinates_objects ⇒ Array{Stanford::Mods::Coordinate}
Returns valid coordinates as objects.
# File 'lib/stanford-mods/geo_spatial.rb', line 52
def coordinates_objects
  coordinates.map { |n| Stanford::Mods::Coordinate.new(n) }.select(&:valid?)
end
#date_created_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateCreated elements in MODS records
# File 'lib/stanford-mods/origin_info.rb', line 115
def date_created_elements(ignore_approximate = false)
  date_created_nodeset = mods_ng_xml.origin_info.dateCreated
  return self.class.remove_approximate(date_created_nodeset) if ignore_approximate
  date_created_nodeset.to_a
end
#date_issued_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateIssued elements in MODS records
# File 'lib/stanford-mods/origin_info.rb', line 125
def date_issued_elements(ignore_approximate = false)
  date_issued_nodeset = mods_ng_xml.origin_info.dateIssued
  return self.class.remove_approximate(date_issued_nodeset) if ignore_approximate
  date_issued_nodeset.to_a
end
#era_facet ⇒ Array<String>
subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed
# File 'lib/stanford-mods/searchworks_subjects.rb', line 93
def era_facet
  subject_temporal.map { |val| val.sub(/[\\,;]$/, '').strip } if subject_temporal
end
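The trailing-punctuation cleanup used here (and by geographic_facet) can be sketched on plain strings. `strip_trailing_punct` is a hypothetical name and the sample values are invented; the substitution itself is the one in the listing above.

```ruby
# Hypothetical helper: drop one trailing comma, semicolon or backslash,
# then strip surrounding whitespace.
def strip_trailing_punct(values)
  values.map { |val| val.sub(/[\\,;]$/, '').strip }
end

samples = ['1900-1950,', '20th century ;', 'Cold War\\', 'a, b']
p strip_trailing_punct(samples)
# ["1900-1950", "20th century", "Cold War", "a, b"]
```

Punctuation inside a value (as in 'a, b') is left alone, since the pattern is anchored at the end of the string.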
#first_title_info_node ⇒ Nokogiri::XML::Node
Returns the first titleInfo node if present, else nil.
# File 'lib/stanford-mods/searchworks.rb', line 138
def first_title_info_node
  present_title_info_nodes ? present_title_info_nodes.first : nil
end
#folder ⇒ String
should it be hierarchical series/box/folder?
data in location/physicalLocation or in relatedItem/location/physicalLocation so use _location to get the data from either one of them
# File 'lib/stanford-mods/physical_location.rb', line 27
def folder
  mods_ng_xml._location.physicalLocation.each do |node|
    val = node.text
    match_data = val =~ /\|/ ?
      val.match(/Folder ?:? ?([^|]+)/) : # expect pipe-delimited, may contain commas within values
      val.match(/Folder ?:? ?([^,]+)/) # expect comma-delimited, may NOT contain commas within values
    return match_data[1].strip if match_data.present?
  end
  nil
end
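The folder lookup switches pattern depending on whether the text contains a pipe. A minimal sketch of that branch, using plain strings (`folder_from` and the sample values are hypothetical, and `present?` is replaced by a truthiness check to avoid the ActiveSupport dependency):

```ruby
# Hypothetical helper: pick the delimiter-specific pattern, then capture.
def folder_from(text)
  match_data = if text =~ /\|/
                 text.match(/Folder ?:? ?([^|]+)/) # pipe-delimited: commas allowed in value
               else
                 text.match(/Folder ?:? ?([^,]+)/) # comma-delimited: value ends at next comma
               end
  match_data[1].strip if match_data
end

p folder_from('Series 1 | Box 2 | Folder 3, Letters') # "3, Letters"
p folder_from('Series 1, Box 2, Folder 3, Letters')   # "3"
p folder_from('box 2, folder 3')                      # nil (these patterns are case-sensitive)
```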
#format ⇒ Array[String]
-
kept for backwards compatibility but not part of SW UI redesign work Summer 2014
select one or more format values from the controlled vocabulary here:
http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format&rows=0&facet.sort=index
@deprecated: this is no longer used in SW, Revs or Spotlight Jan 2016
# File 'lib/stanford-mods/searchworks.rb', line 235
def format
  types = term_values(:typeOfResource)
  return [] unless types
  genres = term_values(:genre)
  issuance = term_values([:origin_info, :issuance])
  val = []
  types.each do |type|
    case type
    when 'cartographic'
      val << 'Map/Globe'
    when 'mixed material'
      val << 'Manuscript/Archive'
    when 'moving image'
      val << 'Video'
    when 'notated music'
      val << 'Music - Score'
    when 'software, multimedia'
      val << 'Computer File'
    when 'sound recording-musical'
      val << 'Music - Recording'
    when 'sound recording-nonmusical', 'sound recording'
      val << 'Sound Recording'
    when 'still image'
      val << 'Image'
    when 'text'
      val << 'Book' if issuance && issuance.include?('monographic')
      book_genres = ['book chapter', 'Book chapter', 'Book Chapter',
                     'issue brief', 'Issue brief', 'Issue Brief',
                     'librettos', 'Librettos',
                     'project report', 'Project report', 'Project Report',
                     'technical report', 'Technical report', 'Technical Report',
                     'working paper', 'Working paper', 'Working Paper']
      val << 'Book' if genres && !(genres & book_genres).empty?
      conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
      val << 'Conference Proceedings' if genres && !(genres & conf_pub).empty?
      val << 'Journal/Periodical' if issuance && issuance.include?('continuing')
      article = ['article', 'Article']
      val << 'Journal/Periodical' if genres && !(genres & article).empty?
      stu_proj_rpt = ['student project report', 'Student project report', 'Student Project report', 'Student Project Report']
      val << 'Other' if genres && !(genres & stu_proj_rpt).empty?
      thesis = ['thesis', 'Thesis']
      val << 'Thesis' if genres && !(genres & thesis).empty?
    when 'three dimensional object'
      val << 'Other'
    end
  end
  val.uniq
end
#format_main ⇒ Array[String]
select one or more format values from the controlled vocabulary per JVine Summer 2014
http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index
github.com/sul-dlss/stanford-mods/issues/66 - For geodata, the resource type should be only Map and not include Software, multimedia.
# File 'lib/stanford-mods/searchworks.rb', line 289
def format_main
  types = term_values(:typeOfResource)
  return [] unless types
  article_genres = ['article', 'Article',
                    'book chapter', 'Book chapter', 'Book Chapter',
                    'issue brief', 'Issue brief', 'Issue Brief',
                    'project report', 'Project report', 'Project Report',
                    'student project report', 'Student project report', 'Student Project report', 'Student Project Report',
                    'technical report', 'Technical report', 'Technical Report',
                    'working paper', 'Working paper', 'Working Paper'
                   ]
  book_genres = ['conference publication', 'Conference publication', 'Conference Publication',
                 'instruction', 'Instruction',
                 'librettos', 'Librettos',
                 'thesis', 'Thesis'
                ]
  val = []
  genres = term_values(:genre)
  issuance = term_values([:origin_info, :issuance])
  types.each do |type|
    case type
    when 'cartographic'
      val << 'Map'
      val.delete 'Software/Multimedia'
    when 'mixed material'
      val << 'Archive/Manuscript'
    when 'moving image'
      val << 'Video'
    when 'notated music'
      val << 'Music score'
    when 'software, multimedia'
      if genres && (genres.include?('dataset') || genres.include?('Dataset'))
        val << 'Dataset'
      elsif !val.include?('Map')
        val << 'Software/Multimedia'
      end
    when 'sound recording-musical'
      val << 'Music recording'
    when 'sound recording-nonmusical', 'sound recording'
      val << 'Sound recording'
    when 'still image'
      val << 'Image'
    when 'text'
      val << 'Book' if genres && !(genres & article_genres).empty?
      val << 'Book' if issuance && issuance.include?('monographic')
      val << 'Book' if genres && !(genres & book_genres).empty?
      val << 'Journal/Periodical' if issuance && issuance.include?('continuing')
      val << 'Archived website' if genres && genres.include?('archived website')
    when 'three dimensional object'
      val << 'Object'
    end
  end
  val.uniq
end
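The geodata rule (Map only, never Software/Multimedia alongside it) depends on both branches cooperating, whichever order the typeOfResource values arrive in. The sketch below isolates just those two branches; `main_formats` is a hypothetical, reduced helper that takes plain Arrays instead of reading the MODS record.

```ruby
# Reduced, hypothetical version of the two branches that interact:
# 'cartographic' removes Software/Multimedia, and 'software, multimedia'
# is skipped once Map is present (unless the genre marks a dataset).
def main_formats(types, genres = [])
  val = []
  types.each do |type|
    case type
    when 'cartographic'
      val << 'Map'
      val.delete 'Software/Multimedia'
    when 'software, multimedia'
      if genres.include?('dataset') || genres.include?('Dataset')
        val << 'Dataset'
      elsif !val.include?('Map')
        val << 'Software/Multimedia'
      end
    end
  end
  val.uniq
end

p main_formats(['software, multimedia', 'cartographic'])              # ["Map"]
p main_formats(['cartographic', 'software, multimedia'])              # ["Map"]
p main_formats(['cartographic', 'software, multimedia'], ['dataset']) # ["Map", "Dataset"]
p main_formats(['software, multimedia'])                              # ["Software/Multimedia"]
```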
#geo_extensions_as_envelope ⇒ Array{String}
example xml leaf nodes <gml:lowerCorner>-122.191292 37.4063388</gml:lowerCorner> <gml:upperCorner>-122.149475 37.4435369</gml:upperCorner>
Returns values suitable for solr SRPT fields, like “ENVELOPE(-16.0, 28.0, 13.0, -15.0)”.
# File 'lib/stanford-mods/geo_spatial.rb', line 19
def geo_extensions_as_envelope
  mods_ng_xml.extension
             .xpath(
               '//rdf:RDF/rdf:Description/gml:boundedBy/gml:Envelope',
               'gml' => GMLNS,
               'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
             ).map do |v|
    uppers = v.xpath('gml:upperCorner', 'gml' => GMLNS).text.split
    lowers = v.xpath('gml:lowerCorner', 'gml' => GMLNS).text.split
    "ENVELOPE(#{lowers[0]}, #{uppers[0]}, #{uppers[1]}, #{lowers[1]})"
  end
rescue RuntimeError => e
  logger.warn "failure parsing <extension> element: #{e.}"
  []
end
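The ENVELOPE string reorders the two GML corners into the minX, maxX, maxY, minY order that Solr's ENVELOPE syntax expects. A small sketch of just that reordering, using the example leaf-node values quoted above (`envelope_from_corners` is a hypothetical helper; the XPath lookup is left out):

```ruby
# Hypothetical helper: build the ENVELOPE string from the text of
# gml:lowerCorner and gml:upperCorner ("x y" pairs).
def envelope_from_corners(lower_corner, upper_corner)
  lowers = lower_corner.split # [min x, min y]
  uppers = upper_corner.split # [max x, max y]
  "ENVELOPE(#{lowers[0]}, #{uppers[0]}, #{uppers[1]}, #{lowers[1]})"
end

puts envelope_from_corners('-122.191292 37.4063388', '-122.149475 37.4435369')
# ENVELOPE(-122.191292, -122.149475, 37.4435369, 37.4063388)
```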
#geo_extensions_point_data ⇒ Array{String}
example xml leaf nodes <gml:pos>-122.191292 37.4063388</gml:pos>
Returns values suitable for solr SRPT fields, like “-16.0 28.0”.
# File 'lib/stanford-mods/geo_spatial.rb', line 38
def geo_extensions_point_data
  mods_ng_xml.extension
             .xpath(
               '//rdf:RDF/rdf:Description/gmd:centerPoint/gml:Point[gml:pos]',
               'gml' => GMLNS,
               'gmd' => 'http://www.isotc211.org/2005/gmd',
               'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
             ).map do |v|
    lat, long = v.xpath('gml:pos', 'gml' => GMLNS).text.split
    "#{long} #{lat}"
  end
end
#geographic_facet ⇒ Array<String>
geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed
# File 'lib/stanford-mods/searchworks_subjects.rb', line 87
def geographic_facet
  geographic_search.map { |val| val.sub(/[\\,;]$/, '').strip } if geographic_search
end
#geographic_search ⇒ Array<String>
Values are the contents of:
subject/geographic
subject/hierarchicalGeographic
subject/geographicCode (only include the translated value if it isn't already present from other mods geo fields)
# File 'lib/stanford-mods/searchworks_subjects.rb', line 102
def geographic_search
  @geographic_search ||= begin
    result = sw_geographic_search
    # TODO: this should go into stanford-mods ... but then we have to set that gem up with a Logger
    # print a message for any unrecognized encodings
    xvals = subject.geographicCode.translated_value
    codes = term_values([:subject, :geographicCode])
    if codes && codes.size > xvals.size
      subject.geographicCode.each { |n|
        next unless n. != 'marcgac' && n. != 'marccountry'
        sw_logger.info("#{druid} has subject geographicCode element with untranslated encoding (#{n.}): #{n.to_xml}")
      }
    end
    # FIXME: stanford-mods should be returning [], not nil ...
    return nil if !result || result.empty?
    result
  end
end
#imprint_display_str ⇒ String
Returns single String containing imprint information for display.
# File 'lib/stanford-mods/origin_info.rb', line 73
def imprint_display_str
  imp = Stanford::Mods::Imprint.new(origin_info)
  imp.display_str
end
#includes_marc_relator_collector_role?(role_node) ⇒ Boolean
Returns true if there is a MARC relator collector role assigned.
# File 'lib/stanford-mods/name.rb', line 72
def includes_marc_relator_collector_role?(role_node)
  (role_node..include?('marcrelator') && role_node.value.include?('Collector')) ||
    role_node.roleTerm.valueURI.first == COLLECTOR_ROLE_URI
end
#main_author_w_date ⇒ String
The first encountered <mods><name> element with marcrelator flavor role of ‘Creator’ or ‘Author’. If there is no marcrelator ‘Creator’ or ‘Author’, the first name without a role. If there is no name without a role, then nil. See Mods::Record.name in nom_terminology for details on the display_value algorithm.
# File 'lib/stanford-mods/name.rb', line 13
def main_author_w_date
  result = nil
  first_wo_role = nil
  mods_ng_xml.plain_name.each { |n|
    first_wo_role ||= n if n.role.empty?
    n.role.each { |r|
      if r..include?('marcrelator') &&
         (r.value.include?('Creator') || r.value.include?('Author'))
        result ||= n.display_value_w_date
      end
    }
  }
  result = first_wo_role.display_value_w_date if !result && first_wo_role
  result
end
#main_author_w_date_test ⇒ Object
# File 'lib/stanford-mods/searchworks.rb', line 107
def main_author_w_date_test
  result = nil
  first_wo_role = nil
  plain_name.each { |n|
    first_wo_role ||= n if n.role.empty?
    n.role.each { |r|
      if r..include?('marcrelator') &&
         (r.value.include?('Creator') || r.value.include?('Author'))
        result ||= n.display_value_w_date
      end
    }
  }
  result = first_wo_role.display_value_w_date if !result && first_wo_role
  result
end
#non_collector_person_authors ⇒ Object
FIXME: this is broken if there are multiple role codes and some of them are not marcrelator
# File 'lib/stanford-mods/name.rb', line 44
def non_collector_person_authors
  result = []
  mods_ng_xml.personal_name.map do |n|
    next if n.role.size.zero?
    n.role.each { |r|
      result << n.display_value_w_date unless includes_marc_relator_collector_role?(r)
    }
  end
  result unless result.empty?
end
#nonSort_title ⇒ String
Returns the nonSort text portion of the titleInfo node as a string (if non-empty, else nil).
# File 'lib/stanford-mods/searchworks.rb', line 143
def nonSort_title
  return unless first_title_info_node && first_title_info_node.nonSort
  first_title_info_node.nonSort.text.strip.empty? ? nil : first_title_info_node.nonSort.text.strip
end
#physical_location_str ⇒ String
should it be hierarchical series/box/folder?
There is a “physicalLocation” and a “location” method defined in the mods gem, so those names cannot be used here without causing conflicts.
Returns the physicalLocation text, but only if it has series, accession, box or folder data. The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one of them.
# File 'lib/stanford-mods/physical_location.rb', line 44
def physical_location_str
  mods_ng_xml._location.physicalLocation.map(&:text).find do |text|
    text =~ /.*(Series)|(Accession)|(Folder)|(Box).*/i
  end
end
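The selection step is a plain `find` over the physicalLocation texts: the first text mentioning series, accession, folder or box wins. A minimal sketch on an Array of Strings (`physical_location_from` and the sample texts are hypothetical):

```ruby
# Same pattern as in the listing above.
LOCATION_PATTERN = /.*(Series)|(Accession)|(Folder)|(Box).*/i

# Hypothetical helper: first text with container data, or nil.
def physical_location_from(texts)
  texts.find { |text| text =~ LOCATION_PATTERN }
end

texts = ['Stanford University Libraries', 'Series 1, Box 3', 'Box 9']
p physical_location_from(texts)                             # "Series 1, Box 3"
p physical_location_from(['Stanford University Libraries']) # nil
```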
#place ⇒ Object
Old date parsing methods used downstream of gem; will be deprecated/replaced with new date parsing methods.
# File 'lib/stanford-mods/origin_info.rb', line 226
def place
  term_values([:origin_info, :place, :placeTerm])
end
#present_title_info_nodes ⇒ Nokogiri::XML::NodeSet
Returns title_info nodes, rejecting ones that just have blank text values.
# File 'lib/stanford-mods/searchworks.rb', line 133
def present_title_info_nodes
  mods_ng_xml.title_info.reject {|node| node.text.strip.empty?}
end
#pub_date_display ⇒ String
DO NOT USE: this is no longer used in SW, Revs or Spotlight Jan 2016
For the date display only, the first place to look is in the dates without encoding=marc array. If no such dates, select the first date in the dates_marc_encoding array. Otherwise return nil
# File 'lib/stanford-mods/origin_info.rb', line 261
def pub_date_display
  return dates_no_marc_encoding.first unless dates_no_marc_encoding.empty?
  return dates_marc_encoding.first unless dates_marc_encoding.empty?
  nil
end
#pub_date_facet ⇒ String
Values for the pub date facet. This is less strict than the 4 year date requirements for pub_date.
Jan 2016: used to populate Solr pub_date field for Spotlight and SearchWorks
Spotlight: pub_date field should be replaced by pub_year_w_approx_isi and pub_year_no_approx_isi
SearchWorks: pub_date field used for display in search results and show view; for sorting nearby-on-shelf
These could be done with more appropriate fields/methods (pub_year_int for sorting; new pub year methods to populate field)
TODO: probably should deprecate this in favor of pub_year_display_str; need head-to-head testing with pub_year_display_str
# File 'lib/stanford-mods/origin_info.rb', line 238
def pub_date_facet
  return nil unless pub_date
  return "#{pub_date.to_i + 1000} B.C." if pub_date.start_with?('-')
  return pub_date unless pub_date.include? '--'
  "#{pub_date[0, 2].to_i + 1}th century"
end
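The three branches of the facet value are easier to follow with concrete inputs. The sketch below copies the logic into a hypothetical helper, `pub_date_facet_for`, that takes the pub_date string as an argument instead of reading it from the record. The '-995' input assumes the offset encoding of B.C. years described under pub_year_sort_str (5 B.C. stored as -995).

```ruby
# Hypothetical helper with the same branches as the listing above.
def pub_date_facet_for(pub_date)
  return nil unless pub_date
  return "#{pub_date.to_i + 1000} B.C." if pub_date.start_with?('-')
  return pub_date unless pub_date.include?('--')
  "#{pub_date[0, 2].to_i + 1}th century"
end

p pub_date_facet_for('1945') # "1945"
p pub_date_facet_for('19--') # "20th century"
p pub_date_facet_for('-995') # "5 B.C."
p pub_date_facet_for(nil)    # nil
```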
#pub_date_sort ⇒ Object
use pub_year_int, or pub_year_sort_str if you must have a string (why?)
Creates a date suitable for sorting. Guaranteed to be 4 digits or nil.
# File 'lib/stanford-mods/origin_info.rb', line 247
def pub_date_sort
  if pub_date
    pd = pub_date
    pd = '0' + pd if pd.length == 3
    pd = pd.gsub('--', '00')
  end
  fail "pub_date_sort was about to return a non 4 digit value #{pd}!" if pd && pd.length != 4
  pd
end
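The padding and the 4-character guard can be tried on plain strings. This is a sketch under the assumption that pub_date is a String or nil; `pub_date_sort_for` is a hypothetical helper that takes the value as an argument.

```ruby
# Hypothetical helper with the same steps as the listing above.
def pub_date_sort_for(pub_date)
  pd = nil
  if pub_date
    pd = pub_date
    pd = '0' + pd if pd.length == 3 # left-pad 3-digit years
    pd = pd.gsub('--', '00')        # century form: '19--' becomes '1900'
  end
  raise "pub_date_sort was about to return a non 4 digit value #{pd}!" if pd && pd.length != 4
  pd
end

p pub_date_sort_for('966')  # "0966"
p pub_date_sort_for('19--') # "1900"
p pub_date_sort_for(nil)    # nil

begin
  pub_date_sort_for('12345')
rescue RuntimeError => e
  puts "raised: #{e.message}"
end
```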
#pub_year_display_str(ignore_approximate = false) ⇒ Object
Return a single string intended for display of pub year.
0 < year < 1000: add A.D. suffix
year < 0: add B.C. suffix
195u => 195x
19uu => 19xx
'-5' => '5 B.C.'
'700 B.C.' => '700 B.C.'
'7th century' => '7th century'
date ranges?
Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any);
look for a keyDate and use it if there is one; otherwise pick earliest date.
# File 'lib/stanford-mods/origin_info.rb', line 47
def pub_year_display_str(ignore_approximate = false)
  single_pub_year(ignore_approximate, :year_display_str)
  # TODO: want range displayed when start and end points
  # TODO: also want best year in year_isi fields
  # get_main_title_date
  # https://github.com/sul-dlss/SearchWorks/blob/7d4d870a9d450fed8b081c38dc3dbd590f0b706e/app/helpers/results_document_helper.rb#L8-L46
  # "publication_year_isi" => "Publication date", <-- do it already
  # "beginning_year_isi" => "Beginning date",
  # "earliest_year_isi" => "Earliest date",
  # "earliest_poss_year_isi" => "Earliest possible date",
  # "ending_year_isi" => "Ending date",
  # "latest_year_isi" => "Latest date",
  # "latest_poss_year_isi" => "Latest possible date",
  # "production_year_isi" => "Production date",
  # "original_year_isi" => "Original date",
  # "copyright_year_isi" => "Copyright date"} %>
  # "creation_year_isi" => "Creation date", <-- do it already
  # {}"release_year_isi" => "Release date",
  # {}"reprint_year_isi" => "Reprint/reissue date",
  # {}"other_year_isi" => "Date",
end
#pub_year_int(ignore_approximate = false) ⇒ Integer
for sorting: 5 B.C. => -5; 666 B.C. => -666
Return pub year as an Integer. Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any);
look for a keyDate and use it if there is one; otherwise pick earliest date
# File 'lib/stanford-mods/origin_info.rb', line 19
def pub_year_int(ignore_approximate = false)
  single_pub_year(ignore_approximate, :year_int)
end
#pub_year_sort_str(ignore_approximate = false) ⇒ String
Deprecated: use pub_year_int
Note: for string sorting, 5 B.C. = -5 => -995; 6 B.C. => -994, so 6 B.C. sorts before 5 B.C.
Return a single string intended for lexical sorting by pub date. Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any); look for a keyDate and use it if there is one; otherwise pick the earliest date.
# File 'lib/stanford-mods/origin_info.rb', line 30
def pub_year_sort_str(ignore_approximate = false)
  single_pub_year(ignore_approximate, :year_sort_str)
end
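The B.C. mapping in the sorting note (-5 => -995, -6 => -994) can be reproduced with a few lines. The sketch below assumes a hypothetical helper `sortable_year_str` that takes an Integer year; it is not the gem's implementation, and it assumes B.C. years of at most three digits and A.D. years of at most four.

```ruby
# Hypothetical helper (not the gem's API): builds a lexically sortable string
# from an Integer year, following the note on pub_year_sort_str.
def sortable_year_str(year)
  # B.C. years: 5 B.C. (-5) -> '-995', 6 B.C. (-6) -> '-994'
  return (year.abs - 1000).to_s if year < 0

  # A.D. years: zero-pad to four digits so '0965' sorts before '1950'
  format('%04d', year)
end
```

Sorting the resulting strings lexically places 6 B.C. before 5 B.C., and both before any A.D. year, because '-' sorts before the digits.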
#series ⇒ String
should it be hierarchical series/box/folder?
The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get the data from either one.
# File 'lib/stanford-mods/physical_location.rb', line 54
def series
  mods_ng_xml._location.physicalLocation.each do |node|
    # feigenbaum uses 'Accession'
    match_data = node.text.match(/(?:(?:Series)|(?:Accession)):? ([^,|]+)/i)
    return match_data[1].strip if match_data.present?
  end
  nil
end
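To see what the pattern in #series captures, it can be applied to plain strings. This is a sketch only: `series_from` is a hypothetical helper, and the sample strings in the usage note are made up for illustration rather than taken from real records.

```ruby
# Hypothetical helper: applies the pattern used by #series to an array of
# plain strings instead of physicalLocation nodes.
def series_from(texts)
  texts.each do |text|
    # 'Series' or 'Accession', an optional colon, one space, then everything
    # up to the next comma or pipe
    match_data = text.match(/(?:(?:Series)|(?:Accession)):? ([^,|]+)/i)
    return match_data[1].strip if match_data
  end
  nil
end
```

With the made-up input `['Call Number: SC0340, Accession 2005-101, Box: 51, Folder: 3']` the helper returns '2005-101'; a string without either keyword yields nil. Only the first matching string is used, in line with the "at most ONE matching element" expectation stated in the overview.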
#subject_all_search ⇒ Array<String>
Values are the contents of:
all subject subelements except subject/cartographic plus genre top level element
# File 'lib/stanford-mods/searchworks_subjects.rb', line 159
def subject_all_search
  vals = topic_search ? Array.new(topic_search) : []
  vals.concat(geographic_search) if geographic_search
  vals.concat(subject_other_search) if subject_other_search
  vals.concat(subject_other_subvy_search) if subject_other_subvy_search
  vals.empty? ? nil : vals
end
#subject_other_search ⇒ Array<String>
Values are the contents of:
subject/name
subject/occupation - no subelements
subject/titleInfo
# File 'lib/stanford-mods/searchworks_subjects.rb', line 128
def subject_other_search
  @subject_other_search ||= begin
    vals = subject_occupations ? Array.new(subject_occupations) : []
    vals.concat(subject_names) if subject_names
    vals.concat(subject_titles) if subject_titles
    vals.empty? ? nil : vals
  end
end
#subject_other_subvy_search ⇒ Array<String>
Values are the contents of:
subject/temporal
subject/genre
# File 'lib/stanford-mods/searchworks_subjects.rb', line 141
def subject_other_subvy_search
  @subject_other_subvy_search ||= begin
    vals = subject_temporal ? Array.new(subject_temporal) : []
    gvals = term_values([:subject, :genre])
    vals.concat(gvals) if gvals

    # print a message for any temporal encodings
    subject.temporal.each { |n|
      sw_logger.info("#{druid} has subject temporal element with untranslated encoding: #{n.to_xml}") unless n.encoding.empty?
    }

    vals.empty? ? nil : vals
  end
end
#sw_addl_authors ⇒ Array<String>
Returns values for author_7xx_search field.
# File 'lib/stanford-mods/searchworks.rb', line 72
def sw_addl_authors
  additional_authors_w_dates
end
#sw_addl_titles ⇒ Array<String>
this includes all titles except those that match sw_short_title
# File 'lib/stanford-mods/searchworks.rb', line 199
def sw_addl_titles
  full_titles.select { |s| s !~ Regexp.new(Regexp.escape(sw_short_title)) }
end
#sw_corporate_authors ⇒ Array<String>
Returns values for author_corp_display.
# File 'lib/stanford-mods/searchworks.rb', line 88
def sw_corporate_authors
  mods_ng_xml.plain_name.select { |n| n.type_at == 'corporate' }.map { |n| n.display_value_w_date }
end
#sw_full_title ⇒ String
Returns value for title_245_search, title_full_display.
# File 'lib/stanford-mods/searchworks.rb', line 157
def sw_full_title
  return nil unless first_title_info_node
  preSubTitle = nonSort_title ? [nonSort_title, title].compact.join(" ") : title
  preSubTitle.sub!(/:$/, '') if preSubTitle # remove trailing colon

  subTitle = first_title_info_node.subTitle.text.strip
  preParts = subTitle.empty? ? preSubTitle : preSubTitle + " : " + subTitle
  preParts.sub!(/\.$/, '') if preParts # remove trailing period

  partName = first_title_info_node.partName.text.strip unless first_title_info_node.partName.text.strip.empty?
  partNumber = first_title_info_node.partNumber.text.strip unless first_title_info_node.partNumber.text.strip.empty?
  partNumber.sub!(/,$/, '') if partNumber # remove trailing comma
  if partNumber && partName
    parts = partNumber + ", " + partName
  elsif partNumber
    parts = partNumber
  elsif partName
    parts = partName
  end
  parts.sub!(/\.$/, '') if parts

  result = parts ? preParts + ". " + parts : preParts
  return nil unless result
  result += "." unless result =~ /[[:punct:]]$/
  result.strip!
  result = nil if result.empty?
  result
end
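The punctuation rules in sw_full_title are easier to follow with plain strings. Below is a simplified sketch, not the gem's code: `assemble_full_title` and its keyword arguments are hypothetical, blank strings are treated as missing, and the values in the usage note are invented.

```ruby
# Hypothetical helper: assembles a full title from plain strings, following
# the punctuation handling of sw_full_title. Blank strings count as missing.
def assemble_full_title(title:, non_sort: nil, sub_title: nil, part_number: nil, part_name: nil)
  clean = ->(s) { s.nil? || s.strip.empty? ? nil : s.strip }

  main = [clean.call(non_sort), clean.call(title)].compact.join(' ')
  main = main.sub(/:$/, '')                 # remove trailing colon
  sub = clean.call(sub_title)
  main = "#{main} : #{sub}" if sub          # title and subtitle joined by ' : '
  main = main.sub(/\.$/, '')                # remove trailing period

  number = clean.call(part_number)
  number = number.sub(/,$/, '') if number   # remove trailing comma
  parts = [number, clean.call(part_name)].compact.join(', ').sub(/\.$/, '')

  result = parts.empty? ? main : "#{main}. #{parts}"
  result = result.strip
  return nil if result.empty?

  result += '.' unless result =~ /[[:punct:]]$/ # end with punctuation
  result
end
```

Calling it with `non_sort: 'The', title: 'Olympics:', sub_title: 'a history', part_number: 'Part 1,', part_name: 'Ancient Games.'` produces 'The Olympics : a history. Part 1, Ancient Games.', while a title that already ends in punctuation, such as 'Why?', is returned unchanged.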
#sw_full_title_without_commas ⇒ Object
Deprecated in favor of sw_title_display.
Removes trailing commas.
# File 'lib/stanford-mods/searchworks.rb', line 214
def sw_full_title_without_commas
  result = sw_full_title
  result.sub!(/,$/, '') if result
  result
end
#sw_genre ⇒ Array[String]
Limit genre values to Government document, Conference proceedings, Technical report and Thesis/Dissertation (see github.com/sul-dlss/stanford-mods/issues/66).
# File 'lib/stanford-mods/searchworks.rb', line 348
def sw_genre
  genres = term_values(:genre)
  return [] unless genres
  types = term_values(:typeOfResource)
  val = []
  val << 'Thesis/Dissertation' if genres.include?('thesis') || genres.include?('Thesis')
  if genres && types && types.include?('text')
    conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
    gov_pub = ['government publication', 'Government publication', 'Government Publication']
    tech_rpt = ['technical report', 'Technical report', 'Technical Report']
    val << 'Conference proceedings' unless (genres & conf_pub).empty?
    val << 'Government document' unless (genres & gov_pub).empty?
    val << 'Technical report' unless (genres & tech_rpt).empty?
  end
  val.uniq
end
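The genre mapping can be exercised without a MODS record. The sketch below assumes a hypothetical helper `restricted_genres` that takes the genre and typeOfResource values as plain arrays; it generates the same three capitalization variants per genre that sw_genre lists by hand.

```ruby
# Hypothetical helper: maps raw genre and typeOfResource values to the
# restricted genre list, following the logic shown in sw_genre.
def restricted_genres(genres, types)
  return [] unless genres

  val = []
  val << 'Thesis/Dissertation' if genres.include?('thesis') || genres.include?('Thesis')
  if types && types.include?('text')
    mapping = {
      'conference publication' => 'Conference proceedings',
      'government publication' => 'Government document',
      'technical report' => 'Technical report'
    }
    mapping.each do |genre, label|
      # e.g. 'technical report', 'Technical report', 'Technical Report'
      variants = [genre, genre.capitalize, genre.split.map(&:capitalize).join(' ')]
      val << label unless (genres & variants).empty?
    end
  end
  val.uniq
end
```

Note that the thesis check applies to any resource type, whereas the other three values are only produced when typeOfResource includes 'text'.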
#sw_geographic_search(sep = ' ') ⇒ Array<String>
Values are the contents of:
subject/geographic
subject/hierarchicalGeographic
subject/geographicCode (only include the translated value if it isn't already present from other mods geo fields)
# File 'lib/stanford-mods/searchworks_subjects.rb', line 15
def sw_geographic_search(sep = ' ')
  result = term_values([:subject, :geographic]) || []

  # hierarchicalGeographic has sub elements
  mods_ng_xml.subject.hierarchicalGeographic.each { |hg_node|
    hg_vals = hg_node.element_children.map(&:text).reject(&:empty?)
    result << hg_vals.join(sep) unless hg_vals.empty?
  }

  trans_code_vals = mods_ng_xml.subject.geographicCode.translated_value || []
  trans_code_vals.each { |val|
    result << val unless result.include?(val)
  }

  result
end
#sw_impersonal_authors ⇒ Array<String>
return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’
# File 'lib/stanford-mods/searchworks.rb', line 83
def sw_impersonal_authors
  mods_ng_xml.plain_name.select { |n| n.type_at != 'personal' }.map { |n| n.display_value_w_date }
end
#sw_language_facet ⇒ Object
include languages known to SearchWorks; try to error correct when possible (e.g. when ISO-639 disagrees with MARC standard)
# File 'lib/stanford-mods/searchworks.rb', line 24
def sw_language_facet
  result = []
  mods_ng_xml.language.each { |n|
    # get languageTerm codes and add their translations to the result
    n.code_term.each { |ct|
      if ct.authority =~ /^iso639/
        vals = ct.text.split(/[,|\ ]/).reject { |x| x.strip.empty? }
        vals.each do |v|
          if ISO_639.find(v.strip)
            iso639_val = ISO_639.find(v.strip).english_name
            if SEARCHWORKS_LANGUAGES.has_value?(iso639_val)
              result << iso639_val
            else
              result << SEARCHWORKS_LANGUAGES[v.strip]
            end
          else
            logger.warn "Couldn't find english name for #{ct.text}"
          end
        end
      else
        vals = ct.text.split(/[,|\ ]/).reject { |x| x.strip.empty? }
        vals.each do |v|
          result << SEARCHWORKS_LANGUAGES[v.strip]
        end
      end
    }
    # add languageTerm text values
    n.text_term.each { |tt|
      val = tt.text.strip
      result << val if !val.empty? && SEARCHWORKS_LANGUAGES.has_value?(val)
    }

    # add language values that aren't in languageTerm subelement
    if n.languageTerm.empty?
      result << n.text if SEARCHWORKS_LANGUAGES.has_value?(n.text)
    end
  }
  result.uniq
end
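A languageTerm code value may hold several codes separated by commas, pipes, or spaces, which is why the method splits before looking anything up. The sketch below shows just that step. `language_names` is a hypothetical helper, its three-entry default table is an abbreviated stand-in for SEARCHWORKS_LANGUAGES, and, unlike the method above, it drops unknown codes instead of adding nil.

```ruby
# Hypothetical helper: splits a code string on commas, pipes and spaces and
# translates each code via a lookup table. The default table is an
# abbreviated stand-in for SEARCHWORKS_LANGUAGES (assumption for this sketch).
def language_names(code_text, table = { 'eng' => 'English', 'fre' => 'French', 'ger' => 'German' })
  code_text.split(/[,| ]/)
           .reject { |code| code.strip.empty? } # 'eng, fre' leaves an empty piece
           .map { |code| table[code.strip] }
           .compact                             # sketch only: skip unknown codes
           .uniq
end
```

For example, `language_names('eng, fre|ger')` returns the three names in order, and repeated codes are reported once.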
#sw_main_author ⇒ String
Returns value for author_1xx_search field.
# File 'lib/stanford-mods/searchworks.rb', line 67
def sw_main_author
  main_author_w_date
end
#sw_meeting_authors ⇒ Array<String>
Returns values for author_meeting_display.
# File 'lib/stanford-mods/searchworks.rb', line 93
def sw_meeting_authors
  mods_ng_xml.plain_name.select { |n| n.type_at == 'conference' }.map { |n| n.display_value_w_date }
end
#sw_person_authors ⇒ Array<String>
Returns values for author_person_facet, author_person_display.
# File 'lib/stanford-mods/searchworks.rb', line 77
def sw_person_authors
  personal_names_w_dates
end
#sw_short_title ⇒ String
Returns value for title_245a_search field.
# File 'lib/stanford-mods/searchworks.rb', line 128
def sw_short_title
  short_titles ? short_titles.compact.reject(&:empty?).first : nil
end
#sw_sort_author ⇒ String
Returns a sortable version of the main_author:
main_author + sorting title
which is the mods approximation of the value created for a marc record
# File 'lib/stanford-mods/searchworks.rb', line 101
def sw_sort_author
  # substitute java Character.MAX_CODE_POINT for nil main_author so missing main authors sort last
  val = '' + (main_author_w_date ? main_author_w_date : "\u{10FFFF} ") + (sort_title ? sort_title : '')
  val.gsub(/[[:punct:]]*/, '').strip
end
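The effect of substituting the maximum code point for a missing main author can be checked with plain strings. This is a sketch under stated assumptions: `sort_author_key` is a hypothetical helper that takes the author and sort title as arguments, and the sample values are invented.

```ruby
# Hypothetical helper: builds a sort key from an optional author string and
# an optional sort title, using U+10FFFF when the author is missing.
def sort_author_key(main_author, sort_title)
  val = (main_author || "\u{10FFFF} ") + (sort_title || '')
  val.gsub(/[[:punct:]]*/, '').strip # drop punctuation, then outer whitespace
end

keys = [
  sort_author_key(nil, 'Aardvark'),           # no main author
  sort_author_key('Zola, Emile', 'Germinal')  # author present
].sort
```

After sorting, the key built without an author comes last even though its title starts with 'A', because U+10FFFF compares higher than any letter.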
#sw_sort_title ⇒ String
Returns a sortable version of the main title
# File 'lib/stanford-mods/searchworks.rb', line 205
def sw_sort_title
  val = '' + (sw_full_title ? sw_full_title : '')
  val.sub!(Regexp.new("^" + Regexp.escape(nonSort_title)), '') if nonSort_title
  val.gsub!(/[[:punct:]]*/, '').strip
  val.squeeze(" ").strip
end
#sw_subject_names(sep = ', ') ⇒ Array<String>
Values are the contents of:
subject/name/namePart
"Values from namePart subelements should be concatenated in the order they appear (e.g. "Shakespeare, William, 1564-1616")"
# File 'lib/stanford-mods/searchworks_subjects.rb', line 36
def sw_subject_names(sep = ', ')
  mods_ng_xml.subject.name_el
             .select { |n_el| n_el.namePart }
             .map { |name_el_w_np| name_el_w_np.namePart.map(&:text).reject(&:empty?) }
             .reject(&:empty?)
             .map { |parts| parts.join(sep).strip }
end
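The concatenation of namePart values can be shown with nested arrays standing in for the XML. In this sketch `joined_name_parts` is a hypothetical helper and each inner array represents the namePart texts of one subject/name element.

```ruby
# Hypothetical helper: joins the namePart texts of each name in document
# order, skipping empty parts and names that have no usable parts.
def joined_name_parts(names, sep = ', ')
  names.map { |parts| parts.reject(&:empty?) }
       .reject(&:empty?)
       .map { |parts| parts.join(sep).strip }
end
```

Passing `[['Shakespeare, William', '1564-1616'], ['', '']]` yields the single value 'Shakespeare, William, 1564-1616'; the second name is dropped because all of its parts are empty.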
#sw_subject_titles(sep = ' ') ⇒ Array<String>
Values are the contents of:
subject/titleInfo/(subelements)
# File 'lib/stanford-mods/searchworks_subjects.rb', line 48
def sw_subject_titles(sep = ' ')
  result = []
  mods_ng_xml.subject.titleInfo.each { |ti_el|
    parts = ti_el.element_children.map(&:text).reject(&:empty?)
    result << parts.join(sep).strip unless parts.empty?
  }
  result
end
#sw_title_display ⇒ String
Like sw_full_title, but without trailing comma, slash, semicolon, colon, or period characters. Spec from solrmarc-sw sw_index.properties:
title_display = custom, removeTrailingPunct(245abdefghijklmnopqrstuvwxyz, [\\\\,/;:], ([A-Za-z]{4}|[0-9]{3}|\\)|\\,))
# File 'lib/stanford-mods/searchworks.rb', line 191
def sw_title_display
  result = sw_full_title
  return nil unless result
  result.sub(/[\.,;:\/\\]+$/, '').strip
end
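The trailing-punctuation rule can be tried on plain strings. A minimal sketch, assuming a hypothetical helper `strip_trailing_title_punct` that takes the full title as an argument instead of calling sw_full_title:

```ruby
# Hypothetical helper: removes a trailing run of period, comma, semicolon,
# colon, slash or backslash characters, using the same pattern as
# sw_title_display.
def strip_trailing_title_punct(full_title)
  return nil unless full_title

  full_title.sub(/[\.,;:\/\\]+$/, '').strip
end
```

Other closing punctuation, such as a question mark, is left in place because it is not in the character class.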
#title ⇒ String
Returns the text of the titleInfo node as a string (if non-empty, else nil).
# File 'lib/stanford-mods/searchworks.rb', line 150
def title
  return unless first_title_info_node && first_title_info_node.title
  first_title_info_node.title.text.strip.empty? ? nil : first_title_info_node.title.text.strip
end
#topic_facet ⇒ Array<String>
Values are the contents of:
subject/topic
subject/name
subject/title
subject/occupation
with trailing comma, semicolon, and backslash (and any preceding spaces) removed
# File 'lib/stanford-mods/searchworks_subjects.rb', line 76
def topic_facet
  vals = subject_topics ? Array.new(subject_topics) : []
  vals.concat(subject_names) if subject_names
  vals.concat(subject_titles) if subject_titles
  vals.concat(subject_occupations) if subject_occupations
  vals.map! { |val| val.sub(/[\\,;]$/, '').strip }
  vals.empty? ? nil : vals
end
#topic_search ⇒ Array<String>
Values are the contents of:
mods/genre
mods/subject/topic
# File 'lib/stanford-mods/searchworks_subjects.rb', line 61
def topic_search
  @topic_search ||= begin
    vals = term_values(:genre) || []
    vals.concat(subject_topics) if subject_topics
    vals.empty? ? nil : vals
  end
end
#year_display_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one;
otherwise pick earliest parseable date
# File 'lib/stanford-mods/origin_info.rb', line 82
def year_display_str(date_el_array)
  result = date_parsing_result(date_el_array, :date_str_for_display)
  return result if result
  _ignore, orig_str_to_parse = self.class.earliest_year_str(date_el_array)
  DateParsing.date_str_for_display(orig_str_to_parse) if orig_str_to_parse
end
#year_int(date_el_array) ⇒ Integer
given the passed date elements, look for a single keyDate and use it if there is one;
otherwise pick earliest parseable date
# File 'lib/stanford-mods/origin_info.rb', line 93
def year_int(date_el_array)
  result = date_parsing_result(date_el_array, :year_int_from_date_str)
  return result if result
  year_int, _ignore = self.class.earliest_year_int(date_el_array)
  year_int if year_int
end
#year_sort_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one;
otherwise pick earliest parseable date
# File 'lib/stanford-mods/origin_info.rb', line 104
def year_sort_str(date_el_array)
  result = date_parsing_result(date_el_array, :sortable_year_string_from_date_str)
  return result if result
  sortable_str, _ignore = self.class.earliest_year_str(date_el_array)
  sortable_str if sortable_str
end