Class: Stanford::Mods::Record
- Inherits: Mods::Record (Object → Mods::Record → Stanford::Mods::Record)
- Defined in:
  lib/stanford-mods/geo_spatial.rb,
  lib/stanford-mods.rb,
  lib/stanford-mods/name.rb,
  lib/stanford-mods/origin_info.rb,
  lib/stanford-mods/searchworks.rb,
  lib/stanford-mods/physical_location.rb,
  lib/stanford-mods/searchworks_subjects.rb
Overview
NON-SearchWorks specific wranglings of MODS cartographics metadata
Constant Summary
- COLLECTOR_ROLE_URI =
'http://id.loc.gov/vocabulary/relators/col'
Class Method Summary
-
.date_is_approximate?(date_element) ⇒ Boolean
NOTE: legal values for MODS date elements with attribute qualifier are ‘approximate’, ‘inferred’ or ‘questionable’.
-
.earliest_year_int(date_el_array) ⇒ Object
get earliest parseable year (as an Integer) from the passed date elements.
-
.earliest_year_str(date_el_array) ⇒ Object
get earliest parseable year (as a String) from the passed date elements.
-
.keyDate(elements) ⇒ Nokogiri::XML::Element?
given a set of date elements, return the single element with attribute keyDate=“yes” or return nil if no elements have attribute keyDate=“yes”, or if multiple elements have keyDate=“yes”.
-
.remove_approximate(nodeset) ⇒ Array<Nokogiri::XML::Element>
remove Elements from NodeSet if they have a qualifier attribute of ‘approximate’ or ‘questionable’.
Instance Method Summary
-
#additional_authors_w_dates ⇒ Object
All names, in display form, except the main_author. Names will be in the display_value_w_date form; see Mods::Record.name in nom_terminology for details on the display_value algorithm.
-
#box ⇒ Object
Return the box number (single valued; might be something like 35A). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#catkey ⇒ String
Value with the numeric catkey in it, or nil if none exists.
-
#collectors_w_dates ⇒ Object
Array of Strings, each containing the computed display value of a personal name with the role of Collector (see mods gem nom_terminology for display value algorithm).
- #coordinates ⇒ Object
- #coordinates_as_bbox ⇒ Object (also: #point_bbox)
- #coordinates_as_envelope ⇒ Object
-
#date_created_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateCreated elements in MODS records.
-
#date_issued_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateIssued elements in MODS records.
- #druid ⇒ Object
- #druid=(new_druid) ⇒ Object
-
#era_facet ⇒ Array<String>
subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#folder ⇒ Object
Return the folder number (single valued). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#format ⇒ Array[String]
Deprecated. Kept for backwards compatibility but not part of SW UI redesign work Summer 2014.
-
#format_main ⇒ Array[String]
Select one or more format values from the controlled vocabulary per JVine Summer 2014: searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index. Per github.com/sul-dlss/stanford-mods/issues/66, for geodata the resource type should be only Map and not include Software, multimedia.
-
#geographic_facet ⇒ Array<String>
geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#geographic_search ⇒ Array<String>
Values are the contents of subject/geographic, subject/hierarchicalGeographic, and subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
-
#includes_marc_relator_collector_role?(role_node) ⇒ Boolean
True if there is a MARC relator collector role assigned.
-
#location ⇒ Object
Return the entire contents of physicalLocation (single valued), but only if it has series, accession, box or folder data. The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#main_author_w_date ⇒ String
the first encountered <mods><name> element with marcrelator flavor role of ‘Creator’ or ‘Author’.
- #main_author_w_date_test ⇒ Object
-
#non_collector_person_authors ⇒ Object
FIXME: this is broken if there are multiple role codes and some of them are not marcrelator.
-
#place ⇒ Object
Old date parsing method used downstream of the gem; will be deprecated/replaced with new date parsing methods.
-
#pub_date_display ⇒ String
For the date display only, the first place to look is in the dates without encoding=marc array.
-
#pub_date_facet ⇒ Array[String]
Values for the pub date facet.
-
#pub_date_sort ⇒ Object
creates a date suitable for sorting.
-
#pub_year_display_str(ignore_approximate = false) ⇒ Object
Return a single string intended for display of pub year; for 0 < year < 1000, add an A.D. suffix.
-
#pub_year_int(ignore_approximate = false) ⇒ Integer
Return pub year as an Integer. Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any); look for a keyDate and use it if there is one; otherwise pick the earliest date.
-
#pub_year_sort_str(ignore_approximate = false) ⇒ String
Deprecated. Use pub_year_int.
-
#series ⇒ Object
Return the series/accession ‘number’ (single valued). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#subject_all_search ⇒ Array<String>
Values are the contents of: all subject subelements except subject/cartographic plus genre top level element.
-
#subject_other_search ⇒ Array<String>
Values are the contents of subject/name, subject/occupation (no subelements), and subject/titleInfo.
-
#subject_other_subvy_search ⇒ Array<String>
Values are the contents of subject/temporal and subject/genre.
-
#sw_addl_authors ⇒ Array<String>
Values for author_7xx_search field.
-
#sw_addl_titles ⇒ Array<String>
this includes all titles except.
-
#sw_corporate_authors ⇒ Array<String>
Values for author_corp_display.
-
#sw_full_title ⇒ String
Value for title_245_search, title_full_display.
-
#sw_full_title_without_commas ⇒ Object
Deprecated in favor of sw_title_display.
-
#sw_genre ⇒ Array[String]
Return values for the genre facet in SearchWorks. Per github.com/sul-dlss/stanford-mods/issues/66, limit genre values to Government document, Conference proceedings, Technical report and Thesis/Dissertation.
-
#sw_geographic_search(sep = ' ') ⇒ Array<String>
Values are the contents of subject/geographic, subject/hierarchicalGeographic, and subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
-
#sw_impersonal_authors ⇒ Array<String>
return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’.
-
#sw_language_facet ⇒ Object
Include languages known to SearchWorks; try to error correct when possible (e.g. when ISO-639 disagrees with MARC standard).
-
#sw_logger ⇒ Object
—- PUBLICATION (place, year) —- see origin_info.rb (as all this information comes from top level originInfo element) —- end PUBLICATION (place, year) —-.
-
#sw_main_author ⇒ String
Value for author_1xx_search field.
-
#sw_meeting_authors ⇒ Array<String>
Values for author_meeting_display.
-
#sw_person_authors ⇒ Array<String>
Values for author_person_facet, author_person_display.
-
#sw_short_title ⇒ String
Value for title_245a_search field.
-
#sw_sort_author ⇒ String
Returns a sortable version of the main_author: main_author + sorting title which is the mods approximation of the value created for a marc record.
-
#sw_sort_title ⇒ String
Returns a sortable version of the main title.
-
#sw_subject_names(sep = ', ') ⇒ Array<String>
Values are the contents of subject/name/namePart. “Values from namePart subelements should be concatenated in the order they appear (e.g. ‘Shakespeare, William, 1564-1616’)”.
-
#sw_subject_titles(sep = ' ') ⇒ Array<String>
Values are the contents of: subject/titleInfo/(subelements).
-
#sw_title_display ⇒ String
like sw_full_title without trailing ,/;:.
-
#topic_facet ⇒ Array<String>
Values are the contents of subject/topic, subject/name, subject/title, and subject/occupation, with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#topic_search ⇒ Array<String>
Values are the contents of mods/genre and mods/subject/topic.
-
#year_display_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
-
#year_int(date_el_array) ⇒ Integer
given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
-
#year_sort_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one; otherwise pick earliest parseable date.
Class Method Details
.date_is_approximate?(date_element) ⇒ Boolean
NOTE: legal values for MODS date elements with attribute qualifier are
'approximate', 'inferred' or 'questionable'; only 'approximate' and 'questionable' are treated as approximate here.

    # File 'lib/stanford-mods/origin_info.rb', line 151
    def self.date_is_approximate?(date_element)
      qualifier = date_element["qualifier"] if date_element.respond_to?('[]')
      qualifier == 'approximate' || qualifier == 'questionable'
    end

.earliest_year_int(date_el_array) ⇒ Object
get earliest parseable year (as an Integer) from the passed date elements

    # File 'lib/stanford-mods/origin_info.rb', line 161
    def self.earliest_year_int(date_el_array)
      earliest_year(date_el_array, :year_int_from_date_str)
    end

.earliest_year_str(date_el_array) ⇒ Object
get earliest parseable year (as a String) from the passed date elements

    # File 'lib/stanford-mods/origin_info.rb', line 170
    def self.earliest_year_str(date_el_array)
      earliest_year(date_el_array, :sortable_year_string_from_date_str)
    end

.keyDate(elements) ⇒ Nokogiri::XML::Element?
Given a set of date elements, return the single element with attribute keyDate="yes",
or return nil if no elements have attribute keyDate="yes", or if multiple elements have keyDate="yes".

    # File 'lib/stanford-mods/origin_info.rb', line 133
    def self.keyDate(elements)
      keyDates = elements.select { |node| node["keyDate"] == 'yes' }
      keyDates.first if keyDates.size == 1
    end

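The single-keyDate rule can be tried without a MODS record. The sketch below is illustrative only: it uses plain hashes as stand-ins for Nokogiri elements (both answer `[]` with an attribute name), and the helper name `key_date_sketch` and the sample data are assumptions, not part of the gem.

```ruby
# Hypothetical stand-in for Stanford::Mods::Record.keyDate.
# Hashes are used in place of Nokogiri::XML::Element objects.
def key_date_sketch(elements)
  key_dates = elements.select { |node| node["keyDate"] == 'yes' }
  # Only an unambiguous (exactly one) keyDate is returned.
  key_dates.first if key_dates.size == 1
end

one_key = [{ "keyDate" => "yes", "text" => "1905" }, { "text" => "1910" }]
two_keys = [{ "keyDate" => "yes", "text" => "1905" },
            { "keyDate" => "yes", "text" => "1910" }]

p key_date_sketch(one_key)   # the element whose keyDate is "yes"
p key_date_sketch(two_keys)  # nil, because the keyDate is ambiguous
p key_date_sketch([])        # nil, because there is no keyDate
```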
.remove_approximate(nodeset) ⇒ Array<Nokogiri::XML::Element>
Remove Elements from NodeSet if they have a qualifier attribute of ‘approximate’ or ‘questionable’.

    # File 'lib/stanford-mods/origin_info.rb', line 142
    def self.remove_approximate(nodeset)
      nodeset.select { |node| node unless date_is_approximate?(node) }
    end

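A minimal sketch of how the approximate-date filter behaves, again with hashes standing in for date elements. The helper names and the sample dates are hypothetical. Note that 'inferred' is a legal qualifier but is not filtered out.

```ruby
# Hypothetical stand-ins for date_is_approximate? and remove_approximate.
def approximate_sketch?(date_element)
  qualifier = date_element["qualifier"] if date_element.respond_to?('[]')
  qualifier == 'approximate' || qualifier == 'questionable'
end

def remove_approximate_sketch(elements)
  elements.reject { |node| approximate_sketch?(node) }
end

dates = [
  { "qualifier" => "approximate", "text" => "1850" },
  { "qualifier" => "questionable", "text" => "1849" },
  { "qualifier" => "inferred", "text" => "1851" },
  { "text" => "1852" }
]

kept = remove_approximate_sketch(dates).map { |d| d["text"] }
p kept
```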
Instance Method Details
#additional_authors_w_dates ⇒ Object
all names, in display form, except the main_author
names will be the display_value_w_date form
see Mods::Record.name in nom_terminology for details on the display_value algorithm

    # File 'lib/stanford-mods/name.rb', line 39
    def additional_authors_w_dates
      results = []
      @mods_ng_xml.plain_name.each { |n|
        results << n.display_value_w_date
      }
      results.delete(main_author_w_date)
      results
    end

#box ⇒ Object
return box number (note: single valued and might be something like 35A)
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?

    # File 'lib/stanford-mods/physical_location.rb', line 13
    def box
      # _location.physicalLocation should find top level and relatedItem
      box_num = @mods_ng_xml._location.physicalLocation.map do |node|
        val = node.text
        # note that this will also find Flatbox or Flat-box
        match_data = val.match(/Box ?:? ?([^,|(Folder)]+)/i)
        match_data[1].strip if match_data.present?
      end.compact

      # There should only be one box
      box_num.first
    end

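To see what the box pattern captures, the sketch below applies the same regular expression to plain strings. The helper name `box_from` and the sample physicalLocation strings are assumptions for illustration; plain Ruby truthiness replaces `present?`, which comes from ActiveSupport.

```ruby
# Same pattern as in #box; the character class stops the capture at a comma,
# pipe, parenthesis, or any of the letters f, o, l, d, e, r (case-insensitive).
BOX_PATTERN = /Box ?:? ?([^,|(Folder)]+)/i

# Hypothetical helper working on a plain string instead of a MODS node.
def box_from(text)
  match_data = text.match(BOX_PATTERN)
  match_data[1].strip if match_data
end

p box_from("Series 1, Box 35A, Folder 2")
p box_from("Flatbox 7")
p box_from("Stanford University Libraries")
```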
#catkey ⇒ String
Returns value with the numeric catkey in it, or nil if none exists.

    # File 'lib/stanford-mods/searchworks.rb', line 374
    def catkey
      catkey = self.term_values([:record_info, :recordIdentifier])
      if catkey && catkey.length > 0
        return catkey.first.tr('a', '') # ensure catkey is numeric only
      end
      nil
    end

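The catkey clean-up is a single `String#tr` call that deletes every letter "a" from the first recordIdentifier value. A small sketch, with a hypothetical helper name and a made-up identifier:

```ruby
# Hypothetical helper mirroring the logic of #catkey on a plain array.
def numeric_catkey(record_identifiers)
  return nil if record_identifiers.nil? || record_identifiers.empty?
  record_identifiers.first.tr('a', '') # drop the "a" prefix
end

p numeric_catkey(["a6780453"])
p numeric_catkey([])
```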
#collectors_w_dates ⇒ Object
Returns Array of Strings, each containing the computed display value of a personal name with the role of Collector (see mods gem nom_terminology for display value algorithm).

    # File 'lib/stanford-mods/name.rb', line 64
    def collectors_w_dates
      result = []
      @mods_ng_xml.personal_name.each do |n|
        next if n.role.size.zero?
        n.role.each { |r|
          result << n.display_value_w_date if includes_marc_relator_collector_role?(r)
        }
      end
      result unless result.empty?
    end

#coordinates ⇒ Object

    # File 'lib/stanford-mods/geo_spatial.rb', line 9
    def coordinates
      Array(@mods_ng_xml.subject.cartographics.coordinates).map(&:text)
    end

#coordinates_as_bbox ⇒ Object Also known as: point_bbox

    # File 'lib/stanford-mods/geo_spatial.rb', line 21
    def coordinates_as_bbox
      coordinates.map do |n|
        c = Stanford::Mods::Coordinate.new(n)

        c.as_bbox if c.valid?
      end.compact
    end

#coordinates_as_envelope ⇒ Object

    # File 'lib/stanford-mods/geo_spatial.rb', line 13
    def coordinates_as_envelope
      coordinates.map do |n|
        c = Stanford::Mods::Coordinate.new(n)

        c.as_envelope if c.valid?
      end.compact
    end

#date_created_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateCreated elements in MODS records

    # File 'lib/stanford-mods/origin_info.rb', line 113
    def date_created_elements(ignore_approximate=false)
      date_created_nodeset = @mods_ng_xml.origin_info.dateCreated
      return self.class.remove_approximate(date_created_nodeset) if ignore_approximate
      date_created_nodeset.to_a
    end

#date_issued_elements(ignore_approximate = false) ⇒ Array<Nokogiri::XML::Element>
return /originInfo/dateIssued elements in MODS records

    # File 'lib/stanford-mods/origin_info.rb', line 123
    def date_issued_elements(ignore_approximate=false)
      date_issued_nodeset = @mods_ng_xml.origin_info.dateIssued
      return self.class.remove_approximate(date_issued_nodeset) if ignore_approximate
      date_issued_nodeset.to_a
    end

#druid ⇒ Object

    # File 'lib/stanford-mods/searchworks.rb', line 386
    def druid
      @druid ? @druid : 'Unknown item'
    end

#druid=(new_druid) ⇒ Object

    # File 'lib/stanford-mods/searchworks.rb', line 382
    def druid=(new_druid)
      @druid = new_druid
    end

#era_facet ⇒ Array<String>
subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed

    # File 'lib/stanford-mods/searchworks_subjects.rb', line 104
    def era_facet
      subject_temporal.map { |val| val.sub(/[\\,;]$/, '').strip } unless !subject_temporal
    end

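The same trailing-punctuation clean-up is used by era_facet and geographic_facet. A minimal sketch on plain strings follows; the helper name and the sample values are assumptions.

```ruby
# Removes one trailing backslash, comma, or semicolon, then trims whitespace.
def strip_trailing_punct(values)
  values.map { |val| val.sub(/[\\,;]$/, '').strip }
end

# "20th century\\" is the Ruby literal for a string ending in one backslash.
cleaned = strip_trailing_punct(["1800-1899;", "Middle Ages ,", "20th century\\"])
p cleaned
```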
#folder ⇒ Object
returns folder number (note: single valued)
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?

    # File 'lib/stanford-mods/physical_location.rb', line 30
    def folder
      # _location.physicalLocation should find top level and relatedItem
      folder_num = @mods_ng_xml._location.physicalLocation.map do |node|
        val = node.text

        match_data = if val =~ /\|/
                       # we assume the data is pipe-delimited, and may contain commas within values
                       val.match(/Folder ?:? ?([^|]+)/)
                     else
                       # the data should be comma-delimited, and may not contain commas within values
                       val.match(/Folder ?:? ?([^,]+)/)
                     end

        match_data[1].strip if match_data.present?
      end.compact

      # There should be one folder
      folder_num.first
    end

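The folder extraction switches delimiters depending on whether the text contains a pipe. The sketch below shows both branches on plain strings; the helper name `folder_from` and the sample strings are hypothetical.

```ruby
# Hypothetical helper mirroring the branch in #folder on a plain string.
def folder_from(text)
  pattern = if text.include?('|')
              /Folder ?:? ?([^|]+)/ # pipe-delimited: commas allowed in values
            else
              /Folder ?:? ?([^,]+)/ # comma-delimited: value ends at a comma
            end
  match_data = text.match(pattern)
  match_data[1].strip if match_data
end

p folder_from("Box 3, Folder 12, Series 1")
p folder_from("Series: Papers | Box: 3 | Folder: Letters, 1920-1925")
```

Unlike the box pattern, these patterns are case-sensitive, so a lowercase "folder" would not match.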
#format ⇒ Array[String]
Deprecated. This is no longer used in SW, Revs or Spotlight (Jan 2016). Kept for backwards compatibility but not part of SW UI redesign work Summer 2014.
Select one or more format values from the controlled vocabulary here:
http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format&rows=0&facet.sort=index

    # File 'lib/stanford-mods/searchworks.rb', line 227
    def format
      val = []
      types = self.term_values(:typeOfResource)
      if types
        genres = self.term_values(:genre)
        issuance = self.term_values([:origin_info,:issuance])
        types.each do |type|
          case type
            when 'cartographic'
              val << 'Map/Globe'
            when 'mixed material'
              val << 'Manuscript/Archive'
            when 'moving image'
              val << 'Video'
            when 'notated music'
              val << 'Music - Score'
            when 'software, multimedia'
              val << 'Computer File'
            when 'sound recording-musical'
              val << 'Music - Recording'
            when 'sound recording-nonmusical', 'sound recording'
              val << 'Sound Recording'
            when 'still image'
              val << 'Image'
            when 'text'
              val << 'Book' if issuance && issuance.include?('monographic')
              book_genres = ['book chapter', 'Book chapter', 'Book Chapter',
                'issue brief', 'Issue brief', 'Issue Brief',
                'librettos', 'Librettos',
                'project report', 'Project report', 'Project Report',
                'technical report', 'Technical report', 'Technical Report',
                'working paper', 'Working paper', 'Working Paper']
              val << 'Book' if genres && !(genres & book_genres).empty?
              conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
              val << 'Conference Proceedings' if genres && !(genres & conf_pub).empty?
              val << 'Journal/Periodical' if issuance && issuance.include?('continuing')
              article = ['article', 'Article']
              val << 'Journal/Periodical' if genres && !(genres & article).empty?
              stu_proj_rpt = ['student project report', 'Student project report', 'Student Project report', 'Student Project Report']
              val << 'Other' if genres && !(genres & stu_proj_rpt).empty?
              thesis = ['thesis', 'Thesis']
              val << 'Thesis' if genres && !(genres & thesis).empty?
            when 'three dimensional object'
              val << 'Other'
          end
        end
      end
      val.uniq
    end

#format_main ⇒ Array[String]
Select one or more format values from the controlled vocabulary per JVine Summer 2014:
http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index
Per github.com/sul-dlss/stanford-mods/issues/66, for geodata the resource type should be only Map and not include Software, multimedia.

    # File 'lib/stanford-mods/searchworks.rb', line 282
    def format_main
      val = []
      types = self.term_values(:typeOfResource)
      article_genres = ['article', 'Article',
        'book chapter', 'Book chapter', 'Book Chapter',
        'issue brief', 'Issue brief', 'Issue Brief',
        'project report', 'Project report', 'Project Report',
        'student project report', 'Student project report', 'Student Project report', 'Student Project Report',
        'technical report', 'Technical report', 'Technical Report',
        'working paper', 'Working paper', 'Working Paper'
      ]
      book_genres = ['conference publication', 'Conference publication', 'Conference Publication',
        'instruction', 'Instruction',
        'librettos', 'Librettos',
        'thesis', 'Thesis'
      ]
      if types
        genres = self.term_values(:genre)
        issuance = self.term_values([:origin_info, :issuance])
        types.each do |type|
          case type
            when 'cartographic'
              val << 'Map'
              val.delete 'Software/Multimedia'
            when 'mixed material'
              val << 'Archive/Manuscript'
            when 'moving image'
              val << 'Video'
            when 'notated music'
              val << 'Music score'
            when 'software, multimedia'
              if genres && (genres.include?('dataset') || genres.include?('Dataset'))
                val << 'Dataset'
              elsif (!val.include?('Map'))
                val << 'Software/Multimedia'
              end
            when 'sound recording-musical'
              val << 'Music recording'
            when 'sound recording-nonmusical', 'sound recording'
              val << 'Sound recording'
            when 'still image'
              val << 'Image'
            when 'text'
              val << 'Book' if genres && !(genres & article_genres).empty?
              val << 'Book' if issuance && issuance.include?('monographic')
              val << 'Book' if genres && !(genres & book_genres).empty?
              val << 'Journal/Periodical' if issuance && issuance.include?('continuing')
              val << 'Archived website' if genres && genres.include?('archived website')
            when 'three dimensional object'
              val << 'Object'
          end
        end
      end
      val.uniq
    end

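The geodata rule (Map wins over Software/Multimedia, regardless of the order of typeOfResource values) is easier to see in a reduced form. The sketch below keeps only three of the branches and takes plain arrays instead of a MODS record; the function name and its parameters are assumptions, not the gem's API.

```ruby
# Reduced, hypothetical version of #format_main covering only the
# cartographic, software and text branches.
def format_main_sketch(types, genres = [], issuance = [])
  val = []
  types.each do |type|
    case type
    when 'cartographic'
      val << 'Map'
      val.delete('Software/Multimedia') # drop it if it was added earlier
    when 'software, multimedia'
      if genres.include?('dataset') || genres.include?('Dataset')
        val << 'Dataset'
      elsif !val.include?('Map')        # skip it if Map is already present
        val << 'Software/Multimedia'
      end
    when 'text'
      val << 'Book' if issuance.include?('monographic')
      val << 'Journal/Periodical' if issuance.include?('continuing')
    end
  end
  val.uniq
end

p format_main_sketch(['software, multimedia', 'cartographic'])
p format_main_sketch(['cartographic', 'software, multimedia'])
p format_main_sketch(['software, multimedia'], ['Dataset'])
```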
#geographic_facet ⇒ Array<String>
geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed

    # File 'lib/stanford-mods/searchworks_subjects.rb', line 98
    def geographic_facet
      geographic_search.map { |val| val.sub(/[\\,;]$/, '').strip } unless !geographic_search
    end

#geographic_search ⇒ Array<String>
Values are the contents of:
subject/geographic
subject/hierarchicalGeographic
subject/geographicCode (only include the translated value if it isn't already present from other mods geo fields)

    # File 'lib/stanford-mods/searchworks_subjects.rb', line 113
    def geographic_search
      @geographic_search ||= begin
        result = self.sw_geographic_search

        # TODO: this should go into stanford-mods ... but then we have to set that gem up with a Logger
        # print a message for any unrecognized encodings
        xvals = self.subject.geographicCode.translated_value
        codes = self.term_values([:subject, :geographicCode])
        if codes && codes.size > xvals.size
          self.subject.geographicCode.each { |n|
            if n.authority != 'marcgac' && n.authority != 'marccountry'
              sw_logger.info("#{druid} has subject geographicCode element with untranslated encoding (#{n.authority}): #{n.to_xml}")
            end
          }
        end

        # FIXME: stanford-mods should be returning [], not nil ...
        return nil if !result || result.empty?
        result
      end
    end

#includes_marc_relator_collector_role?(role_node) ⇒ Boolean
Returns true if there is a MARC relator collector role assigned.

    # File 'lib/stanford-mods/name.rb', line 79
    def includes_marc_relator_collector_role?(role_node)
      (role_node.authority.include?('marcrelator') && role_node.value.include?('Collector')) ||
        role_node.roleTerm.valueURI.first == COLLECTOR_ROLE_URI
    end

#location ⇒ Object
return entire contents of physicalLocation (note: single valued)
but only if it has series, accession, box or folder data
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?

    # File 'lib/stanford-mods/physical_location.rb', line 55
    def location
      # _location.physicalLocation should find top level and relatedItem
      loc = @mods_ng_xml._location.physicalLocation.map do |node|
        node.text if node.text.match(/.*(Series)|(Accession)|(Folder)|(Box).*/i)
      end.compact

      # There should only be one location
      loc.first
    end

#main_author_w_date ⇒ String
The first encountered <mods><name> element with marcrelator flavor role of ‘Creator’ or ‘Author’. If there is no marcrelator ‘Creator’ or ‘Author’, the first name without a role. If there is no name without a role, then nil. See Mods::Record.name in nom_terminology for details on the display_value algorithm.

    # File 'lib/stanford-mods/name.rb', line 16
    def main_author_w_date
      result = nil
      first_wo_role = nil
      @mods_ng_xml.plain_name.each { |n|
        if n.role.size == 0
          first_wo_role ||= n
        end
        n.role.each { |r|
          if r.authority.include?('marcrelator') &&
             (r.value.include?('Creator') || r.value.include?('Author'))
            result ||= n.display_value_w_date
          end
        }
      }
      if !result && first_wo_role
        result = first_wo_role.display_value_w_date
      end
      result
    end

#main_author_w_date_test ⇒ Object

    # File 'lib/stanford-mods/searchworks.rb', line 100
    def main_author_w_date_test
      result = nil
      first_wo_role = nil
      self.plain_name.each { |n|
        if n.role.size == 0
          first_wo_role ||= n
        end
        n.role.each { |r|
          if r.authority.include?('marcrelator') &&
             (r.value.include?('Creator') || r.value.include?('Author'))
            result ||= n.display_value_w_date
          end
        }
      }
      if !result && first_wo_role
        result = first_wo_role.display_value_w_date
      end
      result
    end

#non_collector_person_authors ⇒ Object
FIXME: this is broken if there are multiple role codes and some of them are not marcrelator

    # File 'lib/stanford-mods/name.rb', line 51
    def non_collector_person_authors
      result = []
      @mods_ng_xml.personal_name.map do |n|
        next if n.role.size.zero?
        n.role.each { |r|
          result << n.display_value_w_date unless includes_marc_relator_collector_role?(r)
        }
      end
      result unless result.empty?
    end

#place ⇒ Object
Old date parsing method used downstream of the gem; will be deprecated/replaced with new date parsing methods.

    # File 'lib/stanford-mods/origin_info.rb', line 226
    def place
      vals = self.term_values([:origin_info, :place, :placeTerm])
      vals
    end

#pub_date_display ⇒ String
Deprecated. DO NOT USE: this is no longer used in SW, Revs or Spotlight (Jan 2016).
For the date display only, the first place to look is in the dates without encoding=marc array. If there are no such dates, select the first date in the dates_marc_encoding array. Otherwise return nil.

    # File 'lib/stanford-mods/origin_info.rb', line 272
    def pub_date_display
      return dates_no_marc_encoding.first unless dates_no_marc_encoding.empty?
      return dates_marc_encoding.first unless dates_marc_encoding.empty?
      nil
    end

#pub_date_facet ⇒ Array[String]
Values for the pub date facet. This is less strict than the 4 year date requirements for pub_date.
Jan 2016: used to populate Solr pub_date field for Spotlight and SearchWorks.
Spotlight: pub_date field should be replaced by pub_year_w_approx_isi and pub_year_no_approx_isi.
SearchWorks: pub_date field used for display in search results and show view; for sorting nearby-on-shelf.
These could be done with more appropriate fields/methods (pub_year_int for sorting; new pub year methods to populate field).
TODO: probably should deprecate this in favor of pub_year_display_str;
needs head-to-head testing with pub_year_display_str.

    # File 'lib/stanford-mods/origin_info.rb', line 239
    def pub_date_facet
      if pub_date
        if pub_date.start_with?('-')
          return (pub_date.to_i + 1000).to_s + ' B.C.'
        end
        if pub_date.include? '--'
          cent = pub_date[0, 2].to_i
          cent += 1
          cent = cent.to_s + 'th century'
          return cent
        else
          return pub_date
        end
      end
      nil
    end

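The three branches of the facet logic can be exercised on plain strings. In the sketch below the pub_date value is passed in as an argument rather than computed from a record; the helper name and the sample inputs are assumptions.

```ruby
# Hypothetical helper mirroring the branches of #pub_date_facet.
def pub_date_facet_for(pub_date)
  return nil unless pub_date
  # Leading '-' marks a B.C. value: the integer is offset by 1000.
  return (pub_date.to_i + 1000).to_s + ' B.C.' if pub_date.start_with?('-')
  # 'NN--' marks a century: the first two digits plus one.
  return (pub_date[0, 2].to_i + 1).to_s + 'th century' if pub_date.include?('--')
  pub_date
end

p pub_date_facet_for('-700')
p pub_date_facet_for('19--')
p pub_date_facet_for('1905')
p pub_date_facet_for(nil)
```

The suffix is always "th", so an input such as '20--' yields "21th century".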
#pub_date_sort ⇒ Object
Deprecated. Use pub_year_int, or pub_year_sort_str if you must have a string (why?).
Creates a date suitable for sorting. Guaranteed to be 4 digits or nil.

    # File 'lib/stanford-mods/origin_info.rb', line 258
    def pub_date_sort
      if pub_date
        pd = pub_date
        pd = '0' + pd if pd.length == 3
        pd = pd.gsub('--', '00')
      end
      fail "pub_date_sort was about to return a non 4 digit value #{pd}!" if pd && pd.length != 4
      pd
    end

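A small sketch of the sort-key normalisation, taking the pub_date string as an argument; the helper name and the inputs are hypothetical.

```ruby
# Hypothetical helper mirroring #pub_date_sort on a plain string.
def pub_date_sort_for(pub_date)
  if pub_date
    pd = pub_date
    pd = '0' + pd if pd.length == 3 # left-pad three-digit years
    pd = pd.gsub('--', '00')        # century placeholder becomes '00'
  end
  raise "non 4 digit value #{pd}!" if pd && pd.length != 4
  pd
end

p pub_date_sort_for('966')
p pub_date_sort_for('19--')
p pub_date_sort_for(nil)
```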
#pub_year_display_str(ignore_approximate = false) ⇒ Object
Return a single string intended for display of pub year.
0 < year < 1000: add A.D. suffix
year < 0: add B.C. suffix
195u => 195x
19uu => 19xx
'-5' => '5 B.C.'
'700 B.C.' => '700 B.C.'
'7th century' => '7th century'
date ranges?
Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any);
look for a keyDate and use it if there is one; otherwise pick earliest date.

    # File 'lib/stanford-mods/origin_info.rb', line 51
    def pub_year_display_str(ignore_approximate = false)
      single_pub_year(ignore_approximate, :year_display_str)

      # TODO: want range displayed when start and end points
      # TODO: also want best year in year_isi fields
      # get_main_title_date
      # https://github.com/sul-dlss/SearchWorks/blob/7d4d870a9d450fed8b081c38dc3dbd590f0b706e/app/helpers/results_document_helper.rb#L8-L46
      #"publication_year_isi" => "Publication date", <-- do it already
      #"beginning_year_isi" => "Beginning date",
      #"earliest_year_isi" => "Earliest date",
      #"earliest_poss_year_isi" => "Earliest possible date",
      #"ending_year_isi" => "Ending date",
      #"latest_year_isi" => "Latest date",
      #"latest_poss_year_isi" => "Latest possible date",
      #"production_year_isi" => "Production date",
      #"original_year_isi" => "Original date",
      #"copyright_year_isi" => "Copyright date"} %>
      #"creation_year_isi" => "Creation date", <-- do it already
      #{}"release_year_isi" => "Release date",
      #{}"reprint_year_isi" => "Reprint/reissue date",
      #{}"other_year_isi" => "Date",
    end

#pub_year_int(ignore_approximate = false) ⇒ Integer
Return pub year as an Integer.
Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any);
look for a keyDate and use it if there is one; otherwise pick earliest date.

    # File 'lib/stanford-mods/origin_info.rb', line 22
    def pub_year_int(ignore_approximate = false)
      single_pub_year(ignore_approximate, :year_int)
    end

#pub_year_sort_str(ignore_approximate = false) ⇒ String
Deprecated. Use pub_year_int.
Return a single string intended for lexical sorting for pub date.
Prefer dateIssued (any) before dateCreated (any) before dateCaptured (any);
look for a keyDate and use it if there is one; otherwise pick earliest date.

    # File 'lib/stanford-mods/origin_info.rb', line 34
    def pub_year_sort_str(ignore_approximate = false)
      single_pub_year(ignore_approximate, :year_sort_str)
    end

#series ⇒ Object
return series/accession ‘number’ (note: single valued)
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?

    # File 'lib/stanford-mods/physical_location.rb', line 69
    def series
      # _location.physicalLocation should find top level and relatedItem
      series_num = @mods_ng_xml._location.physicalLocation.map do |node|
        val = node.text
        # feigenbaum uses 'Accession'
        match_data = val.match(/(?:(?:Series)|(?:Accession)):? ([^,|]+)/i)
        match_data[1].strip if match_data.present?
      end.compact

      # There should be only one series
      series_num.first
    end

#subject_all_search ⇒ Array<String>
Values are the contents of:
all subject subelements except subject/cartographic plus genre top level element

    # File 'lib/stanford-mods/searchworks_subjects.rb', line 171
    def subject_all_search
      vals = topic_search ? Array.new(topic_search) : []
      vals.concat(geographic_search) if geographic_search
      vals.concat(subject_other_search) if subject_other_search
      vals.concat(subject_other_subvy_search) if subject_other_subvy_search
      vals.empty? ? nil : vals
    end

#subject_other_search ⇒ Array<String>
Values are the contents of:
subject/name
subject/occupation - no subelements
subject/titleInfo

    # File 'lib/stanford-mods/searchworks_subjects.rb', line 140
    def subject_other_search
      @subject_other_search ||= begin
        vals = subject_occupations ? Array.new(subject_occupations) : []
        vals.concat(subject_names) if subject_names
        vals.concat(subject_titles) if subject_titles
        vals.empty? ? nil : vals
      end
    end

#subject_other_subvy_search ⇒ Array<String>
Values are the contents of:
subject/temporal
subject/genre

    # File 'lib/stanford-mods/searchworks_subjects.rb', line 153
    def subject_other_subvy_search
      @subject_other_subvy_search ||= begin
        vals = subject_temporal ? Array.new(subject_temporal) : []
        gvals = self.term_values([:subject, :genre])
        vals.concat(gvals) if gvals

        # print a message for any temporal encodings
        self.subject.temporal.each { |n|
          sw_logger.info("#{druid} has subject temporal element with untranslated encoding: #{n.to_xml}") if !n.encoding.empty?
        }

        vals.empty? ? nil : vals
      end
    end

#sw_addl_authors ⇒ Array<String>
Returns values for author_7xx_search field.
# File 'lib/stanford-mods/searchworks.rb', line 64
def sw_addl_authors
  additional_authors_w_dates
end
#sw_addl_titles ⇒ Array<String>
Returns all full titles except those that contain the short title (sw_short_title).
# File 'lib/stanford-mods/searchworks.rb', line 180
def sw_addl_titles
  full_titles.select { |s| s !~ Regexp.new(Regexp.escape(sw_short_title)) }
end
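The `Regexp.escape` call matters when a short title contains regex metacharacters. A minimal sketch of the same filter on plain Arrays; `titles_without_short` and the sample titles are hypothetical:

```ruby
# Keep only the titles that do not contain the short title as a literal substring.
def titles_without_short(full_titles, short_title)
  pattern = Regexp.new(Regexp.escape(short_title))
  full_titles.select { |s| s !~ pattern }
end

titles = ['Hamlet (1603?)',
          'Hamlet (1603?) : a facsimile',
          'The tragedy of the Prince of Denmark']
titles_without_short(titles, 'Hamlet (1603?)')
# => ["The tragedy of the Prince of Denmark"]
```

Without the escape, the parentheses and question mark would be read as a group and a quantifier, so the pattern would no longer match the literal title. Note also that the match is a substring match, so any longer title that contains the short title is dropped as well.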
#sw_corporate_authors ⇒ Array<String>
Returns values for author_corp_display.
# File 'lib/stanford-mods/searchworks.rb', line 80
def sw_corporate_authors
  val = @mods_ng_xml.plain_name.select {|n| n.type_at == 'corporate'}.map { |n| n.display_value_w_date }
  val
end
#sw_full_title ⇒ String
Returns value for title_245_search, title_full_display.
# File 'lib/stanford-mods/searchworks.rb', line 130
def sw_full_title
  outer_nodes = @mods_ng_xml.title_info
  outer_node = outer_nodes ? outer_nodes.first : nil
  if outer_node
    nonSort = outer_node.nonSort.text.strip.empty? ? nil : outer_node.nonSort.text.strip
    title = outer_node.title.text.strip.empty? ? nil : outer_node.title.text.strip
    preSubTitle = nonSort ? [nonSort, title].compact.join(" ") : title
    preSubTitle.sub!(/:$/, '') if preSubTitle # remove trailing colon

    subTitle = outer_node.subTitle.text.strip
    preParts = subTitle.empty? ? preSubTitle : preSubTitle + " : " + subTitle
    preParts.sub!(/\.$/, '') if preParts # remove trailing period

    partName = outer_node.partName.text.strip unless outer_node.partName.text.strip.empty?
    partNumber = outer_node.partNumber.text.strip unless outer_node.partNumber.text.strip.empty?
    partNumber.sub!(/,$/, '') if partNumber # remove trailing comma
    if partNumber && partName
      parts = partNumber + ", " + partName
    elsif partNumber
      parts = partNumber
    elsif partName
      parts = partName
    end
    parts.sub!(/\.$/, '') if parts

    result = parts ? preParts + ". " + parts : preParts
    result += "." if !result.match(/[[:punct:]]$/)
    result.strip!
    result = nil if result.empty?
    result
  else
    nil
  end
end
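The assembly rules in sw_full_title are easier to follow on plain strings. The following is a sketch only: `assemble_full_title` and its keyword arguments are hypothetical, it takes strings instead of a titleInfo node, and unlike the listing above it tolerates a missing title.

```ruby
# Join "nonSort title : subTitle. partNumber, partName" and make sure the
# result ends in punctuation. Blank or nil pieces are treated as absent.
def assemble_full_title(non_sort: nil, title: nil, sub_title: nil,
                        part_number: nil, part_name: nil)
  clean = ->(s) { s.to_s.strip.empty? ? nil : s.strip }

  main = [clean.(non_sort), clean.(title)].compact.join(' ').sub(/:$/, '')
  sub = clean.(sub_title)
  main = "#{main} : #{sub}" if sub
  main = main.sub(/\.$/, '')                        # remove trailing period

  number = clean.(part_number)&.sub(/,$/, '')       # remove trailing comma
  parts = [number, clean.(part_name)].compact.join(', ').sub(/\.$/, '')

  result = parts.empty? ? main : "#{main}. #{parts}"
  return nil if result.strip.empty?
  result += '.' unless result.match?(/[[:punct:]]$/)
  result
end

assemble_full_title(non_sort: 'The', title: 'Olympics', sub_title: 'a history')
# => "The Olympics : a history."
assemble_full_title(title: 'Works', part_number: 'Vol. 2,', part_name: 'Letters')
# => "Works. Vol. 2, Letters."
```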
#sw_full_title_without_commas ⇒ Object
Deprecated in favor of sw_title_display.
Returns sw_full_title with a trailing comma removed.
# File 'lib/stanford-mods/searchworks.rb', line 202
def sw_full_title_without_commas
  result = self.sw_full_title
  result.sub!(/,$/, '') if result
  result
end
#sw_genre ⇒ Array[String]
Returns values for the genre facet in SearchWorks (see github.com/sul-dlss/stanford-mods/issues/66). Genre values are limited to Government document, Conference proceedings, Technical report, and Thesis/Dissertation.
# File 'lib/stanford-mods/searchworks.rb', line 343
def sw_genre
  val = []
  genres = self.term_values(:genre)
  types = self.term_values(:typeOfResource)
  if genres
    if genres.include?('thesis') || genres.include?('Thesis')
      val << 'Thesis/Dissertation'
    end
    conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
    if !(genres & conf_pub).empty?
      if types && types.include?('text')
        val << 'Conference proceedings'
      end
    end
    gov_pub = ['government publication', 'Government publication', 'Government Publication']
    if !(genres & gov_pub).empty?
      if types && types.include?('text')
        val << 'Government document'
      end
    end
    tech_rpt = ['technical report', 'Technical report', 'Technical Report']
    if !(genres & tech_rpt).empty?
      if types && types.include?('text')
        val << 'Technical report'
      end
    end
  end
  val.uniq
end
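The genre rules reduce to a small lookup: a thesis genre always maps, while the other three map only when typeOfResource includes 'text'. A compact sketch on plain Arrays follows; `genre_facet` and `TEXT_ONLY_GENRES` are hypothetical names, and the sketch lowercases genres, so it accepts more capitalizations than the three spellings listed in the method above.

```ruby
# Genres that count only when the resource type is 'text'.
TEXT_ONLY_GENRES = {
  'conference publication' => 'Conference proceedings',
  'government publication' => 'Government document',
  'technical report'       => 'Technical report'
}.freeze

def genre_facet(genres, types)
  normalized = Array(genres).map(&:downcase)
  vals = []
  vals << 'Thesis/Dissertation' if normalized.include?('thesis')
  if Array(types).include?('text')
    TEXT_ONLY_GENRES.each { |genre, label| vals << label if normalized.include?(genre) }
  end
  vals.uniq
end

genre_facet(['Thesis'], nil)                         # => ["Thesis/Dissertation"]
genre_facet(['technical report'], ['text'])          # => ["Technical report"]
genre_facet(['technical report'], ['still image'])   # => []
```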
#sw_geographic_search(sep = ' ') ⇒ Array<String>
Values are the contents of:
subject/geographic
subject/hierarchicalGeographic
subject/geographicCode (only include the translated value if it isn't already present from other mods geo fields)
# File 'lib/stanford-mods/searchworks_subjects.rb', line 16
def sw_geographic_search(sep = ' ')
  result = term_values([:subject, :geographic]) || []

  # hierarchicalGeographic has sub elements
  @mods_ng_xml.subject.hierarchicalGeographic.each { |hg_node|
    hg_vals = []
    hg_node.element_children.each { |e|
      hg_vals << e.text unless e.text.empty?
    }
    result << hg_vals.join(sep) unless hg_vals.empty?
  }

  trans_code_vals = @mods_ng_xml.subject.geographicCode.translated_value
  if trans_code_vals
    trans_code_vals.each { |val|
      result << val if !result.include?(val)
    }
  end

  result
end
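A sketch of how the three sources combine, assuming plain Arrays in place of the XML nodes (`geographic_values` and the sample data are hypothetical): hierarchical children are joined with the separator, and a translated code is added only if its value is not already present.

```ruby
def geographic_values(geographic, hierarchical, translated_codes, sep = ' ')
  result = geographic.dup
  # each hierarchicalGeographic element becomes one joined string
  hierarchical.each do |children|
    vals = children.reject(&:empty?)
    result << vals.join(sep) unless vals.empty?
  end
  # translated geographicCode values are added only when new
  translated_codes.each { |val| result << val unless result.include?(val) }
  result
end

geographic_values(['California'],
                  [['United States', 'California', 'Stanford']],
                  ['California', 'Nevada'])
# => ["California", "United States California Stanford", "Nevada"]
```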
#sw_impersonal_authors ⇒ Array<String>
return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’
# File 'lib/stanford-mods/searchworks.rb', line 75
def sw_impersonal_authors
  @mods_ng_xml.plain_name.select {|n| n.type_at != 'personal'}.map { |n| n.display_value_w_date }
end
#sw_language_facet ⇒ Object
Includes languages known to SearchWorks; tries to error-correct when possible (e.g. when ISO-639 disagrees with the MARC standard).
# File 'lib/stanford-mods/searchworks.rb', line 14
def sw_language_facet
  result = []
  @mods_ng_xml.language.each { |n|
    # get languageTerm codes and add their translations to the result
    n.code_term.each { |ct|
      if ct.authority.match(/^iso639/)
        begin
          vals = ct.text.split(/[,|\ ]/).reject {|x| x.strip.length == 0 }
          vals.each do |v|
            iso639_val = ISO_639.find(v.strip).english_name
            if SEARCHWORKS_LANGUAGES.has_value?(iso639_val)
              result << iso639_val
            else
              result << SEARCHWORKS_LANGUAGES[v.strip]
            end
          end
        rescue
          # TODO: this should be written to a logger
          p "Couldn't find english name for #{ct.text}"
        end
      else
        vals = ct.text.split(/[,|\ ]/).reject {|x| x.strip.length == 0 }
        vals.each do |v|
          result << SEARCHWORKS_LANGUAGES[v.strip]
        end
      end
    }
    # add languageTerm text values
    n.text_term.each { |tt|
      val = tt.text.strip
      result << val if val.length > 0 && SEARCHWORKS_LANGUAGES.has_value?(val)
    }

    # add language values that aren't in languageTerm subelement
    if n.languageTerm.size == 0
      result << n.text if SEARCHWORKS_LANGUAGES.has_value?(n.text)
    end
  }
  result.uniq
end
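A single languageTerm may hold several codes separated by commas, pipes, or spaces. The sketch below shows only that splitting and lookup step. `SKETCH_LANGUAGES` is a hypothetical three-entry stand-in for SEARCHWORKS_LANGUAGES, and `language_names` is not a gem method.

```ruby
# Hypothetical stand-in for the real code-to-name table.
SKETCH_LANGUAGES = { 'eng' => 'English', 'fre' => 'French', 'ger' => 'German' }.freeze

def language_names(code_text)
  codes = code_text.split(/[,|\ ]/).reject { |x| x.strip.empty? }
  codes.map { |c| SKETCH_LANGUAGES[c.strip] }.compact.uniq
end

language_names('eng, fre|ger')  # => ["English", "French", "German"]
language_names('eng xxx')       # => ["English"]
```

The sketch drops codes it cannot translate, whereas the non-ISO branch in the method above appends whatever the table lookup returns.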
#sw_logger ⇒ Object
Returns the Logger used for SearchWorks messages; it is created on first use and writes to STDOUT. (Publication place and year are handled in origin_info.rb, as all of that information comes from the top-level originInfo element.)
# File 'lib/stanford-mods/searchworks.rb', line 218
def sw_logger
  @logger ||= Logger.new(STDOUT)
end
#sw_main_author ⇒ String
Returns value for author_1xx_search field.
# File 'lib/stanford-mods/searchworks.rb', line 59
def sw_main_author
  main_author_w_date
end
#sw_meeting_authors ⇒ Array<String>
Returns values for author_meeting_display.
# File 'lib/stanford-mods/searchworks.rb', line 86
def sw_meeting_authors
  @mods_ng_xml.plain_name.select {|n| n.type_at == 'conference'}.map { |n| n.display_value_w_date }
end
#sw_person_authors ⇒ Array<String>
Returns values for author_person_facet, author_person_display.
# File 'lib/stanford-mods/searchworks.rb', line 69
def sw_person_authors
  personal_names_w_dates
end
#sw_short_title ⇒ String
Returns value for title_245a_search field.
# File 'lib/stanford-mods/searchworks.rb', line 125
def sw_short_title
  short_titles ? short_titles.first : nil
end
#sw_sort_author ⇒ String
Returns a sortable version of the main_author:
main_author + sorting title
which is the mods approximation of the value created for a marc record
# File 'lib/stanford-mods/searchworks.rb', line 94
def sw_sort_author
  # substitute java Character.MAX_CODE_POINT for nil main_author so missing main authors sort last
  val = '' + (main_author_w_date ? main_author_w_date : "\u{10FFFF} ") + ( sort_title ? sort_title : '')
  val.gsub(/[[:punct:]]*/, '').strip
end
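The effect of the U+10FFFF placeholder can be checked with plain strings. In this sketch `sort_key` and `MISSING_AUTHOR` are hypothetical names and the authors and titles are invented sample data.

```ruby
# Highest Unicode code point plus a space, used when there is no main author.
MISSING_AUTHOR = "\u{10FFFF} "

def sort_key(main_author, sort_title)
  val = (main_author || MISSING_AUTHOR) + (sort_title || '')
  val.gsub(/[[:punct:]]*/, '').strip
end

keys = [
  sort_key(nil, 'Zebra'),
  sort_key('Zola, Emile', 'Nana'),
  sort_key('Adams, John', 'Letters')   # => "Adams JohnLetters"
]
keys.sort.last == sort_key(nil, 'Zebra')  # => true
```

Ruby compares strings bytewise, and the UTF-8 encoding of U+10FFFF starts with a byte larger than any ASCII byte, so records without a main author sort after all ASCII-initial keys.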
#sw_sort_title ⇒ String
Returns a sortable version of the main title
# File 'lib/stanford-mods/searchworks.rb', line 186
def sw_sort_title
  # get nonSort piece
  outer_nodes = @mods_ng_xml.title_info
  outer_node = outer_nodes ? outer_nodes.first : nil
  if outer_node
    nonSort = outer_node.nonSort.text.strip.empty? ? nil : outer_node.nonSort.text.strip
  end

  val = '' + ( sw_full_title ? sw_full_title : '')
  val.sub!(Regexp.new("^" + Regexp.escape(nonSort)), '') if nonSort
  val.gsub!(/[[:punct:]]*/, '').strip
  val.squeeze(" ").strip
end
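The same three steps (drop the leading nonSort text, remove punctuation, collapse spaces) can be tried on plain strings. `sortable_title` is a hypothetical helper written for this sketch.

```ruby
def sortable_title(full_title, non_sort = nil)
  val = full_title.to_s.dup
  # remove the nonSort prefix, matched literally at the start
  val.sub!(Regexp.new('^' + Regexp.escape(non_sort)), '') if non_sort
  val.gsub(/[[:punct:]]/, '').squeeze(' ').strip
end

sortable_title('The Olympics : a history.', 'The')  # => "Olympics a history"
sortable_title("L'Avare.", "L'")                    # => "Avare"
```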
#sw_subject_names(sep = ', ') ⇒ Array<String>
Values are the contents of:
subject/name/namePart
"Values from namePart subelements should be concatenated in the order they appear (e.g. "Shakespeare, William, 1564-1616")"
# File 'lib/stanford-mods/searchworks_subjects.rb', line 43
def sw_subject_names(sep = ', ')
  result = []
  @mods_ng_xml.subject.name_el.select { |n_el| n_el.namePart }.each { |name_el_w_np|
    parts = name_el_w_np.namePart.map { |npn| npn.text unless npn.text.empty? }.compact
    result << parts.join(sep).strip unless parts.empty?
  }
  result
end
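The concatenation rule quoted above amounts to joining the non-empty namePart texts in document order. A minimal sketch with plain Strings standing in for the namePart nodes (`join_name_parts` is a hypothetical name):

```ruby
def join_name_parts(parts, sep = ', ')
  kept = parts.reject(&:empty?)
  kept.empty? ? nil : kept.join(sep).strip
end

join_name_parts(['Shakespeare, William', '1564-1616'])
# => "Shakespeare, William, 1564-1616"
join_name_parts(['', ''])  # => nil
```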
#sw_subject_titles(sep = ' ') ⇒ Array<String>
Values are the contents of:
subject/titleInfo/(subelements)
# File 'lib/stanford-mods/searchworks_subjects.rb', line 56
def sw_subject_titles(sep = ' ')
  result = []
  @mods_ng_xml.subject.titleInfo.each { |ti_el|
    parts = ti_el.element_children.map { |el| el.text unless el.text.empty? }.compact
    result << parts.join(sep).strip unless parts.empty?
  }
  result
end
#sw_title_display ⇒ String
Like sw_full_title, but with any trailing run of the characters . , ; : / \ removed. The spec comes from solrmarc-sw sw_index.properties:
title_display = custom, removeTrailingPunct(245abdefghijklmnopqrstuvwxyz, [\\\\,/;:], ([A-Za-z]{4}|[0-9]{3}|\\)|\\,))
# File 'lib/stanford-mods/searchworks.rb', line 169
def sw_title_display
  result = sw_full_title ? sw_full_title : nil
  if result
    result.sub!(/[\.,;:\/\\]+$/, '')
    result.strip!
  end
  result
end
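The trailing-punctuation pattern can be exercised on its own. `strip_trailing_punct` is a hypothetical helper that applies the same regular expression to a plain String.

```ruby
def strip_trailing_punct(title)
  return nil if title.nil?
  title.sub(/[\.,;:\/\\]+$/, '').strip
end

strip_trailing_punct('Olympics : a history.')  # => "Olympics : a history"
strip_trailing_punct('Proceedings /')          # => "Proceedings"
strip_trailing_punct('Why?')                   # => "Why?"
```

Characters outside the listed set, such as the question mark, are left in place.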
#topic_facet ⇒ Array<String>
Values are the contents of:
subject/topic
subject/name
subject/title
subject/occupation
with trailing comma, semicolon, and backslash (and any preceding spaces) removed
# File 'lib/stanford-mods/searchworks_subjects.rb', line 84
def topic_facet
  vals = subject_topics ? Array.new(subject_topics) : []
  vals.concat(subject_names) if subject_names
  vals.concat(subject_titles) if subject_titles
  vals.concat(subject_occupations) if subject_occupations
  vals.map! { |val|
    v = val.sub(/[\\,;]$/, '')
    v.strip
  }
  vals.empty? ? nil : vals
end
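The cleanup applied to each facet value removes a single trailing comma, semicolon, or backslash and then strips whitespace. A small sketch on plain Strings, where `clean_topic` is a hypothetical name:

```ruby
def clean_topic(val)
  val.sub(/[\\,;]$/, '').strip
end

clean_topic('Baseball ;')  # => "Baseball"
clean_topic('History,')    # => "History"
clean_topic('Art;;')       # => "Art;"
```

As the last call shows, the pattern has no quantifier, so only one trailing character is removed per value.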
#topic_search ⇒ Array<String>
Values are the contents of:
mods/genre
mods/subject/topic
# File 'lib/stanford-mods/searchworks_subjects.rb', line 69
def topic_search
  @topic_search ||= begin
    vals = self.term_values(:genre) || []
    vals.concat(subject_topics) if subject_topics
    vals.empty? ? nil : vals
  end
end
#year_display_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one;
otherwise pick earliest parseable date
# File 'lib/stanford-mods/origin_info.rb', line 80
def year_display_str(date_el_array)
  result = date_parsing_result(date_el_array, :date_str_for_display)
  return result if result
  _ignore, orig_str_to_parse = self.class.earliest_year_str(date_el_array)
  DateParsing.date_str_for_display(orig_str_to_parse) if orig_str_to_parse
end
#year_int(date_el_array) ⇒ Integer
given the passed date elements, look for a single keyDate and use it if there is one;
otherwise pick earliest parseable date
# File 'lib/stanford-mods/origin_info.rb', line 91
def year_int(date_el_array)
  result = date_parsing_result(date_el_array, :year_int_from_date_str)
  return result if result
  year_int, _ignore = self.class.earliest_year_int(date_el_array)
  year_int if year_int
end
#year_sort_str(date_el_array) ⇒ String
given the passed date elements, look for a single keyDate and use it if there is one;
otherwise pick earliest parseable date
# File 'lib/stanford-mods/origin_info.rb', line 102
def year_sort_str(date_el_array)
  result = date_parsing_result(date_el_array, :sortable_year_string_from_date_str)
  return result if result
  sortable_str, _ignore = self.class.earliest_year_str(date_el_array)
  sortable_str if sortable_str
end
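The three year methods share one selection rule: use the single keyDate element if there is exactly one, otherwise fall back to the earliest parseable date. The sketch below illustrates only that rule. It assumes Hashes with hypothetical `:year` and `:key_date` keys in place of date elements, and it does not reproduce the gem's actual date parsing.

```ruby
# :year is an Integer, or nil when the date could not be parsed;
# :key_date is true when the element has keyDate="yes".
def pick_year(dates)
  keyed = dates.select { |d| d[:key_date] }
  return keyed.first[:year] if keyed.size == 1 && keyed.first[:year]
  dates.map { |d| d[:year] }.compact.min
end

pick_year([{ year: 1950 }, { year: 1975, key_date: true }])                  # => 1975
pick_year([{ year: 1950, key_date: true }, { year: 1975, key_date: true }])  # => 1950
pick_year([{ year: nil }, { year: 1920 }])                                   # => 1920
```

With two keyDate elements there is no single key date, so the second call falls back to the earliest year.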