Class: Stanford::Mods::Record
- Inherits: Mods::Record
- Inheritance chain: Stanford::Mods::Record < Mods::Record < Object
- Defined in:
- lib/stanford-mods.rb,
lib/stanford-mods/name.rb,
lib/stanford-mods/geo_spatial.rb,
lib/stanford-mods/searchworks.rb,
lib/stanford-mods/physical_location.rb
Constant Summary
- COLLECTOR_ROLE_URI =
'http://id.loc.gov/vocabulary/relators/col'
Instance Method Summary
-
#additional_authors_w_dates ⇒ Object
All names, in display form, except the main_author. Names will be in the display_value_w_date form; see Mods::Record.name in nom_terminology for details on the display_value algorithm.
-
#box ⇒ Object
Returns the box number (single valued; might be something like 35A). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#catkey ⇒ String
Value with the numeric catkey in it, or nil if none exists.
-
#collectors_w_dates ⇒ Object
Array of Strings, each containing the computed display value of a personal name with the role of Collector (see mods gem nom_terminology for display value algorithm).
- #coordinates ⇒ Object
-
#dates_marc_encoding ⇒ Array<String>
Dates from dateIssued and dateCreated tags from origin_info with encoding=“marc”.
-
#dates_no_marc_encoding ⇒ Array<String>
Dates from dateIssued and dateCreated tags from origin_info with encoding not “marc”.
- #druid ⇒ Object
- #druid=(new_druid) ⇒ Object
-
#era_facet ⇒ Array<String>
subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#folder ⇒ Object
Returns the folder number (single valued). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#format ⇒ Array[String]
Deprecated. Kept for backwards compatibility but not part of the SW UI redesign work of Summer 2014.
-
#format_main ⇒ Array[String]
select one or more format values from the controlled vocabulary per JVine Summer 2014 searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index.
-
#geographic_facet ⇒ Array<String>
geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#geographic_search ⇒ Array<String>
Values are the contents of: subject/geographic subject/hierarchicalGeographic subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
-
#get_bc_year(dates) ⇒ Object
get the 3 digit BC year, return it as a negative, so -700 for 300 BC.
-
#get_double_digit_century(dates) ⇒ Object
get a double digit century like ‘12th century’ from the date array.
-
#get_plain_four_digit_year(dates) ⇒ Object
get a 4 digit year like 1865 from the date array.
-
#get_single_digit_century(dates) ⇒ Object
get a single digit century like ‘9th century’ from the date array.
-
#get_three_digit_year(dates) ⇒ Object
get a 3 digit year like 965 from the date array.
-
#get_u_year(dates) ⇒ Object
If a year has a “u” in it, replace the u: a single trailing u becomes 0, and two trailing u's become “--”.
-
#includes_marc_relator_collector_role?(role_node) ⇒ Boolean
True if there is a MARC relator collector role assigned.
- #is_date?(object) ⇒ Boolean
- #is_number?(object) ⇒ Boolean
-
#location ⇒ Object
Returns the entire contents of physicalLocation (single valued), but only if it has series, accession, box or folder data. The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#main_author_w_date ⇒ String
the first encountered <mods><name> element with marcrelator flavor role of ‘Creator’ or ‘Author’.
- #main_author_w_date_test ⇒ Object
-
#non_collector_person_authors ⇒ Object
FIXME: this is broken if there are multiple role codes and some of them are not marcrelator.
-
#parse_dates_from_originInfo ⇒ Object
Populate @dates_marc_encoding and @dates_no_marc_encoding from dateIssued and dateCreated tags from origin_info with and without encoding=marc.
-
#place ⇒ Object
—- PUBLICATION (place, year) —-.
- #point_bbox ⇒ Object
-
#pub_date ⇒ String
The year the object was published, filtered based on max_pub_date and min_pub_date from the config file.
-
#pub_date_display ⇒ String
For the date display only, the first place to look is in the dates without encoding=marc array.
-
#pub_date_facet ⇒ Array[String]
Values for the pub date facet.
-
#pub_date_sort ⇒ Object
creates a date suitable for sorting.
-
#pub_dates ⇒ Array<String>
For the date indexing, sorting and faceting, the first place to look is in the dates with encoding=marc array.
-
#pub_year ⇒ String
Get the publish year from mods.
-
#series ⇒ Object
Returns the series/accession ‘number’ (single valued). The data is in location/physicalLocation or in relatedItem/location/physicalLocation, so _location is used to get it from either one. TODO: should it be hierarchical series/box/folder?
-
#subject_all_search ⇒ Array<String>
Values are the contents of: all subject subelements except subject/cartographic plus genre top level element.
-
#subject_names ⇒ Object
convenience method for subject/name/namePart values (to avoid parsing the mods for the same thing multiple times).
-
#subject_occupations ⇒ Object
convenience method for subject/occupation values (to avoid parsing the mods for the same thing multiple times).
-
#subject_other_search ⇒ Array<String>
Values are the contents of: subject/name subject/occupation - no subelements subject/titleInfo.
-
#subject_other_subvy_search ⇒ Array<String>
Values are the contents of: subject/temporal subject/genre.
-
#subject_temporal ⇒ Object
convenience method for subject/temporal values (to avoid parsing the mods for the same thing multiple times).
-
#subject_titles ⇒ Object
convenience method for subject/titleInfo values (to avoid parsing the mods for the same thing multiple times).
-
#subject_topics ⇒ Object
convenience method for subject/topic values (to avoid parsing the mods for the same thing multiple times).
-
#sw_addl_authors ⇒ Array<String>
Values for author_7xx_search field.
-
#sw_addl_titles ⇒ Array<String>
this includes all titles except.
-
#sw_corporate_authors ⇒ Array<String>
Values for author_corp_display.
-
#sw_full_title ⇒ String
Value for title_245_search, title_full_display.
-
#sw_full_title_without_commas ⇒ Object
Deprecated in favor of sw_title_display.
-
#sw_genre ⇒ Array[String]
return values for the genre facet in SearchWorks.
-
#sw_geographic_search(sep = ' ') ⇒ Array<String>
Values are the contents of: subject/geographic subject/hierarchicalGeographic subject/geographicCode (only include the translated value if it isn’t already present from other mods geo fields).
-
#sw_impersonal_authors ⇒ Array<String>
return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’.
-
#sw_language_facet ⇒ Object
Include languages known to SearchWorks; try to error correct when possible (e.g. when ISO-639 disagrees with the MARC standard).
-
#sw_logger ⇒ Object
—- end PUBLICATION (place, year) —-.
-
#sw_main_author ⇒ String
Value for author_1xx_search field.
-
#sw_meeting_authors ⇒ Array<String>
Values for author_meeting_display.
-
#sw_person_authors ⇒ Array<String>
Values for author_person_facet, author_person_display.
-
#sw_short_title ⇒ String
Value for title_245a_search field.
-
#sw_sort_author ⇒ String
Returns a sortable version of the main_author: main_author + sorting title which is the mods approximation of the value created for a marc record.
-
#sw_sort_title ⇒ String
Returns a sortable version of the main title.
-
#sw_subject_names(sep = ', ') ⇒ Array<String>
Values are the contents of: subject/name/namePart. “Values from namePart subelements should be concatenated in the order they appear (e.g. ‘Shakespeare, William, 1564-1616’)”.
-
#sw_subject_titles(sep = ' ') ⇒ Array<String>
Values are the contents of: subject/titleInfo/(subelements).
-
#sw_title_display ⇒ String
like sw_full_title without trailing ,/;:.
-
#topic_facet ⇒ Array<String>
Values are the contents of: subject/topic subject/name subject/title subject/occupation with trailing comma, semicolon, and backslash (and any preceding spaces) removed.
-
#topic_search ⇒ Array<String>
Values are the contents of: mods/genre mods/subject/topic.
Instance Method Details
#additional_authors_w_dates ⇒ Object
all names, in display form, except the main_author
names will be the display_value_w_date form
see Mods::Record.name in nom_terminology for details on the display_value algorithm
# File 'lib/stanford-mods/name.rb', line 39
def additional_authors_w_dates
  results = []
  @mods_ng_xml.plain_name.each { |n|
    results << n.display_value_w_date
  }
  results.delete(main_author_w_date)
  results
end
#box ⇒ Object
return box number (note: single valued and might be something like 35A)
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?
# File 'lib/stanford-mods/physical_location.rb', line 13
def box
  # _location.physicalLocation should find top level and relatedItem
  box_num = @mods_ng_xml._location.physicalLocation.map do |node|
    val = node.text
    # note that this will also find Flatbox or Flat-box
    match_data = val.match(/Box ?:? ?([^,|(Folder)]+)/i)
    match_data[1].strip if match_data.present?
  end.compact

  # There should only be one box
  box_num.first
end
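The box pattern can be tried outside the gem. The following is a minimal sketch, not the library's code: extract_box is a hypothetical helper that works on plain strings rather than MODS nodes, and plain Ruby replaces the ActiveSupport present? call. The sample location strings are made up for illustration.

```ruby
# Same pattern as in the listing. Note that [^,|(Folder)] is a character
# class, so the capture stops at a comma, a pipe, a parenthesis, or any of
# the letters in "Folder", not at the word "Folder" itself.
BOX_PATTERN = /Box ?:? ?([^,|(Folder)]+)/i

# Hypothetical helper: box number from one physicalLocation string, or nil.
def extract_box(text)
  match_data = text.match(BOX_PATTERN)
  match_data && match_data[1].strip
end

extract_box('Call Number: SC0340, Accession 1986-052, Box 35A, Folder 12') # => "35A"
extract_box('Flatbox 5')                                                    # => "5"
extract_box('Box 12 oversize')                                              # => "12"
extract_box('Stanford University Libraries')                                # => nil
```

The third call shows the character-class behavior: the capture ends at the "o" of "oversize".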
#catkey ⇒ String
Returns value with the numeric catkey in it, or nil if none exists.
# File 'lib/stanford-mods/searchworks.rb', line 635
def catkey
  catkey=self.term_values([:record_info,:recordIdentifier])
  if catkey and catkey.length>0
    return catkey.first.gsub('a','') #need to ensure catkey is numeric only
  end
  nil
end
#collectors_w_dates ⇒ Object
Returns Array of Strings, each containing the computed display value of a personal name with the role of Collector (see mods gem nom_terminology for display value algorithm).
# File 'lib/stanford-mods/name.rb', line 64
def collectors_w_dates
  result = []
  @mods_ng_xml.personal_name.each do |n|
    next if n.role.size.zero?
    n.role.each { |r|
      result << n.display_value_w_date if includes_marc_relator_collector_role?(r)
    }
  end
  result unless result.empty?
end
#coordinates ⇒ Object
# File 'lib/stanford-mods/geo_spatial.rb', line 11
def coordinates
  Array(@mods_ng_xml.subject.cartographics.coordinates).map(&:text)
end
#dates_marc_encoding ⇒ Array<String>
Returns dates from dateIssued and dateCreated tags from origin_info with encoding=“marc”.
# File 'lib/stanford-mods/searchworks.rb', line 790
def dates_marc_encoding
  @dates_marc_encoding ||= begin
    parse_dates_from_originInfo
    @dates_marc_encoding
  end
end
#dates_no_marc_encoding ⇒ Array<String>
Returns dates from dateIssued and dateCreated tags from origin_info with encoding not “marc”.
# File 'lib/stanford-mods/searchworks.rb', line 798
def dates_no_marc_encoding
  @dates_no_marc_encoding ||= begin
    parse_dates_from_originInfo
    @dates_no_marc_encoding
  end
end
#druid ⇒ Object
# File 'lib/stanford-mods/searchworks.rb', line 645
def druid
  @druid ? @druid : 'Unknown item'
end
#druid=(new_druid) ⇒ Object
# File 'lib/stanford-mods/searchworks.rb', line 642
def druid= new_druid
  @druid=new_druid
end
#era_facet ⇒ Array<String>
subject/temporal values with trailing comma, semicolon, and backslash (and any preceding spaces) removed
# File 'lib/stanford-mods/searchworks.rb', line 305
def era_facet
  subject_temporal.map { |val| val.sub(/[\\,;]$/, '').strip } unless !subject_temporal
end
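The cleanup used by era_facet (and by geographic_facet) can be seen on plain strings. This is a small sketch; strip_trailing_punctuation is a hypothetical name and the sample values are invented.

```ruby
# Remove one trailing backslash, comma or semicolon, then surrounding spaces.
def strip_trailing_punctuation(values)
  values.map { |val| val.sub(/[\\,;]$/, '').strip }
end

strip_trailing_punctuation(['Middle Ages;', '19th century ,', '1900-1999\\'])
# => ["Middle Ages", "19th century", "1900-1999"]
```

Only a single trailing character is removed, so a value ending in ",;" would keep its comma.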
#folder ⇒ Object
returns folder number (note: single valued)
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?
# File 'lib/stanford-mods/physical_location.rb', line 30
def folder
  # _location.physicalLocation should find top level and relatedItem
  folder_num = @mods_ng_xml._location.physicalLocation.map do |node|
    val = node.text

    match_data = if val =~ /\|/
                   # we assume the data is pipe-delimited, and may contain commas within values
                   val.match(/Folder ?:? ?([^|]+)/)
                 else
                   # the data should be comma-delimited, and may not contain commas within values
                   val.match(/Folder ?:? ?([^,]+)/)
                 end

    match_data[1].strip if match_data.present?
  end.compact

  # There should be one folder
  folder_num.first
end
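The delimiter choice in folder can be illustrated on plain strings. A minimal sketch, assuming a hypothetical extract_folder helper and made-up location strings; it is not the gem's implementation.

```ruby
# Hypothetical helper: folder value from one physicalLocation string, or nil.
# A pipe anywhere in the text switches to pipe-delimited parsing, which lets
# the folder value itself contain commas.
def extract_folder(text)
  pattern = text.include?('|') ? /Folder ?:? ?([^|]+)/ : /Folder ?:? ?([^,]+)/
  match_data = text.match(pattern)
  match_data && match_data[1].strip
end

extract_folder('Series 1 | Box 2 | Folder 3, part A') # => "3, part A"
extract_folder('Box 2, Folder 3, Item 9')             # => "3"
extract_folder('Box 2')                               # => nil
```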
#format ⇒ Array[String]
Deprecated. Kept for backwards compatibility but not part of the SW UI redesign work of Summer 2014.
Select one or more format values from the controlled vocabulary here:
http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format&rows=0&facet.sort=index
# File 'lib/stanford-mods/searchworks.rb', line 503
def format
  val = []
  types = self.term_values(:typeOfResource)
  if types
    genres = self.term_values(:genre)
    issuance = self.term_values([:origin_info,:issuance])
    types.each do |type|
      case type
        when 'cartographic'
          val << 'Map/Globe'
        when 'mixed material'
          val << 'Manuscript/Archive'
        when 'moving image'
          val << 'Video'
        when 'notated music'
          val << 'Music - Score'
        when 'software, multimedia'
          val << 'Computer File'
        when 'sound recording-musical'
          val << 'Music - Recording'
        when 'sound recording-nonmusical', 'sound recording'
          val << 'Sound Recording'
        when 'still image'
          val << 'Image'
        when 'text'
          val << 'Book' if issuance and issuance.include? 'monographic'
          book_genres = ['book chapter', 'Book chapter', 'Book Chapter',
            'issue brief', 'Issue brief', 'Issue Brief',
            'librettos', 'Librettos',
            'project report', 'Project report', 'Project Report',
            'technical report', 'Technical report', 'Technical Report',
            'working paper', 'Working paper', 'Working Paper']
          val << 'Book' if genres and !(genres & book_genres).empty?
          conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
          val << 'Conference Proceedings' if genres and !(genres & conf_pub).empty?
          val << 'Journal/Periodical' if issuance and issuance.include? 'continuing'
          article = ['article', 'Article']
          val << 'Journal/Periodical' if genres and !(genres & article).empty?
          stu_proj_rpt = ['student project report', 'Student project report', 'Student Project report', 'Student Project Report']
          val << 'Other' if genres and !(genres & stu_proj_rpt).empty?
          thesis = ['thesis', 'Thesis']
          val << 'Thesis' if genres and !(genres & thesis).empty?
        when 'three dimensional object'
          val << 'Other'
      end
    end
  end
  val.uniq
end
#format_main ⇒ Array[String]
select one or more format values from the controlled vocabulary per JVine Summer 2014
http://searchworks-solr-lb.stanford.edu:8983/solr/select?facet.field=format_main_ssim&rows=0&facet.sort=index
# File 'lib/stanford-mods/searchworks.rb', line 556
def format_main
  val = []
  types = self.term_values(:typeOfResource)
  article_genres = ['article', 'Article',
    'book chapter', 'Book chapter', 'Book Chapter',
    'issue brief', 'Issue brief', 'Issue Brief',
    'project report', 'Project report', 'Project Report',
    'student project report', 'Student project report', 'Student Project report', 'Student Project Report',
    'technical report', 'Technical report', 'Technical Report',
    'working paper', 'Working paper', 'Working Paper'
  ]
  book_genres = ['conference publication', 'Conference publication', 'Conference Publication',
    'instruction', 'Instruction',
    'librettos', 'Librettos',
    'thesis', 'Thesis'
  ]
  if types
    genres = self.term_values(:genre)
    issuance = self.term_values([:origin_info,:issuance])
    types.each do |type|
      case type
        when 'cartographic'
          val << 'Map'
        when 'mixed material'
          val << 'Archive/Manuscript'
        when 'moving image'
          val << 'Video'
        when 'notated music'
          val << 'Music score'
        when 'software, multimedia'
          if genres and (genres.include?('dataset') || genres.include?('Dataset'))
            val << 'Dataset'
          else
            val << 'Software/Multimedia'
          end
        when 'sound recording-musical'
          val << 'Music recording'
        when 'sound recording-nonmusical', 'sound recording'
          val << 'Sound recording'
        when 'still image'
          val << 'Image'
        when 'text'
          val << 'Book' if genres and !(genres & article_genres).empty?
          val << 'Book' if issuance and issuance.include? 'monographic'
          val << 'Book' if genres and !(genres & book_genres).empty?
          val << 'Journal/Periodical' if issuance and issuance.include? 'continuing'
        when 'three dimensional object'
          val << 'Object'
      end
    end
  end
  val.uniq
end
#geographic_facet ⇒ Array<String>
geographic_search values with trailing comma, semicolon, and backslash (and any preceding spaces) removed
# File 'lib/stanford-mods/searchworks.rb', line 299
def geographic_facet
  geographic_search.map { |val| val.sub(/[\\,;]$/, '').strip } unless !geographic_search
end
#geographic_search ⇒ Array<String>
Values are the contents of:
subject/geographic
subject/hierarchicalGeographic
subject/geographicCode (only include the translated value if it isn't already present from other mods geo fields)
# File 'lib/stanford-mods/searchworks.rb', line 314
def geographic_search
  @geographic_search ||= begin
    result = self.sw_geographic_search

    # TODO: this should go into stanford-mods ... but then we have to set that gem up with a Logger
    # print a message for any unrecognized encodings
    xvals = self.subject.geographicCode.translated_value
    codes = self.term_values([:subject, :geographicCode])
    if codes && codes.size > xvals.size
      self.subject.geographicCode.each { |n|
        if n.authority != 'marcgac' && n.authority != 'marccountry'
          sw_logger.info("#{druid} has subject geographicCode element with untranslated encoding (#{n.authority}): #{n.to_xml}")
        end
      }
    end

    # FIXME: stanford-mods should be returning [], not nil ...
    return nil if !result || result.empty?
    result
  end
end
#get_bc_year(dates) ⇒ Object
get the 3 digit BC year, return it as a negative, so -700 for 300 BC. Other methods will translate it to proper display, this is good for sorting.
# File 'lib/stanford-mods/searchworks.rb', line 754
def get_bc_year dates
  dates.each do |f_date|
    matches=f_date.scan(/\d{3} B.C./)
    if matches.length > 0
      bc_year=matches.first[0..2]
      return (bc_year.to_i-1000).to_s
    end
  end
  return nil
end
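The offset arithmetic can be checked in isolation. A minimal sketch with a hypothetical name (bc_year_for_sorting) and invented sample dates:

```ruby
# Find the first three-digit "NNN B.C." year and shift it by 1000,
# so that 300 B.C. becomes "-700".
def bc_year_for_sorting(dates)
  dates.each do |date|
    matches = date.scan(/\d{3} B.C./)
    return (matches.first[0..2].to_i - 1000).to_s unless matches.empty?
  end
  nil
end

bc_year_for_sorting(['ca. 300 B.C.']) # => "-700"
bc_year_for_sorting(['50 B.C.'])      # => nil (the pattern needs three digits)
```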
#get_double_digit_century(dates) ⇒ Object
get a double digit century like ‘12th century’ from the date array
# File 'lib/stanford-mods/searchworks.rb', line 720
def get_double_digit_century dates
  dates.each do |f_date|
    matches=f_date.scan(/\d{2}th/)
    if matches.length == 1
      @pub_year=((matches.first[0,2].to_i)-1).to_s+'--'
      return @pub_year
    end
    #if there are multiples, check for ones with CE after them
    if matches.length > 0
      matches.each do |match|
        pos = f_date.index(Regexp.new(match+'...CE'))
        pos = pos ? pos.to_i : f_date.index(Regexp.new(match+' century CE'))
        pos = pos ? pos.to_i : 0
        if f_date.include?(match+' CE') or pos > 0
          @pub_year=((match[0,2].to_i) - 1).to_s+'--'
          return @pub_year
        end
      end
    end
  end
  return nil
end
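The century-to-year-pattern step can be sketched for the simplest case, a date string with exactly one two-digit century. The helper name is hypothetical, and the multiple-match "CE" handling from the listing is left out.

```ruby
# "12th century" covers the years 1100-1199, so the pattern is "11--".
def century_to_year_pattern(date)
  matches = date.scan(/\d{2}th/)
  return nil unless matches.length == 1
  (matches.first[0, 2].to_i - 1).to_s + '--'
end

century_to_year_pattern('12th century') # => "11--"
century_to_year_pattern('1865')         # => nil
```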
#get_plain_four_digit_year(dates) ⇒ Object
get a 4 digit year like 1865 from the date array
# File 'lib/stanford-mods/searchworks.rb', line 677
def get_plain_four_digit_year dates
  dates.each do |f_date|
    matches=f_date.scan(/\d{4}/)
    if matches.length == 1
      @pub_year=matches.first
      return matches.first
    else
      #if there are multiples, check for ones with CE after them
      matches.each do |match|
        #look for things like '1865-6 CE'
        pos = f_date.index(Regexp.new(match+'...CE'))
        pos = pos ? pos.to_i : 0
        if f_date.include?(match+' CE') or pos > 0
          @pub_year=match
          return match
        end
      end
      return matches.first
    end
  end
  return nil
end
#get_single_digit_century(dates) ⇒ Object
get a single digit century like ‘9th century’ from the date array
# File 'lib/stanford-mods/searchworks.rb', line 766
def get_single_digit_century dates
  dates.each do |f_date|
    matches=f_date.scan(/\d{1}th/)
    if matches.length == 1
      @pub_year=((matches.first[0,2].to_i)-1).to_s+'--'
      return @pub_year
    end
    #if there are multiples, check for ones with CE after them
    if matches.length > 0
      matches.each do |match|
        pos = f_date.index(Regexp.new(match+'...CE'))
        pos = pos ? pos.to_i : f_date.index(Regexp.new(match+' century CE'))
        pos = pos ? pos.to_i : 0
        if f_date.include?(match+' CE') or pos > 0
          @pub_year=((match[0,1].to_i) - 1).to_s+'--'
          return @pub_year
        end
      end
    end
  end
  return nil
end
#get_three_digit_year(dates) ⇒ Object
get a 3 digit year like 965 from the date array
# File 'lib/stanford-mods/searchworks.rb', line 744
def get_three_digit_year dates
  dates.each do |f_date|
    matches=f_date.scan(/\d{3}/)
    if matches.length > 0
      return matches.first
    end
  end
  return nil
end
#get_u_year(dates) ⇒ Object
If a year has a “u” in it, replace the u: a single trailing u becomes 0 (198u becomes 1980), while two trailing u's become dashes (19uu becomes 19--).
# File 'lib/stanford-mods/searchworks.rb', line 703
def get_u_year dates
  dates.each do |f_date|
    # Single digit u notation
    matches = f_date.scan(/\d{3}u/)
    if matches.length == 1
      return matches.first.gsub('u','0')
    end
    # Double digit u notation
    matches = f_date.scan(/\d{2}u{2}/)
    if matches.length == 1
      return matches.first.gsub('u','-')
    end
  end
  return nil
end
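The two u notations give different results, which a short sketch makes visible. resolve_u_year is a hypothetical single-string variant of the listing above, not part of the gem.

```ruby
# One unknown digit is rounded down to the decade; two unknown digits
# become the century pattern used elsewhere in this class.
def resolve_u_year(date)
  single = date.scan(/\d{3}u/)
  return single.first.gsub('u', '0') if single.length == 1
  double = date.scan(/\d{2}u{2}/)
  return double.first.gsub('u', '-') if double.length == 1
  nil
end

resolve_u_year('198u') # => "1980"
resolve_u_year('19uu') # => "19--"
resolve_u_year('1865') # => nil
```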
#includes_marc_relator_collector_role?(role_node) ⇒ Boolean
Returns true if there is a MARC relator collector role assigned.
# File 'lib/stanford-mods/name.rb', line 79
def includes_marc_relator_collector_role?(role_node)
  (role_node.authority.include?('marcrelator') && role_node.value.include?('Collector')) ||
    role_node.roleTerm.valueURI.first == COLLECTOR_ROLE_URI
end
#is_date?(object) ⇒ Boolean
# File 'lib/stanford-mods/searchworks.rb', line 409
def is_date?(object)
  true if Date.parse(object) rescue false
end
#is_number?(object) ⇒ Boolean
# File 'lib/stanford-mods/searchworks.rb', line 406
def is_number?(object)
  true if Integer(object) rescue false
end
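Both predicates rely on a conversion raising an exception for bad input. The sketch below shows the same idea with explicit rescue clauses instead of the rescue modifier; number? and date? are hypothetical names chosen for illustration.

```ruby
require 'date'

# Integer() raises ArgumentError for "4a" and TypeError for nil.
def number?(object)
  Integer(object)
  true
rescue ArgumentError, TypeError
  false
end

# Date.parse raises an ArgumentError (or a subclass) for unparseable strings.
def date?(object)
  Date.parse(object)
  true
rescue ArgumentError, TypeError
  false
end

number?('42')       # => true
number?('4a')       # => false
date?('2014-06-01') # => true
date?('not a date') # => false
```

Naming the exception classes keeps unrelated errors from being silently turned into false.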
#location ⇒ Object
return entire contents of physicalLocation (note: single valued)
but only if it has series, accession, box or folder data
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?
# File 'lib/stanford-mods/physical_location.rb', line 55
def location
  # _location.physicalLocation should find top level and relatedItem
  loc = @mods_ng_xml._location.physicalLocation.map do |node|
    node.text if node.text.match(/.*(Series)|(Accession)|(Folder)|(Box).*/i)
  end.compact

  # There should only be one location
  loc.first
end
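The filter in location keeps the first physicalLocation text that mentions a series, accession, folder or box. A minimal sketch on plain strings follows; shelf_location is a hypothetical helper and the sample texts are invented.

```ruby
# Same pattern as in the listing; a match on any of the four words is enough.
LOCATION_PATTERN = /.*(Series)|(Accession)|(Folder)|(Box).*/i

# Hypothetical helper: first text with shelving data, or nil.
def shelf_location(texts)
  texts.find { |text| text.match(LOCATION_PATTERN) }
end

shelf_location(['Stanford University Libraries',
                'Call Number: SC0340, Accession 1986-052, Box 18'])
# => "Call Number: SC0340, Accession 1986-052, Box 18"
shelf_location(['Special Collections']) # => nil
```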
#main_author_w_date ⇒ String
The first encountered <mods><name> element with a marcrelator-flavor role of ‘Creator’ or ‘Author’. If there is no marcrelator ‘Creator’ or ‘Author’, the first name without a role. If there is no name without a role, then nil. See Mods::Record.name in nom_terminology for details on the display_value algorithm.
# File 'lib/stanford-mods/name.rb', line 16
def main_author_w_date
  result = nil
  first_wo_role = nil
  @mods_ng_xml.plain_name.each { |n|
    if n.role.size == 0
      first_wo_role ||= n
    end
    n.role.each { |r|
      if r.authority.include?('marcrelator') &&
            (r.value.include?('Creator') || r.value.include?('Author'))
        result ||= n.display_value_w_date
      end
    }
  }
  if !result && first_wo_role
    result = first_wo_role.display_value_w_date
  end
  result
end
#main_author_w_date_test ⇒ Object
# File 'lib/stanford-mods/searchworks.rb', line 99
def main_author_w_date_test
  result = nil
  first_wo_role = nil
  self.plain_name.each { |n|
    if n.role.size == 0
      first_wo_role ||= n
    end
    n.role.each { |r|
      if r.authority.include?('marcrelator') &&
            (r.value.include?('Creator') || r.value.include?('Author'))
        result ||= n.display_value_w_date
      end
    }
  }
  if !result && first_wo_role
    result = first_wo_role.display_value_w_date
  end
  result
end
#non_collector_person_authors ⇒ Object
FIXME: this is broken if there are multiple role codes and some of them are not marcrelator
# File 'lib/stanford-mods/name.rb', line 51
def non_collector_person_authors
  result = []
  @mods_ng_xml.personal_name.map do |n|
    next if n.role.size.zero?
    n.role.each { |r|
      result << n.display_value_w_date unless includes_marc_relator_collector_role?(r)
    }
  end
  result unless result.empty?
end
#parse_dates_from_originInfo ⇒ Object
Populate @dates_marc_encoding and @dates_no_marc_encoding from dateIssued and dateCreated tags from origin_info with and without encoding=marc
# File 'lib/stanford-mods/searchworks.rb', line 807
def parse_dates_from_originInfo
  @dates_marc_encoding = []
  @dates_no_marc_encoding = []
  self.origin_info.dateIssued.each { |di|
    if di.encoding == "marc"
      @dates_marc_encoding << di.text
    else
      @dates_no_marc_encoding << di.text
    end
  }
  self.origin_info.dateCreated.each { |dc|
    if dc.encoding == "marc"
      @dates_marc_encoding << dc.text
    else
      @dates_no_marc_encoding << dc.text
    end
  }
end
#place ⇒ Object
—- PUBLICATION (place, year) —-
# File 'lib/stanford-mods/searchworks.rb', line 383
def place
  vals = self.term_values([:origin_info,:place,:placeTerm])
  vals
end
#point_bbox ⇒ Object
# File 'lib/stanford-mods/geo_spatial.rb', line 15
def point_bbox
  coordinates.map do |n|
    matches = n.match(/^\(([^)]+)\)\.?$/)
    next unless matches
    coord_to_bbox(matches[1])
  end.compact
end
#pub_date ⇒ String
The year the object was published, filtered based on max_pub_date and min_pub_date from the config file
# File 'lib/stanford-mods/searchworks.rb', line 469
def pub_date
  pub_year || nil
end
#pub_date_display ⇒ String
For the date display only, the first place to look is in the dates without encoding=marc array. If no such dates, select the first date in the dates_marc_encoding array. Otherwise return nil
# File 'lib/stanford-mods/searchworks.rb', line 391
def pub_date_display
  return dates_no_marc_encoding.first unless dates_no_marc_encoding.empty?
  return dates_marc_encoding.first unless dates_marc_encoding.empty?
  return nil
end
#pub_date_facet ⇒ Array[String]
Values for the pub date facet. This is less strict than the 4 year date requirements for pub_date
# File 'lib/stanford-mods/searchworks.rb', line 475
def pub_date_facet
  if pub_date
    if pub_date.start_with?('-')
      return (pub_date.to_i + 1000).to_s + ' B.C.'
    end
    if pub_date.include? '--'
      cent=pub_date[0,2].to_i
      cent+=1
      cent=cent.to_s+'th century'
      return cent
    else
      return pub_date
    end
  else
    nil
  end
end
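The facet label reverses the encodings produced by the year-parsing helpers. The following sketch takes the pub_date string as an argument; facet_label is a hypothetical name, and the logic mirrors the listing above rather than adding to it.

```ruby
# Turn an internal pub_date string into a human-readable facet value.
def facet_label(pub_date)
  return nil unless pub_date
  # Negative values are B.C. years stored with a 1000-year offset.
  return (pub_date.to_i + 1000).to_s + ' B.C.' if pub_date.start_with?('-')
  # "11--" style values are centuries; the suffix is always "th".
  return (pub_date[0, 2].to_i + 1).to_s + 'th century' if pub_date.include?('--')
  pub_date
end

facet_label('-700') # => "300 B.C."
facet_label('11--') # => "12th century"
facet_label('8--')  # => "9th century"
facet_label('1865') # => "1865"
```

Because the ordinal suffix is fixed, a value such as '20--' comes out as "21th century".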
#pub_date_sort ⇒ Object
Creates a date suitable for sorting. Guaranteed to be 4 digits or nil
# File 'lib/stanford-mods/searchworks.rb', line 454
def pub_date_sort
  pd=nil
  if pub_date
    pd=pub_date
    if pd.length == 3
      pd='0'+pd
    end
    pd=pd.gsub('--','00')
  end
  raise "pub_date_sort was about to return a non 4 digit value #{pd}!" if pd and pd.length !=4
  pd
end
#pub_dates ⇒ Array<String>
For the date indexing, sorting and faceting, the first place to look is in the dates with encoding=marc array. If that doesn’t exist, look in the dates without encoding=marc array. Otherwise return nil
# File 'lib/stanford-mods/searchworks.rb', line 400
def pub_dates
  return dates_marc_encoding unless dates_marc_encoding.empty?
  return dates_no_marc_encoding unless dates_no_marc_encoding.empty?
  return nil
end
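Display and indexing prefer opposite date lists. The sketch below models that preference with a hypothetical DateNode struct standing in for the dateIssued and dateCreated nodes; none of these names come from the gem.

```ruby
# Stand-in for a MODS date node: its text and its encoding attribute.
DateNode = Struct.new(:text, :encoding)

# Split node texts into [marc-encoded, everything else].
def split_by_marc_encoding(nodes)
  nodes.partition { |n| n.encoding == 'marc' }.map { |group| group.map(&:text) }
end

# Display prefers the non-marc dates, falling back to the marc ones.
def display_date(marc, other)
  other.first || marc.first
end

# Indexing, sorting and faceting prefer the marc dates.
def indexing_dates(marc, other)
  return marc unless marc.empty?
  return other unless other.empty?
  nil
end

nodes = [DateNode.new('1865', 'marc'), DateNode.new('circa 1865', nil)]
marc, other = split_by_marc_encoding(nodes)
display_date(marc, other)   # => "circa 1865"
indexing_dates(marc, other) # => ["1865"]
```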
#pub_year ⇒ String
Get the publish year from mods
# File 'lib/stanford-mods/searchworks.rb', line 415
def pub_year
  #use the cached year if there is one
  if @pub_year
    if @pub_year == ''
      return nil
    end
    return @pub_year
  end
  dates = pub_dates
  if dates
    pruned_dates = []
    dates.each do |f_date|
      #remove ? and []
      if (f_date.length == 4 && f_date.end_with?('?'))
        pruned_dates << f_date.gsub('?','0')
      else
        pruned_dates << f_date.gsub('?','').gsub('[','').gsub(']','')
      end
    end
    #try to find a date starting with the most normal date formats and progressing to more wonky ones
    @pub_year = get_plain_four_digit_year pruned_dates
    return @pub_year if @pub_year
    # Check for years in u notation, e.g., 198u
    @pub_year = get_u_year pruned_dates
    return @pub_year if @pub_year
    @pub_year = get_double_digit_century pruned_dates
    return @pub_year if @pub_year
    @pub_year = get_bc_year pruned_dates
    return @pub_year if @pub_year
    @pub_year = get_three_digit_year pruned_dates
    return @pub_year if @pub_year
    @pub_year = get_single_digit_century pruned_dates
    return @pub_year if @pub_year
  end
  @pub_year=''
  return nil
end
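Before any of the year parsers run, each date string is pruned of question marks and square brackets. That first step can be sketched on its own; prune_date is a hypothetical name for the per-string logic in the loop above.

```ruby
# A four-character date ending in "?" has its unknown digits zeroed;
# anything else simply loses "?", "[" and "]".
def prune_date(date)
  if date.length == 4 && date.end_with?('?')
    date.gsub('?', '0')
  else
    date.gsub('?', '').gsub('[', '').gsub(']', '')
  end
end

prune_date('186?')    # => "1860"
prune_date('18??')    # => "1800"
prune_date('[1865?]') # => "1865"
```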
#series ⇒ Object
return series/accession ‘number’ (note: single valued)
data in location/physicalLocation or in relatedItem/location/physicalLocation
so use _location to get the data from either one of them
TODO: should it be hierarchical series/box/folder?
# File 'lib/stanford-mods/physical_location.rb', line 69
def series
  # _location.physicalLocation should find top level and relatedItem
  series_num = @mods_ng_xml._location.physicalLocation.map do |node|
    val = node.text
    # feigenbaum uses 'Accession'
    match_data = val.match(/(?:(?:Series)|(?:Accession)):? ([^,|]+)/i)
    match_data[1].strip if match_data.present?
  end.compact

  # There should be only one series
  series_num.first
end
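The series pattern accepts either label and stops at the next comma or pipe. A minimal sketch on plain strings, with a hypothetical extract_series helper and invented sample values:

```ruby
# Same pattern as in the listing: "Series" or "Accession", an optional
# colon, one space, then everything up to the next comma or pipe.
SERIES_PATTERN = /(?:(?:Series)|(?:Accession)):? ([^,|]+)/i

# Hypothetical helper: series or accession value from one string, or nil.
def extract_series(text)
  match_data = text.match(SERIES_PATTERN)
  match_data && match_data[1].strip
end

extract_series('Call Number: SC0340, Accession 1986-052, Box 18') # => "1986-052"
extract_series('Series: 5 | Box 1')                               # => "5"
extract_series('Box 18')                                          # => nil
```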
#subject_all_search ⇒ Array<String>
Values are the contents of:
all subject subelements except subject/cartographic plus genre top level element
# File 'lib/stanford-mods/searchworks.rb', line 372
def subject_all_search
  vals = topic_search ? Array.new(topic_search) : []
  vals.concat(geographic_search) if geographic_search
  vals.concat(subject_other_search) if subject_other_search
  vals.concat(subject_other_subvy_search) if subject_other_subvy_search
  vals.empty? ? nil : vals
end
#subject_names ⇒ Object
convenience method for subject/name/namePart values (to avoid parsing the mods for the same thing multiple times)
# File 'lib/stanford-mods/searchworks.rb', line 652
def subject_names
  @subject_names ||= self.sw_subject_names
end
#subject_occupations ⇒ Object
convenience method for subject/occupation values (to avoid parsing the mods for the same thing multiple times)
# File 'lib/stanford-mods/searchworks.rb', line 657
def subject_occupations
  @subject_occupations ||= self.term_values([:subject, :occupation])
end
#subject_other_search ⇒ Array<String>
Values are the contents of:
subject/name
subject/occupation - no subelements
subject/titleInfo
# File 'lib/stanford-mods/searchworks.rb', line 341
def subject_other_search
  @subject_other_search ||= begin
    vals = subject_occupations ? Array.new(subject_occupations) : []
    vals.concat(subject_names) if subject_names
    vals.concat(subject_titles) if subject_titles
    vals.empty? ? nil : vals
  end
end
#subject_other_subvy_search ⇒ Array<String>
Values are the contents of:
subject/temporal
subject/genre
# File 'lib/stanford-mods/searchworks.rb', line 354
def subject_other_subvy_search
  @subject_other_subvy_search ||= begin
    vals = subject_temporal ? Array.new(subject_temporal) : []
    gvals = self.term_values([:subject, :genre])
    vals.concat(gvals) if gvals

    # print a message for any temporal encodings
    self.subject.temporal.each { |n|
      sw_logger.info("#{druid} has subject temporal element with untranslated encoding: #{n.to_xml}") if !n.encoding.empty?
    }

    vals.empty? ? nil : vals
  end
end
#subject_temporal ⇒ Object
convenience method for subject/temporal values (to avoid parsing the mods for the same thing multiple times)
# File 'lib/stanford-mods/searchworks.rb', line 662

def subject_temporal
  @subject_temporal ||= self.term_values([:subject, :temporal])
end
#subject_titles ⇒ Object
convenience method for subject/titleInfo values (to avoid parsing the mods for the same thing multiple times)
# File 'lib/stanford-mods/searchworks.rb', line 667

def subject_titles
  @subject_titles ||= self.sw_subject_titles
end
#subject_topics ⇒ Object
convenience method for subject/topic values (to avoid parsing the mods for the same thing multiple times)
# File 'lib/stanford-mods/searchworks.rb', line 672

def subject_topics
  @subject_topics ||= self.term_values([:subject, :topic])
end
#sw_addl_authors ⇒ Array<String>
Returns values for author_7xx_search field.
# File 'lib/stanford-mods/searchworks.rb', line 63 def end |
#sw_addl_titles ⇒ Array<String>
Returns all full titles except those that contain sw_short_title.
# File 'lib/stanford-mods/searchworks.rb', line 179

def sw_addl_titles
  full_titles.select { |s| s !~ Regexp.new(Regexp.escape(sw_short_title)) }
end
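The filter in sw_addl_titles is a substring test: the short title is escaped and used as an unanchored pattern, so any full title that contains it is dropped. A minimal sketch with plain arrays; the helper name additional_titles and the sample titles are assumptions for illustration:

```ruby
# Hypothetical stand-alone version of the filter in sw_addl_titles.
def additional_titles(full_titles, short_title)
  # Regexp.escape makes characters such as "?" or "(" match literally.
  pattern = Regexp.new(Regexp.escape(short_title))
  full_titles.reject { |title| title =~ pattern }
end

p additional_titles(['Hamlet : a tragedy', 'The tragedie of Hamlet', 'Amleto'], 'Hamlet')
```

Because Regexp.escape expects a String, this sketch (like the listing above) assumes a short title is present.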
#sw_corporate_authors ⇒ Array<String>
Returns values for author_corp_display.
# File 'lib/stanford-mods/searchworks.rb', line 79

def
  val = @mods_ng_xml.plain_name.select {|n| n.type_at == 'corporate'}.map { |n| n.display_value_w_date }
  val
end
#sw_full_title ⇒ String
Returns value for title_245_search, title_full_display.
# File 'lib/stanford-mods/searchworks.rb', line 129

def sw_full_title
  outer_nodes = @mods_ng_xml.title_info
  outer_node = outer_nodes ? outer_nodes.first : nil
  if outer_node
    nonSort = outer_node.nonSort.text.strip.empty? ? nil : outer_node.nonSort.text.strip
    title = outer_node.title.text.strip.empty? ? nil : outer_node.title.text.strip
    preSubTitle = nonSort ? [nonSort, title].compact.join(" ") : title
    preSubTitle.sub!(/:$/, '') if preSubTitle # remove trailing colon

    subTitle = outer_node.subTitle.text.strip
    preParts = subTitle.empty? ? preSubTitle : preSubTitle + " : " + subTitle
    preParts.sub!(/\.$/, '') if preParts # remove trailing period

    partName = outer_node.partName.text.strip unless outer_node.partName.text.strip.empty?
    partNumber = outer_node.partNumber.text.strip unless outer_node.partNumber.text.strip.empty?
    partNumber.sub!(/,$/, '') if partNumber # remove trailing comma
    if partNumber && partName
      parts = partNumber + ", " + partName
    elsif partNumber
      parts = partNumber
    elsif partName
      parts = partName
    end
    parts.sub!(/\.$/, '') if parts

    result = parts ? preParts + ". " + parts : preParts
    result += "." if !result.match(/[[:punct:]]$/)
    result.strip!
    result = nil if result.empty?
    result
  else
    nil
  end
end
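The assembly order in sw_full_title (nonSort and title, then " : " and the subtitle, then ". " and the part number and name, then a closing period unless the string already ends in punctuation) can be sketched with plain strings. This is a simplified illustration, not the gem's code: build_full_title and its keyword arguments are hypothetical, the inputs are strings or nil rather than MODS titleInfo nodes, and it returns nil for empty input.

```ruby
# Hypothetical stand-alone version of the assembly steps in sw_full_title.
def build_full_title(non_sort: nil, title: nil, sub_title: nil, part_number: nil, part_name: nil)
  # nonSort + title, with a trailing colon removed
  main = [non_sort, title].compact.join(' ').sub(/:$/, '')
  # subtitle is joined with " : "; a trailing period is then removed
  main = [main, sub_title].join(' : ') if sub_title
  main = main.sub(/\.$/, '')

  # part number (minus a trailing comma) and part name, joined with ", "
  parts = [part_number&.sub(/,$/, ''), part_name].compact.join(', ').sub(/\.$/, '')

  result = parts.empty? ? main : "#{main}. #{parts}"
  return nil if result.empty?

  # close with a period unless the title already ends in punctuation
  result += '.' unless result.match?(/[[:punct:]]$/)
  result.strip
end

puts build_full_title(title: 'Hamlet', sub_title: 'a tragedy')
puts build_full_title(non_sort: 'The', title: 'Olympics',
                      part_number: 'Part 1,', part_name: 'Ancient Greece')
```

The first call prints "Hamlet : a tragedy." and the second prints "The Olympics. Part 1, Ancient Greece."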
#sw_full_title_without_commas ⇒ Object
Deprecated in favor of sw_title_display.
Returns sw_full_title with a trailing comma removed.
# File 'lib/stanford-mods/searchworks.rb', line 201

def sw_full_title_without_commas
  result = self.sw_full_title
  result.sub!(/,$/, '') if result
  result
end
#sw_genre ⇒ Array[String]
return values for the genre facet in SearchWorks
# File 'lib/stanford-mods/searchworks.rb', line 612

def sw_genre
  val = []
  genres = self.term_values(:genre)
  if genres
    val << genres.map(&:capitalize)
    val.flatten! if !val.empty?
    if genres.include?('thesis') || genres.include?('Thesis')
      val << 'Thesis/Dissertation'
      val.delete 'Thesis'
    end
    conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
    if !(genres & conf_pub).empty?
      types = self.term_values(:typeOfResource)
      if types && types.include?('text')
        val << 'Conference proceedings'
        val.delete 'Conference publication'
      end
    end
  end
  val.uniq
end
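The genre mapping can be exercised without a MODS record. A minimal sketch, assuming the genre and typeOfResource values are passed in as plain arrays; the helper name genre_facet is hypothetical:

```ruby
# Hypothetical stand-alone version of the mapping rules in sw_genre.
def genre_facet(genres, types_of_resource = [])
  # capitalize upcases the first letter and downcases the rest
  val = genres.map(&:capitalize)

  if genres.include?('thesis') || genres.include?('Thesis')
    val << 'Thesis/Dissertation'
    val.delete('Thesis')
  end

  conf_pub = ['conference publication', 'Conference publication', 'Conference Publication']
  if !(genres & conf_pub).empty? && types_of_resource.include?('text')
    val << 'Conference proceedings'
    val.delete('Conference publication')
  end

  val.uniq
end

p genre_facet(['thesis'])
p genre_facet(['Conference Publication'], ['text'])
p genre_facet(['conference publication'], ['still image'])
```

A conference publication is only renamed to "Conference proceedings" when the record also has a typeOfResource of "text"; otherwise the capitalized genre is kept.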
#sw_geographic_search(sep = ' ') ⇒ Array<String>
Values are the contents of:
subject/geographic
subject/hierarchicalGeographic
subject/geographicCode (only include the translated value if it isn't already present from other mods geo fields)
# File 'lib/stanford-mods/searchworks.rb', line 217

def sw_geographic_search(sep = ' ')
  result = term_values([:subject, :geographic]) || []

  # hierarchicalGeographic has sub elements
  @mods_ng_xml.subject.hierarchicalGeographic.each { |hg_node|
    hg_vals = []
    hg_node.element_children.each { |e|
      hg_vals << e.text unless e.text.empty?
    }
    result << hg_vals.join(sep) unless hg_vals.empty?
  }

  trans_code_vals = @mods_ng_xml.subject.geographicCode.translated_value
  if trans_code_vals
    trans_code_vals.each { |val|
      result << val if !result.include?(val)
    }
  end

  result
end
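The three sources combined by sw_geographic_search can be illustrated with plain arrays. In this sketch the helper name geographic_values, its parameters, and the sample values are all assumptions; each hierarchicalGeographic element is represented as an array of its child texts:

```ruby
# Hypothetical stand-alone version of the merge in sw_geographic_search.
def geographic_values(geographic, hierarchical, translated_codes, sep = ' ')
  result = geographic.dup

  # one joined string per hierarchicalGeographic element
  hierarchical.each do |children|
    vals = children.reject(&:empty?)
    result << vals.join(sep) unless vals.empty?
  end

  # translated geographicCode values are added only if not already present
  translated_codes.each { |val| result << val unless result.include?(val) }
  result
end

p geographic_values(['Venezuela'],
                    [['Canada', 'British Columbia', 'Vancouver']],
                    ['Venezuela', 'United States'])
```

Here the translated code "Venezuela" is skipped because subject/geographic already supplied it, while "United States" is appended.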
#sw_impersonal_authors ⇒ Array<String>
return the display_value_w_date for all <mods><name> elements that do not have type=‘personal’
# File 'lib/stanford-mods/searchworks.rb', line 74

def
  @mods_ng_xml.plain_name.select {|n| n.type_at != 'personal'}.map { |n| n.display_value_w_date }
end
#sw_language_facet ⇒ Object
Includes languages known to SearchWorks; tries to error-correct when possible (e.g. when ISO-639 disagrees with the MARC standard).
# File 'lib/stanford-mods/searchworks.rb', line 13

def sw_language_facet
  result = []
  @mods_ng_xml.language.each { |n|
    # get languageTerm codes and add their translations to the result
    n.code_term.each { |ct|
      if ct..match(/^iso639/)
        begin
          vals = ct.text.split(/[,|\ ]/).reject {|x| x.strip.length == 0 }
          vals.each do |v|
            iso639_val = ISO_639.find(v.strip).english_name
            if SEARCHWORKS_LANGUAGES.has_value?(iso639_val)
              result << iso639_val
            else
              result << SEARCHWORKS_LANGUAGES[v.strip]
            end
          end
        rescue
          # TODO: this should be written to a logger
          p "Couldn't find english name for #{ct.text}"
        end
      else
        vals = ct.text.split(/[,|\ ]/).reject {|x| x.strip.length == 0 }
        vals.each do |v|
          result << SEARCHWORKS_LANGUAGES[v.strip]
        end
      end
    }

    # add languageTerm text values
    n.text_term.each { |tt|
      val = tt.text.strip
      result << val if val.length > 0 && SEARCHWORKS_LANGUAGES.has_value?(val)
    }

    # add language values that aren't in languageTerm subelement
    if n.languageTerm.size == 0
      result << n.text if SEARCHWORKS_LANGUAGES.has_value?(n.text)
    end
  }
  result.uniq
end
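A languageTerm code value may hold several codes separated by commas, pipes, or spaces, which is why the listing splits on /[,|\ ]/ before looking anything up. A minimal sketch of that splitting and lookup step, assuming a tiny stand-in table (SKETCH_LANGUAGES is hypothetical and is not the gem's SEARCHWORKS_LANGUAGES) and leaving out the ISO_639 lookup:

```ruby
# Hypothetical three-entry stand-in for a code-to-name table.
SKETCH_LANGUAGES = { 'eng' => 'English', 'fre' => 'French', 'ger' => 'German' }.freeze

# Splits a multi-code string and translates each code via the table.
# Unlike the listing above, unknown codes are dropped instead of adding nil.
def language_names(code_text, table = SKETCH_LANGUAGES)
  codes = code_text.split(/[,| ]/).map(&:strip).reject(&:empty?)
  codes.map { |code| table[code] }.compact.uniq
end

p language_names('eng, fre|ger')
p language_names('eng eng xxx')
```

Splitting "eng, fre|ger" leaves an empty string between the comma and the space, which is why empty pieces are rejected before the lookup.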
#sw_logger ⇒ Object
—- end PUBLICATION (place, year) —-
# File 'lib/stanford-mods/searchworks.rb', line 495

def sw_logger
  @logger ||= Logger.new(STDOUT)
end
#sw_main_author ⇒ String
Returns value for author_1xx_search field.
# File 'lib/stanford-mods/searchworks.rb', line 58 def end |
#sw_meeting_authors ⇒ Array<String>
Returns values for author_meeting_display.
# File 'lib/stanford-mods/searchworks.rb', line 85

def
  @mods_ng_xml.plain_name.select {|n| n.type_at == 'conference'}.map { |n| n.display_value_w_date }
end
#sw_person_authors ⇒ Array<String>
Returns values for author_person_facet, author_person_display.
# File 'lib/stanford-mods/searchworks.rb', line 68

def
  personal_names_w_dates
end
#sw_short_title ⇒ String
Returns value for title_245a_search field.
# File 'lib/stanford-mods/searchworks.rb', line 124

def sw_short_title
  short_titles ? short_titles.first : nil
end
#sw_sort_author ⇒ String
Returns a sortable version of the main author: the main author value followed by the sorting title, which is the MODS approximation of the value created for a MARC record.
# File 'lib/stanford-mods/searchworks.rb', line 93

def
  # substitute java Character.MAX_CODE_POINT for nil main_author so missing main authors sort last
  val = '' + ( ? : "\u{10FFFF} ") + ( sort_title ? sort_title : '')
  val.gsub(/[[:punct:]]*/, '').strip
end
#sw_sort_title ⇒ String
Returns a sortable version of the main title
# File 'lib/stanford-mods/searchworks.rb', line 185

def sw_sort_title
  # get nonSort piece
  outer_nodes = @mods_ng_xml.title_info
  outer_node = outer_nodes ? outer_nodes.first : nil
  if outer_node
    nonSort = outer_node.nonSort.text.strip.empty? ? nil : outer_node.nonSort.text.strip
  end

  val = '' + ( sw_full_title ? sw_full_title : '')
  val.sub!(Regexp.new("^" + Regexp.escape(nonSort)), '') if nonSort
  val.gsub!(/[[:punct:]]*/, '').strip
  val.squeeze(" ").strip
end
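The sort title is the full title with the leading nonSort text removed, punctuation stripped, and runs of spaces collapsed. A minimal sketch on plain strings; the helper name sort_title_for is hypothetical, and it anchors the nonSort match with \A rather than the "^" used in the listing:

```ruby
# Hypothetical stand-alone version of the steps in sw_sort_title.
def sort_title_for(full_title, non_sort = nil)
  val = full_title.to_s
  # drop the leading nonSort text (e.g. "The"), matched literally
  val = val.sub(/\A#{Regexp.escape(non_sort)}/, '') if non_sort
  # strip punctuation, collapse repeated spaces, trim the ends
  val.gsub(/[[:punct:]]/, '').squeeze(' ').strip
end

puts sort_title_for('The Olympics. Part 1, Ancient Greece.', 'The')
```

This prints "Olympics Part 1 Ancient Greece".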
#sw_subject_names(sep = ', ') ⇒ Array<String>
Values are the contents of:
subject/name/namePart
"Values from namePart subelements should be concatenated in the order they appear (e.g. "Shakespeare, William, 1564-1616")"
# File 'lib/stanford-mods/searchworks.rb', line 244

def sw_subject_names(sep = ', ')
  result = []
  @mods_ng_xml.subject.name_el.select { |n_el| n_el.namePart }.each { |name_el_w_np|
    parts = name_el_w_np.namePart.map { |npn| npn.text unless npn.text.empty? }.compact
    result << parts.join(sep).strip unless parts.empty?
  }
  result
end
#sw_subject_titles(sep = ' ') ⇒ Array<String>
Values are the contents of:
subject/titleInfo/(subelements)
# File 'lib/stanford-mods/searchworks.rb', line 257

def sw_subject_titles(sep = ' ')
  result = []
  @mods_ng_xml.subject.titleInfo.each { |ti_el|
    parts = ti_el.element_children.map { |el| el.text unless el.text.empty? }.compact
    result << parts.join(sep).strip unless parts.empty?
  }
  result
end
#sw_title_display ⇒ String
like sw_full_title without trailing ,/;:. spec from solrmarc-sw sw_index.properties
title_display = custom, removeTrailingPunct(245abdefghijklmnopqrstuvwxyz, [\\\\,/;:], ([A-Za-z]{4}|[0-9]{3}|\\)|\\,))
# File 'lib/stanford-mods/searchworks.rb', line 168

def sw_title_display
  result = sw_full_title ? sw_full_title : nil
  if result
    result.sub!(/[\.,;:\/\\]+$/, '')
    result.strip!
  end
  result
end
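The trailing-punctuation rule in sw_title_display removes any run of periods, commas, semicolons, colons, slashes, and backslashes at the end of the title, then trims whitespace. A minimal sketch on plain strings; the helper name title_display_for and the sample titles are assumptions for illustration:

```ruby
# Hypothetical stand-alone version of the cleanup in sw_title_display.
def title_display_for(full_title)
  return nil unless full_title

  # "+" removes a whole run of trailing punctuation, not just one character
  full_title.sub(/[\.,;:\/\\]+$/, '').strip
end

puts title_display_for('Hamlet : a tragedy.')
puts title_display_for('Letters /')
puts title_display_for('Who?')
```

Question marks and other punctuation outside the character class are left in place, so "Who?" is returned unchanged.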
#topic_facet ⇒ Array<String>
Values are the contents of:
subject/topic
subject/name
subject/title
subject/occupation
with trailing comma, semicolon, and backslash (and any preceding spaces) removed
# File 'lib/stanford-mods/searchworks.rb', line 285

def topic_facet
  vals = subject_topics ? Array.new(subject_topics) : []
  vals.concat(subject_names) if subject_names
  vals.concat(subject_titles) if subject_titles
  vals.concat(subject_occupations) if subject_occupations
  vals.map! { |val|
    v = val.sub(/[\\,;]$/, '')
    v.strip
  }
  vals.empty? ? nil : vals
end
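The cleanup applied to each facet value removes a single trailing backslash, comma, or semicolon and then strips surrounding whitespace. A minimal sketch on plain strings; the helper name clean_facet_values and the sample values are assumptions for illustration:

```ruby
# Hypothetical stand-alone version of the cleanup step in topic_facet.
def clean_facet_values(vals)
  # the pattern has no "+", so only one trailing character is removed
  cleaned = vals.map { |val| val.sub(/[\\,;]$/, '').strip }
  cleaned.empty? ? nil : cleaned
end

p clean_facet_values(['Pulmonary fibrosis ;', 'Cats,', 'Dogs\\'])
p clean_facet_values(['Birds;;'])
```

The second call shows the single-character behavior: "Birds;;" becomes "Birds;", not "Birds".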
#topic_search ⇒ Array<String>
Values are the contents of:
mods/genre
mods/subject/topic
# File 'lib/stanford-mods/searchworks.rb', line 270

def topic_search
  @topic_search ||= begin
    vals = self.term_values(:genre) || []
    vals.concat(subject_topics) if subject_topics
    vals.empty? ? nil : vals
  end
end