Class: Traject::Macros::MarcFormatClassifier
- Inherits: Object
- Defined in: lib/traject/macros/marc_format_classifier.rb
Overview
Not actually a macro, but we’re keeping it here for now: a class for classifying MARC records according to format/genre/type.
VERY opinionated.
Instance Attribute Summary
-
#record ⇒ Object
readonly
Returns the value of attribute record.
Instance Method Summary
-
#formats(options = {}) ⇒ Object
A very opinionated method that just kind of jams together all the possible format/genre/types into one array of 1 to N elements.
-
#genre ⇒ Object
Returns 1 or more values in an array from: Book; Journal/Newspaper; Musical Score; Map/Globe; Non-musical Recording; Musical Recording; Image; Software/Data; Video/Film.
-
#initialize(marc_record) ⇒ MarcFormatClassifier
constructor
A new instance of MarcFormatClassifier.
-
#manuscript_archive? ⇒ Boolean
Marked as manuscript OR archive.
-
#microform? ⇒ Boolean
If field 007 byte 0 is ‘h’, that’s microform.
-
#normalized_gmd ⇒ Object
Downcased version of the GMD, or else an empty string.
-
#online? ⇒ Boolean
We use MARC 007 to determine if this represents an online resource.
-
#print? ⇒ Boolean
Algorithm with help from Chris Case.
-
#proceeding? ⇒ Boolean
Just checks all 6xx fields for a $v “Congresses”.
-
#thesis? ⇒ Boolean
Just checks if it has a 502; if it does, it’s considered a thesis.
Constructor Details
#initialize(marc_record) ⇒ MarcFormatClassifier
Returns a new instance of MarcFormatClassifier.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 22
    def initialize(marc_record)
      @record = marc_record
    end
Instance Attribute Details
#record ⇒ Object (readonly)
Returns the value of attribute record.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 20
    def record
      @record
    end
Instance Method Details
#formats(options = {}) ⇒ Object
A very opinionated method that just kind of jams together all the possible format/genre/types into one array of 1 to N elements.
If nothing else applies, the value of the :default option is used, which is “Other” unless overridden.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 30
    def formats(options = {})
      options = {:default => "Other"}.merge(options)

      formats = []
      formats.concat genre

      formats << "Manuscript/Archive" if manuscript_archive?
      formats << "Microform" if microform?
      formats << "Online" if online?

      # In our own data, if it's an audio recording, it might show up
      # as print, but it's probably not.
      formats << "Print" if print? && ! (formats.include?("Non-musical Recording") || formats.include?("Musical Recording"))

      # If it's a Dissertation, we decide it's NOT a book
      if thesis?
        formats.delete("Book")
        formats << "Dissertation/Thesis"
      end

      if proceeding?
        formats << "Conference"
      end

      if formats.empty?
        formats << options[:default]
      end

      return formats
    end
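The combination rules in #formats can be tried without a MARC record. The following is a minimal sketch, not the library's code: combine_formats is a hypothetical helper that takes the answers of the individual predicates as keyword arguments and applies the same ordering (audio suppresses "Print", a thesis replaces "Book", and the default is used only when nothing else matched).

```ruby
# Hypothetical stand-alone helper mirroring the rules of #formats.
# It receives precomputed answers instead of reading a MARC record.
def combine_formats(genre:, manuscript_archive: false, microform: false,
                    online: false, print: false, thesis: false,
                    proceeding: false, default: "Other")
  formats = genre.dup
  formats << "Manuscript/Archive" if manuscript_archive
  formats << "Microform" if microform
  formats << "Online" if online

  # Audio recordings are never labelled "Print", even if print is true.
  audio = formats.include?("Non-musical Recording") ||
          formats.include?("Musical Recording")
  formats << "Print" if print && !audio

  # A thesis is deliberately not counted as a book.
  if thesis
    formats.delete("Book")
    formats << "Dissertation/Thesis"
  end

  formats << "Conference" if proceeding
  formats << default if formats.empty?
  formats
end

p combine_formats(genre: ["Book"], print: true, thesis: true)
# => ["Print", "Dissertation/Thesis"]
p combine_formats(genre: ["Musical Recording"], print: true)
# => ["Musical Recording"]
p combine_formats(genre: [])
# => ["Other"]
```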
#genre ⇒ Object
Returns 1 or more values in an array from: Book; Journal/Newspaper; Musical Score; Map/Globe; Non-musical Recording; Musical Recording; Image; Software/Data; Video/Film.
Uses leader byte 6, leader byte 7, and 007 byte 0.
Gets actual labels from marc_genre_leader and marc_genre_007 translation maps, so you can customize labels if you want.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 72
    def genre
      marc_genre_leader = Traject::TranslationMap.new("marc_genre_leader")
      marc_genre_007 = Traject::TranslationMap.new("marc_genre_007")

      results = marc_genre_leader[ record.leader.slice(6,2) ] ||
        marc_genre_leader[ record.leader.slice(6)] ||
        record.find_all {|f| f.tag == "007"}.collect {|f| marc_genre_007[f.value.slice(0)]}

      [results].flatten
    end
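The lookup order in #genre (leader bytes 6 and 7 together, then leader byte 6 alone, then byte 0 of each 007) can be sketched with plain hashes. LEADER_MAP, F007_MAP, and genre_for below are hypothetical stand-ins with a few illustrative entries; they are not the contents of the shipped marc_genre_leader and marc_genre_007 translation maps.

```ruby
# Hypothetical, abbreviated stand-ins for the two translation maps.
LEADER_MAP = {
  "as" => "Journal/Newspaper", # two-byte key: leader bytes 6 and 7
  "a"  => "Book",              # one-byte key: leader byte 6 only
  "c"  => "Musical Score"
}.freeze
F007_MAP = { "v" => "Video/Film" }.freeze

# leader is the 24-character leader string; f007_values holds the
# values of all 007 fields in the record.
def genre_for(leader, f007_values)
  result = LEADER_MAP[leader[6, 2]] ||
           LEADER_MAP[leader[6]] ||
           f007_values.map { |value| F007_MAP[value[0]] }
  [result].flatten
end

p genre_for("00000nas a2200000 a 4500", [])            # => ["Journal/Newspaper"]
p genre_for("00000nam a2200000 a 4500", [])            # => ["Book"]
p genre_for("00000ngm a2200000 a 4500", ["vd cvaizq"]) # => ["Video/Film"]
```

Because the two-byte key is tried first, a more specific entry such as "as" wins over the generic "a" entry.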
#manuscript_archive? ⇒ Boolean
Marked as manuscript OR archive.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 157
    def manuscript_archive?
      leader06 = record.leader.slice(6)
      leader08 = record.leader.slice(8)

      # leader 6 t=Manuscript Language Material, d=Manuscript Music,
      # f=Manuscript Cartographic
      #
      # leader 06 = 'b' is obsolete, but if it exists it means archival control
      #
      # leader 08 'a'='archival control'
      %w{t d f b}.include?(leader06) || leader08 == "a"
    end
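The leader test above depends only on two character positions, so it can be exercised on bare leader strings. This is a small sketch under that assumption; manuscript_archive_leader? is a hypothetical name and the leader strings are made up for illustration.

```ruby
# Hypothetical helper: same test as above, applied to a leader string.
def manuscript_archive_leader?(leader)
  # Byte 6: t, d, f are manuscript types; b is the obsolete archival code.
  # Byte 8: a means archival control.
  %w[t d f b].include?(leader[6]) || leader[8] == "a"
end

p manuscript_archive_leader?("00000ntm a2200000 a 4500") # => true  (byte 6 is "t")
p manuscript_archive_leader?("00000namaa2200000 a 4500") # => true  (byte 8 is "a")
p manuscript_archive_leader?("00000nam a2200000 a 4500") # => false
```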
#microform? ⇒ Boolean
If field 007 byte 0 is ‘h’, that’s microform. But many of our microform records don’t have that. If leader byte 6 is ‘h’, that’s an obsolete way of saying microform. And finally, the 245$h GMD is checked for a leading “[microform]”.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 150
    def microform?
      normalized_gmd.start_with?("[microform]") ||
        record.leader['6'] == "h" ||
        record.find {|f| (f.tag == "007") && (f.value['0'] == "h")}
    end
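One detail in this listing is worth checking when reading it: Ruby's String#[] treats a String argument as a substring to search for, not as a position. Assuming record.leader and the 007 value are plain Strings (the other methods call slice on them), leader['6'] and value['0'] would not read byte 6 or byte 0. The snippet below only demonstrates that language behavior on a made-up leader string; it makes no claim about how the library is intended to behave.

```ruby
# Made-up leader whose byte 6 is "h" and which contains no "6" character.
LEADER = "00000nhm a2200000 a 4500"

by_substring = LEADER['6'] # String argument: looks for the text "6"
by_index     = LEADER[6]   # Integer argument: character at position 6

p by_substring # => nil
p by_index     # => "h"

# With a String argument, a hit returns the searched-for text itself.
p "h16"['6']   # => "6"
```

Under these assumptions, an index-based check would be written with an Integer, for example LEADER[6] == "h" or LEADER.slice(6) == "h".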
#normalized_gmd ⇒ Object
Downcased version of the GMD (245$h), or else an empty string.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 171
    def normalized_gmd
      @gmd ||= begin
        ((a245 = record['245']) && a245['h'] && a245['h'].downcase) || ""
      end
    end
#online? ⇒ Boolean
We use MARC 007 to determine if this represents an online resource, but sometimes resort to the 245$h GMD too.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 132
    def online?
      # field 007, byte 0 c="electronic" byte 1 r="remote" ==> sure Online
      found_007 = record.find do |field|
        field.tag == "007" && field.value.slice(0) == "c" && field.value.slice(1) == "r"
      end

      return true if found_007

      # Otherwise, if it has a GMD ["electronic resource"], we count it
      # as online only if NO 007[0] == 'c' exists, cause if it does we already
      # know it's electronic but not remote, otherwise first try would
      # have found it.
      return (normalized_gmd.start_with? "[electronic resource]") && ! record.find {|f| f.tag == '007' && f.value.slice(0) == "c"}
    end
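The two-step decision in #online? can be sketched as a function of the 007 values and the normalized GMD. online_from? is a hypothetical helper and the 007 strings are illustrative; only the first two bytes matter here.

```ruby
# Hypothetical helper: f007_values are the values of all 007 fields,
# gmd is the downcased 245$h (or "" when absent).
def online_from?(f007_values, gmd)
  # Step 1: any 007 starting with "cr" (electronic, remote) settles it.
  return true if f007_values.any? { |v| v[0] == "c" && v[1] == "r" }

  # Step 2: fall back to the GMD, but only when no 007 says "electronic";
  # an electronic 007 that is not remote means a local resource.
  gmd.start_with?("[electronic resource]") &&
    f007_values.none? { |v| v[0] == "c" }
end

p online_from?(["cr unu"], "")                       # => true
p online_from?(["co cga"], "[electronic resource]")  # => false
p online_from?([], "[electronic resource]")          # => true
p online_from?([], "")                               # => false
```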
#print? ⇒ Boolean
Algorithm with help from Chris Case.
- If it has any RDA 338, then it’s print if it has a value of volume, sheet, or card.
- If it does not have an RDA 338, it’s print if and only if it has NO 245$h GMD.
- Here at JH, for legacy reasons we also choose to not call it print if it’s already been marked audio, but we do that in a different method.
This algorithm is definitely going to get some things wrong in both directions, with real world data. But seems to be good enough.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 111
    def print?
      rda338 = record.find_all do |field|
        field.tag == "338" && field['2'] == "rdacarrier"
      end

      if rda338.length > 0
        rda338.find do |field|
          field.subfields.find do |sf|
            (sf.code == "a" && %w{volume card sheet}.include?(sf.value)) ||
              (sf.code == "b" && %w{nc no nb}.include?(sf.value))
          end
        end
      else
        normalized_gmd.length == 0
      end
    end
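The branch structure of #print? is easier to see on simplified data. In this sketch, Carrier is a hypothetical struct standing in for a 338 field with $2 rdacarrier, holding one $a term and one $b code; print_from? is likewise hypothetical. Unlike the listing, which returns the matching field or nil in the 338 branch, the sketch returns a strict true or false.

```ruby
# Hypothetical stand-in for a 338 field with $2 "rdacarrier".
Carrier = Struct.new(:a, :b)

def print_from?(carriers, gmd)
  if carriers.any?
    # With RDA carrier data present, only that data decides.
    carriers.any? do |c|
      %w[volume card sheet].include?(c.a) || %w[nc no nb].include?(c.b)
    end
  else
    # Without it, print means "no GMD at all".
    gmd.empty?
  end
end

p print_from?([Carrier.new("volume", "nc")], "[microform]")  # => true
p print_from?([Carrier.new("online resource", "cr")], "")    # => false
p print_from?([], "")                                        # => true
p print_from?([], "[sound recording]")                       # => false
```

The first two calls show that the GMD is ignored entirely once any RDA 338 is present.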
#proceeding? ⇒ Boolean
Just checks all 6xx fields for a $v “Congresses”.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 91
    def proceeding?
      @proceeding_q ||= begin
        ! record.find do |field|
          field.tag.slice(0) == '6' && field.subfields.find {|sf| sf.code == "v" && sf.value =~ /^\s*(C|c)ongresses\.?\s*$/}
        end.nil?
      end
    end
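The regular expression in the listing accepts the word alone, in either capitalization, with an optional trailing period and surrounding whitespace. A quick sketch of what it matches, using made-up $v values and a hypothetical congresses? helper:

```ruby
# Same pattern as in the listing above.
CONGRESSES = /^\s*(C|c)ongresses\.?\s*$/

def congresses?(value)
  !(value =~ CONGRESSES).nil?
end

p congresses?("Congresses.")                 # => true
p congresses?(" congresses ")                # => true
p congresses?("Congresses and conventions")  # => false
p congresses?("CONGRESSES")                  # => false
```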
#thesis? ⇒ Boolean
Just checks if it has a 502; if it does, it’s considered a thesis.
    # File 'lib/traject/macros/marc_format_classifier.rb', line 84
    def thesis?
      @thesis_q ||= begin
        ! record.find {|a| a.tag == "502"}.nil?
      end
    end