Class: Traject::HorizonBibAuthMerge
- Inherits: Object
  - Object
    - Traject::HorizonBibAuthMerge
- Defined in: lib/traject/horizon_bib_auth_merge.rb
Overview
Constant Summary
- @@up_to_subfield_t_re = /\A(.*)\x1Ft/
  Used by #merge! for tags 240 and 243 to strip the part of authtext that precedes the $t subfield.
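As a quick illustration of the constant, the sketch below applies the same pattern (copied under the hypothetical top-level name UP_TO_SUBFIELD_T_RE) to an invented 240-style authority string, the way #merge! does with sub!. One detail worth knowing: because .* is greedy, the match extends to the last \x1Ft in the string, not to the first one that the comment in #merge! talks about; a lazy .*? would stop at the first.

```ruby
# Copy of @@up_to_subfield_t_re under a hypothetical top-level name.
UP_TO_SUBFIELD_T_RE = /\A(.*)\x1Ft/

# Invented authority text: a $a name precedes the $t title.
auth = "\x1FaBeethoven, Ludwig van,\x1FtSymphonies,\x1FlGerman"

# merge! replaces the matched prefix with a bare "\x1Ft" delimiter.
stripped = auth.sub(UP_TO_SUBFIELD_T_RE, "\x1Ft")
# stripped == "\x1FtSymphonies,\x1FlGerman"

# With two $t subfields, greedy .* drops everything up to the LAST one.
two_t  = "\x1Fafoo\x1Ftfirst\x1Ftsecond"
greedy = two_t.sub(UP_TO_SUBFIELD_T_RE, "\x1Ft")
# greedy == "\x1Ftsecond"

# A lazy quantifier stops at the FIRST $t instead.
lazy = two_t.sub(/\A(.*?)\x1Ft/, "\x1Ft")
# lazy == "\x1Ftfirst\x1Ftsecond"
```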
Instance Attribute Summary
- #authtext ⇒ Object (readonly)
  Returns the value of attribute authtext.
- #bibtext ⇒ Object (readonly)
  Returns the value of attribute bibtext.
- #tag ⇒ Object (readonly)
  Returns the value of attribute tag.
Instance Method Summary
- #initialize(tag, bibtext, authtext) ⇒ HorizonBibAuthMerge (constructor)
  Pass in bibtext and authtext as Strings. You probably need to get the column values from JDBC as bytes and then use String.from_java_bytes, to avoid mangling any MARC-8 encoding.
- #merge! ⇒ Object
  Returns the merged string, composed of a MARC ‘field’, with subfields separated by separator control characters.
Constructor Details
#initialize(tag, bibtext, authtext) ⇒ HorizonBibAuthMerge
Pass in bibtext and authtext as Strings. You probably need to get the column values from JDBC as bytes and then use String.from_java_bytes, to avoid mangling any MARC-8 encoding.
bibtext is either the text or the longtext column from fullbib, preferring longtext. authtext is either xref_text or xref_longtext from fullbib, preferring xref_longtext. The constructor strips a trailing MARC field terminator (\x1E) from each string with chomp!, so the caller's String objects are modified in place.
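JRuby's String.from_java_bytes builds a Ruby String straight from a Java byte array. As a sketch, assuming you are on MRI instead and already hold the column value as an array of byte values (the sample bytes below are invented), Array#pack("C*") gives a comparable byte-for-byte binary String, so MARC-8 data never passes through a UTF-8 decode:

```ruby
# Invented bytes for one subfield, as a database driver might return them.
# 0xE2 is the MARC-8 combining acute accent, which MARC-8 places before
# the letter it modifies; on its own it is not valid UTF-8.
raw_bytes = [0x1F, 0x61, 0x43, 0x61, 0x66, 0xE2, 0x65, 0x1E]

# pack("C*") turns each integer into one byte of a binary (ASCII-8BIT)
# String; no character decoding or transcoding takes place.
bibtext = raw_bytes.pack("C*")

bibtext.encoding == Encoding::BINARY  # true
bibtext.bytes == raw_bytes            # true, nothing was altered

# Interpreting the same bytes as UTF-8 does not yield valid text.
bibtext.dup.force_encoding(Encoding::UTF_8).valid_encoding?  # false
```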
```ruby
# File 'lib/traject/horizon_bib_auth_merge.rb', line 22

def initialize(tag, bibtext, authtext)
  @merged = false

  @tag = tag
  @bibtext = bibtext
  @authtext = authtext

  # remove terminal MARC Field Terminator if present.
  @bibtext.chomp!("\x1E") if @bibtext
  @authtext.chomp!("\x1E") if @authtext
end
```
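Because the constructor keeps references to the caller's String objects and then calls chomp! on them, the caller sees the change too. The class below, HypotheticalInit, is an illustrative stand-in that copies the constructor body so the effect can be run without the traject gem installed:

```ruby
# Hypothetical stand-in that copies HorizonBibAuthMerge#initialize.
class HypotheticalInit
  attr_reader :tag, :bibtext, :authtext

  def initialize(tag, bibtext, authtext)
    @merged = false
    @tag = tag
    @bibtext = bibtext     # same object as the caller's, no copy
    @authtext = authtext
    @bibtext.chomp!("\x1E") if @bibtext
    @authtext.chomp!("\x1E") if @authtext
  end
end

bib  = +"\x1Fa.\x1E"            # unary + gives an unfrozen String
auth = +"\x1FaSymphonies\x1E"
obj  = HypotheticalInit.new("240", bib, auth)
# bib == "\x1Fa." : the caller's own String lost its terminator
# obj.bibtext.equal?(bib) is true

# chomp! removes only one trailing terminator, and nil inputs are skipped.
twice = +"\x1FaX\x1E\x1E"
other = HypotheticalInit.new("100", twice, nil)
# twice == "\x1FaX\x1E" and other.authtext is nil
```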
Instance Attribute Details
#authtext ⇒ Object (readonly)
Returns the value of attribute authtext.
```ruby
# File 'lib/traject/horizon_bib_auth_merge.rb', line 11

def authtext
  @authtext
end
```
#bibtext ⇒ Object (readonly)
Returns the value of attribute bibtext.
```ruby
# File 'lib/traject/horizon_bib_auth_merge.rb', line 11

def bibtext
  @bibtext
end
```
#tag ⇒ Object (readonly)
Returns the value of attribute tag.
```ruby
# File 'lib/traject/horizon_bib_auth_merge.rb', line 11

def tag
  @tag
end
```
Instance Method Details
#merge! ⇒ Object
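Returns the merged string, composed of a MARC ‘field’, with subfields separated by separator control characters (\x1F). Does not include the terminal MARC field terminator (\x1E).
Mutates bibtext and authtext in place for efficiency, so it can only be called once per instance; a second call raises an exception.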
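The core idea is that bibtext acts as a template of subfield codes whose values come from authtext. The hypothetical fill_template method below is a simplified sketch of just that step: it reuses the same two regular expressions as #merge!, but leaves out the punctuation rules and the 240/243 special cases, and it works on copies instead of mutating its arguments.

```ruby
# Hypothetical, simplified core of merge!: each subfield code in the bib
# template is filled from the first remaining auth subfield with that code.
# Omitted on purpose: punctuation rules and the 240/243 special cases.
def fill_template(bibtext, authtext)
  auth = authtext.dup

  filled = bibtext.gsub(/\x1F([^\x1F\x1E])( ?)([[:punct:] ]*)/) do
    code  = $1 # capture now; the inner sub! below overwrites $~
    value = nil
    # Remove the first matching auth subfield, so a repeated code in
    # the template picks up the next occurrence of that code.
    auth.sub!(/\x1F#{Regexp.escape code}([^\x1F\x1E]*)/) do
      value = $1
      ""
    end
    # Template spaces/punctuation are simply dropped in this sketch;
    # a code with no auth value is kept as an empty subfield.
    "\x1F#{code}#{value}"
  end

  # Auth subfields the template never asked for are appended at the end.
  filled << auth
end

template = "\x1Fb\x1Fa\x1Fa\x1Fz"
auth     = "\x1Fafirst\x1Fathird\x1Fbsecond\x1Fcextra"
result   = fill_template(template, auth)
# result == "\x1Fbsecond\x1Fafirst\x1Fathird\x1Fz\x1Fcextra"
```

The two $a codes in the template consume the two $a values in order, $z stays as an empty subfield because the auth text has no $z, and the unused $c is appended at the end, mirroring the leftover-text handling in #merge!.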
```ruby
# File 'lib/traject/horizon_bib_auth_merge.rb', line 39

def merge!
  raise Exception.new("Can only call `merge!` once, already called.") if @merged
  @merged = true

  # just one? (Or neither?) Just return it.
  return authtext if bibtext.nil?
  return bibtext if authtext.nil?

  # For 240 and 243, it seems that anything before the first $t should
  # be ignored in authtext template -- we need to actually remove it,
  # so later when we append any leftover fields, we don't get those.
  if tag == '240' || tag == '243'
    authtext.sub!(@@up_to_subfield_t_re, "\x1Ft")
  end

  # We need to do a crazy combination of template in text with values in authtext.
  # horizon, you so crazy. text template is like:
  #"\x1Fa.\x1Fp ;\x1Fv81."
  # which means each subfield after the \x1F, merge in
  # the subfield value from the auth record if it's present,
  # otherwise don't.
  #
  # plus some weird as hell stuff with punctuation and spaces, I can't
  # even explain it, just trial and error'd it comparing to marcout.
  bibtext.gsub!(/\x1F([^\x1F\x1E])( ?)([[:punct:] ]*)/) do
    subfield = $1
    space = $2
    maybe_punct = $3

    # okay this is crazy hacky reverse engineering, I don't really
    # know what's going on but for 240 and 243, 'a' in template
    # is filled by 't' in auth tag.
    auth_subfield = if subfield == "a" && (tag == "240" || tag == "243")
      "t"
    else
      subfield
    end

    # Find substitute fill-in value from authtext, if it can
    # be found -- first subfield indicated. Then we REMOVE
    # it from authtext, so next time this subfield is asked for,
    # subsequent subfield with that code will be used.
    substitute = nil
    authtext.sub!(/\x1F#{Regexp.escape auth_subfield}([^\x1F\x1E]*)/) do
      substitute = $1
      ''
    end

    if substitute
      # Dealing with punctuation is REALLY CONFUSING -- reverse engineering
      # HIP/Horizon, which does WEIRD THINGS.
      # But we seem to have arrived at something that appears to match all cases
      # we can find of what HIP/Horizon does.
      #
      # If the auth value already ends up with the same punctuation from the template,
      # _leave it alone_ -- including preserving all spaces near the punct in the auth
      # value.
      #
      # Otherwise, remove all punct from the auth value, then add in the punct from the template,
      # along with any spaces before the punct in the template.
      if maybe_punct && maybe_punct.length > 0
        # remove all punctuation from end of auth value? to use punct from template instead?
        # But preserve initial spaces from template? Unless it already ends
        # with the punctuation, in which case don't touch it, to avoid
        # messing up spaces? WEIRD, yeah.
        unless substitute.end_with? maybe_punct
          substitute.gsub!(/[[:punct:]]+\Z/, "")
          # This adding the #{space} back in, is consistent with what HIP does.
          # I have no idea if it's right or a bug in HIP, but being consistent.
          # neither leaving it in nor taking it out is exactly consistent with HznExportMarc,
          # which seems to have bugs.
          substitute << "#{space}#{maybe_punct}"
        end
      end

      "\x1F#{subfield}#{substitute}"
    else
      # just keep original, which has no maybe_punct
      "\x1F#{subfield}"
    end
  end

  # Sometimes there's leftover text at the end of authtext that wasn't
  # included in the bibtext template. Horizon's marc reconstruction
  # seems to just include this on the end, we will too.
  # Relies on 'prior to $t' fields being removed from 240 and 243 earlier,
  # to avoid including them when we shouldn't.
  if authtext.length > 0
    bibtext << authtext
  end

  # We mutated bibtext to fill in template, now just return it.
  return bibtext
end
```
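The punctuation branch inside the gsub! block is the least obvious part of #merge!. The hypothetical helper apply_template_punct below copies that branch's logic out of the block so it can be tried on its own. Here space and maybe_punct stand for the second and third capture groups of the template regex, and, unlike the original, the helper returns a new String instead of mutating substitute.

```ruby
# Hypothetical helper isolating the punctuation branch of merge!.
#   substitute  - value taken from the auth record
#   space       - the optional single space after the template's subfield code
#   maybe_punct - punctuation (and spaces) captured from the template
def apply_template_punct(substitute, space, maybe_punct)
  value = substitute.dup
  if maybe_punct && maybe_punct.length > 0
    # Already ends with the template's punctuation: leave it untouched.
    unless value.end_with?(maybe_punct)
      # Otherwise strip trailing punctuation, then append the
      # template's space (if any) followed by its punctuation.
      value.gsub!(/[[:punct:]]+\Z/, "")
      value << "#{space}#{maybe_punct}"
    end
  end
  value
end

apply_template_punct("Smith, John", "", ".")              # "Smith, John."
apply_template_punct("Beethoven, Ludwig van,", "", ".")   # "Beethoven, Ludwig van."
apply_template_punct("Symphonies,", " ", ";")             # "Symphonies ;"
apply_template_punct("Op. 55.", "", ".")                  # "Op. 55." (unchanged)
apply_template_punct("Eroica", " ", "")                   # "Eroica" (space dropped)
```

The last case reflects that, in #merge!, the captured space is only re-emitted together with template punctuation; with no punctuation the subfield becomes just "\x1F" plus code plus the auth value.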