Class: Traject::HorizonBibAuthMerge

Inherits:
Object
  • Object
show all
Defined in:
lib/traject/horizon_bib_auth_merge.rb

Overview

Merges ‘bib text’ and ‘auth text’ lines from Horizon, using bib text as template when neccesary.

merged_str = HorizonBibAuthMerge.new(tag, bib_text_str, auth_text_str).merge!

Strings passed in may be mutated for efficiency. So you can only call merge! once, it’s just utility.

Constant Summary collapse

@@up_to_subfield_t_re =
/\A(.*)\x1Ft/

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(tag, bibtext, authtext) ⇒ HorizonBibAuthMerge

Pass in bibtext and authtext as String – you probably need to get column values from JDBC as bytes and then use String.from_java_bytes to avoid messing up possible Marc8 encoding.

bibtext is either text or longtext column from fullbib, preferring longtext. authtext is either xref_text or xref_longtext from fullbib, preferring xref_longtext.



22
23
24
25
26
27
28
29
30
31
32
# File 'lib/traject/horizon_bib_auth_merge.rb', line 22

def initialize(tag, bibtext, authtext)
  @merged = false

  @tag      = tag
  @bibtext  = bibtext
  @authtext = authtext

  # remove terminal MARC Field Terminator if present.
  @bibtext.chomp!("\x1E") if @bibtext
  @authtext.chomp!("\x1E") if @authtext
end

Instance Attribute Details

#authtextObject (readonly)

Returns the value of attribute authtext.



11
12
13
# File 'lib/traject/horizon_bib_auth_merge.rb', line 11

def authtext
  @authtext
end

#bibtextObject (readonly)

Returns the value of attribute bibtext.



11
12
13
# File 'lib/traject/horizon_bib_auth_merge.rb', line 11

def bibtext
  @bibtext
end

#tagObject (readonly)

Returns the value of attribute tag.



11
12
13
# File 'lib/traject/horizon_bib_auth_merge.rb', line 11

def tag
  @tag
end

Instance Method Details

#merge!Object

Returns merged string, composed of a marc ‘field’, with subfields seperated by seperator control chars. Does not include terminal MARC Field Seperator.

Will mutate bibtext and authtext for efficiency.

Raises:

  • (Exception)


39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
# File 'lib/traject/horizon_bib_auth_merge.rb', line 39

def merge!
  raise Exception.new("Can only call `merge!` once, already called.") if @merged
  @merged = true

  # just one? (Or neither?) Just return it.
  return authtext if bibtext.nil?
  return bibtext  if authtext.nil?


  # For 240 and 243, it seems that anything before the first $t should
  # be ignored in authtext template -- we need to actually remove it, 
  # so later when we append any leftover fields, we don't get those. 
  if tag == '240' || tag == '243'
    authtext.sub!(@@up_to_subfield_t_re, "\x1Ft")
  end


  # We need to do a crazy combination of template in text with values in authtext.
  # horizon, you so crazy. text template is like:
  #"\x1Fa.\x1Fp ;\x1Fv81."
  # which means each subfield after the \x1F, merge in
  # the subfield value from the auth record if it's present,
  # otherwise don't.
  #
  # plus some weird as hell stuff with punctuation and spaces, I can't
  # even explain it, just trial and error'd it comparing to marcout.
  bibtext.gsub!(/\x1F([^\x1F\x1E])( ?)([[:punct:] ]*)/) do

    subfield       = $1
    space          = $2
    maybe_punct    = $3


    # okay this is crazy hacky reverse engineering, I don't really
    # know what's going on but for 240 and 243, 'a' in template
    # is filled by 't' in auth tag.
    auth_subfield = if subfield == "a" && (tag == "240" || tag == "243")
      "t"
    else
      subfield
    end

    # Find substitute fill-in value from authtext, if it can
    # be found -- first subfield indicated. Then we REMOVE
    # it from authtext, so next time this subfield is asked for,
    # subsequent subfield with that code will be used.
    substitute = nil
    authtext.sub!(/\x1F#{Regexp.escape auth_subfield}([^\x1F\x1E]*)/) do
      substitute = $1
      ''
    end

    if substitute


      # Dealing with punctuation is REALLY CONFUSING -- reverse engineering
      # HIP/Horizon, which does WEIRD THINGS.
      # But we seem to have arrived at something that appears to match all cases
      # we can find of what HIP/Horizon does.
      #
      # If the auth value already ends up with the same punctuation from the template,
      # _leave it alone_ -- including preserving all spaces near the punct in the auth
      # value.
      #
      # Otherwise, remove all punct from the auth value, then add in the punct from the template,
      # along with any spaces before the punct in the template.
      if maybe_punct && maybe_punct.length > 0
        # remove all punctuation from end of auth value? to use punct from template instead?
        # But preserve initial spaces from template? Unless it already ends
        # with the punctuation, in which case don't touch it, to avoid
        # messing up spaces? WEIRD, yeah.
        unless substitute.end_with? maybe_punct
          substitute.gsub!(/[[:punct:]]+\Z/, "")
          # This adding the #{space} back in, is consistent with what HIP does.
          # I have no idea if it's right or a bug in HIP, but being consistent.
          # neither leaving it in nor taking it out is exactly consistent with HznExportMarc,
          # which seems to have bugs.
          substitute << "#{space}#{maybe_punct}"
        end
      end

      "\x1F#{subfield}#{substitute}"
    else # just keep original, which has no maybe_punct
      "\x1F#{subfield}"
    end
  end

  # Sometimes there's leftover text at the end of authtext that wasn't 
  # included in the bibtext template. Horizon's marc reconstruction
  # seems to just include this on the end, we will too. 
  # Relies on 'prior to $t' fields being removed from 240 and 243 earlier,
  # to avoid including them when we shouldn't. 
  if authtext.length > 0
    bibtext << authtext
  end



  # We mutated bibtext to fill in template, now just return it.
  return bibtext
end