Module: MARC2Solr::Custom

Defined in:
lib/marc2solr/marc2solr_custom.rb

Constant Summary

LOG =
JLogger::RootLogger.new

Class Method Details

.as_marc_in_json(doc, r) ⇒ Object

Return the record serialized as MARC-in-JSON



# File 'lib/marc2solr/marc2solr_custom.rb', line 38

def self.as_marc_in_json doc, r
  return r.to_marc_in_json
end

.asMARC(doc, r) ⇒ Object

Return the record serialized as binary MARC



# File 'lib/marc2solr/marc2solr_custom.rb', line 31

def self.asMARC doc, r
  return r.to_marc
end

.asXML(doc, r) ⇒ String

The simplest possible example: just call a method on the underlying MARC4J4R record. Note that even though doc is unused here, the method signature has to accept both arguments.

Parameters:

  • doc (#[])

    The document object being added to: a hashlike (responds to #[]) that holds the computed values for fields “so far”, which allows you to leverage already-done work

  • r (MARC4J4R::Record)

    A MARC4J4R record

Returns:

  • (String)

    The XML representation of the record



# File 'lib/marc2solr/marc2solr_custom.rb', line 26

def self.asXML doc, r  # Remember, this is a module function! Define it with "def self.methodName"
  return r.to_xml
end
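
As a rough sketch of the calling convention described above, a custom function is a module-level method that accepts both the document and the record, even when it ignores one of them. The names `MyCustom`, `recordAsXML`, and `StubRecord` are hypothetical and exist only for illustration; in a real run, r would be a MARC4J4R record rather than this stand-in.

```ruby
# Hypothetical stand-in for a MARC4J4R record; only #to_xml is stubbed.
StubRecord = Struct.new(:xml) do
  def to_xml
    xml
  end
end

# Hypothetical module of custom functions.
module MyCustom
  # Defined with "def self." so it can be called as MyCustom.recordAsXML.
  # It takes (doc, r) even though doc is not used.
  def self.recordAsXML(doc, r)
    r.to_xml
  end
end

# A plain Hash stands in for the hashlike document.
MyCustom.recordAsXML({}, StubRecord.new('<record/>'))  # => "<record/>"
```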

.fieldWithoutIndexingChars(doc, r, tag) ⇒ Object

A simple function to pull the non-indexing characters off the front of a field based on the second indicator



# File 'lib/marc2solr/marc2solr_custom.rb', line 143

def self.fieldWithoutIndexingChars doc, r, tag
  vals = []
  r.find_by_tag(tag).each do |df|
    ind2 = df.ind2.to_i
    if ind2 > 0
      vals << df.value[ind2..-1]
    end
  end
  return vals
end
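
The slicing above can be tried without a MARC4J4R record. The following is a minimal sketch that assumes each field exposes `ind2` and `value` as plain strings; `StubField` and `strip_nonfiling` are hypothetical names, not part of marc2solr. Note that, as in the method above, a field whose second indicator is 0 contributes no value at all.

```ruby
# Hypothetical stand-in for a MARC4J4R data field.
StubField = Struct.new(:tag, :ind2, :value)

# Same slicing logic as fieldWithoutIndexingChars, applied to a plain array.
def strip_nonfiling(fields)
  fields.each_with_object([]) do |df, vals|
    skip = df.ind2.to_i
    # Drop the first `skip` characters; fields with ind2 of 0 are skipped.
    vals << df.value[skip..-1] if skip > 0
  end
end

TITLE_FIELDS = [
  StubField.new('245', '4', 'The Hobbit'),  # "The " is 4 non-indexing characters
  StubField.new('245', '0', 'Dune')         # ind2 of 0: produces no value
]

strip_nonfiling(TITLE_FIELDS)  # => ["Hobbit"]
```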

.getAllSearchableFields(doc, r, lower, upper) ⇒ String

Get all the text from fields whose tags fall between (inclusive) the two tag strings lower and upper

Parameters:

  • doc (hashlike)

    The document object being added to; allows you to leverage already-done work

  • r (MARC4J4R::Record)

    A MARC4J4R record

  • lower (String)

    The lowest tag you want to include

  • upper (String)

    The highest tag you want to include

Returns:

  • (String)

    A single string with all the text from included fields



# File 'lib/marc2solr/marc2solr_custom.rb', line 49

def self.getAllSearchableFields(doc, r, lower, upper)
  data = []
  r.each do |field|
    next unless field.tag <= upper and field.tag >= lower
    data << field.value
  end
  return data.join(' ')
end
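
The inclusive range test works because MARC tags are three-character, zero-padded strings, so comparing them as strings gives the same order as comparing them as numbers. A minimal sketch, assuming fields that expose `tag` and `value` as strings (`StubTagField` and `text_between` are hypothetical names):

```ruby
# Hypothetical stand-in for a MARC4J4R field.
StubTagField = Struct.new(:tag, :value)

# Keep fields whose tag is within [lower, upper] and join their text.
def text_between(fields, lower, upper)
  fields.select { |f| f.tag >= lower && f.tag <= upper }
        .map(&:value)
        .join(' ')
end

SAMPLE_FIELDS = [
  StubTagField.new('100', 'Tolkien, J. R. R.'),
  StubTagField.new('245', 'The Hobbit'),
  StubTagField.new('650', 'Fantasy fiction')
]

text_between(SAMPLE_FIELDS, '100', '245')  # => "Tolkien, J. R. R. The Hobbit"
```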

.getDate(doc, r) ⇒ String

An example of a DateOfPublication implementation

Parameters:

  • doc (hashlike)

    The document object being added to; allows you to leverage already-done work

  • r (MARC4J4R::Record)

    A MARC4J4R record

Returns:

  • (String, nil)

    The found date as a four-digit string, or nil if not found



# File 'lib/marc2solr/marc2solr_custom.rb', line 109

def self.getDate doc, r
  begin
    ohoh8 = r['008'].value
    date1 = ohoh8[7..10].downcase
    datetype = ohoh8[6..6]
    if ['n','u','b'].include? datetype
      date1 = ""
    else 
      date1 = date1.gsub('u', '0').gsub('|', ' ')
      date1 = '' if date1 == '0000'
    end

    if m = /^\d\d\d\d$/.match(date1)
      return m[0]
    end
  rescue
   # do nothing ... go on to the 260c
  end


  # No good? Fall back on the 260c
  begin
    d =  r['260']['c']
    if m = /\d\d\d\d/.match(d)
      return m[0]
    end
  rescue
    LOG.debug "Record #{r['001']} has no valid date"
    return nil
  end
end
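
To see what the 008 branch does with concrete input, here is a minimal sketch of that branch alone, taking the raw 008 string instead of a record. `year_from_008` is a hypothetical helper, the sample strings are made up, and the sketch assumes the string is long enough to contain positions 6 through 10.

```ruby
# Sketch of the 008 branch of getDate only (no 260c fallback).
def year_from_008(ohoh8)
  datetype = ohoh8[6..6]  # position 06: type of date
  # Date types the method above treats as having no usable date.
  return nil if %w[n u b].include?(datetype)

  # Positions 07-10 hold Date 1; unknown digits ("u") become 0.
  date1 = ohoh8[7..10].downcase.gsub('u', '0').gsub('|', ' ')
  return nil if date1 == '0000'

  date1 =~ /\A\d{4}\z/ ? date1 : nil
end

year_from_008('850423s1940    nyu')  # => "1940"
year_from_008('850423s19uu    nyu')  # => "1900"
year_from_008('850423n        nyu')  # => nil
```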

.getDateRange(date, r) ⇒ Object

A helper function – take in a year, and return a date category



# File 'lib/marc2solr/marc2solr_custom.rb', line 156

def self.getDateRange(date, r)
  if date < "1500"
    return "Pre-1500"
  end

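  case date.to_i
  when 1500..1800
    # Centuries: e.g. "1642" becomes "1600-1699"
    century = date[0..1]
    return century + '00-' + century + '99'
  when 1801..2100
    # Decades: e.g. "1987" becomes "1980-1989"
    decade = date[0..2]
    return decade + "0-" + decade + "9"
  else
    # Out-of-range years fall through and return nil
    # puts "getDateRange: #{r['001'].value} invalid date #{date}"
  end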
end
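
The decade bucketing relies on simple string slicing of a four-digit year. A minimal sketch of just that step, where `decade_range` is a hypothetical name and the input is assumed to be a four-character year string:

```ruby
# Build a decade label from the first three characters of the year.
def decade_range(year)
  decade = year[0..2]
  "#{decade}0-#{decade}9"
end

decade_range('1987')  # => "1980-1989"
decade_range('2003')  # => "2000-2009"
```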

.getISBNS(doc, r, codes = ['a', 'z']) ⇒ Object

Extract ISBNs from the given subfields of the 020 and provide both the 10-character and 13-digit versions of each. If a value does not appear to be an ISBN, just return the original value



# File 'lib/marc2solr/marc2solr_custom.rb', line 89

def self.getISBNS doc, r, codes=['a', 'z']
  rv = []
  r.find_by_tag('020').each do |f|
    f.sub_values(codes).each do |v|
      std = StdNum::ISBN.allNormalizedValues(v)
      if std.size > 0
        rv.concat std
      else
        rv << v
      end
    end
  end
  return rv
end
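
For readers unfamiliar with the two ISBN forms, the following sketch shows the standard ISBN-10 to ISBN-13 conversion in plain Ruby. It is not how StdNum::ISBN is implemented, and `isbn10_to_isbn13` is a hypothetical helper; the method above delegates all of this to StdNum::ISBN.allNormalizedValues.

```ruby
# Convert an ISBN-10 to ISBN-13: prefix "978" to the first nine digits,
# then append a check digit computed with alternating weights 1 and 3.
def isbn10_to_isbn13(isbn10)
  core = '978' + isbn10.delete('-')[0, 9]
  sum = core.chars.each_with_index.map { |ch, i| ch.to_i * (i.even? ? 1 : 3) }.sum
  core + ((10 - sum % 10) % 10).to_s
end

isbn10_to_isbn13('0-306-40615-2')  # => "9780306406157"
```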

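.pubDateAndRange(doc, r) ⇒ Object

Get the publication date and its date range in one call; returns [date, range], or [nil, nil] if no date is found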


# File 'lib/marc2solr/marc2solr_custom.rb', line 203

def self.pubDateAndRange(doc, r)
  date = self.getDate(doc, r)
  return [nil, nil] unless date
  range = self.getDateRange(date, r)
  return [date, range]
end

.pubDateRange(doc, r, wherePubdateIsStored) ⇒ Object

Get the date range, based on the previously-computed pubdate



# File 'lib/marc2solr/marc2solr_custom.rb', line 175

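def self.pubDateRange(doc, r, wherePubdateIsStored)
  # Stored field values are arrays; use the first one
  previouslyComputedPubdate = doc[wherePubdateIsStored][0]
  # getDateRange takes (date, r), so pass the record along too
  return [self.getDateRange(previouslyComputedPubdate, r)]
end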
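
The point of this method is that doc already holds earlier results, so the date does not have to be extracted twice. A minimal sketch of that lookup, using a plain Hash for doc and a decade-only stand-in for getDateRange; the key 'pubdate', the helper names, and the guard for a missing value are all assumptions added for illustration and are not in the method above.

```ruby
# Decade-only stand-in for getDateRange (hypothetical).
def sketch_range(year)
  "#{year[0..2]}0-#{year[0..2]}9"
end

# Read a previously computed value out of the hashlike doc.
def range_from_doc(doc, key)
  stored = doc[key]
  # Assumed guard: return nothing when no pubdate was stored.
  return [] if stored.nil? || stored.empty?

  [sketch_range(stored[0])]
end

range_from_doc({ 'pubdate' => ['1987'] }, 'pubdate')  # => ["1980-1989"]
range_from_doc({}, 'pubdate')                          # => []
```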

.valsByPattern(doc, r, tag, codes, pattern, matchindex = 0) ⇒ Array<String>

How about one to sort out, say, the 035s? We’ll make a generic routine that looks for values in the specified subfields of variable fields with the specified tag, and makes sure they match a pattern before returning them.

See the use of this in the simple_sample/simple_index.rb file for field ‘oclc’.

The matchindex default is zero, which means “the whole string”.

Parameters:

  • doc (hashlike)

    The document object being added to; allows you to leverage already-done work

  • r (MARC4J4R::Record)

    A MARC4J4R record

  • tag (String)

    A tag string (e.g., ‘035’)

  • codes (String, Array<String>)

    A subfield code (‘a’) or array of them ([‘a’, ‘c’])

  • pattern (Regexp)

    A pattern that must match for the value to be included

  • matchindex (Fixnum) (defaults to: 0)

    The number of the substring captured by parens in the pattern to return

Returns:

  • (Array<String>)

    a (possibly empty) array of found values



# File 'lib/marc2solr/marc2solr_custom.rb', line 72

def self.valsByPattern(doc, r, tag, codes, pattern, matchindex=0)
  data = []
  r.find_by_tag(tag).each do |f|
    f.sub_values(codes).each do |v|
      if m = pattern.match(v)
        data << m[matchindex]
      end
    end
  end
  data.uniq!
  return data
end
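
The effect of matchindex is easiest to see side by side. This is a minimal sketch with stand-in fields: `StubSubField`, `vals_by_pattern`, the sample 035 values, and the OCLC pattern are all hypothetical, and the pattern is not necessarily the one used in simple_index.rb. The stub's `sub_values` only imitates the idea of returning the values for the requested subfield codes.

```ruby
# Hypothetical stand-in for a MARC4J4R data field; subs is [[code, value], ...].
StubSubField = Struct.new(:tag, :subs) do
  def sub_values(codes)
    wanted = Array(codes)
    subs.select { |code, _| wanted.include?(code) }.map { |_, val| val }
  end
end

# Same idea as valsByPattern, over a plain array of fields.
def vals_by_pattern(fields, tag, codes, pattern, matchindex = 0)
  fields.select { |f| f.tag == tag }
        .flat_map { |f| f.sub_values(codes) }
        .map { |v| (m = pattern.match(v)) && m[matchindex] }
        .compact
        .uniq
end

FIELDS_035 = [
  StubSubField.new('035', [['a', '(OCoLC)12345']]),
  StubSubField.new('035', [['a', '(MiU)99999'], ['z', '(OCoLC)12345']])
]
OCLC_PATTERN = /\(OCoLC\)(\d+)/

vals_by_pattern(FIELDS_035, '035', ['a', 'z'], OCLC_PATTERN, 1)  # => ["12345"]
vals_by_pattern(FIELDS_035, '035', 'a', OCLC_PATTERN)            # => ["(OCoLC)12345"]
```

With matchindex 1 only the captured digits come back, and the duplicate from subfield z is removed by the uniqueness step.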