Class: Traject::MarcExtractor::Spec

Inherits:
Object
Defined in:
lib/traject/marc_extractor_spec.rb

Constant Summary

DATAFIELD_PATTERN =
/\A([a-zA-Z0-9]{3})(\|([a-z0-9\ \*])([a-z0-9\ \*])\|)?([a-z0-9]*)?\Z/

CONTROLFIELD_PATTERN =
/\A([a-zA-Z0-9]{3})(\[(\d+)(-(\d+))?\])\Z/

.hash_from_string converts a string MARC spec like "008[35]:245abc:700a" into a hash used internally to represent the specification. See the comments at the head of the class for documentation of the string specification format.

Return value

The returned hash is keyed by tag. Each value is an array of zero or more MarcExtractor::Spec objects representing the extraction operations specified for that tag.

The value is an array because you can specify multiple extractions on the same tag, for instance "245a:245abc".

See tests for more examples.
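The capture groups are easiest to see by matching a few sample strings. This standalone snippet copies the two regexes verbatim; the group numbers it prints (1, 3, 4 and 5 for datafields; 1, 3 and 5 for controlfields) are the ones .hash_from_string reads. The sample strings are illustrative only, and the `|1*|` indicator form is inferred from the regex itself rather than from separate documentation.

```ruby
# Copied verbatim from the constants above.
DATAFIELD_PATTERN    = /\A([a-zA-Z0-9]{3})(\|([a-z0-9\ \*])([a-z0-9\ \*])\|)?([a-z0-9]*)?\Z/
CONTROLFIELD_PATTERN = /\A([a-zA-Z0-9]{3})(\[(\d+)(-(\d+))?\])\Z/

# Datafield: tag (1), indicator 1 (3), indicator 2 (4), subfield codes (5).
m = DATAFIELD_PATTERN.match("600|1*|ax")
p [m[1], m[3], m[4], m[5]]   # => ["600", "1", "*", "ax"]

m = DATAFIELD_PATTERN.match("245abc")
p [m[1], m[3], m[4], m[5]]   # => ["245", nil, nil, "abc"]

# Controlfield: tag (1), first byte (3), optional last byte (5).
c = CONTROLFIELD_PATTERN.match("008[35-37]")
p [c[1], c[3], c[5]]         # => ["008", "35", "37"]

c = CONTROLFIELD_PATTERN.match("008[35]")
p [c[1], c[3], c[5]]         # => ["008", "35", nil]

# A byte range is not a valid datafield spec, so it falls through
# to the controlfield pattern.
p DATAFIELD_PATTERN.match("008[35-37]")   # => nil
```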

Constructor Details

#initialize(hash = nil) ⇒ Spec

Allows a hash to be used for initialization. This should eventually be replaced with optional keyword arguments once users move to Ruby 2.x syntax.


# File 'lib/traject/marc_extractor_spec.rb', line 77

def initialize(hash = nil)
  if hash
    hash.each_pair do |key, value|
      self.send("#{key}=", value)
    end
  end
end
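A hedged illustration of how the hash-based constructor behaves: MiniSpec below is a hypothetical stand-in with the same initialize body, so it runs without Traject installed. Each key is dispatched to the matching writer method, which means an unrecognized key fails fast with NoMethodError. KwSpec sketches the keyword-argument style the comment above anticipates; it is not part of Traject.

```ruby
# Hypothetical stand-in for Spec with the same hash-based initializer.
class MiniSpec
  attr_accessor :tag, :subfields, :indicator1, :indicator2

  def initialize(hash = nil)
    if hash
      hash.each_pair do |key, value|
        send("#{key}=", value)   # e.g. :tag => "245" calls tag=("245")
      end
    end
  end
end

spec = MiniSpec.new(:tag => "245", :subfields => ["a", "b"])
p spec.tag         # => "245"
p spec.subfields   # => ["a", "b"]

begin
  MiniSpec.new(:bogus => 1)
rescue NoMethodError => e
  puts "rejected: #{e.name}"   # prints: rejected: bogus=
end

# Hypothetical keyword-argument version of the same constructor.
class KwSpec
  attr_reader :tag, :subfields

  def initialize(tag: nil, subfields: nil)
    @tag       = tag
    @subfields = subfields
  end
end

p KwSpec.new(tag: "100").tag   # => "100"
```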

Instance Attribute Details

#byte1 ⇒ Object

Returns the value of attribute byte1


# File 'lib/traject/marc_extractor_spec.rb', line 73

def byte1
  @byte1
end

#byte2 ⇒ Object

Returns the value of attribute byte2


# File 'lib/traject/marc_extractor_spec.rb', line 73

def byte2
  @byte2
end

#bytes ⇒ Object (readonly)

Returns the value of attribute bytes


# File 'lib/traject/marc_extractor_spec.rb', line 73

def bytes
  @bytes
end

#indicator1 ⇒ Object

Returns the value of attribute indicator1


# File 'lib/traject/marc_extractor_spec.rb', line 73

def indicator1
  @indicator1
end

#indicator2 ⇒ Object

Returns the value of attribute indicator2


# File 'lib/traject/marc_extractor_spec.rb', line 73

def indicator2
  @indicator2
end

#subfields ⇒ Object

Returns the value of attribute subfields


# File 'lib/traject/marc_extractor_spec.rb', line 72

def subfields
  @subfields
end

#tag ⇒ Object

Returns the value of attribute tag


# File 'lib/traject/marc_extractor_spec.rb', line 72

def tag
  @tag
end

Class Method Details

.create_controlfield_spec(tag, byte1, byte2) ⇒ Object

Create a new controlfield spec


# File 'lib/traject/marc_extractor_spec.rb', line 218

def self.create_controlfield_spec(tag, byte1, byte2)
  spec = Spec.new(:tag => tag.freeze)
  spec.set_bytes(byte1.freeze, byte2.freeze)
  spec
end

.create_datafield_spec(tag, ind1, ind2, subfields) ⇒ Object

Create a new datafield spec. Most of the logic about how to deal with special characters is built into the Spec class.


# File 'lib/traject/marc_extractor_spec.rb', line 204

def self.create_datafield_spec(tag, ind1, ind2, subfields)
  spec            = Spec.new(:tag => tag)
  spec.indicator1 = ind1.freeze
  spec.indicator2 = ind2.freeze

  if subfields and !subfields.empty?
    spec.subfields = subfields.split('')
  end

  spec

end
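The subfield handling in create_datafield_spec boils down to the rule below, shown as a hypothetical standalone helper (subfield_list is not a Traject method). A non-empty code string becomes an array of one-character codes, while nil or an empty string leaves subfields nil, which the rest of the class treats as "all subfields".

```ruby
# Hypothetical helper mirroring the subfield logic of create_datafield_spec.
# Returns nil (meaning "all subfields") when no codes were given.
def subfield_list(subfields)
  if subfields && !subfields.empty?
    subfields.split('')   # same result as subfields.chars
  end
end

p subfield_list("abc")   # => ["a", "b", "c"]
p subfield_list("aa")    # => ["a", "a"]
p subfield_list("")      # => nil
p subfield_list(nil)     # => nil
```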

.hash_from_string(spec_string) ⇒ Object


# File 'lib/traject/marc_extractor_spec.rb', line 168

def self.hash_from_string(spec_string)
  # Maps each tag to an array of Spec objects; the arrays are created on demand below
  hash = Hash.new

  # Split the string(s) given on colon
  spec_strings = spec_string.is_a?(Array) ? spec_string.map { |s| s.split(/\s*:\s*/) }.flatten : spec_string.split(/\s*:\s*/)

  spec_strings.each do |part|
    if m = DATAFIELD_PATTERN.match(part)

      tag, ind1, ind2, subfields = m[1], m[3], m[4], m[5]

      spec = create_datafield_spec(tag, ind1, ind2, subfields)

      hash[spec.tag] ||= []
      hash[spec.tag] << spec

    elsif m = CONTROLFIELD_PATTERN.match(part)
      tag, byte1, byte2 = m[1], m[3], m[5]

      spec = create_controlfield_spec(tag, byte1, byte2)

      hash[spec.tag] ||= []
      hash[spec.tag] << spec
    else
      raise ArgumentError.new("Unrecognized marc extract specification: #{part}")
    end
  end

  return hash
end
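The parsing loop above can be condensed into a standalone sketch. It copies the two regexes from the constant summary, substitutes a Struct for Spec, and uses Array() with flat_map instead of the is_a?(Array) branch. SpecSketch and hash_from_string_sketch are hypothetical names; the sketch illustrates the grouping by tag and the error path, not a replacement for the library method.

```ruby
# Regexes copied from the constants at the top of this page.
DATAFIELD_PATTERN    = /\A([a-zA-Z0-9]{3})(\|([a-z0-9\ \*])([a-z0-9\ \*])\|)?([a-z0-9]*)?\Z/
CONTROLFIELD_PATTERN = /\A([a-zA-Z0-9]{3})(\[(\d+)(-(\d+))?\])\Z/

# Hypothetical stand-in for Spec: just the fields needed to show grouping.
SpecSketch = Struct.new(:tag, :indicator1, :indicator2, :subfields, :bytes)

def hash_from_string_sketch(spec_string)
  hash = {}
  # Array() wraps a single String, so both input forms share one code path.
  Array(spec_string).flat_map { |s| s.split(/\s*:\s*/) }.each do |part|
    if (m = DATAFIELD_PATTERN.match(part))
      subfields = (m[5] && !m[5].empty?) ? m[5].split('') : nil
      spec = SpecSketch.new(m[1], m[3], m[4], subfields, nil)
    elsif (m = CONTROLFIELD_PATTERN.match(part))
      bytes = m[5] ? (m[3].to_i..m[5].to_i) : m[3].to_i
      spec = SpecSketch.new(m[1], nil, nil, nil, bytes)
    else
      raise ArgumentError, "Unrecognized marc extract specification: #{part}"
    end
    (hash[spec.tag] ||= []) << spec   # several specs may share one tag
  end
  hash
end

h = hash_from_string_sketch("008[35-37]:245a : 245abc")
p h.keys                      # => ["008", "245"]
p h["245"].map(&:subfields)   # => [["a"], ["a", "b", "c"]]
p h["008"].first.bytes        # => 35..37
```

Appending to `hash[spec.tag] ||= []` is what lets "245a:245abc" yield two specs under a single "245" key.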

Instance Method Details

#==(spec) ⇒ Object

Simple equality definition: two specs are equal when their tag, subfields, indicators, and bytes all match.


# File 'lib/traject/marc_extractor_spec.rb', line 138

def ==(spec)
  return false unless spec.kind_of?(Spec)

  return (self.tag == spec.tag) &&
      (self.subfields == spec.subfields) &&
      (self.indicator1 == spec.indicator1) &&
      (self.indicator2 == spec.indicator2) &&
      (self.bytes == spec.bytes)
end

#includes_subfield_code?(code) ⇒ Boolean

Pass in a string subfield code like 'a'; does this spec include it?

Returns:

  • (Boolean)

# File 'lib/traject/marc_extractor_spec.rb', line 132

def includes_subfield_code?(code)
  # subfields nil means include them all
  self.subfields.nil? || self.subfields.include?(code)
end
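As a hedged illustration, here is the same check written as a hypothetical standalone method that takes the subfield array explicitly. The final filtering step is an assumed usage pattern for selecting requested codes from a field, not code taken from Traject.

```ruby
# Hypothetical standalone version: nil subfields means "include them all".
def includes_subfield_code?(subfields, code)
  subfields.nil? || subfields.include?(code)
end

p includes_subfield_code?(nil, "z")          # => true
p includes_subfield_code?(["a", "b"], "a")   # => true
p includes_subfield_code?(["a", "b"], "c")   # => false

# Example use (assumption): keep only the values of the requested codes.
pairs = [["a", "Title"], ["b", "subtitle"], ["c", "statement"]]
p pairs.select { |code, _| includes_subfield_code?(["a", "b"], code) }.map(&:last)
# => ["Title", "subtitle"]
```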

#joinable? ⇒ Boolean

Should extracted subfields be joined, if a separator is given?

  • '630' no subfields specified => join all subfields
  • '630abc' multiple subfields specified => join all subfields
  • '633a' one subfield => do not join; return one value for each $a in the field
  • '633aa' one subfield, doubled => join after all; return a single string joining the values of all the $a's

The last case is currently handled implicitly: subfields == ['a', 'a'] has size 2, so the size check treats it as joinable.

Returns:

  • (Boolean)

# File 'lib/traject/marc_extractor_spec.rb', line 92

def joinable?
  (self.subfields.nil? || self.subfields.size != 1)
end
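The four cases in the list above can be checked with a hypothetical standalone copy of the rule, applied to the subfield arrays each spec string would produce. The combine helper is an assumption about how a caller might use the answer (join only when joinable and a separator was supplied); it is not Traject's extraction code.

```ruby
# Hypothetical standalone copy of the joinable? rule.
def joinable?(subfields)
  subfields.nil? || subfields.size != 1
end

# Subfield arrays produced by the four spec strings in the list above.
{
  "630"    => nil,
  "630abc" => %w[a b c],
  "633a"   => %w[a],
  "633aa"  => %w[a a]
}.each do |spec, subfields|
  puts "#{spec}: #{joinable?(subfields)}"
end
# 630: true
# 630abc: true
# 633a: false
# 633aa: true

# One way a caller might apply the rule (assumption): join only when the
# spec is joinable and a separator was supplied.
def combine(values, subfields, separator)
  (joinable?(subfields) && separator) ? [values.join(separator)] : values
end

p combine(["x", "y"], %w[a], " ")     # => ["x", "y"]
p combine(["x", "y"], %w[a a], " ")   # => ["x y"]
p combine(["x", "y"], %w[a a], nil)   # => ["x", "y"]
```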

#matches_indicators?(field) ⇒ Boolean

Pass in a MARC field: do its indicators match the indicators in this spec? A nil indicator in the spec means "don't care", so any value in the field matches it.

Returns:

  • (Boolean)

# File 'lib/traject/marc_extractor_spec.rb', line 125

def matches_indicators?(field)
  return (indicator1.nil? || indicator1 == field.indicator1) &&
      (indicator2.nil? || indicator2 == field.indicator2)
end
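A minimal sketch of the indicator rule, assuming only that the field object exposes indicator1 and indicator2 readers. FakeField is a hypothetical Struct standing in for a real MARC data field, and the method takes the spec's indicators as arguments so it runs on its own.

```ruby
# FakeField is a hypothetical stand-in for a MARC data field; only the two
# indicator readers are needed.
FakeField = Struct.new(:indicator1, :indicator2)

# Standalone copy of the rule: a nil spec indicator matches anything.
def matches_indicators?(spec_ind1, spec_ind2, field)
  (spec_ind1.nil? || spec_ind1 == field.indicator1) &&
    (spec_ind2.nil? || spec_ind2 == field.indicator2)
end

field = FakeField.new("1", "0")
p matches_indicators?(nil, nil, field)   # => true
p matches_indicators?("1", nil, field)   # => true
p matches_indicators?("1", "0", field)   # => true
p matches_indicators?("0", nil, field)   # => false
```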

#set_bytes(byte1, byte2) ⇒ Object


# File 'lib/traject/marc_extractor_spec.rb', line 114

def set_bytes(byte1, byte2)
  if byte1 && byte2
    @bytes = ((byte1.to_i)..(byte2.to_i))
  elsif byte1
    @bytes = byte1.to_i
  end
end
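The two shapes set_bytes can store are easier to see side by side. bytes_for below is a hypothetical standalone copy of its logic; the String indexing that follows is an assumption about how a caller might use the result, since the extraction code is not shown on this page. MARC 21 fixed-field positions are zero-based, so 008/35-37 (the language code) lines up directly with value[35..37].

```ruby
# Hypothetical standalone copy of set_bytes: two bytes give a Range and one
# byte gives an Integer. With no bytes this helper returns nil, whereas
# set_bytes itself simply leaves @bytes untouched.
def bytes_for(byte1, byte2)
  if byte1 && byte2
    (byte1.to_i)..(byte2.to_i)
  elsif byte1
    byte1.to_i
  end
end

p bytes_for("35", "37")   # => 35..37
p bytes_for("35", nil)    # => 35
p bytes_for(nil, nil)     # => nil

# Assumed usage: both shapes work as String indexes. This builds a
# 40-character 008 value with "eng" at positions 35-37.
value = "850101s1985    nyu" + (" " * 17) + "eng d"
p value[bytes_for("35", "37")]   # => "eng"
p value[bytes_for("6", nil)]     # => "s"
```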