Module: Traject::Macros::Transformation

Included in:
Indexer
Defined in:
lib/traject/macros/transformation.rb

Overview

Macros intended to be mixed into an Indexer and used in config as second or further args to #to_field, to transform existing accumulator values.

They have the same form as any proc/block passed to #to_field, but they operate on values already in the accumulator, so they are intended as second-or-later steps rather than as the first step.

Some of these were extracted from extract_marc options so that they can be used with any first-step extraction method. Others were informed by current users.
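To make the step model concrete, here is a minimal standalone sketch that does not load Traject. The two macros are copied from the listings on this page; `titles_from_record` and `run_steps` are hypothetical helpers that only imitate how successive steps share one accumulator, and they are not Traject's actual #to_field implementation.

```ruby
# Two macros from this module, copied from the listings on this page.
def strip
  lambda do |rec, acc|
    acc.collect! { |v| v.sub(/\A[[:space:]]+/, '').sub(/[[:space:]]+\Z/, '') }
  end
end

def append(suffix)
  lambda { |rec, acc| acc.collect! { |v| v + suffix } }
end

# Hypothetical first step; a real config would use something like extract_marc.
def titles_from_record
  lambda { |rec, acc| acc.concat(rec[:titles]) }
end

# Hypothetical helper: calls each step with the same record and accumulator.
def run_steps(record, *steps)
  accumulator = []
  steps.each { |step| step.call(record, accumulator) }
  accumulator
end

record = { titles: ["  Moby Dick ", "Walden"] }
result = run_steps(record, titles_from_record, strip, append("."))
# result is ["Moby Dick.", "Walden."]
```

Each macro returns a lambda taking `(rec, acc)`, so the order of arguments decides the order in which the accumulator is mutated.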

Instance Method Details

#append(suffix) ⇒ Object

Append argument to end of each value in accumulator.


# File 'lib/traject/macros/transformation.rb', line 141

def append(suffix)
  lambda do |rec, acc|
    acc.collect! { |v| v + suffix }
  end
end
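As a quick illustration, the lambda returned by #append can be called by hand. This sketch copies the method from the listing above so it runs without Traject; the sample values are invented, and `nil` stands in for the record because this macro never reads it.

```ruby
# Copied from the listing above so the sketch is self-contained.
def append(suffix)
  lambda do |rec, acc|
    acc.collect! { |v| v + suffix }
  end
end

acc = ["Smith, John", "Doe, Jane"]
append(" (author)").call(nil, acc)
# acc is now ["Smith, John (author)", "Doe, Jane (author)"]
```

Because the implementation uses `v + suffix`, it assumes every value in the accumulator is a String.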

#default(default_value) ⇒ Object

Adds a literal to the accumulator if the accumulator is empty.

Examples:

to_field "title", extract_marc("245abc"), default("Unknown Title")

# File 'lib/traject/macros/transformation.rb', line 85

def default(default_value)
  lambda do |rec, acc|
    if acc.empty?
      acc << default_value
    end
  end
end
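The following standalone sketch shows both cases, an empty and a non-empty accumulator. The method is copied from the listing above; the sample values are invented and `nil` stands in for the record.

```ruby
# Copied from the listing above so the sketch is self-contained.
def default(default_value)
  lambda do |rec, acc|
    if acc.empty?
      acc << default_value
    end
  end
end

empty_acc = []
default("Unknown Title").call(nil, empty_acc)
# empty_acc is now ["Unknown Title"]

filled_acc = ["Walden"]
default("Unknown Title").call(nil, filled_acc)
# filled_acc is unchanged: ["Walden"]
```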

#first_only ⇒ Object

Removes all but the first value from accumulator, if more values were present.

Examples:

to_field "main_author", extract_marc("100"), first_only

# File 'lib/traject/macros/transformation.rb', line 97

def first_only
  lambda do |rec, acc|
    # slightly esoteric: slice!(1, length) deletes everything after the
    # first element, mutating acc in place so only the first value remains
    acc.slice!(1, acc.length)
  end
end
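A small standalone sketch of the `slice!` idiom, with the method copied from the listing above and invented sample values. It also exercises the empty-accumulator case, where `slice!` has nothing to remove.

```ruby
# Copied from the listing above so the sketch is self-contained.
def first_only
  lambda do |rec, acc|
    # slice!(1, length) removes every element from index 1 onward, in place
    acc.slice!(1, acc.length)
  end
end

authors = ["Melville, Herman", "Thoreau, Henry David", "Emerson, Ralph Waldo"]
first_only.call(nil, authors)
# authors is now ["Melville, Herman"]

nothing = []
first_only.call(nil, nothing)
# nothing is still []
```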

#gsub(pattern, replace) ⇒ Object

Runs Ruby's String#gsub on each value in the accumulator, with the given pattern and replacement.


# File 'lib/traject/macros/transformation.rb', line 155

def gsub(pattern, replace)
  lambda do |rec, acc|
    acc.collect! { |v| v.gsub(pattern, replace) }
  end
end
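For illustration, the sketch below uses #gsub to drop a trailing slash from each value. The method is copied from the listing above; the pattern and the sample values are invented for the example.

```ruby
# Copied from the listing above so the sketch is self-contained.
def gsub(pattern, replace)
  lambda do |rec, acc|
    acc.collect! { |v| v.gsub(pattern, replace) }
  end
end

acc = ["Moby Dick /", "Walden /"]
# Remove optional whitespace plus a slash at the very end of each value.
gsub(/\s*\/\z/, "").call(nil, acc)
# acc is now ["Moby Dick", "Walden"]
```

As with String#gsub itself, `pattern` may be a Regexp or a String.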

#prepend(prefix) ⇒ Object

Prepend argument to beginning of each value in accumulator.


# File 'lib/traject/macros/transformation.rb', line 148

def prepend(prefix)
  lambda do |rec, acc|
    acc.collect! { |v| prefix + v }
  end
end

#split(separator) ⇒ Object

Runs Ruby's String#split on each value in the accumulator with the given separator, then flattens all the results into a single array that replaces the accumulator's contents. The output will generally contain more values than the input, since each input value may be split into several.


# File 'lib/traject/macros/transformation.rb', line 134

def split(separator)
  lambda do |rec, acc|
    acc.replace( acc.flat_map { |v| v.split(separator) } )
  end
end
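A short standalone sketch of the flattening behavior, with the method copied from the listing above and invented sample values:

```ruby
# Copied from the listing above so the sketch is self-contained.
def split(separator)
  lambda do |rec, acc|
    acc.replace( acc.flat_map { |v| v.split(separator) } )
  end
end

acc = ["History; Biography", "Fiction"]
split("; ").call(nil, acc)
# acc is now ["History", "Biography", "Fiction"]
```

Two input values became three output values, and the result is one flat array rather than an array of arrays.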

#strip ⇒ Object

For each value in the accumulator, removes all leading and trailing whitespace (Unicode aware). Like Ruby's String#strip, but aware of Unicode whitespace.

Examples:

to_field "title", extract_marc("245"), strip

# File 'lib/traject/macros/transformation.rb', line 121

def strip
  lambda do |rec, acc|
    acc.collect! do |v|
      # unicode whitespace class aware
      v.sub(/\A[[:space:]]+/,'').sub(/[[:space:]]+\Z/, '')
    end
  end
end
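The sketch below contrasts this macro with the built-in String#strip on a value padded with a no-break space (U+00A0) and an ideographic space (U+3000). The method is copied from the listing above and the sample value is invented. It relies on Ruby's `[[:space:]]` class matching Unicode whitespace in UTF-8 strings, while String#strip only removes ASCII whitespace and null characters.

```ruby
# Copied from the listing above so the sketch is self-contained.
def strip
  lambda do |rec, acc|
    acc.collect! do |v|
      # unicode whitespace class aware
      v.sub(/\A[[:space:]]+/,'').sub(/[[:space:]]+\Z/, '')
    end
  end
end

padded = "\u00A0\u3000Tokyo \u00A0"

# The built-in method leaves the no-break spaces at both ends in place.
builtin_result = padded.strip

acc = [padded]
strip.call(nil, acc)
# acc is now ["Tokyo"], while builtin_result still equals padded
```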

#transform(a_proc = nil, &block) ⇒ Object

Pass in a proc/lambda arg or a block (or both), which will be called on each value already in the accumulator to transform it (i.e., the accumulator is updated with #map!/#collect! using your proc(s)).

Due to how Ruby syntax precedence works, the block form is probably not too useful in traject config files, except with the &:method_name trick.

The "stabby lambda" may be convenient for passing an explicit proc argument.

You can pass both an explicit proc arg and a block, in which case the proc arg will be applied first.

Examples:

to_field "something", extract_marc("something"), transform(&:upcase)
to_field "something", extract_marc("something"), transform(->(val) { val.tr('^a-z', "\uFFFD") })

# File 'lib/traject/macros/transformation.rb', line 60

def transform(a_proc=nil, &block)
  unless a_proc || block
    raise ArgumentError, "Needs a transform proc arg or block arg"
  end

  transformer_callable = if a_proc && block
    # need to make a combo wrapper.
    ->(val) { block.call(a_proc.call(val)) }
  elsif a_proc
    a_proc
  else
    block
  end

  lambda do |rec, acc|
    acc.collect! do |value|
      transformer_callable.call(value)
    end
  end
end
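To show the ordering when both a proc and a block are given, here is a standalone sketch with the method copied from the listing above. The sample values are invented. The proc downcases and the block capitalizes, so the result reveals which ran first.

```ruby
# Copied from the listing above so the sketch is self-contained.
def transform(a_proc=nil, &block)
  unless a_proc || block
    raise ArgumentError, "Needs a transform proc arg or block arg"
  end

  transformer_callable = if a_proc && block
    # proc arg runs first, then the block receives its result
    ->(val) { block.call(a_proc.call(val)) }
  elsif a_proc
    a_proc
  else
    block
  end

  lambda do |rec, acc|
    acc.collect! { |value| transformer_callable.call(value) }
  end
end

# Proc arg and block together: downcase first, then capitalize.
both = ["MELVILLE"]
transform(->(v) { v.downcase }) { |v| v.capitalize }.call(nil, both)
# both is now ["Melville"]; the reverse order would have given "melville"

# Block only, using the &:method_name form.
shout = ["walden"]
transform(&:upcase).call(nil, shout)
# shout is now ["WALDEN"]
```

Calling `transform` with neither argument raises ArgumentError when the macro is invoked, before any record is processed.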

#translation_map(*translation_map_specifier) ⇒ Object

Maps all values on accumulator through a Traject::TranslationMap.

A Traject::TranslationMap is a hash-like mapping from input to output, usually defined in a YAML or dot-properties file, which can be looked up in the load path with a file name as arg. See the Traject::TranslationMap header comments for details.

Using this macro, you can pass in one TranslationMap initializer arg, but you can also pass in multiple, and they will be merged into each other in order, with later ones taking precedence. You can use this to apply overrides, either from another on-disk map or from an inline hash (since a Hash is a valid TranslationMap initialization arg too).

Examples:

to_field "cataloging_agency", extract_marc("040a"), translation_map("marc_040a")

with override

to_field "cataloging_agency", extract_marc("040a"), translation_map("marc_040a", "local_marc_040a")

with multiple overrides, including local hash

to_field "cataloging_agency", extract_marc("040a"), translation_map("marc_040a", "local_marc_040a", {"DLC" => "U.S. LoC"})

# File 'lib/traject/macros/transformation.rb', line 34

def translation_map(*translation_map_specifier)
  translation_map = translation_map_specifier.
    collect { |spec| Traject::TranslationMap.new(spec) }.
    reduce(:merge)

  lambda do |rec, acc|
    translation_map.translate_array! acc
  end
end
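The override order produced by `reduce(:merge)` can be sketched with plain Ruby hashes. This is only an analogy: it uses Hash#merge rather than Traject::TranslationMap, on the assumption (taken from the description above) that later maps override earlier ones in the same way. `merge_maps` is a hypothetical helper and the map contents are invented.

```ruby
# Hypothetical helper mirroring the reduce(:merge) step with plain hashes.
def merge_maps(*maps)
  maps.reduce(:merge)
end

base_map  = { "DLC" => "Library of Congress", "OCoLC" => "OCLC" }
local_map = { "DLC" => "U.S. LoC" }

merged = merge_maps(base_map, local_map)
# merged["DLC"] is "U.S. LoC" (the later map wins)
# merged["OCoLC"] is "OCLC" (untouched keys are kept from the earlier map)
```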

#unique ⇒ Object

Calls Ruby's uniq! on the accumulator, removing any duplicate values.

Examples:

to_field "something", extract_marc("245:240"), unique

# File 'lib/traject/macros/transformation.rb', line 109

def unique
  lambda do |rec, acc|
    acc.uniq!
  end
end
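A brief standalone sketch, with the method copied from the listing above and invented sample values. Note that Array#uniq! returns nil when nothing was removed, so the accumulator itself, not the lambda's return value, carries the result.

```ruby
# Copied from the listing above so the sketch is self-contained.
def unique
  lambda do |rec, acc|
    acc.uniq!
  end
end

acc = ["Walden", "Walden", "Civil Disobedience"]
unique.call(nil, acc)
# acc is now ["Walden", "Civil Disobedience"]

already_unique = ["Walden"]
return_value = unique.call(nil, already_unique)
# return_value is nil, and already_unique is still ["Walden"]
```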