Class: Traject::Indexer::Context

Inherits:
Object
Defined in:
lib/traject/indexer/context.rb

Constructor Details

#initialize(hash_init = {}) ⇒ Context

Returns a new instance of Context.



# File 'lib/traject/indexer/context.rb', line 8

def initialize(hash_init = {})
  # TODO, argument checking for required args?

  self.clipboard   = {}
  self.output_hash = {}

  hash_init.each_pair do |key, value|
    self.send("#{key}=", value)
  end

  @skip = false
end
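
The constructor sends each key of hash_init to the matching attribute writer, so only keys that have a writer are accepted. A minimal sketch of that pattern follows. SketchContext is a hypothetical stand-in with a subset of the attributes, not the real Traject::Indexer::Context, and the sample record is made up.

```ruby
# Hypothetical stand-in that mirrors the constructor shown above.
class SketchContext
  attr_accessor :clipboard, :output_hash, :settings, :source_record, :position

  def initialize(hash_init = {})
    self.clipboard   = {}
    self.output_hash = {}

    # Each key is dispatched to its writer, e.g. :position calls position=
    hash_init.each_pair do |key, value|
      send("#{key}=", value)
    end

    @skip = false
  end
end

context = SketchContext.new(source_record: { "title" => "Example" }, position: 1)
puts context.position         # prints 1
puts context.clipboard.empty? # prints true

# A key without a matching writer raises NoMethodError.
begin
  SketchContext.new(no_such_attribute: true)
rescue NoMethodError => e
  puts "rejected: #{e.message}"
end
```

Because clipboard and output_hash are assigned before hash_init is applied, a caller can still override either of them by passing the corresponding key.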

Instance Attribute Details

#clipboard ⇒ Object

Returns the value of attribute clipboard.



# File 'lib/traject/indexer/context.rb', line 21

def clipboard
  @clipboard
end

#index_step ⇒ Object

Returns the value of attribute index_step.



# File 'lib/traject/indexer/context.rb', line 22

def index_step
  @index_step
end

#input_name ⇒ Object

When there are multiple inputs, input_name identifies the current one and position_in_input gives the position of the record within it. Either can be blank when the information is not known.



# File 'lib/traject/indexer/context.rb', line 28

def input_name
  @input_name
end

#logger ⇒ Object

Returns the value of attribute logger.



# File 'lib/traject/indexer/context.rb', line 21

def logger
  @logger
end

#output_hash ⇒ Object

Returns the value of attribute output_hash.



# File 'lib/traject/indexer/context.rb', line 21

def output_hash
  @output_hash
end

#position ⇒ Object

The 1-based position of this record in the stream of processed records.



# File 'lib/traject/indexer/context.rb', line 24

def position
  @position
end

#position_in_input ⇒ Object

When there are multiple inputs, input_name identifies the current one and position_in_input gives the position of the record within it. Either can be blank when the information is not known.



# File 'lib/traject/indexer/context.rb', line 28

def position_in_input
  @position_in_input
end

#settings ⇒ Object

Returns the value of attribute settings.



# File 'lib/traject/indexer/context.rb', line 22

def settings
  @settings
end

#skipmessage ⇒ Object

The message explaining why this record is being skipped, as set by #skip!.



# File 'lib/traject/indexer/context.rb', line 31

def skipmessage
  @skipmessage
end

#source_record ⇒ Object

Returns the value of attribute source_record.



# File 'lib/traject/indexer/context.rb', line 22

def source_record
  @source_record
end

#source_record_id_proc ⇒ Object

Returns the value of attribute source_record_id_proc.



# File 'lib/traject/indexer/context.rb', line 22

def source_record_id_proc
  @source_record_id_proc
end

Instance Method Details

#add_output(field_name, *values) ⇒ Traject::Indexer::Context

Adds values to the array in context.output_hash stored under the specified field_name key or keys. Creates the array in output_hash if it is currently nil.

Post-processing/filtering:

  • De-duplicates the accumulated values, unless settings["allow_duplicate_values"] is set.
  • Removes nil values, unless settings["allow_nil_values"] is set.
  • Will not add an empty array to output_hash (leaves it nil instead), unless settings["allow_empty_fields"] is set.

Multiple values can be added by passing multiple arguments. An array argument is deliberately not treated as multiple values, to accommodate unusual use cases where an array itself is the desired value in output_hash.

Note: for historical reasons, the relevant settings key names are defined as constants in Traject::Indexer::ToFieldStep, but the settings do not apply only to ToFieldSteps.

Examples:

add one value

context.add_output(:additional_title, "a title")

add multiple values as multiple params

context.add_output("additional_title", "a title", "another title")

add multiple values as multiple params from an array using the Ruby splat operator

context.add_output(:some_key, *array_of_values)

add to multiple keys in output hash

context.add_output(["key1", "key2"], "value")

Parameters:

  • field_name (String, Symbol, Array<String>, Array<Symbol>)

    A key to set in output_hash, or an array of such keys.

Returns:

  • (Traject::Indexer::Context)

    self



# File 'lib/traject/indexer/context.rb', line 117

def add_output(field_name, *values)
  values.compact! unless self.settings && self.settings[Traject::Indexer::ToFieldStep::ALLOW_NIL_VALUES]

  return self if values.empty? and not (self.settings && self.settings[Traject::Indexer::ToFieldStep::ALLOW_EMPTY_FIELDS])

  Array(field_name).each do |key|
    accumulator = (self.output_hash[key.to_s] ||= [])
    accumulator.concat values
    accumulator.uniq! unless self.settings && self.settings[Traject::Indexer::ToFieldStep::ALLOW_DUPLICATE_VALUES]
  end

  return self
end
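
The filtering rules above can be exercised without loading traject. The sketch below uses a hypothetical stand-in class, OutputSketch, that reproduces the add_output logic. The setting key strings are assumptions about the values of the Traject::Indexer::ToFieldStep constants, based on the names used in the description.

```ruby
# Hypothetical stand-in reproducing the add_output logic shown above.
class OutputSketch
  # Assumed key names; the real ones live in Traject::Indexer::ToFieldStep.
  ALLOW_NIL_VALUES       = "allow_nil_values"
  ALLOW_EMPTY_FIELDS     = "allow_empty_fields"
  ALLOW_DUPLICATE_VALUES = "allow_duplicate_values"

  attr_reader :output_hash, :settings

  def initialize(settings = {})
    @settings    = settings
    @output_hash = {}
  end

  def add_output(field_name, *values)
    values.compact! unless settings[ALLOW_NIL_VALUES]
    return self if values.empty? && !settings[ALLOW_EMPTY_FIELDS]

    # Array() lets a single key and an array of keys share one code path.
    Array(field_name).each do |key|
      accumulator = (output_hash[key.to_s] ||= [])
      accumulator.concat(values)
      accumulator.uniq! unless settings[ALLOW_DUPLICATE_VALUES]
    end
    self
  end
end

context = OutputSketch.new
context.add_output(:title, "a title", nil, "a title") # nil dropped, duplicate collapsed
context.add_output(["key1", "key2"], "value")         # same value under two keys
context.add_output(:empty, nil)                       # nothing left to add, key stays absent
context.add_output(:title, "another").add_output(:title, "a title") # returns self, so calls chain

p context.output_hash
```

Keys are always stored as strings (key.to_s), so :title and "title" address the same array.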

#record_inspect ⇒ Object

A string label that can be used to refer to a particular record in log messages and exceptions. Includes various parts depending on what information is available.



# File 'lib/traject/indexer/context.rb', line 59

def record_inspect
  str = "<"

  str << "record ##{position}" if position

  if input_name && position_in_input
    str << " (#{input_name} ##{position_in_input}), "
  elsif position
    str << ", "
  end

  if source_id = source_record_id
    str << "source_id:#{source_id} "
  end

  if output_id = self.output_hash["id"]
    str << "output_id:#{[output_id].join(',')}"
  end

  str.chomp!(" ")
  str.chomp!(",")
  str << ">"

  str
end
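
To see which parts the label includes in different situations, the method can be run against a small stand-in. InspectSketch below is hypothetical and carries only the attributes record_inspect reads. The input name, record shape, and id values are made up for illustration.

```ruby
# Hypothetical stand-in carrying only what record_inspect reads.
class InspectSketch
  attr_accessor :position, :input_name, :position_in_input,
                :source_record, :source_record_id_proc, :output_hash

  def initialize(attrs = {})
    @output_hash = {}
    attrs.each_pair { |key, value| send("#{key}=", value) }
  end

  def source_record_id
    source_record_id_proc && source_record_id_proc.call(source_record)
  end

  # Same logic as the method shown above.
  def record_inspect
    str = String.new("<")
    str << "record ##{position}" if position
    if input_name && position_in_input
      str << " (#{input_name} ##{position_in_input}), "
    elsif position
      str << ", "
    end
    if (source_id = source_record_id)
      str << "source_id:#{source_id} "
    end
    if (output_id = output_hash["id"])
      str << "output_id:#{[output_id].join(',')}"
    end
    # Trim a dangling separator left by an earlier part.
    str.chomp!(" ")
    str.chomp!(",")
    str << ">"
    str
  end
end

full = InspectSketch.new(
  position: 3, input_name: "batch.mrc", position_in_input: 3,
  source_record: { "001" => "ocm123" },
  source_record_id_proc: ->(record) { record["001"] },
  output_hash: { "id" => ["ocm123"] }
)
puts full.record_inspect
# prints: <record #3 (batch.mrc #3), source_id:ocm123 output_id:ocm123>

puts InspectSketch.new(position: 7).record_inspect
# prints: <record #7>
```

With no information at all the label collapses to "<>", so callers that log it may want to check for that case.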

#skip!(msg = '(no message given)') ⇒ Object

Marks this record to be skipped, with an optional message.



# File 'lib/traject/indexer/context.rb', line 35

def skip!(msg = '(no message given)')
  @skipmessage = msg
  @skip        = true
end

#skip? ⇒ Boolean

Should we skip this record?

Returns:

  • (Boolean)


# File 'lib/traject/indexer/context.rb', line 41

def skip?
  @skip
end
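
skip!, skip?, and skipmessage work together: one piece of code marks the record, and later code checks the flag and reports the reason. A minimal sketch, using a hypothetical stand-in class (SkipSketch) and a made-up rule that skips records without a title:

```ruby
# Hypothetical stand-in with only the skip-related state.
class SkipSketch
  attr_reader :skipmessage

  def initialize
    @skip = false
  end

  def skip!(msg = '(no message given)')
    @skipmessage = msg
    @skip        = true
  end

  def skip?
    @skip
  end
end

# Made-up records and rule, purely for illustration.
records  = [{ "title" => "Kept" }, { "title" => nil }]
contexts = records.map do |record|
  context = SkipSketch.new
  context.skip!("missing title") if record["title"].nil?
  context
end

contexts.each_with_index do |context, index|
  if context.skip?
    puts "record #{index + 1} skipped: #{context.skipmessage}"
  else
    puts "record #{index + 1} kept"
  end
end
```

Calling skip! without an argument still sets the flag and records the default message '(no message given)'.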

#source_record_id ⇒ Object

Useful for describing a record in a log or, especially, an error message. It may be useful to combine this with #position in output messages, since this method may return nothing when no information on the record id is available.

Returns the id of source_record as computed by source_record_id_proc. If no source_record_id_proc is set, returns nil.



# File 'lib/traject/indexer/context.rb', line 53

def source_record_id
  source_record_id_proc && source_record_id_proc.call(source_record)
end
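
A short sketch of how the proc is consulted. SourceIdSketch is a hypothetical stand-in, and the record shape (a Hash with an "001" key) is made up for illustration.

```ruby
# Hypothetical stand-in with only the attributes source_record_id reads.
class SourceIdSketch
  attr_accessor :source_record, :source_record_id_proc

  def source_record_id
    source_record_id_proc && source_record_id_proc.call(source_record)
  end
end

context = SourceIdSketch.new
context.source_record = { "001" => "ocm12345" }

# No proc configured yet, so there is no id to report.
p context.source_record_id # prints nil

# The proc receives the source record and returns whatever identifies it.
context.source_record_id_proc = ->(record) { record["001"] }
puts context.source_record_id # prints ocm12345
```

Keeping the extraction logic in a proc means the context does not need to know the record format.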