Module: ClickHouse::Concerns::ConsistencyWorker

Extended by: ActiveSupport::Concern
Includes: Gitlab::Utils::StrongMemoize
Defined in: app/workers/click_house/concerns/consistency_worker.rb

Overview

This module can be used to batch over a ClickHouse database table column and do something with the yielded values. The module is responsible for correctly restoring the state (cursor) when processing was interrupted, and for restarting from the beginning of the table once the table has been fully processed.
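
Batching over a column while keeping a resumable cursor is commonly done with keyset pagination: each batch asks for the values greater than the last one seen, in order, up to a batch size. The following plain-Ruby sketch illustrates that idea under stated assumptions: an in-memory sorted array stands in for the ClickHouse column, and `each_batch` is a hypothetical helper, not the module's own `click_house_each_batch`, whose query is not shown on this page.

```ruby
# Hypothetical keyset-style batching: yield values greater than the cursor,
# in ascending order, at most batch_size at a time.
# Assumption: sorted_values is an ascending array standing in for a ClickHouse column.
def each_batch(sorted_values, after:, batch_size:)
  cursor = after
  loop do
    batch = sorted_values.select { |v| v > cursor }.first(batch_size)
    break if batch.empty?

    yield batch
    cursor = batch.last # advance past the last value handed to the caller
  end
end

batches = []
each_batch((1..10).to_a, after: 4, batch_size: 3) { |batch| batches << batch }
batches # => [[5, 6, 7], [8, 9, 10]]
```

Resuming an interrupted run then amounts to passing the stored cursor as `after:`.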

This module follows the “template method” pattern: the including classes need to define four methods:

  • init_context: returns a memoized hash, initializing the context that controls the data processing.

  • pluck_column: which column value to take from the ClickHouse DB when iterating.

  • collect_values: filter, process and store the returned values from ClickHouse.

  • process_collected_values: once a limit is reached or there is no more data, do something with the collected values.
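
A minimal sketch of the template-method shape, assuming plain Ruby without ActiveSupport: the hypothetical `ConsistencyTemplate` module fixes the order of the steps, and the hypothetical `StaleIdCollector` class supplies the four hooks. Rows are hashes and a fixed slice size of 2 stands in for ClickHouse batches; neither detail reflects the real module's internals.

```ruby
# Hypothetical template: the module owns the step order, includers own the hooks.
module ConsistencyTemplate
  def perform(rows)
    init_context
    values = rows.map { |row| row[pluck_column] }
    values.each_slice(2) { |batch| collect_values(batch) } # stand-in for ClickHouse batches
    process_collected_values
    context
  end

  # Default hooks raise until the including class overrides them.
  %i[init_context pluck_column collect_values process_collected_values].each do |hook|
    define_method(hook) { |*_args| raise NotImplementedError, "#{self.class} must implement ##{hook}" }
  end
end

# Hypothetical implementor: collects even ids, then "processes" them.
class StaleIdCollector
  include ConsistencyTemplate

  attr_reader :context

  def init_context
    @context ||= { collected: [], processed: [] } # memoized hash
  end

  def pluck_column
    :id
  end

  def collect_values(values)
    context[:collected].concat(values.select(&:even?)) # filter and store
  end

  def process_collected_values
    context[:processed] = context[:collected].dup
    context[:collected].clear
  end
end

result = StaleIdCollector.new.perform([{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }])
result[:processed] # => [2, 4]
```

Because the class's own methods come before the module in the ancestor chain, they override the raising defaults; a class that forgets a hook fails loudly on the first call.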

Constant Summary

MAX_RUNTIME = 150.seconds
MAX_TTL = 5.minutes.to_i
CLICK_HOUSE_BATCH_SIZE = 100_000
POSTGRESQL_BATCH_SIZE = 2500
LIMIT_STATUSES = %i[limit_reached over_time].freeze
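
The batch-size constants suggest a two-level scheme: values arrive from ClickHouse in batches of up to 100_000, and 100_000 / 2500 = 40 PostgreSQL-sized slices fit in one such batch. The sketch below assumes that interpretation and also shows one hypothetical way a run might map onto `LIMIT_STATUSES`. `run_status`, `limit_status?`, their arguments, and the `:processed` status are illustrative inventions, and `150.seconds` is replaced by a plain integer because ActiveSupport is not loaded.

```ruby
# Plain-Ruby stand-ins for the constants above (150.seconds -> 150).
MAX_RUNTIME = 150
CLICK_HOUSE_BATCH_SIZE = 100_000
POSTGRESQL_BATCH_SIZE = 2500
LIMIT_STATUSES = %i[limit_reached over_time].freeze

# Hypothetical: split one full ClickHouse batch into PostgreSQL-sized slices.
clickhouse_batch = (1..CLICK_HOUSE_BATCH_SIZE).to_a
pg_slices = clickhouse_batch.each_slice(POSTGRESQL_BATCH_SIZE).to_a
pg_slices.size # => 40

# Hypothetical status for a run: time budget first, then a row budget.
def run_status(elapsed_seconds:, rows_seen:, row_limit:)
  return :over_time if elapsed_seconds > MAX_RUNTIME
  return :limit_reached if rows_seen >= row_limit

  :processed
end

# Hypothetical check: does this status mean the run should stop early?
def limit_status?(status)
  LIMIT_STATUSES.include?(status)
end

limit_status?(run_status(elapsed_seconds: 200, rows_seen: 0, row_limit: 10)) # => true
limit_status?(run_status(elapsed_seconds: 10, rows_seen: 3, row_limit: 10))  # => false
```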

Instance Method Summary

Instance Method Details

#perform ⇒ Object



# File 'app/workers/click_house/concerns/consistency_worker.rb', line 33

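def perform
  # Do nothing when the worker is disabled.
  return unless enabled?

  # Set up the per-run context and the runtime limiter.
  init_context
  runtime_limiter
  # Iterate over the ClickHouse column in batches.
  click_house_each_batch do |values|
    collect_values(values)

    # Stop iterating early once a limit was reached.
    break if limit_was_reached?
  end

  # Handle the collected values, whether or not the loop stopped early.
  process_collected_values

  # Reset the cursor when the whole table was processed so the next run
  # starts over, then persist the cursor.
  context[:last_processed_id] = 0 if table_fully_processed?
  ClickHouse::SyncCursor.update_cursor_for(sync_cursor, context[:last_processed_id])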
  (:result, )
end
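
The end of #perform decides where the next run starts: if the table was fully processed the cursor is reset to 0, otherwise the last processed id is persisted. The following plain-Ruby sketch simulates several runs under stated assumptions: an in-memory hash (`CURSORS`) stands in for `ClickHouse::SyncCursor`, `run_once` is a hypothetical helper, a per-run `max_values` cap stands in for the limit checks, and "fully processed" is approximated as "every remaining value was consumed".

```ruby
# Hypothetical in-memory cursor store standing in for ClickHouse::SyncCursor.
CURSORS = Hash.new(0)

# One simulated run: resume after the stored cursor, take at most max_values,
# then persist the new cursor (or 0 when the table was fully processed).
def run_once(name, table, max_values:)
  last_processed_id = CURSORS[name]
  remaining = table.select { |id| id > last_processed_id }
  processed = remaining.first(max_values)
  last_processed_id = processed.last unless processed.empty?

  fully_processed = processed.size == remaining.size
  last_processed_id = 0 if fully_processed # next run starts from the beginning
  CURSORS[name] = last_processed_id
  processed
end

table = [3, 7, 9, 12, 15]
run_once(:demo, table, max_values: 3) # => [3, 7, 9] (cursor is now 9)
run_once(:demo, table, max_values: 3) # => [12, 15] (cursor reset to 0)
run_once(:demo, table, max_values: 3) # => [3, 7, 9] (starts over)
```

The third run starting again from 3 corresponds to the restart-from-the-beginning behaviour described in the overview.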