Module: ClickHouse::Concerns::ConsistencyWorker
- Extended by:
- ActiveSupport::Concern
- Includes:
- Gitlab::Utils::StrongMemoize
- Defined in:
- app/workers/click_house/concerns/consistency_worker.rb
Overview
This module can be used to batch over a ClickHouse database table column and do something with the yielded values. It is responsible for correctly restoring the state (cursor) when processing was interrupted, and for restarting from the beginning of the table once the table has been fully processed.
This module follows the "template method" pattern: implementing classes need to define the following methods:
- init_context: returns a memoized hash, initializing the context that controls the data processing.
- pluck_column: which column value to take from the ClickHouse DB when iterating.
- collect_values: filter, process and store the values returned from ClickHouse.
- process_collected_values: once a limit is reached or there is no more data, do something with the collected values.
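The division of labour between the module and its implementors can be sketched in plain Ruby. This is a minimal illustration, not the concern itself: `ConsistencySketch`, `EvenIdCollector`, the `context` alias and the in-memory `batches` are hypothetical stand-ins, and the real module reads its batches from ClickHouse rather than taking them as an argument.

```ruby
# Hypothetical stand-in for the concern: plain Ruby, no Rails or ClickHouse.
module ConsistencySketch
  # Template method: fixes the order of the steps; the hooks come from the includer.
  def perform(batches)
    init_context

    batches.each do |rows|
      # Take only the configured column from each row, then hand it to the hook.
      collect_values(rows.map { |row| row.fetch(pluck_column) })
    end

    process_collected_values
  end
end

# Hypothetical implementor defining the four hooks named in the overview.
class EvenIdCollector
  include ConsistencySketch

  # Memoized hash holding the state that drives the processing.
  def init_context
    @context ||= { collected: [], last_processed_id: 0 }
  end
  alias_method :context, :init_context

  # Column to read from each row.
  def pluck_column
    :id
  end

  # Keep only the even ids and remember how far we got.
  def collect_values(values)
    context[:collected].concat(values.select(&:even?))
    context[:last_processed_id] = values.last
  end

  # Final step: here we simply sum what was collected.
  def process_collected_values
    context[:collected].sum
  end
end

batches = [[{ id: 1 }, { id: 2 }], [{ id: 3 }, { id: 4 }]]
worker = EvenIdCollector.new
total = worker.perform(batches)
```

The implementor never decides when a hook runs; it only supplies the behaviour for each step, which is what lets the shared module own the cursor and limit handling.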
Constant Summary
- MAX_RUNTIME = 150.seconds
- MAX_TTL = 5.minutes.to_i
- CLICK_HOUSE_BATCH_SIZE = 100_000
- POSTGRESQL_BATCH_SIZE = 2500
- LIMIT_STATUSES = %i[limit_reached over_time].freeze
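As an illustration of how a runtime budget such as MAX_RUNTIME could be enforced, here is a minimal sketch. `RuntimeBudget` and its injectable `clock` are hypothetical and are not the limiter the concern actually uses; the 150 below is the plain-Ruby equivalent of `150.seconds`.

```ruby
# Hypothetical time-budget helper; an assumption for illustration only.
class RuntimeBudget
  def initialize(max_runtime_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_runtime = max_runtime_seconds
    @clock = clock
    @started_at = clock.call
  end

  # True once the elapsed time has reached the budget.
  def over_time?
    @clock.call - @started_at >= @max_runtime
  end
end

max_runtime_seconds = 150 # stands in for MAX_RUNTIME

# A fake clock makes the behaviour checkable without waiting.
now = 0.0
budget = RuntimeBudget.new(max_runtime_seconds, clock: -> { now })
before_deadline = budget.over_time?
now = 151.0
after_deadline = budget.over_time?
```

A monotonic clock is used as the default because it is unaffected by wall-clock adjustments, which matters when the check decides whether a job keeps running.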
Instance Method Summary
Instance Method Details
#perform ⇒ Object
# File 'app/workers/click_house/concerns/consistency_worker.rb', line 33

def perform
  return unless enabled?

  init_context
  runtime_limiter

  click_house_each_batch do |values|
    collect_values(values)
    break if limit_was_reached?
  end

  process_collected_values

  context[:last_processed_id] = 0 if table_fully_processed?
  ClickHouse::SyncCursor.update_cursor_for(sync_cursor, context[:last_processed_id])
  (:result, )
end
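The cursor handling at the end of #perform (store the last processed id, or reset it to 0 once the table is fully processed) can be modelled in memory. This is a sketch under stated assumptions: `run_once`, `ids`, `batch_size` and `max_batches` are hypothetical, `max_batches` stands in for whatever limit interrupts a real run, and how the actual `table_fully_processed?` is decided is not shown in the source.

```ruby
# Hypothetical in-memory model of the cursor handling in #perform.
# `ids` stands in for the sorted ClickHouse column; `max_batches` stands in
# for the limits that can interrupt a run.
def run_once(ids, cursor:, batch_size:, max_batches:)
  last_processed_id = cursor
  batches_done = 0
  fully_processed = true

  # Only rows after the stored cursor are visited.
  ids.select { |id| id > cursor }.each_slice(batch_size) do |values|
    if batches_done >= max_batches
      fully_processed = false # interrupted: rows remain after the cursor
      break
    end

    last_processed_id = values.last
    batches_done += 1
  end

  # Mirrors `context[:last_processed_id] = 0 if table_fully_processed?`
  fully_processed ? 0 : last_processed_id
end

ids = (1..10).to_a
first_cursor  = run_once(ids, cursor: 0, batch_size: 3, max_batches: 2)            # interrupted run
second_cursor = run_once(ids, cursor: first_cursor, batch_size: 3, max_batches: 2) # resumed run
```

In this model the first run stops after two batches and returns the id it reached, and the second run picks up from that id, finishes the table and returns 0 so that the next run starts from the beginning.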