Module: Chicago::ETL

Defined in:
lib/chicago/etl.rb,
lib/chicago/etl/sink.rb,
lib/chicago/etl/batch.rb,
lib/chicago/etl/stage.rb,
lib/chicago/etl/tasks.rb,
lib/chicago/etl/errors.rb,
lib/chicago/etl/filter.rb,
lib/chicago/etl/counter.rb,
lib/chicago/etl/pipeline.rb,
lib/chicago/etl/null_sink.rb,
lib/chicago/etl/array_sink.rb,
lib/chicago/etl/stage_name.rb,
lib/chicago/etl/key_builder.rb,
lib/chicago/etl/array_source.rb,
lib/chicago/etl/stage_builder.rb,
lib/chicago/etl/table_builder.rb,
lib/chicago/etl/dataset_source.rb,
lib/chicago/etl/transformation.rb,
lib/chicago/etl/dataset_builder.rb,
lib/chicago/etl/mysql_file_sink.rb,
lib/chicago/etl/task_invocation.rb,
lib/chicago/etl/transformations.rb,
lib/chicago/etl/pipeline_endpoint.rb,
lib/chicago/etl/load_dataset_builder.rb,
lib/chicago/etl/transformation_chain.rb,
lib/chicago/etl/mysql_file_serializer.rb,
lib/chicago/etl/screens/column_screen.rb,
lib/chicago/etl/screens/missing_value.rb,
lib/chicago/etl/screens/out_of_bounds.rb,
lib/chicago/etl/screens/invalid_element.rb,
lib/chicago/etl/sequel/dependant_tables.rb,
lib/chicago/etl/row_transformation_stage.rb,
lib/chicago/etl/schema_table_sink_factory.rb,
lib/chicago/etl/schema_table_stage_builder.rb,
lib/chicago/etl/sequel/filter_to_etl_batch.rb,
lib/chicago/etl/transformations/uk_post_code.rb,
lib/chicago/etl/transformations/deduplicate_rows.rb,
lib/chicago/etl/transformations/uk_post_code_field.rb,
lib/chicago/etl/schema_sinks_and_transformations_builder.rb

Overview

Contains classes related to ETL processing.

Defined Under Namespace

Modules: Screens, SequelExtensions, Transformations

Classes: ArraySink, ArraySource, Batch, Counter, DatasetBuilder, DatasetSource, DeduplicateRows, Error, ExistingHashColumnKeyBuilder, FactKeyBuilder, Filter, HashingKeyBuilder, IdentifiableDimensionKeyBuilder, KeyBuilder, LoadDatasetBuilder, LoadDimensionStageBuilder, LoadFactStageBuilder, MysqlFileSerializer, MysqlFileSink, NullSink, Pipeline, PipelineEndpoint, RaisingErrorHandler, RakeTasks, RowTransformationStage, SchemaSinksAndTransformationsBuilder, SchemaTableSinkFactory, SchemaTableStageBuilder, Sink, Stage, StageBuilder, StageName, TableBuilder, TaskInvocation, Transformation, TransformationChain

Constant Summary

STREAM = :_stream

This constant is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.

The key used to store the stream in the row.

Class Method Summary

Class Method Details

.execute(stage, etl_batch, logger) ⇒ Object

Executes a pipeline stage in the context of an ETL Batch.

Task execution status is stored in the database's ETL task invocations table, which ensures that a task isn't run more than once within a batch.



# File 'lib/chicago/etl.rb', line 63

def self.execute(stage, etl_batch, logger)
  etl_batch.perform_task(:load, stage.name) do
    if stage.executable?
      logger.debug "Starting executing stage: #{stage.name}"
      stage.execute etl_batch
      logger.info "Finished executing stage: #{stage.name}"
    else
      logger.info "Skipping stage #{stage.name}"
    end
  end
end
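
The run-once behaviour described above can be sketched without the gem or a database. In this minimal sketch, `FakeBatch`, `FakeStage`, and `execute_stage` are hypothetical stand-ins, not part of Chicago::ETL. `FakeBatch#perform_task` assumes that the real `Batch#perform_task` yields only when the task has not already completed in the batch, tracking this in memory rather than in the task invocations table. `execute_stage` copies the body of `.execute` so the example runs on its own.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for Chicago::ETL::Batch. It remembers completed
# tasks in memory (an assumption; the real class uses a database table).
class FakeBatch
  def initialize
    @completed = []
  end

  # Yields only if this (type, name) pair has not completed yet.
  def perform_task(type, name)
    key = [type, name]
    return if @completed.include?(key)
    yield
    @completed << key
  end
end

# Hypothetical stand-in for a stage; counts how often it is executed.
FakeStage = Struct.new(:name, :runnable, :runs) do
  def executable?
    runnable
  end

  def execute(_etl_batch)
    self.runs += 1
  end
end

# Same body as Chicago::ETL.execute, copied so the sketch is self-contained.
def execute_stage(stage, etl_batch, logger)
  etl_batch.perform_task(:load, stage.name) do
    if stage.executable?
      logger.debug "Starting executing stage: #{stage.name}"
      stage.execute etl_batch
      logger.info "Finished executing stage: #{stage.name}"
    else
      logger.info "Skipping stage #{stage.name}"
    end
  end
end

LOG_OUTPUT = StringIO.new
LOGGER     = Logger.new(LOG_OUTPUT)
BATCH      = FakeBatch.new
CUSTOMERS  = FakeStage.new(:load_customers, true, 0)
ARCHIVE    = FakeStage.new(:load_archive, false, 0)

execute_stage(CUSTOMERS, BATCH, LOGGER) # runs the stage
execute_stage(CUSTOMERS, BATCH, LOGGER) # no-op: already done in this batch
execute_stage(ARCHIVE, BATCH, LOGGER)   # not executable, so only logged as skipped
```

With the gem loaded, the equivalent call would be `Chicago::ETL.execute(stage, etl_batch, logger)`, passing a real stage and batch.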