Class: ETL::Control::DatabaseSource

Inherits:
Source
Defined in:
lib/etl/control/source/database_source.rb

Overview

Source object which extracts data from a database using ActiveRecord.

Instance Attribute Summary

Attributes inherited from Source

#configuration, #control, #definition, #local_base, #store_locally

Instance Method Summary

Methods inherited from Source

class_for_name, #errors, #last_local_file, #last_local_file_trigger, #local_file, #local_file_trigger, #read_locally, #timestamp

Constructor Details

#initialize(control, configuration, definition) ⇒ DatabaseSource

Initialize the source.

Arguments:

  • control: The ETL::Control::Control instance

  • configuration: The configuration Hash

  • definition: The source definition

Required configuration options:

  • :target: The target connection

  • :table: The source table name

  • :database: The database name

Other options:

  • :join: Optional join part for the query (ignored unless specified)

  • :select: Optional select part for the query (defaults to ‘*’)

  • :group: Optional group by part for the query (ignored unless specified)

  • :order: Optional order part for the query (ignored unless specified)

  • :new_records_only: The column whose timestamps are compared against the time of the last successful ETL job execution for the current control file, so that only new records are extracted.

  • :store_locally: Set to false to not store a copy of the source data locally in a flat file (defaults to true)
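The options above are passed to the constructor as the configuration Hash. The sketch below shows one such Hash using the documented option names; the values (connection name, database, table, columns) are made up, and missing_required_options is a hypothetical helper, not part of activewarehouse-etl, that reports which of the three required options are absent.

```ruby
# Hypothetical configuration Hash for a database source. Option names come
# from the constructor documentation; every value is an invented example.
config = {
  target: :operational_database,   # name of the target connection
  database: 'crm',
  table: 'people',
  select: 'id, first_name, last_name, updated_at',
  order: 'id',
  new_records_only: :updated_at,
  store_locally: false
}

REQUIRED_OPTIONS = [:target, :table, :database].freeze

# Hypothetical helper: returns the required option keys missing from the Hash.
def missing_required_options(configuration)
  REQUIRED_OPTIONS.reject { |key| configuration.key?(key) }
end

p missing_required_options(config)             # => []
p missing_required_options({ table: 'people' }) # => [:target, :database]
```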



# File 'lib/etl/control/source/database_source.rb', line 40

def initialize(control, configuration, definition)
  super
  @target = configuration[:target]
  @table = configuration[:table]
  @query = configuration[:query]
end

Instance Attribute Details

#table ⇒ Object

Returns the value of attribute table.



# File 'lib/etl/control/source/database_source.rb', line 12

def table
  @table
end

#target ⇒ Object

Returns the value of attribute target.



# File 'lib/etl/control/source/database_source.rb', line 11

def target
  @target
end

Instance Method Details

#columns ⇒ Object

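Get the list of columns to read. The columns are taken from the keys of the first row returned by the query; if the query returns no rows, [''] is returned instead (per the inline comment, this placeholder is required for writing to the cache correctly).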



# File 'lib/etl/control/source/database_source.rb', line 96

def columns
  # weird default is required for writing to cache correctly
  @columns ||= query_rows.any? ? query_rows.first.keys : ['']
end
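The fallback logic can be checked in isolation. This sketch assumes, as #columns and #each imply, that query_rows returns an Array of Hashes (one per result row); columns_for is a hypothetical helper that applies the same expression to a plain Array.

```ruby
# Same expression as DatabaseSource#columns, applied to a plain Array of
# Hashes instead of the source's query_rows (hypothetical helper).
def columns_for(query_rows)
  query_rows.any? ? query_rows.first.keys : ['']
end

p columns_for([{ 'id' => 1, 'name' => 'Ada' }]) # => ["id", "name"]
p columns_for([])                                # => [""]
```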

#count(use_cache = true) ⇒ Object

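Get the number of rows in the source. The value is cached after the first call and reused unless use_cache is false. When the source data is stored or read locally, the count comes from count_locally; otherwise the SQL query is rewritten into a SELECT count(1) FROM ... query and run with connection.select_value.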



# File 'lib/etl/control/source/database_source.rb', line 85

def count(use_cache=true)
  return @count if @count && use_cache
  if @store_locally || read_locally
    @count = count_locally
  else
    @count = connection.select_value(query.gsub(/SELECT .* FROM/, 'SELECT count(1) FROM'))
  end
end
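Because the count query is produced by a plain string rewrite, its behaviour can be checked without a database. In the sketch below, library_count_sql is a hypothetical helper reproducing the gsub call from #count. Since `.*` is greedy, the pattern runs up to the last " FROM" in the query, so a query with a subquery in its WHERE clause loses everything before the inner FROM. wrapped_count_sql is a hypothetical alternative, not something activewarehouse-etl does, that wraps the whole query in a derived table instead; some databases (SQL Server, for example) reject ORDER BY inside a derived table, so treat it as a starting point only.

```ruby
# Reproduces the string rewrite used by DatabaseSource#count (hypothetical helper).
def library_count_sql(query)
  query.gsub(/SELECT .* FROM/, 'SELECT count(1) FROM')
end

# Hypothetical alternative: count the rows of the whole query as a derived
# table, so the original SELECT list is never parsed.
def wrapped_count_sql(query)
  "SELECT count(1) FROM (#{query}) AS source_rows"
end

simple = "SELECT id, name FROM people WHERE active = 1"
nested = "SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers)"

puts library_count_sql(simple) # SELECT count(1) FROM people WHERE active = 1
puts library_count_sql(nested) # SELECT count(1) FROM customers)  (greedy match)
puts wrapped_count_sql(nested)
```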

#each(&block) ⇒ Object

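Yields each row from the source. If read_locally is set, this method reads from the last stored local file; if no locally stored file exists, or if the trigger file for the last locally stored file does not exist, it raises an error. Otherwise, when store_locally is enabled, the query results are first written to a new local file and then read back from it; when it is disabled, each query result is copied into an ETL::Row with symbol keys, tagged with this source, and yielded directly.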



# File 'lib/etl/control/source/database_source.rb', line 106

def each(&block)
  if read_locally # Read from the last stored source
    ETL::Engine.logger.debug "Reading from local cache"
    read_rows(last_local_file, &block)
  else # Read from the original source
    if @store_locally
      file = local_file
      write_local(file)
      read_rows(file, &block)
    else
      query_rows.each do |r|
        row = ETL::Row.new()
        r.symbolize_keys.each_pair { |key, value|
          row[key] = value
        }
        row.source = self
        yield row
      end
    end
  end
end
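The branch that neither reads nor stores locally can be exercised without a database or the ETL gem. In this sketch SketchRow is a hypothetical stand-in for ETL::Row (a Hash that also carries a source), and each_row is a hypothetical helper mirroring the loop in #each; key.to_sym replaces ActiveSupport's symbolize_keys so the example needs only core Ruby.

```ruby
# Hypothetical stand-in for ETL::Row: a Hash that remembers its source.
class SketchRow < Hash
  attr_accessor :source
end

# Mirrors the non-local branch of #each: copy each result Hash into a row
# with symbol keys, tag it with the source, then yield it.
def each_row(query_rows, source)
  query_rows.each do |r|
    row = SketchRow.new
    r.each_pair { |key, value| row[key.to_sym] = value }
    row.source = source
    yield row
  end
end

rows = [{ 'id' => 1, 'name' => 'Ada' }, { 'id' => 2, 'name' => 'Grace' }]
each_row(rows, 'localhost/crm/people') do |row|
  puts "#{row.source}: #{row[:id]} #{row[:name]}"
end
# localhost/crm/people: 1 Ada
# localhost/crm/people: 2 Grace
```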

#group ⇒ Object

Get the group by part of the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 69

def group
  configuration[:group]
end

#join ⇒ Object

Get the join part of the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 59

def join
  configuration[:join]
end

#local_directory ⇒ Object

Get the local directory to use, which combines the local_base with the database hostname, the database name and the table name (the host/database/table identifier returned by #to_s).



# File 'lib/etl/control/source/database_source.rb', line 54

def local_directory
  File.join(local_base, to_s)
end
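The resulting path is easiest to see with concrete values. local_directory_for is a hypothetical helper that combines the same pieces as #to_s and #local_directory; the base directory, host, database and table names are invented examples.

```ruby
# Hypothetical helper mirroring #to_s (host/database/table) and
# #local_directory (that identifier joined onto local_base).
def local_directory_for(local_base, host, database, table)
  File.join(local_base, "#{host}/#{database}/#{table}")
end

puts local_directory_for('source_data', 'localhost', 'crm', 'people')
# source_data/localhost/crm/people
```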

#new_records_only ⇒ Object

Return the column used in the WHERE clause to identify new rows, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 80

def new_records_only
  configuration[:new_records_only]
end
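The exact SQL the library generates for :new_records_only is not shown on this page. As an illustration only, the hypothetical new_records_condition helper below builds a WHERE fragment that keeps rows whose column value is later than the last successful run, assuming the time of that run is already known and that no filter applies when there is no previous run. The quoting is deliberately naive; real code should use the database adapter's quoting.

```ruby
# Hypothetical helper: WHERE fragment selecting rows newer than the last
# successful job run. Returns nil (no filter) when either input is missing.
# Naive quoting for illustration only; use the adapter's quoting in practice.
def new_records_condition(column, last_completed_at)
  return nil if column.nil? || last_completed_at.nil?
  "#{column} > '#{last_completed_at.strftime('%Y-%m-%d %H:%M:%S')}'"
end

last_run = Time.utc(2024, 1, 31, 23, 0, 0)
puts new_records_condition(:updated_at, last_run)
# updated_at > '2024-01-31 23:00:00'
```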

#order ⇒ Object

Get the order by part of the query, defaults to nil



# File 'lib/etl/control/source/database_source.rb', line 74

def order
  configuration[:order]
end

#select ⇒ Object

Get the select part of the query, defaults to ‘*’



# File 'lib/etl/control/source/database_source.rb', line 64

def select
  configuration[:select] || '*'
end
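The select, join, group and order parts are raw SQL fragments. The query builder itself is not listed on this page, so the hypothetical build_query helper below is only a sketch of how such parts could be combined, assuming :join holds a complete join clause (for example "LEFT JOIN ..."); it is not the library's implementation.

```ruby
# Hypothetical query assembly from the documented option parts; not the
# library's own query method. Unset parts are simply omitted.
def build_query(table, options = {})
  parts = ["SELECT #{options[:select] || '*'} FROM #{table}"]
  parts << options[:join] if options[:join]
  parts << "GROUP BY #{options[:group]}" if options[:group]
  parts << "ORDER BY #{options[:order]}" if options[:order]
  parts.join(' ')
end

puts build_query('people')
# SELECT * FROM people
puts build_query('people',
                 select: 'people.id, COUNT(orders.id) AS order_count',
                 join: 'LEFT JOIN orders ON orders.person_id = people.id',
                 group: 'people.id',
                 order: 'order_count DESC')
```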

#to_s ⇒ Object

Get a String identifier for the source, in the form host/database/table



# File 'lib/etl/control/source/database_source.rb', line 48

def to_s
  "#{host}/#{database}/#{@table}"
end