Class: DataMiner::Script

Inherits:
Object
Defined in:
lib/data_miner/script.rb

Overview

The container that holds each step in the script.

Constant Summary

UNIQ_THREAD_VAR = 'DataMiner::Script.current_uniq'
STACK_THREAD_VAR = 'DataMiner::Script.current_stack'

Instance Attribute Details

#steps ⇒ Array<DataMiner::Step> (readonly)

The steps in the script.


# File 'lib/data_miner/script.rb', line 46

def steps
  @steps
end

Instance Method Details

#append(*args, &blk) ⇒ nil

Append a step to a script. Mostly for internal use.


# File 'lib/data_miner/script.rb', line 225

def append(*args, &blk)
  steps << make(*args, &blk)
  nil
end

#append_once(*args, &blk) ⇒ nil

Append a step to a script unless it's already there. Mostly for internal use.


# File 'lib/data_miner/script.rb', line 214

def append_once(*args, &blk)
  step = make(*args, &blk)
  unless steps.include? step
    steps << step
  end
  nil
end
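
The difference between #append and #append_once comes down to the steps.include? check, which depends on how steps compare for equality. A minimal sketch, assuming a hypothetical FakeStep struct in place of whatever make returns (the real equality rules of DataMiner::Step are not shown on this page):

```ruby
# FakeStep is a hypothetical stand-in for a real step. Struct gives it
# value-based ==, which is what Array#include? relies on.
FakeStep = Struct.new(:type, :description)

# append: always adds, so an equal step can appear twice.
appended = []
2.times { appended << FakeStep.new(:process, 'update averages') }

# append_once: adds only when no equal step is already present.
appended_once = []
2.times do
  step = FakeStep.new(:process, 'update averages')
  appended_once << step unless appended_once.include?(step)
end

puts appended.length
puts appended_once.length
```

With value-based equality the second list ends up with a single entry, while the first keeps both.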

#import(description, settings) { ... } ⇒ nil

Note:

Be sure to check out github.com/seamusabshere/remote_table and github.com/seamusabshere/errata for available settings.

Note:

There are hundreds of import examples in github.com/brighterplanet/earth. The README points to a few (at the bottom).

Note:

We often use string primary keys to make idempotency easier. github.com/seamusabshere/active_record_inline_schema supports defining these inline.

Note:

Enabling :validate may slow down importing large files because it precludes bulk loading using github.com/seamusabshere/upsert.

Import rows into your model.

As long as…

  1. you key on the primary key, or

  2. the table has an auto-increment primary key, or

  3. you DON'T enable :validate

… then things will be sped up using the upsert library in streaming mode.

Otherwise, native ActiveRecord constructors and validations will be used.

Examples:

From the README

data_miner do
  [...]
  import("OpenGeoCode.org's Country Codes to Country Names list",
         :url => 'http://opengeocode.org/download/countrynames.txt',
         :format => :delimited,
         :delimiter => '; ',
         :headers => false,
         :skip => 22) do
    key   :iso_3166_code, :field_number => 0
    store :iso_3166_alpha_3_code, :field_number => 1
    store :iso_3166_numeric_code, :field_number => 2
    store :name, :field_number => 5
  end
  [...]
end
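
To make the :field_number settings concrete, here is a rough sketch of how one delimited row could map onto the keyed and stored columns. It only imitates the idea in plain Ruby. The sample row is invented for illustration (the real file's columns may differ), and nothing here calls RemoteTable or DataMiner.

```ruby
# Invented sample row, split on the same '; ' delimiter as the example.
line = 'AD; AND; 020; unused; unused; Andorra'

# Field positions copied from the key/store calls in the README example.
field_numbers = {
  :iso_3166_code         => 0, # key
  :iso_3166_alpha_3_code => 1, # store
  :iso_3166_numeric_code => 2, # store
  :name                  => 5  # store
}

fields = line.split('; ')

# Pick one value per column by its zero-based field number.
attributes = {}
field_numbers.each { |column, number| attributes[column] = fields[number] }

puts attributes.inspect
```

Because the row is keyed on :iso_3166_code, importing the same row twice would update one record rather than create two, which is what makes the import idempotent.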

Options Hash (settings):

  • :url (String)

    The URL of the data source. Passed directly to RemoteTable.new.

  • :errata (Hash)

    The :responder and :url settings that will be passed to Errata.new.

  • :validate (TrueClass, FalseClass)

    Whether to always run ActiveRecord validations.

  • anything (*)

    Any other setting will be passed to RemoteTable.new.

Yields:

  • A block defining how to key the import (to make it idempotent) and which columns to store.


# File 'lib/data_miner/script.rb', line 170

def import(description, settings, &blk)
  append(:import, description, settings, &blk)
end

#prepend(*args, &blk) ⇒ nil

Prepend a step to a script. Mostly for internal use.


# File 'lib/data_miner/script.rb', line 206

def prepend(*args, &blk)
  steps.unshift make(*args, &blk)
  nil
end

#prepend_once(*args, &blk) ⇒ nil

Prepend a step to a script unless it's already there. Mostly for internal use.


# File 'lib/data_miner/script.rb', line 195

def prepend_once(*args, &blk)
  step = make(*args, &blk)
  unless steps.include? step
    steps.unshift step
  end
  nil
end
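
Prepending changes execution order, since steps run front to back. A small sketch of that ordering, assuming a hypothetical StubStep struct in place of a real step:

```ruby
# StubStep is a hypothetical stand-in with value-based equality.
StubStep = Struct.new(:type, :description)

steps = [StubStep.new(:import, 'countries')]

# prepend: unshift puts the new step at the front, so it would run first.
steps.unshift StubStep.new(:sql, 'create table')

# prepend_once: an equal step is already present, so the list is unchanged.
candidate = StubStep.new(:sql, 'create table')
steps.unshift(candidate) unless steps.include?(candidate)

order = steps.map(&:description)
puts order.inspect
```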

#process(method_id) ⇒ nil
#process(description) { ... } ⇒ nil

Identify a single method, or define a block of arbitrary code, to be executed.

Examples:

Single class method

data_miner do
  [...]
  process :update_averages!
  [...]
end

Arbitrary code

data_miner do
  [...]
  process "do some arbitrary stuff" do
    [...]
  end
  [...]
end

Overloads:

  • #process(method_id) ⇒ nil

    Run a class method on the model.

  • #process(description) { ... } ⇒ nil

    Run a block of code.

    Yields:

    • The block to be evaluated in the context of the model (it's instance_eval'ed on the model class)
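
The two overloads can be pictured with plain Ruby. This is only a sketch: Pet is a hypothetical stand-in model, and the public_send dispatch for the method_id form is an assumption. Only the instance_eval behaviour is stated above.

```ruby
# Hypothetical stand-in for an ActiveRecord model.
class Pet
  def self.update_averages!
    :averages_updated
  end
end

# process :update_averages!
# Assumed to amount to calling the class method by name.
by_name = Pet.public_send(:update_averages!)

# process "do some arbitrary stuff" do ... end
# instance_eval on the class makes self the model class inside the block,
# so class methods can be called without a receiver.
blk = proc { update_averages! }
by_block = Pet.instance_eval(&blk)

puts by_name.inspect
puts by_block.inspect
```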


# File 'lib/data_miner/script.rb', line 91

def process(method_id_or_description, &blk)
  append(:process, method_id_or_description, &blk)
end

#sql(description, url_or_statement) ⇒ Object

Note:

url_or_statement is auto-detected by looking for %r{^[^\s]*/[^\*]} (non-spaces followed by a slash followed by non-asterisk). Therefore if you're passing a local file path and want it to be treated like a URL, make it absolute.

Execute SQL, provided either as a string or a URL.

Examples:

Rapidly get a list of countries from Brighter Planet's Reference Data web service

data_miner do
  sql "Brighter Planet's countries", 'http://data.brighterplanet.com/countries.sql'
end
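
The auto-detection rule in the note can be tried out in isolation. The pattern below is written from the note's prose description (non-spaces, then a slash, then a non-asterisk), so treat it as an assumption rather than a quotation of the library source. The file paths and statements are hypothetical samples.

```ruby
# Assumed pattern: leading non-space characters, a slash, then anything
# except an asterisk (so a leading "/*" SQL comment is not taken as a path).
url_like = %r{^[^\s]*/[^\*]}

samples = [
  'http://data.brighterplanet.com/countries.sql', # URL from the example
  '/tmp/countries.sql',                           # absolute path
  'countries.sql',                                # bare file name, no slash
  'SELECT * FROM countries',                      # plain statement
  '/* comment */ SELECT 1'                        # statement opening with a comment
]

# true means "treated like a URL", false means "treated as a SQL statement".
results = samples.map { |s| [s, !!(s =~ url_like)] }
results.each { |s, is_url| puts "#{is_url ? 'URL' : 'SQL'}  #{s}" }
```

Under this pattern a bare file name contains no slash and falls through to the statement branch, which is why the note recommends absolute paths.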


# File 'lib/data_miner/script.rb', line 188

def sql(description, url_or_statement)
  append(:sql, description, url_or_statement)
end

#start ⇒ Object

Note:

Normally you should use Country.run_data_miner!

Note:

A primitive “call stack” is kept that will prevent infinite loops. So, if Country's data miner script calls Province's AND vice-versa, each one will only be run once.

Run the script for this model. Mostly for internal use.


# File 'lib/data_miner/script.rb', line 236

def start
  model_name = model.name
  # $stderr.write "0 - #{model_name}\n"
  # $stderr.write "A - current_uniq - #{Script.current_uniq ? 'true' : 'false'}\n"
  # $stderr.write "B - #{Script.current_stack.join(',')}\n"
  if Script.current_uniq and Script.current_stack.include?(model_name)
    # we've already done this in the current stack, so skip it
    return
  end
  if not Script.current_uniq
    # since we're not trying to uniq, ignore the current contents of the stack
    Script.current_stack.clear
  end
  Script.current_stack << model_name
  steps.each do |step|
    steps.each do |other|
      other.register step
    end
  end
  steps.each_with_index do |step, i|
    begin
      DataMiner.logger.info "[DataMiner] START #{step.model.name} step #{i} #{step.description.inspect}"
      step.start
      model.reset_column_information
    rescue
      DataMiner.logger.info "[DataMiner] FAIL #{step.model.name} step #{i} (#{step.description.inspect})"
      raise $!
    end
    DataMiner.logger.info "[DataMiner] DONE #{step.model.name} step #{i} (#{step.description.inspect})"
  end
  nil
end
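
The loop-prevention guard at the top of #start can be illustrated with a stripped-down model. In this sketch MiniScript, its uniq helper, and the dependency attribute are all hypothetical. The page does not show how Script.current_uniq gets set, so a block helper that flips a thread-local flag is assumed here.

```ruby
# Hypothetical miniature of the guard logic; not the library's code.
class MiniScript
  STACK_KEY = 'MiniScript.current_stack'
  UNIQ_KEY  = 'MiniScript.current_uniq'

  def self.current_stack
    Thread.current[STACK_KEY] ||= []
  end

  def self.current_uniq
    Thread.current[UNIQ_KEY]
  end

  # Assumed helper: run the block with the uniq flag on, then restore it.
  def self.uniq
    previous = Thread.current[UNIQ_KEY]
    Thread.current[UNIQ_KEY] = true
    yield
  ensure
    Thread.current[UNIQ_KEY] = previous
  end

  attr_reader :name, :runs
  attr_accessor :dependency

  def initialize(name)
    @name = name
    @runs = 0
  end

  def start
    # Already on the stack during a uniq run: skip to avoid a loop.
    return if self.class.current_uniq && self.class.current_stack.include?(name)
    # Outside a uniq run, earlier stack contents are ignored.
    self.class.current_stack.clear unless self.class.current_uniq
    self.class.current_stack << name
    @runs += 1
    dependency.start if dependency
    nil
  end
end

country  = MiniScript.new('Country')
province = MiniScript.new('Province')
country.dependency  = province
province.dependency = country

MiniScript.uniq { country.start }
puts "Country ran #{country.runs} time(s), Province ran #{province.runs} time(s)"
```

Each script runs once even though the two depend on each other, because the second visit to Country finds its name already on the stack.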

#test(description, settings = {}) { ... } ⇒ nil

A step that runs tests and stops the data miner on failures.

rspec-expectations are automatically included.

Examples:

Tests

data_miner do
  [...]
  test "make sure something works" do
    expect(Pet.count).to be > 10
  end
  [...]
  test "make sure something works", after: 20 do
    [...]
  end
  [...]
end

Options Hash (settings):

  • :after (String)

    After how many rows of the previous step to run the tests.

Yields:

  • Tests to be run


# File 'lib/data_miner/script.rb', line 121

def test(description, settings = {}, &blk)
  append(:test, description, settings, &blk)
end