Class: DataMiner::Script

Inherits:
Object
Defined in:
lib/data_miner/script.rb

Overview

The container that holds each step in the script.

Constant Summary

UNIQ_THREAD_VAR =
'DataMiner::Script.current_uniq'
STACK_THREAD_VAR =
'DataMiner::Script.current_stack'

Instance Attribute Details

#steps ⇒ Array<DataMiner::Step> (readonly)

The steps in the script.

Returns:

  • (Array<DataMiner::Step>)


# File 'lib/data_miner/script.rb', line 46

def steps
  @steps
end

Instance Method Details

#append(*args, &blk) ⇒ nil

Append a step to a script. Mostly for internal use.

Returns:

  • (nil)


# File 'lib/data_miner/script.rb', line 195

def append(*args, &blk)
  steps << make(*args, &blk)
  nil
end

#append_once(*args, &blk) ⇒ nil

Append a step to a script unless it’s already there. Mostly for internal use.

Returns:

  • (nil)


# File 'lib/data_miner/script.rb', line 184

def append_once(*args, &blk)
  step = make(*args, &blk)
  unless steps.include? step
    steps << step
  end
  nil
end

#import(description, settings) { ... } ⇒ nil

Note:

Be sure to check out github.com/seamusabshere/remote_table and github.com/seamusabshere/errata for available settings.

Note:

There are hundreds of import examples in github.com/brighterplanet/earth. The README points to a few of them (at the bottom).

Note:

We often use string primary keys to make idempotency easier. github.com/seamusabshere/active_record_inline_schema supports defining these inline.

Note:

Enabling :validate may slow down importing large files because it precludes bulk loading using github.com/seamusabshere/upsert.

Import rows into your model.

As long as…

  1. you key on the primary key, or

  2. the table has an auto-increment primary key, or

  3. you DON’T enable :validate

… then things will be sped up using the upsert library in streaming mode.

Otherwise, native ActiveRecord constructors and validations will be used.

Examples:

From the README

data_miner do
  [...]
  import("OpenGeoCode.org's Country Codes to Country Names list",
         :url => 'http://opengeocode.org/download/countrynames.txt',
         :format => :delimited,
         :delimiter => '; ',
         :headers => false,
         :skip => 22) do
    key   :iso_3166_code, :field_number => 0
    store :iso_3166_alpha_3_code, :field_number => 1
    store :iso_3166_numeric_code, :field_number => 2
    store :name, :field_number => 5
  end
  [...]
end

Parameters:

  • description (String)

    A description of the data source.

  • settings (Hash)

    Settings, including URL of the data source, that are used to download/parse (using RemoteTable) and (sometimes) correct (using Errata) the data.

Options Hash (settings):

  • :url (String)

    The URL of the data source. Passed directly to RemoteTable.new.

  • :errata (Hash)

    The :responder and :url settings that will be passed to Errata.new.

  • :validate (TrueClass, FalseClass)

    Whether to always run ActiveRecord validations.

  • anything (*)

    Any other setting will be passed to RemoteTable.new.

Yields:

  • A block defining how to key the import (to make it idempotent) and which columns to store.

Returns:

  • (nil)


# File 'lib/data_miner/script.rb', line 140

def import(description, settings, &blk)
  append(:import, description, settings, &blk)
end
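
The point of keying an import is that running it twice leaves the table unchanged. The following is a minimal sketch of that idea in plain Ruby. `KeyedTable`, `import_rows`, and the sample rows are hypothetical stand-ins, not part of DataMiner, RemoteTable, or upsert; the field numbers follow the README example above.

```ruby
# Hypothetical in-memory stand-in for a database table (not a DataMiner class).
class KeyedTable
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Insert the row if the key is new, otherwise update the stored row in place.
  def upsert(key, attributes)
    (@rows[key] ||= {}).merge!(attributes)
    nil
  end
end

# Made-up sample rows, shaped like a headerless delimited file.
SOURCE_ROWS = [
  ['AD', 'AND', '020', nil, nil, 'Andorra'],
  ['AE', 'ARE', '784', nil, nil, 'United Arab Emirates']
].freeze

def import_rows(table, rows)
  rows.each do |fields|
    # Field 0 is the key; fields 1, 2 and 5 are stored, as in the README example.
    table.upsert(fields[0],
                 :iso_3166_alpha_3_code => fields[1],
                 :iso_3166_numeric_code => fields[2],
                 :name => fields[5])
  end
  nil
end

table = KeyedTable.new
import_rows(table, SOURCE_ROWS)
import_rows(table, SOURCE_ROWS) # the second run adds no rows
puts table.rows.size            # prints 2
```

Without a key, the second run would have no way to recognise the rows it had already stored.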

#prepend(*args, &blk) ⇒ nil

Prepend a step to a script. Mostly for internal use.

Returns:

  • (nil)


# File 'lib/data_miner/script.rb', line 176

def prepend(*args, &blk)
  steps.unshift make(*args, &blk)
  nil
end

#prepend_once(*args, &blk) ⇒ nil

Prepend a step to a script unless it’s already there. Mostly for internal use.

Returns:

  • (nil)


# File 'lib/data_miner/script.rb', line 165

def prepend_once(*args, &blk)
  step = make(*args, &blk)
  unless steps.include? step
    steps.unshift step
  end
  nil
end
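
To see how the four step-list methods interact, here is a small sketch that copies their logic onto hypothetical stand-ins. `FakeStep` and `OrderedScript` are not DataMiner classes. The sketch assumes steps compare by value (a Struct does), because `Array#include?` relies on `==`; how the real `DataMiner::Step` defines equality is not shown in this page.

```ruby
# Hypothetical stand-in for a step; Struct gives value-based ==.
FakeStep = Struct.new(:kind, :description)

# Hypothetical stand-in for the script container.
class OrderedScript
  attr_reader :steps

  def initialize
    @steps = []
  end

  def append(*args)
    steps << FakeStep.new(*args)
    nil
  end

  def append_once(*args)
    step = FakeStep.new(*args)
    steps << step unless steps.include?(step)
    nil
  end

  def prepend(*args)
    steps.unshift FakeStep.new(*args)
    nil
  end

  def prepend_once(*args)
    step = FakeStep.new(*args)
    steps.unshift(step) unless steps.include?(step)
    nil
  end
end

script = OrderedScript.new
script.append(:process, 'b')
script.prepend(:process, 'a')       # goes to the front
script.append_once(:process, 'b')   # equal step already present, skipped
script.prepend_once(:process, 'a')  # equal step already present, skipped
script.append_once(:process, 'c')   # new, added at the end
p script.steps.map(&:description)   # prints ["a", "b", "c"]
```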

#process(method_id) ⇒ nil
#process(description) { ... } ⇒ nil

Identify a single method, or define a block of arbitrary code, to be executed.

Examples:

Single class method

data_miner do
  [...]
  process :update_averages!
  [...]
end

Arbitrary code

data_miner do
  [...]
  process "do some arbitrary stuff" do
    [...]
  end
  [...]
end

Overloads:

  • #process(method_id) ⇒ nil

    Run a class method on the model.

    Parameters:

    • method_id (Symbol)

      The class method to be run on the model.

  • #process(description) { ... } ⇒ nil

    Run a block of code.

    Parameters:

    • description (String)

      A description of what the block does.

    Yields:

    • The block to be evaluated in the context of the model (it’s instance_eval’ed on the model class)

Returns:

  • (nil)


# File 'lib/data_miner/script.rb', line 91

def process(method_id_or_description, &blk)
  append(:process, method_id_or_description, &blk)
end
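
The two overloads differ only in what eventually runs: a class method named by a Symbol, or a block evaluated with the model class as `self`. The sketch below illustrates that distinction in plain Ruby. `run_process` and this `Country` class are hypothetical; the real step classes are not shown in this page.

```ruby
# Hypothetical model class standing in for an ActiveRecord model.
class Country
  def self.update_averages!
    @averages_updated = true
  end

  def self.averages_updated?
    @averages_updated == true
  end
end

# Hypothetical dispatcher: with a block, instance_eval it on the model class;
# without one, treat the argument as the name of a class method.
def run_process(model, method_id_or_description, &blk)
  if blk
    model.instance_eval(&blk)
  else
    model.send(method_id_or_description)
  end
end

run_process(Country, :update_averages!)
puts Country.averages_updated?                         # prints true

# Inside the block, self is the model class, so `name` is Country.name.
puts run_process(Country, 'report the receiver') { name } # prints Country
```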

#sql(description, url_or_statement) ⇒ nil

Note:

url_or_statement is auto-detected by looking for %r{^[^\s]*/[^\*]} (non-spaces followed by a slash followed by non-asterisk). Therefore if you’re passing a local file path and want it to be treated like a URL, make it absolute.

Execute SQL, provided either as a string or a URL.

Examples:

Rapidly get a list of countries from Brighter Planet’s Reference Data web service

data_miner do
  sql "Brighter Planet's countries", 'http://data.brighterplanet.com/countries.sql'
end

Parameters:

  • description (String)

    What this step does.

  • url_or_statement (String)

    SQL statement as a String or location of the SQL file as a URL.


# File 'lib/data_miner/script.rb', line 158

def sql(description, url_or_statement)
  append(:sql, description, url_or_statement)
end
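
The auto-detection rule from the note can be tried out directly. The sketch below assumes the pattern is `%r{^[^\s]*/[^\*]}`, which is inferred from the note's wording (non-spaces, a slash, then a non-asterisk). `url_like?` is a hypothetical helper, not a DataMiner method.

```ruby
# Assumed pattern: non-spaces, then a slash, then a character that is not "*".
URL_LIKE = %r{^[^\s]*/[^\*]}

# Hypothetical helper that returns true when the argument would be treated as a URL.
def url_like?(url_or_statement)
  !!(url_or_statement =~ URL_LIKE)
end

puts url_like?('http://data.brighterplanet.com/countries.sql') # prints true
puts url_like?('/var/data/countries.sql')                      # prints true
puts url_like?('countries.sql')                                # prints false
puts url_like?('SELECT * FROM countries')                      # prints false
puts url_like?('/* comment */ SELECT 1')                       # prints false
```

The third case is why a relative file path needs to be made absolute: with no slash in it, the string is taken to be a SQL statement.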

#start ⇒ DataMiner::Run

Note:

Normally you should use Country.run_data_miner!

Note:

A primitive “call stack” is kept that will prevent infinite loops. So, if Country’s data miner script calls Province’s AND vice-versa, each one will only be run once.

Run the script for this model. Mostly for internal use.

Returns:

  • (DataMiner::Run)


# File 'lib/data_miner/script.rb', line 206

def start
  model_name = model.name
  # $stderr.write "0 - #{model_name}\n"
  # $stderr.write "A - current_uniq - #{Script.current_uniq ? 'true' : 'false'}\n"
  # $stderr.write "B - #{Script.current_stack.join(',')}\n"
  if Script.current_uniq and Script.current_stack.include?(model_name)
    # we've already done this in the current stack, so skip it
    return
  end
  if not Script.current_uniq
    # since we're not trying to uniq, ignore the current contents of the stack
    Script.current_stack.clear
  end
  Script.current_stack << model_name
  unless Run.table_exists?
    Run.auto_upgrade!
  end
  run = Run.new
  run.model_name = model_name
  run.start do
    steps.each do |step|
      step.start
      model.reset_column_information
    end
  end
end
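
The loop-prevention note can be illustrated with a stripped-down model of the stack check. `GuardedScript`, `depends_on`, and `run_count` are hypothetical. The sketch covers only the case where uniqueness tracking is on (it never clears the stack), and storing the stack in `Thread.current` is an assumption suggested by the `STACK_THREAD_VAR` constant name.

```ruby
# Hypothetical stand-in for a script that may trigger other scripts.
class GuardedScript
  STACK_KEY = 'GuardedScript.current_stack'

  # Per-thread list of model names already started (assumed storage).
  def self.current_stack
    Thread.current[STACK_KEY] ||= []
  end

  attr_reader :run_count

  def initialize(model_name)
    @model_name = model_name
    @dependencies = []
    @run_count = 0
  end

  def depends_on(other)
    @dependencies << other
    nil
  end

  def start
    # Already started in the current stack, so skip it instead of recursing.
    return if GuardedScript.current_stack.include?(@model_name)
    GuardedScript.current_stack << @model_name
    @run_count += 1
    @dependencies.each(&:start)
    nil
  end
end

country = GuardedScript.new('Country')
province = GuardedScript.new('Province')
country.depends_on(province)
province.depends_on(country) # mutual dependency

country.start
puts country.run_count  # prints 1
puts province.run_count # prints 1
```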