Class: SpreadsheetAgent::Runner

Inherits:
Db
  • Object
Defined in:
lib/spreadsheet_agent/runner.rb

Overview

SpreadsheetAgent::Runner automates the traversal of all pages, entries, and goals in a SpreadsheetAgent-compatible Google Spreadsheet, or of a defined subset of them, and runs agents or other processes on them. If you place a SpreadsheetAgent::Runner script in a scheduling system (cron, etc.) on one or more compute nodes, the selected pages, entries, and goals are processed efficiently over time, and new pages, entries, or goals are picked up automatically as they are added. Runners can be written to submit agent scripts, check job status, aggregate job status information, or automate cleanup tasks.

Instance Attribute Summary

Attributes inherited from Db

#config, #config_file, #db, #session

Instance Method Summary

Methods inherited from Db

#build_db

Constructor Details

#initialize(attributes = { }) ⇒ Runner

Create a new SpreadsheetAgent::Runner. Can be created with any of the following optional attributes:

  • :skip_pages - raises SpreadsheetAgentError if passed along with :only_pages

  • :only_pages - raises SpreadsheetAgentError if passed along with :skip_pages

  • :dry_run

  • :run_in_serial

  • :debug

  • :config_file (see SpreadsheetAgent::Db)

  • :sleep_between

  • :agent_bin



# File 'lib/spreadsheet_agent/runner.rb', line 67

def initialize(attributes = { })
  if (!attributes[:skip_pages].nil? && !attributes[:only_pages].nil?)
    raise SpreadsheetAgentError, "You cannot construct a runner with both only_pages and skip_pages"
  end

  @dry_run = attributes[:dry_run]
  @run_in_serial = attributes[:run_in_serial]
  @debug = attributes[:debug]
  @config_file = attributes[:config_file]

  @sleep_between = 5
  unless attributes[:sleep_between].nil?
    @sleep_between = attributes[:sleep_between]
  end

  @agent_bin = find_bin() + '../agent_bin'
  unless attributes[:agent_bin].nil?
    @agent_bin = attributes[:agent_bin]      
  end

  if attributes[:skip_pages]
    @skip_pages = attributes[:skip_pages].clone
  end

  if attributes[:only_pages]
    @only_pages = attributes[:only_pages].clone
  end

  build_db()
  @query_fields = build_query_fields()

  if @skip_pages
    skip_pages_if do |page|
      @skip_pages.include? page
    end
  end

  if @dry_run
    @debug = true
  end
end
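The option handling in initialize can be mirrored in plain Ruby to check the defaults and precedence rules without a Google Drive session. The following is a hypothetical sketch: `RunnerOptions` and `parse_runner_options` are illustrative names and are not part of SpreadsheetAgent. `ArgumentError` stands in for `SpreadsheetAgentError`. The default agent_bin is assumed to resolve `../agent_bin` relative to the directory of `$0`, in place of the runner's `find_bin()`.

```ruby
# Hypothetical stand-in for the option handling in Runner#initialize;
# no spreadsheet connection is made.
RunnerOptions = Struct.new(:dry_run, :run_in_serial, :debug, :sleep_between,
                           :agent_bin, :skip_pages, :only_pages)

def parse_runner_options(attributes = {})
  # :skip_pages and :only_pages are mutually exclusive
  if !attributes[:skip_pages].nil? && !attributes[:only_pages].nil?
    raise ArgumentError, 'You cannot construct a runner with both only_pages and skip_pages'
  end

  opts = RunnerOptions.new
  opts.dry_run       = attributes[:dry_run]
  opts.run_in_serial = attributes[:run_in_serial]
  opts.debug         = attributes[:debug]

  # the default of 5 seconds applies only when the key is absent (nil)
  opts.sleep_between = attributes[:sleep_between].nil? ? 5 : attributes[:sleep_between]

  # assumption: default agent_bin is ../agent_bin relative to the calling script's directory
  opts.agent_bin = if attributes[:agent_bin].nil?
                     File.expand_path('../agent_bin', File.dirname($PROGRAM_NAME))
                   else
                     attributes[:agent_bin]
                   end

  # copies, so later changes to the caller's arrays do not leak into the runner
  opts.skip_pages = attributes[:skip_pages]&.dup
  opts.only_pages = attributes[:only_pages]&.dup

  # dry_run implies debug
  opts.debug = true if opts.dry_run
  opts
end
```

Because the default is applied only for a nil value, `sleep_between: 0` is honoured and disables the pause entirely.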

Instance Attribute Details

#agent_bin ⇒ Object

String, optional. The path to the directory containing the agent executables that the default process! PROC runs. Defaults to the ../agent_bin directory relative to the directory containing the calling script ($0).



# File 'lib/spreadsheet_agent/runner.rb', line 41

def agent_bin
  @agent_bin
end
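For each goal, the default process! PROC looks for an executable `#{agent_bin}/#{goal}_agent.rb` script. The following is a minimal sketch of that lookup. It assumes a plain "regular file with the executable bit set" check, and the runner's own test may differ. `agent_script_for` is a hypothetical helper.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: return the agent script path for a goal, or nil when
# agent_bin has no executable "<goal>_agent.rb".
def agent_script_for(agent_bin, goal)
  path = File.join(agent_bin, "#{goal}_agent.rb")
  (File.file?(path) && File.executable?(path)) ? path : nil
end

Dir.mktmpdir do |agent_bin|
  script = File.join(agent_bin, 'align_agent.rb')
  File.write(script, "#!/usr/bin/env ruby\n")
  FileUtils.chmod(0o755, script)

  agent_script_for(agent_bin, 'align')   # path to align_agent.rb
  agent_script_for(agent_bin, 'report')  # nil: no such script
end
```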

#debug ⇒ Object

Boolean, Optional (default false). If true, information about pages, entries, and goals that are checked and filtered is printed to STDERR.



# File 'lib/spreadsheet_agent/runner.rb', line 32

def debug
  @debug
end

#dry_run ⇒ Object

Boolean, optional (default false). If true, process! prints to STDERR the commands it would run for every runnable entry-goal, but does not run them. Setting dry_run also sets debug to true. If a PROC is passed to process!, dry_run prints the page title and the inspected entry to STDERR instead of calling the PROC.



# File 'lib/spreadsheet_agent/runner.rb', line 22

def dry_run
  @dry_run
end

#only_pages ⇒ Object (readonly)

Read-only access to the array of titles of the pages to be processed. only_pages is defined only when :only_pages or :skip_pages is passed to the constructor, or when skip_pages_if or only_pages_if is called.



# File 'lib/spreadsheet_agent/runner.rb', line 52

def only_pages
  @only_pages
end

#query_fields ⇒ Object (readonly)

Read-only access to the Hash of key fields defined in the config. The runner uses it to build the command line for each agent run on each entry of a page. The entry's value for each key is passed as an argument, in the order given by that key's 'rank' field in the config.



# File 'lib/spreadsheet_agent/runner.rb', line 47

def query_fields
  @query_fields
end
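The rank-ordering rule can be sketched as follows. This sketch assumes, hypothetically, that query_fields maps each key field name to a hash with a 'rank' entry. The structure actually built by build_query_fields may differ, and `ordered_arguments` is an illustrative name. A plain Hash stands in for the spreadsheet entry.

```ruby
# Hypothetical config shape: { field_name => { 'rank' => Integer } }.
# Returns the entry's values for the key fields, lowest rank first.
def ordered_arguments(query_fields, entry)
  query_fields
    .sort_by { |_field, spec| spec['rank'] }  # order key fields by rank
    .map { |field, _spec| entry[field].to_s } # look up each field's value
end

query_fields = { 'sample' => { 'rank' => 1 }, 'project' => { 'rank' => 0 } }
entry = { 'project' => 'p1', 'sample' => 's7', 'notes' => 'not a key field' }
ordered_arguments(query_fields, entry) # => ["p1", "s7"]
```

Fields that are not key fields, such as 'notes' here, are never passed to the agent.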

#run_in_serial ⇒ Object

Boolean, optional (default false). If true, the default process! PROC runs each agent executable in the foreground, one after another (in serial). If false, all agent executables are started in the background and run in parallel. Ignored when a PROC is passed to process!.



# File 'lib/spreadsheet_agent/runner.rb', line 28

def run_in_serial
  @run_in_serial
end
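The serial/parallel distinction can be illustrated with the standard library alone. Kernel#system blocks until the child exits. Process.spawn returns immediately, and Process.detach then reaps the child in a background thread. `launch` is a hypothetical helper and not the runner's actual implementation.

```ruby
require 'rbconfig'

# Hypothetical helper: run argv in the foreground (serial) or background (parallel).
def launch(argv, run_in_serial:)
  if run_in_serial
    system(*argv)                # blocks; returns true, false, or nil
  else
    pid = Process.spawn(*argv)   # returns at once with the child's pid
    Process.detach(pid)          # background thread reaps the child; returns that thread
  end
end

ruby = RbConfig.ruby
launch([ruby, '-e', 'exit 0'], run_in_serial: true)                  # => true
waiter = launch([ruby, '-e', 'sleep 0.1'], run_in_serial: false)
# the caller continues immediately; waiter.value blocks until the child exits
```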

#sleep_between ⇒ Object

Integer, Optional (default 5). The number of seconds that the runner sleeps between each call to process an entry-goal.



# File 'lib/spreadsheet_agent/runner.rb', line 36

def sleep_between
  @sleep_between
end

Instance Method Details

#only_pages_if(&include_code) ⇒ Object

Provide a PROC designed to determine which pages to process. If neither this method nor skip_pages_if is called, the pages selected by the :only_pages or :skip_pages constructor parameters are processed, or all pages if neither parameter was given. Calling it overrides :only_pages or :skip_pages passed to the constructor and any previous call to skip_pages_if or only_pages_if. The PROC should take the title of a page as a string and return true if the page is to be included, false otherwise. It must be called before process! to affect the pages processed. Returns the runner itself, so that skip_goal, skip_entry, and/or process! can be chained.

# include only pages whose title begins with 'foo'
runner.only_pages_if { |title| title.match(/^foo/) }.process!

# same, but without calling process! so that skip_entry or skip_goal can be called on the runner first
runner.only_pages_if do |title|
  title.match(/^foo/)
end
# ... call skip_entry, skip_goal as needed
runner.process!


# File 'lib/spreadsheet_agent/runner.rb', line 152

def only_pages_if(&include_code)
  @only_pages = @db.worksheets.collect{ |p| p.title }.select { |ptitle| include_code.call(ptitle) }
  self
end
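only_pages_if and skip_pages_if both recompute the page list from all worksheets, so the most recent call wins. The following sketch reproduces that behaviour. `StubSheet` and `PageFilter` are hypothetical stand-ins for GoogleDrive worksheets and the runner.

```ruby
# Hypothetical stand-in for a GoogleDrive::Worksheet; only #title is used.
StubSheet = Struct.new(:title)

# Hypothetical page filter mirroring only_pages_if / skip_pages_if.
class PageFilter
  attr_reader :only_pages

  def initialize(worksheets)
    @worksheets = worksheets
  end

  # keep titles for which the block is truthy
  def only_pages_if(&include_code)
    @only_pages = @worksheets.map(&:title).select { |t| include_code.call(t) }
    self
  end

  # drop titles for which the block is truthy
  def skip_pages_if(&skip_code)
    @only_pages = @worksheets.map(&:title).reject { |t| skip_code.call(t) }
    self
  end
end

filter = PageFilter.new(%w[main skip_me archive].map { |t| StubSheet.new(t) })
filter.only_pages_if { |t| t == 'main' }.skip_pages_if { |t| t.include?('skip') }
filter.only_pages # => ["main", "archive"]: the only_pages_if selection was discarded
```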

#process!(&runner_code) ⇒ Object

Processes configured pages, entries, and goals with a PROC. The default PROC takes the entry, iterates over each goal not skipped by skip_goal, and:

  • determines if an executable #{ @agent_bin }/#{ goal }_agent.rb script exists

  • if so, executes the goal agent script with command-line arguments built from the entry's value for each of the query_fields defined in the config.

If run_in_serial is false, the default PROC runs each agent in the background, in parallel. Otherwise it runs each agent in the foreground, in serial. If dry_run is true, the command is printed to STDERR but not run.

A PROC supplied to override the default should take a GoogleDrive::List (the entry) and a GoogleDrive::Worksheet (the page) as arguments, in that order. The PROC can query the entry through its hash access and/or update the entry on the spreadsheet. For changes to the entry to take effect, the worksheet must be saved within the PROC. If dry_run is true when a PROC is supplied, the page title and the inspected entry are printed to STDERR and the PROC is not called.

The runner sleeps sleep_between seconds after each call to the PROC, default or otherwise.

# call each goal agent script in agent_bin on each entry in each page
runner = SpreadsheetAgent::Runner.new
runner.process!

# find entries with a threshold > 5 and set their 'threshold_exceeded' field
runner.skip_entry { |entry| entry['threshold'].to_f <= 5 }.process! do |entry, page|
  entry['threshold_exceeded'] = '1'
  page.save
end

# only process entries on the 'main' page where the threshold has not been exceeded
runner.only_pages_if { |title| title == 'main' }
runner.skip_entry { |entry| entry['threshold_exceeded'] == '1' }.process!


# File 'lib/spreadsheet_agent/runner.rb', line 227

def process!(&runner_code)
  # each runnable entry is a [page, entry] pair
  get_runnable_entries().each do |runnable_entry_info|
    entry_page, runnable_entry = runnable_entry_info
    if runner_code.nil?
      default_process(runnable_entry)
    elsif @dry_run
      $stderr.puts "Would run #{ entry_page.title } #{ runnable_entry.inspect }"
    else
      runner_code.call(runnable_entry, entry_page)
    end
    sleep @sleep_between
  end
end
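The control flow of process! can be exercised without a spreadsheet by using stand-ins. In the following sketch, `process_entries`, `FakePage`, and the `default_process` lambda are hypothetical. The runnable entries are assumed to be [page, entry] pairs, and STDERR is injectable so the dry-run message can be captured.

```ruby
require 'stringio'

# Hypothetical sketch of the control flow of process!.
def process_entries(runnable_entries, default_process:, dry_run: false,
                    sleep_between: 5, err: $stderr, &runner_code)
  runnable_entries.each do |page, entry|
    if runner_code.nil?
      default_process.call(entry)   # the default PROC handles dry_run itself
    elsif dry_run
      err.puts "Would run #{page.title} #{entry.inspect}"
    else
      runner_code.call(entry, page) # a custom PROC receives (entry, page)
    end
    sleep sleep_between             # pause after every entry, including the last
  end
end

FakePage = Struct.new(:title)
log = StringIO.new
process_entries([[FakePage.new('main'), { 'sample' => 's1' }]],
                default_process: ->(_entry) {}, dry_run: true,
                sleep_between: 0, err: log) { |_entry, _page| raise 'not called' }
# log.string now begins with "Would run main "
```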

#skip_entry(&skip_code) ⇒ Object

Provide a PROC designed to determine which entries on any page to skip. If not called, all entries on the processed pages are processed. The PROC should take a GoogleDrive::List representing the record in the spreadsheet, which can be accessed as a Hash with field names as keys and field values as values. It should return true if the entry is to be skipped, false otherwise. It must be called before process! to affect the entries processed on each page. Returns the runner itself, so that skip_pages_if, only_pages_if, skip_goal, and/or process! can be chained.

# skip entries which have run foo or bar (cell values are strings)
runner.skip_entry { |entry| entry['foo'] == '1' || entry['bar'] == '1' }.process!

# skip entries that a human reading the spreadsheet has annotated with less than 3.5 in the 'threshold' field
runner.skip_entry do |entry|
  entry['threshold'].to_f < 3.5
end
# ... call skip_pages_if, only_pages_if, skip_goal as needed
runner.process!


# File 'lib/spreadsheet_agent/runner.rb', line 175

def skip_entry(&skip_code)
  @skip_entry_code = skip_code
  self
end
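Cell values read from the spreadsheet generally arrive as strings, so numeric comparisons in a skip_entry block should convert first. Comparing a String with a Float raises ArgumentError. In the following sketch, plain hashes stand in for spreadsheet rows, which is an assumption made for illustration.

```ruby
# Plain hashes stand in for spreadsheet rows; cell values are strings.
entries = [
  { 'sample' => 's1', 'threshold' => '2.0' },
  { 'sample' => 's2', 'threshold' => '4.5' },
  { 'sample' => 's3', 'threshold' => '' } # blank cell: ''.to_f is 0.0, so it is skipped
]

# A skip_entry-style predicate: true means "skip this entry".
skip_code = ->(entry) { entry['threshold'].to_f < 3.5 }

kept = entries.reject { |entry| skip_code.call(entry) }.map { |entry| entry['sample'] }
# kept == ["s2"]

# Comparing the raw string would fail instead:
# '2.0' < 3.5  # raises ArgumentError
```

If blank cells should be kept rather than skipped, test for an empty string before converting.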

#skip_goal(&skip_code) ⇒ Object

Provide a PROC designed to skip a specific goal in every entry on all processed pages. If not called, all goals of each processed entry and page are processed.

[note!] Ignored when a PROC is passed to the process! method. That is, it is only used when process! executes agent scripts for the goals.

The PROC should take a string, which will be one of the header fields in the spreadsheet. It should return true if that goal is to be skipped, false otherwise. Returns the runner itself, so that skip_pages_if, only_pages_if, skip_entry, and/or process! can be chained.

# skip the 'post_process' goal on each entry of each page processed
runner.skip_goal { |goal| goal == 'post_process' }.process!

This is most useful in combination with skip_entry. skip_entry drops whole entries, and skip_goal then drops goals from the entries that remain:
runner.skip_entry { |entry| entry['threshold'].to_f < 2.5 }.skip_goal { |goal| goal == 'post_process' }.process!



# File 'lib/spreadsheet_agent/runner.rb', line 195

def skip_goal(&skip_code)
  @skip_goal_code = skip_code
  self
end
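Chaining skip_entry and skip_goal narrows the work along two independent axes. The following sketch lists the (entry, goal) pairs that would remain for the default PROC. The goal names, the 'threshold' field, and the hash rows are hypothetical examples, not values from a real spreadsheet.

```ruby
# Hypothetical goals (header fields) and rows; values are strings.
goals   = %w[align call_variants post_process]
entries = [{ 'sample' => 's1', 'threshold' => '1.0' },
           { 'sample' => 's2', 'threshold' => '3.0' }]

skip_entry_code = ->(entry) { entry['threshold'].to_f < 2.5 } # drops whole entries
skip_goal_code  = ->(goal) { goal == 'post_process' }         # drops a goal everywhere

kept_entries = entries.reject { |e| skip_entry_code.call(e) }
kept_goals   = goals.reject { |g| skip_goal_code.call(g) }

pairs = kept_entries.product(kept_goals).map { |entry, goal| [entry['sample'], goal] }
# pairs == [["s2", "align"], ["s2", "call_variants"]]
```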

#skip_pages_if(&skip_code) ⇒ Object

Provide a PROC designed to filter out pages that are not to be processed. If neither this method nor only_pages_if is called, the pages selected by the :only_pages or :skip_pages constructor parameters are processed, or all pages if neither parameter was given. Calling it overrides :only_pages or :skip_pages passed to the constructor and any previous call to skip_pages_if or only_pages_if. The PROC should take the title of a page as a string and return true if the page is to be skipped, false otherwise. It must be called before process! to affect the pages processed. Returns the runner itself, so that skip_goal, skip_entry, and/or process! can be chained.

# skip pages whose title contains 'skip'
runner.skip_pages_if { |title| title.match(/skip/) }.process!

# same, but without calling process! so that skip_entry or skip_goal can be called on the runner first
runner.skip_pages_if do |title|
  title.match(/skip/)
end
# ... call skip_entry, skip_goal, etc. as needed
runner.process!


# File 'lib/spreadsheet_agent/runner.rb', line 128

def skip_pages_if(&skip_code)
  @only_pages = @db.worksheets.collect{ |p| p.title }.reject{ |ptitle| skip_code.call(ptitle) }
  self
end
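The constructor's :skip_pages list is applied by calling skip_pages_if with an `include?` test (see initialize). A list therefore behaves the same as an equivalent block. In the following sketch, `StubWorksheet` and `skip_pages` are hypothetical stand-ins for the worksheets and the method.

```ruby
# Hypothetical stand-in for a GoogleDrive::Worksheet; only #title is used.
StubWorksheet = Struct.new(:title)

# Mirrors skip_pages_if: drop every title for which the block is truthy.
def skip_pages(worksheets, &skip_code)
  worksheets.map(&:title).reject { |title| skip_code.call(title) }
end

sheets = %w[main skip_me archive].map { |t| StubWorksheet.new(t) }

# block form, as in the example above
by_pattern = skip_pages(sheets) { |title| title.match?(/skip/) }   # ["main", "archive"]

# list form, as the constructor applies :skip_pages
skip_list = ['archive']
by_list = skip_pages(sheets) { |title| skip_list.include?(title) } # ["main", "skip_me"]
```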