Module: IMW::Workflow
Overview
IMW encourages you to view a data transformation as a series of interdependent steps.
By default, IMW defines four main steps in such a transformation: rip, parse, fix, and package.
Each step is associated with a directory on disk in which it keeps its files: ripd, prsd, fixd, and pkgd.
The steps are:
- rip: Obtain data via HTTP, FTP, SCP, RSYNC, database query, &c. and store the results in ripd.
- parse: Parse data into a structured form using a library (JSON, YAML, &c.) or your own parser (XML, flat files, &c.) and store the results in prsd.
- fix: Combine, filter, reconcile, and transform already structured data into the desired form and store the results in fixd.
- package: Archive, compress, and deliver data in its final form to some location (HTTP, FTP, SCP, RSYNC, S3, EBS, &c.), optionally storing the output in pkgd.
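The following plain-Ruby sketch runs all four steps over a temporary dataset root to show what each step does and where its output goes. It is independent of IMW. The inline CSV string stands in for a real HTTP or FTP download, and the `Pipeline` module and file names such as `scores.csv` are hypothetical.

```ruby
require 'fileutils'
require 'json'
require 'tmpdir'
require 'zlib'

ROOT = Dir.mktmpdir # hypothetical dataset root
at_exit { FileUtils.remove_entry(ROOT) }

module Pipeline
  module_function

  # Path to a file inside a step directory, creating the directory if needed.
  def path(dir, file)
    FileUtils.mkdir_p(File.join(ROOT, dir))
    File.join(ROOT, dir, file)
  end

  # rip: an inline string stands in for an HTTP/FTP/SCP download.
  def rip
    File.write(path('ripd', 'scores.csv'), "name,score\nada,3\nbob,5\ncy,\n")
  end

  # parse: turn raw CSV lines into structured records stored as JSON.
  def parse
    lines = File.readlines(path('ripd', 'scores.csv'), chomp: true)
    header = lines.shift.split(',')
    records = lines.map { |line| header.zip(line.split(',')).to_h }
    File.write(path('prsd', 'scores.json'), JSON.generate(records))
  end

  # fix: drop incomplete records and convert scores to integers.
  def fix
    records = JSON.parse(File.read(path('prsd', 'scores.json')))
    clean = records.reject { |r| r['score'].nil? }
                   .map { |r| r.merge('score' => Integer(r['score'])) }
    File.write(path('fixd', 'scores.json'), JSON.generate(clean))
  end

  # package: compress the final data (delivery to a remote host is omitted).
  def package
    Zlib::GzipWriter.open(path('pkgd', 'scores.json.gz')) do |gz|
      gz.write(File.read(path('fixd', 'scores.json')))
    end
  end
end

# Each step reads the previous step's output, so they run in order.
[:rip, :parse, :fix, :package].each { |step| Pipeline.public_send(step) }
```

Because every step reads only from the previous step's directory, any single step can be rerun without repeating the earlier ones.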
Each step depends on the one before it. The steps are empty by default, so there is no need to write code for steps you do not use. You can also define your own steps (using task, just as in Rake) and either hook them into these predefined steps or leave them standalone.
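In plain Rake, the same arrangement can be sketched as a chain of tasks in which each step lists the previous one as a prerequisite. IMW defines its tasks on the dataset object itself (its #task calls self.define_task), whereas this sketch uses Rake's top-level DSL. The `:validate` step is a hypothetical custom step, hooked in by adding it as a prerequisite of `:package`.

```ruby
require 'rake'

LOG = []

# Each standard step depends on the one before it; the bodies are stubs.
task(:rip)             { LOG << :rip }
task(:parse => :rip)   { LOG << :parse }
task(:fix => :parse)   { LOG << :fix }
task(:package => :fix) { LOG << :package }

# Hypothetical custom step that must run after :fix and before :package.
task(:validate => :fix) { LOG << :validate }
task :package => :validate # adds a prerequisite to the existing :package task

# Invoking the last step runs every prerequisite first, each at most once.
Rake::Task[:package].invoke
```

Redeclaring `:package` with a new prerequisite extends the existing task rather than replacing it, which is what makes hooking a custom step into a predefined one cheap.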
A dataset also has an :initialize task, which by default just creates the directories for these steps. You can hook in your own initialization tasks by making :initialize depend on them.
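A minimal Rake sketch of this hook, assuming a temporary directory as the dataset root: `:initialize` creates the four step directories, and a hypothetical `:write_config` task is made a prerequisite so that it runs first. The config file name and contents are illustrative only.

```ruby
require 'rake'
require 'fileutils'
require 'tmpdir'

ROOT = Dir.mktmpdir # hypothetical dataset root
at_exit { FileUtils.remove_entry(ROOT) }
CALLS = []

# Default-style initialize: create one directory per workflow step.
task :initialize do
  %w[ripd prsd fixd pkgd].each { |d| FileUtils.mkdir_p(File.join(ROOT, d)) }
  CALLS << :initialize
end

# Hypothetical extra setup, hooked in by making :initialize depend on it.
task :write_config do
  File.write(File.join(ROOT, 'config.yml'), "source: example\n")
  CALLS << :write_config
end
task :initialize => :write_config

Rake::Task[:initialize].invoke
```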
A subclass of IMW::Dataset can customize how tasks are defined by overriding define_workflow_tasks, among other methods, and can introduce new tasks by overriding define_tasks.
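The override pattern can be sketched with a hypothetical stand-in base class, because IMW::Dataset's constructor and task machinery are not shown in this section. `SketchDataset`, `ValidatedDataset`, the `:validate` and `:publish` tasks, and the assumption that both hooks are called from the constructor are all illustrative. The stand-in only records task names and their prerequisites.

```ruby
# Hypothetical stand-in for IMW::Dataset. It only records task names and
# prerequisites, and it assumes both hooks are called from the constructor.
class SketchDataset
  attr_reader :tasks

  def initialize
    @tasks = {}
    define_workflow_tasks
    define_tasks
  end

  # Rake-style declaration: `task :name` or `task :name => deps`.
  def task(spec)
    name, deps = spec.is_a?(Hash) ? spec.first : [spec, []]
    (@tasks[name] ||= []).concat(Array(deps))
  end

  def workflow_steps
    [:rip, :parse, :fix, :package]
  end

  # Default wiring: each step depends on the one before it.
  def define_workflow_tasks
    task workflow_steps.first
    workflow_steps.each_cons(2) { |prev, step| task step => prev }
  end

  # Default: no extra tasks.
  def define_tasks
  end
end

class ValidatedDataset < SketchDataset
  # Customize the workflow wiring: insert :validate between :fix and :package.
  def define_workflow_tasks
    super
    task :validate => :fix
    task :package => :validate
  end

  # Introduce a new, hypothetical task that runs after packaging.
  def define_tasks
    task :publish => :package
  end
end
```

Calling `super` first keeps the default chain intact, so the subclass only has to describe what it adds.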
Constant Summary
- DEFAULT_OPTIONS = { :dry_run => false, :trace => false, :verbose => false }
  Default options passed to Rake. Any class including the Rake::TaskManager module must define a constant by this name.
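A class that carries defaults like these usually merges caller overrides over them without mutating the constant. The sketch below assumes a hypothetical `build_options` helper that returns a Struct for attribute-style access and rejects unknown keys. It does not reproduce IMW's or Rake's own option handling.

```ruby
DEFAULT_OPTIONS = { :dry_run => false, :trace => false, :verbose => false }.freeze

# Hypothetical attribute-style wrapper around the option hash.
Options = Struct.new(:dry_run, :trace, :verbose, keyword_init: true)

# Hypothetical helper: overrides win, defaults fill the gaps, typos fail loudly.
def build_options(overrides = {})
  unknown = overrides.keys - DEFAULT_OPTIONS.keys
  raise ArgumentError, "unknown options: #{unknown.join(', ')}" unless unknown.empty?
  Options.new(**DEFAULT_OPTIONS.merge(overrides))
end

opts = build_options(:trace => true)
```

Freezing the constant guarantees that `merge`, which returns a new hash, is the only way to derive a customized set of options.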
Instance Method Summary
- #define_tasks ⇒ Object
  Override this method to define default tasks for a subclass of IMW::Dataset.
- #file(path, &block) ⇒ IMW::FileTask
  Return a new (or existing) IMW::FileTask with the given path.
- #file_create(path, &block) ⇒ IMW::FileCreationTask
  Return a new (or existing) IMW::FileCreationTask with the given path.
- #task(deps, &block) ⇒ IMW::Task
  Return a new (or existing) IMW::Task with the given name.
- #workflow_dirs ⇒ Array
  Each step of the IMW workflow corresponds to a directory in which, by convention, it deposits its files once it has finished processing (so ripped files end up in the ripd directory, packaged files in the pkgd directory, and so on).
- #workflow_steps ⇒ Array
  The standard IMW workflow steps.
Instance Method Details
#define_tasks ⇒ Object
Override this method to define default tasks for a subclass of IMW::Dataset.
# File 'lib/imw/dataset/workflow.rb', line 106
def define_tasks
end
#file(path, &block) ⇒ IMW::FileTask
Return a new (or existing) IMW::FileTask with the given path. Dependencies can be declared and a block passed in just as in Rake.
# File 'lib/imw/dataset/workflow.rb', line 88
def file path, &block
  path = path.respond_to?(:path) ? path.path : path
  self.define_task IMW::FileTask, path, &block
end
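The following is a sketch of the same idea in plain Rake, using its top-level `file` DSL rather than IMW's method. It builds a parsed JSON file from a ripped CSV file. As in IMW's #file, the target is unwrapped if it responds to `path`. `Resource` is a hypothetical object used only to show that unwrapping, and the paths are illustrative.

```ruby
require 'rake'
require 'fileutils'
require 'json'
require 'tmpdir'

root = Dir.mktmpdir
at_exit { FileUtils.remove_entry(root) }
ripped = File.join(root, 'ripd', 'scores.csv')
FileUtils.mkdir_p(File.dirname(ripped))
File.write(ripped, "name,score\nada,3\nbob,5\n")

# Hypothetical resource object; like IMW's #file, unwrap anything with #path.
Resource = Struct.new(:path)
target = Resource.new(File.join(root, 'prsd', 'scores.json'))
parsed = target.respond_to?(:path) ? target.path : target

# Build the parsed file only if it is missing or older than the ripped file.
file parsed => ripped do |t|
  FileUtils.mkdir_p(File.dirname(t.name))
  lines = File.readlines(t.prerequisites.first, chomp: true)
  header = lines.shift.split(',')
  File.write(t.name, JSON.generate(lines.map { |l| header.zip(l.split(',')).to_h }))
end

Rake::Task[parsed].invoke
```

Once the output exists and is newer than its prerequisite, the task reports that it is no longer needed, so a rerun skips the parse.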
#file_create(path, &block) ⇒ IMW::FileCreationTask
Return a new (or existing) IMW::FileCreationTask with the given path. Dependencies can be declared and a block passed in just as in Rake.
# File 'lib/imw/dataset/workflow.rb', line 99
def file_create path, &block
  path = path.respond_to?(:path) ? path.path : path
  self.define_task IMW::FileCreationTask, path, &block
end
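In Rake, a file creation task only asks whether its path exists and ignores timestamps. That makes it a good fit for step directories, whose modification times change every time a file is added. The sketch below uses Rake's own `file_create` on a temporary `ripd` directory. The directory location is illustrative.

```ruby
require 'rake'
require 'fileutils'
require 'tmpdir'

root = Dir.mktmpdir
at_exit { FileUtils.remove_entry(root) }
ripd = File.join(root, 'ripd')
runs = 0

# Runs only while the path is missing; timestamps are never compared.
file_create ripd do |t|
  runs += 1
  FileUtils.mkdir_p(t.name)
end

Rake::Task[ripd].invoke
```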
#task(deps, &block) ⇒ IMW::Task
Return a new (or existing) IMW::Task with the given name. Dependencies can be declared and a block passed in just as in Rake.
The deps argument is either the name of the task (if a Symbol or String) or the name of the task mapped to an Array of dependencies (if a Hash).
# File 'lib/imw/dataset/workflow.rb', line 78
def task deps, &block
  self.define_task IMW::Task, deps, &block
end
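The two accepted shapes of `deps`, and the "new (or existing)" behavior, can be checked against Rake's top-level `task`, which IMW's method mirrors. This sketch uses Rake directly rather than IMW. The task names are illustrative.

```ruby
require 'rake'

# deps as a Symbol (a String would work too): a task with no prerequisites.
rip = task(:rip) { |t| puts "running #{t.name}" }

# deps as a Hash: the task name mapped to an Array of prerequisites.
parse = task(:parse => [:rip]) { |t| puts "running #{t.name}" }

# Naming an existing task returns that same object; without a block,
# no extra action is added.
again = task(:rip)
```

Passing a block when naming an existing task would append a second action instead of replacing the first one.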
#workflow_dirs ⇒ Array
Each step of the IMW workflow corresponds to a directory in which, by convention, it deposits its files once it has finished processing (so ripped files end up in the ripd directory, packaged files in the pkgd directory, and so on).
# File 'lib/imw/dataset/workflow.rb', line 123
def workflow_dirs
  [:ripd, :rawd, :fixd, :pkgd]
end
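Because finished output conventionally lands in these directories, a dataset root can be inspected to see which steps have produced anything. The sketch below copies the method body verbatim and adds a hypothetical `dirs_with_output` helper. The temporary root and the `page.html` file are illustrative.

```ruby
require 'fileutils'
require 'tmpdir'

# Copied from the #workflow_dirs source listing.
def workflow_dirs
  [:ripd, :rawd, :fixd, :pkgd]
end

# Hypothetical helper: which workflow directories under +root+ hold files?
def dirs_with_output(root)
  workflow_dirs.select do |dir|
    path = File.join(root, dir.to_s)
    File.directory?(path) && !Dir.empty?(path)
  end
end

root = Dir.mktmpdir
at_exit { FileUtils.remove_entry(root) }
workflow_dirs.each { |d| FileUtils.mkdir_p(File.join(root, d.to_s)) }
File.write(File.join(root, 'ripd', 'page.html'), '<html></html>')
```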
#workflow_steps ⇒ Array
The standard IMW workflow steps.
# File 'lib/imw/dataset/workflow.rb', line 112
def workflow_steps
  [:rip, :parse, :fix, :package]
end
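Since #workflow_steps and #workflow_dirs list their entries in the same order, they can be combined into lookup tables. The sketch below copies both method bodies verbatim, keeping the directory names exactly as #workflow_dirs returns them. The constants `STEP_DIRS` and `PREVIOUS_STEP` are hypothetical.

```ruby
# Both method bodies copied from the source listings.
def workflow_steps
  [:rip, :parse, :fix, :package]
end

def workflow_dirs
  [:ripd, :rawd, :fixd, :pkgd]
end

# Hypothetical: pair each step with the directory at the same position.
STEP_DIRS = workflow_steps.zip(workflow_dirs).to_h

# Hypothetical: each step's predecessor, i.e. the step it depends on.
PREVIOUS_STEP = workflow_steps.each_cons(2).map { |before, after| [after, before] }.to_h
```

The pairing is purely positional, so it stays correct only as long as both lists are kept in the same order.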