Pwrake
Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
- Author: Masahiro Tanaka
README in Japanese, GitHub Repository, RubyGems
Features
- Pwrake executes a workflow written in Rakefile in parallel.
- The specification of Rakefile is same as Rake.
- The tasks which do not have mutual dependencies are automatically executed in parallel.
- The
multitask
which is a parallel task definition of Rake is no more necessary.
- Parallel and distributed execution is possible using a computer cluster which consists of multiple compute nodes.
- Cluster settings: SSH login (or MPI), and the directory sharing using a shared filesystem, e.g., NFS, Gfarm.
- Pwrake automatically connects to remote hosts using SSH. You do not need to start a daemon.
- Remote host names and the number of cores to use are provided in a hostfile.
- Gfarm file system utilizes storage of compute nodes. It provides the high-performance parallel I/O.
- Parallel I/O access to local storage of compute nodes enables scalable increase in the I/O performance.
- Gfarm schedules a compute node to store an output file, to local storage.
- Pwrake schedules a compute node to execute a task, to a node where input files are stored.
- Other supports for Gfarm: Automatic mount of the Gfarm file system, etc.
Requirement
- Ruby version 2.2.3 or later
- UNIX-like OS
- For distributed processing using multiple computers:
- SSH command
- distributed file system (NFS, Gfarm, etc.)
Installation
Install with RubyGems:
$ gem install pwrake
Or download source tgz/zip and expand, cd to subdirectory and install:
$ ruby setup.rb
If you use rbenv, your system may fail to find pwrake command after installation:
-bash: pwrake: command not found
In this case, you need the rehash of command paths:
$ rbenv rehash
Usage
Parallel execution using 4 cores at localhost:
$ pwrake -j 4
Parallel execution using all cores at localhost:
$ pwrake -j
Parallel execution using total 2*2 cores at remote 2 hosts:
- Share your directory among remote hosts via distributed file system such as NFS, Gfarm.
- Allow passphrase-less access via SSH in either way:
- Add passphrase-less key generated by
ssh-keygen
. (Be careful) - Add passphrase using
ssh-add
.
- Add passphrase-less key generated by
Make
hosts
file in which remote host names and the number of cores are listed:$ cat hosts host1 2 host2 2
Run
pwrake
with an option--hostfile
or-F
:$ pwrake -F hosts
Sustitute MPI for SSH to start remote worker (Experimental)
- Setup MPI on your cluster.
- Install MPipe gem. (requires
mpicc
) Run
pwrake-mpi
command.$ pwrake-mpi -F hosts
Options
Pwrake command line options (in addition to Rake option)
-F, --hostfile FILE [Pw] Read hostnames from FILE
-j, --jobs [N] [Pw] Number of threads at localhost (default: # of processors)
-L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY
--ssh-opt, --ssh-option OPTION
[Pw] Option passed to SSH
--filesystem FILESYSTEM [Pw] Specify FILESYSTEM (nfs|gfarm2fs)
--gfarm [Pw] (obsolete; Start pwrake on Gfarm FS)
-A, --disable-affinity [Pw] Turn OFF affinity (AFFINITY=off)
-S, --disable-steal [Pw] Turn OFF task steal
-d, --debug [Pw] Output Debug messages
--pwrake-conf [FILE] [Pw] Pwrake configuration file in YAML
--show-conf, --show-config [Pw] Show Pwrake configuration options
--report LOGDIR [Pw] Generate `report.html' (Report of workflow statistics) in LOGDIR and exit.
--report-image IMAGE_TYPE [Pw] Gnuplot output format (png,jpg,svg etc.) in report.html.
--clear-gfarm2fs [Pw] Clear gfarm2fs mountpoints left after failure.
pwrake_conf.yaml
- If
pwrake_conf.yaml
exists at current directory, Pwrake reads options from it. Example (in YAML form):
HOSTFILE: hosts LOG_DIR: true DISABLE_AFFINITY: true DISABLE_STEAL: true FAILED_TARGET: delete PASS_ENV : - ENV1 - ENV2
Option list:
HOSTFILE, HOSTS nil(default, localhost)|filename LOG_DIR, LOG nil(default, No log output)|true(dirname="Pwrake%Y%m%d-%H%M%S")|dirname LOG_FILE default="pwrake.log" TASK_CSV_FILE default="task.csv" COMMAND_CSV_FILE default="command.csv" GC_LOG_FILE default="gc.log" WORK_DIR default=$PWD FILESYSTEM default(autodetect)|gfarm SSH_OPTION SSH option PASS_ENV (Array) Environment variables passed to SSH HEARTBEAT default=240 - Hearbeat interval in seconds RETRY default=1 - The number of task retry HOST_FAILURE default=2 - The number of allowed continuous host failure (since v2.3) FAILED_TARGET rename(default)|delete|leave - Treatment of failed target files FAILURE_TERMINATION wait(default)|kill|continue - Behavior of other tasks when a task is failed QUEUE_PRIORITY LIFO(default)|FIFO|LIHR(LIfo&Highest-Rank-first; obsolete) DISABLE_RANK_PRIORITY false(default)|true - Disable rank-aware task scheduling (since v2.3) RESERVE_NODE false(default)|true - Reserve a node for tasks with ncore>1 (since v2.3) NOACTION_QUEUE_PRIORITY FIFO(default)|LIFO|RAND SHELL_START_INTERVAL default=0.012 (sec) GRAPH_PARTITION false(default)|true REPORT_IMAGE default=png
Options for Gfarm system:
DISABLE_AFFINITY default=false DISABLE_STEAL default=false GFARM_BASEDIR default="/tmp" GFARM_PREFIX default="pwrake_$USER" GFARM_SUBDIR default='/' MAX_GFWHERE_WORKER default=8 GFARM2FS_COMMAND default='gfarm2fs' GFARM2FS_OPTION default="" GFARM2FS_DEBUG default=false GFARM2FS_DEBUG_WAIT default=1
Task Properties
- Task properties are specified in
desc
strings above task definition in Rakefile.
Example of Rakefile:
desc "ncore=4 allow=ourhost*" # desc has no effect on rule in original Rake, but it is used for task property in Pwrake.
rule ".o" => ".c" do
sh "..."
end
(1..n).each do |i|
desc "ncore=2 steal=no" # desc should be inside of loop because it is effective only for the next task.
file "task#{i}" do
sh "..."
end
end
Properties (The leftmost item is default):
ncore=integer|rational - The number of cores used by this task.
exclusive=no|yes - Exclusively execute this task in a single node.
reserve=no|yes - Gives higher priority to this task if ncore>1. (reserve a host)
allow=hostname - Allow this host to execute this task. (accepts wild card)
deny=hostname - Deny this host to execute this task. (accepts wild card)
order=deny,allow|a llow,deny - The order of evaluation.
steal=yes|no - Allow task stealing for this task.
retry=integer - The number of retry for this task.
Note for Gfarm
- Gfarm file-affinity scheduling is achieved by
gfwhere-pipe
script bundled in the Pwrake package. This script accesseslibgfarm.so.1
through Fiddle (a Ruby's standard module) since Pwrake ver.2.2.7. Please set the environment variableLD_LIBRARY_PATH
correctly to findlibgfarm.so.1
.
Scheduling with Graph Partitioning
Compile and Install METIS 5.1.0 (http://www.cs.umn.edu/~metis/). This requires CMake.
Install RbMetis (https://github.com/masa16/rbmetis) by
gem install rbmetis -- \ --with-metis-include=/usr/local/include \ --with-metis-lib=/usr/local/lib
Option (
pwrake_conf.yaml
):GRAPH_PARTITION: true
See publication: M. Tanaka and O. Tatebe, “Workflow Scheduling to Minimize Data Movement Using Multi-constraint Graph Partitioning,” in CCGrid 2012
Publications
Acknowledgment
This work is supported by:
- JST CREST, research themes:
- MEXT Promotion of Research for Next Generation IT Infrastructure "Resources Linkage for e-Science (RENKEI)."