Pwrake

Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.

  • Author: Masahiro Tanaka

README in Japanese, GitHub Repository, RubyGems

Features

  • Pwrake executes a workflow written in Rakefile in parallel.
    • The specification of Rakefile is same as Rake.
    • The tasks which do not have mutual dependencies are automatically executed in parallel.
    • The multitask which is a parallel task definition of Rake is no more necessary.
  • Parallel and distributed execution is possible using a computer cluster which consists of multiple compute nodes.
    • Cluster settings: SSH login (or MPI), and the directory sharing using a shared filesystem, e.g., NFS, Gfarm.
    • Pwrake automatically connects to remote hosts using SSH. You do not need to start a daemon.
    • Remote host names and the number of cores to use are provided in a hostfile.
  • Gfarm file system utilizes storage of compute nodes. It provides the high-performance parallel I/O.
    • Parallel I/O access to local storage of compute nodes enables scalable increase in the I/O performance.
    • Gfarm schedules a compute node to store an output file, to local storage.
    • Pwrake schedules a compute node to execute a task, to a node where input files are stored.
    • Other supports for Gfarm: Automatic mount of the Gfarm file system, etc.

Requirement

  • Ruby version 2.2.3 or later
  • UNIX-like OS
  • For distributed processing using multiple computers:
    • SSH command
    • distributed file system (NFS, Gfarm, etc.)

Installation

Install with RubyGems:

$ gem install pwrake

Or download source tgz/zip and expand, cd to subdirectory and install:

$ ruby setup.rb

If you use rbenv, your system may fail to find pwrake command after installation:

-bash: pwrake: command not found

In this case, you need the rehash of command paths:

$ rbenv rehash

Usage

Parallel execution using 4 cores at localhost:

$ pwrake -j 4

Parallel execution using all cores at localhost:

$ pwrake -j

Parallel execution using total 2*2 cores at remote 2 hosts:

  1. Share your directory among remote hosts via distributed file system such as NFS, Gfarm.
  2. Allow passphrase-less access via SSH in either way:
    • Add passphrase-less key generated by ssh-keygen. (Be careful)
    • Add passphrase using ssh-add.
  3. Make hosts file in which remote host names and the number of cores are listed:

    $ cat hosts
    host1 2
    host2 2
    
  4. Run pwrake with an option --hostfile or -F:

    $ pwrake -F hosts
    

Sustitute MPI for SSH to start remote worker (Experimental)

  1. Setup MPI on your cluster.
  2. Install MPipe gem. (requires mpicc)
  3. Run pwrake-mpi command.

    $ pwrake-mpi -F hosts
    

Options

Pwrake command line options (in addition to Rake option)

-F, --hostfile FILE              [Pw] Read hostnames from FILE
-j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)
-L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY
    --ssh-opt, --ssh-option OPTION
                                 [Pw] Option passed to SSH
    --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm)
    --gfarm                      [Pw] FILESYSTEM=gfarm
-A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)
-S, --disable-steal              [Pw] Turn OFF task steal
-d, --debug                      [Pw] Output Debug messages
    --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML
    --show-conf, --show-config   [Pw] Show Pwrake configuration options
    --report LOGDIR              [Pw] Generate `report.html' (Report of workflow statistics) in LOGDIR and exit.
    --report-image IMAGE_TYPE    [Pw] Gnuplot output format (png,jpg,svg etc.) in report.html.
    --clear-gfarm2fs             [Pw] Clear gfarm2fs mountpoints left after failure.

pwrake_conf.yaml

  • If pwrake_conf.yaml exists at current directory, Pwrake reads options from it.
  • Example (in YAML form):

    HOSTFILE: hosts
    LOG_DIR: true
    DISABLE_AFFINITY: true
    DISABLE_STEAL: true
    FAILED_TARGET: delete
    PASS_ENV :
     - ENV1
     - ENV2
    
  • Option list:

    HOSTFILE, HOSTS   nil(default, localhost)|filename
    LOG_DIR, LOG      nil(default, No log output)|true(dirname="Pwrake%Y%m%d-%H%M%S")|dirname
    LOG_FILE          default="pwrake.log"
    TASK_CSV_FILE     default="task.csv"
    COMMAND_CSV_FILE  default="command.csv"
    GC_LOG_FILE       default="gc.log"
    WORK_DIR          default=$PWD
    FILESYSTEM        default(autodetect)|gfarm
    SSH_OPTION        SSH option
    PASS_ENV          (Array) Environment variables passed to SSH
    HEARTBEAT         default=240 - Hearbeat interval in seconds
    RETRY             default=1 - The number of retry
    FAILED_TARGET     rename(default)|delete|leave - Treatment of failed target files
    FAILURE_TERMINATION wait(default)|kill|continue - Behavior of other tasks when a task is failed
    QUEUE_PRIORITY          LIHR(default)|FIFO|LIFO|RANK
    NOACTION_QUEUE_PRIORITY FIFO(default)|LIFO|RAND
    SHELL_START_INTERVAL    default=0.012 (sec)
    GRAPH_PARTITION         false(default)|true
    REPORT_IMAGE            default=png
    
  • Options for Gfarm system:

    DISABLE_AFFINITY    default=false
    DISABLE_STEAL       default=false
    GFARM_BASEDIR       default="/tmp"
    GFARM_PREFIX        default="pwrake_$USER"
    GFARM_SUBDIR        default='/'
    MAX_GFWHERE_WORKER  default=8
    GFARM2FS_OPTION     default=""
    GFARM2FS_DEBUG      default=false
    GFARM2FS_DEBUG_WAIT default=1
    

Task Properties

  • Task properties are specified in desc strings above task definition in Rakefile.

Example of Rakefile:

desc "ncore=4 allow=ourhost*" # desc has no effect on rule in original Rake, but it is used for task property in Pwrake.
rule ".o" => ".c" do
  sh "..."
end

(1..n).each do |i|
  desc "ncore=2 steal=no" # desc should be inside of loop because it is effective only for the next task.
  file "task#{i}" do
    sh "..."
  end
end

Properties (The leftmost item is default):

ncore=integer     - The number of cores used by this task.
exclusive=no|yes  - Exclusively execute this task in a single node.
reserve=no|yes    - Gives higher priority to this task if ncore>1. (reserve a host)
allow=hostname    - Allow this host to execute this task. (accepts wild card)
deny=hostname     - Deny this host to execute this task. (accepts wild card)
order=deny,allow|allow,deny - The order of evaluation.
steal=yes|no      - Allow task stealing for this task.
retry=integer     - The number of retry for this task.

Note for Gfarm

  • Gfarm file-affinity scheduling is achieved by gfwhere-pipe script bundled in the Pwrake package. This script accesses libgfarm.so.1 through Fiddle (a Ruby's standard module) since Pwrake ver.2.2.7. Please set the environment variable LD_LIBRARY_PATH correctly to find libgfarm.so.1.

Scheduling with Graph Partitioning

Publications

Acknowledgment

This work is supported by: