Pwrake

A parallel workflow extension for Rake that runs on multicore machines, clusters, and clouds.

  • Author: Masahiro Tanaka

(README in Japanese), (GitHub Repository), (RubyGems)

Features

  • Pwrake executes a workflow written in a Rakefile in parallel.
    • The Rakefile specification is the same as in Rake.
    • Tasks that have no mutual dependencies are automatically executed in parallel.
    • Rake's multitask, the parallel task definition, is no longer necessary.
  • Parallel and distributed execution is possible on a computer cluster that consists of multiple compute nodes.
    • Cluster settings: SSH login and a directory shared through a shared file system, e.g., NFS or Gfarm.
    • Pwrake automatically connects to remote hosts using SSH. You do not need to start a daemon.
    • Remote host names and the number of cores to use are provided in a hostfile.
  • The Gfarm file system uses the storage of the compute nodes and provides high-performance parallel I/O.
    • Parallel I/O access to the local storage of the compute nodes lets I/O performance scale.
    • Gfarm selects the compute node that stores an output file so that the file is written to local storage.
    • Pwrake schedules each task on a compute node where its input files are stored.
    • Other support for Gfarm: automatic mounting of the Gfarm file system, etc.
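
As a rough illustration of the first feature, the Rakefile sketch below defines three compile tasks that do not depend on each other, plus a link task that depends on all of them. The file names (a.c, b.c, c.c, app) and the cc commands are hypothetical and chosen only for this example. Because nothing orders the three compile tasks relative to each other, they are the kind of tasks Pwrake can run in parallel, while the link task has to wait for all of them. The require line is only needed when the file is loaded as a plain Ruby script.

```ruby
require "rake"  # makes the Rake DSL (task, file, sh) available in a plain script

# Hypothetical source names, used only for illustration.
SOURCES = %w[a.c b.c c.c]
OBJECTS = SOURCES.map { |src| src.sub(/\.c\z/, ".o") }

# Each object file depends only on its own source file, so the three
# compile tasks have no mutual dependencies.
SOURCES.zip(OBJECTS).each do |src, obj|
  file obj => src do
    sh "cc -c -o #{obj} #{src}"
  end
end

# The link step depends on every object file, so it runs after all of them.
file "app" => OBJECTS do
  sh "cc -o app #{OBJECTS.join(' ')}"
end

task :default => "app"
```

Note that the sketch uses ordinary file tasks and no multitask.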

Installation

Download the source tgz/zip and extract it, then cd to the subdirectory and install:

$ ruby setup.rb

Or install with gem:

$ gem install pwrake

Usage

Parallel execution using 4 cores on localhost:

$ pwrake -j 4

Parallel execution using all cores on localhost:

$ pwrake -j

Parallel execution using a total of 2*2 cores on 2 remote hosts:

  1. Share your directory among the remote hosts via a distributed file system such as NFS or Gfarm.
  2. Allow passphrase-less access via SSH in either of the following ways:
    • Add a passphrase-less key generated by ssh-keygen. (Be careful)
    • Add your passphrase-protected key to ssh-agent using ssh-add.
  3. Make a hosts file that lists the remote host names and the number of cores to use on each:

    $ cat hosts
    host1 2
    host2 2
    
  4. Run pwrake with the --hostfile (or -F) option:

    $ pwrake --hostfile=hosts
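
To make the hostfile format in step 3 concrete, here is a minimal Ruby sketch that reads the same two lines and adds up the cores. It is only an illustration of the "hostname cores" layout shown above and is not Pwrake's own parser; the helper name parse_hostfile is hypothetical, and the sketch assumes every non-blank line carries both fields.

```ruby
# Sketch only: reads the "hostname cores" format used in the hosts file above.
# Returns a Hash that maps each host name to its core count.
def parse_hostfile(text)
  text.each_line.with_object({}) do |line, hosts|
    name, cores = line.split
    next if name.nil?               # skip blank lines
    hosts[name] = Integer(cores)    # raises if the core count is missing or not a number
  end
end

hostfile = <<~HOSTS
  host1 2
  host2 2
HOSTS

hosts = parse_hostfile(hostfile)
total = hosts.values.sum
puts "#{hosts.size} hosts, #{total} cores in total"
# prints "2 hosts, 4 cores in total"
```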
    

Options

Pwrake command-line options (in addition to the Rake options):

-F, --hostfile FILE              [Pw] Read hostnames from FILE
-j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)
-L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY
    --ssh-opt, --ssh-option OPTION
                                 [Pw] Option passed to SSH
    --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm)
    --gfarm                      [Pw] FILESYSTEM=gfarm
-A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)
-S, --disable-steal              [Pw] Turn OFF task steal
-d, --debug                      [Pw] Output Debug messages
    --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML
    --show-conf, --show-config   [Pw] Show Pwrake configuration options
    --report LOGDIR              [Pw] Report workflow statistics from LOGDIR to HTML and exit.
    --clear-gfarm2fs             [Pw] Clear gfarm2fs mountpoints left after failure.

pwrake_conf.yaml

  • If pwrake_conf.yaml exists in the current directory, Pwrake reads options from it.
  • Example (in YAML form):

    HOSTFILE: hosts
    LOG_DIR: true
    DISABLE_AFFINITY: true
    DISABLE_STEAL: true
    FAILED_TARGET: delete
    PASS_ENV :
     - ENV1
     - ENV2
    
  • Option list:

    HOSTFILE, HOSTS   nil(default, localhost)|filename
    LOG_DIR, LOG      nil(default, No log output)|true(dirname="Pwrake%Y%m%d-%H%M%S")|dirname
    LOG_FILE          default="pwrake.log"
    TASK_CSV_FILE     default="task.csv"
    COMMAND_CSV_FILE  default="command.csv"
    GC_LOG_FILE       default="gc.log"
    WORK_DIR          default=$PWD
    FILESYSTEM        default(autodetect)|gfarm
    SSH_OPTION        SSH option
    SHELL_COMMAND     default=$SHELL
    SHELL_RC          Run-Command when shell starts
    PASS_ENV          (Array) Environment variables passed to SSH
    HEARTBEAT         default=240 - Heartbeat interval in seconds
    RETRY             default=0 - Default number of retries for a task
    FAILED_TARGET     rename(default)|delete|leave - Treatment of failed target files
    FAILURE_TERMINATION wait(default)|kill|continue - Behavior of other tasks when a task fails
    QUEUE_PRIORITY          LIHR(default)|FIFO|LIFO|RANK
    NOACTION_QUEUE_PRIORITY FIFO(default)|LIFO|RAND
    SHELL_START_INTERVAL    default=0.012 (sec)
    GRAPH_PARTITION         false(default)|true
    
  • Options for Gfarm system:

    DISABLE_AFFINITY    default=false
    DISABLE_STEAL       default=false
    GFARM_BASEDIR       default="/tmp"
    GFARM_PREFIX        default="pwrake_$USER"
    GFARM_SUBDIR        default='/'
    MAX_GFWHERE_WORKER  default=8
    GFARM2FS_OPTION     default=""
    GFARM2FS_DEBUG      default=false
    GFARM2FS_DEBUG_WAIT default=1
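
To check how the values in such a file are typed, the example configuration can be loaded with Ruby's standard YAML library, as in the sketch below. This only inspects the file contents; it makes no claim about how Pwrake itself loads or validates the options. The configuration text is inlined so that the sketch is self-contained.

```ruby
require "yaml"

# The example configuration from above, inlined for a self-contained sketch.
conf_text = <<~CONF
  HOSTFILE: hosts
  LOG_DIR: true
  DISABLE_AFFINITY: true
  DISABLE_STEAL: true
  FAILED_TARGET: delete
  PASS_ENV:
   - ENV1
   - ENV2
CONF

conf = YAML.safe_load(conf_text)

# Show each option with the Ruby type that YAML assigned to it.
conf.each do |key, value|
  puts "#{key} = #{value.inspect} (#{value.class})"
end
```

In the loaded Hash, LOG_DIR is the boolean true rather than the string "true", FAILED_TARGET is a String, and PASS_ENV is an Array of two names.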
    

Task Properties

  • Task properties are specified in desc strings placed above the task definition in the Rakefile.

Example of Rakefile:

desc "ncore=4 allow=ourhost*" # In original Rake, desc has no effect on a rule, but Pwrake uses it for task properties.
rule ".o" => ".c" do
  sh "..."
end

(1..n).each do |i|
  desc "ncore=2 steal=no" # desc must be inside the loop because it applies only to the next task.
  file "task#{i}" do
    sh "..."
  end
end

Properties (the leftmost value is the default):

ncore=integer     - The number of cores used by this task.
exclusive=no|yes  - Exclusively execute this task in a single node.
allow=hostname    - Allow this host to execute this task. (accepts wildcards)
deny=hostname     - Deny this host from executing this task. (accepts wildcards)
order=deny,allow|allow,deny - The order of evaluation.
steal=yes|no      - Allow task stealing for this task.
retry=integer     - The number of retries for this task.
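
The key=value form of these properties can be pictured with the Ruby sketch below, which splits a desc string into pairs and tests a host name against an allow pattern. It is not Pwrake's implementation: the helper names parse_properties and host_matches? are hypothetical, and treating the wildcard as a shell-style pattern (via File.fnmatch) is an assumption made for this example.

```ruby
# Sketch only: splits a desc string such as "ncore=4 allow=ourhost*"
# into a Hash of property names and (string) values.
def parse_properties(desc)
  desc.scan(/(\w+)=(\S+)/).to_h
end

# Assumption: the wildcard behaves like a shell-style pattern,
# which File.fnmatch implements.
def host_matches?(pattern, hostname)
  File.fnmatch(pattern, hostname)
end

props = parse_properties("ncore=4 allow=ourhost*")
puts props["ncore"]                                # prints "4"
puts host_matches?(props["allow"], "ourhost01")    # prints "true"
puts host_matches?(props["allow"], "otherhost")    # prints "false"
```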

Note for Gfarm

  • The gfwhere-pipe script (included in Pwrake) is used for file-affinity scheduling. This script requires Ruby/FFI (https://github.com/ffi/ffi). Install FFI with:

    gem install ffi
    

For Graph Partitioning

Current version

  • Pwrake version 2.0.0

Tested Platform

  • Ruby 2.2.2
  • Rake 10.4.2
  • CentOS 6.7

Acknowledgment

This work is supported by

  • JST CREST, research area: "Development of System Software Technologies for Post-Peta Scale High Performance Computing," and
  • MEXT Promotion of Research for Next Generation IT Infrastructure "Resources Linkage for e-Science (RENKEI)."