server-scripts

A Ruby gem providing easy-to-use scripts for various supercomputers and servers. It provides the following functionality:

  • Generate job scripts and run batch jobs on TSUBAME 3.0, ABCI and Reedbush machines.
  • Parse various kinds of profiling files and generate meaningful output.

Usage

ENV variables

Make sure the SYSTEM variable is set on your machine so that the gem will automatically select the appropriate commands to run.
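
As a minimal sketch, a script can check that SYSTEM is set before it builds any jobs. The helper name `require_system!` is hypothetical and not part of the gem, and the value "abci" is only an illustrative assumption; check which names the gem actually expects.

```ruby
# Hypothetical helper, not part of server_scripts: fail early when SYSTEM
# is unset so that no job is generated with the wrong commands.
def require_system!(env = ENV)
  name = env["SYSTEM"].to_s.strip
  raise ArgumentError, "SYSTEM environment variable is not set" if name.empty?
  name
end

# An explicit hash is passed here so the sketch does not depend on your shell.
# "abci" is an illustrative value, not a confirmed name accepted by the gem.
puts require_system!({ "SYSTEM" => "abci" })
```

In a real script you would call `require_system!` with no argument so that it reads the process environment.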

Writing job scripts

Simple Open MPI job script

Use the ServerScripts::BatchJob class in your Ruby code to generate and submit job files. A simple MPI job can be generated and submitted as follows:

require 'server_scripts'

include ServerScripts

task = BatchJob.new do |t|
  t.nodes = 4
  t.npernode = 4
  t.wall_time = "1:30:00"
  t.out_file = "out.log"
  t.err_file = "err.log"
  t.node_type = NodeType::FULL
  t.mpi = OPENMPI
  t.set_env "STARPU_SCHED", "dmda"
  t.set_env "MKL_NUM_THREADS", "1"
  t.executable = "a.out"
  t.options = "3 32768 2048 2 2"
end

task.submit!

This will generate a job script with a unique file name and submit it using the system's batch job submission command.
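
To illustrate what a unique file name can look like, here is a minimal sketch. It is not the gem's implementation: the method name `unique_job_file_name`, the `.sh` suffix and the naming scheme are all assumptions made for this example.

```ruby
require "securerandom"

# Hypothetical sketch, not the gem's implementation: build a job script name
# that is very unlikely to collide with scripts from earlier submissions by
# combining a timestamp with a short random suffix.
def unique_job_file_name(prefix = "job")
  "#{prefix}_#{Time.now.strftime('%Y%m%d_%H%M%S')}_#{SecureRandom.hex(4)}.sh"
end

puts unique_job_file_name("a_out")
```

The random suffix keeps names distinct even when several jobs are generated within the same second.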

Intel MPI profiling job script

If you want to generate traces using Intel MPI, you can use additional options such as setting the ITAC and VTune output file or folder names.

Parse Intel VTune output

Output chart of Intel VTune

The way VTune classifies the output in the CSV is not obvious, and it is worth understanding before you interpret the numbers. The output can be viewed as a tree that looks like so:

CPU Time
  - Effective Time
    - Idle
    - Poor
    - Ok
    - Ideal
  - Spin Time
    - Imbalance or Serial Spinning
    - Lock Contention
    - MPI Busy Wait Time
    - Other
  - Overhead Time
    - Scheduling
    - Reduction
    - Atomics
    - Other
Wait Time
  - Idle
  - Poor
  - Ok
  - Ideal
  - Over
Wait Count
PID
TID

The total time is the sum of CPU Time and Wait Time.
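
The arithmetic implied by the tree can be sketched as follows, assuming each parent value is the sum of its children. The numbers are illustrative only and are not taken from a real VTune report; `VTUNE_TIMES` and `subtree_total` are names made up for this example and are not part of the gem.

```ruby
# Illustrative numbers only (seconds); they are not from a real report.
VTUNE_TIMES = {
  "CPU Time" => {
    "Effective Time" => { "Idle" => 0.5, "Poor" => 2.0, "Ok" => 1.0, "Ideal" => 4.0 },
    "Spin Time"      => { "Imbalance or Serial Spinning" => 0.25, "Lock Contention" => 0.25,
                          "MPI Busy Wait Time" => 1.0, "Other" => 0.0 },
    "Overhead Time"  => { "Scheduling" => 0.5, "Reduction" => 0.0, "Atomics" => 0.0, "Other" => 0.5 }
  },
  "Wait Time" => { "Idle" => 1.0, "Poor" => 0.5, "Ok" => 0.25, "Ideal" => 0.25, "Over" => 0.0 }
}.freeze

# Sum every leaf below a node of the tree.
def subtree_total(node)
  node.is_a?(Hash) ? node.values.sum { |child| subtree_total(child) } : node
end

cpu_time  = subtree_total(VTUNE_TIMES["CPU Time"])
wait_time = subtree_total(VTUNE_TIMES["Wait Time"])
puts "CPU time:   #{cpu_time}"
puts "Wait time:  #{wait_time}"
puts "Total time: #{cpu_time + wait_time}"
```

Wait Count, PID and TID are not times, so they are left out of the sum.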

Usage

A sample program for parsing the first 16 threads reported by the vtune command. First generate the CSV report:

vtune -report hotspots -group-by thread -result-dir result_file.vtune \
    -report-output result_thread_res.csv -csv-delimiter=,

Then parse a CSV report in Ruby:

require 'server_scripts'

include ServerScripts

parser = Parser::VTune::Hotspots::SLATE.new(
  "test/artifacts/slate-two-proc-p1.csv", nthreads: 16)

puts parser.total_cpu_time
puts parser.total_cpu_effective_time
puts parser.total_cpu_overhead_time
puts parser.total_wait_time
puts parser.total_mpi_busy_time
puts parser.total_time

Parse Intel ITAC output

The Intel ITAC tool can be helpful for generating traces of parallel MPI programs. The ServerScripts::Parser::ITAC class can be used to convert an ITAC file to an ideal trace and then generate the function profile, from which values such as the MPI wait time can be obtained.

Usage

For extracting the MPI wait time from an ITAC trace, do the following:

require 'server_scripts'

itac = ServerScripts::Parser::ITAC.new("itac_file.stf")
itac.generate_ideal_trace!

# All times are reported in seconds.

puts itac.mpi_time(kind: :ideal)
puts itac.mpi_time(kind: :real)
puts itac.event_time("getrf_start", how: :total, kind: :real)
puts itac.event_time("getrf_start", how: :per_proc, kind: :real)

Parse StarPU worker info

The ServerScripts::Parser::StarpuProfile class has various functions for parsing the *.starpu_profile files that StarPU generates with per-worker CPU execution info. These files can be batch-processed with server_scripts by specifying a file pattern (a glob with a * wildcard, as in the usage example in this section) that matches the profile produced by each process. You can get either per-worker or per-process information.

Usage

require 'server_scripts'

include ServerScripts

parser = Parser::StarpuProfile.new("test/artifacts/4_proc_profile_8_*.starpu_profile")

puts parser.total_time
puts parser.total_exec_time
puts parser.total_sleep_time
puts parser.total_overhead_time
puts parser.time(event: :total_time, proc_id: 0, worker_id: 4)
puts parser.proc_time event: :exec_time, proc_id: 2
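
To check which files a pattern will pick up before handing it to the parser, you can expand it yourself. This is a minimal sketch that assumes the pattern behaves like a standard Ruby glob; whether the gem expands it in exactly this way is an assumption. The helper `matching_profiles` and the file names are made up for this example.

```ruby
require "tmpdir"
require "fileutils"

# Return the profile files matched by a glob pattern, in a stable order.
def matching_profiles(pattern)
  Dir.glob(pattern).sort
end

Dir.mktmpdir do |dir|
  # Create empty stand-ins for the per-process profiles (names are illustrative).
  4.times do |rank|
    FileUtils.touch(File.join(dir, "4_proc_profile_8_#{rank}.starpu_profile"))
  end
  FileUtils.touch(File.join(dir, "unrelated.log"))

  files = matching_profiles(File.join(dir, "4_proc_profile_8_*.starpu_profile"))
  puts files.map { |f| File.basename(f) }
end
```

Only the four profile files are listed; unrelated.log does not match the pattern.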

CUBEX profiles

Parse data from profiles generated by Score-P in the .cubex file format.

Usage for getting performance metrics

Use the Parser::Cubex class and provide it with a folder name. The folder should contain a profile.cubex file that will be parsed for the output. Then use the parse method to obtain various performance counters for any event:

require 'server_scripts'

include ServerScripts

parser = Parser::Cubex.new("test/artifacts/cubex")
puts parser.parse(counter: "PAPI_L3_TCM", event: "gemv")
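
Since the parser expects a profile.cubex file inside the folder, a script may want to confirm that the file exists first. This is a minimal sketch; `cubex_profile_path` is a hypothetical helper and not part of the gem.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical pre-check, not part of server_scripts: confirm that the
# folder holds a profile.cubex file and return its path.
def cubex_profile_path(folder)
  path = File.join(folder, "profile.cubex")
  raise ArgumentError, "no profile.cubex in #{folder}" unless File.file?(path)
  path
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "profile.cubex")) # empty stand-in file
  puts File.basename(cubex_profile_path(dir))
end
```

Failing early with a clear message is easier to debug than an error raised from inside the parser.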