Class: OodCore::Job::Adapters::Slurm
- Inherits:
- OodCore::Job::Adapter
  - Object
  - OodCore::Job::Adapter
  - OodCore::Job::Adapters::Slurm
- Defined in:
- lib/ood_core/job/adapters/slurm.rb
Overview
An adapter object that describes the communication with a Slurm resource manager for job management.
Defined Under Namespace
Classes: Batch
Constant Summary
- STATE_MAP =
Mapping of state codes for Slurm
    {
      'BF' => :completed, # BOOT_FAIL
      'CA' => :completed, # CANCELLED
      'CD' => :completed, # COMPLETED
      'CF' => :queued,    # CONFIGURING
      'CG' => :running,   # COMPLETING
      'F'  => :completed, # FAILED
      'NF' => :completed, # NODE_FAIL
      'PD' => :queued,    # PENDING
      'PR' => :suspended, # PREEMPTED
      'RV' => :completed, # REVOKED
      'R'  => :running,   # RUNNING
      'SE' => :completed, # SPECIAL_EXIT
      'ST' => :running,   # STOPPED
      'S'  => :suspended, # SUSPENDED
      'TO' => :completed  # TIMEOUT
    }
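As a rough illustration of how such a table can be used, the sketch below looks up a compact Slurm state code and returns the corresponding status symbol. The helper name `state_for` and the `:undetermined` fallback for unknown codes are assumptions made for this example; the adapter's own lookup is a private helper that is not shown on this page.

```ruby
# Local copy of the mapping above so the sketch runs without ood_core.
STATE_MAP = {
  'BF' => :completed, 'CA' => :completed, 'CD' => :completed,
  'CF' => :queued,    'CG' => :running,   'F'  => :completed,
  'NF' => :completed, 'PD' => :queued,    'PR' => :suspended,
  'RV' => :completed, 'R'  => :running,   'SE' => :completed,
  'ST' => :running,   'S'  => :suspended, 'TO' => :completed
}.freeze

# Hypothetical helper: translate a compact state code into a status symbol.
# Returning :undetermined for unknown codes is an assumption of this sketch.
def state_for(code)
  STATE_MAP.fetch(code.to_s, :undetermined)
end

puts state_for('PD') # prints "queued"
puts state_for('CG') # prints "running"
puts state_for('??') # prints "undetermined"
```

Note that a job in the COMPLETING state is still reported as running, so callers see `:completed` only once Slurm reports a terminal state.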
Instance Method Summary
- #delete(id) ⇒ void
  Delete the submitted job.
- #hold(id) ⇒ void
  Put the submitted job on hold.
- #info(id) ⇒ Info
  Retrieve job info from the resource manager.
- #info_all ⇒ Array<Info>
  Retrieve info for all jobs from the resource manager.
- #initialize(opts = {}) ⇒ Slurm (constructor, private)
  A new instance of Slurm.
- #release(id) ⇒ void
  Release the job that is on hold.
- #status(id) ⇒ Status
  Retrieve job status from the resource manager.
- #submit(script, after: [], afterok: [], afternotok: [], afterany: []) ⇒ String
  Submit a job with the attributes defined in the job template instance.
Methods inherited from OodCore::Job::Adapter
Constructor Details
#initialize(opts = {}) ⇒ Slurm
This method is part of a private API. Avoid using it if possible, as it may be removed or changed in the future.
Returns a new instance of Slurm.
    # File 'lib/ood_core/job/adapters/slurm.rb', line 231
    def initialize(opts = {})
      o = opts.to_h.symbolize_keys
      @slurm = o.fetch(:slurm) { raise ArgumentError, "No slurm object specified. Missing argument: slurm" }
    end
Instance Method Details
#delete(id) ⇒ void
This method returns an undefined value.
Delete the submitted job
    # File 'lib/ood_core/job/adapters/slurm.rb', line 409
    def delete(id)
      @slurm.delete_job(id.to_s)
    rescue Batch::Error => e
      # assume successful job deletion if can't find job id
      raise JobAdapterError, e.message unless /Invalid job id specified/ =~ e.message
    end
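The rescue clause above, which #hold and #release also use, treats an unknown job id as success and re-raises every other batch failure. A minimal, self-contained sketch of that pattern follows. The classes `BatchError` and `JobAdapterError` defined here are local stand-ins, the wrapper name `ignoring_missing_job` is hypothetical, and the error messages are made up for illustration.

```ruby
# Local stand-ins so the sketch runs without ood_core or a Slurm cluster.
class BatchError < StandardError; end
class JobAdapterError < StandardError; end

# Hypothetical wrapper: run the block, swallow "Invalid job id specified"
# (the job is assumed to be gone already), and re-raise anything else
# as a JobAdapterError carrying the original message.
def ignoring_missing_job
  yield
  nil
rescue BatchError => e
  raise JobAdapterError, e.message unless /Invalid job id specified/ =~ e.message
end

# An unknown job id is silently ignored.
ignoring_missing_job { raise BatchError, "error: Invalid job id specified" }

# Any other failure surfaces to the caller.
begin
  ignoring_missing_job { raise BatchError, "permission denied" }
rescue JobAdapterError => e
  puts e.message # prints "permission denied"
end
```

The practical consequence is that deleting, holding, or releasing a job that has already left the queue does not raise, so callers do not need to check for the job's existence first.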
#hold(id) ⇒ void
This method returns an undefined value.
Put the submitted job on hold
    # File 'lib/ood_core/job/adapters/slurm.rb', line 385
    def hold(id)
      @slurm.hold_job(id.to_s)
    rescue Batch::Error => e
      # assume successful job hold if can't find job id
      raise JobAdapterError, e.message unless /Invalid job id specified/ =~ e.message
    end
#info(id) ⇒ Info
Retrieve job info from the resource manager
    # File 'lib/ood_core/job/adapters/slurm.rb', line 324
    def info(id)
      id = id.to_s
      info_ary = @slurm.get_jobs(id: id).map do |v|
        parse_job_info(v)
      end

      # A job id can return multiple jobs if it corresponds to a job
      # array id, so we need to find the job that corresponds to the
      # given job id (if we can't find it, we assume it has completed)
      info_ary.detect( -> { Info.new(id: id, status: :completed) } ) do |info|
        # Match the job id or the formatted job & task id "1234_0"
        info.id == id || info.native[:array_job_task_id] == id
      end
    rescue Batch::Error => e
      # set completed status if can't find job id
      if /Invalid job id specified/ =~ e.message
        Info.new(
          id: id,
          status: :completed
        )
      else
        raise JobAdapterError, e.message
      end
    end
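The job-array matching in #info relies on `Enumerable#detect` with a fallback callable, which is invoked only when no element matches. The sketch below isolates that step. The `Info` struct is a simplified local stand-in for OodCore::Job::Info, the helper name `find_job` is hypothetical, and the job ids are invented.

```ruby
# Simplified stand-in for OodCore::Job::Info with only the fields used here.
Info = Struct.new(:id, :status, :native, keyword_init: true)

# Hypothetical helper: find the record matching either the plain job id or
# the formatted "<array id>_<task id>" string; if none matches, assume the
# job has completed and build a placeholder record.
def find_job(info_ary, id)
  id = id.to_s
  fallback = -> { Info.new(id: id, status: :completed, native: {}) }
  info_ary.detect(fallback) do |info|
    info.id == id || info.native[:array_job_task_id] == id
  end
end

jobs = [
  Info.new(id: "1235", status: :running, native: { array_job_task_id: "1234_0" }),
  Info.new(id: "1236", status: :queued,  native: { array_job_task_id: "1234_1" })
]

puts find_job(jobs, "1234_1").status # prints "queued"
puts find_job(jobs, 1235).status     # prints "running"
puts find_job(jobs, "9999").status   # prints "completed"
```

Passing the fallback as a lambda keeps the placeholder from being built unless it is needed.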
#info_all ⇒ Array<Info>
Retrieve info for all jobs from the resource manager
    # File 'lib/ood_core/job/adapters/slurm.rb', line 311
    def info_all
      @slurm.get_jobs.map do |v|
        parse_job_info(v)
      end
    rescue Batch::Error => e
      raise JobAdapterError, e.message
    end
#release(id) ⇒ void
This method returns an undefined value.
Release the job that is on hold
    # File 'lib/ood_core/job/adapters/slurm.rb', line 397
    def release(id)
      @slurm.release_job(id.to_s)
    rescue Batch::Error => e
      # assume successful job release if can't find job id
      raise JobAdapterError, e.message unless /Invalid job id specified/ =~ e.message
    end
#status(id) ⇒ Status
Retrieve job status from resource manager
    # File 'lib/ood_core/job/adapters/slurm.rb', line 354
    def status(id)
      id = id.to_s
      jobs = @slurm.get_jobs(
        id: id,
        filters: [:job_id, :array_job_task_id, :state_compact]
      )
      # A job id can return multiple jobs if it corresponds to a job array
      # id, so we need to find the job that corresponds to the given job id
      # (if we can't find it, we assume it has completed)
      #
      # Match against the job id or the formatted job & task id "1234_0"
      if job = jobs.detect { |j| j[:job_id] == id || j[:array_job_task_id] == id }
        Status.new(state: get_state(job[:state_compact]))
      else
        # set completed status if can't find job id
        Status.new(state: :completed)
      end
    rescue Batch::Error => e
      # set completed status if can't find job id
      if /Invalid job id specified/ =~ e.message
        Status.new(state: :completed)
      else
        raise JobAdapterError, e.message
      end
    end
#submit(script, after: [], afterok: [], afternotok: [], afterany: []) ⇒ String
Submit a job with the attributes defined in the job template instance
    # File 'lib/ood_core/job/adapters/slurm.rb', line 252
    def submit(script, after: [], afterok: [], afternotok: [], afterany: [])
      after      = Array(after).map(&:to_s)
      afterok    = Array(afterok).map(&:to_s)
      afternotok = Array(afternotok).map(&:to_s)
      afterany   = Array(afterany).map(&:to_s)

      # Set sbatch options
      args = []
      # ignore args, don't know how to do this for slurm
      args += ["-H"] if script.submit_as_hold
      args += (script.rerunnable ? ["--requeue"] : ["--no-requeue"]) unless script.rerunnable.nil?
      args += ["-D", script.workdir.to_s] unless script.workdir.nil?
      args += ["--mail-user", script.email.join(",")] unless script.email.nil?
      if script.email_on_started && script.email_on_terminated
        args += ["--mail-type", "ALL"]
      elsif script.email_on_started
        args += ["--mail-type", "BEGIN"]
      elsif script.email_on_terminated
        args += ["--mail-type", "END"]
      elsif script.email_on_started == false && script.email_on_terminated == false
        args += ["--mail-type", "NONE"]
      end
      args += ["-J", script.job_name] unless script.job_name.nil?
      args += ["-i", script.input_path] unless script.input_path.nil?
      args += ["-o", script.output_path] unless script.output_path.nil?
      args += ["-e", script.error_path] unless script.error_path.nil?
      args += ["--reservation", script.reservation_id] unless script.reservation_id.nil?
      args += ["-p", script.queue_name] unless script.queue_name.nil?
      args += ["--priority", script.priority] unless script.priority.nil?
      args += ["--begin", script.start_time.localtime.strftime("%C%y-%m-%dT%H:%M:%S")] unless script.start_time.nil?
      args += ["-A", script.accounting_id] unless script.accounting_id.nil?
      args += ["-t", seconds_to_duration(script.wall_time)] unless script.wall_time.nil?
      # ignore nodes, don't know how to do this for slurm

      # Set dependencies
      depend = []
      depend << "after:#{after.join(":")}" unless after.empty?
      depend << "afterok:#{afterok.join(":")}" unless afterok.empty?
      depend << "afternotok:#{afternotok.join(":")}" unless afternotok.empty?
      depend << "afterany:#{afterany.join(":")}" unless afterany.empty?
      args += ["-d", depend.join(",")] unless depend.empty?

      # Set environment variables
      env = script.job_environment || {}
      args += ["--export", script.job_environment.keys.join(",")] unless script.job_environment.nil? || script.job_environment.empty?

      # Set native options
      args += script.native if script.native

      # Submit job
      @slurm.submit_string(script.content, args: args, env: env)
    rescue Batch::Error => e
      raise JobAdapterError, e.message
    end
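Two pieces of the argument building in #submit are easy to check in isolation: the dependency string passed to `-d` and the choice of `--mail-type`. The sketch below reproduces both as standalone functions. The names `dependency_arg` and `mail_type` are hypothetical and the job ids are invented; the logic follows the listing above rather than any additional library API.

```ruby
# Hypothetical helper: build the value for sbatch's -d option.
# Each non-empty list becomes "<type>:<id>:<id>...", and the groups are
# joined with commas. Returns nil when there are no dependencies at all.
def dependency_arg(after: [], afterok: [], afternotok: [], afterany: [])
  groups = {
    "after"      => after,
    "afterok"    => afterok,
    "afternotok" => afternotok,
    "afterany"   => afterany
  }
  depend = groups.map do |type, ids|
    ids = Array(ids).map(&:to_s)
    "#{type}:#{ids.join(':')}" unless ids.empty?
  end.compact
  depend.empty? ? nil : depend.join(",")
end

# Hypothetical helper: choose the --mail-type value from the two flags.
# Returns nil (no option emitted) unless at least one flag is truthy or
# both are explicitly false.
def mail_type(on_started, on_terminated)
  if on_started && on_terminated
    "ALL"
  elsif on_started
    "BEGIN"
  elsif on_terminated
    "END"
  elsif on_started == false && on_terminated == false
    "NONE"
  end
end

puts dependency_arg(afterok: [101, 102], afterany: "200")
# prints "afterok:101:102,afterany:200"
puts mail_type(true, nil)    # prints "BEGIN"
puts mail_type(false, false) # prints "NONE"
```

The distinction between `nil` and `false` matters here: leaving both email flags unset adds no `--mail-type` option, while setting both to `false` explicitly requests `NONE`.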