in-parallel
A lightweight Ruby library with a very simple syntax that uses Process.fork for parallelization.
Other popular Ruby libraries for parallel execution focus on one primary use case: crunching through a large queue of small tasks as quickly and efficiently as possible. This library instead targets executing a few larger tasks in parallel. It manages STDOUT and return values so that it is easy to see which process is logging what, and what the outcome of each execution was. It was created for Puppet's Beaker test framework, both to parallelize some of the framework's own tasks and to let people execute code in parallel within their tests when they want to.
This solution does not check how many processors you have; it simply forks as many processes as you ask for. It therefore handles a handful of parallel processes well, but it could overload your system with Ruby processes if you spin up a large number of them. If you are looking for something simple and lightweight on Linux or macOS (forking processes is not supported on Windows), this solution could be what you want.
If you need to execute a large number of tasks in parallel as efficiently as possible, take a look at the parallel gem instead.
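The core technique can be sketched with nothing but Ruby's standard library. The following is a minimal illustration, not in-parallel's actual implementation: the helper name `fork_each` is hypothetical, and it assumes a platform where `Process.fork` is available (Linux or macOS). Each callable runs in its own child process, and its return value travels back to the parent through a pipe as Marshal-encoded bytes, which is one way to let results such as hashes survive the process boundary. Like the library as described above, it forks one process per task without looking at the processor count.

```ruby
# Minimal sketch of fork-based parallelism (NOT in-parallel's implementation).
# Assumes Process.fork is available (Linux or macOS); `fork_each` is a
# hypothetical helper name.

# Runs each callable in its own child process and returns the results in
# the same order as the callables. One process is forked per callable,
# with no limit based on the number of CPUs.
def fork_each(callables)
  children = callables.map do |callable|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      # Send the return value to the parent as Marshal-encoded bytes.
      writer.write(Marshal.dump(callable.call))
      writer.close
      $stdout.flush
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  children.map do |pid, reader|
    data = reader.read # returns once the child closes its end of the pipe
    reader.close
    Process.wait(pid) # reap the child so it does not linger as a zombie
    Marshal.load(data)
  end
end

results = fork_each([-> { "hello world" }, -> { { foo: "bar" } }])
puts results.first      # prints: hello world
puts results.last[:foo] # prints: bar
```

Because a forked child starts as a copy of the parent, it sees the parent's state at fork time, but anything it changes stays in the child; only the marshalled return value comes back.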
Methods:
run_in_parallel(&block)
- You can put whatever methods you want to execute in parallel into a block, and each method will be executed in parallel (unless the method is defined in Kernel).
- Any methods further down the stack won't be affected, only the ones directly within the block.
- You can assign the results to instance variables and it just works; there is no array or map of results to deal with.
- Logs STDOUT and STDERR to the console in one chunk per process, so it is easy to see what happened in which process.
- Waits for each process in real time and logs its output immediately when that process completes.
- If a child process raises an exception, it is immediately re-raised in the primary process, and all other still-running child processes are killed.
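The exception behavior in the last bullet can be approximated as follows. This is a hypothetical sketch (`fork_all_or_raise` is not part of in-parallel) that assumes `Process.fork` is available and that the raised exception can be dumped with Marshal, which is not true of every exception object. Each child rescues `StandardError` and sends back either `[:ok, value]` or `[:error, exception]`; the parent uses `IO.select` to handle whichever child reports first, and on the first error it kills the remaining children with `Process.kill` before re-raising.

```ruby
# Hypothetical sketch (not in-parallel's API): run callables in forked
# children; if any child raises, kill the others and re-raise in the parent.
# Assumes Process.fork is available and that raised exceptions can be
# dumped with Marshal.
def fork_all_or_raise(callables)
  pending = {} # reader IO => [result index, child PID]
  callables.each_with_index do |callable, index|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      payload =
        begin
          [:ok, callable.call]
        rescue StandardError => e
          [:error, e]
        end
      writer.write(Marshal.dump(payload))
      writer.close
      $stdout.flush
      exit!(0)
    end
    writer.close
    pending[reader] = [index, pid]
  end

  results = Array.new(callables.size)
  until pending.empty?
    # Block until at least one child has written its payload.
    ready, = IO.select(pending.keys)
    ready.each do |reader|
      index, pid = pending.delete(reader)
      status, value = Marshal.load(reader.read)
      reader.close
      Process.wait(pid)
      if status == :ok
        results[index] = value
        next
      end

      # First failure: stop every child that is still running, then re-raise.
      pending.each do |other_reader, (_, other_pid)|
        Process.kill(:KILL, other_pid)
        Process.wait(other_pid)
        other_reader.close
      end
      raise value
    end
  end
  results
end

p fork_all_or_raise([-> { 1 + 1 }, -> { "two" }]) # prints [2, "two"]
```

Re-raising in the parent keeps the original exception class and message, so callers can rescue it exactly as if the code had run in-process.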
def method_with_param(name)
  ret_val = "hello #{name}"
  puts ret_val
  ret_val
end
def method_without_param
  # A result more complex than a string is marshalled and unmarshalled, and still works
  ret_val = {:foo => "bar"}
  puts ret_val
  return ret_val
end
# Example:
# will spawn 2 processes, (1 for each method) wait until they both complete, log chunked STDOUT/STDERR for
# each process and assign the method return values to instance variables:
InParallel.run_in_parallel {
  @result_1 = method_with_param('world')
  @result_2 = method_without_param
}
puts "#{@result_1}, #{@result_2[:foo]}"
STDOUT would be:
Forked process for 'method_with_param' - PID = '49398'
Forked process for 'method_without_param' - PID = '49399'
------ Begin output for method_with_param - 49398
hello world
------ Completed output for method_with_param - 49398
------ Begin output for method_without_param - 49399
{:foo=>"bar"}
------ Completed output for method_without_param - 49399
hello world, bar
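The chunked output above can be imitated by redirecting each child's STDOUT and STDERR into a pipe and printing everything the child wrote as a single block once it exits. The sketch below is an assumption about how this could be done, not how in-parallel does it; `run_chunked` is a hypothetical helper, it assumes `Process.fork` is available, and the banner text simply copies the format of the sample output.

```ruby
# Sketch (an assumption, not in-parallel's code): fork one child per task,
# capture each child's STDOUT and STDERR in a pipe, and print the captured
# text as a single framed chunk per process. `run_chunked` is hypothetical.
def run_chunked(tasks)
  children = tasks.map do |name, task|
    reader, writer = IO.pipe
    $stdout.flush # don't let the child inherit unflushed parent output
    pid = Process.fork do
      reader.close
      # Point this child's STDOUT and STDERR at the pipe.
      $stdout.reopen(writer)
      $stderr.reopen(writer)
      writer.close
      task.call
      $stdout.flush
      $stderr.flush
      exit!(0)
    end
    writer.close
    puts "Forked process for '#{name}' - PID = '#{pid}'"
    [name, pid, reader]
  end

  # Print each child's output as one uninterrupted chunk.
  children.map do |name, pid, reader|
    output = reader.read
    reader.close
    Process.wait(pid)
    chunk = "------ Begin output for #{name} - #{pid}\n" \
            "#{output}" \
            "------ Completed output for #{name} - #{pid}\n"
    print chunk
    chunk
  end
end

run_chunked({
  "greet" => -> { puts "hello world" },
  "complain" => -> { warn "something went wrong" }
})
```

Buffering the whole output per child is what keeps the chunks from interleaving, at the cost of seeing nothing from a process until it finishes.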
run_in_background(ignore_results = true, &block)
- This does essentially the same thing as run_in_parallel, except that it does not wait for the processes to complete; it returns immediately.
- You can either ignore the results completely (the default) or defer evaluating them until later.
- You can run multiple blocks in the background and evaluate all of their results at some later point.
TMP_FILE = '/tmp/test_file.txt'
def create_file_with_delay(file_path)
  sleep 2
  File.open(file_path, 'w') { |f| f.write('contents') }
  return true
end
# Example 1 - ignore results
run_in_background {
  create_file_with_delay(TMP_FILE)
}
# Should not exist immediately upon block completion
puts(File.exist?(TMP_FILE)) # false
sleep(3)
# Should exist once the delay from create_file_with_delay is done
puts(File.exist?(TMP_FILE)) # true
# Example 2 - delay results
run_in_background(false) {
  @result = create_file_with_delay(TMP_FILE)
}
# Do something else
run_in_background(false) {
  @result2 = create_file_with_delay('/tmp/someotherfile.txt')
}
# @result has not been assigned yet
puts @result # unresolved_parallel_result_0
# This assigns all instance variables within the blocks and writes each process's STDOUT and STDERR to the console.
wait_for_processes
puts @result # true
puts @result2 # true
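A minimal sketch of the two background modes, assuming the same fork-and-pipe approach and a platform with `Process.fork`. `BackgroundRunner`, `start`, `wait_all`, and `fire_and_forget` are hypothetical names, not in-parallel's API. `start` forks and returns the PID immediately, and `wait_all` later collects the marshalled results keyed by name, roughly the role `wait_for_processes` plays above. `fire_and_forget` discards the result and uses `Process.detach` so the finished child is reaped without an explicit `Process.wait`.

```ruby
# Hypothetical sketch (names are not in-parallel's API) of the two
# background modes: collect results later, or ignore them entirely.
# Assumes Process.fork is available.
class BackgroundRunner
  def initialize
    @pending = [] # [name, child PID, reader IO] for each unfinished task
  end

  # Forks the block and returns the child's PID without waiting for it.
  def start(name, &block)
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      writer.write(Marshal.dump(block.call))
      writer.close
      $stdout.flush
      exit!(0)
    end
    writer.close
    @pending << [name, pid, reader]
    pid
  end

  # Blocks until every started child is done; returns { name => result }.
  def wait_all
    results = {}
    @pending.each do |name, pid, reader|
      results[name] = Marshal.load(reader.read)
      reader.close
      Process.wait(pid)
    end
    @pending.clear
    results
  end

  # Forks the block and discards its result. Process.detach reaps the
  # child in a background thread; the returned thread's value is the
  # child's Process::Status.
  def fire_and_forget(&block)
    pid = Process.fork do
      block.call
      $stdout.flush
      exit!(0)
    end
    Process.detach(pid)
  end
end

runner = BackgroundRunner.new
runner.start(:first) { sleep 1; true }
runner.start(:second) { { foo: "bar" } }
# ... do other work here while the children run ...
results = runner.wait_all
puts results[:first] # prints: true
```

Keeping results in a hash keyed by name avoids placeholder values entirely; the trade-off is that callers must look results up explicitly instead of reading instance variables.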
Enumerable#each_in_parallel(&block)
- This is very similar to other solutions, except that it extends the Enumerable module directly with an each_in_parallel method, so you can easily spawn a process for each item in an array or hash.
- Identifies the block's location (or the caller's location if the block has no source_location) in the console log, making it clear which block is being executed.
["foo", "bar", "baz"].each_in_parallel { |item|
  puts item
}
STDOUT:
'each_in_parallel' spawned process for '/Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>'' - PID = '51600'
'each_in_parallel' spawned process for '/Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>'' - PID = '51601'
'each_in_parallel' spawned process for '/Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>'' - PID = '51602'
------ Begin output for /Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>' - 51600
foo
------ Completed output for /Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>' - 51600
------ Begin output for /Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>' - 51601
bar
------ Completed output for /Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>' - 51601
------ Begin output for /Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>' - 51602
baz
------ Completed output for /Users/samwoods/parallel_test/test.rb:77:in `block (2 levels) in <top (required)>' - 51602
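Extending Enumerable this way needs only a few lines. The sketch below adds a method with the hypothetical name `each_forked`, chosen so it does not clash with the real `each_in_parallel`, and assumes `Process.fork` is available. It forks one child per element, labels its log line with the block's `source_location` (falling back to the caller's location, as described above), and returns the block's results in element order. Unlike the library as described, it does not chunk the children's output.

```ruby
# Sketch with a hypothetical method name (each_forked), so it cannot clash
# with in-parallel's real each_in_parallel. Assumes Process.fork is available.
module Enumerable
  # Forks one child per element, runs the block on that element in the
  # child, and returns the block's results in element order.
  def each_forked(&block)
    location = block.source_location
    # Fall back to the caller's location when the block has no source_location.
    label = location ? location.first(2).join(":") : caller(1, 1).first

    children = map do |item|
      reader, writer = IO.pipe
      pid = Process.fork do
        reader.close
        writer.write(Marshal.dump(block.call(item)))
        writer.close
        $stdout.flush
        exit!(0)
      end
      writer.close
      puts "'each_forked' spawned process for '#{label}' - PID = '#{pid}'"
      [pid, reader]
    end

    children.map do |pid, reader|
      result = Marshal.load(reader.read)
      reader.close
      Process.wait(pid)
      result
    end
  end
end

# Prints three spawn lines, then ["FOO", "BAR", "BAZ"]
p(%w[foo bar baz].each_forked { |item| item.upcase })
```

Because the method is defined on Enumerable, it is available on arrays, hashes (each element arrives as a `[key, value]` pair), and ranges alike.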