Module: Parallel
- Defined in: lib/parallel/forkmanager.rb
Overview
Parallel::ForkManager – A simple parallel processing fork manager.
Copyright © 2008 Nathan Patwardhan
Author: Nathan Patwardhan <[email protected]>
Documentation: Nathan Patwardhan <[email protected]>, based on Perl Parallel::ForkManager documentation by Noah Robin <[email protected]> and dlux <[email protected]>.
Credits (for original Perl implementation):
- Chuck Hirstius <[email protected]> (callback exit status, original Perl example)
- Grant Hopwood <[email protected]> (win32 port)
- Mark Southern <[email protected]> (bugfix)
Credits (Ruby port):
- Robert Klemme <[email protected]> (clarification on Ruby lambda)
- David A. Black <[email protected]> (clarification on Ruby lambda)
- Roger Pack <[email protected]> (bugfix)
Overview
Parallel::ForkManager is for operations that you would like to run in parallel (e.g. downloading a bunch of web content simultaneously) using fork() rather than threads. Instead of managing child processes yourself, you let Parallel::ForkManager handle the cleanup. It also provides callbacks that you can run when a child starts, when it finishes, or while you are waiting for child processes to complete.
Introduction
If you’ve used fork() before, you know that you are responsible for managing (i.e. cleaning up) the processes that were created as a result. Parallel::ForkManager handles this for you, so that you can simply start() and finish() a process without having to keep track of child processes along the way.
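For comparison, the bookkeeping that a fork manager takes over looks roughly like this when done by hand. This is a minimal sketch using only Ruby's built-in Kernel#fork and Process.wait2; it does not use Parallel::ForkManager, and the names jobs, pids and results are illustrative only.

```ruby
# Manual fork bookkeeping with Ruby built-ins only (no Parallel::ForkManager).
jobs = %w[alpha beta gamma]
pids = {}

jobs.each_with_index do |job, index|
  pid = fork do
    # Child process: the real work would go here. Exit with a status code.
    exit!(index)
  end
  # Parent process: remember which job each PID belongs to.
  pids[pid] = job
end

results = {}
pids.each do |pid, job|
  # Reap each child explicitly; an unreaped child lingers as a zombie.
  _, status = Process.wait2(pid)
  results[job] = status.exitstatus
end

results.each { |job, code| puts "#{job} exited with code #{code}" }
```

Every PID has to be recorded and waited on by the caller; this is the part that start(), finish() and wait_all_children() are meant to hide.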
For instance you can use the following code to grab a list of webpages in parallel using Net::HTTP – and store the output in files.
Example
#!/usr/bin/env ruby
require 'net/http'
# Path follows the "Defined in" entry (lib/parallel/forkmanager.rb).
require 'parallel/forkmanager'

save_dir = '/tmp'
my_urls = [
  'http://www.cnn.com/index.html',
  'http://www.oreilly.com/index.html',
  'http://www.cakewalk.com/index.html',
  'http://www.asdfsemicolonl.kj/index.htm'
]
max_proc = 20
pfm = Parallel::ForkManager.new(max_proc)

# Report each child's exit status once it has finished.
pfm.run_on_finish(
  lambda { |pid, exit_code, ident|
    print "** PID (#{pid}) for #{ident} exited with code #{exit_code}!\n"
  }
)

for my_url in my_urls
  # Parent: fork a child for this URL, then move on to the next one.
  pfm.start(my_url) and next

  # Child: fetch the page.
  url = URI.parse(my_url)
  begin
    req = Net::HTTP::Get.new(url.path)
    res = Net::HTTP.start(url.host, url.port) { |http|
      http.request(req)
    }
  rescue
    # Network or DNS failure: end this child with a non-zero exit code.
    pfm.finish(255)
  end

  status = res.code
  out_file = save_dir + '/' + url.host + '.txt'
  if status.to_i == 200
    f = File.open(out_file, 'w')
    f.print res.body
    f.close()
    pfm.finish(0)
  else
    pfm.finish(255)
  end
end

pfm.wait_all_children()
First you need to instantiate the ForkManager with the “new” constructor. You must specify the maximum number of processes to be created. If you specify 0, then NO fork will be done; this is good for debugging purposes.
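The effect of the process limit can be sketched with a small stand-in class. TinyForkLimiter and its methods (run, wait_all, peak) are hypothetical names invented for this illustration; this is not the Parallel::ForkManager source, only a sketch of the idea under the assumption that a limit of 0 means "run the work in the current process".

```ruby
# Hypothetical stand-in, NOT the Parallel::ForkManager implementation:
# caps the number of live children and runs inline when the limit is 0.
class TinyForkLimiter
  attr_reader :peak

  def initialize(max_proc)
    @max_proc = max_proc
    @live = []   # PIDs of children that have not been reaped yet
    @peak = 0    # highest number of simultaneously live children seen
  end

  def run
    # Limit 0: no fork at all, the block runs in the current process.
    return yield if @max_proc.zero?

    # At the limit: wait for a child to finish before forking another.
    reap_one while @live.size >= @max_proc
    @live << fork { yield; exit!(0) }
    @peak = [@peak, @live.size].max
  end

  def wait_all
    reap_one until @live.empty?
  end

  private

  # Wait for the oldest live child and forget its PID.
  def reap_one
    @live.delete(Process.wait(@live.first))
  end
end

limiter = TinyForkLimiter.new(2)
5.times { limiter.run { sleep 0.05 } }
limiter.wait_all
puts "peak concurrent children: #{limiter.peak}"

inline = TinyForkLimiter.new(0)
ran_in = nil
inline.run { ran_in = Process.pid }
puts "ran in the parent process: #{ran_in == Process.pid}"
```

With a limit of 0 the block can assign to ran_in directly, because no separate process is involved; that is what makes this mode convenient for debugging.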
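Next, use pfm.start() to do the fork. In the parent process it returns the child's PID; in the child process it returns a false value. (The Perl original returns 0 in the child, but 0 is truthy in Ruby, whose fork() returns nil there.) The “and next” makes the parent skip the rest of the loop body, so that only the child runs it.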
- pfm.start() dies if the fork fails.
- pfm.finish() terminates the child process (assuming a fork was done in the “start”).
- You cannot use pfm.start() if you are already in the child process.
If you want to manage another set of subprocesses in the child process, you must instantiate another Parallel::ForkManager object!
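The “and next” idiom used with start() depends on the fork call telling each process which side it is on. A minimal sketch with plain Kernel#fork (not Parallel::ForkManager) shows the same control flow; the pipe and the item names are illustrative assumptions.

```ruby
# Plain Kernel#fork, without Parallel::ForkManager, to show the control flow.
reader, writer = IO.pipe
child_pids = []

%w[one two].each do |item|
  pid = fork            # parent receives the child's PID, the child receives nil
  if pid
    child_pids << pid
    next                # parent: skip the work and go to the next item
  end

  # Only the child reaches this point.
  reader.close
  writer.puts "#{item} handled in child"
  writer.close
  exit!(0)              # the child must exit here, or it would keep looping
end

writer.close
child_pids.each { |pid| Process.wait(pid) }
lines = reader.read.lines.map(&:chomp).sort
puts lines
```

The explicit exit!(0) plays the role that finish() plays in the example above: without it, each child would continue the loop and fork children of its own.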
Bugs and Limitations
Parallel::ForkManager is a Ruby-centric rebase of Perl Parallel::ForkManager 0.7.5. While much of the original code was rewritten such that ForkManager worked in the “Ruby way”, you might find some “warts” due to inconsistencies between Ruby and the original Perl code.
Do not use Parallel::ForkManager in an environment where other child processes can affect the run of the main program. In particular, using this module is not recommended in a program that already calls fork() / wait() elsewhere.
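The kind of interference meant here can be sketched with Ruby built-ins. The sketch assumes that a manager reaps children with a wait-for-any call, which is an assumption about the implementation and not something this page states; foreign and managed are illustrative names.

```ruby
# Sketch: a wait-for-any call also reaps children it did not start.
foreign = fork { exit!(7) }              # started by unrelated code
managed = fork { sleep 0.3; exit!(0) }   # started by the "manager"

# Waiting for "any child" returns the foreign child here, because it
# exits immediately while the managed child is still sleeping.
reaped, status = Process.wait2
puts "reaped the foreign child: #{reaped == foreign} (exit #{status.exitstatus})"

Process.wait(managed)

# The code that owns the foreign child can no longer collect its status.
lost = false
begin
  Process.wait(foreign)
rescue Errno::ECHILD
  lost = true
end
puts "foreign status lost to its owner: #{lost}"
```

Each side ends up with the other's exit status or with none at all, which is why mixing independent fork() / wait() users in one process is discouraged.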
If you want to use more than one Parallel::ForkManager object in the main program, make sure that all child processes of the first object have terminated before you use the second.
You are free to use a new copy of Parallel::ForkManager in the child processes, although I don’t think it makes sense.
Defined Under Namespace
Classes: ForkManager