Class: NDD::UrlChecker::ForkedUrlChecker

Inherits:
AbstractUrlChecker show all
Defined in:
lib/ndd/url_checker/forked_url_checker.rb

Overview

An URL checker using forks to parallelize processing. To be used with MRI.

Author:

  • David DIDIER

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(delegate_checker: nil, parallelism: 10) ⇒ ForkedUrlChecker

Create a new instance.

Parameters:



22
23
24
25
26
# File 'lib/ndd/url_checker/forked_url_checker.rb', line 22

def initialize(delegate_checker: nil, parallelism: 10)
  @logger = Logging.logger[self]
  @delegate = delegate_checker || BlockingUrlChecker.new
  @parallelism = parallelism
end

Instance Attribute Details

#delegate#check (readonly)

the delegate URL checker.

Returns:

  • (#check)

    the current value of delegate



14
15
16
# File 'lib/ndd/url_checker/forked_url_checker.rb', line 14

def delegate
  @delegate
end

#parallelismFixnum (readonly)

the number of processes.

Returns:

  • (Fixnum)

    the current value of parallelism



14
15
16
# File 'lib/ndd/url_checker/forked_url_checker.rb', line 14

def parallelism
  @parallelism
end

Instance Method Details

#check(*urls) ⇒ NDD::UrlChecker::Status+

Checks that the given URLs are valid.

Parameters:

  • urls (String, Array<String>)

    the URLs to check

Returns:



32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# File 'lib/ndd/url_checker/forked_url_checker.rb', line 32

def check(*urls)
  return delegate.check(*urls) if urls.size < 2

  # for receiving results
  result_pipe = Cod.pipe
  # partition the URLs, but not too much :)
  url_slices = partition(urls, [parallelism, urls.size].min)
  # and distribute them among the workers
  pids = url_slices.each_with_index.map do |url_slice, index|
    fork { Worker.new(index, result_pipe, delegate).check(url_slice) }
  end

  # read back the results
  results = urls.each_with_index.map do |_, index|
    result = result_pipe.get
    processed_number = index + 1
    processed_percentage = processed_number.to_f / urls.size.to_f * 100.0
    @logger.debug("Processed URLs #{processed_number}/#{urls.size} (#{processed_percentage.round(2)} %)")
    result
  end

  # kill all the workers
  pids.each { |pid| Process.kill(:TERM, pid) }

  results
end