Module: WebpageArchivist::Fetcher

Defined in:
lib/webpage-archivist/fetcher/fetcher.rb,
lib/webpage-archivist/fetcher/element_request.rb,
lib/webpage-archivist/fetcher/webpage_request.rb,
lib/webpage-archivist/fetcher/requests_plumber.rb,
lib/webpage-archivist/fetcher/stylesheet_request.rb

Overview

Module in charge of fetching pages

Defined Under Namespace

Classes: ElementRequest, FetcherWatcher, RequestsPlumber, StyleSheetRequest, WebpageRequest

Constant Summary

SEMAPHORE =
Mutex.new

Class Method Summary

Class Method Details

.fetch_webpages(webpages) ⇒ Object

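Fetches several webpages. Returns a hash keyed by webpage whose values are the corresponding Instances or, when no instance is available, the HTTP result codes. An Instance may be an existing one if the page hasn't changed. For an empty list, the method returns an empty array rather than a hash.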
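Because each value is either an Instance or a result code, callers usually need to tell the two apart. The sketch below shows one way to do that without loading the library: `Webpage`, `Instance`, and `partition_fetch_result` are hypothetical stand-ins, and it assumes result codes are plain Integers, which the source does not confirm.

```ruby
# Hypothetical stand-ins for the library's webpage and instance objects.
Webpage  = Struct.new(:uri)
Instance = Struct.new(:id)

# Split a fetch result into archived instances and HTTP result codes.
# Assumption: result codes are Integers; anything else is an instance.
# Also accepts the empty array returned for an empty input list.
def partition_fetch_result(result)
  instances, codes = result.partition { |_webpage, value| !value.is_a?(Integer) }
  [instances.to_h, codes.to_h]
end

home = Webpage.new('http://example.com/')
gone = Webpage.new('http://example.com/missing')

# Shape of the value fetch_webpages is documented to return.
result = { home => Instance.new(1), gone => 404 }

archived, failed = partition_fetch_result(result)
archived.each { |page, instance| puts "#{page.uri}: instance #{instance.id}" }
failed.each   { |page, code| puts "#{page.uri}: HTTP #{code}" }
```

With the real library, `result` would come from `WebpageArchivist::Fetcher.fetch_webpages(webpages)` instead of the hand-built hash.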



# File 'lib/webpage-archivist/fetcher/fetcher.rb', line 14

def self.fetch_webpages webpages
  if webpages.empty?
    # Nothing to fetch: note that this is an Array, not a Hash
    []
  else
    # Only one batch of fetches runs at a time
    SEMAPHORE.synchronize do
      @fetcher_watcher = FetcherWatcher.new
      EventMachine.run do
        # Queue one request per webpage inside the event loop
        webpages.each do |webpage|
          @fetcher_watcher.add_request WebpageRequest.new(webpage, @fetcher_watcher)
        end
        @fetcher_watcher.wait
      end

      # Map each webpage to its instance, or to the result code if there is none
      result = {}
      @fetcher_watcher.requests.each do |webpage_request|
        result[webpage_request.webpage] = webpage_request.instance ? webpage_request.instance : webpage_request.result_code
      end
      result
    end
  end
end
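
The listing wraps the whole fetch in `SEMAPHORE.synchronize`, so only one call runs at a time and concurrent callers cannot overwrite the module-level `@fetcher_watcher`. The following is a minimal sketch of that locking pattern in isolation. `SerializedRunner`, `LOCK`, and `run` are hypothetical names, and the `sleep` merely stands in for the event loop.

```ruby
# Hypothetical module illustrating a module-level Mutex; not the library's code.
module SerializedRunner
  LOCK = Mutex.new

  # Record when a job starts and finishes; the lock keeps jobs from interleaving.
  def self.run(log, id)
    LOCK.synchronize do
      log << [:start, id]
      sleep 0.01 # stand-in for the blocking work done inside the lock
      log << [:finish, id]
    end
  end
end

log = []
threads = 3.times.map { |i| Thread.new { SerializedRunner.run(log, i) } }
threads.each(&:join)

# Every :start is immediately followed by the :finish of the same job.
log.each_slice(2) { |start, finish| puts "#{start.inspect} -> #{finish.inspect}" }
```

The order in which the three jobs acquire the lock is not deterministic, but each job's start and finish entries always appear back to back.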