Class: Grell::CrawlerManager
- Inherits: Object
  - Object
  - Grell::CrawlerManager
- Defined in: lib/grell/crawler_manager.rb
Overview
Manages the state of the crawling process. It does not deal with individual pages; it is responsible for logging, restarting, and quitting the crawler correctly.
Defined Under Namespace
Classes: PhantomJSManager
Class Method Summary
- .cleanup_all_processes ⇒ Object
Instance Method Summary
- #check_periodic_restart(collection) ⇒ Object
  PhantomJS appears to consume more memory the longer it crawls; a periodic restart restarts the driver and then calls the configured block.
- #initialize(logger: nil, on_periodic_restart: {}, driver: nil) ⇒ CrawlerManager
  constructor
  logger: the logger for Grell’s messages. on_periodic_restart: if set, the driver restarts every :each visits (100 by default) and then executes the :do block. driver: the driver to use (a new CapybaraDriver if omitted).
- #quit ⇒ Object
  Quits the Poltergeist driver.
- #restart ⇒ Object
  Restarts the PhantomJS process without modifying the state of visited and discovered pages.
Constructor Details
#initialize(logger: nil, on_periodic_restart: {}, driver: nil) ⇒ CrawlerManager
- logger: the logger to use for Grell’s messages; if omitted, a Logger writing to STDOUT is used.
- on_periodic_restart: if set, the driver restarts every :each visits (100 by default) and then executes the :do block.
- driver: the driver to use; if omitted, a new CapybaraDriver is created.
```ruby
# File 'lib/grell/crawler_manager.rb', line 8
def initialize(logger: nil, on_periodic_restart: {}, driver: nil)
  # Fall back to a STDOUT logger when none is given
  Grell.logger = logger ? logger : Logger.new(STDOUT)
  @periodic_restart_block = on_periodic_restart[:do]
  # The period defaults to PAGES_TO_RESTART when :each is omitted
  @periodic_restart_period = on_periodic_restart[:each] || PAGES_TO_RESTART
  # Use the provided driver or build a default CapybaraDriver
  @driver = driver || CapybaraDriver.new
  # A period <= 0 disables periodic restarts (see #check_periodic_restart)
  if @periodic_restart_period <= 0
    Grell.logger.warn "GRELL. Restart option misconfigured with a negative period. Ignoring option."
  end
end
```
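The option handling in #initialize can be exercised in isolation. The sketch below is a hypothetical, self-contained re-creation rather than Grell's own code: `parse_restart_options` is an illustrative helper that does not exist in Grell, and `PAGES_TO_RESTART` is assumed to be 100 to match the documented default.

```ruby
require 'logger'

# Assumption: Grell's PAGES_TO_RESTART is 100, as the documented default suggests.
PAGES_TO_RESTART = 100

# Hypothetical helper mirroring how #initialize reads on_periodic_restart.
def parse_restart_options(on_periodic_restart, logger)
  block  = on_periodic_restart[:do]                        # optional callback
  period = on_periodic_restart[:each] || PAGES_TO_RESTART  # default period
  if period <= 0
    # Same warning as the original; the bad period is kept, not replaced.
    logger.warn "GRELL. Restart option misconfigured with a negative period. Ignoring option."
  end
  { block: block, period: period }
end

logger   = Logger.new($stderr)
defaults = parse_restart_options({}, logger)
custom   = parse_restart_options({ each: 50, do: -> { logger.info("driver restarted") } }, logger)
```

Passing `each: 0` or a negative value only logs the warning; the period itself is kept, and the restart is skipped later because #check_periodic_restart requires a positive period.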
Class Method Details
.cleanup_all_processes ⇒ Object
```ruby
# File 'lib/grell/crawler_manager.rb', line 41
def self.cleanup_all_processes
  PhantomJSManager.new.cleanup_all_processes
end
```
Instance Method Details
#check_periodic_restart(collection) ⇒ Object
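PhantomJS appears to consume more and more memory as it crawls. A periodic restart restarts the driver every :each visited pages and then calls the :do block. If no :do block was configured, or the period is not positive, no restart takes place.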
```ruby
# File 'lib/grell/crawler_manager.rb', line 33
def check_periodic_restart(collection)
  # Periodic restarts require a :do block and a positive period
  return unless @periodic_restart_block
  return unless @periodic_restart_period > 0
  # Restart only when the visited count is a multiple of the period
  return unless (collection.visited_pages.size % @periodic_restart_period).zero?
  restart
  @periodic_restart_block.call
end
```
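To illustrate the guard clauses in #check_periodic_restart, here is a minimal sketch that reproduces them outside Grell. `FakeCollection`, `FakeDriver`, and `RestartPolicy` are hypothetical stand-ins for the page collection, the driver, and the manager; only the three checks and the restart-then-call order are taken from the method above.

```ruby
# Hypothetical stand-in for the page collection: only visited_pages is needed.
FakeCollection = Struct.new(:visited_pages)

# Hypothetical driver that just counts restarts.
class FakeDriver
  attr_reader :restarts

  def initialize
    @restarts = 0
  end

  def restart
    @restarts += 1
  end
end

# Sketch of the same guard clauses as #check_periodic_restart.
class RestartPolicy
  def initialize(driver, period:, &block)
    @driver = driver
    @period = period
    @block  = block
  end

  def check(collection)
    return unless @block                                   # no :do block, no restart
    return unless @period > 0                              # misconfigured period is ignored
    return unless (collection.visited_pages.size % @period).zero?
    @driver.restart                                        # restart first...
    @block.call                                            # ...then run the callback
  end
end

driver = FakeDriver.new
calls  = 0
policy = RestartPolicy.new(driver, period: 3) { calls += 1 }
pages  = []
7.times do |i|
  pages << "page#{i}"
  policy.check(FakeCollection.new(pages))
end
```

With a period of 3 and seven visited pages, the driver restarts twice (after pages 3 and 6). One consequence of the modulo check is that it would also fire if called before any page is visited, since 0 modulo any positive period is zero.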