Class: Wayfarer::Job
- Inherits:
- ActiveJob::Base
  - Object
  - ActiveJob::Base
  - Wayfarer::Job
- Extended by:
- Forwardable
- Includes:
- Hooks, Locals
- Defined in:
- lib/wayfarer/job.rb
Overview
A Job is a class that has a Routing::Router with many Routing::Rules which are matched against a URI. Rules map URIs onto job instance methods. Under the hood, jobs are instantiated within separate threads by a Processor. Every instance gets its own thread. If a URI is matched, its Page is retrieved, and made available to instance methods via #page.
Jobs implement ActiveJob's Job API and are therefore compatible with a wide range of job queues. To run a job immediately, call ::perform_now. To enqueue a job, call ::perform_later.
Callbacks
- .config {|Configuration| ... } ⇒ Configuration
  A configuration based off the global config.
- .router(&proc) ⇒ Routing::Router (also: route, routes)
  A router.
- #adapter ⇒ Object
- #page ⇒ Object (protected)
  The Page representing the URI currently processed by an action.
- #params ⇒ Object
- #staged_uris ⇒ Array<String>, Array<URI> (readonly)
  URIs to stage for the next cycle.
Callbacks
- .after_crawl ⇒ Object
  Callback that fires once after all pages have been retrieved and processing is done.
- .before_crawl ⇒ Object
  Callback that fires once before any pages are retrieved.
- .prepare ⇒ Object
  Returns a class copy.
- .setup_adapter {|[HTTPAdapters::NetHTTPAdapter, HTTPAdapters::SeleniumAdapter], [Selenium::WebDriver::Driver, nil], [Capybara::Selenium::Driver, nil]| ... } ⇒ Object
  Callback that fires when HTTP adapters are instantiated.
- #browser ⇒ Object (protected)
  A Capybara driver that wraps the #driver.
- #doc ⇒ Object (protected)
  The parsed response body.
- #driver ⇒ Object (protected)
  The Selenium WebDriver.
- #halt ⇒ Object (protected)
  Sets a halting flag that signals the processor to stop its work.
- #halts? ⇒ Boolean
  Whether this job will stop processing.
- #initialize(*argv) ⇒ Job (constructor)
  A new instance of Job.
- #logger ⇒ Object (protected)
- #perform(*uris) ⇒ Object
  Performs this job.
- #stage(*uris) ⇒ Object (protected)
  Adds URIs to process in the next cycle.
Methods included from Locals
included, thread_safe_counterpart
Constructor Details
#initialize(*argv) ⇒ Job
Returns a new instance of Job.
# File 'lib/wayfarer/job.rb', line 119
def initialize(*argv)
  @halts = false
  @staged_uris = []
  super(*argv)
end
Class Attribute Details
.config {|Configuration| ... } ⇒ Configuration
A configuration based off the global Wayfarer.config.
# File 'lib/wayfarer/job.rb', line 83
def config
  @config ||= Wayfarer.config.clone
  yield(@config) if block_given?
  @config
end
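The clone-once, yield-to-block shape of .config can be tried in isolation. The following is a minimal sketch, not Wayfarer's code: `Settings`, `GLOBAL_SETTINGS`, `ConfigDemoJob`, and the `timeout` key are hypothetical stand-ins for the real Configuration object and its keys.

```ruby
# Hypothetical stand-in for the global configuration object.
Settings = Struct.new(:timeout)
GLOBAL_SETTINGS = Settings.new(5)

class ConfigDemoJob
  # Same shape as Job.config: clone the global object once, then
  # yield the per-class copy to an optional block.
  def self.config
    @config ||= GLOBAL_SETTINGS.clone
    yield(@config) if block_given?
    @config
  end
end

# Adjust the per-class copy through the block form.
ConfigDemoJob.config { |c| c.timeout = 30 }

puts ConfigDemoJob.config.timeout # the per-class copy carries the change
puts GLOBAL_SETTINGS.timeout      # the global object is left untouched
```

Because the clone is memoized, every later call to `config` returns the same per-class object, so changes made in one block remain visible to subsequent calls.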
.router(&proc) ⇒ Routing::Router Also known as: route, routes
A router. If a block is passed in, it is evaluated within the Router's instance.
# File 'lib/wayfarer/job.rb', line 92
def router(&proc)
  @router ||= Routing::Router.new
  @router.instance_eval(&proc) if block_given?
  @router
end
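To illustrate what evaluating the block within the router's instance means, here is a small self-contained sketch. `MiniRouter`, its `draw` method, and `RouterDemoJob` are hypothetical and only mimic the mechanism; the DSL of Routing::Router itself is not shown on this page.

```ruby
# Hypothetical minimal router; not Wayfarer's Routing::Router.
class MiniRouter
  attr_reader :rules

  def initialize
    @rules = []
  end

  # Hypothetical DSL method. Inside an instance_eval'd block it can be
  # called without an explicit receiver.
  def draw(action, host:)
    @rules << [action, host]
  end
end

class RouterDemoJob
  # Same shape as Job.router: memoize the router and evaluate an
  # optional block with the router as self.
  def self.router(&block)
    @router ||= MiniRouter.new
    @router.instance_eval(&block) if block_given?
    @router
  end
end

RouterDemoJob.router { draw :index, host: "example.com" }
RouterDemoJob.router { draw :show, host: "example.org" }

# Both blocks were applied to the same memoized router.
p RouterDemoJob.router.rules
```

Since the block runs with the router as `self`, instance variables and private helpers of the calling class are not reachable inside it; that is the usual trade-off of an `instance_eval` DSL.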
Instance Attribute Details
#adapter ⇒ Object
# File 'lib/wayfarer/job.rb', line 114
def adapter
  @adapter
end
#page ⇒ Object (protected)
# File 'lib/wayfarer/job.rb', line 111
attr_writer :page
#params ⇒ Object
# File 'lib/wayfarer/job.rb', line 117
def params
  @params
end
#staged_uris ⇒ Array<String>, Array<URI> (readonly)
Returns URIs to stage for the next cycle.
# File 'lib/wayfarer/job.rb', line 108
def staged_uris
  @staged_uris
end
Class Method Details
.after_crawl ⇒ Object
Callback that fires once after all pages have been retrieved and processing is done.
# File 'lib/wayfarer/job.rb', line 40
define_hook :after_crawl
.before_crawl ⇒ Object
Callback that fires once before any pages are retrieved.
# File 'lib/wayfarer/job.rb', line 34
define_hook :before_crawl
.prepare ⇒ Object
Returns a class copy.
# File 'lib/wayfarer/job.rb', line 60
def prepare
  duplicate = dup
  duplicate.router = router.dup
  duplicate.locals = locals.deep_dup
  duplicate.config = config.dup
  duplicate.locals.each do |(key, val)|
    duplicate.locals[key] = Locals.thread_safe_counterpart(val)
  end
  duplicate.locals.each do |(key, _)|
    duplicate.send(:define_method, key) do
      duplicate.locals[key]
    end
    duplicate.send(:define_singleton_method, key) do
      duplicate.locals[key]
    end
  end
  duplicate
end
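The idea of a class copy with its own state can be sketched with plain Ruby. This is a simplified illustration under stated assumptions: `CounterJob` and its `visited` local are hypothetical, and a shallow `Hash#dup` stands in for the `deep_dup` and thread-safe wrapping performed by the real method.

```ruby
class CounterJob
  class << self
    attr_accessor :locals

    # Simplified class copy: duplicate the class itself, then give the
    # duplicate its own hash so writes do not leak back to the original.
    def prepare
      duplicate = dup
      duplicate.locals = locals.dup # shallow copy; enough for a flat hash
      duplicate
    end
  end

  self.locals = { visited: 0 }
end

copy = CounterJob.prepare
copy.locals[:visited] += 1

p copy.locals        # state of the copy
p CounterJob.locals  # state of the original class
```

Working on a copy means one crawl can mutate its locals without affecting the class definition that later crawls start from.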
.setup_adapter {|[HTTPAdapters::NetHTTPAdapter, HTTPAdapters::SeleniumAdapter], [Selenium::WebDriver::Driver, nil], [Capybara::Selenium::Driver, nil]| ... } ⇒ Object
Callback that fires when HTTP adapters are instantiated.
# File 'lib/wayfarer/job.rb', line 46
define_hooks :setup_adapter
Instance Method Details
#browser ⇒ Object (protected)
A Capybara driver that wraps the #driver.
# File 'lib/wayfarer/job.rb', line 206
delegate browser: :adapter
#doc ⇒ Object (protected)
The parsed response body. When using the Selenium adapter, this parses the body again on every call. Otherwise, subsequent DOM updates (i.e. JavaScript-induced) would be invisible.
# File 'lib/wayfarer/job.rb', line 195
delegate doc: :page
#driver ⇒ Object (protected)
The Selenium WebDriver.
# File 'lib/wayfarer/job.rb', line 201
delegate driver: :adapter
#halt ⇒ Object (protected)
Sets a halting flag that signals the processor to stop its work.
# File 'lib/wayfarer/job.rb', line 142
def halt
  @halts = true
end
#halts? ⇒ Boolean
Whether this job will stop processing.
# File 'lib/wayfarer/job.rb', line 126
def halts?
  @halts
end
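The interplay of #halt and #halts? can be sketched as a flag that a driving loop polls between units of work. `HaltingDemoJob`, its `visit` method, and the loop below are hypothetical; how the actual Processor checks the flag is not documented on this page.

```ruby
class HaltingDemoJob
  attr_reader :seen

  def initialize
    @halts = false
    @seen = []
  end

  # Public query, as in Job#halts?.
  def halts?
    @halts
  end

  # Hypothetical action: records the URI and halts after two visits.
  def visit(uri)
    @seen << uri
    halt if @seen.size >= 2
  end

  protected

  # Sets the flag, as in Job#halt.
  def halt
    @halts = true
  end
end

job = HaltingDemoJob.new

# Stand-in for the processor: stop as soon as the job signals a halt.
%w[a b c].each do |uri|
  break if job.halts?
  job.visit(uri)
end

p job.seen # "c" is never visited
```

Note that halting is cooperative: setting the flag does not interrupt anything by itself, it only takes effect the next time the flag is checked.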
#logger ⇒ Object (protected)
# File 'lib/wayfarer/job.rb', line 209
delegate logger: :"self.class"
#perform(*uris) ⇒ Object
ActiveJob API
Performs this job.
# File 'lib/wayfarer/job.rb', line 133
def perform(*uris)
  Crawl.new(self.class, *uris).execute
end
#stage(*uris) ⇒ Object (protected)
Adds URIs to process in the next cycle. If a relative path is given, an absolute URI is constructed from the current #page's URI.
# File 'lib/wayfarer/job.rb', line 150
def stage(*uris)
  uris = uris.flatten.map do |u|
    if (uri = URI(u)).absolute?
      uri
    else
      # URI#join would discard the path of page.uri.path
      current = page.uri.dup
      current.path = File.join(page.uri.path, uri.path)
      current
    end
  end

  # This method has somewhat become the guard keeper for invalid URIs that
  # would lead to exceptions otherwise down the line
  supported = uris.select do |uri|
    HTTPAdapters::NetHTTPAdapter::RECOGNIZED_URI_TYPES.any? do |type|
      uri.is_a?(type)
    end
  end

  @staged_uris.push(*supported)
end
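The difference between the path handling in #stage and `URI.join` is easy to see with the standard library alone. The sketch below uses a hypothetical helper, `resolve`, that mirrors only the relative-path branch; the filtering against RECOGNIZED_URI_TYPES is left out because that constant's contents are not shown here.

```ruby
require "uri"

# Hypothetical helper mirroring the relative-path branch of #stage.
def resolve(base, path)
  uri = URI(path)
  return uri if uri.absolute?

  # Append the relative path to the full path of the base URI.
  current = base.dup
  current.path = File.join(base.path, uri.path)
  current
end

base = URI("http://example.com/foo/bar")

# Appending keeps the last path segment of the base URI.
puts resolve(base, "baz")

# URI.join follows RFC 3986 reference resolution and replaces it instead.
puts URI.join(base.to_s, "baz")

# Absolute URIs pass through unchanged.
puts resolve(base, "http://other.example/x")
```

In other words, a relative path staged from a page at `/foo/bar` is treated as a child of that page rather than as its sibling.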