Class: DaimonSkycrawlers::Crawler::Base
- Inherits:
- Object
- Includes:
- DaimonSkycrawlers::Callbacks, DaimonSkycrawlers::ConfigMixin, DaimonSkycrawlers::Configurable, LoggerMixin
- Defined in:
- lib/daimon_skycrawlers/crawler/base.rb
Overview
The base class for crawlers.
A crawler implementation inherits from this class and overrides
#fetch.
Direct Known Subclasses
Instance Attribute Summary
- #n_processed_urls ⇒ Object (readonly)
  Returns the value of attribute n_processed_urls.
- #storage ⇒ DaimonSkycrawlers::Storage::Base
  Retrieve storage instance.
Instance Method Summary
- #connection ⇒ Faraday
- #fetch(path, message = {}) ⇒ Faraday::Response
  Fetch URL.
- #get(path, params = {}) ⇒ Faraday::Response
  GET URL with params.
- #initialize(base_url = nil, faraday_options: {}, options: {}) ⇒ Base (constructor)
  A new instance of Base.
- #post(path, params = {}) ⇒ Faraday::Response
  POST URL with params.
- #prepare {|connection| ... } ⇒ Object
  Call this method before DaimonSkycrawlers.register_crawler. For example, you can log in before fetching a URL.
- #process(message, &block) ⇒ Object
  Process crawler sequence.
- #setup_connection(options = {}) {|faraday| ... } ⇒ Object
  Set up connection.
- #skipped? ⇒ true|false
Methods included from DaimonSkycrawlers::Configurable
Methods included from DaimonSkycrawlers::Callbacks
#after_process, #before_process, #clear_after_process_callbacks, #clear_before_process_callbacks, #run_after_process_callbacks, #run_before_process_callbacks
Constructor Details
#initialize(base_url = nil, faraday_options: {}, options: {}) ⇒ Base
Returns a new instance of Base.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 45
def initialize(base_url = nil, faraday_options: {}, options: {})
  super()
  @base_url = base_url
  @faraday_options = faraday_options
  @options = options
  @prepare = ->(connection) {}
  @skipped = false
  @n_processed_urls = 0

  setup_default_filters
  setup_default_post_processes
end
Instance Attribute Details
#n_processed_urls ⇒ Object (readonly)
Returns the value of attribute n_processed_urls.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 38
def n_processed_urls
  @n_processed_urls
end
#storage ⇒ DaimonSkycrawlers::Storage::Base
Retrieve storage instance.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 88
def storage
  @storage ||= Storage::RDB.new
end
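The `||=` in `#storage` means the default storage is built lazily on first access and reused afterwards. A minimal sketch of that behavior follows. `FakeStorage` and `SketchCrawler` are hypothetical stand-ins for `Storage::RDB` and the crawler class, so the example runs without the gem or a database.

```ruby
# Hypothetical stand-in for DaimonSkycrawlers::Storage::RDB.
# It only counts how many instances have been created.
class FakeStorage
  @created = 0

  class << self
    attr_accessor :created
  end

  def initialize
    self.class.created += 1
  end
end

# Hypothetical crawler-like class that models only the memoized reader.
class SketchCrawler
  # Builds the default storage on the first call, then returns the same object.
  def storage
    @storage ||= FakeStorage.new
  end
end

crawler = SketchCrawler.new
first = crawler.storage
second = crawler.storage

puts first.equal?(second)  # same object on both calls
puts FakeStorage.created   # the default was constructed once
```

Because the instance is cached per crawler, two crawler objects would each build their own default storage.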
Instance Method Details
#connection ⇒ Faraday
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 102
def connection
  @connection ||= Faraday.new(@base_url, @faraday_options)
end
#fetch(path, message = {}) ⇒ Faraday::Response
Fetch URL.
Override this method in a subclass.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 147
def fetch(path, message = {})
  raise NotImplementedError, "Must implement this method in subclass"
end
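The override contract for `#fetch` can be sketched without the gem. `SketchBase` and `EchoCrawler` below are hypothetical stand-ins: the first models only the `#fetch` stub of the base class, and the second returns a canned Hash where a real crawler would return a `Faraday::Response`.

```ruby
# Hypothetical stand-in for the base class; only #fetch is modeled.
class SketchBase
  def fetch(path, message = {})
    raise NotImplementedError, "Must implement this method in subclass"
  end
end

# Hypothetical subclass. A real crawler would issue an HTTP request here;
# this one returns a plain Hash so the sketch needs no network access.
class EchoCrawler < SketchBase
  def fetch(path, message = {})
    { path: path, depth: message.fetch(:depth, 0) }
  end
end

# Calling the base implementation raises.
base_error = nil
begin
  SketchBase.new.fetch("/index.html")
rescue NotImplementedError => e
  base_error = e.message
end

result = EchoCrawler.new.fetch("/index.html", { depth: 2 })

puts base_error
puts result.inspect
```

Note that `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it. It has to be named explicitly, as in the sketch.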
#get(path, params = {}) ⇒ Faraday::Response
GET URL with params.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 159
def get(path, params = {})
  @connection.get(path, params)
end
#post(path, params = {}) ⇒ Faraday::Response
POST URL with params.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 171
def post(path, params = {})
  @connection.post(path, params)
end
#prepare {|connection| ... } ⇒ Object
Call this method before DaimonSkycrawlers.register_crawler. For example, you can log in before fetching a URL.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 79
def prepare(&block)
  @prepare = block
end
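`#prepare` only stores the block. The block is called later with the connection, before the fetch. The sketch below models that with a hypothetical `SketchCrawler` and a plain Hash in place of a Faraday connection. The `run_prepare` helper is also an assumption, standing in for the point in `#process` where the stored block is called.

```ruby
# Hypothetical crawler-like class modeling only #prepare and its later use.
class SketchCrawler
  def initialize
    # Default is a no-op lambda, as in the constructor of the documented class.
    @prepare = ->(connection) {}
  end

  # Stores the block; nothing is executed yet.
  def prepare(&block)
    @prepare = block
  end

  # Hypothetical helper standing in for the call made inside #process.
  def run_prepare(connection)
    @prepare.call(connection)
  end
end

crawler = SketchCrawler.new
connection = { headers: {} } # hypothetical stand-in for a Faraday connection

crawler.prepare do |conn|
  # A real block might log in here; this one just sets a header.
  conn[:headers]["Cookie"] = "session=example"
end

before = connection[:headers].dup
crawler.run_prepare(connection)

puts before.inspect
puts connection[:headers].inspect
```

Since the stored block is called on every `#process` call, a login block that should run only once needs its own guard.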
#process(message, &block) ⇒ Object
Process crawler sequence:
- Run registered filters
- Prepare connection
- Download (fetch) data from given URL
- Run post processes (store downloaded data to storage)
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 116
def process(message, &block)
  @skipped = false
  @n_processed_urls += 1

  proceeding = run_before_process_callbacks(message)
  unless proceeding
    skip(message[:url])
    return
  end

  # url can be a path
  url = message.delete(:url)
  url = (URI(connection.url_prefix) + url).to_s

  @prepare.call(connection)
  response = fetch(url, message, &block)
  data = { url: url, message: message, response: response }
  run_after_process_callbacks(data)
  data
end
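The "url can be a path" step relies on `URI#+` from the Ruby standard library, which resolves a reference against a base URL. A short sketch of the three cases a message URL can fall into is below. The base URL is a made-up example standing in for `connection.url_prefix`.

```ruby
require "uri"

# Hypothetical base URL standing in for connection.url_prefix.
base = URI("http://example.com/blog/")

# A relative path is resolved under the base path.
relative = (base + "posts/1").to_s

# A path starting with "/" replaces the base path but keeps the host.
rooted = (base + "/about").to_s

# An absolute URL passes through unchanged.
absolute = (base + "http://other.example.org/feed").to_s

puts relative # http://example.com/blog/posts/1
puts rooted   # http://example.com/about
puts absolute # http://other.example.org/feed
```

This is why a message can carry either a full URL or a path: both end up as an absolute URL string before `#fetch` is called.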
#setup_connection(options = {}) {|faraday| ... } ⇒ Object
Set up connection.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 65
def setup_connection(options = {})
  merged_options = @faraday_options.merge(options)
  faraday_options = merged_options.empty? ? nil : merged_options
  @connection = Faraday.new(@base_url, faraday_options) do |faraday|
    yield faraday
  end
end
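The option handling in `#setup_connection` appears to merge the given options into a stored set and pass `nil` when the result is empty. The sketch below isolates that step with plain Hashes. `effective_options` is a hypothetical helper, not part of the library, and the option keys are illustrative only.

```ruby
# Hypothetical helper reproducing the merge-then-nil-if-empty step.
def effective_options(defaults, overrides = {})
  merged = defaults.merge(overrides)
  merged.empty? ? nil : merged
end

# Illustrative option keys; not taken from the documented class.
defaults = { request: { timeout: 10, open_timeout: 5 } }

no_options = effective_options({})
unchanged  = effective_options(defaults)
overridden = effective_options(defaults, { request: { timeout: 30 } })

puts no_options.inspect
puts unchanged.inspect
puts overridden.inspect
```

`Hash#merge` is shallow, so overriding `:request` replaces the whole nested Hash, and `open_timeout` is dropped in the last call. Callers who want to keep nested defaults need to pass the full nested Hash.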
#skipped? ⇒ true|false
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 95
def skipped?
  @skipped
end