Class: DaimonSkycrawlers::Crawler::Base
- Inherits:
- Object
- Includes:
- DaimonSkycrawlers::Callbacks, DaimonSkycrawlers::ConfigMixin, DaimonSkycrawlers::Configurable, LoggerMixin
- Defined in:
- lib/daimon_skycrawlers/crawler/base.rb
Overview
The base class for crawlers.
Direct Known Subclasses
Instance Attribute Summary
- #n_processed_urls ⇒ Object (readonly)
  Returns the value of attribute n_processed_urls.
- #storage ⇒ Object
  Retrieve storage instance.
Instance Method Summary
- #connection ⇒ Object
- #fetch(path, message = {}) ⇒ Object
- #get(path, params = {}) ⇒ Object
- #initialize(base_url = nil, faraday_options: {}, options: {}) ⇒ Base (constructor)
  A new instance of Base.
- #post(path, params = {}) ⇒ Object
- #prepare(&block) ⇒ Object
  Call this method before DaimonSkycrawlers.register_crawler. For example, you can log in before fetching a URL.
- #process(message, &block) ⇒ Object
- #setup_connection(options = {}) {|faraday| ... } ⇒ Object
  Set up connection.
- #skipped? ⇒ Boolean
Methods included from DaimonSkycrawlers::Configurable
Methods included from DaimonSkycrawlers::Callbacks
#before_process, #clear_before_process_callbacks, #run_before_callbacks
Methods included from LoggerMixin
Constructor Details
#initialize(base_url = nil, faraday_options: {}, options: {}) ⇒ Base
Returns a new instance of Base.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 38

def initialize(base_url = nil, faraday_options: {}, options: {})
  super()
  @base_url = base_url
  @faraday_options = faraday_options
  @options = options
  @prepare = ->(connection) {}
  @skipped = false
  @n_processed_urls = 0
end
Instance Attribute Details
#n_processed_urls ⇒ Object (readonly)
Returns the value of attribute n_processed_urls.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 32

def n_processed_urls
  @n_processed_urls
end
#storage ⇒ Object
Retrieve storage instance.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 74

def storage
  @storage ||= Storage::RDB.new
end
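The reader above builds its default lazily with `||=`, and the attribute is not marked readonly, which suggests a storage object can be assigned before first use. The following is a minimal sketch of that pattern, not the library's code: `StorageSketch`, `DefaultStore` (standing in for Storage::RDB), and `MemoryStore` are hypothetical names, and the presence of a writer is an assumption.

```ruby
# Hypothetical stand-ins: DefaultStore plays the role of Storage::RDB,
# MemoryStore is an alternative assigned through the (assumed) writer.
class DefaultStore; end
class MemoryStore; end

class StorageSketch
  attr_writer :storage

  # Build the default lazily, and only if nothing was assigned beforehand.
  def storage
    @storage ||= DefaultStore.new
  end
end

lazy = StorageSketch.new
same_instance = lazy.storage.equal?(lazy.storage) # memoized, so one object

injected = StorageSketch.new
injected.storage = MemoryStore.new # assigned before the first read
```

With this shape, the default storage is never constructed when a replacement has been assigned first, which is convenient in tests.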
Instance Method Details
#connection ⇒ Object
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 82

def connection
  @connection ||= Faraday.new(@base_url, @faraday_options)
end
#fetch(path, message = {}) ⇒ Object
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 107

def fetch(path, message = {})
  raise NotImplementedError, "Must implement this method in subclass"
end
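Because #fetch raises NotImplementedError in the base class, a concrete crawler has to override it. The sketch below illustrates that contract under stated assumptions: `SketchBase` is a hypothetical stand-in for the real base class so the example runs without the gem, and `EchoCrawler` with its return value is purely illustrative (a real subclass would issue an HTTP request instead).

```ruby
# Hypothetical stand-in for the base class; it reproduces only the #fetch contract.
class SketchBase
  def fetch(path, message = {})
    raise NotImplementedError, "Must implement this method in subclass"
  end
end

# Illustrative subclass: instead of performing HTTP, it returns what it was given.
class EchoCrawler < SketchBase
  def fetch(path, message = {})
    { path: path, message: message }
  end
end

# Returns the error message raised by the un-overridden #fetch, or nil.
# NotImplementedError is not a StandardError, so it is rescued by name.
def base_fetch_error
  SketchBase.new.fetch("/index.html")
  nil
rescue NotImplementedError => e
  e.message
end

result = EchoCrawler.new.fetch("/index.html", { depth: 1 })
```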
#get(path, params = {}) ⇒ Object
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 111

def get(path, params = {})
  @connection.get(path, params)
end
#post(path, params = {}) ⇒ Object
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 115

def post(path, params = {})
  @connection.post(path, params)
end
#prepare(&block) ⇒ Object
Call this method before DaimonSkycrawlers.register_crawler. For example, you can log in before fetching a URL.
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 67

def prepare(&block)
  @prepare = block
end
#process(message, &block) ⇒ Object
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 86

def process(message, &block)
  @skipped = false
  @n_processed_urls += 1

  setup_default_filters

  proceeding = run_before_callbacks(message)
  unless proceeding
    @skipped = true
    skip(message[:url])
    return
  end

  # url can be a path
  url = message.delete(:url)
  url = (URI(connection.url_prefix) + url).to_s

  @prepare.call(connection)
  fetch(url, message, &block)
end
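The control flow of #process (count the message, run the before-callbacks, skip or resolve the URL, call the prepare block, then fetch) can be sketched without the gem. Everything below is a hypothetical stand-in: `FlowSketch` is not the library class, it omits setup_default_filters and the HTTP connection, and treating "all callbacks return a truthy value" as the proceed condition is an assumption about how run_before_callbacks behaves.

```ruby
require "uri"

# Hypothetical stand-in that mirrors the control flow of #process.
# It does no HTTP; #fetch just records the resolved URL.
class FlowSketch
  attr_reader :n_processed_urls, :fetched

  def initialize(base_url)
    @base_url = base_url
    @before_callbacks = []
    @prepare = ->(base_url) {}
    @skipped = false
    @n_processed_urls = 0
    @fetched = []
  end

  # Register a callback; a falsy return value means "skip this message".
  def before_process(&block)
    @before_callbacks << block
  end

  def prepare(&block)
    @prepare = block
  end

  def skipped?
    @skipped
  end

  def process(message)
    @skipped = false
    @n_processed_urls += 1
    unless @before_callbacks.all? { |callback| callback.call(message) }
      @skipped = true
      return
    end
    # The URL may be a path; resolve it against the base URL.
    url = (URI(@base_url) + message.delete(:url)).to_s
    @prepare.call(@base_url)
    fetch(url, message)
  end

  def fetch(url, message = {})
    @fetched << url
  end
end

crawler = FlowSketch.new("http://example.com/")
crawler.before_process { |message| !message[:url].end_with?(".pdf") }

crawler.process({ url: "/blog/1" })
skipped_after_html = crawler.skipped?

crawler.process({ url: "/files/report.pdf" })
skipped_after_pdf = crawler.skipped?
```

Note that a skipped message still increments the counter, because the increment happens before the callbacks run, as in the listing above.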
#setup_connection(options = {}) {|faraday| ... } ⇒ Object
Set up connection
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 55

def setup_connection(options = {})
  merged_options = @faraday_options.merge(options)
  faraday_options = merged_options.empty? ? nil : merged_options
  @connection = Faraday.new(@base_url, faraday_options) do |faraday|
    yield faraday
  end
end
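The listing merges the options given here with those from the constructor and passes nil when the result is empty. A minimal sketch of just that merge step, using plain hashes and no Faraday: the helper name `merge_connection_options` and the option keys shown are illustrative assumptions, not part of the library.

```ruby
# Merge constructor-level options with per-call options; on a key clash
# the per-call value wins. An empty result becomes nil so that no options
# hash is passed along at all.
def merge_connection_options(defaults, overrides = {})
  merged = defaults.merge(overrides)
  merged.empty? ? nil : merged
end

nothing  = merge_connection_options({}, {})
combined = merge_connection_options({ request: { timeout: 5 } }, { ssl: { verify: true } })
replaced = merge_connection_options({ request: { timeout: 5 } }, { request: { timeout: 30 } })
```

Hash#merge is shallow, so a per-call value for a nested key such as `:request` replaces the whole nested hash rather than merging into it.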
#skipped? ⇒ Boolean
# File 'lib/daimon_skycrawlers/crawler/base.rb', line 78

def skipped?
  @skipped
end