Class: DaimonSkycrawlers::Filter::UpdateChecker
Defined in: lib/daimon_skycrawlers/filter/update_checker.rb
Overview
This filter checks whether a given URL has been updated.
It skips processing of URLs whose content is still current (not updated since the previous access).
Instance Method Summary
- #call(message, connection: nil) ⇒ true|false (also: #updated?)
  Returns true when the URL needs to be updated, otherwise false.
- #initialize(storage: nil, base_url: nil) ⇒ UpdateChecker (constructor)
  A new instance of UpdateChecker.
Methods inherited from Base
Methods included from LoggerMixin
Constructor Details
#initialize(storage: nil, base_url: nil) ⇒ UpdateChecker
Returns a new instance of UpdateChecker.
# File 'lib/daimon_skycrawlers/filter/update_checker.rb', line 13

    def initialize(storage: nil, base_url: nil)
      super(storage: storage)
      @base_url = nil
      @base_url = URI(base_url) if base_url
    end
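The constructor only stores base_url as a URI object; how it is used later is not shown on this page. A minimal sketch of one plausible use follows, assuming the base URL serves to resolve relative URLs before lookup. The helper name resolve_url is hypothetical and is not part of DaimonSkycrawlers.

```ruby
require "uri"

# Hypothetical helper (not part of DaimonSkycrawlers): resolve a possibly
# relative URL against an optional base URL.
def resolve_url(url, base_url: nil)
  base = base_url ? URI(base_url) : nil
  uri = URI(url)
  # Absolute URLs, or calls without a base, are returned unchanged.
  return uri.to_s if uri.absolute? || base.nil?
  # URI#+ merges the relative reference into the base URL.
  (base + uri).to_s
end

puts resolve_url("/a/b.html", base_url: "http://example.com")
puts resolve_url("http://other.test/x", base_url: "http://example.com")
```

With no base URL, a relative URL passes through unchanged, which mirrors the constructor leaving @base_url as nil.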
Instance Method Details
#call(message, connection: nil) ⇒ true|false Also known as: updated?
Returns true when the URL needs to be updated, otherwise false.
# File 'lib/daimon_skycrawlers/filter/update_checker.rb', line 24

    def call(message, connection: nil)
      url = normalize_url(message[:url])
      page = storage.find(url)
      # No stored page means the URL has never been fetched.
      return true unless page
      # Issue a HEAD request, reusing the given connection if there is one.
      if connection
        response = connection.head(url)
      else
        response = Faraday.head(url)
      end
      headers = response.headers
      # Prefer ETag; fall back to Last-Modified; otherwise assume an update.
      case
      when headers.key?("etag") && page.etag
        headers["etag"] != page.etag
      when headers.key?("last-modified") && page.last_modified_at
        if headers["last-modified"] < page.last_modified_at
          log.warn("#{url} returns old contents. #{headers["last-modified"]} < #{page.last_modified_at}")
        end
        headers["last-modified"] > page.last_modified_at
      else
        true
      end
    end
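The decision logic in #call can be tried without a network or storage backend. The following is a standalone sketch, not the library's implementation: StoredPage and needs_update? are hypothetical names, headers are a plain Hash with lowercase keys, and the Last-Modified header is parsed with Time.httpdate on the assumption that the stored value is a Time.

```ruby
require "time"

# Hypothetical stand-in for a stored page record; the real storage class
# is not shown in this document.
StoredPage = Struct.new(:etag, :last_modified_at, keyword_init: true)

# Returns true when the resource should be fetched again.
def needs_update?(headers, page)
  # A URL that was never stored always needs fetching.
  return true unless page

  if headers.key?("etag") && page.etag
    # A differing ETag means the content changed.
    headers["etag"] != page.etag
  elsif headers.key?("last-modified") && page.last_modified_at
    # Assumption: parse the HTTP date so both sides are Time objects.
    Time.httpdate(headers["last-modified"]) > page.last_modified_at
  else
    # No validator is available, so assume the content was updated.
    true
  end
end

page = StoredPage.new(etag: '"abc"', last_modified_at: Time.utc(2016, 1, 1))
puts needs_update?({ "etag" => '"abc"' }, page)
puts needs_update?({ "etag" => '"def"' }, page)
puts needs_update?({}, page)
```

As in #call, the ETag comparison takes precedence over Last-Modified, and a response carrying neither header is treated as updated.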