Class: DaimonSkycrawlers::Filter::UpdateChecker
- Defined in:
- lib/daimon_skycrawlers/filter/update_checker.rb
Overview
This filter checks whether a given URL has been updated.
It skips processing for URLs whose content is already current (not updated since the previous access).
Instance Method Summary
- #call(message, connection: nil) ⇒ true|false (also: #updated?)
  Returns true when the URL needs an update, otherwise returns false.
- #initialize(storage: nil, base_url: nil) ⇒ UpdateChecker (constructor)
  A new instance of UpdateChecker.
Methods inherited from Base
Constructor Details
#initialize(storage: nil, base_url: nil) ⇒ UpdateChecker
Returns a new instance of UpdateChecker.
# File 'lib/daimon_skycrawlers/filter/update_checker.rb', line 13
def initialize(storage: nil, base_url: nil)
  super(storage: storage)
  @base_url = nil
  @base_url = URI(base_url) if base_url
end
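The constructor stores `base_url` as a `URI` object only when one is given. A minimal sketch of that pattern follows. The `BaseUrlHolder` class and its `resolve` method are hypothetical names used for illustration, and resolving relative URLs against the base is an assumption about how the stored value might be used, not something this page confirms.

```ruby
require "uri"

# Hypothetical stand-in for the constructor pattern above;
# not part of DaimonSkycrawlers.
class BaseUrlHolder
  attr_reader :base_url

  def initialize(base_url: nil)
    # Stays nil unless a base URL is supplied, mirroring the listing.
    @base_url = base_url ? URI(base_url) : nil
  end

  # Assumed usage: resolve a possibly relative URL against the base.
  # Without a base URL, the input is returned unchanged.
  def resolve(url)
    return url unless @base_url
    (@base_url + url).to_s
  end
end

holder = BaseUrlHolder.new(base_url: "http://example.com/")
puts holder.resolve("/blog/1")
```

Keeping the base URL optional lets the same object handle both absolute URLs and site-relative paths.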
Instance Method Details
#call(message, connection: nil) ⇒ true|false Also known as: updated?
Returns true when the URL needs an update, otherwise returns false.
# File 'lib/daimon_skycrawlers/filter/update_checker.rb', line 24
def call(message, connection: nil)
  url = normalize_url(message[:url])
  message[:url] = url
  page = storage.read(message)
  return true unless page
  if connection
    response = connection.head(url)
  else
    response = Faraday.head(url)
  end
  headers = response.headers
  case
  when headers.key?("etag") && page.etag
    headers["etag"] != page.etag
  when headers.key?("last-modified") && page.last_modified_at
    if headers["last-modified"] < page.last_modified_at
      log.warn("#{url} returns old contents. #{headers["last-modified"]} < #{page.last_modified_at}")
    end
    headers["last-modified"] > page.last_modified_at
  else
    true
  end
end
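The decision made by `#call` can be tried without a network connection or storage backend. The sketch below isolates the header comparison only. `CachedPage` and `needs_update?` are hypothetical names, the headers are a plain Hash with lowercase keys, and parsing `last-modified` with `Time.httpdate` before comparing is an assumption that differs from the listing, which compares the header value directly.

```ruby
require "time"

# Hypothetical record of what was stored on the previous crawl.
CachedPage = Struct.new(:etag, :last_modified_at)

# Returns true when the resource should be fetched again.
# headers: Hash with lowercase keys, standing in for response headers.
def needs_update?(headers, page)
  return true unless page # never seen before

  if headers.key?("etag") && page.etag
    headers["etag"] != page.etag
  elsif headers.key?("last-modified") && page.last_modified_at
    # Assumption: parse the header so Time values are compared.
    Time.httpdate(headers["last-modified"]) > page.last_modified_at
  else
    true # no usable validator, so treat the page as updated
  end
end

stored_at = Time.httpdate("Tue, 01 Jan 2019 00:00:00 GMT")

# Same ETag as before: no update needed.
puts needs_update?({ "etag" => '"abc"' }, CachedPage.new('"abc"', stored_at))
# Newer Last-Modified than the stored timestamp: update needed.
puts needs_update?({ "last-modified" => "Wed, 02 Jan 2019 00:00:00 GMT" },
                   CachedPage.new(nil, stored_at))
```

The ETag is checked first because it identifies the exact content version, while `last-modified` has only one-second resolution. When neither validator is usable, the sketch returns true, as the listing does.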