Class: DaimonSkycrawlers::Filter::UpdateChecker

Inherits:
Base
  • Object
Defined in:
lib/daimon_skycrawlers/filter/update_checker.rb

Overview

This filter checks whether a given URL has been updated.

It skips URLs whose content is already up to date (not updated since the previous access).

Instance Method Summary

Methods inherited from Base

#storage

Constructor Details

#initialize(storage: nil, base_url: nil) ⇒ UpdateChecker

Returns a new instance of UpdateChecker.



# File 'lib/daimon_skycrawlers/filter/update_checker.rb', line 13

def initialize(storage: nil, base_url: nil)
  super(storage: storage)
  @base_url = nil
  @base_url = URI(base_url) if base_url
end
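
The constructor stores base_url as a URI object, and #call passes message[:url] through normalize_url, which is not shown on this page. The following is a minimal sketch of how a relative URL could be resolved against such a base using only the standard library. The helper name resolve_url is hypothetical, and whether normalize_url behaves this way is an assumption.

```ruby
require "uri"

# Hypothetical helper, not part of DaimonSkycrawlers.
# Resolves +url+ against an optional base URL, the way a
# normalize_url step might (assumption).
def resolve_url(url, base_url: nil)
  # Without a base there is nothing to resolve against.
  return url.to_s unless base_url
  # URI#+ merges a relative reference into the base;
  # an absolute URL replaces the base entirely.
  (URI(base_url) + url.to_s).to_s
end

puts resolve_url("/blog/1", base_url: "http://example.com/")
# prints "http://example.com/blog/1"
puts resolve_url("http://other.example/x", base_url: "http://example.com/")
# prints "http://other.example/x"
```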

Instance Method Details

#call(message, connection: nil) ⇒ true|false Also known as: updated?

Returns true when the URL needs to be updated, otherwise false.

Parameters:

  • message (Hash)

    A message hash that includes the :url key.

  • connection (Faraday) (defaults to: nil)

Returns:

  • (true|false)

    Returns true when the URL needs to be updated, otherwise false.



# File 'lib/daimon_skycrawlers/filter/update_checker.rb', line 24

def call(message, connection: nil)
  url = normalize_url(message[:url])
  message[:url] = url
  page = storage.read(message)
  return true unless page
  if connection
    response = connection.head(url)
  else
    response = Faraday.head(url)
  end
  headers = response.headers
  case
  when headers.key?("etag") && page.etag
    headers["etag"] != page.etag
  when headers.key?("last-modified") && page.last_modified_at
    if headers["last-modified"] < page.last_modified_at
      log.warn("#{url} returns old contents. #{headers["last-modified"]} < #{page.last_modified_at}")
    end
    headers["last-modified"] > page.last_modified_at
  else
    true
  end
end
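
The decision made by #call can be sketched without any network access or storage backend. The example below is an illustration under stated assumptions, not the library's implementation: StoredPage and needs_update? are hypothetical names, the headers are a plain Hash with lowercase keys, and last_modified_at is assumed to be a Time. Under that assumption the Last-Modified header is parsed with Time.httpdate before comparing, because comparing a String directly with a Time raises ArgumentError.

```ruby
require "time"

# Hypothetical stand-in for a stored page record; the real storage
# class in DaimonSkycrawlers may differ.
StoredPage = Struct.new(:etag, :last_modified_at)

# Returns true when the resource should be fetched again.
# +headers+ is a Hash with lowercase header names (assumption).
def needs_update?(headers, page)
  # Never seen before: always fetch.
  return true unless page
  if headers.key?("etag") && page.etag
    # A different ETag means the content changed.
    headers["etag"] != page.etag
  elsif headers.key?("last-modified") && page.last_modified_at
    # Parse the HTTP date so that two Time values are compared.
    Time.httpdate(headers["last-modified"]) > page.last_modified_at
  else
    # No validator available: fetch to be safe.
    true
  end
end

page = StoredPage.new(nil, Time.utc(2017, 1, 1))
puts needs_update?({ "last-modified" => "Sun, 01 Jan 2017 00:00:00 GMT" }, page)
# prints "false"
puts needs_update?({ "last-modified" => "Mon, 02 Jan 2017 00:00:00 GMT" }, page)
# prints "true"
```

As in #call, the ETag takes precedence over Last-Modified when both the response and the stored page provide one.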