Class: DaimonSkycrawlers::Filter::DuplicateChecker

Inherits:
Base
  • Object
Defined in:
lib/daimon_skycrawlers/filter/duplicate_checker.rb

Overview

This filter checks whether a given URL has already been seen.

Use it to skip processing of duplicated URLs.

Instance Method Summary

Methods inherited from Base

#storage

Methods included from LoggerMixin

included

Constructor Details

#initialize(base_url: nil) ⇒ DuplicateChecker

Returns a new instance of DuplicateChecker.



# File 'lib/daimon_skycrawlers/filter/duplicate_checker.rb', line 12

def initialize(base_url: nil)
  @base_url = nil
  @base_url = URI(base_url) if base_url
  @urls = Set.new
end

Instance Method Details

#call(url) ⇒ true|false

Returns false if the URL has already been seen. Otherwise records the URL and returns true.

Parameters:

  • url (String)

    The URL to check for duplication. If the given URL is relative, `@base_url + url` is used as the absolute URL.

Returns:

  • (true|false)

    false if the URL has already been seen, otherwise true.



# File 'lib/daimon_skycrawlers/filter/duplicate_checker.rb', line 23

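def call(url)
  # Resolve a relative URL against the base URL before comparing.
  unless URI(url).absolute?
    url = (@base_url + url).to_s
  end
  # Reject a URL that has been seen; otherwise remember it and accept it.
  return false if @urls.include?(url)
  @urls << url
  true
end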
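
The following is a minimal sketch of how #call behaves, not the library's own example. To keep it runnable without the gem, it uses a hypothetical stand-in class, `SketchDuplicateChecker`, that copies the logic from the source listing above. The `example.com` URLs are placeholders.

```ruby
require "set"
require "uri"

# Hypothetical stand-in mirroring the listed source of
# DaimonSkycrawlers::Filter::DuplicateChecker (assumption: no other
# behavior from Base or LoggerMixin matters for this example).
class SketchDuplicateChecker
  def initialize(base_url: nil)
    @base_url = base_url ? URI(base_url) : nil
    @urls = Set.new
  end

  def call(url)
    url = (@base_url + url).to_s unless URI(url).absolute?
    return false if @urls.include?(url)
    @urls << url
    true
  end
end

checker = SketchDuplicateChecker.new(base_url: "http://example.com/")
first  = checker.call("/blog/1")                    # => true (new URL)
second = checker.call("http://example.com/blog/1")  # => false (same page, absolute form)
third  = checker.call("/blog/2")                    # => true (new URL)

# Without base_url, a relative URL cannot be resolved: @base_url is nil,
# so `@base_url + url` raises NoMethodError.
no_base = SketchDuplicateChecker.new
error = begin
  no_base.call("/blog/1")
  nil
rescue NoMethodError => e
  e
end
```

Relative and absolute forms of the same URL count as duplicates because the check compares the resolved absolute string. Judging from the listed source, a checker created without `base_url` should only be given absolute URLs.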

#duplicated?(url) ⇒ true|false

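Returns true if the URL has already been seen, otherwise returns false. This method delegates to #call, so a URL that has not been seen is recorded as a side effect.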

Parameters:

  • url (String)

    The URL to check for duplication. If the given URL is relative, `@base_url + url` is used as the absolute URL.

Returns:

  • (true|false)

    true if the URL has already been seen, otherwise false.



# File 'lib/daimon_skycrawlers/filter/duplicate_checker.rb', line 37

def duplicated?(url)
  !call(url)
end
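
Because #duplicated? is defined as `!call(url)`, it is not a pure query. A short sketch of the consequence, again using a hypothetical stand-in class (`SketchDuplicateChecker`) that copies the listed source so the example runs without the gem:

```ruby
require "set"
require "uri"

# Hypothetical stand-in mirroring the listed source; not the gem's class.
class SketchDuplicateChecker
  def initialize(base_url: nil)
    @base_url = base_url ? URI(base_url) : nil
    @urls = Set.new
  end

  def call(url)
    url = (@base_url + url).to_s unless URI(url).absolute?
    return false if @urls.include?(url)
    @urls << url
    true
  end

  def duplicated?(url)
    !call(url)
  end
end

checker = SketchDuplicateChecker.new
url = "http://example.com/page"    # placeholder URL

first_check  = checker.duplicated?(url)  # => false, and the URL is now recorded
second_check = checker.duplicated?(url)  # => true
accepted     = checker.call(url)         # => false, already recorded by the first check
```

In practice this means that checking a URL with #duplicated? and then passing the same URL to #call will reject it, so a caller would normally use one of the two methods per URL, not both.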