Class: DaimonSkycrawlers::Filter::RobotsTxtChecker

Inherits:
  Base
    • Object
Defined in:
lib/daimon_skycrawlers/filter/robots_txt_checker.rb

Overview

This filter checks whether robots.txt allows fetching a given URL. We want to obey the robots.txt provided by a web site.

Instance Method Summary

Methods inherited from Base

#storage

Constructor Details

#initialize(base_url: nil, user_agent: "DaimonSkycrawlers/#{DaimonSkycrawlers::VERSION}") ⇒ RobotsTxtChecker

Returns a new instance of RobotsTxtChecker.



# File 'lib/daimon_skycrawlers/filter/robots_txt_checker.rb', line 12

def initialize(base_url: nil, user_agent: "DaimonSkycrawlers/#{DaimonSkycrawlers::VERSION}")
  super()
  @base_url = base_url
  # WebRobots fetches and caches each site's robots.txt for the given user agent
  @webrobots = WebRobots.new(user_agent)
end
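
A minimal usage sketch (the user agent string "MyCrawler/1.0" is illustrative, not part of the library):

require "daimon_skycrawlers/filter/robots_txt_checker"

# Build a checker with a custom user agent
checker = DaimonSkycrawlers::Filter::RobotsTxtChecker.new(
  user_agent: "MyCrawler/1.0"
)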

Instance Method Details

#call(message) ⇒ true|false

Also known as: allowed?

Returns true when the web site allows fetching the URL; otherwise returns false.

Parameters:

  • message (Hash)

    the message whose :url key is checked against robots.txt

Returns:

  • (true|false)

    true when the web site allows fetching the URL, false otherwise



# File 'lib/daimon_skycrawlers/filter/robots_txt_checker.rb', line 22

def call(message)
  url = normalize_url(message[:url]) # private helper, not shown in this listing
  @webrobots.allowed?(url)
end
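
A minimal usage sketch, assuming the checker instance from the constructor example above; the URL is illustrative:

# The message hash carries the URL to check under the :url key
message = { url: "http://example.com/some/page" }

if checker.call(message) # also callable as checker.allowed?(message)
  # fetch the page
else
  # skip: disallowed by the site's robots.txt
end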