Class: DaimonSkycrawlers::Filter::RobotsTxtChecker
- Defined in:
- lib/daimon_skycrawlers/filter/robots_txt_checker.rb
Overview
This filter provides a robots.txt checker for a given URL. We want to obey the robots.txt provided by a web site.
Instance Method Summary
-
#call(message) ⇒ true|false
(also: #allowed?)
Returns true when the web site allows fetching the URL, false otherwise.
-
#initialize(base_url: nil, user_agent: "DaimonSkycrawlers/#{DaimonSkycrawlers::VERSION}") ⇒ RobotsTxtChecker
constructor
A new instance of RobotsTxtChecker.
Methods inherited from Base
Constructor Details
#initialize(base_url: nil, user_agent: "DaimonSkycrawlers/#{DaimonSkycrawlers::VERSION}") ⇒ RobotsTxtChecker
Returns a new instance of RobotsTxtChecker.
# File 'lib/daimon_skycrawlers/filter/robots_txt_checker.rb', line 12

def initialize(base_url: nil, user_agent: "DaimonSkycrawlers/#{DaimonSkycrawlers::VERSION}")
  super()
  @base_url = base_url
  @webrobots = WebRobots.new(user_agent)
end
Instance Method Details
#call(message) ⇒ true|false Also known as: allowed?
Returns true when the web site allows fetching the URL, false otherwise.
# File 'lib/daimon_skycrawlers/filter/robots_txt_checker.rb', line 22

def call(message)
  url = normalize_url(message[:url])
  @webrobots.allowed?(url)
end
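The filter's contract is simple: `#call` receives a message hash, extracts the URL, and returns a boolean. The sketch below illustrates that true/false contract with a hypothetical stand-in (`StubRobotsChecker` and its hard-coded disallowed prefixes are assumptions, not part of DaimonSkycrawlers) so it runs without the `webrobots` gem or network access.

```ruby
require "uri"

# Hypothetical stand-in for RobotsTxtChecker: instead of fetching and
# parsing a real robots.txt via WebRobots, it checks the URL's path
# against a fixed list of disallowed prefixes.
class StubRobotsChecker
  def initialize(disallowed_prefixes)
    @disallowed = disallowed_prefixes
  end

  # Returns true when fetching the URL is allowed, false otherwise,
  # mirroring the true/false contract of #call (aliased as #allowed?).
  def call(message)
    path = URI.parse(message[:url]).path
    @disallowed.none? { |prefix| path.start_with?(prefix) }
  end
  alias allowed? call
end

checker = StubRobotsChecker.new(["/private"])
puts checker.allowed?(url: "http://example.com/index.html") # => true
puts checker.allowed?(url: "http://example.com/private/a")  # => false
```

In the real filter, the prefix check is replaced by `@webrobots.allowed?(url)`, which fetches and caches each site's robots.txt; the message-in, boolean-out interface stays the same.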