Method: WebRobots#initialize
Defined in: lib/webrobots.rb
#initialize(user_agent, options = nil) ⇒ WebRobots
Creates a WebRobots object for a robot named user_agent, with an optional options hash.
- :http_get => a custom method, proc, or anything that responds to .call(uri), used for fetching robots.txt. It must return the response body on success, return an empty string if the resource is not found, and return nil or raise an error on failure. Redirects should be handled within this proc (see the first sketch after this list).
- :crawl_delay => determines how to react to Crawl-delay directives. If :sleep is given, WebRobots sleeps as demanded when allowed?(url)/disallowed?(url) is called; this is the default behavior. If :ignore is given, WebRobots does nothing. If a custom method, proc, or anything that responds to .call(delay, last_checked_at) is given, it is called instead (see the second sketch after this list).
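For :http_get, any callable that satisfies the contract above will do. The following is a minimal sketch (not part of the library) of a fetcher built on Net::HTTP that follows redirects itself; the name fetch_robots_txt and the bot name are placeholders:

require 'net/http'
require 'uri'

# Hypothetical :http_get callable: returns the body on success, "" when
# robots.txt does not exist, nil on other failures, and follows up to
# five redirects itself.
fetch_robots_txt = lambda do |uri|
  uri = URI(uri.to_s)
  5.times do
    response = Net::HTTP.get_response(uri)
    case response
    when Net::HTTPSuccess     then return response.body
    when Net::HTTPNotFound    then return ''
    when Net::HTTPRedirection then uri = URI.join(uri.to_s, response['location'])
    else return nil
    end
  end
  nil
end

robots = WebRobots.new('MyBot/1.0', :http_get => fetch_robots_txt)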
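For :crawl_delay, a custom handler receives the requested delay in seconds and the time the site was last checked. A hypothetical handler that enforces the delay itself, roughly mirroring the default :sleep behavior, could look like:

# Hypothetical handler: sleep off whatever portion of the delay has
# not yet elapsed since the last request to the site.
throttle = lambda do |delay, last_checked_at|
  if last_checked_at
    wait = delay - (Time.now - last_checked_at)
    sleep(wait) if wait > 0
  end
end

robots = WebRobots.new('MyBot/1.0', :crawl_delay => throttle)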
# File 'lib/webrobots.rb', line 28

def initialize(user_agent, options = nil)
  @user_agent = user_agent
  options ||= {}
  @http_get = options[:http_get] || method(:http_get)
  crawl_delay_handler =
    case value = options[:crawl_delay] || :sleep
    when :ignore
      nil
    when :sleep
      method(:crawl_delay_handler)
    else
      if value.respond_to?(:call)
        value
      else
        raise ArgumentError, "invalid Crawl-delay handler: #{value.inspect}"
      end
    end
  @parser = RobotsTxt::Parser.new(user_agent, crawl_delay_handler)
  @parser_mutex = Mutex.new
  @robotstxt = create_cache()
end
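In use, construction and a permission check might look like the following sketch; the bot name and URL are placeholders:

robots = WebRobots.new('MyBot/1.0')

url = 'http://www.example.com/some/page'
if robots.allowed?(url)   # with the default :sleep handler this may pause for Crawl-delay
  # safe to fetch url
end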