Class: SiteMapper::Robots
Inherits: Object
Defined in: lib/site_mapper/robots.rb
Overview
Based on the robots gem, v0.10.1 (rubygems.org/gems/robots). Given a base URL, this class checks whether a given URL is allowed to be crawled according to the site's /robots.txt.
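A minimal usage sketch: the example.com host, the robots.txt content, and the 'SiteMapper' user agent string are illustrative assumptions, and fetching robots.txt yourself is assumed here (the crawler may do this for you elsewhere).

require 'site_mapper'

# Hypothetical robots.txt body; in practice you would fetch it over HTTP
robots_txt = "User-agent: *\nDisallow: /admin\nSitemap: https://example.com/sitemap.xml\n"
robots = SiteMapper::Robots.new(robots_txt, 'example.com', 'SiteMapper')

robots.allowed?('https://example.com/about') # => true
robots.sitemaps                              # => ['https://example.com/sitemap.xml']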
Defined Under Namespace
Classes: ParsedRobots
Instance Method Summary

- #allowed?(uri) ⇒ Boolean
  True if uri is allowed to be crawled.
- #initialize(robots_txt, hostname, user_agent) ⇒ Robots (constructor)
  A new instance of Robots.
- #other_values ⇒ Hash
  Key/value pairs from robots.txt.
- #sitemaps ⇒ Array
  Array of sitemaps defined in robots.txt.
Constructor Details
#initialize(robots_txt, hostname, user_agent) ⇒ Robots
Returns a new instance of Robots.
# File 'lib/site_mapper/robots.rb', line 124

def initialize(robots_txt, hostname, user_agent)
  @robots_txt = robots_txt
  @hostname   = hostname
  @user_agent = user_agent
  @parsed     = {} # per-host cache of ParsedRobots instances
end
Instance Method Details
#allowed?(uri) ⇒ Boolean
Returns true if uri is allowed to be crawled.
# File 'lib/site_mapper/robots.rb', line 136

def allowed?(uri)
  uri  = to_uri(uri)
  host = uri.host
  # Parse robots.txt once per host and memoize the result
  @parsed[host] ||= ParsedRobots.new(@robots_txt, @user_agent)
  @parsed[host].allowed?(uri, @user_agent)
end
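Because #allowed? normalizes its argument through to_uri, it should accept either a String or a URI object. A sketch, where the robots.txt content and URLs are assumptions:

require 'uri'

robots_txt = "User-agent: *\nDisallow: /private\n"
robots = SiteMapper::Robots.new(robots_txt, 'example.com', 'SiteMapper')

robots.allowed?('https://example.com/private')         # => false
robots.allowed?(URI('https://example.com/index.html')) # => true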
#other_values ⇒ Hash
Returns key/value pairs from robots.txt.
# File 'lib/site_mapper/robots.rb', line 158

def other_values
  host = @hostname
  @parsed[host] ||= ParsedRobots.new(@robots_txt, @user_agent)
  @parsed[host].other_values
end
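As a sketch, non-standard directives such as Crawl-delay would be surfaced here; the robots.txt content is an assumption, and the exact key/value shape of the returned Hash is not documented above, so it is left unspecified:

robots_txt = "User-agent: *\nCrawl-delay: 10\n"
robots = SiteMapper::Robots.new(robots_txt, 'example.com', 'SiteMapper')
robots.other_values # => Hash containing the crawl-delay entry (exact shape assumed)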
#sitemaps ⇒ Array
Returns array of sitemaps defined in robots.txt.
# File 'lib/site_mapper/robots.rb', line 147

def sitemaps
  host = @hostname
  @parsed[host] ||= ParsedRobots.new(@robots_txt, @user_agent)
  @parsed[host].sitemaps
end
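A sketch of reading Sitemap directives; the robots.txt content is an assumption, and the Array return type comes from the signature above:

robots_txt = "User-agent: *\nSitemap: https://example.com/sitemap.xml\n"
robots = SiteMapper::Robots.new(robots_txt, 'example.com', 'SiteMapper')
robots.sitemaps # => ['https://example.com/sitemap.xml'] (assumed output)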