Class: SiteMapper::Robots

Inherits:
Object
Defined in:
lib/site_mapper/robots.rb

Overview

Based on the robots gem, v0.10.1 (rubygems.org/gems/robots). Given a base URL, it checks whether a given URL is allowed to be crawled according to that host's /robots.txt.
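
A minimal end-to-end sketch (the Net::HTTP fetch and the example.com host are illustrative assumptions; SiteMapper itself may obtain robots.txt differently):

require 'net/http'
require 'site_mapper'

# Fetch the robots.txt ourselves and hand its contents to Robots (illustrative fetch).
robots_txt = Net::HTTP.get(URI('http://example.com/robots.txt'))
robots     = SiteMapper::Robots.new(robots_txt, 'example.com', 'SiteMapper')
robots.allowed?('http://example.com/some/page') # => true or false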

Defined Under Namespace

Classes: ParsedRobots

Instance Method Summary

  • #allowed?(uri) ⇒ Boolean
  • #other_values ⇒ Hash
  • #sitemaps ⇒ Array

Constructor Details

#initialize(robots_txt, hostname, user_agent) ⇒ Robots

Returns a new instance of Robots.

Parameters:

  • robots_txt (String)

    contents of /robots.txt

  • hostname (String)

    for the passed robots_txt

  • user_agent (String)

    to check



# File 'lib/site_mapper/robots.rb', line 124

def initialize(robots_txt, hostname, user_agent)
  @robots_txt = robots_txt
  @hostname   = hostname
  @user_agent = user_agent
  @parsed     = {}
end
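
A hedged construction example (the robots.txt string is illustrative):

robots_txt = "User-agent: *\nDisallow: /private\n"
robots     = Robots.new(robots_txt, 'example.com', 'SiteMapper')

Note that @parsed starts out empty; the query methods below lazily populate it with one ParsedRobots instance per host, so a given robots.txt is parsed at most once.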

Instance Method Details

#allowed?(uri) ⇒ Boolean

Returns true if uri is allowed to be crawled.

Examples:

Check if www.google.com/googlesites is allowed to be crawled

robots = Robots.new(robots_txt, 'google.com', 'SiteMapper')
robots.allowed?('http://www.google.com/googlesites') # => false (as of 2014-10-22)

Parameters:

  • uri (String, URI)

    String or URI to check

Returns:

  • (Boolean)

    true if uri is allowed to be crawled



# File 'lib/site_mapper/robots.rb', line 136

def allowed?(uri)
  uri  = to_uri(uri)
  host = uri.host
  @parsed[host] ||= ParsedRobots.new(@robots_txt, @user_agent)
  @parsed[host].allowed?(uri, @user_agent)
end
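
Because the parameter accepts a String or a URI, a URI object works as well (sketch; the path is illustrative):

uri = URI('http://example.com/private/page')
robots.allowed?(uri) # the ParsedRobots for this host is cached in @parsed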

#other_values ⇒ Hash

Returns key/value pairs from robots.txt.

Examples:

Get other values for google.com

robots = Robots.new(robots_txt, 'google.com', 'SiteMapper')
robots.other_values


Returns:

  • (Hash)

    key/value pairs from robots.txt



# File 'lib/site_mapper/robots.rb', line 158

def other_values
  host = @hostname
  @parsed[host] ||= ParsedRobots.new(@robots_txt, @user_agent)
  @parsed[host].other_values
end
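
An illustrative call, assuming the parsed robots.txt contains a Crawl-delay directive (the exact keys and value shapes depend entirely on the file's contents):

robots.other_values # => e.g. { 'crawl-delay' => '10' } (shape illustrative)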

#sitemaps ⇒ Array

Returns array of sitemaps defined in robots.txt.

Examples:

Get sitemaps for google.com

robots = Robots.new(robots_txt, 'google.com', 'SiteMapper')
robots.sitemaps

Returns:

  • (Array)

    array of sitemaps defined in robots.txt



# File 'lib/site_mapper/robots.rb', line 147

def sitemaps
  host = @hostname
  @parsed[host] ||= ParsedRobots.new(@robots_txt, @user_agent)
  @parsed[host].sitemaps
end
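
For example, printing every sitemap the host declares (sketch; output depends on the robots.txt):

robots.sitemaps.each { |sitemap_url| puts sitemap_url }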