Module: Robotstxt

Defined in:
lib/robotstxt.rb,
lib/robotstxt/parser.rb

Defined Under Namespace

Classes: Parser

Constant Summary

  NAME     = 'Robotstxt'
  GEM      = 'robotstxt'
  AUTHORS  = ['Simone Rinzivillo <[email protected]>']
  VERSION  = '0.5.4'

Class Method Summary

Class Method Details

.allowed?(url, robot_id) ⇒ Boolean

Check whether the URL may be crawled by the given robot_id. Robotstxt.allowed? returns true if the robots.txt file does not block access to the URL.

Robotstxt.allowed?('http://www.simonerinzivillo.it/', 'rubytest')

Returns:

  • (Boolean)


# File 'lib/robotstxt.rb', line 35

def self.allowed?(url, robot_id)
  u = URI.parse(url)
  r = Robotstxt::Parser.new(robot_id)
  # Download robots.txt from the URL's host, then check the URL against it.
  r.allowed?(url) if r.get(u.scheme + '://' + u.host)
end
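
A minimal usage sketch showing how this method can gate a fetch before crawling a page. The URL and the 'rubytest' robot_id simply mirror the example above, and the use of open-uri/URI.open is an assumption about the caller's environment, not part of the gem:

require 'robotstxt'
require 'open-uri'

url = 'http://www.simonerinzivillo.it/'

# Fetch the page only when robots.txt allows this robot_id.
if Robotstxt.allowed?(url, 'rubytest')
  html = URI.open(url).read
  puts "Fetched #{html.length} bytes from #{url}"
else
  puts "robots.txt disallows #{url} for this robot_id"
end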

.sitemaps(url, robot_id) ⇒ Object

Analyze the robots.txt file and return an Array containing the XML Sitemap URLs it declares.

Robotstxt.sitemaps('http://www.simonerinzivillo.it/', 'rubytest')


# File 'lib/robotstxt.rb', line 47

def self.sitemaps(url, robot_id)
  u = URI.parse(url)
  r = Robotstxt::Parser.new(robot_id)
  # Download robots.txt from the URL's host, then collect its Sitemap entries.
  r.sitemaps if r.get(u.scheme + '://' + u.host)
end
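
A minimal sketch of consuming the returned Array, again using the placeholder site and 'rubytest' robot_id from the example above. The nil check accounts for the case where robots.txt could not be fetched, since the method then returns nil:

require 'robotstxt'

# Collect every Sitemap URL declared in the site's robots.txt.
sitemaps = Robotstxt.sitemaps('http://www.simonerinzivillo.it/', 'rubytest')

if sitemaps && !sitemaps.empty?
  sitemaps.each { |sitemap_url| puts sitemap_url }
else
  puts 'No Sitemap entries found (or robots.txt could not be fetched)'
end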