robot_rules

A tool to determine whether a site's robots.txt file would prevent a given user agent from making a request to a given URI.

Example

Given the following:

#!/usr/local/bin/ruby -w

require "robot_rules"
require "open-uri"

rules      = RobotRules.new("RubyQuizBrowser 1.0")
robots_url = "http://pragmaticprogrammer.com/robots.txt"

open(robots_url) do |url|
  data = url.read

  puts "/robots.txt:"
  puts data
  puts

  rules.parse(robots_url, data)
end

puts "URL tests:"
%w{ http://pragmaticprogrammer.com/images/dave.jpg
    http://pragmaticprogrammer.com/imagination }.each do |test|
  puts "rules.allowed?( #{test.inspect} )"
  puts rules.allowed?(test)
end

__END__

This script will print:

/robots.txt:
User-agent:  *
Disallow:    images

URL tests:
rules.allowed?( "http://pragmaticprogrammer.com/images/dave.jpg" )
false
rules.allowed?( "http://pragmaticprogrammer.com/imagination" )
true
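
Because parse takes the robots.txt text directly, along with the URL it was served from, the rules can also be exercised without any network access. Below is a minimal sketch that assumes only the RobotRules API used above (new, parse, allowed?); the example.com host, paths, and expected results are illustrative, mirroring the behavior shown in the output above.

require "robot_rules"

rules = RobotRules.new("RubyQuizBrowser 1.0")

# Inline robots.txt text, as if it had just been fetched from the site.
robots_txt = <<END_ROBOTS
User-agent:  *
Disallow:    images
END_ROBOTS

# The first argument is the URL the robots.txt came from (hypothetical here).
rules.parse("http://example.com/robots.txt", robots_txt)

puts rules.allowed?("http://example.com/images/logo.png")  # expected: false
puts rules.allowed?("http://example.com/contact")          # expected: true

This keeps experimentation (and testing) independent of open-uri: the caller decides how to fetch robots.txt, and RobotRules only interprets it.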

History

RobotRules was created by James Edward Gray II in response to Ruby Quiz #64, "Port a Library". A few years later, Jeremy Friesen wrapped the library up into a gem and added some tests.

Copyright (c) 2009 James Edward Gray II and Jeremy Friesen. See LICENSE for details.