Class: Spider
- Inherits: Object
- Defined in: lib/spider.rb
Overview
A spidering library for Ruby. Handles robots.txt, scraping, finding more links, and doing it all over again.
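The "find more links and do it all over again" loop is, in general terms, a breadth-first crawl. The sketch below shows that loop under stated assumptions; it is not Spider's implementation. `PAGES`, `extract_links`, and `crawl` are hypothetical names, an in-memory hash stands in for real HTTP fetches, and the `allowed:` lambda marks where a robots.txt or URL check would plug in.

```ruby
require 'set'

# Hypothetical in-memory "web": URL => HTML body. Stands in for real HTTP fetches.
PAGES = {
  'http://example.com/'  => '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>',
  'http://example.com/a' => '<a href="http://example.com/b">B</a>',
  'http://example.com/b' => '<a href="http://example.com/">home</a>'
}.freeze

# Pull double-quoted href values out of an HTML string (naive regex, illustration only).
def extract_links(html)
  html.scan(/href="([^"]+)"/).flatten
end

# Breadth-first crawl: fetch a page, collect its links, queue unseen ones, repeat.
# `allowed` is a hypothetical stand-in for robots.txt rules and URL checks.
def crawl(start_url, allowed: ->(_url) { true })
  queue = [start_url]
  seen = Set[start_url]
  visited = []
  until queue.empty?
    url = queue.shift
    body = PAGES[url]
    next unless body # skip URLs the fake web does not know
    visited << url
    extract_links(body).each do |link|
      next if seen.include?(link) || !allowed.call(link)
      seen << link
      queue << link
    end
  end
  visited
end

p crawl('http://example.com/')
```

The `seen` set is what keeps the loop from revisiting `http://example.com/` when page `b` links back to it.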
Constant Summary
- VERSION_INFO = [0, 5, 3]
- VERSION = VERSION_INFO.map(&:to_s).join('.')
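To make the relationship between the two constants concrete, the snippet below reproduces them at top level (outside the `Spider` class) and shows what each form is good for. The comparison against `0.10.0` is an illustrative assumption, not something the library does.

```ruby
# Mirrors the two constants listed above: a version tuple and its dotted string.
VERSION_INFO = [0, 5, 3]
VERSION = VERSION_INFO.map(&:to_s).join('.')

puts VERSION                      # prints 0.5.3

# The array form compares numerically, element by element...
puts VERSION_INFO <=> [0, 10, 0]  # prints -1 (0.5.3 is older than 0.10.0)

# ...whereas the string form compares character by character.
puts VERSION <=> '0.10.0'         # prints 1 ('5' sorts after '1')
```

Keeping the tuple around alongside the string is what makes correct numeric version comparisons cheap.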
Class Method Summary
- .start_at(a_url, &block) ⇒ Object
  Runs the spider starting at the given URL.
- .version ⇒ Object
  Returns the library version string, VERSION.
Class Method Details
.start_at(a_url, &block) ⇒ Object
Runs the spider starting at the given URL. Also takes a block, which is passed the SpiderInstance; use the block to define the rules and handlers for the discovered Web pages. See SpiderInstance for the available rules and handlers.
Spider.start_at('http://cashcats.biz/') do |s|
  # Only follow URLs on cashcats.biz (the dot is escaped so it matches literally)
  s.add_url_check do |a_url|
    a_url =~ %r{^http://cashcats\.biz.*}
  end

  s.on 404 do |a_url, resp, prior_url|
    puts "URL not found: #{a_url}"
  end

  s.on :success do |a_url, resp, prior_url|
    puts "body: #{resp.body}"
  end

  s.on :every do |a_url, resp, prior_url|
    puts "URL returned anything: #{a_url} with this code #{resp.code}"
  end
end
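The example registers one URL check and three kinds of handler: a specific status code, `:success`, and `:every`. The sketch below models how such a registry might decide which handlers fire. `HandlerRegistry`, `dispatch`, and `FakeResponse` are hypothetical, and treating `:success` as "any 2xx status" with an Integer `code` is an assumption; SpiderInstance's real semantics are documented there.

```ruby
# Hypothetical stand-in for the handler side of SpiderInstance; NOT the library's code.
class HandlerRegistry
  def initialize
    @url_checks = []
    @handlers = Hash.new { |h, k| h[k] = [] }
  end

  def add_url_check(&check)
    @url_checks << check
  end

  def on(code, &handler)
    @handlers[code] << handler
  end

  # A URL is followed only if every registered check returns a truthy value.
  def allowed?(url)
    @url_checks.all? { |check| check.call(url) }
  end

  # Assumption: :success means a 2xx status and :every fires for every response.
  def dispatch(url, resp, prior_url)
    keys = [resp.code]
    keys << :success if (200..299).cover?(resp.code)
    keys << :every
    keys.each do |key|
      @handlers[key].each { |h| h.call(url, resp, prior_url) }
    end
  end
end

# Minimal fake response exposing #code (an Integer here) and #body.
FakeResponse = Struct.new(:code, :body)

log = []
reg = HandlerRegistry.new
reg.add_url_check { |u| u =~ %r{^http://cashcats\.biz} }
reg.on(404)      { |u, _r, _p| log << "not found: #{u}" }
reg.on(:success) { |_u, r, _p| log << "body: #{r.body}" }
reg.on(:every)   { |u, r, _p| log << "#{u} -> #{r.code}" }

reg.dispatch('http://cashcats.biz/missing', FakeResponse.new(404, ''), nil)
reg.dispatch('http://cashcats.biz/', FakeResponse.new(200, 'hi'), nil)
p log
```

Note that `=~` returns a match index (here `0`), which Ruby treats as truthy, so a regex check can be returned directly from the block.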
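# File 'lib/spider.rb', line 37
def self.start_at(a_url, &block)
  # Obey robots.txt, identifying the crawler by name and version
  rules = RobotRules.new("Ruby Spider #{Spider::VERSION}")
  # Seed the crawl with the start URL; it has no referring page, hence the nil key
  a_spider = SpiderInstance.new({nil => [a_url]}, [], rules, [])
  # Let the caller register URL checks and handlers before crawling begins
  block.call(a_spider)
  a_spider.start!
end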
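`start_at` follows a build, configure, run pattern: it constructs the instance, hands it to the caller's block so rules can be registered, and only then starts crawling. The sketch below isolates that pattern with a hypothetical `MiniSpider` whose `start!` merely records a decision instead of fetching anything.

```ruby
# Hypothetical MiniSpider illustrating the start_at pattern: build, configure via block, run.
class MiniSpider
  attr_reader :log

  def initialize(start_url)
    @start_url = start_url
    @checks = []
    @log = []
  end

  def add_url_check(&check)
    @checks << check
  end

  # Stand-in for start!: records whether the seed URL passes the checks instead of crawling.
  def start!
    ok = @checks.all? { |c| c.call(@start_url) }
    @log << (ok ? "crawling #{@start_url}" : "skipping #{@start_url}")
    self
  end

  def self.start_at(a_url, &block)
    spider = new(a_url)
    block.call(spider) # the caller registers rules before anything runs
    spider.start!
  end
end

s = MiniSpider.start_at('http://cashcats.biz/') do |sp|
  sp.add_url_check { |u| u.start_with?('http://cashcats.biz') }
end
p s.log
```

Because `block.call` runs unconditionally, calling this sketch without a block raises `NoMethodError` (the block is `nil`). The original method contains the same `block.call(a_spider)` line, so a block appears to be required there too.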
.version ⇒ Object
# File 'lib/spider.rb', line 10
def self.version
  VERSION
end