Class: Spider

Inherits:

Object


Defined in: lib/spider.rb

Overview

A spidering library for Ruby. Handles robots.txt, scraping, finding more links, and doing it all over again.

Constant Summary

VERSION_INFO = [0, 5, 3]

VERSION = VERSION_INFO.map(&:to_s).join('.')

Class Method Summary

.start_at(a_url, &block) ⇒ Object

Runs the spider starting at the given URL.
.version ⇒ Object

Class Method Details

.start_at(a_url, &block) ⇒ `Object`

Runs the spider starting at the given URL. The block receives the SpiderInstance; use it to define the rules and handlers for the discovered web pages. See SpiderInstance for the available rules and handlers.

Spider.start_at('http://cashcats.biz/') do |s|
  s.add_url_check do |a_url|
    a_url =~ %r{^http://cashcats\.biz.*}
  end

  s.on 404 do |a_url, resp, prior_url|
    puts "URL not found: #{a_url}"
  end

  s.on :success do |a_url, resp, prior_url|
    puts "body: #{resp.body}"
  end

  s.on :every do |a_url, resp, prior_url|
    puts "URL returned anything: #{a_url} with this code #{resp.code}"
  end
end

# File 'lib/spider.rb', line 37

def self.start_at(a_url, &block)
  rules    = RobotRules.new("Ruby Spider #{Spider::VERSION}")
  a_spider = SpiderInstance.new({nil => [a_url]}, [], rules, [])
  block.call(a_spider)
  a_spider.start!
end
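
The listing above follows a build, configure, then start pattern: the instance is created, handed to the caller's block, and only then started. The sketch below illustrates that ordering with a stand-in class. `SketchInstance` and `sketch_start_at` are hypothetical names invented for this illustration and are not part of the library. The stand-in only records handlers and never makes a request. Unlike the real method, the sketch returns the instance so the result can be inspected.

```ruby
# Hypothetical stand-in for SpiderInstance: it records handlers instead of crawling.
class SketchInstance
  attr_reader :start_url, :handlers, :started

  def initialize(start_url)
    @start_url = start_url
    @handlers  = {}
    @started   = false
  end

  # Register a handler block under a status code or symbol (e.g. 404, :success).
  def on(code, &handler)
    @handlers[code] = handler
  end

  # The real library would begin crawling here; this sketch only flips a flag.
  def start!
    @started = true
  end
end

# Same ordering as Spider.start_at: build the instance, yield it to the
# caller for configuration, then start it. Returning the instance is a
# choice made for this sketch only.
def sketch_start_at(a_url)
  instance = SketchInstance.new(a_url)
  yield instance
  instance.start!
  instance
end

spider = sketch_start_at('http://example.com/') do |s|
  s.on(404) { |a_url, _resp, _prior_url| puts "URL not found: #{a_url}" }
end
```

Because the block runs before `start!`, every rule and handler is in place before any work begins.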

.version ⇒ `Object`



# File 'lib/spider.rb', line 10

def self.version
  VERSION
end
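
As a small illustration of how the version string is derived from `VERSION_INFO`, the sketch below reproduces the two constants and the accessor in a stand-alone module. `VersionSketch` is a hypothetical name used so the example runs without loading the library.

```ruby
# Hypothetical stand-alone module mirroring the constants documented above.
module VersionSketch
  VERSION_INFO = [0, 5, 3]
  # Convert each integer to a string, then join the parts with dots.
  VERSION = VERSION_INFO.map(&:to_s).join('.')

  def self.version
    VERSION
  end
end

puts VersionSketch.version # prints "0.5.3"
```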