# Spidy


## Installation

Add this line to your application's Gemfile:

```ruby
gem 'spidy'
```

And then execute:

```shell
$ bundle
```

Or install it yourself as:

```shell
$ gem install spidy
```

## Usage

### When used from the command line

`website.rb`

```ruby
Spidy.define do
  spider(as: :html) do |yielder, connector, url|
    connector.call(url) do |html|
      # html is a Nokogiri object (via Mechanize)
      yielder.call(url)
    end
  end

  define(as: :html) do
    let(:object_name, 'nokogiri query')
  end
end
```

```shell
echo 'http://example.com' | spidy each website.rb > urls
cat urls | spidy call website.rb > website.json
# shorthand
echo 'http://example.com' | spidy each website.rb | spidy call website.rb | jq .
```
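
The two-stage flow above (`each` emits URLs, `call` turns each URL into a JSON record) can be pictured with a small plain-Ruby sketch. It does not use Spidy at all: `PAGES`, `CONNECTOR`, `SPIDER`, and `SCRAPER` are hypothetical stand-ins invented for illustration, and the connector reads from an in-memory table instead of making HTTP requests. It only mirrors the shape of the `yielder` / `connector` / `url` block arguments and is not how Spidy is implemented.

```ruby
require 'json'

# Hypothetical in-memory "web": URL => page data (no network access).
PAGES = {
  'http://example.com'   => { title: 'Example', links: ['http://example.com/a', 'http://example.com/b'] },
  'http://example.com/a' => { title: 'Page A', links: [] },
  'http://example.com/b' => { title: 'Page B', links: [] }
}.freeze

# Stand-in connector: looks the URL up and hands the page to the block.
CONNECTOR = lambda do |url, &block|
  block.call(PAGES.fetch(url))
end

# Stand-in "spider" stage: emits every URL that should be scraped.
SPIDER = lambda do |yielder, connector, url|
  connector.call(url) do |page|
    page[:links].each { |link| yielder.call(link) }
  end
end

# Stand-in "scraper" stage: turns one URL into a plain hash.
SCRAPER = lambda do |connector, url|
  connector.call(url) { |page| { url: url, title: page[:title] } }
end

# Stage 1: collect URLs (analogous to `spidy each`).
urls = []
SPIDER.call(->(u) { urls << u }, CONNECTOR, 'http://example.com')

# Stage 2: scrape each URL and print one JSON record per line
# (analogous to `spidy call`).
urls.each { |u| puts JSON.generate(SCRAPER.call(CONNECTOR, u)) }
```

Keeping the two stages separate is what lets the command-line form save the intermediate `urls` file, or pipe one stage directly into the next.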

### When used from the development console

```shell
spidy console website.rb
```

Reload the source code and try the spider and scraper interactively:

```ruby
pry(#<Spidy::Console>)> reload!
each('http://example.com') { |url| break url }
call('http://example.com') { |html| break html } # html is a Nokogiri object (via Mechanize)
```

### When used from Ruby code

```ruby
a = Spidy.define do
  # Implementing spiders and scrapers
end

a.each(url) do |url|
  # Loop for the number of retrieved URLs
end

a.call(url) do |object|
  # The scrape result is passed as a defined object
end
```


## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/aileron-inc/spidy. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.

## License

The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

## Code of Conduct

Everyone interacting in the Spidy project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/aileron-inc/spidy/blob/master/CODE_OF_CONDUCT.md).