html2rss


Request an HTML document and convert it to an RSS feed via a config object. The config contains the URL to scrape and the selectors needed to extract the required information. The gem ships with several extractors (for example, one that reads a value from an HTML attribute).

Always check a website's Terms of Service first to confirm that scraping its content is allowed!

Installation

Add this line to your application's Gemfile:

gem 'html2rss'

And then execute:

$ bundle

Or install it yourself as:

$ gem install html2rss

Usage

Usage with a YAML file

Create a YAML config file. Find an example at spec/config.test.yml.

Html2rss.feed_from_yaml_config(File.join('spec', 'config.test.yml'), 'nuxt-releases') returns an RSS::Rss object.

Usage in a web application

Find a minimal Sinatra app which exposes your feeds as HTTP endpoints here: gildesmarais/html2rss-web

Tips and tricks

  • Check that the channel URL does not redirect to a mobile page, since its markup may not match your selectors.
  • Experimenting with curl and pup is an efficient way to find the selectors.
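
The redirect check from the first tip can be sketched in plain Ruby with the standard library's Net::HTTP. This is only an illustration and not part of html2rss: the helper names (redirect_target, mobile_host?, describe_channel_url) are hypothetical, and treating hosts that start with "m." or "mobile." as mobile sites is an assumed heuristic that will not fit every website.

```ruby
require 'net/http'
require 'uri'

# Returns the absolute redirect target of +response+ for the requested +url+,
# or nil when the response is not a redirect or carries no Location header.
def redirect_target(url, response)
  return nil unless response.is_a?(Net::HTTPRedirection)

  location = response['location']
  # Location may be relative, so resolve it against the requested URL.
  location && URI.join(url, location).to_s
end

# Assumed heuristic (hypothetical, not from html2rss): hosts beginning with
# "m." or "mobile." are treated as mobile sites.
def mobile_host?(url)
  URI.parse(url).host.to_s.start_with?('m.', 'mobile.')
end

# Sends a single GET request (redirects are not followed) and describes the result.
def describe_channel_url(url)
  target = redirect_target(url, Net::HTTP.get_response(URI.parse(url)))
  return "#{url}: no redirect" if target.nil?

  kind = mobile_host?(target) ? 'a mobile page' : 'another page'
  "#{url}: redirects to #{kind} (#{target})"
end

# Usage (performs a network request):
#   puts describe_channel_url('https://example.com/news')
```

Only the first response is inspected, so a chain of redirects would need the check repeated on each target. Some sites also pick the mobile page based on the User-Agent header, which this sketch does not vary.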

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/gildesmarais/html2rss.

Changelog generation

The CHANGELOG.md can be generated automatically with standard-changelog.

License

The gem is available as open source under the terms of the MIT License.