# HtmlScraper
HtmlScraper is a Ruby gem that transforms the content of an HTML web page into a JSON document, following an HTML template that you define.
## Installation

Add this line to your application's Gemfile:

```ruby
gem 'html_scraper'
```

And then execute:

```
$ bundle
```

Or install it yourself as:

```
$ gem install html_scraper
```
## Usage

### Simple HTML parsing

Expressions surrounded by `{{ }}` in the template are parsed as simple attributes of the result, each named after its expression:
```ruby
require 'html_scraper'

template = '
  <div class="person">
    <h5>{{ surname }}</h5>
    <p>{{ name }}</p>
  </div>
'

html = '
  <html>
    <body>
      <div class="person">
        <h5>Eastwood</h5>
        <p>Clint</p>
      </div>
    </body>
  </html>
'

result = HtmlScraper::Scraper.new(template: template).parse(html)
```
The parsed result:

```ruby
{:surname=>"Eastwood", :name=>"Clint"}
```
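The value shown above is printed as a Ruby Hash with symbol keys, not as a JSON string. If you need actual JSON text (for example, to write to a file or send over HTTP), here is a minimal sketch using Ruby's standard `json` library. It assumes `parse` returns a Hash shaped like the one above. The snippet uses a literal copy of that Hash so it runs without the gem.

```ruby
require 'json'

# Assumption: this literal stands in for the Hash returned by
# HtmlScraper::Scraper#parse in the example above.
result = { surname: "Eastwood", name: "Clint" }

# JSON.generate serializes the Hash; symbol keys become JSON strings.
json_string = JSON.generate(result)
puts json_string

# Parsing back with symbolize_names: true restores symbol keys.
round_trip = JSON.parse(json_string, symbolize_names: true)
```

Converting only at the boundary keeps the rest of your code working with plain Ruby Hashes.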
### Iterative data

To parse repeated structures, add the `hs-repeat` attribute to the template node that repeats. The attribute's value becomes the key of an array with one entry per matching node:
```ruby
template = '
  <div id="people-list">
    <div class="person" hs-repeat="people">
      <h5>{{ surname }}</h5>
      <p>{{ name }}</p>
    </div>
  </div>
'

html = '
  <html>
    <body>
      <div id="people-list">
        <div class="person">
          <h5>Eastwood</h5>
          <p>Clint</p>
        </div>
        <div class="person">
          <h5>Woods</h5>
          <p>James</p>
        </div>
        <div class="person">
          <h5>Kinski</h5>
          <p>Klaus</p>
        </div>
      </div>
    </body>
  </html>
'

result = HtmlScraper::Scraper.new(template: template).parse(html)
```
The parsed result:

```ruby
{:people=>
  [{:surname=>"Eastwood", :name=>"Clint"},
   {:surname=>"Woods", :name=>"James"},
   {:surname=>"Kinski", :name=>"Klaus"}]}
```
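Each element of the `people` array is a Hash built from one repeated node, so ordinary Enumerable methods can process it. Here is a minimal sketch that assumes the result has the shape shown for the `hs-repeat` example. It uses a literal copy of that result so it runs on its own.

```ruby
# Assumption: this literal stands in for the Hash returned by
# HtmlScraper::Scraper#parse in the hs-repeat example above.
result = {
  people: [
    { surname: "Eastwood", name: "Clint" },
    { surname: "Woods", name: "James" },
    { surname: "Kinski", name: "Klaus" }
  ]
}

# Build "name surname" strings, one per repeated node.
full_names = result[:people].map { |person| "#{person[:name]} #{person[:surname]}" }

# puts with an Array prints each element on its own line.
puts full_names
```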
## Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/html_scraper.
## License
The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).