ArticleCrux

The gem scrapes the HTML of a URL and returns the title and the cover image that most likely represents the article. It also returns an array of tags. It is useful when you want to display a short summary of an article before the end user lands on the article itself.

Installation

Add this line to your application's Gemfile:

gem 'article_crux'

And then execute:

$ bundle

Or install it yourself as:

$ gem install article_crux

Usage


require 'article_crux'

ArticleCrux.fetch("https://techcrunch.com/2017/04/18/facebook-announces-react-fiber-a-rewrite-of-its-react-framework/")
=> {:image=>"https://tctechcrunch2011.files.wordpress.com/2017/04/image-uploaded-from-ios-1.jpg?w=764&h=400&crop=1", :title=>"Facebook announces React Fiber, a rewrite of its React framework", :tags=>["developers", "F82017", "Facebook", "Javascript", "react", "React Fiber"]}

# To use a custom User-Agent, pass it as the second argument. This helps when a server whitelists specific user agents, blocks the default one, or returns different content for different user agents.
user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:10.0) Gecko/20100101 Firefox/10.0"
ArticleCrux.fetch("https://techcrunch.com/2017/04/18/facebook-announces-react-fiber-a-rewrite-of-its-react-framework/", user_agent)
=> {:image=>"https://tctechcrunch2011.files.wordpress.com/2017/04/image-uploaded-from-ios-1.jpg?w=764&h=400&crop=1", :title=>"Facebook announces React Fiber, a rewrite of its React framework", :tags=>["developers", "F82017", "Facebook", "Javascript", "react", "React Fiber"]}
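
The returned hash can be turned into the kind of short summary mentioned at the top of this README. The following is only a minimal sketch: `article_preview` and its `max_tags` option are hypothetical names, not part of the gem. It assumes the result has the `:title`, `:image` and `:tags` keys shown above and that any of them may be missing. The sample hash is copied from the example output, so no network request is made.

```ruby
# Hypothetical helper (not part of article_crux): builds a short plain-text
# preview from a hash shaped like the one ArticleCrux.fetch returns above.
def article_preview(crux, max_tags: 3)
  # Fall back to a placeholder when the title is missing or blank.
  title = crux[:title].to_s.strip
  title = "(untitled)" if title.empty?

  # Array() turns a missing :tags value into an empty list.
  tags = Array(crux[:tags]).first(max_tags)

  lines = [title]
  lines << "Image: #{crux[:image]}" if crux[:image]
  lines << "Tags: #{tags.join(', ')}" unless tags.empty?
  lines.join("\n")
end

# Sample result copied from the usage example, standing in for a live fetch.
sample = {
  image: "https://tctechcrunch2011.files.wordpress.com/2017/04/image-uploaded-from-ios-1.jpg?w=764&h=400&crop=1",
  title: "Facebook announces React Fiber, a rewrite of its React framework",
  tags: ["developers", "F82017", "Facebook", "Javascript", "react", "React Fiber"]
}

puts article_preview(sample)
```

In an application, `sample` would be replaced by the value returned from `ArticleCrux.fetch(url)`. Limiting the number of tags keeps the preview compact; adjust `max_tags` as needed.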

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/amitsaxena/article_crux.

License

The gem is available as open source under the terms of the MIT License.