slasherrb


This project is the Ruby version of slasherjs. Slasher is a library that extracts the main content of an HTML article document. The extraction relies on assumptions about the structure of the HTML document, so the result may be flawed if the document doesn't match a structure the library recognises. The library will be improved over time to handle such cases.

How To Use

To use the library, you need to have an HTML document first.

require 'net/http'
require 'slasher'

uri = URI("http://sea-games-2015.liputan6.com/read/2252937/all-indonesia-finals-ganda-putra-sumbang-emas")
html = Net::HTTP.get(uri)

slasher = Slasher.new(html)
content = slasher.slash

# The content variable now holds the main content of the HTML document (the article).
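When several documents need to be processed, the two calls shown above can be wrapped in a small helper so that the calling code does not have to build a new object by hand each time. The following is only a minimal sketch: `ArticleExtractor`, `extract`, and `extract_all` are hypothetical names that are not part of slasherrb, and the sketch assumes nothing about the library beyond `Slasher.new(html)` and `#slash` as used in the example.

```ruby
# Hypothetical helper, not part of slasherrb. It relies only on the two calls
# shown in the usage example: `Slasher.new(html)` and `#slash`.
class ArticleExtractor
  # extractor_class must respond to `new(html)` and return an object that
  # responds to `slash`. Pass `Slasher` once the gem has been required.
  def initialize(extractor_class)
    @extractor_class = extractor_class
  end

  # Builds a fresh extractor for the given HTML string and returns its result.
  def extract(html)
    @extractor_class.new(html).slash
  end

  # Convenience wrapper that extracts the content of several documents.
  def extract_all(html_documents)
    html_documents.map { |html| extract(html) }
  end
end
```

With the gem loaded, this could be used as `extractor = ArticleExtractor.new(Slasher)` followed by `extractor.extract(html)` for each fetched page. Passing the class in as an argument keeps the helper independent of the gem, which also makes it easy to try out with a stand-in class.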

Website Coverage

This library has been tested against several websites; the complete list is in this document.

TODO

  1. Add more test cases: international websites
  2. Allow slashing a new site without re-initializing the object
  3. Add gem dependencies (nokogiri)
  4. Move tests to Travis
  5. Better information for the gem