OmniScrape
OmniScrape is an all-purpose web crawler and scraper gem that is still under development.
Installation
Add this line to your application's Gemfile:
gem 'omni_scrape'
And then execute:
$ bundle
Or install it yourself as:
$ gem install omni_scrape
Usage
Add the lines require 'omni_scrape' and include OmniScrape to your script file.
Method : CrawlScrape (Note: this method is currently on the back burner.)
example : OmniScrape.CrawlScrape("http://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_role-playing_games", 0, "http://en.wikipedia.org")
This method takes three parameters. The first is the URL to start at.
The second parameter is currently unimplemented, but will be the depth to crawl (just pass it 1).
The third is the sub-url (the site's base URL, as in the example) for internal links.
Method : Localize
example : OmniScrape.Localize("https://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_role-playing_games", 1, "https://en.wikipedia.org")
This method takes three parameters. The first is the URL to start at.
The second parameter is the depth to crawl. Warning: crawling grows at an INSANE rate as the depth increases.
The third is the sub-url (the site's base URL, as in the example) for internal links.
description: Localize follows every link from the page provided and scrapes the HTML from those pages, storing it as HTML files in subdirectories.
The saved pages link to the other local pages. Note: duplication has been removed.
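The interplay between crawl depth and duplicate removal can be sketched with a small in-memory example. This is not the gem's implementation: the crawl helper and the site hash below are hypothetical, and the sketch assumes that depth counts link hops from the start page and that a page already seen is never stored twice.

```ruby
require 'set'

# Hypothetical helper (not part of omni_scrape): walk a link graph
# breadth-first for max_depth hops, skipping pages already seen.
# links maps each page to the list of pages it links to.
def crawl(links, start, max_depth)
  visited = Set.new([start])
  frontier = [start]

  max_depth.times do
    next_frontier = []
    frontier.each do |page|
      links.fetch(page, []).each do |target|
        # Set#add? returns nil when the element is already present,
        # so duplicates never reach the next frontier.
        next_frontier << target if visited.add?(target)
      end
    end
    frontier = next_frontier
  end

  visited.to_a
end

# Made-up site standing in for real pages fetched over HTTP.
site = {
  'a' => %w[b c],
  'b' => %w[a c d],
  'c' => %w[d],
  'd' => %w[e],
  'e' => []
}

p crawl(site, 'a', 2) # ["a", "b", "c", "d"]
```

Page 'd' is linked from both 'b' and 'c' but appears once in the result, which is the effect duplicate removal is meant to have.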
Method : Localize_CSS
example : OmniScrape.Localize_CSS("https://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_role-playing_games", 1, "https://en.wikipedia.org", "div table.wikitable")
This method takes four parameters. The first is the URL to start at.
The second parameter is the depth to crawl. Warning: crawling may grow at an INSANE rate as the depth increases.
The third is the sub-url (the site's base URL, as in the example) for internal links.
The fourth is a CSS selector for the parts of each page from which links are taken.
description: Localize_CSS offers the same service as Localize, while also letting you limit the result set with a CSS selector.
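To put a rough number on the growth warning, and on why narrowing the link set with a selector helps, here is a back-of-the-envelope estimate. It rests on loud assumptions: every page contributes the same number of new links, no duplicates are found, and depth counts link hops from the start page. The estimated_pages helper is hypothetical and not part of the gem.

```ruby
# Hypothetical estimate (not part of omni_scrape): total pages touched
# when each page yields links_per_page new links, summed over every
# level from 0 (the start page alone) up to depth.
def estimated_pages(links_per_page, depth)
  (0..depth).sum { |level| links_per_page**level }
end

puts estimated_pages(100, 1) # 101
puts estimated_pages(100, 2) # 10101
puts estimated_pages(10, 2)  # 111
```

Under these assumptions, a selector that cuts the links followed per page from 100 to 10 shrinks a depth-2 crawl from about ten thousand pages to about a hundred.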
Method : Localize_IN
example : OmniScrape.Localize_IN("https://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_role-playing_games", 1, "https://en.wikipedia.org")
This performs the same actions as Localize, but only for internal links.
Method : Localize_EX
example : OmniScrape.Localize_EX("https://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_role-playing_games", 1, "https://en.wikipedia.org")
This performs the same actions as Localize, but only for external links.
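One plausible way to separate internal from external links is to resolve each href against the sub-url and compare hosts. How the gem actually decides is not documented here, so treat the internal_link? helper below as a hypothetical illustration built only on Ruby's standard uri library.

```ruby
require 'uri'

# Hypothetical helper (not part of omni_scrape): resolve href against
# base and treat the link as internal when the hosts match.
def internal_link?(href, base)
  resolved = URI.join(base, href)
  resolved.host == URI.parse(base).host
rescue URI::InvalidURIError
  # Unparseable hrefs are treated as not internal in this sketch.
  false
end

base = 'https://en.wikipedia.org'
hrefs = [
  '/wiki/Ruby',                        # relative, resolves onto base
  'https://en.wikipedia.org/wiki/Gem', # absolute, same host
  'https://example.com/page'           # absolute, different host
]

internal, external = hrefs.partition { |href| internal_link?(href, base) }
p internal # ["/wiki/Ruby", "https://en.wikipedia.org/wiki/Gem"]
p external # ["https://example.com/page"]
```

A host comparison counts subdomains as external; a prefix check against the sub-url would be an equally reasonable design, with different edge cases.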
Method : Localize_IN_CSS
example : OmniScrape.Localize_IN_CSS("https://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_role-playing_games", 1, "https://en.wikipedia.org", "div table.wikitable")
This performs the same actions as Localize_CSS, but only for internal links.
Method : Localize_EX_CSS
example : OmniScrape.Localize_EX_CSS("https://en.wikipedia.org/wiki/List_of_massively_multiplayer_online_role-playing_games", 1, "https://en.wikipedia.org", "div table.wikitable")
Note: there are no external links in the wikitable!
This will perform the same actions as Localize_CSS, but only for external links.
Contributing
- Fork it ( https://github.com/bmaynard1991/omni-scrape )
- Create your feature branch ( git checkout -b my-new-feature )
- Commit your changes ( git commit -am 'Add some feature' )
- Push to the branch ( git push origin my-new-feature )
- Create a new Pull Request