Dependency issues can be reported in the Issues section of this repo. Please include:
1. Your operating system and architecture (for example, "Ubuntu 32-bit").
2. The full error backtrace.
3. Your Ruby version (you can see it by typing "ruby -v" in your command prompt).
Janis
Janis helps you find proxy servers quickly by grabbing them from a list of proxy listing websites (hopefully available and up to date). You can also tell Janis to parse a specific website, and it will do so if it knows how. If it doesn't, you can teach it by adding a new parser (more on this in the Usage section).
Installation
Add this line to your application's Gemfile:
gem 'janis'
And then execute:
$ bundle
Or install it yourself as:
$ gem install janis
Then download the latest version of PhantomJS for your platform from http://phantomjs.org/download.html.
Place the PhantomJS executable somewhere in your PATH.
On Unix, you can see your PATH from your shell by typing 'echo $PATH'. Common folders to place the phantomjs binary in are /usr/bin and /usr/local/bin.
On Windows, you can check your PATH in your system settings, under the "Environment Variables" section. C:\windows\system32\ is a common location to place phantomjs.exe in.
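If you are unsure whether PhantomJS actually ended up on your PATH, a short Ruby script can look for it. This is a minimal sketch using only the standard library; `find_executable_on_path` is a hypothetical helper, not part of Janis, and the Windows handling assumes the usual PATHEXT environment variable is set.

```ruby
# Hypothetical helper (not part of Janis): returns the full path of the first
# executable named `name` found in the PATH directories, or nil if none is found.
# On Windows it also tries each extension listed in PATHEXT (e.g. ".EXE").
def find_executable_on_path(name)
  exts = ENV['PATHEXT'] ? ENV['PATHEXT'].split(';') : ['']
  ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

location = find_executable_on_path('phantomjs')
puts(location ? "phantomjs found at #{location}" : 'phantomjs not found on PATH')
```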
Usage
From your own script/app or from irb, require the gem with:
require 'janis'
And then do:
Janis.find(max_amount_of_results)
That will gather proxy server info from all URLs (and local files) included in the default source list, returning at most the number of results given as the argument. Note: entries in the default source list can be disabled by commenting them out with a # at the beginning of the line.
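Assuming Janis.find returns an array of "IP:PORT" strings (the same format each parser's #parse is required to produce), one way to use an entry is as the proxy for Ruby's standard Net::HTTP client. In this sketch, the `split_proxy` helper and the sample array are hypothetical stand-ins for a real Janis.find result; nothing here is part of the Janis API.

```ruby
require 'net/http'

# Hypothetical helper (not part of Janis): turns an "IP:PORT" string into
# an [address, port] pair, with the port converted to an Integer.
def split_proxy(entry)
  host, port = entry.strip.split(':', 2)
  [host, Integer(port)]
end

# Stand-in for a Janis.find result (assumed to be an array of "IP:PORT" strings).
proxies = ['1.1.1.1:3434', '2.2.2.2:3333']

proxy_host, proxy_port = split_proxy(proxies.first)

# Net::HTTP.new(address, port, proxy_address, proxy_port) only builds the client;
# no connection is opened until #start is called or a request is sent.
http = Net::HTTP.new('example.com', 80, proxy_host, proxy_port)
puts "Requests will go through #{http.proxy_address}:#{http.proxy_port}"
```

Proxies scraped from public lists are often short-lived, so in practice you would try the next entry in the array when a request through one of them fails.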
Extending Janis
If there's a proxy listing website you consider reliable and up to date and you'd like to add it to the list:
- Fork the Janis repository.
- Define a new parser file following the format shown in /specific_parsers/template.rb. In it, subclass ProxyWebsiteParser and override the #parse method. Example:
class MyAwesomeProxyListParser < Janis::Parsing::WebSpecificParsers::ProxyWebsiteParser
  include CapybaraWithPanthomJs # optional - only if you use capybara-poltergeist for parsing

  def self.url
    # URL of the proxy list website you will be parsing in the #parse method
    'http://www.example.com/proxy-list' # placeholder - replace with the real URL
  end

  # optional - only if you use capybara-poltergeist for parsing
  # (method name assumed: it was missing from this snippet, so check template.rb)
  def self.configure_capybara
    Capybara.configure { |c| c.app_host = url }
  end

  def initialize
    super # optional - only if you use capybara-poltergeist for parsing
    @session = new_session # optional - only if you use capybara-poltergeist for parsing
    @session.visit(url)    # optional - only if you use capybara-poltergeist for parsing
    obtain_html_doc
  end

  def parse
    # Your code to parse the page's content and deliver an array of strings.
    # Those strings must have the format "IP:PORT_NUMBER".
  end

  private

  # optional - Redefine the way the HTML document to parse is obtained if you use capybara/poltergeist
  def obtain_html_doc
    @html_doc = Nokogiri.HTML(@session.html)
  end
end
- Implement the #parse method so that it returns an array of strings in the "IP:PORT_NUMBER" format. Example output: ["1.1.1.1:3434", "2.2.2.2:3333", "255.3.1.4:8787"]. The implementation must use Nokogiri, our parsing dependency.
- Run the tests.
- If all tests pass, create a pull request.
- Wait for the applause to come!
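Before opening a pull request, it can help to check that a new parser's #parse output really follows the "IP:PORT_NUMBER" format, with no stray whitespace around the colon. This is a minimal sketch using Ruby's standard IPAddr; `valid_proxy_entry?` is a hypothetical helper, not part of Janis, and it assumes IPv4 addresses and ports in the 1..65535 range.

```ruby
require 'ipaddr'

# Hypothetical helper (not part of Janis): true when `entry` is "IPv4:PORT",
# the address is a valid IPv4 address and the port is between 1 and 65535.
# Whitespace anywhere in the entry makes it invalid.
def valid_proxy_entry?(entry)
  match = /\A(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\z/.match(entry)
  return false unless match

  IPAddr.new(match[1]).ipv4? && match[2].to_i.between?(1, 65_535)
rescue IPAddr::InvalidAddressError
  # Raised for octets above 255, e.g. "300.1.1.1"
  false
end

results = ['1.1.1.1:3434', '2.2.2.2:3333', '255.3.1.4:8787']
puts results.all? { |entry| valid_proxy_entry?(entry) }
```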
Development
After checking out the repo, run bin/setup to install dependencies. Then run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/mgiagante/janis.