Janis
Janis helps you find proxy servers quickly by collecting them from a list of proxy listing websites (hopefully available and up to date). You can also tell Janis to parse a specific website, which it will do if it has a parser for that site. If it doesn't, you can add a new parser (see the Extending Janis section).
Installation
Add this line to your application's Gemfile:
gem 'janis'
And then execute:
$ bundle
Or install it yourself as:
$ gem install janis
Usage
From your own script/app or from irb, require the gem with:
require 'janis'
And then do:
Janis.find(max_amount_of_results)
That gathers proxy server information from all URLs (and local files) included in the default source list, returning at most the number of results given in the argument. Note: entries in the default source list can be disabled by commenting them out with a # at the beginning of the line.
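The commenting-out convention described in the note can be illustrated with a small sketch. It does not use Janis itself: the list contents, the SOURCE_LIST constant and the enabled_sources helper are all hypothetical, and the way Janis actually stores and reads its default source list may differ.

```ruby
# Hypothetical source list: one URL or local file path per line.
# The location and exact format of the real Janis list may differ.
SOURCE_LIST = <<~LIST
  https://proxy-list.example/page1
  # https://offline-proxies.example/list
  /tmp/local_proxies.html

  #/tmp/stale_proxies.html
LIST

# Returns the entries that are still enabled: non-blank lines that
# do not start with "#" once surrounding whitespace is stripped.
def enabled_sources(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
end

puts enabled_sources(SOURCE_LIST)
```

Only the two uncommented entries are printed; the entries prefixed with # are skipped, whether or not a space follows the #.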
Extending Janis
If there's a proxy listing website you consider reliable and up to date and you'd like to add it to the list:
- Fork the Janis repository.
- Define a module file following the format shown in /specific_parsers/template.rb. There, subclass ProxyWebsiteParser and override the #parse method. Example:
class MyAwesomeProxyListParser < Janis::Parsing::WebSpecificParsers::ProxyWebsiteParser
  include CapybaraWithPanthomJs # optional - only if you use capybara-poltergeist for parsing

  def self.url
    # url to the proxy list website you will be parsing in the #parse method
  end

  # optional - only if you use capybara-poltergeist for parsing
  # NOTE: the method name is missing in this README; configure_capybara is a
  # placeholder, check /specific_parsers/template.rb for the actual name.
  def configure_capybara
    Capybara.configure { |c| c.app_host = url }
  end

  def initialize
    super
    # optional - only if you use capybara-poltergeist for parsing
    @session = new_session
    # optional - only if you use capybara-poltergeist for parsing
    @session.visit(url)
    # optional - only if you use capybara-poltergeist for parsing
    obtain_html_doc
  end

  def parse
    # Your code to parse the page's content and deliver an array of strings
    # Those strings must have the format "IP:PORT_NUMBER"
  end

  private

  # optional - Redefine the way the html document to parse is obtained
  # if you use capybara/poltergeist
  def obtain_html_doc
    @html_doc = Nokogiri.HTML(@session.html)
  end
end
- Implement the #parse method so that it returns an array of strings in the "IP:PORT_NUMBER" format. Example output: ["1.1.1.1:3434", "2.2.2.2:3333", "255.3.1.4:8787"]. The implementation must use Nokogiri, our parser dependency.
- Run the tests.
- If all tests pass, create a pull request.
- Wait for the applause to come!
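Before opening a pull request, it can help to check that every string your #parse method returns really matches the "IP:PORT_NUMBER" format. The following is a minimal sketch using only the Ruby standard library. PROXY_ENTRY and valid_proxy_entry? are hypothetical names, not part of Janis, and the check assumes IPv4 addresses and ports between 1 and 65535.

```ruby
# Hypothetical helper, not part of Janis: matches four dot-separated
# numbers, a colon, and a port, with no whitespace anywhere.
PROXY_ENTRY = /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})\z/

# Returns true when +entry+ is an IPv4 address whose octets are in 0..255,
# followed by ":" and a port number in 1..65535.
def valid_proxy_entry?(entry)
  match = PROXY_ENTRY.match(entry)
  return false unless match

  *octets, port = match.captures.map(&:to_i)
  octets.all? { |octet| octet.between?(0, 255) } && port.between?(1, 65_535)
end

# Sample data standing in for the array returned by a #parse implementation.
parsed = ["1.1.1.1:3434", "2.2.2.2: 3333", "256.0.0.1:80"]
invalid = parsed.reject { |entry| valid_proxy_entry?(entry) }
puts "Invalid entries: #{invalid.inspect}" unless invalid.empty?
```

In this sample the second entry is rejected because of the space after the colon and the third because 256 is not a valid octet. A check like this could also live in the test you write for your parser.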
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/mgiagante/janis.