TinyGrabber

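TinyGrabber is a Ruby library for fetching remote web pages. It supports proxies, custom headers, cookies, basic authentication, and parsing the response with Nokogiri.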

Installation

Add this line to your application's Gemfile:

gem 'tiny_grabber'

And then execute:

$ bundle

Or install it yourself as:

$ gem install tiny_grabber

Usage

#! /usr/bin/env ruby

require 'tiny_grabber'


# Initialize request settings

# Set the request read timeout
read_timeout = 300

# You can set your own User-Agent; by default each request gets a random one from a list of the most popular
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36'

# Set a proxy to conceal your real IP
# ip (required) - String in the format [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+
# port (required) - Integer
# type - connection type: `http` or `socks`
proxy = { ip: 'xx.xx.xx.xx', port: 'xxxx', type: '...' }

# Set Net::HTTP headers
headers = { 'Content-Type' => 'text/html; charset=utf-8' }

# You can set your own cookies as a String or as a Hash (either form below works)
cookies = 'username=username&password=password'
cookies = { username: 'username', password: 'password' }

# For a POST request you can set the request data
params = { key: 'value' }

# Initialize TinyGrabber object
tg = TinyGrabber.new


# Set debug configuration
# active - Flag to save log information
# destination - Save log to file or print: [:file, :print]
# save_html - Flag to save response html to file
tg.debug = { active: true, destination: :file, save_html: true }

# Or pass true to enable debugging with the default configuration { active: true, destination: :print, save_html: false }
tg.debug = true

# Set max time to execute request
tg.read_timeout = read_timeout

# Set the User-Agent string
tg.user_agent = user_agent

# Set proxy configuration
tg.proxy = proxy

# Set basic authentication
tg.basic_auth('username', 'password')

# Set HTTP headers
tg.headers = headers

# Set HTTP cookies
tg.cookies = cookies

# Set SSL verify_mode.
# Defaults to OpenSSL::SSL::VERIFY_NONE, which disables certificate verification
tg.verify_mode = OpenSSL::SSL::VERIFY_NONE


# Make requests

# Make a GET request
response = tg.get 'https://whoer.net/ru', headers

# Reset headers and cookies
tg.reset

# Make a POST request
response = tg.post 'https://whoer.net/ru', params, headers

# Make a GET request through the singleton (class-level) method; further settings can be added to the hash
response = TinyGrabber.get 'https://whoer.net/ru', { debug: true, read_timeout: read_timeout }

# Make a POST request through the singleton (class-level) method; further settings can be added to the hash
response = TinyGrabber.post 'https://whoer.net/ru', params, { debug: true, read_timeout: read_timeout }


# Read the response

# Get a Nokogiri object built from the response HTML
ng = response.ng

# Get HTTP response code
response.code

# Get response cookies
response.cookies

# Get response headers
response.headers

# Get response HTML
response.body

# Get the URI of the last request
response.uri
tg.uri
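
The proxy and cookie settings in the listing above are plain Ruby values, so they can be checked before they are assigned to a TinyGrabber object. The following is only a minimal sketch and is not part of the gem: `valid_proxy?` and `cookie_string` are hypothetical helper names, and the `name=value; name=value` output follows the usual Cookie header layout rather than anything TinyGrabber is documented to produce.

```ruby
# Hypothetical helpers for illustration; they are NOT part of TinyGrabber.

PROXY_TYPES = %w[http socks].freeze
IPV4_FORMAT = /\A[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\z/

# Returns true when the proxy hash has a dotted-quad IP string,
# an Integer port, and (if given) one of the listed connection types.
def valid_proxy?(proxy)
  return false unless proxy.is_a?(Hash)
  return false unless proxy[:ip].is_a?(String) && proxy[:ip].match?(IPV4_FORMAT)
  return false unless proxy[:port].is_a?(Integer)

  proxy[:type].nil? || PROXY_TYPES.include?(proxy[:type].to_s)
end

# Turns cookies given as a Hash into one "name=value; name=value" string.
# A String is returned unchanged.
def cookie_string(cookies)
  return cookies if cookies.is_a?(String)

  cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
end

puts valid_proxy?(ip: '127.0.0.1', port: 8080, type: 'http') # prints true
puts valid_proxy?(ip: 'xx.xx.xx.xx', port: 'xxxx')           # prints false
puts cookie_string(username: 'username', password: 'password')
# prints username=username; password=password
```

Checking the port with `is_a?(Integer)` mirrors the comment in the listing, which describes the port as an Integer; a quoted port such as `'8080'` is rejected by this sketch.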

Changelog

  • v 0.4.0
    • Changed the required Ruby version
  • v 0.3.8
    • Added the perfect url attribute, which skips URL conversion
  • v 0.3.7
    • Compare the body encoding with UTF-8. The Nokogiri response is returned in UTF-8 encoding.
  • v 0.3.6
    • Added the nokogumbo gem for working with HTML5 content. The ng method now returns a Nokogumbo object.
  • v 0.3.4
    • Added returning the URI of the last request
  • v 0.3.3
    • Changed the cookie format
  • v 0.3.2
    • Added saving of cookies and headers on a 302 response code
  • v 0.3.1
    • Remove the anchor from the URL
  • v 0.3.0
    • ---
  • v 0.2.9
    • Added agent attribute for following redirect locations
    • Use the 302 HTTP response code and the Location header for redirecting
    • Use the meta refresh URL
    • Refactored code for RuboCop
  • v 0.2.8
    • Added processing of Accept headers
  • v 0.2.7
    • Added verify_mode configuration attribute. Defaults to OpenSSL::SSL::VERIFY_NONE
  • v 0.2.6
    • Move read_timeout param to agent start method
  • v 0.2.5
    • Added automatic conversion of param keys to symbols. Now you can set cookies with a hash: cookies = { "username" => 'username', "password" => 'password' }
  • v 0.2.4
    • Added debug file
  • v 0.2.3
    • Added the ability to set cookies as a Hash
  • v 0.2.2
    • Added debug configurations.
  • v 0.2.1
    • Set a random user_agent from the list if it is not set
    • Remove headers attribute from singleton methods
    • Remove the transfer-encoding header for chained requests
    • Add reset method to delete headers and cookies
  • v 0.2.0
    • A TinyGrabber object can now be created
    • Change order of parameters for singleton request
    • Add response cookies and headers
    • Add debug flag for detailed logging and for saving the result HTML to a /log/*.html file
  • v 0.1.1
    • Save cookie in Redis
  • v 0.1.0
    • Add TinyGrabber.post method for HTTP POST request
  • v 0.0.7
    • Add POST request
    • Add Basic Authentication
  • v 0.0.6
    • Add Net::HTTPOK modify file for Nokogiri response
  • v 0.0.5
    • Fix handling of non-ASCII URLs
    • Add new ng response method for getting a Nokogiri object
  • v 0.0.4
    • Fix handling of socks4(5) proxies
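
Two of the changelog entries describe small normalisation steps: removing the anchor from a URL (v 0.3.1) and converting parameter keys to symbols (v 0.2.5). The sketch below shows what such steps can look like using only the Ruby standard library. It is an illustration under assumptions, not the gem's actual code, and `strip_anchor` and `symbolize_keys` are hypothetical names.

```ruby
require 'uri'

# Hypothetical helper (not TinyGrabber code): drops the "#anchor" part of a URL.
def strip_anchor(url)
  uri = URI.parse(url)
  uri.fragment = nil
  uri.to_s
end

# Hypothetical helper (not TinyGrabber code): converts Hash keys to symbols,
# so string-keyed and symbol-keyed hashes can be handled the same way.
def symbolize_keys(hash)
  hash.transform_keys(&:to_sym)
end

puts strip_anchor('https://whoer.net/ru#top') # prints https://whoer.net/ru
cookies = symbolize_keys('username' => 'username', 'password' => 'password')
puts cookies.key?(:username)                  # prints true
```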

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/moroznoeytpo/tiny_grabber. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Authors

Copyright © 2016 by Aleksandr Chernyshov

License

The gem is available as open source under the terms of the MIT License.

Gem created by following the Quick Left tutorial.