webscraping_ai
WebScrapingAI - the Ruby gem for WebScraping.AI
A client for the https://webscraping.ai API. It provides a web scraping automation API with Chrome JS rendering, rotating proxies, and built-in HTML parsing.
This SDK is automatically generated by the OpenAPI Generator project:
- API version: 2.0.0
- Package version: 2.0.0
- Build package: org.openapitools.codegen.languages.RubyClientCodegen
For more information, please visit https://webscraping.ai
Installation
Build a gem
To build the Ruby code into a gem:
gem build webscraping_ai.gemspec
Then either install the gem locally:
gem install ./webscraping_ai-2.0.0.gem
(for development, run gem install --dev ./webscraping_ai-2.0.0.gem to install the development dependencies)
or publish the gem to a gem hosting service, e.g. RubyGems.
Finally, add this to the Gemfile:
gem 'webscraping_ai', '~> 2.0.0'
Install from Git
If the Ruby gem is hosted in a Git repository (https://github.com/webscraping-ai/webscraping-ai-ruby), add the following to the Gemfile:
gem 'webscraping_ai', :git => 'https://github.com/webscraping-ai/webscraping-ai-ruby.git'
Include the Ruby code directly
Include the Ruby code directly using -I as follows:
ruby -Ilib script.rb
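As an alternative to passing -I on the command line, a script can put the lib directory on Ruby's load path itself. The following is a minimal sketch; it assumes the script sits in the repository root next to lib/, which this README does not state.

```ruby
# Assumption: this script lives in the repository root, next to lib/.
lib_dir = File.expand_path('lib', __dir__ || Dir.pwd)

# Prepend lib/ to the load path, roughly what `ruby -Ilib` does from the shell.
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

# With lib/ on the load path, the client can then be loaded as usual:
# require 'webscraping_ai'
```

The require line is left commented out so the snippet runs even when the gem sources are not present.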
Getting Started
Please follow the installation procedure and then run the following code:
# Load the gem
require 'webscraping_ai'
# Setup authorization
WebScrapingAI.configure do |config|
  # Configure API key authorization: api_key
  config.api_key['api_key'] = 'YOUR API KEY'
  # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
  # config.api_key_prefix['api_key'] = 'Bearer'
end
api_instance = WebScrapingAI::HTMLApi.new
url = 'https://example.com' # String | URL of the target page
opts = {
  headers: {'Cookie' => 'session=some_id'}, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers[Another]=value2) or as a JSON encoded object (...&headers={"One": "value1", "Another": "value2"})
  timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
  js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
  proxy: 'datacenter' # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
}
begin
  # Page HTML by URL
  result = api_instance.get_html(url, opts)
  p result
rescue WebScrapingAI::ApiError => e
  puts "Exception when calling HTMLApi->get_html: #{e}"
end
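The headers option above can reach the API in two shapes: nested query parameters or a single JSON encoded object. The sketch below shows what each form looks like as a query string, using only Ruby's standard library. The helper names nested_header_params and json_header_param are hypothetical and are not part of the generated client.

```ruby
require 'json'
require 'uri'

# Hypothetical helpers for illustration; neither is part of the generated client.

# Nested form: one headers[Name]=value pair per header.
def nested_header_params(headers)
  URI.encode_www_form(headers.map { |name, value| ["headers[#{name}]", value] })
end

# JSON form: a single headers=<JSON object> parameter.
def json_header_param(headers)
  URI.encode_www_form('headers' => JSON.generate(headers))
end

headers = { 'Cookie' => 'session=some_id' }
puts nested_header_params(headers)
puts json_header_param(headers)
```

Both helpers percent-encode brackets, quotes, and other reserved characters, so the printed strings look denser than the readable forms in the comment above, but they decode back to the same names and values.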
Documentation for API Endpoints
All URIs are relative to https://api.webscraping.ai
Class | Method | HTTP request | Description |
---|---|---|---|
WebScrapingAI::HTMLApi | get_html | GET /html | Page HTML by URL |
WebScrapingAI::HTMLApi | post_html | POST /html | Page HTML by URL with POST request to the target page |
WebScrapingAI::SelectedHTMLApi | get_selected | GET /selected | HTML of a selected page area by URL and CSS selector |
WebScrapingAI::SelectedHTMLApi | get_selected_multiple | GET /selected-multiple | HTML of multiple page areas by URL and CSS selectors |
WebScrapingAI::SelectedHTMLApi | post_selected | POST /selected | HTML of a selected page area by URL and CSS selector, with POST request to the target page |
WebScrapingAI::SelectedHTMLApi | post_selected_multiple | POST /selected-multiple | HTML of multiple page areas by URL and CSS selectors, with POST request to the target page |
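The table above can also be read as a lookup from client method to HTTP verb and path. A small sketch of that mapping follows, with the base URL and routes copied from the table. The names BASE_URL, ENDPOINTS, and request_line are illustrative assumptions and do not come from the generated client.

```ruby
# Base URL and routes copied from the endpoint table; the constant and method
# names are illustrative and not part of the generated client.
BASE_URL = 'https://api.webscraping.ai'

ENDPOINTS = {
  get_html:               ['GET',  '/html'],
  post_html:              ['POST', '/html'],
  get_selected:           ['GET',  '/selected'],
  get_selected_multiple:  ['GET',  '/selected-multiple'],
  post_selected:          ['POST', '/selected'],
  post_selected_multiple: ['POST', '/selected-multiple']
}.freeze

# Returns the verb and absolute URL behind a client method name.
# Raises KeyError for a method that is not in the table.
def request_line(method_name)
  verb, path = ENDPOINTS.fetch(method_name)
  "#{verb} #{BASE_URL}#{path}"
end

puts request_line(:get_selected_multiple)
```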
Documentation for Models
Documentation for Authorization
api_key
- Type: API key
- API key parameter name: api_key
- Location: URL query string
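Because the key travels in the URL query string, a request to GET /html carries it next to the other parameters. The sketch below builds such a URL with Ruby's standard library and makes no network call. The helper html_request_uri is hypothetical, and the generated client is expected to add the api_key parameter on its own once configured.

```ruby
require 'uri'

# Hypothetical helper, not part of the generated client: builds the URL for
# GET /html with the API key passed as the `api_key` query parameter.
def html_request_uri(api_key, target_url, **params)
  uri = URI('https://api.webscraping.ai/html')
  uri.query = URI.encode_www_form({ api_key: api_key, url: target_url }.merge(params))
  uri
end

uri = html_request_uri('YOUR API KEY', 'https://example.com', timeout: 5000, js: true)
puts uri
```

Since the key is part of the URL, it can end up in logs and shell history, so it is worth treating full request URLs as secrets.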