webscraping_ai

WebScrapingAI - the Ruby gem for the WebScraping.AI API

A client for the https://webscraping.ai API, a web scraping automation API with Chrome JS rendering, rotating proxies and built-in HTML parsing.

This SDK is automatically generated by the OpenAPI Generator project:

  • API version: 2.0.0
  • Package version: 2.0.0
  • Build package: org.openapitools.codegen.languages.RubyClientCodegen

For more information, please visit https://webscraping.ai

Installation

Build a gem

To build the Ruby code into a gem:

gem build webscraping_ai.gemspec

Then either install the gem locally:

gem install ./webscraping_ai-2.0.0.gem

(for development, run gem install --dev ./webscraping_ai-2.0.0.gem to install the development dependencies)

or publish the gem to a gem hosting service, e.g. RubyGems.

Finally, add this to the Gemfile:

gem 'webscraping_ai', '~> 2.0.0'
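The '~> 2.0.0' constraint above is RubyGems' pessimistic version operator. As a quick illustration that uses only RubyGems' own Gem::Requirement and Gem::Version classes (no network access, no gem install), the snippet below checks which versions the constraint accepts: patch releases of 2.0 pass, while 2.1 and later are rejected.

```ruby
require 'rubygems'

# '~> 2.0.0' means ">= 2.0.0 and < 2.1", i.e. only 2.0.x patch releases.
requirement = Gem::Requirement.new('~> 2.0.0')

%w[2.0.0 2.0.7 2.1.0 3.0.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed ? 'allowed' : 'rejected'}"
end
```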

Install from Git

If the Ruby gem is hosted in a Git repository (https://github.com/webscraping-ai/webscraping-ai-ruby), add the following to the Gemfile:

gem 'webscraping_ai', :git => 'https://github.com/webscraping-ai/webscraping-ai-ruby.git'

Include the Ruby code directly

Include the Ruby code directly using -I as follows:

ruby -Ilib script.rb
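If you would rather not pass -I on every run, a script can put lib on the load path itself. This is a minimal sketch, assuming script.rb sits in the repository root next to the lib directory; it has the same effect as ruby -Ilib script.rb.

```ruby
# script.rb (assumed location: repository root, next to lib/)
# Prepend the checkout's lib directory to Ruby's load path, which is what
# `ruby -Ilib script.rb` does on the command line.
lib_dir = File.expand_path('lib', __dir__)
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

# With lib/ on the load path, `require 'webscraping_ai'` resolves to
# lib/webscraping_ai.rb in the checkout (uncomment once the code is in place):
# require 'webscraping_ai'
```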

Getting Started

Please follow the installation procedure and then run the following code:

# Load the gem
require 'webscraping_ai'

# Setup authorization
WebScrapingAI.configure do |config|
  # Configure API key authorization: api_key
  config.api_key['api_key'] = 'YOUR API KEY'
  # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
  #config.api_key_prefix['api_key'] = 'Bearer'
end

api_instance = WebScrapingAI::HTMLApi.new
url = 'https://example.com' # String | URL of the target page
opts = {
  headers: { 'Cookie' => 'session=some_id' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via nested query parameters (...&headers[One]=value1&headers[Another]=value2) or as a JSON-encoded object (...&headers={"One": "value1", "Another": "value2"})
  timeout: 5000, # Integer | Maximum processing time in ms. Increase it in case of timeout errors (5000 by default, maximum is 30000)
  js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default), costs 2 requests
  proxy: 'datacenter' # String | Type of proxy, use residential proxies if your site restricts traffic from datacenters (datacenter by default)
}

begin
  # Page HTML by URL
  result = api_instance.get_html(url, opts)
  p result
rescue WebScrapingAI::ApiError => e
  puts "Exception when calling HTMLApi->get_html: #{e}"
end
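To see roughly what such a call sends, the sketch below assembles a GET /html request URI by hand with Ruby's standard uri and json libraries. It relies only on what this README states: the base URL https://api.webscraping.ai, the api_key query parameter, the url/headers/timeout/js/proxy options, and the two ways of encoding headers. The helper name html_request_uri, its json_headers switch and the positive lower bound on timeout are illustrative assumptions, not part of the gem. The helper only builds the URI and makes no request.

```ruby
require 'json'
require 'uri'

BASE_URL = 'https://api.webscraping.ai' # "All URIs are relative to ..."

# Hypothetical helper: builds (but does not send) a GET /html request URI.
# Option names and defaults mirror the Getting Started example; the helper
# itself, the json_headers switch and the lower timeout bound are assumptions.
def html_request_uri(api_key:, url:, headers: {}, timeout: 5000, js: true,
                     proxy: 'datacenter', json_headers: false)
  # README: timeout is in ms with a maximum of 30000 (lower bound of 1 assumed)
  unless timeout.is_a?(Integer) && timeout.between?(1, 30_000)
    raise ArgumentError, 'timeout must be an Integer between 1 and 30000 ms'
  end

  params = [['api_key', api_key], ['url', url], ['timeout', timeout],
            ['js', js], ['proxy', proxy]]

  if json_headers
    # Encoding 2: a single JSON object, e.g. headers={"Cookie":"session=some_id"}
    params << ['headers', JSON.generate(headers)] unless headers.empty?
  else
    # Encoding 1: one nested parameter per header, e.g. headers[Cookie]=...
    headers.each { |name, value| params << ["headers[#{name}]", value] }
  end

  uri = URI.join(BASE_URL, '/html')
  # encode_www_form percent-encodes names and values; to_s turns 5000/true into text
  uri.query = URI.encode_www_form(params.map { |name, value| [name, value.to_s] })
  uri
end

uri = html_request_uri(api_key: 'YOUR API KEY', url: 'https://example.com',
                       headers: { 'Cookie' => 'session=some_id' })
puts uri
```

Passing the result to Net::HTTP.get(uri) would perform the request; the gem's HTMLApi#get_html takes care of this for you.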

Documentation for API Endpoints

All URIs are relative to https://api.webscraping.ai

Class | Method | HTTP request | Description
WebScrapingAI::HTMLApi | get_html | GET /html | Page HTML by URL
WebScrapingAI::HTMLApi | post_html | POST /html | Page HTML by URL, with a POST request to the target page
WebScrapingAI::SelectedHTMLApi | get_selected | GET /selected | HTML of a selected page area by URL and CSS selector
WebScrapingAI::SelectedHTMLApi | get_selected_multiple | GET /selected-multiple | HTML of multiple page areas by URL and CSS selectors
WebScrapingAI::SelectedHTMLApi | post_selected | POST /selected | HTML of a selected page area by URL and CSS selector, with a POST request to the target page
WebScrapingAI::SelectedHTMLApi | post_selected_multiple | POST /selected-multiple | HTML of multiple page areas by URL and CSS selectors, with a POST request to the target page
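Every method in the table pairs an HTTP verb with a path relative to the base URL, and each GET endpoint has a POST counterpart on the same path. The sketch below restates the table as a Ruby hash and resolves each path with URI.join; ENDPOINTS and endpoint_for are illustrative names, not part of the gem.

```ruby
require 'uri'

# Illustrative restatement of the endpoint table; these names are not part of the gem.
BASE_URL = 'https://api.webscraping.ai'

ENDPOINTS = {
  'WebScrapingAI::HTMLApi#get_html' => ['GET', '/html'],
  'WebScrapingAI::HTMLApi#post_html' => ['POST', '/html'],
  'WebScrapingAI::SelectedHTMLApi#get_selected' => ['GET', '/selected'],
  'WebScrapingAI::SelectedHTMLApi#get_selected_multiple' => ['GET', '/selected-multiple'],
  'WebScrapingAI::SelectedHTMLApi#post_selected' => ['POST', '/selected'],
  'WebScrapingAI::SelectedHTMLApi#post_selected_multiple' => ['POST', '/selected-multiple']
}.freeze

# Returns the HTTP verb and absolute URI for a client method;
# raises KeyError for methods not listed in the table.
def endpoint_for(method_ref)
  verb, path = ENDPOINTS.fetch(method_ref)
  [verb, URI.join(BASE_URL, path)]
end

ENDPOINTS.each_key do |ref|
  verb, uri = endpoint_for(ref)
  puts "#{ref}: #{verb} #{uri}"
end
```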

Documentation for Models

Documentation for Authorization

api_key

  • Type: API key
  • API key parameter name: api_key
  • Location: URL query string
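Because the key travels in the URL query string, any request URL that ends up in a log also exposes the key. One possible mitigation, sketched here with Ruby's standard uri library, is to mask the api_key parameter before logging. redact_api_key is a hypothetical helper, not something the gem provides.

```ruby
require 'uri'

# Hypothetical helper: replaces the api_key query parameter with a placeholder
# so the URL can be logged safely. Other parameters are decoded and re-encoded,
# so their escaping may differ cosmetically from the input.
def redact_api_key(uri_string)
  uri = URI.parse(uri_string)
  return uri_string if uri.query.nil?

  params = URI.decode_www_form(uri.query).map do |name, value|
    name == 'api_key' ? [name, 'REDACTED'] : [name, value]
  end
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts redact_api_key('https://api.webscraping.ai/html?api_key=secret&url=https%3A%2F%2Fexample.com')
```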