webscraping_ai

WebScrapingAI: the Ruby gem for the WebScraping.AI API

The WebScraping.AI scraping API provides GPT-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.

This SDK is automatically generated by the OpenAPI Generator project:

API version: 3.1.3
Package version: 3.1.3
Build package: org.openapitools.codegen.languages.RubyClientCodegen

For more information, please visit https://webscraping.ai

Installation

Build a gem

To build the Ruby code into a gem:

gem build webscraping_ai.gemspec

Then either install the gem locally:

gem install ./webscraping_ai-3.1.3.gem

(For development, run gem install --dev ./webscraping_ai-3.1.3.gem to also install the development dependencies.)

or publish the gem to a gem hosting service, e.g. RubyGems.

Finally, add this to the Gemfile:

gem 'webscraping_ai', '~> 3.1.3'

Install from Git

If the gem is hosted in a Git repository (https://github.com/webscraping-ai/webscraping-ai-ruby), add the following to the Gemfile:

gem 'webscraping_ai', :git => 'https://github.com/webscraping-ai/webscraping-ai-ruby.git'

Include the Ruby code directly

Include the Ruby code directly using -I as follows:

ruby -Ilib script.rb
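If you would rather not pass -I on every invocation, a script can add the gem's lib directory to Ruby's load path itself. This is a minimal sketch, assuming the script sits in the repository root next to lib/:

```ruby
# Prepend the checkout's lib/ directory to the load path (unless it is
# already present). This mirrors `ruby -Ilib script.rb` run from the
# repository root.
lib_dir = File.expand_path('lib', __dir__)
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)

# With lib/ on the load path, the gem can then be loaded by name:
# require 'webscraping_ai'
```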

Getting Started

Please follow the installation procedure and then run the following code:

# Load the gem
require 'webscraping_ai'

# Setup authorization
WebScrapingAI.configure do |config|
  # Configure API key authorization: api_key
  config.api_key['api_key'] = 'YOUR API KEY'
  # Uncomment the following line to set a prefix for the API key, e.g. 'Bearer' (defaults to nil)
  # config.api_key_prefix['api_key'] = 'Bearer'
end

api_instance = WebScrapingAI::AIApi.new
url = 'https://example.com' # String | URL of the target page.
opts = {
  question: 'What is the summary of this page content?', # String | Question or instructions to ask the LLM model about the target page.
  context_limit: 4000, # Integer | Maximum number of tokens to use as context for the LLM model (4000 by default).
  response_tokens: 100, # Integer | Maximum number of tokens to return in the LLM model response. The total context size (context_limit) includes the question, the target page content and the response, so this parameter reserves tokens for the response (see also on_context_limit).
  on_context_limit: 'truncate', # String | What to do if the context_limit parameter is exceeded (truncate by default). The context is exceeded when the target page content is too long.
  headers: { 'key' => 'value' }, # Hash<String, String> | HTTP headers to pass to the target page. Can be specified either via nested query parameters (...&headers[One]=value1&headers[Another]=value2) or as a JSON-encoded object (...&headers={"One": "value1", "Another": "value2"}).
  timeout: 10000, # Integer | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000).
  js: true, # Boolean | Execute on-page JavaScript using a headless browser (true by default).
  js_timeout: 2000, # Integer | Maximum JavaScript rendering time in ms. Increase it if you see a loading indicator instead of data on the target page.
  proxy: 'datacenter', # String | Type of proxy; use residential proxies if your site restricts traffic from datacenters (datacenter by default). Note that residential proxy requests are more expensive than datacenter ones; see the pricing page for details.
  country: 'us', # String | Country of the proxy to use (US by default). Only available on Startup and Custom plans.
  device: 'desktop', # String | Type of device emulation.
  error_on_404: false, # Boolean | Return error on 404 HTTP status on the target page (false by default).
  error_on_redirect: false, # Boolean | Return error on redirect on the target page (false by default).
  js_script: "document.querySelector('button').click();" # String | Custom JavaScript code to execute on the target page.
}

begin
  # Get an answer to a question about a given web page
  result = api_instance.get_question(url, opts)
  p result
rescue WebScrapingAI::ApiError => e
  puts "Exception when calling AIApi->get_question: #{e}"
end
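The context_limit and response_tokens options interact: the page content has to fit into whatever the question and the reserved response leave over in the context window. The arithmetic can be sketched as below. The helper name content_token_budget is hypothetical, and it assumes you already have a token estimate for the question; how the API itself counts tokens is not described here.

```ruby
# Hypothetical helper: how many tokens of page content fit into the context
# window once the question and the reserved response tokens are subtracted.
def content_token_budget(context_limit:, response_tokens:, question_tokens:)
  remaining = context_limit - response_tokens - question_tokens
  # A non-positive result means no page content fits at all.
  [remaining, 0].max
end

# Values from the example above, with the question estimated at 10 tokens.
budget = content_token_budget(context_limit: 4000, response_tokens: 100, question_tokens: 10)
puts budget
```

With these numbers roughly 3890 tokens remain for page content. When the page is longer than that, on_context_limit decides what happens (truncate by default).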

Documentation for API Endpoints

All URIs are relative to https://api.webscraping.ai
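Each endpoint in the table below is a plain GET request against this base URL, so it can help to see what such a request looks like on the wire. The sketch below builds a GET /ai/question URL using only Ruby's standard library. It assumes the API key travels as an api_key query parameter (matching the api_key name used in the configuration above), and it shows both encodings of headers described in the example's comments: nested query parameters and a JSON-encoded object. The helper question_url is hypothetical; for real calls, the generated client builds requests for you.

```ruby
require 'json'
require 'uri'

BASE_URL = 'https://api.webscraping.ai'

# Hypothetical helper: build a GET /ai/question URL by hand.
# Assumes the API key is passed as the `api_key` query parameter.
# headers_as: :nested -> headers[Name]=value pairs
#             :json   -> a single headers={"Name":"value"} parameter
def question_url(api_key:, url:, question:, headers: {}, headers_as: :nested)
  params = [['api_key', api_key], ['url', url], ['question', question]]
  unless headers.empty?
    if headers_as == :json
      params << ['headers', JSON.generate(headers)]
    else
      headers.each { |name, value| params << ["headers[#{name}]", value] }
    end
  end
  uri = URI.join(BASE_URL, '/ai/question')
  # encode_www_form percent-encodes each name and value (brackets included).
  uri.query = URI.encode_www_form(params)
  uri
end

puts question_url(api_key: 'YOUR_API_KEY', url: 'https://example.com',
                  question: 'What is this page about?',
                  headers: { 'Accept-Language' => 'en-US' })
```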

Class	Method	HTTP request	Description
WebScrapingAI::AIApi	get_question	GET /ai/question	Get an answer to a question about a given web page
WebScrapingAI::AccountApi	account	GET /account	Information about your account's API call quota
WebScrapingAI::HTMLApi	get_html	GET /html	Page HTML by URL
WebScrapingAI::SelectedHTMLApi	get_selected	GET /selected	HTML of a selected page area by URL and CSS selector
WebScrapingAI::SelectedHTMLApi	get_selected_multiple	GET /selected-multiple	HTML of multiple page areas by URL and CSS selectors
WebScrapingAI::TextApi	get_text	GET /text	Page text by URL