Class: Firecrawl::BatchScrapeRequest
- Defined in:
- lib/firecrawl/batch_scrape_request.rb
Overview
The BatchScrapeRequest class encapsulates a batch scrape request to the Firecrawl API. After creating a new BatchScrapeRequest instance you can begin batch scraping by calling the submit method and then subsequently retrieve the results by calling the +retrieve’ method.
examples
require ‘firecrawl’
request = Firecrawl::BatchScrapeRequest.new( api_key: ENV[ ‘FIRECRAWL_API_KEY’ )
urls = [ ‘example.com’, ‘icann.org’ ] options = Firecrawl::ScrapeOptions.build do
format [ :markdown, 'screenshot@full_page' ]
only_main_content true
end
batch_response = request.submit( urls, options ) while response.success?
batch_result = batch_response.result
if batch_result.success?
batch_result.scrape_results.each do | result |
puts response.[ 'title ]
puts '---'
puts response.markdown
puts "\n\n"
end
end
break unless batch_result.status?( :scraping )
batch_response = request.retrieve( batch_result )
end
unless batch_response.success?
puts batch_response.result.error_description
end
Constant Summary
Constants inherited from Request
Instance Method Summary collapse
-
#retrieve(batch_result, &block) ⇒ Object
The
retrievemethod makes a Firecrawl ‘/batch/scrape’ GET request which will return the scrape results that were completed since the previous call to this method ( or, if this is the first call to this method, since the batch scrape was started ). -
#retrieve_all(batch_result, &block) ⇒ Object
The
retrieve_allmethod makes a Firecrawl ‘/batch/scrape’ GET request which will return the scrape results that were completed at the time of this call. -
#submit(urls, options = nil, &block) ⇒ Object
The
submitmethod makes a Firecrawl ‘/batch/scrape/id’ POST request which will initiate batch scraping of the given urls.
Methods inherited from Request
Constructor Details
This class inherits a constructor from Firecrawl::Request
Instance Method Details
#retrieve(batch_result, &block) ⇒ Object
The retrieve method makes a Firecrawl ‘/batch/scrape’ GET request which will return the scrape results that were completed since the previous call to this method ( or, if this is the first call to this method, since the batch scrape was started ). Note that there is no guarantee that there are any new batch scrape results at the time you make this call ( scrape_results may be empty ).
The response is always an instance of Faraday::Response. If response.success? is true, then response.result will be an instance BatchScrapeResult. If the request is not successful then response.result will be an instance of ErrorResult.
Remember that you should call response.success? to valida that the call to the API was successful and then response.result.success? to validate that the API processed the request successfuly.
89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 |
# File 'lib/firecrawl/batch_scrape_request.rb', line 89 def retrieve( batch_result, &block ) raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \ unless batch_result.is_a?( BatchScrapeResult ) response = get( batch_result.next_url, &block ) result = nil attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil if response.success? attributes ||= { success: false, status: :failed } result = batch_result.merge_attributes( attributes ) else result = ErrorResult.new( response.status, attributes || {} ) end ResponseMethods.install( response, result ) end |
#retrieve_all(batch_result, &block) ⇒ Object
The retrieve_all method makes a Firecrawl ‘/batch/scrape’ GET request which will return the scrape results that were completed at the time of this call. Repeated calls to this method will retrieve the scrape results previouslly returned as well as any scrape results that have accumulated since.
Note that there is no guarantee that there are any new batch scrape results at the time you make this call ( scrape_results may be empty ).
The response is always an instance of Faraday::Response. If response.success? is true, then response.result will be an instance BatchScrapeResult. If the request is not successful then response.result will be an instance of ErrorResult.
Remember that you should call response.success? to valida that the call to the API was successful and then response.result.success? to validate that the API processed the request successfuly.
122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 |
# File 'lib/firecrawl/batch_scrape_request.rb', line 122 def retrieve_all( batch_result, &block ) raise ArgumentError, "The first argument must be an instance of BatchScrapeResult." \ unless batch_result.is_a?( BatchScrapeResult ) response = get( batch_result.url, &block ) result = nil attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil if response.success? attributes ||= { success: false, status: :failed } # the next url should not be set by this method so that retrieve and retrieve_all do # not impact each other attributes.delete( :next ) result = batch_result.merge_attributes( attributes ) else result = ErrorResult.new( response.status, attributes || {} ) end ResponseMethods.install( response, result ) end |
#submit(urls, options = nil, &block) ⇒ Object
The submit method makes a Firecrawl ‘/batch/scrape/id’ POST request which will initiate batch scraping of the given urls.
The response is always an instance of Faraday::Response. If response.success? is true, then response.result will be an instance BatchScrapeResult. If the request is not successful then response.result will be an instance of ErrorResult.
Remember that you should call response.success? to valida that the call to the API was successful and then response.result.success? to validate that the API processed the request successfuly.
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
# File 'lib/firecrawl/batch_scrape_request.rb', line 54 def submit( urls, = nil, &block ) if = .is_a?( ScrapeOptions ) ? : ScrapeOptions.build( .to_h ) = ScrapeOptions.( .to_h ) else = {} end [ :urls ] = [ urls ].flatten response = post( "#{BASE_URI}/batch/scrape", , &block ) result = nil attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil if response.success? result = BatchScrapeResult.new( attributes ) else result = ErrorResult.new( response.status, attributes || {} ) end ResponseMethods.install( response, result ) end |