Class: Firecrawl::CrawlRequest
Defined in: lib/firecrawl/crawl_request.rb
Overview
The CrawlRequest class encapsulates a crawl request to the Firecrawl API. After creating a new CrawlRequest instance, you can begin crawling by calling the submit method and then retrieve the results by calling the retrieve method.
You can also cancel the crawling operation by calling cancel.
Examples
require 'firecrawl'

request = Firecrawl::CrawlRequest.new( api_key: ENV[ 'FIRECRAWL_API_KEY' ] )

urls = 'icann.org'
options = Firecrawl::CrawlOptions.build do
  scrape_options do
    main_content_only true
  end
end

crawl_response = request.submit( urls, options )
while crawl_response.success?
  crawl_result = crawl_response.result
  if crawl_result.success?
    # print each page scraped since the previous call
    crawl_result.scrape_results.each do | result |
      puts result.metadata[ 'title' ]
      puts '---'
      puts result.markdown
      puts "\n\n"
    end
  end
  # stop polling once the crawl is no longer in progress
  break unless crawl_result.status?( :scraping )
  crawl_response = request.retrieve( crawl_result )
end

unless crawl_response.success?
  puts crawl_response.result.error_description
end
Constant Summary
Constants inherited from Request
Instance Method Summary
- #cancel(crawl_result, &block) ⇒ Object
  The cancel method makes a Firecrawl ‘/crawl/id’ DELETE request which will cancel a previously submitted crawl.
- #retrieve(crawl_result, &block) ⇒ Object
  The retrieve method makes a Firecrawl ‘/crawl/id’ GET request which will return the crawl results that were completed since the previous call to this method (or, if this is the first call to this method, since the crawl was started).
- #submit(url, options = nil, &block) ⇒ Object
  The submit method makes a Firecrawl ‘/crawl’ POST request which will initiate crawling of the given url.
Methods inherited from Request
Constructor Details
This class inherits a constructor from Firecrawl::Request
Instance Method Details
#cancel(crawl_result, &block) ⇒ Object
The cancel method makes a Firecrawl ‘/crawl/id’ DELETE request which will cancel a previously submitted crawl.
The response is always an instance of Faraday::Response. If response.success? is true, then response.result will be an instance of CrawlResult. If the request is not successful, then response.result will be an instance of ErrorResult.
Remember that you should call response.success? to validate that the call to the API was successful and then response.result.success? to validate that the API processed the request successfully.
# File 'lib/firecrawl/crawl_request.rb', line 123

def cancel( crawl_result, &block )
  raise ArgumentError, "The first argument must be an instance of CrawlResult." \
    unless crawl_result.is_a?( CrawlResult )
  response = get( crawl_result.url, &block )
  result = nil
  attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
  if response.success?
    result = crawl_result.merge_attributes( attributes || { success: false, status: :failed } )
  else
    result = ErrorResult.new( response.status, attributes || {} )
  end

  ResponseMethods.install( response, result )
end
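The two-level check described for cancel (first response.success?, then response.result.success?) can be wrapped in a small helper. The following is a minimal sketch and not part of the gem: check_response and the symbols it returns are hypothetical, and it relies only on the success? and result methods documented on this page.

```ruby
# Hypothetical helper (not part of the firecrawl gem): classifies a response
# using the two checks described above.
def check_response( response )
  # The HTTP call itself failed; result is expected to be an ErrorResult.
  return [ :request_failed, response.result ] unless response.success?
  # The call went through, but the API reported that it could not process it.
  return [ :api_failed, response.result ] unless response.result.success?
  [ :ok, response.result ]
end

# Usage, assuming `request` and `crawl_result` come from an earlier submit:
#
#   status, result = check_response( request.cancel( crawl_result ) )
#   puts result.error_description if status == :request_failed
```

Returning a tagged pair keeps the two failure modes distinct, which matters because only the first one is documented to carry an ErrorResult.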
#retrieve(crawl_result, &block) ⇒ Object
The retrieve method makes a Firecrawl ‘/crawl/id’ GET request which will return the crawl results that were completed since the previous call to this method (or, if this is the first call to this method, since the crawl was started). Note that there is no guarantee that there are any new crawl results at the time you make this call (scrape_results may be empty).
The response is always an instance of Faraday::Response. If response.success? is true, then response.result will be an instance of CrawlResult. If the request is not successful, then response.result will be an instance of ErrorResult.
Remember that you should call response.success? to validate that the call to the API was successful and then response.result.success? to validate that the API processed the request successfully.
# File 'lib/firecrawl/crawl_request.rb', line 96

def retrieve( crawl_result, &block )
  raise ArgumentError, "The first argument must be an instance of CrawlResult." \
    unless crawl_result.is_a?( CrawlResult )
  response = get( crawl_result.next_url, &block )
  result = nil
  attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
  if response.success?
    result = crawl_result.merge_attributes( attributes || { success: false, status: :failed } )
  else
    result = ErrorResult.new( response.status, attributes || {} )
  end

  ResponseMethods.install( response, result )
end
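Because retrieve may return an empty scrape_results while the crawl is still running, callers typically poll with a pause between calls. The sketch below shows one way to do that. It is not part of the gem: collect_scrape_results, interval and max_polls are hypothetical names, and it assumes, as the description of retrieve states, that each call returns only the results completed since the previous call.

```ruby
# Hypothetical helper (not part of the firecrawl gem). `interval` and
# `max_polls` are illustrative parameters, not library options.
def collect_scrape_results( request, crawl_result, interval: 2, max_polls: 30 )
  collected = []
  max_polls.times do
    response = request.retrieve( crawl_result )
    break unless response.success?        # HTTP-level failure: stop polling
    crawl_result = response.result
    break unless crawl_result.success?    # API-level failure: stop polling
    # Assumes scrape_results holds only the results new since the last call.
    collected.concat( crawl_result.scrape_results )
    break unless crawl_result.status?( :scraping )
    sleep( interval )
  end
  collected
end

# Usage, assuming `request` and `crawl_result` come from an earlier submit:
#
#   pages = collect_scrape_results( request, crawl_result, interval: 5 )
```

Capping the number of polls keeps the loop from running indefinitely if the crawl never leaves the :scraping status.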
#submit(url, options = nil, &block) ⇒ Object
The submit method makes a Firecrawl ‘/crawl’ POST request which will initiate crawling of the given url.
The response is always an instance of Faraday::Response. If response.success? is true, then response.result will be an instance of CrawlResult. If the request is not successful, then response.result will be an instance of ErrorResult.
Remember that you should call response.success? to validate that the call to the API was successful and then response.result.success? to validate that the API processed the request successfully.
# File 'lib/firecrawl/crawl_request.rb', line 56

def submit( url, options = nil, &block )
  if options
    options = options.is_a?( CrawlOptions ) ? options : CrawlOptions.build( options.to_h )
    options = options.to_h
    scrape_options = options[ :scrapeOptions ]
    if scrape_options
      options[ :scrapeOptions ] = ScrapeOptions.…( scrape_options )
    end
  else
    options = {}
  end
  options[ :url ] = url

  response = post( "#{BASE_URI}/crawl", options, &block )
  result = nil
  attributes = JSON.parse( response.body, symbolize_names: true ) rescue nil
  if response.success?
    result = CrawlResult.new( attributes )
  else
    result = ErrorResult.new( response.status, attributes )
  end

  ResponseMethods.install( response, result )
end
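A caller that prefers exceptions over checking two success? flags could wrap submit as sketched below. This is only an illustration: start_crawl and CrawlSubmitError are hypothetical names that do not exist in the gem, and the only library methods used are submit, success?, result and error_description, all of which appear on this page.

```ruby
# Hypothetical error class and helper (not part of the firecrawl gem).
class CrawlSubmitError < StandardError; end

# Submits a crawl and returns the CrawlResult, raising if either the HTTP
# request or the API-level processing failed.
def start_crawl( request, url, options = nil )
  response = request.submit( url, options )
  unless response.success?
    raise CrawlSubmitError, "request failed: #{response.result.error_description}"
  end
  crawl_result = response.result
  raise CrawlSubmitError, "the API did not accept the crawl" unless crawl_result.success?
  crawl_result
end

# Usage, assuming the gem is loaded and `request` is a Firecrawl::CrawlRequest:
#
#   crawl_result = start_crawl( request, 'https://example.com' )
```

The returned CrawlResult can then be passed to retrieve or cancel.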