Method: Mindee::Client#enqueue_and_parse

Defined in:
lib/mindee/client.rb

#enqueue_and_parse(input_source, product_class, endpoint, options) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for asynchronous parsing, then poll the server until the job completes or the maximum number of polling attempts is reached.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)

    The source of the input document (local file or URL).

  • product_class (Mindee::Inference)

    The product class used to parse the document.

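  • options (Hash)

    Options that configure the parsing and polling behavior. Although documented as a Hash, the method body reads values through accessor methods (for example options.initial_delay_sec), so the object passed in must respond to them. Possible keys: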

    • :endpoint [HTTP::Endpoint, nil] Endpoint of the API. Does not need to be set for off-the-shelf (OTS) APIs.
    • :all_words [bool] Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.
    • :full_text [bool] Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.
    • :close_file [bool] Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.
    • :page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes [Array] Zero-based list of page indexes.
      • :operation [Symbol] Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages [Integer] Apply the operation only if the document has at least this many pages.
    • :cropper [bool, nil] Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.
    • :rag [bool] Whether to enable Retrieval-Augmented Generation. Only works if a Workflow ID is provided.
    • :workflow_id [String, nil] ID of the workflow to use.
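    • :initial_delay_sec [Numeric] Delay in seconds before the first polling attempt. Defaults to 2.
    • :delay_sec [Numeric] Delay in seconds between subsequent polling attempts. Defaults to 1.5.
    • :max_retries [Integer] Maximum number of polling attempts, including the first one. Defaults to 80.

  • endpoint (Mindee::HTTP::Endpoint)

    Endpoint of the API, used both to enqueue the document and to poll for the result.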
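The :page_options semantics can be illustrated locally. The sketch below uses a hypothetical helper, apply_page_options, which is not part of the Mindee SDK; it only models which zero-based page indexes would survive each operation. The real page cutting happens inside the SDK, negative indexes are not modeled, and the on_min_pages default of 0 is an assumption.

```ruby
# Hypothetical helper, not part of the Mindee SDK: it models which zero-based
# page indexes survive the documented :page_options operations.
def apply_page_options(page_count, page_indexes:, operation:, on_min_pages: 0)
  all_pages = (0...page_count).to_a
  # Documents with fewer than on_min_pages pages are left untouched.
  return all_pages if page_count < on_min_pages

  case operation
  when :KEEP_ONLY then all_pages & page_indexes # Array#& keeps document order
  when :REMOVE then all_pages - page_indexes
  else raise ArgumentError, "unknown operation: #{operation.inspect}"
  end
end

p apply_page_options(5, page_indexes: [4, 0], operation: :KEEP_ONLY) # => [0, 4]
p apply_page_options(5, page_indexes: [0, 4], operation: :REMOVE)    # => [1, 2, 3]
```

With on_min_pages: 3, a two-page document is returned unchanged whatever the operation.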

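Returns:

  • (Mindee::Parsing::Common::ApiResponse)

    The response of the completed parsing job.

Raises:

  • (Mindee::Errors::MindeeAPIError)

    If a response contains no job, or if the job is not completed after max_retries polling attempts (a job that ends with the failed status also raises this error).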

# File 'lib/mindee/client.rb', line 250

def enqueue_and_parse(input_source, product_class, endpoint, options)
  validate_async_params(options.initial_delay_sec, options.delay_sec, options.max_retries)
  enqueue_res = enqueue(input_source, product_class, endpoint: endpoint, options: options)
  job = enqueue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
  job_id = job.id

  sleep(options.initial_delay_sec)
  polling_attempts = 1
  logger.debug("Successfully enqueued document with job id: '#{job_id}'")
  queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
  queue_res_job = queue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
  valid_statuses = [
    Mindee::Parsing::Common::JobStatus::WAITING,
    Mindee::Parsing::Common::JobStatus::PROCESSING,
  ]
  # @type var valid_statuses: Array[(:waiting | :processing | :completed | :failed)]
  while valid_statuses.include?(queue_res_job.status) && polling_attempts < options.max_retries
    logger.debug("Polling server for parsing result with job id: '#{job_id}'. Attempt #{polling_attempts}")
    sleep(options.delay_sec)
    queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
    queue_res_job = queue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
    polling_attempts += 1
  end

  if queue_res_job.status != Mindee::Parsing::Common::JobStatus::COMPLETED
    elapsed = options.initial_delay_sec + (polling_attempts * options.delay_sec.to_f)
    raise Errors::MindeeAPIError,
          "Asynchronous parsing request timed out after #{elapsed} seconds (#{polling_attempts} tries)"
  end

  queue_res
end
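The polling control flow above can be replayed without network access. In this sketch, PollingOptions is a hypothetical Struct standing in for the options object (the method only needs initial_delay_sec, delay_sec and max_retries readers), the block stands in for parse_queued and returns a job status symbol, and the sleeps are summed instead of performed. Request latency is ignored.

```ruby
# Hypothetical stand-in for the options object: enqueue_and_parse only reads
# these three values through accessor methods, so a Struct is enough here.
PollingOptions = Struct.new(:initial_delay_sec, :delay_sec, :max_retries, keyword_init: true)

# Statuses that keep the loop going (WAITING and PROCESSING in the SDK).
PENDING_STATUSES = %i[waiting processing].freeze

# Replays the polling loop of enqueue_and_parse. The block replaces
# parse_queued and must return a job status symbol; sleeps are summed
# instead of performed. Returns [final_status, polls, slept_seconds].
def simulate_polling(options)
  slept = options.initial_delay_sec.to_f # sleep(options.initial_delay_sec)
  polls = 1
  status = yield(polls) # first parse_queued call
  while PENDING_STATUSES.include?(status) && polls < options.max_retries
    slept += options.delay_sec.to_f # sleep(options.delay_sec)
    polls += 1
    status = yield(polls)
  end
  [status, polls, slept]
end

defaults = PollingOptions.new(initial_delay_sec: 2, delay_sec: 1.5, max_retries: 80)
done = simulate_polling(defaults) { |n| n < 3 ? :processing : :completed }
p done # => [:completed, 3, 5.0]
stuck = simulate_polling(defaults) { :processing }
p stuck # => [:processing, 80, 120.5]
```

With the defaults, a job that never leaves :processing is polled 80 times with 2 + 79 * 1.5 = 120.5 seconds of sleep. The timeout message in enqueue_and_parse computes initial_delay_sec + polling_attempts * delay_sec, which in that case reports 122.0 seconds, one delay_sec more than was actually slept. A job that reaches :failed exits the loop on that poll, and the method then raises the same error.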