Class: Mindee::Client

Inherits:
Object
Defined in:
lib/mindee/client.rb

Overview

Mindee API Client. See: https://developers.mindee.com/docs

Constructor Details

#initialize(api_key: '') ⇒ Client

Returns a new instance of Client.

Parameters:

  • api_key (String) (defaults to: '')


# File 'lib/mindee/client.rb', line 93

def initialize(api_key: '')
  @api_key = api_key
end

Instance Method Details

#create_endpoint(endpoint_name: '', account_name: '', version: '') ⇒ Mindee::HTTP::Endpoint

Creates a custom endpoint with the given values. Do not set for standard (off the shelf) endpoints.

Parameters:

  • endpoint_name (String) (defaults to: '')

    For custom endpoints, the "API name" field in the "Settings" page of the API Builder. Do not set for standard (off the shelf) endpoints.

  • account_name (String) (defaults to: '')

    For custom endpoints, your account or organization username on the API Builder. This is normally not required unless you have a custom endpoint which has the same name as a standard (off the shelf) endpoint.

  • version (String) (defaults to: '')

    For custom endpoints, the version of the product.

Returns:

  • (Mindee::HTTP::Endpoint)



# File 'lib/mindee/client.rb', line 390

def create_endpoint(endpoint_name: '', account_name: '', version: '')
  initialize_endpoint(
    Mindee::Product::Universal::Universal,
    endpoint_name: endpoint_name,
    account_name: account_name,
    version: version
  )
end

#enqueue(input_source, product_class, endpoint: nil, options: {}) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for asynchronous parsing.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)

    The source of the input document (local file or URL).

  • product_class (Mindee::Inference)

    The class of the product.

  • options (Hash) (defaults to: {})

    A hash of options to configure the enqueue behavior. Possible keys:

    • :endpoint [HTTP::Endpoint, nil] Endpoint of the API. Doesn't need to be set in the case of OTS APIs.
    • :all_words [bool] Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.
    • :full_text [bool] Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.
    • :close_file [bool] Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.
    • :page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes [Array] Zero-based list of page indexes.
      • :operation [Symbol] Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages [Integer] Apply the operation only if the document has at least this many pages.
    • :cropper [bool] Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.
    • :rag [bool] Whether to enable Retrieval-Augmented Generation. Only works if a Workflow ID is provided.
    • :workflow_id [String, nil] ID of the workflow to use.
  • endpoint (Mindee::HTTP::Endpoint) (defaults to: nil)

    Endpoint of the API.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 194

def enqueue(input_source, product_class, endpoint: nil, options: {})
  opts = normalize_parse_options(options)
  endpoint ||= initialize_endpoint(product_class)
  logger.debug("Enqueueing document as '#{endpoint.url_root}'")

  prediction, raw_http = endpoint.predict_async(
    input_source,
    opts
  )
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http.to_json)
end
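
The :page_options keys described above can be illustrated with a small local sketch. The helper select_pages below is hypothetical and not part of the Mindee library; it only mimics how :KEEP_ONLY, :REMOVE and :on_min_pages are described, assuming zero-based, non-negative page indexes.

```ruby
# Hypothetical helper (not part of the Mindee library): returns the page
# indexes that would remain after applying page_options to a document
# with `page_count` pages, following the descriptions above.
def select_pages(page_count, page_options)
  all_pages = (0...page_count).to_a
  # Below the :on_min_pages threshold the operation is skipped entirely.
  return all_pages if page_count < page_options.fetch(:on_min_pages, 0)

  wanted = page_options.fetch(:page_indexes, [])
  case page_options.fetch(:operation)
  when :KEEP_ONLY then all_pages & wanted # keep listed pages, drop the rest
  when :REMOVE then all_pages - wanted    # drop listed pages, keep the rest
  else raise ArgumentError, "unknown operation: #{page_options[:operation].inspect}"
  end
end

# An options hash shaped like the one documented for #enqueue.
options = {
  all_words: false,
  page_options: { page_indexes: [0, 4], operation: :KEEP_ONLY, on_min_pages: 3 }
}

kept = select_pages(5, options[:page_options])      # 5 pages: operation applies
untouched = select_pages(2, options[:page_options]) # 2 pages: below on_min_pages
```

In this sketch a five-page document is reduced to its first and last pages, while a two-page document is left as it is because it falls below the :on_min_pages threshold.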

#enqueue_and_parse(input_source, product_class, endpoint, options) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for asynchronous parsing and automatically try to retrieve the result.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)

    The source of the input document (local file or URL).

  • product_class (Mindee::Inference)

    The class of the product.

  • options (Hash)

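    Options to configure the parsing behavior. The method body reads these through accessors such as options.initial_delay_sec, so pass an options object (for example the ParseOptions struct that #parse accepts) rather than a plain Hash. Possible keys: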

    • :endpoint [HTTP::Endpoint, nil] Endpoint of the API. Doesn't need to be set in the case of OTS APIs.
    • :all_words [bool] Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.
    • :full_text [bool] Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.
    • :close_file [bool] Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.
    • :page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes [Array] Zero-based list of page indexes.
      • :operation [Symbol] Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages [Integer] Apply the operation only if the document has at least this many pages.
    • :cropper [bool, nil] Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.
    • :rag [bool] Whether to enable Retrieval-Augmented Generation. Only works if a Workflow ID is provided.
    • :workflow_id [String, nil] ID of the workflow to use.
    • :initial_delay_sec [Numeric] Initial delay before polling. Defaults to 2.
    • :delay_sec [Numeric] Delay between polling attempts. Defaults to 1.5.
    • :max_retries [Integer] Maximum number of retries. Defaults to 80.
  • endpoint (Mindee::HTTP::Endpoint)

    Endpoint of the API.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 250

def enqueue_and_parse(input_source, product_class, endpoint, options)
  validate_async_params(options.initial_delay_sec, options.delay_sec, options.max_retries)
  enqueue_res = enqueue(input_source, product_class, endpoint: endpoint, options: options)
  job = enqueue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
  job_id = job.id

  sleep(options.initial_delay_sec)
  polling_attempts = 1
  logger.debug("Successfully enqueued document with job id: '#{job_id}'")
  queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
  queue_res_job = queue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
  valid_statuses = [
    Mindee::Parsing::Common::JobStatus::WAITING,
    Mindee::Parsing::Common::JobStatus::PROCESSING,
  ]
  # @type var valid_statuses: Array[(:waiting | :processing | :completed | :failed)]
  while valid_statuses.include?(queue_res_job.status) && polling_attempts < options.max_retries
    logger.debug("Polling server for parsing result with job id: '#{job_id}'. Attempt #{polling_attempts}")
    sleep(options.delay_sec)
    queue_res = parse_queued(job_id, product_class, endpoint: endpoint)
    queue_res_job = queue_res.job or raise Errors::MindeeAPIError, 'Expected job to be present'
    polling_attempts += 1
  end

  if queue_res_job.status != Mindee::Parsing::Common::JobStatus::COMPLETED
    elapsed = options.initial_delay_sec + (polling_attempts * options.delay_sec.to_f)
    raise Errors::MindeeAPIError,
          "Asynchronous parsing request timed out after #{elapsed} seconds (#{polling_attempts} tries)"
  end

  queue_res
end
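
The polling loop above can be sketched in isolation. Everything here is a stand-in: poll_until_done, the status lambda and the injected sleeper are hypothetical names that only mirror the control flow shown in the method body, with the documented defaults (2, 1.5 and 80) assumed.

```ruby
# Stand-alone sketch of the polling control flow; no Mindee classes are used.
# `fetch_status` is any callable returning :waiting, :processing, :completed
# or :failed. `sleeper` is injected so the sketch can run without waiting.
def poll_until_done(fetch_status, initial_delay_sec: 2, delay_sec: 1.5,
                    max_retries: 80, sleeper: ->(sec) { sleep(sec) })
  sleeper.call(initial_delay_sec)
  attempts = 1
  status = fetch_status.call
  # Keep polling while the job is still pending and retries remain.
  while %i[waiting processing].include?(status) && attempts < max_retries
    sleeper.call(delay_sec)
    status = fetch_status.call
    attempts += 1
  end

  unless status == :completed
    elapsed = initial_delay_sec + (attempts * delay_sec.to_f)
    raise "timed out after #{elapsed} seconds (#{attempts} tries)"
  end
  attempts
end

# Simulated job that completes on the third status check.
statuses = %i[waiting processing completed]
waited = []
attempts = poll_until_done(-> { statuses.shift }, sleeper: ->(sec) { waited << sec })
```

With the default values, a job that never leaves the waiting or processing state stops after 80 attempts, and the elapsed value reported in the error is 2 + 80 × 1.5 = 122.0 seconds.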

#execute_workflow(input_source, workflow_id, options: {}) ⇒ Mindee::Parsing::Common::WorkflowResponse

Sends a document to a workflow.

Accepts options either as a Hash or as a WorkflowOptions struct.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)
  • workflow_id (String)
  • options (Hash, WorkflowOptions) (defaults to: {})

    Options to configure workflow behavior. Possible keys:

    • document_alias [String, nil] Alias to give to the document.
    • priority [Symbol, nil] Priority to give to the document.
    • full_text [bool] Whether to include the full OCR text response in compatible APIs.
    • rag [bool, nil] Whether to enable Retrieval-Augmented Generation.
    • public_url [String, nil] A unique, encrypted URL for accessing the document validation interface without requiring authentication.
    • page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes Zero-based list of page indexes.
      • :operation Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages Apply the operation only if the document has at least this many pages.

Returns:

  • (Mindee::Parsing::Common::WorkflowResponse)



# File 'lib/mindee/client.rb', line 304

def execute_workflow(input_source, workflow_id, options: {})
  opts = options.is_a?(WorkflowOptions) ? options : WorkflowOptions.new(params: options)
  if opts.respond_to?(:page_options) && input_source.is_a?(Input::Source::LocalInputSource)
    process_pdf_if_required(input_source, opts)
  end

  workflow_endpoint = Mindee::HTTP::WorkflowEndpoint.new(workflow_id, api_key: @api_key.to_s)
  logger.debug("Sending document to workflow '#{workflow_id}'")

  prediction, raw_http = workflow_endpoint.execute_workflow(
    input_source,
    opts
  )

  Mindee::Parsing::Common::WorkflowResponse.new(Product::Universal::Universal, prediction, raw_http)
end
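
The first line of the method body accepts either a Hash or an options struct. A minimal sketch of that normalization follows; SketchWorkflowOptions is a hypothetical stand-in, and the real WorkflowOptions fields and defaults may differ.

```ruby
# Hypothetical stand-in for WorkflowOptions; the real struct's fields and
# defaults may differ. Only the Hash-or-struct normalization is illustrated.
class SketchWorkflowOptions
  attr_reader :document_alias, :priority, :full_text, :rag

  def initialize(params: {})
    @document_alias = params.fetch(:document_alias, nil)
    @priority = params.fetch(:priority, nil)
    @full_text = params.fetch(:full_text, false) # assumed default
    @rag = params.fetch(:rag, nil)
  end
end

# Mirrors the first line of the method body: pass a struct through unchanged,
# wrap anything else (expected to be a Hash) in a new options object.
def normalize_workflow_options(options)
  options.is_a?(SketchWorkflowOptions) ? options : SketchWorkflowOptions.new(params: options)
end

from_hash = normalize_workflow_options({ document_alias: 'invoice-42', full_text: true })
prebuilt = SketchWorkflowOptions.new(params: { priority: :high }) # :high is illustrative
same = normalize_workflow_options(prebuilt)
```

Callers can therefore build the options once and reuse the same object across calls, or pass a literal Hash for one-off requests.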

#load_prediction(product_class, local_response) ⇒ Mindee::Parsing::Common::ApiResponse

Load a prediction.

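Parameters:

  • product_class (Mindee::Inference)

    The class of the product.

  • local_response

    The local response to load. Must not be nil and must respond to as_hash.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)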



# File 'lib/mindee/client.rb', line 326

def load_prediction(product_class, local_response)
  raise Errors::MindeeAPIError, 'Expected LocalResponse to not be nil.' if local_response.nil?

  response_hash = local_response.as_hash || {}
  raise Errors::MindeeAPIError, 'Expected LocalResponse#as_hash to return a hash.' if response_hash.nil?

  Mindee::Parsing::Common::ApiResponse.new(product_class, response_hash, response_hash.to_json)
rescue KeyError, Errors::MindeeAPIError
  raise Errors::MindeeInputError, 'No prediction found in local response.'
end

#parse(input_source, product_class, endpoint: nil, options: {}, enqueue: true) ⇒ Mindee::Parsing::Common::ApiResponse

Enqueue a document for parsing and automatically try to retrieve it if needed.

Accepts options either as a Hash or as a ParseOptions struct.

Parameters:

  • input_source (Mindee::Input::Source::LocalInputSource, Mindee::Input::Source::URLInputSource)
  • product_class (Mindee::Inference)

    The class of the product.

  • endpoint (Mindee::HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API.

  • options (Hash) (defaults to: {})

    A hash of options to configure the parsing behavior. Possible keys:

    • :all_words [bool] Whether to extract all the words on each page. This performs a full OCR operation on the server and will increase response time.
    • :full_text [bool] Whether to include the full OCR text response in compatible APIs. This performs a full OCR operation on the server and may increase response time.
    • :close_file [bool] Whether to close() the file after parsing it. Set to false if you need to access the file after this operation.
    • :page_options [Hash, nil] Page cutting/merge options:
      • :page_indexes [Array] Zero-based list of page indexes.
      • :operation [Symbol] Operation to apply on the document, given the page_indexes specified:
        • :KEEP_ONLY - keep only the specified pages, and remove all others.
        • :REMOVE - remove the specified pages, and keep all others.
      • :on_min_pages [Integer] Apply the operation only if the document has at least this many pages.
    • :cropper [bool, nil] Whether to include cropper results for each page. This performs a cropping operation on the server and will increase response time.
    • :initial_delay_sec [Numeric] Initial delay before polling. Defaults to 2.
    • :delay_sec [Numeric] Delay between polling attempts. Defaults to 1.5.
    • :max_retries [Integer] Maximum number of retries. Defaults to 80.
  • enqueue (bool) (defaults to: true)

    Whether to enqueue the file.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 124

def parse(input_source, product_class, endpoint: nil, options: {}, enqueue: true)
  opts = normalize_parse_options(options)
  process_pdf_if_required(input_source, opts) if input_source.is_a?(Input::Source::LocalInputSource)
  endpoint ||= initialize_endpoint(product_class)

  if enqueue && product_class.has_async
    enqueue_and_parse(input_source, product_class, endpoint, opts)
  else
    parse_sync(input_source, product_class, endpoint, opts)
  end
end
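
The branch at the end of the method body decides between asynchronous and synchronous parsing. The sketch below isolates that decision; the two product stand-ins and choose_path are hypothetical and expose only a has_async flag.

```ruby
# Hypothetical product stand-ins exposing only the has_async flag.
SYNC_ONLY_PRODUCT = Struct.new(:has_async).new(false)
ASYNC_PRODUCT = Struct.new(:has_async).new(true)

# Mirrors the condition in the method body: the asynchronous path is taken
# only when the caller asks for it AND the product supports it.
def choose_path(product, enqueue: true)
  enqueue && product.has_async ? :enqueue_and_parse : :parse_sync
end

default_async = choose_path(ASYNC_PRODUCT)
forced_sync = choose_path(ASYNC_PRODUCT, enqueue: false)
fallback_sync = choose_path(SYNC_ONLY_PRODUCT)
```

Note that leaving enqueue at its default of true does not fail for a product without asynchronous support; the call falls back to the synchronous path.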

#parse_queued(job_id, product_class, endpoint: nil) ⇒ Mindee::Parsing::Common::ApiResponse

Parses a queued document.

Parameters:

  • job_id (String)

    ID of the job (queue) to poll from.

  • product_class (Mindee::Inference)

    The class of the product.

  • endpoint (HTTP::Endpoint, nil) (defaults to: nil)

    Endpoint of the API. Doesn't need to be set in the case of OTS APIs.

Returns:

  • (Mindee::Parsing::Common::ApiResponse)



# File 'lib/mindee/client.rb', line 214

def parse_queued(job_id, product_class, endpoint: nil)
  endpoint = initialize_endpoint(product_class) if endpoint.nil?
  logger.debug("Fetching queued document as '#{endpoint.url_root}'")
  prediction, raw_http = endpoint.parse_async(job_id)
  Mindee::Parsing::Common::ApiResponse.new(product_class, prediction, raw_http.to_json)
end

#source_from_b64string(base64_string, filename, repair_pdf: false) ⇒ Mindee::Input::Source::Base64InputSource

Load a document from a base64 encoded string.

Parameters:

  • base64_string (String)

    Input to parse as base64 string

  • filename (String)

    The name of the file (without the path)

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::Base64InputSource)



# File 'lib/mindee/client.rb', line 359

def source_from_b64string(base64_string, filename, repair_pdf: false)
  Input::Source::Base64InputSource.new(base64_string, filename, repair_pdf: repair_pdf)
end
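
As a sketch of how a base64_string argument might be produced, the snippet below encodes raw bytes with core Ruby only. The sample bytes are made up, and whether the library expects single-line Base64 without line breaks is an assumption.

```ruby
# Encode raw bytes as a single-line Base64 string using only core Ruby.
# 'm0' is the pack directive for Base64 without line breaks.
raw_bytes = '%PDF-1.4 example bytes'.b # illustrative content, not a real PDF
base64_string = [raw_bytes].pack('m0')

# Decoding returns the original bytes, which confirms the round trip.
decoded = base64_string.unpack1('m0')
```

The resulting base64_string is the kind of value that would be passed as the first argument, together with a filename such as 'document.pdf'.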

#source_from_bytes(input_bytes, filename, repair_pdf: false) ⇒ Mindee::Input::Source::BytesInputSource

Load a document from raw bytes.

Parameters:

  • input_bytes (String)

    Encoding::BINARY byte input

  • filename (String)

    The name of the file (without the path)

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::BytesInputSource)



# File 'lib/mindee/client.rb', line 350

def source_from_bytes(input_bytes, filename, repair_pdf: false)
  Input::Source::BytesInputSource.new(input_bytes, filename, repair_pdf: repair_pdf)
end
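
The input_bytes parameter is described as Encoding::BINARY input. A minimal sketch of obtaining such a string from a file, using a temporary file with made-up content:

```ruby
require 'tempfile'

# Write a small binary file, then read it back as raw bytes.
# Tempfile.create with a block returns the block's value and removes the file.
input_bytes, filename = Tempfile.create(['sample', '.bin']) do |file|
  file.binmode
  file.write("\x00\x01\xFF".b)
  file.flush

  # File.binread returns a String tagged Encoding::BINARY (ASCII-8BIT).
  [File.binread(file.path), File.basename(file.path)]
end
```

Reading with File.binread avoids any newline or encoding conversion, so the bytes reach the caller unchanged; File.basename yields the name without the path, as the filename parameter asks for.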

#source_from_file(input_file, filename, repair_pdf: false) ⇒ Mindee::Input::Source::FileInputSource

Load a document from a normal Ruby File.

Parameters:

  • input_file (File)

    Input file handle

  • filename (String)

    The name of the file (without the path)

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::FileInputSource)



# File 'lib/mindee/client.rb', line 368

def source_from_file(input_file, filename, repair_pdf: false)
  Input::Source::FileInputSource.new(input_file, filename, repair_pdf: repair_pdf)
end

#source_from_path(input_path, repair_pdf: false) ⇒ Mindee::Input::Source::PathInputSource

Load a document from an absolute path, as a string.

Parameters:

  • input_path (String)

    Path of file to open

  • repair_pdf (bool) (defaults to: false)

    Attempts to repair a broken PDF if true.

Returns:

  • (Mindee::Input::Source::PathInputSource)



# File 'lib/mindee/client.rb', line 341

def source_from_path(input_path, repair_pdf: false)
  Input::Source::PathInputSource.new(input_path, repair_pdf: repair_pdf)
end
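
Since the method description asks for an absolute path, a relative path can be resolved first. A small sketch with core Ruby, where the directory and file names are made up and a POSIX-style filesystem is assumed:

```ruby
# Resolve a relative path against a base directory to get an absolute path.
# Both the base directory and the relative path are illustrative.
relative = 'invoices/2024/scan.pdf'
input_path = File.expand_path(relative, '/data')
```

When the second argument is omitted, File.expand_path resolves against the current working directory instead.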

#source_from_url(url) ⇒ Mindee::Input::Source::URLInputSource

Load a document from a secure remote source (HTTPS).

Parameters:

  • url (String)

    URL of the file

Returns:

  • (Mindee::Input::Source::URLInputSource)



# File 'lib/mindee/client.rb', line 375

def source_from_url(url)
  Input::Source::URLInputSource.new(url)
end
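
Because the method description calls for a secure (HTTPS) source, a caller may want to check a URL before passing it in. The helper https_url? below is a hypothetical pre-check built on Ruby's standard uri library; it is not part of the Mindee library, and the library may perform its own validation.

```ruby
require 'uri'

# Hypothetical pre-check (not part of the Mindee library): true only when
# the string parses as an HTTPS URL with a host.
def https_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTPS) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

secure = https_url?('https://example.com/files/invoice.pdf')
insecure = https_url?('http://example.com/files/invoice.pdf')
malformed = https_url?('not a url')
```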