Class: Tahweel::Processors::GoogleDrive

Inherits:
Object
Defined in:
lib/tahweel/processors/google_drive.rb

Overview

Handles the conversion of images to text using Google Drive’s OCR capabilities.

This class automates the process of:

  1. Uploading a local image to Google Drive as a Google Document.

  2. Downloading the content of that document as plain text.

  3. Cleaning up (deleting) the temporary file from Drive.

Network failures, rate-limit responses, and server errors are retried indefinitely with exponential backoff.
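This page does not show how the retries are implemented. The following is a minimal sketch of indefinite retry with capped exponential backoff. It assumes two hypothetical error classes (RetriableError for network, rate-limit, and server failures; FatalError for everything else) and a hypothetical with_retries helper. It is not Tahweel's actual code.

```ruby
# Hypothetical stand-ins for retriable failures (network errors, HTTP 429,
# HTTP 5xx) and for non-retriable ones (e.g. 401, 403, 404).
class RetriableError < StandardError; end
class FatalError < StandardError; end

# Runs the block, retrying forever on RetriableError. The wait starts at
# base_delay and doubles after each failure, capped at max_delay.
# Any other exception propagates immediately. `sleeper` is injectable so
# tests can record the delays instead of actually sleeping.
def with_retries(base_delay: 1.0, max_delay: 60.0, sleeper: ->(seconds) { sleep(seconds) })
  delay = base_delay
  begin
    yield
  rescue RetriableError
    sleeper.call(delay)
    delay = [delay * 2, max_delay].min
    retry # re-runs the begin block with the larger delay
  end
end

# Usage: fail three times, then succeed.
delays = []
attempts = 0
result = with_retries(sleeper: ->(seconds) { delays << seconds }) do
  attempts += 1
  raise RetriableError, "HTTP 503" if attempts < 4
  "ok"
end
# result is "ok"; delays is [1.0, 2.0, 4.0]
```

Injecting sleeper keeps the sketch testable. A production version would usually also add random jitter so that many clients hitting the same rate limit do not retry in lockstep.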

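Instance Method Summary

  • #extract(file_path) ⇒ String

    Extracts text from an image file using the “Upload -> Export -> Delete” flow.

  • #initialize ⇒ GoogleDrive (constructor)

    Initializes the Google Drive OCR service.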

Constructor Details

#initialize ⇒ GoogleDrive

Note:

This operation performs filesystem I/O to read credentials. For bulk processing, instantiate this once and reuse it.

Initializes the Google Drive OCR service. Sets up the Google Drive API client and authorizes it using Authorizer.



# File 'lib/tahweel/processors/google_drive.rb', line 26

def initialize
  @service = Google::Apis::DriveV3::DriveService.new
  @service.client_options.application_name = "Tahweel"
  @service.authorization = Tahweel::Authorizer.authorize
end
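Following the note above, a batch job should construct the processor once and pass that instance to every call. The sketch below uses a hypothetical extract_all helper and a StubProcessor standing in for Tahweel::Processors::GoogleDrive, so it runs without credentials or network access.

```ruby
# Stand-in for Tahweel::Processors::GoogleDrive. It counts how many times
# it is constructed, since construction is the expensive step (the real
# class reads credentials from disk in #initialize).
class StubProcessor
  class << self
    attr_accessor :instances
  end
  self.instances = 0

  def initialize
    self.class.instances += 1
  end

  def extract(path)
    "text of #{path}"
  end
end

# Hypothetical helper: OCR every path with one shared processor and return
# a Hash of path => extracted text. Any object with #extract(path) works.
def extract_all(paths, processor:)
  paths.to_h { |path| [path, processor.extract(path)] }
end

processor = StubProcessor.new
texts = extract_all(%w[page1.png page2.png page3.png], processor: processor)
# texts["page2.png"] is "text of page2.png"; StubProcessor.instances is 1
```

With the real class, you would pass Tahweel::Processors::GoogleDrive.new as processor: instead of the stub.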

Instance Method Details

#extract(file_path) ⇒ String

Extracts text from an image file using the “Upload -> Export -> Delete” flow.

The method ensures that the temporary file created on Google Drive is deleted regardless of whether the download succeeds or fails.

Parameters:

  • file_path (String)

    The path to the image file.

Returns:

  • (String)

    The extracted text.

Raises:

  • (RuntimeError)

    If the file does not exist locally.

  • (Google::Apis::Error)

    If a non-retriable API error occurs (e.g., 401, 403, 404).
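The cleanup guarantee described above comes from Ruby's ensure clause. The sketch below reproduces the Upload -> Export -> Delete shape against a hypothetical in-memory FakeDrive; its upload, export_text, and delete methods are invented for illustration and are not Drive API calls. The `if file_id` guard is an addition of this sketch: it skips the delete when the upload itself failed and no file exists.

```ruby
# Hypothetical in-memory stand-in for the Drive calls made by the processor.
class FakeDrive
  attr_reader :files

  def initialize(fail_export: false)
    @files = {}
    @next_id = 0
    @fail_export = fail_export
  end

  def upload(contents)
    id = "file-#{@next_id += 1}"
    @files[id] = contents
    id
  end

  def export_text(id)
    raise IOError, "export failed" if @fail_export
    @files.fetch(id)
  end

  def delete(id)
    @files.delete(id)
  end
end

# Upload -> Export -> Delete. The ensure clause runs whether export_text
# returns or raises, so the temporary file never outlives the call.
def ocr_roundtrip(drive, contents)
  file_id = nil
  begin
    file_id = drive.upload(contents)
    drive.export_text(file_id)
  ensure
    drive.delete(file_id) if file_id
  end
end

drive = FakeDrive.new
text = ocr_roundtrip(drive, "hello")
# text is "hello" and drive.files is empty again
```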



# File 'lib/tahweel/processors/google_drive.rb', line 41

def extract(file_path)
  raise "File not found: #{file_path}" unless File.exist?(file_path)

  begin
    file_id = upload_file(file_path)
    download_text(file_id).gsub("\r\n", "\n").gsub("________________", "").strip
  ensure
    delete_file(file_id)
  end
end
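The post-processing chained onto download_text can be pulled into a small helper to make it testable on its own. The normalize_ocr_text name is hypothetical; the helper applies the same three operations in the same order as #extract: convert Windows line endings to "\n", remove the fixed underscore run, and trim surrounding whitespace.

```ruby
# Hypothetical helper applying the same post-processing as #extract:
# CRLF -> LF, remove the underscore run, then trim leading/trailing whitespace.
def normalize_ocr_text(raw)
  raw.gsub("\r\n", "\n").gsub("________________", "").strip
end

raw = "  Hello\r\n________________\r\nWorld  "
cleaned = normalize_ocr_text(raw)
# cleaned is "Hello\n\nWorld"
```

Only the "\r\n" pair is replaced, so a lone "\r" inside the text is left untouched.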