Class: Tahweel::Processors::GoogleDrive
- Inherits: Object
- Defined in: lib/tahweel/processors/google_drive.rb
Overview
Handles the conversion of images to text using Google Drive’s OCR capabilities.
This class automates the process of:
- Uploading a local image to Google Drive as a Google Document.
- Downloading the content of that document as plain text.
- Cleaning up (deleting) the temporary file from Drive.
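The three steps above can be sketched with an in-memory stand-in for Drive. `FakeDrive`, its `upload`, `export_text` and `delete` methods, and `ocr_via_drive` are hypothetical placeholders, not the Google Drive API or Tahweel's code; the sketch only illustrates the control flow, in which an `ensure` clause removes the temporary document even when the export step fails.

```ruby
# Hypothetical in-memory stand-in for Google Drive (NOT the real API).
class FakeDrive
  attr_reader :files

  def initialize(fail_export: false)
    @files = {}
    @next_id = 0
    @fail_export = fail_export
  end

  # Step 1: "upload" an image; pretend Drive converted it to a document.
  def upload(image_bytes)
    @next_id += 1
    id = "doc-#{@next_id}"
    @files[id] = "OCR text for #{image_bytes.bytesize} bytes"
    id
  end

  # Step 2: "export" the document as plain text (optionally failing).
  def export_text(id)
    raise IOError, "export failed" if @fail_export

    @files.fetch(id)
  end

  # Step 3: delete the temporary document.
  def delete(id)
    @files.delete(id)
  end
end

# Upload -> Export -> Delete, with cleanup guaranteed by ensure.
def ocr_via_drive(drive, image_bytes)
  id = drive.upload(image_bytes) # outside begin: ensure only runs once an id exists
  begin
    drive.export_text(id)
  ensure
    drive.delete(id)
  end
end
```

Placing the upload before `begin` is a design choice of this sketch: the `ensure` clause never sees a missing id, so cleanup is only attempted for a document that was actually created.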
It retries failed requests indefinitely, with exponential backoff, when it encounters network errors, rate limits, or server errors.
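The retry logic itself is not shown in this listing. The following is a minimal sketch of retry-with-exponential-backoff, assuming transient failures surface as exceptions; `TransientError`, `with_backoff`, and its keyword parameters are hypothetical names, not Tahweel's API. Passing `max_attempts: nil` retries forever, matching the behavior described above, while an integer cap and an injectable `sleeper` make the sketch testable without waiting.

```ruby
# Hypothetical stand-in for transient failures
# (network errors, rate limits, server errors).
class TransientError < StandardError; end

# Runs the block, retrying on the given exception classes with
# exponentially growing delays: base, 2*base, 4*base, ... capped at max_delay.
#   max_attempts: nil retries forever; an Integer gives up after that many tries.
#   sleeper:      called with each delay in seconds (injectable for tests).
def with_backoff(base_delay: 1.0, max_delay: 60.0, max_attempts: nil,
                 retry_on: [TransientError], sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *retry_on
    raise if max_attempts && attempt >= max_attempts

    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay)
    retry # re-runs the begin block; attempt keeps its value
  end
end
```

Exceptions not listed in `retry_on` propagate immediately, so permanent errors such as a missing file are not retried. Production code often adds random jitter to each delay; this sketch leaves it out for clarity.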
Instance Method Summary
- #extract(file_path) ⇒ String
  Extracts text from an image file using the “Upload -> Export -> Delete” flow.
- #initialize ⇒ GoogleDrive (constructor)
  Initializes the Google Drive OCR service.
Constructor Details
#initialize ⇒ GoogleDrive
Initializes the Google Drive OCR service. Sets up the Google Drive API client and authorizes it using Tahweel::Authorizer.
Note: Creating an instance performs filesystem I/O to read credentials. For bulk processing, instantiate the class once and reuse the instance.
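    # File 'lib/tahweel/processors/google_drive.rb', line 26
    def initialize
      # Create the Drive v3 API client.
      @service = Google::Apis::DriveV3::DriveService.new
      # Identify the application to the API.
      @service..application_name = "Tahweel"
      # Authorize the client with credentials provided by Tahweel::Authorizer.
      @service. = Tahweel::Authorizer.
    end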
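Following the note on reusing a single instance, bulk processing can be sketched as a loop over one processor. `extract_all` is a hypothetical helper, not part of Tahweel; it only assumes an object that responds to `#extract(file_path)`, such as an instance of this class, so the credential-reading work in the constructor happens once.

```ruby
# Hypothetical helper: runs OCR over many files with ONE processor instance.
# `processor` is anything responding to #extract(file_path), e.g. a
# Tahweel::Processors::GoogleDrive instance. Returns { path => text }.
def extract_all(processor, file_paths)
  file_paths.each_with_object({}) do |path, results|
    results[path] = processor.extract(path)
  end
end

# Usage (requires the tahweel gem and valid credentials; paths are illustrative):
#   processor = Tahweel::Processors::GoogleDrive.new  # credentials read once
#   texts = extract_all(processor, Dir["scans/*.png"])
```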
Instance Method Details
#extract(file_path) ⇒ String
Extracts text from an image file using the “Upload -> Export -> Delete” flow.
The method ensures that the temporary file created on Google Drive is deleted regardless of whether the download succeeds or fails.
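    # File 'lib/tahweel/processors/google_drive.rb', line 41
    def extract(file_path)
      raise "File not found: #{file_path}" unless File.exist?(file_path)

      begin
        # Upload the image to Drive as a Google Document.
        file_id = upload_file(file_path)
        # Export as plain text, normalize line endings, remove the
        # underscore run, and trim surrounding whitespace.
        download_text(file_id).gsub("\r\n", "\n").gsub("________________", "").strip
      ensure
        # Always remove the temporary document. If upload_file raised,
        # file_id is still nil at this point.
        delete_file(file_id)
      end
    end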
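The post-processing chain applied to the exported text can be checked in isolation. This is a minimal standalone sketch of that chain; `normalize_export` and `SEPARATOR` are hypothetical names, and `SEPARATOR` is the underscore string copied verbatim from `#extract`. Because `gsub` with a string argument removes only exact occurrences, a longer run of underscores leaves its remainder behind.

```ruby
# Hypothetical standalone copy of the cleanup chain in #extract.
# SEPARATOR is the underscore run copied verbatim from #extract.
SEPARATOR = "________________"

# Normalizes text exported from a Google Doc:
#   1. converts Windows line endings (\r\n) to \n,
#   2. removes every occurrence of SEPARATOR,
#   3. strips leading and trailing whitespace.
def normalize_export(text)
  text.gsub("\r\n", "\n").gsub(SEPARATOR, "").strip
end

normalize_export("  page text\r\n" + SEPARATOR + "\r\n") # => "page text"
```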