Module: Bidi2pdf::TestHelpers::PDFReaderUtils
- Included in: Images::Extractor
- Defined in: lib/bidi2pdf/test_helpers/pdf_reader_utils.rb
Defined Under Namespace
Modules: InstanceMethods
Class Method Summary
- .convert_data_to_io(pdf_data) ⇒ IO
  Converts various input formats into an IO object for PDF::Reader.
- .included(base) ⇒ Object
- .pdf_reader_for(pdf_data) ⇒ PDF::Reader
  Converts the input PDF data into an IO object and initializes a PDF::Reader.
- .pdf_text(pdf_data) ⇒ Array<String>, Object
  Extracts text content from a PDF document.
Class Method Details
.convert_data_to_io(pdf_data) ⇒ IO
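Converts various input formats into an IO object for PDF::Reader. The branches handle a Base64-encoded PDF string (detected by the "JVBER" prefix), a raw PDF string starting with "%PDF-", a StringIO, and a path to an existing file; any other input is wrapped in a StringIO.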
# File 'lib/bidi2pdf/test_helpers/pdf_reader_utils.rb', line 51

def convert_data_to_io(pdf_data)
  # rubocop:disable Lint/DuplicateBranch
  if pdf_data.is_a?(String) && (pdf_data.start_with?("JVBERi") || pdf_data.start_with?("JVBER"))
    StringIO.new(Base64.decode64(pdf_data))
  elsif pdf_data.start_with?("%PDF-")
    StringIO.new(pdf_data)
  elsif pdf_data.is_a?(StringIO)
    pdf_data
  elsif pdf_data.is_a?(String) && File.exist?(pdf_data)
    File.open(pdf_data, "rb")
  else
    StringIO.new(pdf_data)
  end
  # rubocop:enable Lint/DuplicateBranch
end
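Branch order matters in the listing above: the second branch calls start_with? on whatever it receives, and StringIO does not define that method, so a StringIO argument would raise NoMethodError before reaching its own branch. The following is a minimal sketch of the same normalization with every string check placed behind a type test. The name normalize_pdf_input is hypothetical and not part of Bidi2pdf, passing File objects through and raising on unknown types are assumptions of this sketch, and only the Ruby standard library is used.

```ruby
require "base64"
require "stringio"

# Hypothetical helper (not part of Bidi2pdf): normalizes PDF input to an IO.
def normalize_pdf_input(pdf_data)
  case pdf_data
  when StringIO, File
    # Already an IO-like object; hand it back untouched.
    pdf_data
  when String
    if pdf_data.start_with?("JVBER")
      # "JVBER" is how the Base64 encoding of "%PDF" begins.
      StringIO.new(Base64.decode64(pdf_data))
    elsif pdf_data.start_with?("%PDF-")
      # Raw PDF bytes held in a string.
      StringIO.new(pdf_data)
    elsif File.exist?(pdf_data)
      # Treat the string as a path and open it in binary mode.
      File.open(pdf_data, "rb")
    else
      StringIO.new(pdf_data)
    end
  else
    # Assumption of this sketch: reject anything else explicitly.
    raise ArgumentError, "unsupported PDF input: #{pdf_data.class}"
  end
end

raw = "%PDF-1.4\n%%EOF\n"
encoded = Base64.strict_encode64(raw)

puts normalize_pdf_input(encoded).read == raw
puts normalize_pdf_input(StringIO.new(raw)).read == raw
```

Dispatching on the type first keeps the string-only predicates from ever running against an IO object.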
.included(base) ⇒ Object
# File 'lib/bidi2pdf/test_helpers/pdf_reader_utils.rb', line 84

def self.included(base)
  base.include(InstanceMethods)
end
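Ruby calls the included hook whenever the module is mixed into a class, and here the hook forwards the nested InstanceMethods module into the including class. A minimal, self-contained sketch of that pattern follows. ReportUtils, SpecHelper, and page_count_label are hypothetical names used only for illustration and do not come from Bidi2pdf.

```ruby
# Hypothetical module mirroring the included-hook pattern shown above.
module ReportUtils
  module InstanceMethods
    def page_count_label(count)
      "#{count} page(s)"
    end
  end

  # Ruby invokes this hook with the class (or module) doing the include.
  def self.included(base)
    base.include(InstanceMethods)
  end
end

class SpecHelper
  include ReportUtils
end

helper = SpecHelper.new
puts helper.page_count_label(3)
puts SpecHelper.ancestors.include?(ReportUtils::InstanceMethods)
```

A single include of the outer module is enough for the including class to gain the instance-level helpers.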
.pdf_reader_for(pdf_data) ⇒ PDF::Reader
Converts the input PDF data into an IO object and initializes a PDF::Reader.
# File 'lib/bidi2pdf/test_helpers/pdf_reader_utils.rb', line 41

def pdf_reader_for(pdf_data)
  io = convert_data_to_io(pdf_data)
  PDF::Reader.new(io)
end
.pdf_text(pdf_data) ⇒ Array<String>, Object
Extracts text content from a PDF document.
This method accepts a String, StringIO, or File and returns the text of every page, one array element per page. Any other input is returned unchanged. If PDF::Reader raises MalformedPDFError, the original input is returned wrapped in a single-element array.
# File 'lib/bidi2pdf/test_helpers/pdf_reader_utils.rb', line 25

def pdf_text(pdf_data)
  return pdf_data unless pdf_data.is_a?(String) || pdf_data.is_a?(StringIO) || pdf_data.is_a?(File)

  begin
    reader = pdf_reader_for pdf_data
    reader.pages.map(&:text)
  rescue PDF::Reader::MalformedPDFError
    [pdf_data]
  end
end
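The listing above has three distinct return shapes: the untouched input for unsupported types, an array of page texts on success, and a single-element array holding the input when parsing fails. The sketch below reproduces that control flow without the pdf-reader gem. FakeMalformedError, fake_page_texts, and text_or_input are hypothetical stand-ins invented for this illustration, and the "one line per page" parsing is an assumption made purely to keep the example runnable.

```ruby
require "stringio"

# Hypothetical stand-in for PDF::Reader::MalformedPDFError.
class FakeMalformedError < StandardError; end

# Hypothetical stand-in for a PDF parser: treats each line after the
# header as the text of one page.
def fake_page_texts(io)
  data = io.read
  raise FakeMalformedError, "missing %PDF- header" unless data.start_with?("%PDF-")

  data.lines.drop(1).map(&:chomp)
end

# Mirrors the guard / parse / rescue structure of the listing above.
def text_or_input(pdf_data)
  return pdf_data unless pdf_data.is_a?(String) || pdf_data.is_a?(StringIO) || pdf_data.is_a?(File)

  io = pdf_data.is_a?(String) ? StringIO.new(pdf_data) : pdf_data
  fake_page_texts(io)
rescue FakeMalformedError
  # Parsing failed: wrap the original input in an array.
  [pdf_data]
end

p text_or_input("%PDF-1.4\nfirst page\nsecond page\n")
p text_or_input("not a pdf")
p text_or_input(:unsupported)
```

Callers that compare the result against expected text therefore need to handle both an array and a non-array value.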