Module: Mindee::Image::ImageExtractor
Defined in: lib/mindee/image/image_extractor.rb
Overview
Image extraction wrapper module.
Class Method Summary
- .attach_image_as_new_file(input_buffer, format: 'jpg') ⇒ Origami::PDF
  Attaches an image as a new page in a PdfDocument object.
- .create_extracted_image(buffer, file_name, page_id, element_id) ⇒ Object
  Generates an ExtractedImage.
- .extract_images_from_polygons(input_source, pdf_stream, page_id, polygons) ⇒ Array<Image::ExtractedImage>
  Extracts images from their positions on a file (as polygons).
- .extract_multiple_images_from_source(input_source, page_id, polygons) ⇒ Array<Image::ExtractedImage>
  Extracts multiple images from a given local input source.
- .load_input_source_pdf_page_as_stringio(input_file, page_id) ⇒ StringIO
  Loads a single page from an image file or a PDF document.
Class Method Details
.attach_image_as_new_file(input_buffer, format: 'jpg') ⇒ Origami::PDF
Attaches an image as a new page in a PdfDocument object.
    # File 'lib/mindee/image/image_extractor.rb', line 19
    def self.attach_image_as_new_file(input_buffer, format: 'jpg')
      magick_image = MiniMagick::Image.read(input_buffer)
      # NOTE: We force format consolidation to a single format to avoid frames being interpreted as the final output.
      magick_image.format(format)
      original_density = magick_image.resolution
      scale_factor = original_density[0].to_f / 4.166666 # Convert from default 300 DPI to 72.
      magick_image.format('pdf', 0, { density: scale_factor.to_s })
      Origami::PDF.read(StringIO.new(magick_image.to_blob))
    end
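The density arithmetic in the method above can be checked in isolation. The following is a minimal sketch in plain Ruby with no MiniMagick dependency; `density_for_pdf` is a hypothetical helper name introduced here for illustration, and the 300 DPI input is the default that the source comment refers to.

```ruby
# Hypothetical helper mirroring the scale-factor line in the method above.
# The divisor 4.166666 approximates 300.0 / 72.0, so a 300 DPI input
# maps to roughly 72 DPI.
def density_for_pdf(horizontal_resolution)
  horizontal_resolution.to_f / 4.166666
end

density = density_for_pdf(300)
puts density.round(3) # prints 72.0
```

Inputs with a different horizontal resolution scale by the same ratio, so under this sketch a 150 DPI image would be written at roughly 36 DPI.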
.create_extracted_image(buffer, file_name, page_id, element_id) ⇒ Object
Generates an ExtractedImage.
    # File 'lib/mindee/image/image_extractor.rb', line 92
    def self.create_extracted_image(buffer, file_name, page_id, element_id)
      buffer.rewind
      ExtractedImage.new(
        Input::Source::BytesInputSource.new(buffer.read.to_s, file_name),
        page_id,
        element_id
      )
    end
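The `buffer.rewind` call matters because a freshly written `StringIO` has its cursor at the end of the data. A small sketch using only the standard library illustrates the difference; the string `'image-bytes'` is a placeholder, not real image data.

```ruby
require 'stringio'

buffer = StringIO.new
buffer.write('image-bytes')

# After writing, the cursor sits at the end, so a full read returns an empty string.
without_rewind = buffer.read.to_s

# Rewinding moves the cursor back to the start, so the whole content is read.
buffer.rewind
with_rewind = buffer.read.to_s

puts without_rewind.empty? # prints true
puts with_rewind           # prints image-bytes
```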
.extract_images_from_polygons(input_source, pdf_stream, page_id, polygons) ⇒ Array<Image::ExtractedImage>
Extracts images from their positions on a file (as polygons).
    # File 'lib/mindee/image/image_extractor.rb', line 49
    def self.extract_images_from_polygons(input_source, pdf_stream, page_id, polygons)
      extracted_elements = []

      polygons.each_with_index do |polygon, element_id|
        polygon = ImageUtils.normalize_polygon(polygon)
        page_content = ImageUtils.read_page_content(pdf_stream)

        min_max_x = Geometry.get_min_max_x([
          polygon.top_left,
          polygon.bottom_right,
          polygon.top_right,
          polygon.bottom_left,
        ])
        min_max_y = Geometry.get_min_max_y([
          polygon.top_left,
          polygon.bottom_right,
          polygon.top_right,
          polygon.bottom_left,
        ])
        file_extension = ImageUtils.determine_file_extension(input_source)
        cropped_image = ImageUtils.crop_image(page_content, min_max_x, min_max_y)
        if file_extension == 'pdf'
          cropped_image.format('jpg')
        else
          cropped_image.format(file_extension.to_s)
        end

        buffer = StringIO.new
        ImageUtils.write_image_to_buffer(cropped_image, buffer)
        file_name = "#{input_source.filename}_page#{page_id}-#{element_id}.#{file_extension}"

        extracted_elements << create_extracted_image(buffer, file_name, page_id, element_id)
      end

      extracted_elements
    end
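The crop region in the method above is the axis-aligned bounding box of the polygon's four corners. The sketch below shows that idea in plain Ruby. `Point`, `Quad`, and `bounding_box` are hypothetical stand-ins, not the library's types, and the return shape of `Geometry.get_min_max_x` and `Geometry.get_min_max_y` is not shown in this page, so plain `[min, max]` arrays are assumed. The sample file name values are also made up.

```ruby
# Hypothetical stand-ins for the library's point and polygon types.
Point = Struct.new(:x, :y)
Quad = Struct.new(:top_left, :top_right, :bottom_right, :bottom_left)

# Returns [[min_x, max_x], [min_y, max_y]] for the four corners.
def bounding_box(quad)
  corners = [quad.top_left, quad.top_right, quad.bottom_right, quad.bottom_left]
  [corners.map(&:x).minmax, corners.map(&:y).minmax]
end

quad = Quad.new(
  Point.new(0.1, 0.2),   # top left
  Point.new(0.6, 0.25),  # top right
  Point.new(0.55, 0.7),  # bottom right
  Point.new(0.12, 0.68)  # bottom left
)
x_range, y_range = bounding_box(quad)

# File name pattern copied from the method above, filled with sample values.
source_name = 'invoice.pdf'
page_id = 0
element_id = 2
file_extension = 'pdf'
file_name = "#{source_name}_page#{page_id}-#{element_id}.#{file_extension}"

puts x_range.inspect # prints [0.1, 0.6]
puts y_range.inspect # prints [0.2, 0.7]
puts file_name       # prints invoice.pdf_page0-2.pdf
```

A slightly skewed quadrilateral still yields a rectangular crop, since only the extreme x and y values are used. Note that, per the listing, the file name is built from `file_extension` as reported for the input source, while a `'pdf'` extension causes the crop itself to be written as `jpg`.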
.extract_multiple_images_from_source(input_source, page_id, polygons) ⇒ Array<Image::ExtractedImage>
Extracts multiple images from a given local input source.
    # File 'lib/mindee/image/image_extractor.rb', line 35
    def self.extract_multiple_images_from_source(input_source, page_id, polygons)
      new_stream = load_input_source_pdf_page_as_stringio(input_source, page_id)
      new_stream.seek(0)

      extract_images_from_polygons(input_source, new_stream, page_id, polygons)
    end
.load_input_source_pdf_page_as_stringio(input_file, page_id) ⇒ StringIO
Loads a single page from an image file or a PDF document.
    # File 'lib/mindee/image/image_extractor.rb', line 106
    def self.load_input_source_pdf_page_as_stringio(input_file, page_id)
      input_file.io_stream.rewind
      if input_file.pdf?
        PDF::PDFProcessor.get_page(Origami::PDF.read(input_file.io_stream), page_id)
      else
        input_file.io_stream
      end
    end