Module: HexaPDF::DocumentUtils::Files

Included in:
HexaPDF::DocumentUtils
Defined in:
lib/hexapdf/document_utils.rb

Overview

This module provides methods for managing the file specifications of a PDF file.

Note that not every file specification in a given PDF file can be found, e.g. when a file specification is only a string. This module therefore handles only indirect file specification dictionaries that have the /Type key set.

Instance Method Details

#add_file(file_or_io, name: nil, description: nil, embed: true) ⇒ Object

:call-seq:

files.add_file(filename, name: File.basename(filename), description: nil, embed: true) -> file_spec
files.add_file(io, name:, description: nil)                      -> file_spec

Adds the file or IO to the PDF and returns the corresponding file specification object.

Options:

name

The name that should be used for the file path. This name is also used for registering the file in the EmbeddedFiles name tree.

description

A description of the file.

embed

When an IO object is given, it is always embedded and this option is ignored.

When a filename is given and this option is true, then the file is embedded. Otherwise only a reference to it is stored.

See: HexaPDF::Type::FileSpecification
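The option rules above can be sketched without a PDF document. The following is a minimal illustration only: resolve_add_file_options is a hypothetical helper, not part of HexaPDF, and it merely mirrors how the name is defaulted and how the embed decision is made for filenames versus IO objects.

```ruby
require 'stringio'

# Hypothetical helper (not part of HexaPDF): reproduces the name defaulting
# and the embed decision described in the options above.
def resolve_add_file_options(file_or_io, name: nil, embed: true)
  # A filename supplies its own default name; an IO object cannot.
  name ||= File.basename(file_or_io) if file_or_io.kind_of?(String)
  if name.nil?
    raise ArgumentError, "The name argument is mandatory when given an IO object"
  end

  # IO objects are always embedded; filenames only when embed is true.
  {name: name, embedded: embed || !file_or_io.kind_of?(String)}
end

p resolve_add_file_options("/tmp/report.csv")
p resolve_add_file_options("/tmp/report.csv", embed: false)
p resolve_add_file_options(StringIO.new("data"), name: "data.txt", embed: false)
```

In the last call the embed option is ignored because the argument is an IO object, so the result still reports the data as embedded.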



# File 'lib/hexapdf/document_utils.rb', line 144

def add_file(file_or_io, name: nil, description: nil, embed: true)
  name ||= File.basename(file_or_io) if file_or_io.kind_of?(String)
  if name.nil?
    raise ArgumentError, "The name argument is mandatory when given an IO object"
  end

  spec = @document.add(Type: :Filespec)
  spec.path = name
  spec[:Desc] = description if description
  spec.embed(file_or_io, name: name, register: true) if embed || !file_or_io.kind_of?(String)
  spec
end

#each_file(search: false) ⇒ Object

:call-seq:

files.each_file(search: false) {|file_spec| block }   -> files
files.each_file(search: false)                        -> Enumerator

Iterates over indirect file specification dictionaries of the PDF.

By default, only the file specifications in their standard locations, namely the EmbeddedFiles name tree and the page annotations, are returned. If the search option is true, all indirect objects are searched for file specification dictionaries, which is much slower.
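The default traversal order and its duplicate handling can be sketched with plain Ruby stand-ins. SpecWalker and its two arrays are hypothetical and only imitate the name tree and the annotation file specifications; no HexaPDF objects are involved. The sketch shows the Enumerator fallback and how a specification reachable from both locations is yielded only once.

```ruby
# Hypothetical stand-in (not HexaPDF code): the two arrays play the role of
# the EmbeddedFiles name tree and the file attachment annotations.
class SpecWalker
  def initialize(tree_specs, annotation_specs)
    @tree_specs = tree_specs
    @annotation_specs = annotation_specs
  end

  def each_file
    # Without a block, hand back an Enumerator over the same iteration.
    return to_enum(__method__) unless block_given?

    seen = {}
    # Name tree entries come first and are always yielded.
    @tree_specs.each do |spec|
      seen[spec] = true
      yield(spec)
    end
    # Annotation specs are yielded only if not encountered before.
    @annotation_specs.each do |spec|
      yield(spec) unless seen.key?(spec)
      seen[spec] = true
    end

    self
  end
end

walker = SpecWalker.new(%w[a.txt b.txt], %w[b.txt c.txt c.txt])
p walker.each_file.to_a
```

Returning self from the block form allows call chaining, while the block-less form composes with Enumerable methods such as map or select.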



# File 'lib/hexapdf/document_utils.rb', line 167

def each_file(search: false)
  return to_enum(__method__, search: search) unless block_given?

  if search
    @document.each(current: false) do |obj|
      yield(obj) if obj.type == :Filespec
    end
  else
    seen = {}
    tree = @document.catalog[:Names] && @document.catalog[:Names][:EmbeddedFiles]
    tree.each_entry do |_, spec|
      seen[spec] = true
      yield(spec)
    end if tree

    @document.pages.each_page do |page|
      next unless page[:Annots]
      page[:Annots].each do |annot|
        annot = @document.deref(annot)
        next unless annot[:Subtype] == :FileAttachment
        spec = @document.deref(annot[:FS])
        yield(spec) unless seen.key?(spec)
        seen[spec] = true
      end
    end
  end

  self
end