Module: HexaPDF::Task::Optimize

Defined in:
lib/hexapdf/task/optimize.rb

Overview

Task for optimizing the PDF document.

For a list of the optimization methods this task can perform, see the ::call method.

Defined Under Namespace

Classes: SerializationProcessor

Class Method Details

.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false) ⇒ Object

Optimizes the PDF document.

The field entries that are optional and set to their default value are always deleted. Additional optimization methods are performed depending on the values of the following arguments:

compact

If set to true, compacts the object space by merging all revisions and then deleting null and unused objects.

object_streams

Specifies if and how object streams should be used: For :preserve, existing object streams are preserved; for :generate objects are packed into object streams as much as possible; and for :delete existing object streams are deleted.

xref_streams

Specifies if cross-reference streams should be used. Can be :preserve (no modifications), :generate (use cross-reference streams) or :delete (remove cross-reference streams).

If object_streams is set to :generate, this option is implicitly changed to :generate.

compress_pages

Compresses the content streams of all pages if set to true. Note that this can take a very long time because each content stream has to be unfiltered, parsed, serialized and then filtered again.



# File 'lib/hexapdf/task/optimize.rb', line 72

def self.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve,
              compress_pages: false)
  if compact
    compact(doc, object_streams, xref_streams)
  elsif object_streams != :preserve
    process_object_streams(doc, object_streams, xref_streams)
  elsif xref_streams != :preserve
    process_xref_streams(doc, xref_streams)
  else
    doc.each(current: false, &method(:delete_fields_with_defaults))
  end

  compress_pages(doc) if compress_pages
end
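
The precedence among the arguments follows from the order of the branches in the listing above: compact wins over object_streams, which wins over xref_streams. A minimal sketch of that dispatch in plain Ruby, where optimize_branch is a hypothetical helper written for illustration and not part of HexaPDF:

```ruby
# Hypothetical helper (not part of HexaPDF): returns the name of the branch
# that the dispatch in ::call would take for the given options.
def optimize_branch(compact: false, object_streams: :preserve, xref_streams: :preserve)
  if compact
    :compact
  elsif object_streams != :preserve
    :process_object_streams
  elsif xref_streams != :preserve
    :process_xref_streams
  else
    :delete_fields_with_defaults
  end
end

optimize_branch                                  # => :delete_fields_with_defaults
optimize_branch(xref_streams: :generate)         # => :process_xref_streams
optimize_branch(object_streams: :delete,
                xref_streams: :generate)         # => :process_object_streams
optimize_branch(compact: true,
                object_streams: :generate)       # => :compact
```

In every branch the default-valued optional fields are still removed, and compress_pages is applied afterwards regardless of which branch ran.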

.compact(doc, object_streams, xref_streams) ⇒ Object

Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.

For the meaning of the other arguments see ::call.



# File 'lib/hexapdf/task/optimize.rb', line 91

def self.compact(doc, object_streams, xref_streams)
  doc.revisions.merge
  unused = Set.new(doc.task(:dereference))
  rev = doc.revisions.add

  oid = 1
  doc.revisions[0].each do |obj|
    if obj.null? || unused.include?(obj) || (obj.type == :ObjStm) ||
        (obj.type == :XRef && xref_streams != :preserve)
      obj.data.value = nil
      next
    end

    delete_fields_with_defaults(obj)
    obj.oid = oid
    obj.gen = 0
    rev.add(obj)
    oid += 1
  end
  doc.revisions.delete(0)

  if object_streams == :generate
    process_object_streams(doc, :generate, xref_streams)
  elsif xref_streams == :generate
    doc.add(Type: :XRef)
  end
end
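
The renumbering step can be illustrated without HexaPDF. The sketch below assumes a hypothetical PdfObject struct and a set of unused object numbers; it only models the "skip null and unused objects, then number the rest from 1 with generation 0" part, not the revision handling or the stream processing:

```ruby
require 'set'

# Hypothetical stand-in for a PDF object; only the fields needed here.
PdfObject = Struct.new(:oid, :gen, :value)

# Keeps every object that is neither null (value is nil) nor listed in
# unused_oids, and renumbers the survivors consecutively starting at 1.
def renumber(objects, unused_oids)
  next_oid = 1
  objects.each_with_object([]) do |obj, kept|
    next if obj.value.nil? || unused_oids.include?(obj.oid)

    obj.oid = next_oid
    obj.gen = 0
    kept << obj
    next_oid += 1
  end
end

objects = [
  PdfObject.new(3, 0, "catalog"),
  PdfObject.new(5, 1, nil),       # null object, dropped
  PdfObject.new(8, 0, "orphan"),  # unused object, dropped
  PdfObject.new(9, 2, "page"),
]
kept = renumber(objects, Set[8])
kept.map { |o| [o.oid, o.gen, o.value] }
# => [[1, 0, "catalog"], [2, 0, "page"]]
```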

.compress_pages(doc) ⇒ Object

Compresses the contents of all pages by parsing and then serializing again. The HexaPDF serializer is already optimized for small output size so nothing else needs to be done.



# File 'lib/hexapdf/task/optimize.rb', line 205

def self.compress_pages(doc)
  doc.pages.each do |page|
    processor = SerializationProcessor.new
    HexaPDF::Content::Parser.parse(page.contents, processor)
    page.contents = processor.result
    page[:Contents].set_filter(:FlateDecode)
  end
end
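
The FlateDecode filter set in the last step is zlib/deflate compression. As a rough illustration of its effect on a repetitive content stream, the following uses Ruby's standard Zlib module directly; the content string is made up, and HexaPDF's own filter implementation is not shown here:

```ruby
require 'zlib'

# A made-up, repetitive content stream standing in for page contents.
content = "0 0 m 100 100 l S\n" * 50

compressed = Zlib::Deflate.deflate(content)    # the kind of data a FlateDecode stream holds
restored   = Zlib::Inflate.inflate(compressed) # what a reader gets back after decoding

puts "#{content.bytesize} bytes -> #{compressed.bytesize} bytes"
puts "round trip ok: #{restored == content}"
```

The actual saving depends on the content; streams that are already compact or very short gain little.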

.delete_fields_with_defaults(obj) ⇒ Object

Deletes field entries of the object that are optional and currently set to their default value.



# File 'lib/hexapdf/task/optimize.rb', line 193

def self.delete_fields_with_defaults(obj)
  return unless obj.kind_of?(HexaPDF::Dictionary) && !obj.null?
  obj.each do |name, value|
    if (field = obj.class.field(name)) && !field.required? && field.default? &&
        value == field.default
      obj.delete(name)
    end
  end
end
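
The same rule can be sketched on a plain Hash. Field, NO_DEFAULT, and delete_defaults! below are hypothetical names introduced for illustration, and the field table is an example rather than HexaPDF's actual field definitions:

```ruby
# Hypothetical stand-in for a field definition: whether the entry is
# required and what its default is (NO_DEFAULT means "has no default").
NO_DEFAULT = Object.new.freeze

Field = Struct.new(:required, :default) do
  def required?
    required
  end

  def default?
    !default.equal?(NO_DEFAULT)
  end
end

# Removes entries that are optional, have a default, and currently equal it.
# Entries without a field definition are left untouched.
def delete_defaults!(dict, fields)
  dict.delete_if do |name, value|
    field = fields[name]
    field && !field.required? && field.default? && value == field.default
  end
end

fields = {
  Type:     Field.new(true,  :Page),
  Rotate:   Field.new(false, 0),
  UserUnit: Field.new(false, 1.0),
  Contents: Field.new(false, NO_DEFAULT),
}
page = { Type: :Page, Rotate: 0, UserUnit: 2.0, Contents: "stream", Custom: 42 }

delete_defaults!(page, fields)
page.keys # => [:Type, :UserUnit, :Contents, :Custom]
```

Only Rotate is removed: Type is required, UserUnit differs from its default, Contents has no default, and Custom has no field definition.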

.process_object_streams(doc, method, xref_streams) ⇒ Object

Processes the object streams in each revision according to method: for :preserve, nothing is done; for :delete, all object streams are deleted; and for :generate, objects are packed into object streams as much as possible.

For the meaning of the xref_streams argument see ::call.



# File 'lib/hexapdf/task/optimize.rb', line 122

def self.process_object_streams(doc, method, xref_streams)
  case method
  when :delete
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
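      rev.each do |obj|
        if obj.type == :ObjStm || (obj.type == :XRef && xref_streams == :delete)
          rev.delete(obj)
        else
          # Remember a retained cross-reference stream so that no second one is added below.
          xref_stream = true if obj.type == :XRef
          delete_fields_with_defaults(obj)
        end
      end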
      if xref_streams == :generate && !xref_stream
        doc.add({Type: :XRef}, revision: rev_index)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      count = 0
      objstms = [doc.wrap(Type: :ObjStm)]
      rev.each do |obj|
        if obj.type == :XRef
          xref_stream = true
        elsif obj.type == :ObjStm
          rev.delete(obj)
        end
        delete_fields_with_defaults(obj)

        next if obj.respond_to?(:stream)

        objstms[-1].add_object(obj)
        count += 1
        if count == 200
          objstms << doc.wrap(Type: :ObjStm)
          count = 0
        end
      end
      objstms.each {|objstm| doc.add(objstm, revision: rev_index)}
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end
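
The :generate branch packs non-stream objects into object streams of up to 200 objects each. A simplified sketch of that grouping in plain Ruby, using a hypothetical Obj struct and each_slice instead of the counter in the listing:

```ruby
# Hypothetical stand-in: stream is true for objects that carry a stream,
# which cannot be stored inside an object stream.
Obj = Struct.new(:oid, :stream)

# Groups all non-stream objects into batches of at most max_per_group.
def pack_into_groups(objects, max_per_group = 200)
  objects.reject(&:stream).each_slice(max_per_group).to_a
end

# 450 objects, of which every tenth one is a stream object.
objects = (1..450).map { |oid| Obj.new(oid, (oid % 10).zero?) }
groups = pack_into_groups(objects)
groups.map(&:size) # => [200, 200, 5]
```

One difference from the listing: there, a new object stream is started as soon as the 200th object is added, so a revision can end up with an empty trailing object stream, whereas each_slice never yields an empty group.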

.process_xref_streams(doc, method) ⇒ Object

Processes the cross-reference streams in each revision according to method: for :preserve, nothing is done; for :delete, all cross-reference streams are deleted; and for :generate, a cross-reference stream is added to each revision that does not already have one.



# File 'lib/hexapdf/task/optimize.rb', line 169

def self.process_xref_streams(doc, method)
  case method
  when :delete
    doc.each(current: false) do |obj, rev|
      if obj.type == :XRef
        rev.delete(obj)
      else
        delete_fields_with_defaults(obj)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      rev.each do |obj|
        xref_stream = true if obj.type == :XRef
        delete_fields_with_defaults(obj)
      end
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end