Module: HexaPDF::Task::Optimize
Defined in: lib/hexapdf/task/optimize.rb
Overview
Task for optimizing the PDF document.
For the list of optimization methods this task can perform, see the ::call method.
Defined Under Namespace
Classes: SerializationProcessor
Class Method Summary
- .call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false) ⇒ Object
  Optimizes the PDF document.
- .compact(doc, object_streams, xref_streams) ⇒ Object
  Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.
- .compress_pages(doc) ⇒ Object
  Compresses the contents of all pages by parsing and then serializing again.
- .delete_fields_with_defaults(obj) ⇒ Object
  Deletes field entries of the object that are optional and currently set to their default value.
- .process_object_streams(doc, method, xref_streams) ⇒ Object
  Processes the object streams in each revision according to method: for :preserve, nothing is done; for :delete, all object streams are deleted; and for :generate, objects are packed into object streams as much as possible.
- .process_xref_streams(doc, method) ⇒ Object
  Processes the cross-reference streams in each revision according to method: for :preserve, nothing is done; for :delete, all cross-reference streams are deleted; and for :generate, cross-reference streams are added.
Class Method Details
.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false) ⇒ Object
Optimizes the PDF document.
The field entries that are optional and set to their default value are always deleted. Additional optimization methods are performed depending on the values of the following arguments:
- compact
  Compacts the object space by merging the revisions and then deleting null and unused values if set to true.
- object_streams
  Specifies if and how object streams should be used: For :preserve, existing object streams are preserved; for :generate objects are packed into object streams as much as possible; and for :delete existing object streams are deleted.
- xref_streams
  Specifies if cross-reference streams should be used. Can be :preserve (no modifications), :generate (use cross-reference streams) or :delete (remove cross-reference streams).
  If object_streams is set to :generate, this option is implicitly changed to :generate.
- compress_pages
  Compresses the content streams of all pages if set to true. Note that this can take a very long time because each content stream has to be unfiltered, parsed, serialized and then filtered again.
# File 'lib/hexapdf/task/optimize.rb', line 72
def self.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve,
              compress_pages: false)
  if compact
    compact(doc, object_streams, xref_streams)
  elsif object_streams != :preserve
    process_object_streams(doc, object_streams, xref_streams)
  elsif xref_streams != :preserve
    process_xref_streams(doc, xref_streams)
  else
    doc.each(current: false, &method(:delete_fields_with_defaults))
  end

  compress_pages(doc) if compress_pages
end
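The branch order in .call means that only one of the structural passes runs per invocation, with compact taking precedence over object_streams, and object_streams over xref_streams. The following is a minimal sketch of that precedence in plain Ruby; optimize_branch is a hypothetical helper written for illustration and is not part of HexaPDF.

```ruby
# Hypothetical helper (not part of HexaPDF): reports which branch of the
# .call listing would handle a given combination of options.
def optimize_branch(compact: false, object_streams: :preserve, xref_streams: :preserve)
  if compact
    :compact
  elsif object_streams != :preserve
    :process_object_streams
  elsif xref_streams != :preserve
    :process_xref_streams
  else
    :delete_fields_with_defaults
  end
end

p optimize_branch                                  # => :delete_fields_with_defaults
p optimize_branch(xref_streams: :generate)         # => :process_xref_streams
p optimize_branch(object_streams: :delete,
                  xref_streams: :generate)         # => :process_object_streams
p optimize_branch(compact: true,
                  object_streams: :generate)       # => :compact
```

Each of these branches also removes default-valued fields along the way, which is why the final else branch only needs to do that one step.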
.compact(doc, object_streams, xref_streams) ⇒ Object
Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.
For the meaning of the other arguments see ::call.
# File 'lib/hexapdf/task/optimize.rb', line 91
def self.compact(doc, object_streams, xref_streams)
  doc.revisions.merge
  unused = Set.new(doc.task(:dereference))
  rev = doc.revisions.add

  oid = 1
  doc.revisions[0].each do |obj|
    if obj.null? || unused.include?(obj) || (obj.type == :ObjStm) ||
        (obj.type == :XRef && xref_streams != :preserve)
      obj.data.value = nil
      next
    end

    delete_fields_with_defaults(obj)
    obj.oid = oid
    obj.gen = 0
    rev.add(obj)
    oid += 1
  end
  doc.revisions.delete(0)

  if object_streams == :generate
    process_object_streams(doc, :generate, xref_streams)
  elsif xref_streams == :generate
    doc.add(Type: :XRef)
  end
end
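The filter-and-renumber loop in .compact can be illustrated without a PDF file. The sketch below is a simplified model, not HexaPDF code: FakeObject and compact_objects are hypothetical names, and the unused set is assumed to hold objects that are not reachable from the document trailer.

```ruby
require 'set'

# Hypothetical stand-in for an indirect PDF object; not a HexaPDF class.
class FakeObject
  attr_accessor :oid, :gen, :type, :value

  def initialize(oid, gen, type, value)
    @oid, @gen, @type, @value = oid, gen, type, value
  end

  def null?
    value.nil?
  end
end

# Drops null, unused, :ObjStm and (optionally) :XRef objects, then renumbers
# the survivors consecutively from 1 with generation 0.
def compact_objects(objects, unused, xref_streams: :preserve)
  next_oid = 1
  objects.each_with_object([]) do |obj, kept|
    if obj.null? || unused.include?(obj) || obj.type == :ObjStm ||
        (obj.type == :XRef && xref_streams != :preserve)
      obj.value = nil # dropped objects are nulled out
      next
    end
    obj.oid = next_oid
    obj.gen = 0
    kept << obj
    next_oid += 1
  end
end

catalog = FakeObject.new(1, 0, :Catalog, {})
deleted = FakeObject.new(2, 0, nil, nil)
orphan  = FakeObject.new(5, 1, :Font, {})
objstm  = FakeObject.new(7, 0, :ObjStm, {})
pages   = FakeObject.new(9, 2, :Pages, {})

kept = compact_objects([catalog, deleted, orphan, objstm, pages], Set[orphan])
p kept.map { |o| [o.oid, o.gen, o.type] }
# => [[1, 0, :Catalog], [2, 0, :Pages]]
```

Renumbering closes the gaps left by removed objects, so the resulting cross-reference data covers a dense range of object numbers.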
.compress_pages(doc) ⇒ Object
Compresses the contents of all pages by parsing and then serializing again. The HexaPDF serializer is already optimized for small output size so nothing else needs to be done.
# File 'lib/hexapdf/task/optimize.rb', line 205
def self.compress_pages(doc)
  doc.pages.each do |page|
    processor = SerializationProcessor.new
    HexaPDF::Content::Parser.parse(page.contents, processor)
    page.contents = processor.result
    page[:Contents].set_filter(:FlateDecode)
  end
end
.delete_fields_with_defaults(obj) ⇒ Object
Deletes field entries of the object that are optional and currently set to their default value.
# File 'lib/hexapdf/task/optimize.rb', line 193
def self.delete_fields_with_defaults(obj)
  return unless obj.kind_of?(HexaPDF::Dictionary) && !obj.null?
  obj.each do |name, value|
    if (field = obj.class.field(name)) && !field.required? && field.default? &&
        value == field.default
      obj.delete(name)
    end
  end
end
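To make the deletion rule concrete, here is a small sketch that applies the same condition to a plain Hash. The Field struct, the PAGE_FIELDS schema and the strip_default_fields helper are assumptions made for illustration; HexaPDF's real field definitions carry more information. The sketch uses Hash#delete_if so that the hash is not modified while it is being iterated with each.

```ruby
# Hypothetical field description; only the three predicates used by the
# deletion rule are modelled.
Field = Struct.new(:required, :default) do
  def required?
    required
  end

  def default?
    !default.nil?
  end
end

# Assumed schema for illustration only.
PAGE_FIELDS = {
  Type: Field.new(true, :Page),     # required: never removed
  Rotate: Field.new(false, 0),      # optional with a default
  UserUnit: Field.new(false, 1.0),  # optional with a default
  Contents: Field.new(false, nil),  # optional without a default
}.freeze

# Removes entries that are optional, have a default, and equal that default.
def strip_default_fields(hash, fields)
  hash.delete_if do |name, value|
    field = fields[name]
    field && !field.required? && field.default? && value == field.default
  end
end

page = {Type: :Page, Rotate: 0, UserUnit: 2.0, Contents: "q Q"}
strip_default_fields(page, PAGE_FIELDS)
p page.keys
# => [:Type, :UserUnit, :Contents]
```

Rotate is removed because a reader would assume 0 anyway, while Type survives despite matching its default because it is required.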
.process_object_streams(doc, method, xref_streams) ⇒ Object
Processes the object streams in each revision according to method: for :preserve, nothing is done; for :delete, all object streams are deleted; and for :generate, objects are packed into object streams as much as possible.
# File 'lib/hexapdf/task/optimize.rb', line 122
def self.process_object_streams(doc, method, xref_streams)
  case method
  when :delete
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      rev.each do |obj|
        if obj.type == :ObjStm || (obj.type == :XRef && xref_streams == :delete)
          rev.delete(obj)
        else
          delete_fields_with_defaults(obj)
        end
      end
      if xref_streams == :generate && !xref_stream
        doc.add({Type: :XRef}, revision: rev_index)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      count = 0
      objstms = [doc.wrap(Type: :ObjStm)]
      rev.each do |obj|
        if obj.type == :XRef
          xref_stream = true
        elsif obj.type == :ObjStm
          rev.delete(obj)
        end

        delete_fields_with_defaults(obj)
        next if obj.respond_to?(:stream)

        objstms[-1].add_object(obj)
        count += 1
        if count == 200
          objstms << doc.wrap(Type: :ObjStm)
          count = 0
        end
      end
      objstms.each {|objstm| doc.add(objstm, revision: rev_index)}
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end
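The :generate branch packs objects into batches of 200 and leaves stream objects outside, since the listing skips anything that responds to stream. The sketch below models only that batching step with plain arrays; pack_into_batches and the block used as a stream predicate are hypothetical and stand in for the HexaPDF object stream calls.

```ruby
OBJECTS_PER_STREAM = 200 # batch size used in the listing

# Groups objects into batches of at most `limit` entries. The block is a
# hypothetical predicate that returns true for objects that must stay
# outside object streams (stream objects in the listing).
def pack_into_batches(objects, limit: OBJECTS_PER_STREAM)
  batches = [[]]
  objects.each do |obj|
    next if yield(obj)
    batches[-1] << obj
    batches << [] if batches[-1].size == limit # start a fresh batch when full
  end
  batches
end

# 450 objects, of which every tenth one is treated as a stream object.
batches = pack_into_batches((1..450).to_a) { |n| (n % 10).zero? }
p batches.map(&:size)
# => [200, 200, 5]
```

As in the listing, a new batch is opened as soon as the current one is full, so an input whose packable object count is an exact multiple of the limit ends with an empty trailing batch in this model.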
.process_xref_streams(doc, method) ⇒ Object
Processes the cross-reference streams in each revision according to method: for :preserve, nothing is done; for :delete, all cross-reference streams are deleted; and for :generate, cross-reference streams are added.
# File 'lib/hexapdf/task/optimize.rb', line 169
def self.process_xref_streams(doc, method)
  case method
  when :delete
    doc.each(current: false) do |obj, rev|
      if obj.type == :XRef
        rev.delete(obj)
      else
        delete_fields_with_defaults(obj)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      rev.each do |obj|
        xref_stream = true if obj.type == :XRef
        delete_fields_with_defaults(obj)
      end
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end
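The per-revision effect of the three modes can be sketched with revisions modelled as arrays of object type symbols. This is an illustrative model only: process_xref_types is a hypothetical function, and real revisions hold full objects rather than symbols.

```ruby
# Hypothetical model: each revision is an array of object types.
# :delete removes every :XRef entry; :generate appends one :XRef entry to
# each revision that lacks it; any other value leaves revisions unchanged.
def process_xref_types(revisions, method)
  case method
  when :delete
    revisions.map { |types| types.reject { |t| t == :XRef } }
  when :generate
    revisions.map { |types| types.include?(:XRef) ? types.dup : types + [:XRef] }
  else
    revisions
  end
end

revisions = [[:Catalog, :Pages], [:Page, :XRef]]
p process_xref_types(revisions, :generate)
# => [[:Catalog, :Pages, :XRef], [:Page, :XRef]]
p process_xref_types(revisions, :delete)
# => [[:Catalog, :Pages], [:Page]]
```

In :generate mode a revision that already has a cross-reference stream is left alone, which matches the unless xref_stream guard in the listing.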