Module: HexaPDF::Task::Optimize
Defined in: lib/hexapdf/task/optimize.rb
Overview
Task for optimizing the PDF document.
For a list of the optimization methods this task can perform, see the ::call method.
Defined Under Namespace
Classes: SerializationProcessor
Class Method Summary
- .call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false) ⇒ Object
  Optimizes the PDF document.
- .compact(doc, object_streams, xref_streams) ⇒ Object
  Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.
- .compress_pages(doc) ⇒ Object
  Compresses the contents of all pages by parsing and then serializing again.
- .delete_fields_with_defaults(obj) ⇒ Object
  Deletes field entries of the object that are optional and currently set to their default value.
- .process_object_streams(doc, method, xref_streams) ⇒ Object
  Processes the object streams in each revision according to method: for :preserve, nothing is done; for :delete, all object streams are deleted; and for :generate, objects are packed into object streams as much as possible.
- .process_xref_streams(doc, method) ⇒ Object
  Processes the cross-reference streams in each revision according to method: for :preserve, nothing is done; for :delete, all cross-reference streams are deleted; and for :generate, cross-reference streams are added.
Class Method Details
.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve, compress_pages: false) ⇒ Object
Optimizes the PDF document.
The field entries that are optional and set to their default value are always deleted. Additional optimization methods are performed depending on the values of the following arguments:
- compact
  Compacts the object space by merging the revisions and then deleting null and unused values if set to true.
- object_streams
  Specifies if and how object streams should be used: for :preserve, existing object streams are preserved; for :generate, objects are packed into object streams as much as possible; and for :delete, existing object streams are deleted.
- xref_streams
  Specifies if cross-reference streams should be used. Can be :preserve (no modifications), :generate (use cross-reference streams) or :delete (remove cross-reference streams).
  If object_streams is set to :generate, this option is implicitly changed to :generate.
- compress_pages
  Compresses the content streams of all pages if set to true. Note that this can take a very long time because each content stream has to be unfiltered, parsed, serialized and then filtered again.
# File 'lib/hexapdf/task/optimize.rb', line 75
def self.call(doc, compact: false, object_streams: :preserve, xref_streams: :preserve,
              compress_pages: false)
  if compact
    compact(doc, object_streams, xref_streams)
  elsif object_streams != :preserve
    process_object_streams(doc, object_streams, xref_streams)
  elsif xref_streams != :preserve
    process_xref_streams(doc, xref_streams)
  else
    doc.each(only_current: false, &method(:delete_fields_with_defaults))
  end

  compress_pages(doc) if compress_pages
end
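The branch order in the listing above decides which single structural step runs for a given combination of options. The following is a minimal sketch of that dispatch in plain Ruby. The helper name optimization_steps is hypothetical and not part of HexaPDF; it only returns the names of the steps instead of running them.

```ruby
# Hypothetical helper (not part of HexaPDF): lists the steps that .call
# would run for a given set of options, mirroring its if/elsif chain.
def optimization_steps(compact: false, object_streams: :preserve,
                       xref_streams: :preserve, compress_pages: false)
  steps =
    if compact
      [:compact]
    elsif object_streams != :preserve
      [:process_object_streams]
    elsif xref_streams != :preserve
      [:process_xref_streams]
    else
      [:delete_fields_with_defaults]
    end
  # Page compression is independent of the branch chosen above.
  steps << :compress_pages if compress_pages
  steps
end

p optimization_steps(object_streams: :generate, xref_streams: :delete)
# prints [:process_object_streams]
p optimization_steps(compact: true, compress_pages: true)
# prints [:compact, :compress_pages]
```

Only one of the first four steps is ever chosen; the xref_streams value is passed on to .compact and .process_object_streams, which handle cross-reference streams themselves.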
.compact(doc, object_streams, xref_streams) ⇒ Object
Compacts the document by merging all revisions into one, deleting null and unused entries and renumbering the objects.
For the meaning of the other arguments see ::call.
# File 'lib/hexapdf/task/optimize.rb', line 94
def self.compact(doc, object_streams, xref_streams)
  doc.revisions.merge
  unused = Set.new(doc.task(:dereference))
  rev = doc.revisions.add

  oid = 1
  doc.revisions[0].each do |obj|
    if obj.null? || unused.include?(obj) || (obj.type == :ObjStm) ||
        (obj.type == :XRef && xref_streams != :preserve)
      obj.data.value = nil
      next
    end

    delete_fields_with_defaults(obj)
    obj.oid = oid
    obj.gen = 0
    rev.add(obj)
    oid += 1
  end
  doc.revisions.delete(0)

  if object_streams == :generate
    process_object_streams(doc, :generate, xref_streams)
  elsif xref_streams == :generate
    doc.add({Type: :XRef})
  end
end
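The filter-and-renumber loop at the heart of .compact can be illustrated without a PDF document. The sketch below is a simplified model, not HexaPDF code: the Obj class and the compact_objects helper are hypothetical stand-ins, and the unused set plays the role of the result of the dereference task.

```ruby
require 'set'

# Hypothetical stand-in for a PDF object; HexaPDF's real objects are richer.
class Obj
  attr_accessor :oid, :gen, :type, :value

  def initialize(oid, gen, type, value)
    @oid = oid
    @gen = gen
    @type = type
    @value = value
  end

  def null?
    value.nil?
  end
end

# Keeps only live objects and renumbers them 1, 2, 3, ... with generation 0.
# Null objects, unused objects and object streams are always dropped;
# cross-reference streams are dropped unless xref_streams is :preserve.
def compact_objects(objects, unused:, xref_streams: :preserve)
  next_oid = 1
  objects.each_with_object([]) do |obj, kept|
    drop = obj.null? || unused.include?(obj) || obj.type == :ObjStm ||
           (obj.type == :XRef && xref_streams != :preserve)
    next if drop

    obj.oid = next_oid
    obj.gen = 0
    kept << obj
    next_oid += 1
  end
end

objects = [
  Obj.new(4, 0, :Catalog, {}),
  Obj.new(7, 1, nil, nil),      # null object
  Obj.new(9, 0, :ObjStm, {}),   # object stream
  Obj.new(12, 2, :Page, {}),
  Obj.new(15, 0, :Font, {}),    # marked as unused below
]
kept = compact_objects(objects, unused: Set[objects[4]])
p kept.map { |o| [o.oid, o.gen, o.type] }
# prints [[1, 0, :Catalog], [2, 0, :Page]]
```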
.compress_pages(doc) ⇒ Object
Compresses the contents of all pages by parsing and then serializing again. The HexaPDF serializer is already optimized for small output size so nothing else needs to be done.
# File 'lib/hexapdf/task/optimize.rb', line 217
def self.compress_pages(doc)
  doc.pages.each do |page|
    processor = SerializationProcessor.new
    HexaPDF::Content::Parser.parse(page.contents, processor)
    page.contents = processor.result
    page[:Contents].set_filter(:FlateDecode)
  end
end
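The unfilter, parse, serialize and filter round trip can be sketched with Ruby's standard zlib library. This is a toy illustration only: a whitespace tokenizer stands in for HexaPDF's content stream parser and SerializationProcessor, the helper name recompress is hypothetical, and the sketch assumes the content contains no string literals or inline images (which a whitespace split would corrupt).

```ruby
require 'zlib'

# Toy re-serializer, NOT HexaPDF's implementation: splits the content on
# whitespace and joins the tokens with single spaces before deflating again.
def recompress(deflated)
  raw = Zlib::Inflate.inflate(deflated)      # unfilter
  tokens = raw.split                         # stand-in for parsing
  Zlib::Deflate.deflate(tokens.join(' '))    # serialize, then filter
end

original = "q\n  1   0 0   1 72   720 cm\n\n  0.5   g\n  0 0 100   100 re\n  f\nQ\n"
packed = recompress(Zlib::Deflate.deflate(original))
puts Zlib::Inflate.inflate(packed)
# prints q 1 0 0 1 72 720 cm 0.5 g 0 0 100 100 re f Q
```

The example shows why the real task can be slow: every content stream is fully decoded and rebuilt, even when the serialized result is barely smaller.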
.delete_fields_with_defaults(obj) ⇒ Object
Deletes field entries of the object that are optional and currently set to their default value.
# File 'lib/hexapdf/task/optimize.rb', line 205
def self.delete_fields_with_defaults(obj)
  return unless obj.kind_of?(HexaPDF::Dictionary) && !obj.null?
  obj.each do |name, value|
    if (field = obj.class.field(name)) && !field.required? && field.default? &&
        value == field.default
      obj.delete(name)
    end
  end
end
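The deletion rule can be tried out on a plain Hash. In the sketch below, the Field struct and the delete_defaults helper are hypothetical; the struct only imitates the required?, default? and default methods used in the listing above, and the field definitions are illustrative rather than HexaPDF's actual ones.

```ruby
# Hypothetical field description imitating the methods used by the task.
Field = Struct.new(:required, :default_set, :default) do
  def required?
    required
  end

  def default?
    default_set
  end
end

# Removes every entry that has a field definition, is optional, has a
# default, and currently equals that default. Entries without a field
# definition are left alone.
def delete_defaults(dict, fields)
  dict.delete_if do |name, value|
    field = fields[name]
    field && !field.required? && field.default? && value == field.default
  end
end

# Illustrative definitions, not taken from HexaPDF.
fields = {
  Type: Field.new(true, true, :Page),     # required: never deleted
  Rotate: Field.new(false, true, 0),      # optional with default 0
  UserUnit: Field.new(false, true, 1.0),  # optional with default 1.0
}
dict = {Type: :Page, Rotate: 0, UserUnit: 2.0, Custom: 0}
delete_defaults(dict, fields)
p dict.keys
# prints [:Type, :UserUnit, :Custom]
```

Rotate is removed because it equals its default; UserUnit stays because its value differs, and Custom stays because no field definition exists for it.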
.process_object_streams(doc, method, xref_streams) ⇒ Object
Processes the object streams in each revision according to method: for :preserve, nothing is done; for :delete, all object streams are deleted; and for :generate, objects are packed into object streams as much as possible.
# File 'lib/hexapdf/task/optimize.rb', line 125
def self.process_object_streams(doc, method, xref_streams)
  case method
  when :delete
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      objects_to_delete = []
      rev.each do |obj|
        case obj.type
        when :ObjStm
          objects_to_delete << obj
        when :XRef
          xref_stream = true
          objects_to_delete << obj if xref_streams == :delete
        else
          delete_fields_with_defaults(obj)
        end
      end
      objects_to_delete.each {|obj| rev.delete(obj) }
      if xref_streams == :generate && !xref_stream
        doc.add({Type: :XRef}, revision: rev_index)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      count = 0
      objstms = [doc.wrap({Type: :ObjStm})]
      old_objstms = []
      rev.each do |obj|
        case obj.type
        when :XRef
          xref_stream = true
        when :ObjStm
          old_objstms << obj
        end
        delete_fields_with_defaults(obj)

        next if obj.respond_to?(:stream)

        objstms[-1].add_object(obj)
        count += 1
        if count == 200
          objstms << doc.wrap({Type: :ObjStm})
          count = 0
        end
      end
      old_objstms.each {|objstm| rev.delete(objstm) }
      objstms.each {|objstm| doc.add(objstm, revision: rev_index) }
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end
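The packing performed for :generate amounts to grouping all non-stream objects into batches of at most 200. A minimal sketch of that grouping follows; the Item struct and the pack_into_object_streams helper are hypothetical, and a boolean flag replaces the respond_to?(:stream) check of the listing above.

```ruby
# Hypothetical object stand-in: `stream` marks objects that carry a stream
# and therefore cannot be stored inside an object stream.
Item = Struct.new(:oid, :stream)

# Groups non-stream objects into batches of at most `limit` entries; each
# batch stands for the contents of one object stream.
def pack_into_object_streams(objects, limit: 200)
  objects.reject(&:stream).each_slice(limit).to_a
end

# 450 objects, of which every tenth one (45 in total) carries a stream.
objects = (1..450).map { |oid| Item.new(oid, (oid % 10).zero?) }
batches = pack_into_object_streams(objects)
p batches.map(&:size)
# prints [200, 200, 5]
```

One difference to the listing is worth noting: the listing opens a fresh object stream as soon as the current one holds 200 objects, so it can end with an empty object stream, whereas each_slice never yields an empty batch.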
.process_xref_streams(doc, method) ⇒ Object
Processes the cross-reference streams in each revision according to method: for :preserve, nothing is done; for :delete, all cross-reference streams are deleted; and for :generate, cross-reference streams are added.
# File 'lib/hexapdf/task/optimize.rb', line 181
def self.process_xref_streams(doc, method)
  case method
  when :delete
    doc.each(only_current: false) do |obj, rev|
      if obj.type == :XRef
        rev.delete(obj)
      else
        delete_fields_with_defaults(obj)
      end
    end
  when :generate
    doc.revisions.each_with_index do |rev, rev_index|
      xref_stream = false
      rev.each do |obj|
        xref_stream = true if obj.type == :XRef
        delete_fields_with_defaults(obj)
      end
      doc.add({Type: :XRef}, revision: rev_index) unless xref_stream
    end
  end
end
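The per-revision behaviour of the three modes can be modelled with plain arrays. In this sketch each revision is just an array of object type symbols; the helper name process_xref is hypothetical, and the removal of default-valued fields that the real method also performs is left out.

```ruby
# Hypothetical model: a revision is an array of object types.
# :delete removes all :XRef entries, :generate ensures exactly the
# revisions lacking an :XRef entry get one, anything else is a no-op.
def process_xref(revisions, method)
  case method
  when :delete
    revisions.map { |rev| rev.reject { |type| type == :XRef } }
  when :generate
    revisions.map { |rev| rev.include?(:XRef) ? rev.dup : rev + [:XRef] }
  else
    revisions
  end
end

revisions = [[:Catalog, :Page], [:Page, :XRef]]
p process_xref(revisions, :generate)
# prints [[:Catalog, :Page, :XRef], [:Page, :XRef]]
p process_xref(revisions, :delete)
# prints [[:Catalog, :Page], [:Page]]
```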