Class: HexaPDF::Serializer

Inherits:
Object
Defined in:
lib/hexapdf/serializer.rb

Overview

Knows how to serialize Ruby objects for a PDF file.

For normal serialization purposes, the #serialize or #serialize_to_io methods should be used. However, if the type of the object to be serialized is known, a specialized serialization method like #serialize_float can be used.

Additionally, an object for encrypting strings and streams while serializing can be set via the #encrypter= method. The assigned object has to respond to #encrypt_string(str, ind_obj) (where the string is part of the indirect object; returns the encrypted string) and #encrypt_stream(stream) (returns a fiber that represents the encrypted stream).
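The encrypter is duck-typed, so any object that responds to the two methods named above can be assigned via #encrypter=. The sketch below only illustrates the expected method shapes. XorEncrypter is a hypothetical class, and its XOR "cipher" is deliberately insecure. To keep the sketch simple, #encrypt_stream assumes the stream data is a plain String rather than a HexaPDF stream object, and it assumes a chunk-yielding fiber convention (encrypted chunks are yielded, then nil is returned).

```ruby
# Hypothetical encrypter illustrating the duck-typed interface.
# XOR with a fixed byte is NOT real encryption; it only keeps the sketch short.
class XorEncrypter
  def initialize(key = 0x2A)
    @key = key
  end

  # Returns the "encrypted" string. A real PDF encrypter would derive a
  # per-object key from ind_obj; this sketch ignores it.
  def encrypt_string(str, _ind_obj)
    str.bytes.map { |byte| byte ^ @key }.pack("C*")
  end

  # Assumption for this sketch: +stream+ is a String holding the raw data.
  # The fiber yields encrypted chunks and returns nil when it is done.
  def encrypt_stream(stream, chunk_size = 4)
    Fiber.new do
      (0...stream.bytesize).step(chunk_size) do |offset|
        Fiber.yield(encrypt_string(stream.byteslice(offset, chunk_size), nil))
      end
      nil
    end
  end
end

encrypter = XorEncrypter.new
fiber = encrypter.encrypt_stream("stream data")
chunks = []
while fiber.alive?
  chunk = fiber.resume
  chunks << chunk if chunk
end
chunks.join == encrypter.encrypt_string("stream data", nil) # => true
```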

How This Class Works

The main public interface consists of the #serialize and #serialize_to_io methods: #serialize accepts an object and returns its serialized form, while #serialize_to_io writes that serialized form to an IO. During serialization, the object is accessible to the individual serialization methods via the @object instance variable (useful if the object is a composed object).

Internally, the #__serialize method is used for invoking the correct serialization method based on the class of a given object. It is also used for serializing individual parts of a composed object.

Therefore the serializer contains one serialization method for each class it needs to serialize. The naming scheme of these methods is based on the class name: The full class name is converted to lowercase, the namespace separator ‘::’ is replaced with a single underscore and the string “serialize_” is then prepended.

Examples:

NilClass                 => serialize_nilclass
TrueClass                => serialize_trueclass
HexaPDF::Object          => serialize_hexapdf_object
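The naming scheme can be expressed as a one-line helper. serialization_method_name is a hypothetical name used only for illustration; HexaPDF performs this conversion inline in its constructor.

```ruby
# Hypothetical helper: "serialize_" + lowercased class name, with "::"
# replaced by a single underscore.
def serialization_method_name(klass)
  "serialize_" + klass.name.downcase.gsub("::", "_")
end

serialization_method_name(NilClass)            # => "serialize_nilclass"
serialization_method_name(TrueClass)           # => "serialize_trueclass"
serialization_method_name(Encoding::Converter) # => "serialize_encoding_converter"
```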

If no serialization method for a specific class is found, the method names derived from its ancestor classes are tried in order, and the first one that exists is used.

See: PDF1.7 s7.3

Constant Summary

NAME_SUBSTS =

NAME_REGEXP matches all characters of a name that need to be escaped, and the NAME_SUBSTS hash maps each of these characters to its escaped form (a '#' followed by two hexadecimal digits).

See PDF1.7 s7.3.5

{}
NAME_REGEXP =

/[^!-~&&[^##{Regexp.escape(Tokenizer::DELIMITER)}#{Regexp.escape(Tokenizer::WHITESPACE)}]]/
NAME_CACHE =

Utils::LRUCache.new(1000)
BYTE_IS_DELIMITER =

{40 => true, 47 => true, 60 => true, 91 => true, # :nodoc:
41 => true, 62 => true, 93 => true}.freeze
STRING_ESCAPE_MAP =

{"(" => "\\(", ")" => "\\)", "\\" => "\\\\", "\r" => "\\r"}.freeze

Constructor Details

#initialize ⇒ Serializer

Creates a new Serializer object.



# File 'lib/hexapdf/serializer.rb', line 90

def initialize
  @dispatcher = {
    Hash => 'serialize_hash',
    Array => 'serialize_array',
    Symbol => 'serialize_symbol',
    String => 'serialize_string',
    Integer => 'serialize_integer',
    Float => 'serialize_float',
    Time => 'serialize_time',
    TrueClass => 'serialize_trueclass',
    FalseClass => 'serialize_falseclass',
    NilClass => 'serialize_nilclass',
    HexaPDF::Reference => 'serialize_hexapdf_reference',
    HexaPDF::Object => 'serialize_hexapdf_object',
    HexaPDF::Stream => 'serialize_hexapdf_stream',
    HexaPDF::Dictionary => 'serialize_hexapdf_object',
    HexaPDF::PDFArray => 'serialize_hexapdf_object',
    HexaPDF::Rectangle => 'serialize_hexapdf_object',
  }
  @dispatcher.default_proc = lambda do |h, klass|
    h[klass] = if klass <= HexaPDF::Stream
                 "serialize_hexapdf_stream"
               elsif klass <= HexaPDF::Object
                 "serialize_hexapdf_object"
               else
                 method = nil
                 klass.ancestors.each do |ancestor_klass|
                   name = ancestor_klass.name.to_s.downcase
                   name.gsub!(/::/, '_')
                   method = "serialize_#{name}"
                   break if respond_to?(method, true)
                 end
                 method
               end
  end
  @encrypter = false
  @io = nil
  @object = nil
  @in_object = false
end
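The constructor combines a precomputed dispatch table with a Hash#default_proc that resolves unknown classes through their ancestors and caches the answer. The following self-contained sketch reproduces that pattern on a toy scale. MiniSerializer and Label are hypothetical, only two serialization methods exist, and unlike the code above it raises an ArgumentError when no method matches instead of keeping the last name it tried.

```ruby
class MiniSerializer
  def initialize
    # Precomputed entries, like the literal hash in HexaPDF's constructor.
    @dispatcher = { Integer => "serialize_integer" }
    # Unknown class: try "serialize_<ancestor>" names in ancestor order and
    # store the first existing one (or nil) so the walk happens only once.
    @dispatcher.default_proc = lambda do |hash, klass|
      hash[klass] = klass.ancestors
                         .map { |anc| "serialize_#{anc.name.to_s.downcase.gsub('::', '_')}" }
                         .find { |name| respond_to?(name, true) }
    end
  end

  def serialize(obj)
    method = @dispatcher[obj.class]
    raise ArgumentError, "no serialization method for #{obj.class}" unless method
    send(method, obj)
  end

  # True once a lookup result for +klass+ has been stored.
  def cached?(klass)
    @dispatcher.key?(klass)
  end

  private

  def serialize_integer(obj)
    obj.to_s
  end

  def serialize_string(obj)
    "(#{obj})"
  end
end

class Label < String; end # hypothetical String subclass

s = MiniSerializer.new
s.serialize(42)             # => "42"
s.cached?(Label)            # => false
s.serialize(Label.new("x")) # => "(x)", found via the String ancestor
s.cached?(Label)            # => true
```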

Instance Attribute Details

#encrypter ⇒ Object

The encrypter to use for encrypting strings and streams. If it is nil or false, strings and streams are not encrypted.

Default: false (set by #initialize)



# File 'lib/hexapdf/serializer.rb', line 87

def encrypter
  @encrypter
end

Instance Method Details

#serialize(obj) ⇒ Object

Returns the serialized form of the given object.

For developers: While the object is serialized, methods can use the instance variable @object to access it (useful if it is a composed object).



# File 'lib/hexapdf/serializer.rb', line 135

def serialize(obj)
  @object = obj
  __serialize(obj)
ensure
  @object = nil
end

#serialize_array(obj) ⇒ Object

Serializes an Array object.

See: PDF1.7 s7.3.6



# File 'lib/hexapdf/serializer.rb', line 226

def serialize_array(obj)
  str = +"["
  index = 0
  while index < obj.size
    tmp = __serialize(obj[index])
    str << " " unless BYTE_IS_DELIMITER[tmp.getbyte(0)] ||
      BYTE_IS_DELIMITER[str.getbyte(-1)]
    str << tmp
    index += 1
  end
  str << "]"
end
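The spacing rule in #serialize_array can be tried out in isolation. The hypothetical join_pdf_array helper below takes tokens that are already serialized and inserts a space only when neither the previous byte nor the next token's first byte is a delimiter, using the same byte values as BYTE_IS_DELIMITER.

```ruby
# Same byte values as BYTE_IS_DELIMITER: ( ) / < > [ ]
ARRAY_DELIMITER_BYTES = [40, 41, 47, 60, 62, 91, 93].freeze

# Hypothetical helper: joins already-serialized PDF tokens into an array,
# inserting a space only where two tokens would otherwise run together.
def join_pdf_array(tokens)
  str = +"["
  tokens.each do |tmp|
    str << " " unless ARRAY_DELIMITER_BYTES.include?(tmp.getbyte(0)) ||
                      ARRAY_DELIMITER_BYTES.include?(str.getbyte(-1))
    str << tmp
  end
  str << "]"
end

join_pdf_array(["1", "2", "/Name", "(text)", "true"]) # => "[1 2/Name(text)true]"
```

Only "1" and "2" need a separating space; every other boundary already has a delimiter on one side.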

#serialize_date(obj) ⇒ Object

Serializes a Date object by converting it with #to_time and passing the result to #serialize_time.

See: #serialize_time



# File 'lib/hexapdf/serializer.rb', line 291

def serialize_date(obj)
  serialize_time(obj.to_time)
end

#serialize_datetime(obj) ⇒ Object

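Serializes a DateTime object by converting it with #to_time and passing the result to #serialize_time.

See: #serialize_time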



# File 'lib/hexapdf/serializer.rb', line 296

def serialize_datetime(obj)
  serialize_time(obj.to_time)
end

#serialize_falseclass(_obj) ⇒ Object

Serializes the false value.

See: PDF1.7 s7.3.2



# File 'lib/hexapdf/serializer.rb', line 169

def serialize_falseclass(_obj)
  "false"
end

#serialize_float(obj) ⇒ Object

Serializes a Float object.

See: PDF1.7 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 193

def serialize_float(obj)
  -0.0001 < obj && obj < 0.0001 && obj != 0 ? sprintf("%.6f", obj) : obj.round(6).to_s
end
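The range check in #serialize_float appears to exist because Ruby's Float#to_s switches to exponent notation for small magnitudes, and PDF1.7 s7.3.3 does not allow numbers in exponential format. This rationale is inferred from the code rather than stated in it. The standalone copy below (pdf_float is an illustrative name) makes the difference visible:

```ruby
# Standalone copy of the serialize_float expression shown above.
def pdf_float(obj)
  -0.0001 < obj && obj < 0.0001 && obj != 0 ? sprintf("%.6f", obj) : obj.round(6).to_s
end

0.00001.to_s       # => "1.0e-05" (exponent form, not a valid PDF number)
pdf_float(0.00001) # => "0.000010"
pdf_float(1.0 / 3) # => "0.333333"
pdf_float(1.5)     # => "1.5"
pdf_float(0.0)     # => "0.0"
```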

#serialize_hash(obj) ⇒ Object

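Serializes a Hash object (i.e. a PDF dictionary object). Entries whose value is nil, or an object whose #null? method returns true, are omitted, since a dictionary entry with a null value is treated as if it did not exist.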

See: PDF1.7 s7.3.7



# File 'lib/hexapdf/serializer.rb', line 242

def serialize_hash(obj)
  str = +"<<"
  obj.each do |k, v|
    next if v.nil? || (v.respond_to?(:null?) && v.null?)
    str << serialize_symbol(k)
    tmp = __serialize(v)
    str << " " unless BYTE_IS_DELIMITER[tmp.getbyte(0)] ||
      BYTE_IS_DELIMITER[str.getbyte(-1)]
    str << tmp
  end
  str << ">>"
end
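Dictionary serialization follows the same spacing rule as arrays and additionally drops entries whose value is nil. A hypothetical, much reduced version that handles only Symbol keys and Integer, Symbol or nil values (and performs no name escaping) looks like this:

```ruby
# Same byte values as BYTE_IS_DELIMITER: ( ) / < > [ ]
DICT_DELIMITER_BYTES = [40, 41, 47, 60, 62, 91, 93].freeze

# Minimal value serializer for this sketch: Symbols become names, everything
# else uses #to_s. No name escaping is performed.
def pdf_value(value)
  value.is_a?(Symbol) ? "/#{value}" : value.to_s
end

def pdf_dict(hash)
  str = +"<<"
  hash.each do |key, value|
    next if value.nil? # a null value is the same as a missing entry
    str << "/#{key}"
    tmp = pdf_value(value)
    str << " " unless DICT_DELIMITER_BYTES.include?(tmp.getbyte(0)) ||
                      DICT_DELIMITER_BYTES.include?(str.getbyte(-1))
    str << tmp
  end
  str << ">>"
end

pdf_dict({Type: :Page, Rotate: 90, Parent: nil}) # => "<</Type/Page/Rotate 90>>"
```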

#serialize_integer(obj) ⇒ Object

Serializes an Integer object.

See: PDF1.7 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 186

def serialize_integer(obj)
  obj.to_s
end

#serialize_nilclass(_obj) ⇒ Object

Serializes the nil value.

See: PDF1.7 s7.3.9



# File 'lib/hexapdf/serializer.rb', line 155

def serialize_nilclass(_obj)
  "null"
end

#serialize_numeric(obj) ⇒ Object

Serializes a Numeric object (either Integer or Float).

This method should be used for cases where it is known that the object is either an Integer or a Float.

See: PDF1.7 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 179

def serialize_numeric(obj)
  obj.kind_of?(Integer) ? obj.to_s : serialize_float(obj)
end

#serialize_string(obj) ⇒ Object

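Serializes a String object.

If an encrypter is set and the object currently being serialized is an indirect HexaPDF::Object, the string is encrypted first. Otherwise, a non-binary string containing characters other than printable ASCII, tab, carriage return and line feed is converted to UTF-16BE and prefixed with a byte order mark. Finally, parentheses, backslashes and carriage returns are escaped.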

See: PDF1.7 s7.3.4



# File 'lib/hexapdf/serializer.rb', line 260

def serialize_string(obj)
  obj = if @encrypter && @object.kind_of?(HexaPDF::Object) && @object.indirect?
          encrypter.encrypt_string(obj, @object)
        elsif obj.encoding != Encoding::BINARY
          if obj.match?(/[^ -~\t\r\n]/)
            "\xFE\xFF".b << obj.encode(Encoding::UTF_16BE).force_encoding(Encoding::BINARY)
          else
            obj.b
          end
        else
          obj.dup
        end
  obj.gsub!(/[()\\\r]/n, STRING_ESCAPE_MAP)
  "(#{obj})"
end
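The unencrypted branch of #serialize_string can be run on its own. The function below copies that branch; the encryption branch needs a HexaPDF object and is omitted. pdf_string and ESCAPES are illustrative names, with ESCAPES mirroring STRING_ESCAPE_MAP.

```ruby
# Mirrors STRING_ESCAPE_MAP.
ESCAPES = { "(" => "\\(", ")" => "\\)", "\\" => "\\\\", "\r" => "\\r" }.freeze

# Unencrypted path of serialize_string as a standalone function.
def pdf_string(obj)
  obj = if obj.encoding != Encoding::BINARY
          if obj.match?(/[^ -~\t\r\n]/)
            # Non-ASCII text: UTF-16BE with a byte order mark.
            "\xFE\xFF".b << obj.encode(Encoding::UTF_16BE).force_encoding(Encoding::BINARY)
          else
            obj.b
          end
        else
          obj.dup
        end
  obj.gsub!(/[()\\\r]/n, ESCAPES)
  "(#{obj})"
end

pdf_string("a(b)")            # => "(a\\(b\\))"
pdf_string("\u00E4").bytes    # => [40, 254, 255, 0, 228, 41]
```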

#serialize_symbol(obj) ⇒ Object

Serializes a Symbol object (i.e. a PDF name object).

See: PDF1.7 s7.3.5



# File 'lib/hexapdf/serializer.rb', line 211

def serialize_symbol(obj)
  NAME_CACHE[obj] ||=
    begin
      str = obj.to_s.dup.force_encoding(Encoding::BINARY)
      str.gsub!(NAME_REGEXP, NAME_SUBSTS)
      str.empty? ? "/ " : "/#{str}"
    end
end
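Name escaping per PDF1.7 s7.3.5 writes a byte as '#' followed by two hexadecimal digits. The sketch below assumes the escape set consists of bytes outside '!'..'~', the delimiter characters ( ) < > [ ] { } / %, and '#' itself. It works byte by byte instead of using NAME_REGEXP, skips the NAME_CACHE memoization, and uses uppercase hex digits; HexaPDF's exact character set and digit case may differ. pdf_name is a hypothetical name.

```ruby
# Bytes that must be escaped even though they lie in '!'..'~':
# '#' plus the PDF delimiter characters (assumed set).
SPECIAL_NAME_BYTES = "#()<>[]{}/%".bytes.freeze

def pdf_name(sym)
  escaped = sym.to_s.b.bytes.map { |byte|
    if byte < 0x21 || byte > 0x7E || SPECIAL_NAME_BYTES.include?(byte)
      format("#%02X", byte) # e.g. space (0x20) becomes "#20"
    else
      byte.chr
    end
  }.join
  # An empty name is written as "/" followed by a space, as in serialize_symbol.
  escaped.empty? ? "/ " : "/#{escaped}"
end

pdf_name(:Type)     # => "/Type"
pdf_name(:"A B")    # => "/A#20B"
pdf_name(:"a#b")    # => "/a#23b"
pdf_name(:"\u00E4") # => "/#C3#A4"
```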

#serialize_time(obj) ⇒ Object

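Serializes a Time object as a PDF date string. The ISO PDF specification and the Adobe PDF reference differ with respect to the supported date format, so a format suitable for both is output: D:YYYYMMDDHHmmSS followed by the UTC offset written as +HH'mm' (or -HH'mm'), with the offset omitted entirely when it is zero.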

See: PDF1.7 s7.9.4, ADB1.7 3.8.3



# File 'lib/hexapdf/serializer.rb', line 280

def serialize_time(obj)
  zone = obj.strftime("%z'")
  if zone == "+0000'"
    zone = ''
  else
    zone[3, 0] = "'"
  end
  serialize_string(obj.strftime("D:%Y%m%d%H%M%S#{zone}"))
end
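Applied to concrete times, the formatting works out as follows. pdf_date is an illustrative standalone copy of the method body that stops before the final #serialize_string call, so the result is not yet wrapped in parentheses.

```ruby
def pdf_date(time)
  zone = time.strftime("%z'")
  if zone == "+0000'"
    zone = ""            # zero offset: no zone suffix at all
  else
    zone[3, 0] = "'"     # "+0200'" becomes "+02'00'"
  end
  time.strftime("D:%Y%m%d%H%M%S#{zone}")
end

pdf_date(Time.new(2024, 5, 6, 7, 8, 9, "+02:00")) # => "D:20240506070809+02'00'"
pdf_date(Time.utc(2024, 5, 6, 7, 8, 9))           # => "D:20240506070809"
```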

#serialize_to_io(obj, io) ⇒ Object

Serializes the given object and writes it to the IO.

Also see: #serialize



# File 'lib/hexapdf/serializer.rb', line 145

def serialize_to_io(obj, io)
  @io = io
  @io << serialize(obj).freeze
ensure
  @io = nil
end

#serialize_trueclass(_obj) ⇒ Object

Serializes the true value.

See: PDF1.7 s7.3.2



# File 'lib/hexapdf/serializer.rb', line 162

def serialize_trueclass(_obj)
  "true"
end