Class: HexaPDF::Serializer

Inherits:
Object
Defined in:
lib/hexapdf/serializer.rb

Overview

Knows how to serialize Ruby objects for a PDF file.

For normal serialization purposes, the #serialize or #serialize_to_io methods should be used. However, if the type of the object to be serialized is known, a specialized serialization method like #serialize_float can be used.

Additionally, an object for encrypting strings and streams while serializing can be set via the #encrypter= method. The assigned object has to respond to #encrypt_string(str, ind_obj) (where the string is part of the indirect object; returns the encrypted string) and #encrypt_stream(stream) (returns a fiber that represents the encrypted stream).
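The interface expected of an encrypter can be illustrated with a minimal sketch. The class name PassThroughEncrypter is hypothetical, it performs no real encryption, and the stream is assumed here to be a plain array of string chunks rather than a HexaPDF stream object.

```ruby
# Hypothetical pass-through encrypter (not part of HexaPDF); it only shows
# the two methods an object assigned via #encrypter= has to respond to.
class PassThroughEncrypter
  # Returns the "encrypted" string. A real implementation would use the
  # indirect object the string belongs to for deriving the key.
  def encrypt_string(str, _ind_obj)
    str.dup
  end

  # Returns a fiber representing the "encrypted" stream.
  # Assumption: +chunks+ is an array of strings standing in for a stream.
  def encrypt_stream(chunks)
    Fiber.new do
      chunks.each {|chunk| Fiber.yield(chunk.dup) }
      nil # no more data
    end
  end
end

# Draining the fiber collects the stream data piece by piece.
encrypter = PassThroughEncrypter.new
fiber = encrypter.encrypt_stream(["ab", "cd"])
collected = +""
while fiber.alive? && (data = fiber.resume)
  collected << data
end
```

A real encrypter would be assigned with `serializer.encrypter = some_encrypter` before calling #serialize.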

How This Class Works

The main public interface consists of the #serialize and #serialize_to_io methods, which accept an object and return its serialized form or write it to an IO, respectively. While an object is being serialized, the individual serialization methods can access it via the @object instance variable (useful if the object is a composed object).

Internally, the #__serialize method is used for invoking the correct serialization method based on the class of a given object. It is also used for serializing individual parts of a composed object.

Therefore the serializer contains one serialization method for each class it needs to serialize. The naming scheme of these methods is based on the class name: The full class name is converted to lowercase, the namespace separator ‘::’ is replaced with a single underscore and the string “serialize_” is then prepended.

Examples:

NilClass                 => serialize_nilclass
TrueClass                => serialize_trueclass
HexaPDF::Object          => serialize_hexapdf_object

If no serialization method for a specific class is found, the ancestor classes are tried.
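The naming scheme and the ancestor fallback described above can be sketched as follows. The helper names serializer_method_name and find_serializer_method are hypothetical, and the list of available method names stands in for the methods the serializer actually defines.

```ruby
# Derives the serialization method name for a class: lowercase the full
# class name, replace '::' with '_' and prepend 'serialize_'.
def serializer_method_name(klass)
  "serialize_#{klass.name.to_s.downcase.gsub('::', '_')}"
end

# Walks the ancestor chain and returns the first derived name found in
# +available+ (a hypothetical stand-in for the defined methods), or nil.
def find_serializer_method(klass, available)
  klass.ancestors.each do |ancestor|
    name = serializer_method_name(ancestor)
    return name if available.include?(name)
  end
  nil
end

serializer_method_name(NilClass)   # => "serialize_nilclass"
serializer_method_name(File::Stat) # => "serialize_file_stat"

# Integer has no entry of its own here, so its ancestor Numeric is used.
find_serializer_method(Integer, ["serialize_numeric"]) # => "serialize_numeric"
```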

See: PDF1.7 s7.3

Constant Summary

NAME_SUBSTS =

The regexp matches all characters that need to be escaped and the substs hash contains the mapping from these characters to their escaped form.

See PDF1.7 s7.3.5

{}
NAME_REGEXP =

:nodoc:

/[^!-~&&[^##{Regexp.escape(Tokenizer::DELIMITER)}#{Regexp.escape(Tokenizer::WHITESPACE)}]]/
NAME_CACHE =

:nodoc:

Utils::LRUCache.new(1000)
BYTE_IS_DELIMITER =

:nodoc:

{40 => true, 47 => true, 60 => true, 91 => true, # :nodoc:
41 => true, 62 => true, 93 => true}.freeze
STRING_ESCAPE_MAP =

:nodoc:

{"(" => "\\(", ")" => "\\)", "\\" => "\\\\", "\r" => "\\r"}.freeze

Constructor Details

#initialize ⇒ Serializer

Creates a new Serializer object.



# File 'lib/hexapdf/serializer.rb', line 87

def initialize
  @dispatcher = Hash.new do |h, klass|
    method = nil
    klass.ancestors.each do |ancestor_klass|
      method = "serialize_#{ancestor_klass.name.to_s.downcase.gsub(/::/, '_')}"
      (h[klass] = method; break) if respond_to?(method, true)
    end
    method
  end
  @encrypter = false
  @io = nil
  @object = nil
  @in_object = false
end
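The dispatcher built in the constructor is a hash whose default block runs only on a cache miss and stores the result under the class. A stripped-down sketch of that idea follows; MiniSerializer, cached_classes and the raised ArgumentError are hypothetical and only serve the illustration.

```ruby
# Hypothetical miniature serializer showing the memoizing dispatcher.
class MiniSerializer
  def initialize
    # The block is invoked only when a class has not been looked up yet.
    @dispatcher = Hash.new do |cache, klass|
      method = nil
      klass.ancestors.each do |ancestor|
        name = "serialize_#{ancestor.name.to_s.downcase.gsub('::', '_')}"
        if respond_to?(name, true)
          method = cache[klass] = name # remember the result for next time
          break
        end
      end
      method
    end
  end

  def serialize(obj)
    method = @dispatcher[obj.class]
    # Assumption: raising here is this sketch's choice for unknown classes.
    raise ArgumentError, "No serializer for #{obj.class}" unless method
    send(method, obj)
  end

  # Exposes the cached classes for demonstration purposes.
  def cached_classes
    @dispatcher.keys
  end

  private

  def serialize_nilclass(_obj)
    "null"
  end

  def serialize_numeric(obj)
    obj.to_s
  end
end
```

After serializing `nil` and `42` with such an object, the cache holds entries for NilClass and Integer, so the ancestor walk is not repeated for further integers.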

Instance Attribute Details

#encrypter ⇒ Object

The encrypter to use for encrypting strings and streams. If no encrypter is set (the value is nil or false), strings and streams are not encrypted.

Default: not set (the constructor initializes the value to false)



# File 'lib/hexapdf/serializer.rb', line 84

def encrypter
  @encrypter
end

Instance Method Details

#serialize(obj) ⇒ Object

Returns the serialized form of the given object.

For developers: While the object is serialized, methods can use the instance variable @object to access the object being serialized (useful if it is a composed object).



# File 'lib/hexapdf/serializer.rb', line 106

def serialize(obj)
  @object = obj
  __serialize(obj)
ensure
  @object = nil
end

#serialize_array(obj) ⇒ Object

Serializes an Array object.

See: PDF1.7 s7.3.6



# File 'lib/hexapdf/serializer.rb', line 197

def serialize_array(obj)
  str = +"["
  index = 0
  while index < obj.size
    tmp = __serialize(obj[index])
    str << " " unless BYTE_IS_DELIMITER[tmp.getbyte(0)] ||
        BYTE_IS_DELIMITER[str.getbyte(-1)]
    str << tmp
    index += 1
  end
  str << "]"
end
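The spacing rule used above (a separating space is only written when neither the end of the output nor the start of the next element is a delimiter) can be tried in isolation. This sketch assumes the elements are already serialized strings; join_array_tokens and ARRAY_DELIMITER_BYTES are hypothetical names.

```ruby
# Byte values of the delimiter characters ( / < [ ) > ], mirroring
# BYTE_IS_DELIMITER.
ARRAY_DELIMITER_BYTES = {40 => true, 47 => true, 60 => true, 91 => true,
                         41 => true, 62 => true, 93 => true}.freeze

# Joins already serialized elements into a PDF array, adding a space only
# where two adjacent tokens would otherwise run together.
def join_array_tokens(tokens)
  str = +"["
  tokens.each do |token|
    str << " " unless ARRAY_DELIMITER_BYTES[token.getbyte(0)] ||
        ARRAY_DELIMITER_BYTES[str.getbyte(-1)]
    str << token
  end
  str << "]"
end

join_array_tokens(["1", "2", "/Name", "(str)"]) # => "[1 2/Name(str)]"
```

Only the two numbers need a separator; the name and the string start with a delimiter and can follow directly.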

#serialize_date(obj) ⇒ Object

See: #serialize_time



# File 'lib/hexapdf/serializer.rb', line 262

def serialize_date(obj)
  serialize_time(obj.to_time)
end

#serialize_datetime(obj) ⇒ Object

See: #serialize_time



# File 'lib/hexapdf/serializer.rb', line 267

def serialize_datetime(obj)
  serialize_time(obj.to_time)
end

#serialize_falseclass(_obj) ⇒ Object

Serializes the false value.

See: PDF1.7 s7.3.2



# File 'lib/hexapdf/serializer.rb', line 140

def serialize_falseclass(_obj)
  "false"
end

#serialize_float(obj) ⇒ Object

Serializes a Float object.

See: PDF1.7 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 164

def serialize_float(obj)
  -0.0001 < obj && obj < 0.0001 && obj != 0 ? sprintf("%.6f", obj) : obj.round(6).to_s
end
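The special case for very small values exists because Ruby's Float#to_s switches to exponent notation for them, which is not valid PDF number syntax. A standalone sketch of the same logic, using the hypothetical name format_pdf_float:

```ruby
# Formats a float without exponent notation, with at most six decimal
# places (same logic as the method above, outside the class).
def format_pdf_float(value)
  if -0.0001 < value && value < 0.0001 && value != 0
    sprintf("%.6f", value) # fixed notation for tiny non-zero values
  else
    value.round(6).to_s
  end
end

0.00001.to_s                 # => "1.0e-05" (not usable in a PDF)
format_pdf_float(0.00001)    # => "0.000010"
format_pdf_float(1.5)        # => "1.5"
format_pdf_float(1.23456789) # => "1.234568"
```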

#serialize_hash(obj) ⇒ Object

Serializes a Hash object (i.e. a PDF dictionary object).

See: PDF1.7 s7.3.7



# File 'lib/hexapdf/serializer.rb', line 213

def serialize_hash(obj)
  str = +"<<"
  obj.each do |k, v|
    next if v.nil? || (v.respond_to?(:null?) && v.null?)
    str << __serialize(k)
    tmp = __serialize(v)
    str << " " unless BYTE_IS_DELIMITER[tmp.getbyte(0)] ||
        BYTE_IS_DELIMITER[str.getbyte(-1)]
    str << tmp
  end
  str << ">>"
end
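The effect of skipping nil values and of the delimiter check on a dictionary can be sketched with already serialized keys and values. The names join_dict_entries and DICT_DELIMITER_BYTES are hypothetical, and the check for objects responding to null? is left out for brevity.

```ruby
# Byte values of the delimiter characters, mirroring BYTE_IS_DELIMITER.
DICT_DELIMITER_BYTES = {40 => true, 47 => true, 60 => true, 91 => true,
                        41 => true, 62 => true, 93 => true}.freeze

# Builds a PDF dictionary from serialized key/value strings. Entries with
# a nil value are omitted entirely.
def join_dict_entries(entries)
  str = +"<<"
  entries.each do |key, value|
    next if value.nil?
    str << key
    str << " " unless DICT_DELIMITER_BYTES[value.getbyte(0)] ||
        DICT_DELIMITER_BYTES[str.getbyte(-1)]
    str << value
  end
  str << ">>"
end

join_dict_entries({"/Type" => "/Page", "/Count" => "3", "/Skip" => nil})
# => "<</Type/Page/Count 3>>"
```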

#serialize_integer(obj) ⇒ Object

Serializes an Integer object.

See: PDF1.7 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 157

def serialize_integer(obj)
  obj.to_s
end

#serialize_nilclass(_obj) ⇒ Object

Serializes the nil value.

See: PDF1.7 s7.3.9



# File 'lib/hexapdf/serializer.rb', line 126

def serialize_nilclass(_obj)
  "null"
end

#serialize_numeric(obj) ⇒ Object

Serializes a Numeric object (either Integer or Float).

This method should be used for cases where it is known that the object is either an Integer or a Float.

See: PDF1.7 s7.3.3



# File 'lib/hexapdf/serializer.rb', line 150

def serialize_numeric(obj)
  obj.kind_of?(Integer) ? obj.to_s : serialize_float(obj)
end

#serialize_string(obj) ⇒ Object

Serializes a String object.

See: PDF1.7 s7.3.4



# File 'lib/hexapdf/serializer.rb', line 231

def serialize_string(obj)
  obj = if @encrypter && @object.kind_of?(HexaPDF::Object) && @object.indirect?
          encrypter.encrypt_string(obj, @object)
        elsif obj.encoding != Encoding::BINARY
          if obj.match?(/[^ -~\t\r\n]/)
            "\xFE\xFF".b << obj.encode(Encoding::UTF_16BE).force_encoding(Encoding::BINARY)
          else
            obj.b
          end
        else
          obj.dup
        end
  obj.gsub!(/[\(\)\\\r]/n, STRING_ESCAPE_MAP)
  "(#{obj})"
end
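Leaving encryption aside, the method above chooses between a plain byte string and UTF-16BE with a byte order mark, and then escapes parentheses, backslashes and carriage returns. A standalone sketch of that path, with the hypothetical names pdf_literal_string and PDF_STRING_ESCAPES:

```ruby
PDF_STRING_ESCAPES = {"(" => "\\(", ")" => "\\)", "\\" => "\\\\", "\r" => "\\r"}.freeze

# Serializes a string as a PDF literal string (no encryption in this sketch).
def pdf_literal_string(str)
  data = if str.encoding == Encoding::BINARY
           str.dup # binary data is taken as is
         elsif str.match?(/[^ -~\t\r\n]/)
           # Characters outside printable ASCII: UTF-16BE with leading BOM.
           "\xFE\xFF".b << str.encode(Encoding::UTF_16BE).force_encoding(Encoding::BINARY)
         else
           str.b # pure ASCII text needs no conversion
         end
  "(#{data.gsub(/[()\\\r]/n, PDF_STRING_ESCAPES)})"
end

pdf_literal_string("Hello (World)") # => "(Hello \\(World\\))"
pdf_literal_string("\u00e9")        # bytes: ( FE FF 00 E9 )
```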

#serialize_symbol(obj) ⇒ Object

Serializes a Symbol object (i.e. a PDF name object).

See: PDF1.7 s7.3.5



# File 'lib/hexapdf/serializer.rb', line 182

def serialize_symbol(obj)
  NAME_CACHE[obj] ||=
    begin
      str = obj.to_s.force_encoding(Encoding::BINARY)
      str.gsub!(NAME_REGEXP) {|m| NAME_SUBSTS[m] }
      "/#{str}"
    end
end
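The contents of NAME_SUBSTS are not shown on this page. Under the assumption that every character needing an escape is mapped to `#` followed by its two-digit hexadecimal byte value (the form PDF names use), the escaping can be sketched as below. The name pdf_name, the simplified regexp and the lowercase hex digits are assumptions of this sketch, and the cache is omitted.

```ruby
# Serializes a symbol as a PDF name. Bytes outside '!'..'~' (which includes
# all whitespace), the delimiter characters and '#' itself are replaced by
# '#' plus their two-digit hex code.
def pdf_name(sym)
  escaped = sym.to_s.b.gsub(/[^!-~]|[#()<>\[\]{}\/%]/n) do |char|
    format("#%02x", char.ord)
  end
  "/#{escaped}"
end

pdf_name(:Type)  # => "/Type"
pdf_name(:"A B") # => "/A#20B"
pdf_name(:"A#B") # => "/A#23B"
```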

#serialize_time(obj) ⇒ Object

Serializes a Time object.

The ISO PDF specification and the Adobe PDF reference differ with respect to the supported date format. When converting to a date string, a format suitable for both is output.

See: PDF1.7 s7.9.4, ADB1.7 3.8.3



# File 'lib/hexapdf/serializer.rb', line 251

def serialize_time(obj)
  zone = obj.strftime("%z'")
  if zone == "+0000'"
    zone = ''
  else
    zone[3, 0] = "'"
  end
  serialize_string(obj.strftime("D:%Y%m%d%H%M%S#{zone}"))
end
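The time zone handling above turns an offset such as `+0100` into `+01'00'` and drops it completely for UTC. A standalone sketch of the date string construction, without the final wrapping into a PDF string, using the hypothetical name pdf_date_string:

```ruby
# Builds the PDF date string for a Time object.
def pdf_date_string(time)
  zone = time.strftime("%z'") # e.g. "+0100'"
  if zone == "+0000'"
    zone = ""                 # UTC: no offset is written
  else
    zone[3, 0] = "'"          # insert after the hours: "+01'00'"
  end
  time.strftime("D:%Y%m%d%H%M%S") + zone
end

pdf_date_string(Time.new(2024, 1, 2, 3, 4, 5, "+01:00")) # => "D:20240102030405+01'00'"
pdf_date_string(Time.utc(2024, 1, 2, 3, 4, 5))           # => "D:20240102030405"
```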

#serialize_to_io(obj, io) ⇒ Object

Serializes the given object and writes it to the IO.

Also see: #serialize



# File 'lib/hexapdf/serializer.rb', line 116

def serialize_to_io(obj, io)
  @io = io
  @io << serialize(obj).freeze
ensure
  @io = nil
end

#serialize_trueclass(_obj) ⇒ Object

Serializes the true value.

See: PDF1.7 s7.3.2



# File 'lib/hexapdf/serializer.rb', line 133

def serialize_trueclass(_obj)
  "true"
end