Class: ZipTricks::Streamer

Inherits:
  • Object
Defined in:
lib/zip_tricks/streamer.rb

Overview

Writes streamed ZIP archives into the provided IO-like object. The output IO is never rewound or seeked, so the output of this object can be coupled directly to, say, a Rack output.

Allows for splicing raw files (for "stored" entries without compression) and splicing of deflated files (for "deflated" storage mode).

For stored entries, you need to know the CRC32 (as a uint) and the filesize upfront, before the writing of the entry body starts.
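As a minimal sketch, the CRC32 and size of a stored entry could be computed upfront with Ruby's standard Zlib module. The helper name crc32_and_size and the chunk size are assumptions made for illustration; they are not part of ZipTricks.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: reads an IO in chunks and returns [crc32, size]
# without holding the whole body in memory.
def crc32_and_size(io, chunk_size: 64 * 1024)
  crc = 0
  size = 0
  while (chunk = io.read(chunk_size))
    crc = Zlib.crc32(chunk, crc) # continue the running checksum
    size += chunk.bytesize
  end
  [crc, size]
end

crc, size = crc32_and_size(StringIO.new('hello world'.b))
```

The resulting pair could then be passed as the crc32: and size: arguments of add_stored_entry.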

Any object that responds to << can be used as the Streamer target - you can use a String, an Array, a Socket or a File, at your leisure.

Using the Streamer with runtime compression

You can use the Streamer with data descriptors (the CRC32 and the sizes will be written after the file data). This allows non-rewinding on-the-fly compression. If you are compressing large files, the Deflater object that the Streamer controls will be regularly flushed to prevent memory inflation.

ZipTricks::Streamer.open(file_socket_or_string) do |zip|
  zip.write_stored_file('mov.mp4') do |sink|
    File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
  end
  zip.write_deflated_file('long-novel.txt') do |sink|
    File.open('novel.txt', 'rb'){|source| IO.copy_stream(source, sink) }
  end
end

The central directory will be written automatically at the end of the block.

Using the Streamer with entries of known size and having a known CRC32 checksum

Streamer allows "IO splicing" - in this mode it will only control the metadata output, but you can write the data to the socket/file outside of the Streamer. For example, when using the sendfile gem:

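ZipTricks::Streamer.open(socket) do |zip|
  # Write the local header through the Streamer
  zip.add_stored_entry(filename: "myfile1.bin", size: 9090821, crc32: 12485)
  # Account for the bytes that will bypass the Streamer, then send them directly
  zip.simulate_write(tempfile1.size)
  socket.sendfile(tempfile1)
  # Same sequence for the second entry
  zip.add_stored_entry(filename: "myfile2.bin", size: 458678, crc32: 89568)
  zip.simulate_write(tempfile2.size)
  socket.sendfile(tempfile2)
end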

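Note that you need to use simulate_write to let the Streamer know how many bytes were written outside of it, so that the offsets in the archive stay correct. The central directory will be written automatically at the end of the block.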

Defined Under Namespace

Classes: Entry, Writable

Constant Summary

EntryBodySizeMismatch = Class.new(StandardError)
InvalidOutput = Class.new(ArgumentError)
Overflow = Class.new(StandardError)
PathError = Class.new(StandardError)
DuplicateFilenames = Class.new(StandardError)
UnknownMode = Class.new(StandardError)

Constructor Details

#initialize(stream) ⇒ Streamer

Creates a new Streamer on top of the given IO-ish object.

Parameters:

  • stream (IO)

    the destination IO for the ZIP (should respond to <<)

Raises:

  • (InvalidOutput)

    if the stream does not respond to <<

# File 'lib/zip_tricks/streamer.rb', line 82

def initialize(stream)
  raise InvalidOutput, "The stream must respond to #<<" unless stream.respond_to?(:<<)
  unless stream.respond_to?(:tell) && stream.respond_to?(:advance_position_by)
    stream = ZipTricks::WriteAndTell.new(stream) 
  end
  
  @out = stream
  @files = []
  @local_header_offsets = []
  @writer = create_writer
end
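The constructor wraps any stream that lacks tell and advance_position_by. A minimal sketch of what such a position-tracking wrapper might look like follows; CountingSink is a hypothetical name, and this is not the actual ZipTricks::WriteAndTell source.

```ruby
# Hypothetical stand-in for a position-tracking wrapper around
# any object that responds to <<.
class CountingSink
  def initialize(io)
    @io = io
    @pos = 0
  end

  # Forwards the bytes to the wrapped object and records how many were written
  def <<(bytes)
    @io << bytes
    @pos += bytes.bytesize
    self
  end

  # Accounts for bytes that reached the destination by other means
  def advance_position_by(num_bytes)
    @pos += num_bytes
  end

  # Current offset in the output
  def tell
    @pos
  end
end

buffer = String.new
sink = CountingSink.new(buffer)
sink << 'abc' << 'de'
sink.advance_position_by(10) # e.g. bytes sent with sendfile
position = sink.tell
```

Keeping the offset in the wrapper is what lets the destination stay write-only: the offsets needed for the central directory come from counting, not from asking the IO.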

Class Method Details

.open(stream) {|Streamer| ... } ⇒ Object

Creates a new Streamer on top of the given IO-ish object and yields it. Once the given block returns, the Streamer will have its close method called, which will write out the central directory of the archive to the output.

Parameters:

  • stream (IO)

    the destination IO for the ZIP (should respond to <<; an object without tell gets wrapped by the constructor)

Yields:

  • (Streamer)

    the streamer that can be written to



# File 'lib/zip_tricks/streamer.rb', line 73

def self.open(stream)
  archive = new(stream)
  yield(archive)
  archive.close
end

Instance Method Details

#<<(binary_data) ⇒ Object

Writes a part of a zip entry body (actual binary data of the entry) into the output stream.

Parameters:

  • binary_data (String)

    a String in binary encoding

Returns:

  • self



# File 'lib/zip_tricks/streamer.rb', line 98

def <<(binary_data)
  @out << binary_data
  self
end

#add_compressed_entry(filename:, compressed_size:, uncompressed_size:, crc32:) ⇒ Fixnum

Writes out the local header for an entry (file in the ZIP) that is using the deflated storage model (is compressed). Once this method is called, the << method has to be called to write the actual contents of the body.

Note that the deflated body that is going to be written into the output has to be precompressed (pre-deflated) before writing it into the Streamer, because otherwise it is impossible to know its size upfront.

Parameters:

  • filename (String)

    the name of the file in the entry

  • compressed_size (Fixnum)

    the size of the compressed entry that is going to be written into the archive

  • uncompressed_size (Fixnum)

    the size of the entry when uncompressed, in bytes

  • crc32 (Fixnum)

    the CRC32 checksum of the entry when uncompressed

Returns:

  • (Fixnum)

    the offset the output IO is at after writing the entry header



# File 'lib/zip_tricks/streamer.rb', line 136

def add_compressed_entry(filename:, compressed_size:, uncompressed_size:, crc32:)
  add_file_and_write_local_header(filename: filename, crc32: crc32, storage_mode: DEFLATED, 
    compressed_size: compressed_size, uncompressed_size: uncompressed_size)
  @out.tell
end
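A minimal sketch of pre-deflating a body with the standard Zlib module, so that all three values needed by add_compressed_entry are known before the header is written. The sample data and the entry name are made up for illustration.

```ruby
require 'zlib'

data = ('ZIP streaming ' * 1000).b

# ZIP entries hold raw deflate data, so the zlib header is suppressed
# by passing negative window bits.
deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
compressed = deflater.deflate(data, Zlib::FINISH)
deflater.close

entry_params = {
  filename: 'repeated.txt',             # hypothetical entry name
  compressed_size: compressed.bytesize,
  uncompressed_size: data.bytesize,
  crc32: Zlib.crc32(data)               # checksum of the uncompressed bytes
}
```

With a Streamer at hand, these could be passed as zip.add_compressed_entry(**entry_params), followed by zip << compressed.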

#add_stored_entry(filename:, size:, crc32:) ⇒ Fixnum

Writes out the local header for an entry (file in the ZIP) that is using the stored storage model (is stored as-is). Once this method is called, the << method has to be called one or more times to write the actual contents of the body.

Parameters:

  • filename (String)

    the name of the file in the entry

  • size (Fixnum)

    the size of the file when uncompressed, in bytes

  • crc32 (Fixnum)

    the CRC32 checksum of the entry when uncompressed

Returns:

  • (Fixnum)

    the offset the output IO is at after writing the entry header



# File 'lib/zip_tricks/streamer.rb', line 149

def add_stored_entry(filename:, size:, crc32:)
  add_file_and_write_local_header(filename: filename, crc32: crc32, storage_mode: STORED,
    compressed_size: size, uncompressed_size: size)
  @out.tell
end

#close ⇒ Fixnum

Closes the archive. Writes the central directory, and switches the writer into a state where it can no longer be written to.

Once this method is called, the Streamer should be discarded (the ZIP archive is complete).

Returns:

  • (Fixnum)

    the offset the output IO is at after closing the archive



# File 'lib/zip_tricks/streamer.rb', line 206

def close
  # Record the central directory offset, so that it can be written into the EOCD record
  cdir_starts_at = @out.tell
  
  # Write out the central directory entries, one for each file
  @files.each_with_index do |entry, i|
    header_loc = @local_header_offsets.fetch(i)
    @writer.write_central_directory_file_header(io: @out, local_file_header_location: header_loc,
      gp_flags: entry.gp_flags, storage_mode: entry.storage_mode,
      compressed_size: entry.compressed_size, uncompressed_size: entry.uncompressed_size,
      mtime: entry.mtime, crc32: entry.crc32, filename: entry.filename) #, external_attrs: DEFAULT_EXTERNAL_ATTRS)
  end
  
  # Record the central directory size, for the EOCDR
  cdir_size = @out.tell - cdir_starts_at

  # Write out the EOCDR
  @writer.write_end_of_central_directory(io: @out, start_of_central_directory_location: cdir_starts_at,
     central_directory_size: cdir_size, num_files_in_archive: @files.length)
  @out.tell
end
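To show why close records the central directory offset and size, here is a sketch of a basic end of central directory record as laid out in the ZIP specification. pack_eocd is a hypothetical helper that borrows the parameter names from the listing above; it is not the ZipWriter implementation and it ignores Zip64.

```ruby
# Packs a basic (non-Zip64) end of central directory record.
# All fields are little-endian.
def pack_eocd(start_of_central_directory_location:, central_directory_size:, num_files_in_archive:)
  [
    0x06054b50,                          # EOCD signature ("PK\x05\x06")
    0,                                   # number of this disk
    0,                                   # disk where the central directory starts
    num_files_in_archive,                # entries on this disk
    num_files_in_archive,                # entries in total
    central_directory_size,              # size of the central directory in bytes
    start_of_central_directory_location, # offset of the central directory
    0                                    # archive comment length
  ].pack('VvvvvVVv')
end

record = pack_eocd(start_of_central_directory_location: 1024,
                   central_directory_size: 92,
                   num_files_in_archive: 2)
```

A reader locates this record at the end of the file and uses the stored offset to jump to the central directory, which is why both values must match what was actually written.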

#create_writer ⇒ ZipTricks::ZipWriter

Sets up the ZipWriter with wrappers if necessary. The method is called once, when the Streamer gets instantiated - the Writer then gets reused. This method is primarily there so that you can override it.

Returns:

  • (ZipTricks::ZipWriter)

    the writer that the Streamer will reuse

# File 'lib/zip_tricks/streamer.rb', line 233

def create_writer
  ZipTricks::ZipWriter.new
end

#simulate_write(num_bytes) ⇒ Numeric

Advances the internal IO pointer to keep the offsets of the ZIP file in check. Use this if you are going to use accelerated writes to the socket (like the sendfile() call) after writing the headers, or if you just need to figure out the size of the archive.

Parameters:

  • num_bytes (Numeric)

    how many bytes are going to be written bypassing the Streamer

Returns:

  • (Numeric)

    position in the output stream / ZIP archive



# File 'lib/zip_tricks/streamer.rb', line 120

def simulate_write(num_bytes)
  @out.advance_position_by(num_bytes)
  @out.tell
end

#write(binary_data) ⇒ Fixnum

Writes a part of a zip entry body (actual binary data of the entry) into the output stream, and returns the number of bytes written. Is implemented to make Streamer usable with IO.copy_stream(from, to).

Parameters:

  • binary_data (String)

    a String in binary encoding

Returns:

  • (Fixnum)

    the number of bytes written



# File 'lib/zip_tricks/streamer.rb', line 109

def write(binary_data)
  @out << binary_data
  binary_data.bytesize
end
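A small sketch of why returning the byte count matters: IO.copy_stream accepts any destination that has a write method and relies on its return value. ByteCollector is a hypothetical class that mimics the contract of the write method above.

```ruby
require 'stringio'

# Hypothetical sink that, like the Streamer, offers #write
# returning the number of bytes accepted.
class ByteCollector
  attr_reader :buffer

  def initialize
    @buffer = String.new
  end

  def write(binary_data)
    @buffer << binary_data
    binary_data.bytesize
  end
end

source = StringIO.new('entry body bytes'.b)
collector = ByteCollector.new
copied = IO.copy_stream(source, collector)
```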

#write_deflated_file(filename) {|#<<, #write| ... } ⇒ Object

Opens the stream for a deflated file in the archive, and yields a writer for that file to the block. Once the write completes, a data descriptor will be written with the actual compressed/uncompressed sizes and the CRC32 checksum.

Parameters:

  • filename (String)

    the name of the file in the archive

Yields:

  • (#<<, #write)

    an object that the file contents must be written to



# File 'lib/zip_tricks/streamer.rb', line 184

def write_deflated_file(filename)
  add_file_and_write_local_header(filename: filename, storage_mode: DEFLATED,
    use_data_descriptor: true, crc32: 0, compressed_size: 0, uncompressed_size: 0)

  w = DeflatedWriter.new(@out)
  yield(Writable.new(w))
  crc, comp, uncomp = w.finish

  # Save the information into the entry for when the time comes to write out the central directory
  last_entry = @files[-1]
  last_entry.crc32 = crc
  last_entry.compressed_size = comp
  last_entry.uncompressed_size = uncomp
  write_data_descriptor_for_last_entry
end
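The listing relies on a writer whose finish call returns the CRC32 and both sizes. A minimal sketch of such a writer, built on the standard Zlib module, is shown below. DeflatingSink is a hypothetical class; the library's DeflatedWriter may differ, for example in how often it flushes.

```ruby
require 'zlib'

# Hypothetical writer that deflates on the fly while tracking
# the checksum and sizes needed for the data descriptor.
class DeflatingSink
  def initialize(out)
    @out = out
    @deflater = Zlib::Deflate.new(Zlib::DEFAULT_COMPRESSION, -Zlib::MAX_WBITS)
    @crc = 0
    @uncompressed = 0
    @compressed = 0
  end

  def <<(data)
    @crc = Zlib.crc32(data, @crc)   # checksum covers the uncompressed bytes
    @uncompressed += data.bytesize
    emit(@deflater.deflate(data))   # may return an empty string while buffering
    self
  end

  # Flushes the remaining compressed bytes and returns
  # [crc32, compressed_size, uncompressed_size]
  def finish
    emit(@deflater.finish)
    @deflater.close
    [@crc, @compressed, @uncompressed]
  end

  private

  def emit(chunk)
    @compressed += chunk.bytesize
    @out << chunk unless chunk.empty?
  end
end

out = String.new
sink = DeflatingSink.new(out)
sink << 'first chunk ' << 'second chunk'
crc, comp, uncomp = sink.finish
```

Because the three values are only known after finish, the local header is written with zeroes and the real values follow the body in the data descriptor.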

#write_stored_file(filename) {|#<<, #write| ... } ⇒ Object

Opens the stream for a stored file in the archive, and yields a writer for that file to the block. Once the write completes, a data descriptor will be written with the actual compressed/uncompressed sizes and the CRC32 checksum.

Parameters:

  • filename (String)

    the name of the file in the archive

Yields:

  • (#<<, #write)

    an object that the file contents must be written to



# File 'lib/zip_tricks/streamer.rb', line 161

def write_stored_file(filename)
  add_file_and_write_local_header(filename: filename, storage_mode: STORED,
    use_data_descriptor: true, crc32: 0, compressed_size: 0, uncompressed_size: 0)

  w = StoredWriter.new(@out)
  yield(Writable.new(w))
  crc, comp, uncomp = w.finish

  # Save the information into the entry for when the time comes to write out the central directory
  last_entry = @files[-1]
  last_entry.crc32 = crc
  last_entry.compressed_size = comp
  last_entry.uncompressed_size = uncomp

  @writer.write_data_descriptor(io: @out, crc32: crc, compressed_size: comp, uncompressed_size: uncomp)
end
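For reference, a sketch of a basic data descriptor as laid out in the ZIP specification. pack_data_descriptor is a hypothetical helper, not the ZipWriter method called above; the specification treats the leading signature as optional, and Zip64 uses 8-byte sizes, neither of which is covered here.

```ruby
require 'zlib'

# Packs a non-Zip64 data descriptor: signature, CRC32,
# compressed size and uncompressed size, all little-endian.
def pack_data_descriptor(crc32:, compressed_size:, uncompressed_size:)
  [0x08074b50, crc32, compressed_size, uncompressed_size].pack('VVVV')
end

body = 'stored as-is'.b
descriptor = pack_data_descriptor(crc32: Zlib.crc32(body),
                                  compressed_size: body.bytesize,   # stored entries are not compressed,
                                  uncompressed_size: body.bytesize) # so both sizes are equal
```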