Class: ZipTricks::FileReader
- Inherits: Object
- Defined in: lib/zip_tricks/file_reader.rb
Overview
A very barebones ZIP file reader. It is made for maximum interoperability, while at the same time attempting to stay reasonably concise.
REALLY CRAZY IMPORTANT STUFF: SECURITY IMPLICATIONS
Please BEWARE - using this is a security risk if you are reading files that have been
supplied by users. This implementation has not been formally verified for correctness. As
ZIP files contain relative offsets in lots of places it might be possible for a maliciously
crafted ZIP file to put the decode procedure in an endless loop, make it attempt huge reads
from the input file and so on. Additionally, the reader module for deflated data has
no support for ZIP bomb protection. So either limit the FileReader usage to the files you
trust, or triple-check all the inputs upfront. Patches to make this reader more secure
are welcome of course.
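Since the reader has no built-in ZIP bomb protection, one option is to cap the amount of output on the calling side. The following is a minimal sketch, not part of ZipTricks: `extract_with_limit` and `ExtractionLimitExceeded` are hypothetical names, and the only assumption about the extractor is that it responds to `extract(n)` and `eof?`, as in the Usage example.

```ruby
# Hypothetical error class, not defined by ZipTricks.
class ExtractionLimitExceeded < StandardError; end

# Hypothetical helper: copies data from an extractor into a sink and aborts
# once more than max_bytes of output have been produced.
#
# extractor - anything responding to #extract(n) and #eof?
# sink      - anything responding to #<< (a File, a String, ...)
def extract_with_limit(extractor, sink, max_bytes:, chunk_size: 64 * 1024)
  written = 0
  until extractor.eof?
    # to_s guards against an extractor returning nil for an empty read
    chunk = extractor.extract(chunk_size).to_s
    written += chunk.bytesize
    if written > max_bytes
      raise ExtractionLimitExceeded, "Output exceeded #{max_bytes} bytes"
    end
    sink << chunk
  end
  written
end
```

With the Usage example, this would replace the `until ex.eof?` loop, for instance `extract_with_limit(e.extractor_from(f), extracted_file, max_bytes: 100 * 1024 * 1024)`. The limit only bounds output size; it does not address the other risks described above, such as malformed offsets.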
Usage
require 'zip_tricks'

File.open('zipfile.zip', 'rb') do |f|
  # read_zip_structure takes the IO as the io: keyword argument
  entries = ZipTricks::FileReader.read_zip_structure(io: f)
  entries.each do |e|
    File.open(e.filename, 'wb') do |extracted_file|
      ex = e.extractor_from(f)
      extracted_file << ex.extract(1024 * 1024) until ex.eof?
    end
  end
end
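The example above writes each entry to whatever path `e.filename` contains. For archives from untrusted sources, a filename such as `../../etc/passwd` would be written outside the intended directory. A minimal sketch of a guard, assuming a hypothetical helper named `safe_destination` (not part of ZipTricks):

```ruby
# Hypothetical helper: resolves an archive entry name inside dest_root and
# refuses names that would resolve to a path outside of it.
def safe_destination(dest_root, entry_filename)
  root = File.expand_path(dest_root)
  # Absolute entry names and ".." segments are resolved here
  target = File.expand_path(entry_filename, root)
  unless target.start_with?(root + File::SEPARATOR)
    raise ArgumentError, "Entry #{entry_filename.inspect} escapes #{root}"
  end
  target
end
```

The returned path could then be passed to `File.open` instead of `e.filename`. This sketch does not create intermediate directories and does not handle directory entries.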
Supported features
- Deflate and stored storage modes
- Zip64 (extra fields and offsets)
- Data descriptors
Unsupported features
- Archives split over multiple disks/files
- Any ZIP encryption
- EFS language flag and InfoZIP filename extra field
- CRC32 checksums are not verified
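Because CRC32 checksums are not verified by the reader, a caller that needs integrity checking has to compute them during extraction. The sketch below uses `Zlib.crc32` from the Ruby standard library, which accepts a running checksum as its second argument. `extract_and_verify_crc32` and `ChecksumMismatch` are hypothetical names, and the expected checksum is assumed to be obtained by the caller from the entry's central directory data.

```ruby
require 'zlib'

# Hypothetical error class, not defined by ZipTricks.
class ChecksumMismatch < StandardError; end

# Hypothetical helper: copies extracted data into sink while computing a
# running CRC32, then compares it with the expected value.
#
# extractor - anything responding to #extract(n) and #eof?
# sink      - anything responding to #<<
def extract_and_verify_crc32(extractor, sink, expected_crc32:, chunk_size: 64 * 1024)
  crc = 0
  until extractor.eof?
    chunk = extractor.extract(chunk_size).to_s
    crc = Zlib.crc32(chunk, crc) # update the running checksum
    sink << chunk
  end
  unless crc == expected_crc32
    raise ChecksumMismatch, format('Expected CRC32 %08x, got %08x', expected_crc32, crc)
  end
  crc
end
```

Note that the data has already been written to the sink by the time a mismatch is detected, so the caller is responsible for discarding it.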
Mode of operation
Basically, FileReader ignores the data in local file headers (as it is often unreliable).
It reads the ZIP file "from the tail", finds the end-of-central-directory signatures, then
reads the central directory entries, reconstitutes the entries with their filenames, attributes
and so on, and sets these entries up with the absolute offsets into the source file/IO object.
These offsets can then be used to extract the actual compressed data of the files and to expand it.
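To illustrate the "from the tail" step, here is a simplified sketch of locating the end-of-central-directory (EOCD) record. It is not the library's own `get_eocd_offset`; `find_eocd_offset` is a hypothetical function. It relies on two facts from the ZIP format: the EOCD record starts with the signature `0x06054b50` and has a 22-byte fixed part, followed by a comment of at most 65535 bytes.

```ruby
require 'stringio'

EOCD_SIGNATURE = [0x06054b50].pack('V') # the bytes "PK\x05\x06"
EOCD_MIN_SIZE = 22                      # fixed part of the record, without the comment
MAX_COMMENT_SIZE = 0xFFFF               # the comment length field is 2 bytes wide

# Hypothetical helper: returns the absolute offset of the last EOCD signature
# found in the tail of the IO, or nil if there is none.
def find_eocd_offset(io)
  file_size = io.size
  tail_size = [file_size, EOCD_MIN_SIZE + MAX_COMMENT_SIZE].min
  tail_start = file_size - tail_size
  io.seek(tail_start)
  tail = io.read(tail_size).to_s

  # Only consider positions that leave room for a full 22-byte record
  last_candidate = tail.bytesize - EOCD_MIN_SIZE
  return nil if last_candidate < 0

  index = tail.rindex(EOCD_SIGNATURE, last_candidate)
  index && tail_start + index
end
```

A robust implementation would also check that the comment length stored in the record matches the number of bytes remaining after it, since the signature bytes could also occur inside the comment.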
Defined Under Namespace
Classes: ZipEntry
Constant Summary
- ReadError = Class.new(StandardError)
- UnsupportedFeature = Class.new(StandardError)
- InvalidStructure = Class.new(ReadError)
- LocalHeaderPending = Class.new(StandardError) do
    def message
      "The compressed data offset is not available (local header has not been read)"
    end
  end
Class Method Summary
- .read_zip_structure(**options) ⇒ Array<Entry>
  Parse an IO handle to a ZIP archive into an array of Entry objects.
Instance Method Summary
- #get_compressed_data_offset(io:, local_file_header_offset:) ⇒ Fixnum
  Get the offset in the IO at which the actual compressed data of the file starts within the ZIP.
- #read_zip_structure(io:, read_local_headers: true) ⇒ Array<Entry>
  Parse an IO handle to a ZIP archive into an array of Entry objects.
Class Method Details
.read_zip_structure(**options) ⇒ Array<Entry>
Parse an IO handle to a ZIP archive into an array of Entry objects.
# File 'lib/zip_tricks/file_reader.rb', line 240
def self.read_zip_structure(**options)
  new.read_zip_structure(**options)
end
Instance Method Details
#get_compressed_data_offset(io:, local_file_header_offset:) ⇒ Fixnum
Get the offset in the IO at which the actual compressed data of the file starts within the ZIP. The method will eager-read the entire local header for the file (the maximum size the local header may use), starting at the given offset, and will then compute its size. That size plus the local header offset given will be the compressed data offset of the entry (read starting at this offset to get the data).
# File 'lib/zip_tricks/file_reader.rb', line 205
def get_compressed_data_offset(io:, local_file_header_offset:)
  seek(io, local_file_header_offset)

  # Reading in bulk is cheaper - grab the maximum length of the local header,
  # including any headroom
  local_file_header_str_plus_headroom = io.read(MAX_LOCAL_HEADER_SIZE)
  io_starting_at_local_header = StringIO.new(local_file_header_str_plus_headroom)

  assert_signature(io_starting_at_local_header, 0x04034b50)

  # The rest is unreliable, and we have that information from the central directory already.
  # So just skip over it to get at the offset where the compressed data begins
  skip_ahead_2(io_starting_at_local_header) # Version needed to extract
  skip_ahead_2(io_starting_at_local_header) # gp flags
  skip_ahead_2(io_starting_at_local_header) # storage mode
  skip_ahead_2(io_starting_at_local_header) # dos time
  skip_ahead_2(io_starting_at_local_header) # dos date
  skip_ahead_4(io_starting_at_local_header) # CRC32
  skip_ahead_4(io_starting_at_local_header) # Comp size
  skip_ahead_4(io_starting_at_local_header) # Uncomp size

  filename_size = read_2b(io_starting_at_local_header)
  extra_size = read_2b(io_starting_at_local_header)

  skip_ahead_n(io_starting_at_local_header, filename_size)
  skip_ahead_n(io_starting_at_local_header, extra_size)

  local_file_header_offset + io_starting_at_local_header.tell
end
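The arithmetic behind this method can be shown in isolation. The fields skipped above add up to a 30-byte fixed part (4-byte signature, five 2-byte fields, three 4-byte fields, then the two 2-byte lengths), followed by the filename and the extra field. The sketch below is a standalone illustration, not the library code: `compressed_data_offset` is a hypothetical function that reads only the fixed part rather than eager-reading with headroom.

```ruby
require 'stringio'

LOCAL_HEADER_SIGNATURE = 0x04034b50
LOCAL_HEADER_FIXED_SIZE = 30 # 4 + (5 * 2) + (3 * 4) + (2 * 2)

# Hypothetical helper: returns the absolute offset at which the compressed
# data starts, given the offset of the entry's local file header.
def compressed_data_offset(io, local_file_header_offset)
  io.seek(local_file_header_offset)
  fixed = io.read(LOCAL_HEADER_FIXED_SIZE).to_s
  if fixed.bytesize < LOCAL_HEADER_FIXED_SIZE
    raise ArgumentError, 'Truncated local file header'
  end
  unless fixed.unpack1('V') == LOCAL_HEADER_SIGNATURE
    raise ArgumentError, 'Not a local file header'
  end

  # Filename length and extra field length are the last two
  # little-endian 16-bit fields of the fixed part (byte offsets 26 and 28)
  filename_size, extra_size = fixed.byteslice(26, 4).unpack('vv')
  local_file_header_offset + LOCAL_HEADER_FIXED_SIZE + filename_size + extra_size
end
```

For a header at offset 7 with a 5-byte filename and a 4-byte extra field, the data starts at 7 + 30 + 5 + 4 = 46.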
#read_zip_structure(io:, read_local_headers: true) ⇒ Array<Entry>
Parse an IO handle to a ZIP archive into an array of Entry objects.
# File 'lib/zip_tricks/file_reader.rb', line 169
def read_zip_structure(io:, read_local_headers: true)
  zip_file_size = io.size
  eocd_offset = get_eocd_offset(io, zip_file_size)

  zip64_end_of_cdir_location = get_zip64_eocd_location(io, eocd_offset)
  num_files, cdir_location, cdir_size = if zip64_end_of_cdir_location
    num_files_and_central_directory_offset_zip64(io, zip64_end_of_cdir_location)
  else
    num_files_and_central_directory_offset(io, eocd_offset)
  end

  log { 'Located the central directory start at %d' % cdir_location }
  seek(io, cdir_location)

  # Read the entire central directory in one fell swoop
  central_directory_str = read_n(io, cdir_size)
  central_directory_io = StringIO.new(central_directory_str)
  log { 'Read %d bytes with central directory entries' % cdir_size }

  entries = (0...num_files).map do |entry_n|
    log { 'Reading the central directory entry %d starting at offset %d' % [entry_n, cdir_location + central_directory_io.tell] }
    read_cdir_entry(central_directory_io)
  end

  read_local_headers(entries, io) if read_local_headers

  entries
end
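The three values destructured above (`num_files`, `cdir_location`, `cdir_size`) come from the end-of-central-directory record. For the non-Zip64 case, the record layout is fixed, so the step can be sketched with `String#unpack`. `read_eocd_summary` is a hypothetical function written for illustration; it is not the library's `num_files_and_central_directory_offset`, whose body is not shown in this document.

```ruby
require 'stringio'

# Hypothetical helper: reads the 22-byte fixed part of the EOCD record at
# eocd_offset and returns [num_files, cdir_location, cdir_size].
def read_eocd_summary(io, eocd_offset)
  io.seek(eocd_offset)
  record = io.read(22).to_s
  raise ArgumentError, 'Truncated EOCD record' if record.bytesize < 22

  # V = 32-bit little-endian, v = 16-bit little-endian
  signature, _disk_number, _cdir_disk, _entries_on_disk,
    num_files, cdir_size, cdir_location, _comment_size = record.unpack('VvvvvVVv')
  raise ArgumentError, 'Bad EOCD signature' unless signature == 0x06054b50

  [num_files, cdir_location, cdir_size]
end
```

In Zip64 archives these fields may hold the placeholder values `0xFFFF` or `0xFFFFFFFF`, in which case the real values live in the Zip64 end-of-central-directory record, which is why the method checks for a Zip64 location first.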