Purpose
Excavate is a Ruby gem that provides a unified interface for extracting nested archives across multiple compression and archive formats. The gem enables recursive extraction of archives within archives, making it ideal for processing complex software distributions, font packages, and other nested archive scenarios.
Features
Architecture
Excavate follows a clean object-oriented architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│                      Excavate::Archive                      │
│            (Facade providing unified interface)             │
│                                                             │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    │ delegates to
                    │
    ┌───────────────┴───────────────┐
    │                               │
    │     Format Type Registry      │
    │     (TYPES hash mapping)      │
    │                               │
    └───────────────┬───────────────┘
                    │
                    │ instantiates
                    │
    ┌───────────────┴───────────────────────────────────┐
    │                                                   │
    │               Extractors::Extractor               │
    │               (Abstract base class)               │
    │                                                   │
    └───────────────┬───────────────────────────────────┘
                    │
                    │ specialized by
                    │
    ┌───────────────┴───────────────────────────┐
    │                                           │
    ├─ CabExtractor       ├─ RpmExtractor       │
    ├─ CpioExtractor      ├─ SevenZipExtractor  │
    ├─ GzipExtractor      ├─ TarExtractor       │
    ├─ OleExtractor       ├─ XarExtractor       │
    ├─ XzExtractor        ├─ ZipExtractor       │
    │                                           │
    └───────────────────────────────────────────┘
User Request
     │
     ├── Archive.new(file_path)
     │        │
     │        ├── Detect format from extension
     │        │
     │        └── Select appropriate Extractor
     │
     ├── Archive#extract(target_dir)
     │        │
     │        ├── Create target directory
     │        │
     │        ├── Extractor#extract(target)
     │        │        │
     │        │        └── Format-specific extraction
     │        │
     │        └── [Optional] Recursive extraction
     │                 │
     │                 └── Repeat for nested archives
     │
     └── Return: Extracted files
The architecture follows these principles:
- Single Responsibility: Each extractor handles one format
- Open/Closed: New formats can be added without modifying existing code
- Dependency Inversion: Archive class depends on Extractor abstraction
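The layering above can be sketched in a few lines of Ruby. This is an illustration only, not Excavate's actual source: the Sketch module, the two toy extractors, and the contents of TYPES are hypothetical names chosen for the example, and format detection is reduced to a plain extension lookup.

```ruby
# Illustrative sketch of the facade / registry / extractor layering.
# Every name inside Sketch is hypothetical; this is not Excavate's source.
module Sketch
  # Abstract base class: each format-specific extractor implements #extract.
  class Extractor
    def initialize(archive_path)
      @archive_path = archive_path
    end

    def extract(_target)
      raise NotImplementedError, "#{self.class} must implement #extract"
    end
  end

  # Toy extractors that only describe what they would do.
  class ZipExtractor < Extractor
    def extract(target)
      "would unzip #{@archive_path} into #{target}"
    end
  end

  class TarExtractor < Extractor
    def extract(target)
      "would untar #{@archive_path} into #{target}"
    end
  end

  # Facade: looks up an extractor class in the registry and delegates to it.
  class Archive
    # Registry mapping a file extension to an extractor class.
    TYPES = { "zip" => ZipExtractor, "tar" => TarExtractor }.freeze

    def initialize(archive_path)
      @archive_path = archive_path
    end

    def extract(target)
      extractor_class.new(@archive_path).extract(target)
    end

    private

    def extractor_class
      extension = File.extname(@archive_path).delete_prefix(".").downcase
      TYPES.fetch(extension) do
        raise ArgumentError, "unsupported format: #{extension.inspect}"
      end
    end
  end
end

puts Sketch::Archive.new("fonts.zip").extract("out")
# prints "would unzip fonts.zip into out"
```

In this sketch, supporting another format means adding one Extractor subclass and one entry in TYPES; the existing extractors are left alone.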
Installation
Add this line to your application’s Gemfile:
gem "excavate"
And then execute:
bundle install
Or install it yourself as:
gem install excavate
Supported formats
Excavate supports the following archive and compression formats:
- CAB (.cab, .exe with CAB)
- CPIO (.cpio)
- GZIP (.gz)
- MSI (.msi)
- RPM (.rpm)
- 7-Zip (.7z, .exe with 7z)
- TAR (.tar)
- XAR (.pkg)
- XZ (.xz, .tar.xz)
- ZIP (.zip)
All formats support recursive extraction for nested archives.
Basic extraction
General
This feature provides the fundamental capability to extract archives to a target directory. It is the core functionality of Excavate and is used as the foundation for all other extraction features.
Syntax
Excavate::Archive.new(archive_path).extract(target_directory) # (1) (2)
(1) archive_path: Path to the archive file to extract
(2) target_directory: Directory where files will be extracted (optional)
Where,
archive_path: (required) Path to the archive file to extract. Can be an absolute or relative path.
target_directory: (optional) Target directory for extraction. If omitted, a directory with the archive’s base name will be created in the current directory.
Usage example
This example extracts a ZIP archive to a temporary directory and lists all extracted files.
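A minimal sketch of such an example follows. It relies only on the extract call shown in the syntax above; the helper name extract_and_list, the "extracted" subdirectory, and the fonts.zip file name are assumptions made for illustration. The helper takes the archive object as an argument, so the block itself loads without the gem.

```ruby
require "tmpdir"

# Extracts `archive` into a fresh temporary directory and returns the
# relative paths of the extracted files, sorted.
#
# `archive` is expected to be an Excavate::Archive; the helper only relies on
# it responding to #extract(target). The helper name is hypothetical.
def extract_and_list(archive)
  Dir.mktmpdir do |tmp|
    # Extract into a not-yet-existing subdirectory of the temporary directory.
    target = File.join(tmp, "extracted")
    archive.extract(target)

    # List regular files only (the default glob skips dotfiles).
    Dir.glob("**/*", base: target)
       .select { |relative| File.file?(File.join(target, relative)) }
       .sort
  end
end

# Usage (needs the excavate gem and a fonts.zip in the current directory):
#   require "excavate"
#   puts extract_and_list(Excavate::Archive.new("fonts.zip"))
```

The temporary directory is removed when the block ends, so the helper returns file names rather than paths into it.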
Recursive extraction
General
This feature enables automatic extraction of nested archives. When an archive contains other archives, Excavate can recursively extract them all in a single operation. This is particularly useful for complex software distributions that package multiple archives together.
Syntax
Excavate::Archive.new(archive_path).extract(
  target_directory,
  recursive_packages: true # (1)
)
(1) Enable recursive extraction of nested archives
Where,
recursive_packages: (optional) Boolean flag to enable recursive extraction. When true, archives found within the extracted files are automatically extracted. Default is false.
Usage example
This example shows extraction of an MSI installer that contains nested CAB
archives. With recursive_packages: true, both the MSI and all contained CAB
files are automatically extracted.
The files method with recursive_packages: true processes each extracted
file through a block, allowing selective collection of specific file types.
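The block form described above might be used as in the following sketch. The helper name collect_by_extension and the application.msi file name are assumptions for illustration. Only file names are collected, because it is not stated here whether the yielded paths remain valid after files returns.

```ruby
# Collects the base names of files with the given extensions from an archive,
# descending into nested archives.
#
# `archive` is expected to be an Excavate::Archive; the helper only relies on
# #files(recursive_packages: true) yielding one path per extracted file.
# The helper name is hypothetical.
def collect_by_extension(archive, extensions)
  collected = []
  archive.files(recursive_packages: true) do |path|
    # Compare extensions case-insensitively (.TTF and .ttf both match).
    collected << File.basename(path) if extensions.include?(File.extname(path).downcase)
  end
  collected
end

# Usage (needs the excavate gem and an application.msi file):
#   require "excavate"
#   archive = Excavate::Archive.new("application.msi")
#   puts collect_by_extension(archive, [".ttf", ".otf"])
```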
Selective extraction
General
This feature allows extraction of specific files from an archive without extracting the entire contents. It is useful when working with large archives where only certain files are needed.
Syntax
Excavate::Archive.new(archive_path).extract(
  target_directory,
  files: [file1, file2, ...] # (1)
)
(1) Array of specific file paths to extract from the archive
Where,
files: (optional) Array of file paths to extract. Paths should match the structure within the archive. If a file is not found, a TargetNotFoundError is raised.
Usage example
This extracts only the specified files, even though the archive may contain many more files.
When combined with recursive_packages: true, you can specify paths through
nested archives using the format archive.zip/path/to/file.
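A sketch of selective extraction with error handling is shown below. The helper name extract_selected and the file names in the usage comment are illustrative. The sketch also assumes the error class lives in the Excavate namespace as Excavate::TargetNotFoundError, which the text above does not state.

```ruby
# Extracts only the `wanted` paths from `archive` into `target`.
# Returns true on success and false when a requested path is missing.
#
# `archive` is expected to be an Excavate::Archive. The helper name is
# hypothetical, and the Excavate:: namespace of the error class is an
# assumption.
def extract_selected(archive, target, wanted, recursive: false)
  archive.extract(target, files: wanted, recursive_packages: recursive)
  true
rescue Excavate::TargetNotFoundError => e
  warn "Not found in archive: #{e.message}"
  false
end

# Usage (needs the excavate gem):
#   require "excavate"
#   archive = Excavate::Archive.new("fonts.zip")
#   extract_selected(archive, "out", ["Fonts/Arial.ttf", "Fonts/Verdana.ttf"])
#
#   # A path through a nested archive, combined with recursive extraction:
#   outer = Excavate::Archive.new("outer.zip")
#   extract_selected(outer, "out", ["nested.zip/file.txt"], recursive: true)
```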
Filter-based extraction
General
This feature provides pattern-based file selection for extraction. Instead of specifying exact file paths, you can use glob patterns to match multiple files, making it ideal for extracting files by type or naming convention.
Syntax
Excavate::Archive.new(archive_path).extract(
  target_directory,
  filter: "pattern" # (1)
)
(1) Glob pattern to match files for extraction
Where,
filter: (optional) Glob pattern string to match files. Supports standard glob syntax including * (any characters), ** (any directories), and character classes. If no files match, a TargetNotFoundError is raised.
Usage example
Complex patterns can use brace expansion to match multiple extensions or patterns.
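To get a feel for how such patterns behave, the sketch below runs a few of them through Ruby's own File.fnmatch?. This only illustrates standard glob semantics in Ruby; how Excavate matches the filter internally is not described here and may differ. The helper name, the flag choice, and the sample paths are assumptions.

```ruby
# FNM_PATHNAME makes "*" stop at "/" and lets "**/" span directories.
# FNM_EXTGLOB enables {a,b} brace alternatives.
GLOB_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Returns the paths that match the glob pattern (hypothetical helper).
def select_matching(paths, pattern)
  paths.select { |path| File.fnmatch?(pattern, path, GLOB_FLAGS) }
end

paths = ["Fonts/Arial.ttf", "Fonts/extra/Verdana.otf", "docs/readme.txt"]

p select_matching(paths, "**/*.ttf")
# => ["Fonts/Arial.ttf"]
p select_matching(paths, "**/*.{ttf,otf}")
# => ["Fonts/Arial.ttf", "Fonts/extra/Verdana.otf"]
p select_matching(paths, "Fonts/*")
# => ["Fonts/Arial.ttf"]
```

Trying a pattern this way before passing it as filter can catch a pattern that matches nothing.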
Command-line interface
General
Excavate provides a command-line tool for extracting archives directly from the shell. The CLI supports all the same features as the Ruby API, making it suitable for shell scripts and interactive use.
Syntax
excavate [OPTIONS] ARCHIVE [FILES...] # (1) (2) (3)
(1) Command options for controlling extraction behavior
(2) Path to the archive file
(3) Optional list of specific files to extract
Where,
--recursive: Enable recursive extraction of nested archives
--filter PATTERN: Extract only files matching the glob pattern
ARCHIVE: Path to the archive file to extract
FILES…: Optional list of specific file paths to extract
Usage example
# Extract archive to a directory with the archive's base name
excavate fonts.zip
# Extract with recursive nested archive processing
excavate --recursive application.msi
# Extract from a directory of archives
excavate --recursive archive_directory/
Basic extraction creates a directory named after the archive (without extension) in the current directory.
# Extract specific files
excavate fonts.zip Fonts/Arial.ttf Fonts/Verdana.ttf
# Extract files matching a pattern
excavate --filter "**/*.ttf" fonts.zip
# Extract from nested archives
excavate --recursive outer.zip nested.zip/file.txt
The CLI supports the same selective extraction features as the Ruby API.
# Extract TAR.XZ archive
excavate wine-10.18.tar.xz
# Extract XZ with recursive processing
excavate --recursive package.tar.xz
# Extract files matching a pattern from an XZ archive
excavate --filter "*.conf" package.tar.xz
XZ compressed archives (both .xz and .tar.xz) are fully supported through
the command-line interface.
Dependencies
Excavate depends on the following system libraries through the ffi-libarchive-binary gem:
-
zlib
-
Expat
-
OpenSSL (for Linux only)
These dependencies are generally present on all systems and require no special installation steps.
Development
General
When contributing to Excavate, follow these development guidelines to maintain code quality and consistency.
Coding standards
We follow Sandi Metz’s Rules for this gem. All new code should follow these rules. If you change a pre-existing file that violates these rules, fix the violations as part of your contribution.
Testing
Run the test suite with:
bundle exec rspec
Ensure all tests pass before submitting a pull request.
Releasing
Releasing is done automatically with GitHub Actions. Just bump and tag with
gem-release.
For a patch release (0.0.x) use:
gem bump --version patch --tag --push
For a minor release (0.x.0) use:
gem bump --version minor --tag --push
Contributing
First, thank you for contributing! We love pull requests from everyone. By participating in this project, you hereby grant Ribose Inc. the right to grant or transfer an unlimited number of non-exclusive licenses or sub-licenses to third parties, under the copyright covering the contribution to use the contribution by all means.
Here are a few technical guidelines to follow:
-
Open an issue to discuss a new feature.
-
Write tests to support your new feature.
-
Make sure the entire test suite passes locally and on CI.
-
Open a Pull Request.
-
Squash your commits after receiving feedback.
-
Party!
License
This gem is distributed with a BSD 3-Clause license.
This gem is developed, maintained and funded by Ribose Inc.