Class: Chimps::Workflows::Upload::Bundler
Defined in: lib/chimps/workflows/upload/bundler.rb
Overview
Encapsulates the process of analyzing and bundling input paths.
Instance Attribute Summary
- #dataset ⇒ Object
  The dataset this bundler is processing data for.
- #fmt ⇒ Object
  The format of the data being bundled.
- #paths ⇒ Object
  The paths this bundler is processing.
- #resources ⇒ Object (readonly)
  The resources this bundler is processing.
Instance Method Summary
- #archive ⇒ IMW::Resource
  The archive this bundler will build for uploading to Infochimps.
- #archive=(path_or_obj) ⇒ Object
  Set the path to the archive that will be built.
- #archiver ⇒ IMW::Tools::Archiver
  The IMW::Tools::Archiver responsible for packaging files into a local archive.
- #bundle! ⇒ Object
  Bundle the data for this bundler together.
- #default_archive_extension ⇒ 'tar.bz2', 'zip'
  Use zip if the data is less than 500 MB in size and tar.bz2 otherwise.
- #default_archive_path ⇒ String
  The default path to the archive that will be built.
- #icss_url ⇒ Object
  The URL to the ICSS file for this dataset on Infochimps servers.
- #initialize(dataset, paths, options = {}) ⇒ Bundler (constructor)
  Instantiate a new Bundler for bundling paths as a package for dataset.
- #paths_to_bundle ⇒ Array<String>
  Both the local paths and remote paths to package.
- #pkg_fmt ⇒ String
  Return the package format of this bundler’s archive, i.e. its extension.
- #readme_url ⇒ String
  The URL to the README-infochimps file on Infochimps’ servers.
- #size ⇒ Integer
  Return the total size of the package after aggregating and packaging.
- #skip_packaging? ⇒ true, false
  Should the packaging step be skipped?
- #summarizer ⇒ IMW::Tools::Summarizer
  Return the summarizer responsible for summarizing data on this upload.
- #summary ⇒ Hash
  Return summary information about the package prepared by the bundler.
Constructor Details
#initialize(dataset, paths, options = {}) ⇒ Bundler
Instantiate a new Bundler for bundling paths as a package for dataset.
Each input path can be either a String or an IMW::Resource identifying a local or remote resource to bundle into an upload package for Infochimps (remote resources are first copied to the local filesystem by IMW).
If no format is given, the format will be guessed by IMW.
If no archive is given, the archive path will be set to a timestamped name in the current directory; see Bundler#default_archive_path.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 32
    def initialize dataset, paths, options={}
      require_imw
      @dataset = dataset
      self.paths = paths
      if options[:fmt]
        self.fmt = options[:fmt]
      end
      if options[:archive]
        self.archive = options[:archive]
      end
    end
Instance Attribute Details
#dataset ⇒ Object
The dataset this bundler is processing data for.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 45
    def dataset
      @dataset
    end
#fmt ⇒ Object
The format of the data being bundled.
Will make a guess using IMW::Tools::Summarizer if no format is given.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 90
    def fmt
      @fmt ||= summarizer.most_common_data_format
    end
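The override-or-guess behaviour of fmt (an explicit options[:fmt] wins, otherwise a guess is computed lazily and memoized with ||=) can be sketched in isolation. This is a minimal sketch, not the library's code: FormatGuess is a hypothetical stand-in class, and approximating IMW's most_common_data_format by tallying file extensions is an assumption made only for illustration.

```ruby
# Hypothetical stand-in for Bundler#fmt: an explicit format wins,
# otherwise a guess is computed once and memoized with ||=.
class FormatGuess
  attr_writer :fmt
  attr_reader :guesses

  def initialize(paths, options = {})
    @paths = paths
    @guesses = 0
    self.fmt = options[:fmt] if options[:fmt]
  end

  def fmt
    @fmt ||= most_common_extension
  end

  private

  # Assumption: approximates IMW's summarizer by tallying file extensions.
  def most_common_extension
    @guesses += 1
    exts = @paths.map { |p| File.extname(p).delete_prefix('.') }
    exts.tally.max_by { |_ext, count| count }&.first
  end
end

guessed = FormatGuess.new(%w[a.csv b.csv c.tsv])
guessed.fmt   # => "csv"
guessed.fmt   # memoized: the guess is not recomputed
FormatGuess.new(%w[a.csv], fmt: 'json').fmt  # => "json"
```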
#paths ⇒ Object
The paths this bundler is processing.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 48
    def paths
      @paths
    end
#resources ⇒ Object (readonly)
The resources this bundler is processing.
Resources are IMW::Resource objects built from this Bundler’s paths.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 54
    def resources
      @resources
    end
Instance Method Details
#archive ⇒ IMW::Resource
The archive this bundler will build for uploading to Infochimps.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 98
    def archive
      return @archive if @archive
      self.archive = default_archive_path
      self.archive
    end
#archive=(path_or_obj) ⇒ Object
Set the path to the archive that will be built.
The given path must represent a compressed file or archive (.tar, .tar.gz, .tar.bz2, .zip, .rar, .bz2, or .gz extension).
Additionally, if multiple local paths are being packaged, the given path must be an archive, not merely a compressed file (.bz2 or .gz).
    # File 'lib/chimps/workflows/upload/bundler.rb', line 116
    def archive= path_or_obj
      potential_package = IMW.open(path_or_obj)
      raise PackagingError.new("Invalid path #{potential_package}, not an archive or compressed file") unless potential_package.is_compressed? || potential_package.is_archive?
      raise PackagingError.new("Multiple local paths must be packaged in an archive, not a compressed file.") if resources.size > 1 && !potential_package.is_archive?
      @archive = potential_package
    end
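The two checks in archive= can be illustrated without IMW. In this sketch, archive_path? and compressed_path? are hypothetical stand-ins for IMW's is_archive? and is_compressed?, classifying purely by the extensions listed above; IMW's real detection may differ. PackagingError is defined locally here only so the sketch runs on its own.

```ruby
# Hypothetical stand-ins for IMW's is_archive?/is_compressed? checks,
# based only on the extensions listed in the documentation above.
ARCHIVE_EXTENSIONS    = %w[.tar .tar.gz .tar.bz2 .zip .rar].freeze
COMPRESSED_EXTENSIONS = %w[.bz2 .gz].freeze

class PackagingError < StandardError; end

def archive_path?(path)
  ARCHIVE_EXTENSIONS.any? { |ext| path.end_with?(ext) }
end

def compressed_path?(path)
  COMPRESSED_EXTENSIONS.any? { |ext| path.end_with?(ext) }
end

# Mirrors the two checks made by Bundler#archive=.
def validate_archive_path!(path, resource_count)
  unless compressed_path?(path) || archive_path?(path)
    raise PackagingError, "Invalid path #{path}, not an archive or compressed file"
  end
  if resource_count > 1 && !archive_path?(path)
    raise PackagingError, "Multiple local paths must be packaged in an archive, not a compressed file."
  end
  path
end

validate_archive_path!('upload.zip', 3)     # accepted
validate_archive_path!('data.csv.gz', 1)    # accepted: a single resource may simply be compressed
# validate_archive_path!('data.csv.gz', 2)  # would raise PackagingError
```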
#archiver ⇒ IMW::Tools::Archiver
The IMW::Tools::Archiver responsible for packaging files into a local archive.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 163
    def archiver
      @archiver ||= IMW::Tools::Archiver.new(archive.name, paths_to_bundle)
    end
#bundle! ⇒ Object
Bundle the data for this bundler together.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 148
    def bundle!
      return if skip_packaging?
      result = archiver.package(archive.path)
      raise PackagingError.new("Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}") if result.is_a?(StandardError) || (!archiver.success?)
      archiver.clean!
    end
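The control flow of bundle! (skip if already packaged, package, raise while leaving temporary files on failure, clean up on success) can be exercised against a stub. StubArchiver is hypothetical: its method names are copied from the calls bundle! makes, but its behaviour is invented for the demonstration and says nothing about how IMW::Tools::Archiver actually works.

```ruby
# Sketch of the control flow in Bundler#bundle!, run against a hypothetical stub.
class PackagingError < StandardError; end

class StubArchiver
  attr_reader :tmp_dir, :cleaned

  def initialize(succeed)
    @succeed = succeed
    @tmp_dir = '/tmp/stub-archiver'
    @cleaned = false
  end

  # Returns an error object (not raised) on failure, as bundle! checks for.
  def package(path)
    @succeed ? path : StandardError.new('packaging failed')
  end

  def success?
    @succeed
  end

  def clean!
    @cleaned = true
  end
end

def bundle!(archiver, archive_path, skip_packaging: false)
  return if skip_packaging
  result = archiver.package(archive_path)
  if result.is_a?(StandardError) || !archiver.success?
    # Temporary files are deliberately left behind for inspection.
    raise PackagingError, "Unable to package files for upload. Temporary files left in #{archiver.tmp_dir}"
  end
  archiver.clean!
end

ok = StubArchiver.new(true)
bundle!(ok, 'upload.zip')
ok.cleaned  # => true
```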
#default_archive_extension ⇒ 'tar.bz2', 'zip'
Use zip if the data is less than 500 MB (524,288,000 bytes) in size and tar.bz2 otherwise.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 207
    def default_archive_extension
      summarizer.total_size >= 524288000 ? 'tar.bz2' : 'zip'
    end
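The threshold 524288000 equals 500 × 1024 × 1024, so "500 MB" here means 500 MiB, and data of exactly that size already gets tar.bz2. A minimal sketch of the rule, taking the total size as an argument instead of asking the summarizer:

```ruby
# Sketch of the size rule in Bundler#default_archive_extension.
SIZE_THRESHOLD = 500 * 1024 * 1024  # 524_288_000 bytes, i.e. 500 MiB

def default_archive_extension(total_size)
  total_size >= SIZE_THRESHOLD ? 'tar.bz2' : 'zip'
end

default_archive_extension(10 * 1024 * 1024)  # => "zip"
default_archive_extension(SIZE_THRESHOLD)    # => "tar.bz2" (the boundary itself uses tar.bz2)
```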
#default_archive_path ⇒ String
The default path to the archive that will be built.
Defaults to a file in the current directory named after the dataset’s ID or handle and the current time. The package format (.zip or .tar.bz2) is determined by size; see Chimps::Workflows::Upload::Bundler#default_archive_extension.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 198
    def default_archive_path
      # in current working directory...
      "chimps_#{dataset}-#{Time.now.strftime(Chimps::CONFIG[:timestamp_format])}.#{default_archive_extension}"
    end
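The generated name can be reproduced with the inputs made explicit. In this sketch the strftime pattern '%Y%m%d-%H%M%S' is an assumed example only; the real pattern comes from Chimps::CONFIG[:timestamp_format], whose value is not shown here. Passing the time in makes the result deterministic.

```ruby
# Sketch of Bundler#default_archive_path with its inputs made explicit.
# The default timestamp_format is an assumption, not the configured value.
def default_archive_path(dataset, extension, time: Time.now, timestamp_format: '%Y%m%d-%H%M%S')
  # A relative path, so the archive lands in the current working directory.
  "chimps_#{dataset}-#{time.strftime(timestamp_format)}.#{extension}"
end

default_archive_path('my-dataset', 'zip', time: Time.new(2010, 5, 4, 13, 2, 9))
# => "chimps_my-dataset-20100504-130209.zip"
```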
#icss_url ⇒ Object
The URL to the ICSS file for this dataset on Infochimps servers.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 221
    def icss_url
      File.join(Chimps::CONFIG[:site][:host], "datasets", "#{dataset}.yaml")
    end
#paths_to_bundle ⇒ Array<String>
Both the local paths and remote paths to package.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 228
    def paths_to_bundle
      paths + [readme_url, icss_url]
    end
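The list handed to the archiver is the user's paths followed by two remote URLs. The sketch below shows how readme_url, icss_url and paths_to_bundle combine; HOST is a hypothetical placeholder for Chimps::CONFIG[:site][:host]. File.join merges separators at each join, so the leading slash in "/README-infochimps" does not produce a doubled slash.

```ruby
# Sketch of how Bundler#readme_url, #icss_url and #paths_to_bundle combine.
# HOST is a hypothetical placeholder for Chimps::CONFIG[:site][:host].
HOST = 'http://www.example.com'

def readme_url(host)
  File.join(host, '/README-infochimps')
end

def icss_url(host, dataset)
  File.join(host, 'datasets', "#{dataset}.yaml")
end

def paths_to_bundle(paths, host, dataset)
  paths + [readme_url(host), icss_url(host, dataset)]
end

paths_to_bundle(['data/a.csv'], HOST, 'my-dataset')
# => ["data/a.csv",
#     "http://www.example.com/README-infochimps",
#     "http://www.example.com/datasets/my-dataset.yaml"]
```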
#pkg_fmt ⇒ String
Return the package format of this bundler’s archive, i.e. its extension.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 127
    def pkg_fmt
      archive.extension
    end
#readme_url ⇒ String
The URL to the README-infochimps file on Infochimps’ servers.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 215
    def readme_url
      File.join(Chimps::CONFIG[:site][:host], "/README-infochimps")
    end
#size ⇒ Integer
Return the total size of the package after aggregating and packaging.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 135
    def size
      archive.size
    end
#skip_packaging? ⇒ true, false
Should the packaging step be skipped?
This will happen if only one local input path was provided and it exists and is a compressed file or archive.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 181
    def skip_packaging?
      !!@skip_packaging
    end
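skip_packaging? only reads a flag that is set elsewhere, so the condition described above is not visible in this code. The following is one possible reading of that description, not the library's implementation: skip_packaging_for? is a hypothetical helper, and using the archive= extension list as a stand-in for IMW's archive and compression detection is an assumption.

```ruby
require 'tmpdir'

# Hypothetical helper: exactly one input path, which exists locally and
# already looks like an archive or compressed file (by extension only).
PACKAGED_EXTENSIONS = %w[.tar .tar.gz .tar.bz2 .zip .rar .bz2 .gz].freeze

def skip_packaging_for?(paths)
  return false unless paths.size == 1
  path = paths.first
  File.exist?(path) && PACKAGED_EXTENSIONS.any? { |ext| path.end_with?(ext) }
end

Dir.mktmpdir do |dir|
  zip = File.join(dir, 'data.zip')
  File.write(zip, '')
  skip_packaging_for?([zip])                           # => true
  skip_packaging_for?([zip, File.join(dir, 'b.csv')])  # => false (more than one path)
end
```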
#summarizer ⇒ IMW::Tools::Summarizer
Return the summarizer responsible for summarizing data on this upload.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 171
    def summarizer
      @summarizer ||= IMW::Tools::Summarizer.new(resources)
    end
#summary ⇒ Hash
Return summary information about the package prepared by the bundler.
    # File 'lib/chimps/workflows/upload/bundler.rb', line 143
    def summary
      summarizer.summary
    end