Class: Chimps::Workflows::Upload::Bundler

Inherits:
Object
Defined in:
lib/chimps/workflows/upload/bundler.rb

Overview

Encapsulates the process of analyzing and bundling input paths.

Constructor Details

#initialize(dataset, paths, options = {}) ⇒ Bundler

Instantiate a new Bundler for bundling paths as a package for dataset.

Each input path can be either a String or an IMW::Resource identifying a local or remote resource to bundle into an upload package for Infochimps (remote resources will be first copied to the local filesystem by IMW).

If no format is given, the format will be guessed by IMW.

If no archive is given, the archive path will be set to a timestamped name in the current directory; see Bundler#default_archive_path.

Parameters:

  • dataset (String, Integer)

    the ID or slug of an existing Infochimps dataset

  • paths (Array<String, IMW::Resource>)
  • options (Hash) (defaults to: {})

Options Hash (options):

  • fmt (String)

    the format (csv, tsv, xls, &c.) of the data being uploaded

  • archive (String, IMW::Resource)

    the path to the local archive to package the input paths into



# File 'lib/chimps/workflows/upload/bundler.rb', line 32

def initialize dataset, paths, options={}
  require_imw
  @dataset     = dataset
  self.paths   = paths
  if options[:fmt]
    self.fmt     = options[:fmt]
  end
  if options[:archive]
    self.archive = options[:archive]
  end
end

Instance Attribute Details

#dataset ⇒ Object

The dataset this bundler is processing data for.



# File 'lib/chimps/workflows/upload/bundler.rb', line 45

def dataset
  @dataset
end

#fmt ⇒ Object

The format of the data being bundled.

Will make a guess using IMW::Tools::Summarizer if no format is given.



# File 'lib/chimps/workflows/upload/bundler.rb', line 90

def fmt
  @fmt ||= summarizer.most_common_data_format
end
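
The lazy default in fmt can be sketched without IMW. This is only an illustration: it assumes the most common format can be approximated by counting file extensions, which the source does not confirm (the real guess is delegated to the summarizer's most_common_data_format). FormatGuesser and most_common_extension are hypothetical names.

```ruby
# ASSUMPTION: counting extensions stands in for IMW's format detection.
def most_common_extension(paths)
  counts = Hash.new(0)
  paths.each do |path|
    ext = File.extname(path).delete_prefix('.')
    counts[ext] += 1 unless ext.empty?
  end
  # max_by returns nil for an empty hash, hence the safe navigation.
  counts.max_by { |_ext, count| count }&.first
end

# Hypothetical stand-in for the part of Bundler that handles fmt.
class FormatGuesser
  attr_writer :fmt

  def initialize(paths)
    @paths = paths
  end

  # An explicitly assigned format wins; otherwise guess once and memoize.
  def fmt
    @fmt ||= most_common_extension(@paths)
  end
end

guessed = FormatGuesser.new(%w[a.csv b.csv c.tsv])
puts guessed.fmt

explicit = FormatGuesser.new(%w[a.csv b.csv c.tsv])
explicit.fmt = 'xls'
puts explicit.fmt
```

Because of the ||= operator, a format passed in through the options hash is never overwritten by the guess.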

#paths ⇒ Object

The paths this bundler is processing.



# File 'lib/chimps/workflows/upload/bundler.rb', line 48

def paths
  @paths
end

#resources ⇒ Object (readonly)

The resources this bundler is processing.

Resources are IMW::Resource objects built from this Bundler’s paths.



# File 'lib/chimps/workflows/upload/bundler.rb', line 54

def resources
  @resources
end

Instance Method Details

#archive ⇒ IMW::Resource

The archive this bundler will build for uploading to Infochimps.

Returns:

  • (IMW::Resource)


# File 'lib/chimps/workflows/upload/bundler.rb', line 98

def archive
  return @archive if @archive
  self.archive = default_archive_path
  self.archive
end

#archive=(path_or_obj) ⇒ Object

Set the path to the archive that will be built.

The given path must represent a compressed file or archive (.tar, .tar.gz, .tar.bz2, .zip, .rar, .bz2, or .gz extension).

Additionally, if multiple local paths are being packaged, the given path must be an archive (not simply .bz2 or .gz extensions).

Parameters:

  • path_or_obj (String, IMW::Resource)

    the path or IMW::Resource object pointing to the archive to use

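Raises:

  • (PackagingError)

    if the path is not an archive or compressed file, or if multiple local paths would be packaged into something that is not an archive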


# File 'lib/chimps/workflows/upload/bundler.rb', line 116

def archive= path_or_obj
  potential_package = IMW.open(path_or_obj)
  raise PackagingError.new("Invalid path #{potential_package}, not an archive or compressed file")        unless potential_package.is_compressed? ||  potential_package.is_archive?
  raise PackagingError.new("Multiple local paths must be packaged in an archive, not a compressed file.") if     resources.size > 1               && !potential_package.is_archive?
  @archive = potential_package
end
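
The two checks in archive= can be sketched in plain Ruby. This is a rough approximation, assuming the extension lists from the prose above are a fair proxy for IMW's is_archive? and is_compressed? predicates (IMW may classify some paths differently). SketchPackagingError, archive?, compressed? and validate_archive_path are hypothetical names used only here.

```ruby
# Stand-in for Chimps' PackagingError so the sketch runs without the gem.
class SketchPackagingError < StandardError; end

# ASSUMPTION: lists copied from the documentation text, not from IMW.
ARCHIVE_EXTENSIONS    = %w[.tar .tar.gz .tar.bz2 .zip .rar].freeze
COMPRESSED_EXTENSIONS = %w[.bz2 .gz].freeze

def archive?(path)
  ARCHIVE_EXTENSIONS.any? { |ext| path.end_with?(ext) }
end

def compressed?(path)
  COMPRESSED_EXTENSIONS.any? { |ext| path.end_with?(ext) }
end

# Returns the path when it is acceptable, raises otherwise.
def validate_archive_path(path, resource_count)
  unless compressed?(path) || archive?(path)
    raise SketchPackagingError,
          "Invalid path #{path}, not an archive or compressed file"
  end
  if resource_count > 1 && !archive?(path)
    raise SketchPackagingError,
          'Multiple local paths must be packaged in an archive, ' \
          'not a compressed file.'
  end
  path
end

puts validate_archive_path('upload.tar.bz2', 3)

begin
  validate_archive_path('upload.csv.gz', 3)
rescue SketchPackagingError => e
  puts e.message
end
```

A single input can therefore be shipped as a bare .gz or .bz2 file, while several inputs need a container format such as .zip or .tar.bz2.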

#archiver ⇒ IMW::Tools::Archiver

The IMW::Tools::Archiver responsible for packaging files into a local archive.

Returns:

  • (IMW::Tools::Archiver)


# File 'lib/chimps/workflows/upload/bundler.rb', line 163

def archiver
  @archiver ||= IMW::Tools::Archiver.new(archive.name, paths_to_bundle)
end

#bundle! ⇒ Object

Bundle the data for this bundler together.

Raises:

  • (PackagingError)

    if the files could not be packaged into the archive


# File 'lib/chimps/workflows/upload/bundler.rb', line 148

def bundle!
  return if skip_packaging?
  result = archiver.package(archive.path)
  raise PackagingError.new("Unable to package files for upload.  Temporary files left in #{archiver.tmp_dir}") if result.is_a?(StandardError) || (!archiver.success?)
  archiver.clean!
end
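
The control flow of bundle! can be traced with a test double in place of the IMW archiver. This is a sketch under stated assumptions: FakeArchiver, SketchPackagingError and bundle_with are hypothetical, and the double only records calls rather than writing any files.

```ruby
# Stand-in for Chimps' PackagingError so the sketch runs without the gem.
class SketchPackagingError < StandardError; end

# Hypothetical double exposing the four methods bundle! calls.
class FakeArchiver
  attr_reader :calls

  def initialize(succeeds)
    @succeeds = succeeds
    @calls = []
  end

  def package(path)
    @calls << [:package, path]
    @succeeds
  end

  def success?
    @succeeds
  end

  def tmp_dir
    '/tmp/sketch' # placeholder, nothing is written there
  end

  def clean!
    @calls << :clean!
  end
end

# Same steps as bundle!, with collaborators passed in explicitly.
def bundle_with(archiver, archive_path, skip_packaging)
  return if skip_packaging
  result = archiver.package(archive_path)
  if result.is_a?(StandardError) || !archiver.success?
    raise SketchPackagingError,
          'Unable to package files for upload.  ' \
          "Temporary files left in #{archiver.tmp_dir}"
  end
  archiver.clean!
end

ok = FakeArchiver.new(true)
bundle_with(ok, 'upload.zip', false)
p ok.calls

failing = FakeArchiver.new(false)
begin
  bundle_with(failing, 'upload.zip', false)
rescue SketchPackagingError => e
  puts e.message
end
p failing.calls
```

Note that clean! only runs on success, so a failed run leaves the temporary directory in place for inspection, as the error message says.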

#default_archive_extension ⇒ 'tar.bz2', 'zip'

The default extension of the archive that will be built: zip if the data is less than 500 MB in size and tar.bz2 otherwise.

Returns:

  • ('tar.bz2', 'zip')


# File 'lib/chimps/workflows/upload/bundler.rb', line 207

def default_archive_extension
  summarizer.total_size >= 524288000 ? 'tar.bz2' : 'zip'
end
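
The literal 524288000 is 500 * 1024 * 1024 bytes, and the comparison is inclusive, so data of exactly that size already gets tar.bz2. A small sketch of the same rule, where archive_extension_for and ZIP_SIZE_LIMIT are hypothetical names and the size is passed in directly instead of coming from the summarizer:

```ruby
# Same cutoff as the source literal 524288000.
ZIP_SIZE_LIMIT = 500 * 1024 * 1024

# Hypothetical stand-in for default_archive_extension.
def archive_extension_for(total_size_in_bytes)
  total_size_in_bytes >= ZIP_SIZE_LIMIT ? 'tar.bz2' : 'zip'
end

puts archive_extension_for(10 * 1024 * 1024) # a 10 MiB upload
puts archive_extension_for(ZIP_SIZE_LIMIT)   # exactly at the cutoff
```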

#default_archive_path ⇒ String

The default path to the archive that will be built.

Defaults to a file in the current directory named after the dataset's ID or handle and the current time. The package format (.zip or .tar.bz2) is determined by size; see Bundler#default_archive_extension.

Returns:

  • (String)


# File 'lib/chimps/workflows/upload/bundler.rb', line 198

def default_archive_path
  # in current working directory...
  "chimps_#{dataset}-#{Time.now.strftime(Chimps::CONFIG[:timestamp_format])}.#{default_archive_extension}"
end
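
To see what kind of name this produces, here is a sketch with the inputs made explicit. The timestamp format is an assumption made up for illustration; the real value is read from Chimps::CONFIG[:timestamp_format] and is not shown on this page. archive_path_for is a hypothetical helper.

```ruby
# ASSUMPTION: placeholder format, not the value from Chimps::CONFIG.
ASSUMED_TIMESTAMP_FORMAT = '%Y-%m-%d_%H-%M-%S'

# Builds the same "chimps_<dataset>-<timestamp>.<extension>" string.
def archive_path_for(dataset, extension, time = Time.now,
                     format = ASSUMED_TIMESTAMP_FORMAT)
  "chimps_#{dataset}-#{time.strftime(format)}.#{extension}"
end

puts archive_path_for('my-dataset', 'zip', Time.utc(2011, 3, 4, 5, 6, 7))
# prints chimps_my-dataset-2011-03-04_05-06-07.zip
```

The result is a relative path, so the archive lands in whatever directory the process is running in.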

#icss_url ⇒ Object

The URL to the ICSS file for this dataset on Infochimps servers.



# File 'lib/chimps/workflows/upload/bundler.rb', line 221

def icss_url
  File.join(Chimps::CONFIG[:site][:host], "datasets", "#{dataset}.yaml")
end
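
File.join is used here to build a URL rather than a filesystem path. A short sketch of how that behaves, assuming a placeholder host (the real one comes from Chimps::CONFIG[:site][:host]); icss_url_for is a hypothetical helper:

```ruby
# ASSUMPTION: placeholder host, not the configured Infochimps host.
ASSUMED_HOST = 'http://example.com'

def icss_url_for(host, dataset)
  File.join(host, 'datasets', "#{dataset}.yaml")
end

puts icss_url_for(ASSUMED_HOST, 'my-dataset')
# A trailing slash on the host does not produce a doubled separator.
puts icss_url_for("#{ASSUMED_HOST}/", 'my-dataset')
# Integer IDs are interpolated the same way as slugs.
puts icss_url_for(ASSUMED_HOST, 123)
```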

#paths_to_bundle ⇒ Array<String>

Both the local paths and remote paths to package.

Returns:

  • (Array<String>)


# File 'lib/chimps/workflows/upload/bundler.rb', line 228

def paths_to_bundle
  paths + [readme_url, icss_url]
end
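
Since the list is built with Array#+, the bundler's own paths are left untouched and the two remote files are appended to a new array. A tiny sketch with placeholder URLs (paths_to_bundle_for and the example.com addresses are hypothetical):

```ruby
# Hypothetical stand-in taking the two URLs as arguments.
def paths_to_bundle_for(paths, readme_url, icss_url)
  paths + [readme_url, icss_url]
end

local  = ['data/a.csv', 'data/b.csv']
bundle = paths_to_bundle_for(local,
                             'http://example.com/README-infochimps',
                             'http://example.com/datasets/my-dataset.yaml')

puts bundle.size # local paths plus README and ICSS
puts local.size  # the input array is not modified
```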

#pkg_fmt ⇒ String

Return the package format of this bundler’s archive, i.e. its extension.

Returns:

  • (String)


# File 'lib/chimps/workflows/upload/bundler.rb', line 127

def pkg_fmt
  archive.extension
end

#readme_url ⇒ String

The URL to the README-infochimps file on Infochimps’ servers.

Returns:

  • (String)


# File 'lib/chimps/workflows/upload/bundler.rb', line 215

def readme_url
  File.join(Chimps::CONFIG[:site][:host], "/README-infochimps")
end

#size ⇒ Integer

Return the total size of the package after aggregating and packaging.

Returns:

  • (Integer)


# File 'lib/chimps/workflows/upload/bundler.rb', line 135

def size
  archive.size
end

#skip_packaging? ⇒ true, false

Should the packaging step be skipped?

This will happen if only one local input path was provided and it exists and is a compressed file or archive.

Returns:

  • (true, false)


# File 'lib/chimps/workflows/upload/bundler.rb', line 181

def skip_packaging?
  !! @skip_packaging
end
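
The listing only reads @skip_packaging; where that flag is set is not shown on this page. The rule described in the prose could be sketched roughly as follows, assuming that "local" means a path without a URL scheme and that the extension list from archive= identifies compressed files and archives. skip_packaging_for? and PACKAGED_EXTENSIONS are hypothetical names.

```ruby
require 'tempfile'

# ASSUMPTION: extensions taken from the archive= documentation.
PACKAGED_EXTENSIONS = %w[.tar .tar.gz .tar.bz2 .zip .rar .bz2 .gz].freeze

# True only for a single, existing, local, already-packaged input.
def skip_packaging_for?(paths)
  return false unless paths.size == 1
  path = paths.first
  return false if path.include?('://') # treat URLs as remote
  File.exist?(path) && PACKAGED_EXTENSIONS.any? { |ext| path.end_with?(ext) }
end

Tempfile.create(['upload', '.zip']) do |file|
  puts skip_packaging_for?([file.path])              # one existing archive
  puts skip_packaging_for?([file.path, 'other.csv']) # more than one input
end
puts skip_packaging_for?(['http://example.com/data.zip'])
```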

#summarizer ⇒ IMW::Tools::Summarizer

Return the summarizer responsible for summarizing data on this upload.

Returns:

  • (IMW::Tools::Summarizer)


# File 'lib/chimps/workflows/upload/bundler.rb', line 171

def summarizer
  @summarizer ||= IMW::Tools::Summarizer.new(resources)
end

#summary ⇒ Hash

Return summary information about the package prepared by the bundler.

Returns:

  • (Hash)


# File 'lib/chimps/workflows/upload/bundler.rb', line 143

def summary
  summarizer.summary
end