Module: MiGA::Project::Dataset
- Included in: MiGA::Project
- Defined in: lib/miga/project/dataset.rb
Overview
Helper module including specific functions to handle datasets.
Instance Method Summary
- #add_dataset(name) ⇒ Object
  Add dataset identified by name and return MiGA::Dataset.
- #dataset(name) ⇒ Object
  Returns MiGA::Dataset.
- #dataset_names ⇒ Object
  Returns Array of String (without evaluating dataset objects).
- #dataset_names_hash ⇒ Object
  Returns Hash of Strings => true.
- #datasets ⇒ Object
  Returns Array of MiGA::Dataset.
- #done_preprocessing?(save = true) ⇒ Boolean
  Are all the datasets in the project preprocessed? Save intermediate results if save is true (until the first incomplete dataset is reached).
- #each_dataset(&blk) ⇒ Object
  Iterate through datasets, with one or two variables passed to blk.
- #each_dataset_profile_advance(&blk) ⇒ Object
  Call blk passing the result of MiGA::Dataset#profile_advance for each registered dataset.
- #import_dataset(ds, method = :hardlink) ⇒ Object
  Import the dataset ds, a MiGA::Dataset, using method, which is any method supported by File#generic_transfer.
- #profile_datasets_advance ⇒ Object
  Returns a two-dimensional matrix (Array of Array) where the first index corresponds to the dataset, the second index corresponds to the dataset task, and the value is a status code (0: before execution; 1: done or not required; 2: to do).
- #unlink_dataset(name) ⇒ Object
  Unlink dataset identified by name and return MiGA::Dataset.
- #unregistered_datasets ⇒ Object
  Find all datasets with (potential) result files that are not yet registered.
Instance Method Details
#add_dataset(name) ⇒ Object
Add dataset identified by name and return MiGA::Dataset.
    # File 'lib/miga/project/dataset.rb', line 53
    def add_dataset(name)
      unless metadata[:datasets].include? name
        MiGA::Dataset.new(self, name)
        @metadata[:datasets] << name
        save
      end
      dataset(name)
    end
#dataset(name) ⇒ Object
Returns MiGA::Dataset, or nil if no dataset with that name exists in the project.
    # File 'lib/miga/project/dataset.rb', line 29
    def dataset(name)
      name = name.miga_name
      return nil unless MiGA::Dataset.exist?(self, name)
      @datasets ||= {}
      @datasets[name] ||= MiGA::Dataset.new(self, name)
      @datasets[name]
    end
#dataset_names ⇒ Object
Returns Array of String (without evaluating dataset objects).
    # File 'lib/miga/project/dataset.rb', line 16
    def dataset_names
      metadata[:datasets]
    end
#dataset_names_hash ⇒ Object
Returns Hash of Strings => true. Similar to dataset_names but as Hash for efficiency.
    # File 'lib/miga/project/dataset.rb', line 23
    def dataset_names_hash
      @dataset_names_hash ||= Hash[dataset_names.map{ |i| [i,true] }]
    end
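The efficiency mentioned above comes from Hash key lookup, which avoids scanning the whole Array on every membership test. The following is a minimal sketch in plain Ruby; the dataset names are hypothetical and nothing here calls MiGA itself.

```ruby
# Hypothetical dataset names standing in for the project's registered names.
names = %w[genome_a genome_b genome_c]

# Same construction as dataset_names_hash: every name maps to true.
names_hash = Hash[names.map { |i| [i, true] }]

# Membership by Array scan versus by Hash key lookup.
in_array = names.include?('genome_b')
in_hash  = names_hash.key?('genome_b')

# Unknown names simply return nil from the Hash.
missing = names_hash['genome_z']

puts in_array, in_hash, missing.inspect
```

Note that the result is memoized in @dataset_names_hash, and the add_dataset and unlink_dataset listings on this page do not reset it, so it may be worth checking whether the cache is cleared elsewhere before relying on it after a change.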
#datasets ⇒ Object
Returns Array of MiGA::Dataset.
    # File 'lib/miga/project/dataset.rb', line 10
    def datasets
      metadata[:datasets].map{ |name| dataset(name) }
    end
#done_preprocessing?(save = true) ⇒ Boolean
Are all the datasets in the project preprocessed? Save intermediate results if save is true (until the first incomplete dataset is reached). As the listing shows, only reference datasets (is_ref?) are checked.
    # File 'lib/miga/project/dataset.rb', line 123
    def done_preprocessing?(save=true)
      dataset_names.each do |dn|
        ds = dataset(dn)
        return false if ds.is_ref? and not ds.done_preprocessing?(save)
      end
      true
    end
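To illustrate the control flow, here is a small sketch that reproduces the same early return over plain objects. FakeDataset and all_preprocessed? are hypothetical stand-ins written for this example; they only mimic the two predicates used in the listing.

```ruby
# Stand-in for MiGA::Dataset with only the two predicates used here.
FakeDataset = Struct.new(:name, :ref, :done) do
  def is_ref?
    ref
  end

  def done_preprocessing?(_save = true)
    done
  end
end

# Same control flow as done_preprocessing?, over a plain Array.
def all_preprocessed?(datasets, save = true)
  datasets.each do |ds|
    # Stop at the first reference dataset that is still incomplete.
    return false if ds.is_ref? && !ds.done_preprocessing?(save)
  end
  true
end

only_query_pending = [
  FakeDataset.new('ref1', true, true),
  FakeDataset.new('query1', false, false)
]
ref_pending = [
  FakeDataset.new('ref1', true, true),
  FakeDataset.new('ref2', true, false)
]

puts all_preprocessed?(only_query_pending)
puts all_preprocessed?(ref_pending)
```

In this sketch an unfinished non-reference dataset does not prevent a true result, which matches the is_ref? guard in the listing.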
#each_dataset(&blk) ⇒ Object
Iterate through datasets, with one or two variables passed to blk. If one, the dataset MiGA::Dataset object is passed. If two, the name and the dataset object are passed.
    # File 'lib/miga/project/dataset.rb', line 41
    def each_dataset(&blk)
      metadata[:datasets].each do |name|
        if blk.arity == 1
          blk.call(dataset(name))
        else
          blk.call(name, dataset(name))
        end
      end
    end
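The dispatch on block arity can be tried in isolation. The sketch below uses a hypothetical FakeProject class whose dataset method returns a String instead of a MiGA::Dataset; only the each_dataset body follows the listing.

```ruby
# Stand-in for a project; only the arity dispatch mirrors each_dataset.
class FakeProject
  def initialize(names)
    @names = names
  end

  # Placeholder lookup: returns a String, not a MiGA::Dataset.
  def dataset(name)
    "dataset:#{name}"
  end

  def each_dataset(&blk)
    @names.each do |name|
      if blk.arity == 1
        blk.call(dataset(name))
      else
        blk.call(name, dataset(name))
      end
    end
  end
end

project = FakeProject.new(%w[a b])

# One block parameter: only the dataset object is yielded.
one_arg = []
project.each_dataset { |ds| one_arg << ds }

# Two block parameters: the name and the dataset object are yielded.
two_args = []
project.each_dataset { |name, ds| two_args << [name, ds] }

p one_arg
p two_args
```

Because the test is `blk.arity == 1`, any block whose arity is not exactly 1 takes the two-argument branch.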
#each_dataset_profile_advance(&blk) ⇒ Object
Call blk passing the result of MiGA::Dataset#profile_advance for each registered dataset.
    # File 'lib/miga/project/dataset.rb', line 149
    def each_dataset_profile_advance(&blk)
      each_dataset { |ds| blk.call(ds.profile_advance) }
    end
#import_dataset(ds, method = :hardlink) ⇒ Object
Import the dataset ds, a MiGA::Dataset, using method, which is any method supported by File#generic_transfer.
    # File 'lib/miga/project/dataset.rb', line 75
    def import_dataset(ds, method=:hardlink)
      raise "Impossible to import dataset, it already exists: #{ds.name}." if
        MiGA::Dataset.exist?(self, ds.name)
      # Import dataset results
      ds.each_result do |task, result|
        # import result files
        result.each_file do |file|
          File.generic_transfer("#{result.dir}/#{file}",
            "#{path}/data/#{MiGA::Dataset.RESULT_DIRS[task]}/#{file}", method)
        end
        # import result metadata
        %w(json start done).each do |suffix|
          if File.exist? "#{result.dir}/#{ds.name}.#{suffix}"
            File.generic_transfer("#{result.dir}/#{ds.name}.#{suffix}",
              "#{path}/data/#{MiGA::Dataset.RESULT_DIRS[task]}/" +
                            "#{ds.name}.#{suffix}", method)
          end
        end
      end
      # Import dataset metadata
      File.generic_transfer("#{ds.project.path}/metadata/#{ds.name}.json",
        "#{self.path}/metadata/#{ds.name}.json", method)
      # Save dataset
      self.add_dataset(ds.name)
    end
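File.generic_transfer is not part of Ruby's standard File class, and its definition is not shown on this page. As a rough sketch of what a method-selectable transfer could look like, the hypothetical helper below dispatches to FileUtils. The accepted symbols (:hardlink, :symlink, :copy) and the creation of the target directory are assumptions made for illustration, not documented behavior.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for File.generic_transfer; the accepted symbols
# and the mkdir_p call are assumptions, not taken from this page.
def transfer(src, dst, method = :hardlink)
  FileUtils.mkdir_p(File.dirname(dst))
  case method
  when :hardlink then FileUtils.ln(src, dst)
  when :symlink  then FileUtils.ln_s(src, dst)
  when :copy     then FileUtils.cp(src, dst)
  else raise ArgumentError, "Unknown transfer method: #{method}"
  end
end

results = {}
Dir.mktmpdir do |dir|
  # Create a small source file inside a temporary directory.
  src = File.join(dir, 'source', 'sample.json')
  FileUtils.mkdir_p(File.dirname(src))
  File.write(src, '{}')

  # Transfer it once per method and read the content back.
  %i[hardlink symlink copy].each do |m|
    dst = File.join(dir, 'target', m.to_s, 'sample.json')
    transfer(src, dst, m)
    results[m] = File.read(dst)
  end
end

p results
```

Hard links avoid duplicating data but generally require source and target to be on the same filesystem, which is a reason a caller might pass a different method when importing across volumes.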
#profile_datasets_advance ⇒ Object
Returns a two-dimensional matrix (Array of Array) where the first index corresponds to the dataset, the second index corresponds to the dataset task, and the value corresponds to:
- 0: Before execution.
- 1: Done (or not required).
- 2: To do.
    # File 'lib/miga/project/dataset.rb', line 138
    def profile_datasets_advance
      advance = []
      self.each_dataset_profile_advance do |ds_adv|
        advance << ds_adv
      end
      advance
    end
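As an illustration of how the returned matrix might be consumed, the sketch below summarizes a hand-written matrix. The values are hypothetical and the number of tasks per dataset is arbitrary; only the meaning of the codes 0, 1, and 2 comes from the description above. Array#tally requires Ruby 2.7 or later.

```ruby
# Hypothetical matrix: one row per dataset, one column per task.
advance = [
  [1, 1, 2, 0],
  [1, 2, 0, 0],
  [1, 1, 1, 1]
]

# Status codes as described for profile_datasets_advance.
LABELS = {
  0 => 'before execution',
  1 => 'done (or not required)',
  2 => 'to do'
}.freeze

# Count how many dataset/task cells fall in each state.
counts = advance.flatten.tally

# Row indexes of datasets whose tasks are all done (or not required).
complete = advance.each_index.select { |i| advance[i].all? { |v| v == 1 } }

counts.sort.each { |code, n| puts "#{LABELS[code]}: #{n}" }
puts "complete datasets (row indexes): #{complete.inspect}"
```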
#unlink_dataset(name) ⇒ Object
Unlink dataset identified by name and return MiGA::Dataset.
    # File 'lib/miga/project/dataset.rb', line 64
    def unlink_dataset(name)
      d = dataset(name)
      return nil if d.nil?
      self.metadata[:datasets].delete(name)
      save
      d
    end
#unregistered_datasets ⇒ Object
Find all datasets with (potential) result files that are not yet registered.
    # File 'lib/miga/project/dataset.rb', line 103
    def unregistered_datasets
      datasets = []
      MiGA::Dataset.RESULT_DIRS.values.each do |dir|
        dir_p = "#{path}/data/#{dir}"
        next unless Dir.exist? dir_p
        Dir.entries(dir_p).each do |file|
          next unless
            file =~ %r{
              \.(fa(a|sta|stqc?)?|fna|solexaqa|gff[23]?|done|ess)(\.gz)?$
            }x
          m = /([^\.]+)/.match(file)
          datasets << m[1] unless m.nil? or m[1] == "miga-project"
        end
      end
      datasets.uniq - metadata[:datasets]
    end
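The file-name filtering can be checked on its own. The sketch below applies the same two regular expressions to a hypothetical directory listing; the file names and the already registered name are made up for the example.

```ruby
# Same extension filter as in unregistered_datasets.
RESULT_FILE = %r{
  \.(fa(a|sta|stqc?)?|fna|solexaqa|gff[23]?|done|ess)(\.gz)?$
}x

# Hypothetical directory listing, including the "." and ".." entries
# that Dir.entries returns.
files = %w[
  sample1.fasta sample1.fastq.gz sample2.1.fastq sample3.gff3
  sample4.done notes.txt miga-project.done . ..
]

# Keep result-like files, then take everything before the first dot.
candidates = files.select { |f| f =~ RESULT_FILE }
                  .map { |f| f[/([^\.]+)/, 1] }
                  .reject { |n| n.nil? || n == 'miga-project' }
                  .uniq

# Subtract the names that are already registered (hypothetical here).
registered = %w[sample1]
unregistered = candidates - registered

p candidates
p unregistered
```

Since the name is taken as the text before the first dot, a file such as sample2.1.fastq is attributed to sample2.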