Module: MiGA::Project::Dataset

Included in:
MiGA::Project
Defined in:
lib/miga/project/dataset.rb

Overview

Helper module including specific functions to handle datasets.

Instance Method Details

#add_dataset(name) ⇒ Object

Add dataset identified by name and return MiGA::Dataset.



# File 'lib/miga/project/dataset.rb', line 53

def add_dataset(name)
  unless metadata[:datasets].include? name
    MiGA::Dataset.new(self, name)
    @metadata[:datasets] << name
    save
  end
  dataset(name)
end

#dataset(name) ⇒ Object

Returns the MiGA::Dataset identified by name, or nil if no such dataset exists in the project.



# File 'lib/miga/project/dataset.rb', line 29

def dataset(name)
  name = name.miga_name
  return nil unless MiGA::Dataset.exist?(self, name)
  @datasets ||= {}
  @datasets[name] ||= MiGA::Dataset.new(self, name)
  @datasets[name]
end
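
The lookup above combines an existence check with a memoized cache, so each dataset object is built at most once per project instance. A minimal sketch of that pattern follows; SketchProject, its constructor argument, and the builds counter are hypothetical stand-ins for illustration and are not part of MiGA.

```ruby
# Hypothetical stand-in for a project; not the MiGA classes themselves.
class SketchProject
  attr_reader :builds

  def initialize(existing)
    @existing = existing # names assumed to "exist" in this sketch
    @builds = 0          # counts how many objects were constructed
  end

  def dataset(name)
    # Unknown names short-circuit to nil, as in the listing above
    return nil unless @existing.include?(name)
    @datasets ||= {}
    # ||= only evaluates the right-hand side on the first request
    @datasets[name] ||= begin
      @builds += 1
      "dataset:#{name}" # placeholder for MiGA::Dataset.new(self, name)
    end
  end
end

project = SketchProject.new(%w[ds1 ds2])
first   = project.dataset('ds1')
second  = project.dataset('ds1')
missing = project.dataset('nope')
```

In this sketch, first and second are the same object, builds stays at 1, and missing is nil.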

#dataset_names ⇒ Object

Returns Array of String (without evaluating dataset objects).



# File 'lib/miga/project/dataset.rb', line 16

def dataset_names
  metadata[:datasets]
end

#dataset_names_hash ⇒ Object

Returns Hash of Strings => true. Similar to dataset_names but as Hash for efficiency.



# File 'lib/miga/project/dataset.rb', line 23

def dataset_names_hash
  @dataset_names_hash ||= Hash[dataset_names.map{ |i| [i,true] }]
end
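
To illustrate why a Hash can be preferable for membership tests, the sketch below builds the same structure from a plain Array of names. The dataset names are made up for the example.

```ruby
# Made-up dataset names, for illustration only
names = %w[ds_a ds_b ds_c]

# Same construction as in dataset_names_hash
names_hash = Hash[names.map { |i| [i, true] }]

in_array = names.include?('ds_b') # scans the Array element by element
in_hash  = names_hash['ds_b']     # single hash lookup
absent   = names_hash['zzz']      # nil for names that are not registered
```

Because the result is memoized in @dataset_names_hash, the listing above would keep returning the first hash it built; a name added later would only appear if the cache is reset elsewhere.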

#datasets ⇒ Object

Returns Array of MiGA::Dataset.



# File 'lib/miga/project/dataset.rb', line 10

def datasets
  metadata[:datasets].map{ |name| dataset(name) }
end

#done_preprocessing?(save = true) ⇒ Boolean

Are all the reference datasets in the project preprocessed? If save is true, intermediate results are saved until the first incomplete dataset is reached.

Returns:

  • (Boolean)


# File 'lib/miga/project/dataset.rb', line 123

def done_preprocessing?(save=true)
  dataset_names.each do |dn|
    ds = dataset(dn)
    return false if ds.is_ref? and not ds.done_preprocessing?(save)
  end
  true
end

#each_dataset(&blk) ⇒ Object

Iterate through datasets, with one or two variables passed to blk. If one, the dataset MiGA::Dataset object is passed. If two, the name and the dataset object are passed.



# File 'lib/miga/project/dataset.rb', line 41

def each_dataset(&blk)
  metadata[:datasets].each do |name|
    if blk.arity == 1
      blk.call(dataset(name))
    else
      blk.call(name, dataset(name))
    end
  end
end
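
The one-or-two-variable behaviour relies on inspecting the block's arity. A minimal, self-contained sketch of the same dispatch is shown below; each_item and the sample Hash are hypothetical and only mimic the structure of each_dataset.

```ruby
# Hypothetical helper mirroring the arity check in each_dataset
def each_item(items, &blk)
  items.each do |name, obj|
    if blk.arity == 1
      blk.call(obj)       # one block variable: object only
    else
      blk.call(name, obj) # otherwise: name and object
    end
  end
end

items = { 'ds1' => :obj1, 'ds2' => :obj2 }

one = []
each_item(items) { |ds| one << ds }

two = []
each_item(items) { |name, ds| two << [name, ds] }
```

Here one ends up holding only the objects, while two holds name and object pairs. Any block whose arity is not exactly 1 takes the second branch.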

#each_dataset_profile_advance(&blk) ⇒ Object

Call blk passing the result of MiGA::Dataset#profile_advance for each registered dataset.



# File 'lib/miga/project/dataset.rb', line 149

def each_dataset_profile_advance(&blk)
  each_dataset { |ds| blk.call(ds.profile_advance) }
end

#import_dataset(ds, method = :hardlink) ⇒ Object

Import the dataset ds, a MiGA::Dataset, using method, which can be any method supported by File.generic_transfer.



# File 'lib/miga/project/dataset.rb', line 75

def import_dataset(ds, method=:hardlink)
  raise "Impossible to import dataset, it already exists: #{ds.name}." if
    MiGA::Dataset.exist?(self, ds.name)
  # Import dataset results
  ds.each_result do |task, result|
    # import result files
    result.each_file do |file|
      File.generic_transfer("#{result.dir}/#{file}",
        "#{path}/data/#{MiGA::Dataset.RESULT_DIRS[task]}/#{file}", method)
    end
    # import result metadata
    %w(json start done).each do |suffix|
      if File.exist? "#{result.dir}/#{ds.name}.#{suffix}"
        File.generic_transfer("#{result.dir}/#{ds.name}.#{suffix}",
          "#{path}/data/#{MiGA::Dataset.RESULT_DIRS[task]}/" +
                     "#{ds.name}.#{suffix}", method)
      end
    end
  end
  # Import dataset metadata
  File.generic_transfer("#{ds.project.path}/metadata/#{ds.name}.json",
    "#{self.path}/metadata/#{ds.name}.json", method)
  # Save dataset
  self.add_dataset(ds.name)
end
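
The transfer step can be pictured with the standard library alone. The sketch below is not MiGA's File.generic_transfer; sketch_transfer is a hypothetical stand-in, and the assumption that :symlink and :copy are accepted alongside :hardlink is mine rather than something stated above.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in; the real File.generic_transfer may differ.
def sketch_transfer(src, dst, method = :hardlink)
  FileUtils.mkdir_p(File.dirname(dst)) # make sure the target folder exists
  case method
  when :hardlink then FileUtils.ln(src, dst)
  when :symlink  then FileUtils.ln_s(src, dst)
  when :copy     then FileUtils.cp(src, dst)
  else raise ArgumentError, "Unknown transfer method: #{method}"
  end
end

results = {}
Dir.mktmpdir do |tmp|
  # Create a small metadata-like file to transfer
  src = File.join(tmp, 'source', 'ds1.json')
  FileUtils.mkdir_p(File.dirname(src))
  File.write(src, '{}')

  %i[hardlink symlink copy].each do |m|
    dst = File.join(tmp, 'target', m.to_s, 'ds1.json')
    sketch_transfer(src, dst, m)
    results[m] = File.read(dst) # content is readable through every method
  end
end
```

Hard links avoid duplicating data but require source and target to be on the same filesystem, which may explain why the method is left configurable.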

#profile_datasets_advance ⇒ Object

Returns a two-dimensional matrix (Array of Array) where the first index corresponds to the dataset, the second index corresponds to the dataset task, and the value corresponds to:

  • 0: Before execution.

  • 1: Done (or not required).

  • 2: To do.



# File 'lib/miga/project/dataset.rb', line 138

def profile_datasets_advance
  advance = []
  self.each_dataset_profile_advance do |ds_adv|
    advance << ds_adv
  end
  advance
end
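
As an illustration of how such a matrix might be consumed, the sketch below tallies the states and computes the fraction of finished tasks per dataset. The values in advance are invented for the example and do not come from a real project.

```ruby
# Invented matrix: one row per dataset, one column per task
advance = [
  [1, 1, 2, 0],
  [1, 2, 0, 0]
]

# Count how often each state (0, 1, 2) occurs across the project
counts = Hash.new(0)
advance.each { |row| row.each { |state| counts[state] += 1 } }

# Fraction of tasks marked as done (state 1) for each dataset
done_fraction = advance.map { |row| row.count(1).fdiv(row.size) }
```

With these invented values, counts is {1=>3, 2=>2, 0=>3} and done_fraction is [0.5, 0.25].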

#unlink_dataset(name) ⇒ Object

Unlink dataset identified by name and return MiGA::Dataset.



# File 'lib/miga/project/dataset.rb', line 64

def unlink_dataset(name)
  d = dataset(name)
  return nil if d.nil?
  self.metadata[:datasets].delete(name)
  save
  d
end

#unregistered_datasets ⇒ Object

Find all datasets that have (potential) result files but are not yet registered.



# File 'lib/miga/project/dataset.rb', line 103

def unregistered_datasets
  datasets = []
  MiGA::Dataset.RESULT_DIRS.values.each do |dir|
    dir_p = "#{path}/data/#{dir}"
    next unless Dir.exist? dir_p
    Dir.entries(dir_p).each do |file|
      next unless
        file =~ %r{
          \.(fa(a|sta|stqc?)?|fna|solexaqa|gff[23]?|done|ess)(\.gz)?$
        }x
      m = /([^\.]+)/.match(file)
      datasets << m[1] unless m.nil? or m[1] == "miga-project"
    end
  end
  datasets.uniq - metadata[:datasets]
end
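
The filename filter can be tried in isolation. The sketch below applies the same two regular expressions to an invented list of file names and an invented list of registered datasets; RESULT_FILE, files, and registered are names introduced only for this example.

```ruby
# Same extension filter as in unregistered_datasets
RESULT_FILE = %r{
  \.(fa(a|sta|stqc?)?|fna|solexaqa|gff[23]?|done|ess)(\.gz)?$
}x

# Invented directory listing and registry, for illustration only
files = %w[ds1.fasta ds2.fna.gz ds3.1.fastq.gz notes.txt
           miga-project.done ds1.done]
registered = %w[ds2]

found = []
files.each do |file|
  next unless file =~ RESULT_FILE
  # The dataset name is taken to be everything before the first dot
  m = /([^\.]+)/.match(file)
  found << m[1] unless m.nil? || m[1] == 'miga-project'
end

unregistered = found.uniq - registered
```

In this run, notes.txt is skipped by the extension filter, miga-project.done is excluded by name, and unregistered is ["ds1", "ds3"].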