Class: MiGA::Dataset
- Includes: DatasetResult
- Defined in: lib/miga/dataset.rb
Overview
Dataset representation in MiGA.
Constant Summary
- @@RESULT_DIRS =
{
  # Preprocessing
  raw_reads: "01.raw_reads",
  trimmed_reads: "02.trimmed_reads",
  read_quality: "03.read_quality",
  trimmed_fasta: "04.trimmed_fasta",
  assembly: "05.assembly",
  cds: "06.cds",
  # Annotation
  essential_genes: "07.annotation/01.function/01.essential",
  ssu: "07.annotation/01.function/02.ssu",
  mytaxa: "07.annotation/02.taxonomy/01.mytaxa",
  mytaxa_scan: "07.annotation/03.qa/02.mytaxa_scan",
  # Mapping
  mapping_on_contigs: "08.mapping/01.read-ctg",
  mapping_on_genes: "08.mapping/02.read-gene",
  # Distances (for single-species datasets)
  distances: "09.distances",
  # General statistics
  stats: "90.stats"
}
- @@KNOWN_TYPES =
{
  genome: {description: "The genome from an isolate.", multi: false},
  metagenome: {description: "A metagenome (excluding viromes).", multi: true},
  virome: {description: "A viral metagenome.", multi: true},
  scgenome: {description: "A Single-cell Genome Amplification (SGA).", multi: false},
  popgenome: {description: "A population genome (including " +
    "metagenomic bins).", multi: false}
}
- @@PREPROCESSING_TASKS =
[:raw_reads, :trimmed_reads, :read_quality, :trimmed_fasta, :assembly, :cds, :essential_genes, :ssu, :mytaxa, :mytaxa_scan, :distances, :stats]
- @@EXCLUDE_NOREF_TASKS =
Tasks to be excluded from query datasets.
[:mytaxa_scan]
- @@ONLY_NONMULTI_TASKS =
Tasks to be executed only in datasets that are not multi-organism. These tasks are ignored for multi-organism datasets or for unknown types.
[:mytaxa_scan, :distances]
- @@ONLY_MULTI_TASKS =
Tasks to be executed only in datasets that are multi-organism. These tasks are ignored for single-organism datasets or for unknown types.
[:mytaxa]
Constants included from MiGA
CITATION, VERSION, VERSION_DATE, VERSION_NAME
Instance Attribute Summary
- #metadata ⇒ Object (readonly)
  MiGA::Metadata with information about the dataset.
- #name ⇒ Object (readonly)
  Datasets are uniquely identified by name in a project.
- #project ⇒ Object (readonly)
  MiGA::Project that contains the dataset.
Class Method Summary
- .exist?(project, name) ⇒ Boolean
  Does the project already have a dataset with that name?
- .INFO_FIELDS ⇒ Object
  Standard fields of metadata for datasets.
- .KNOWN_TYPES ⇒ Object
  Supported dataset types.
- .PREPROCESSING_TASKS ⇒ Object
  Returns an Array of tasks to be executed before project-wide tasks.
- .RESULT_DIRS ⇒ Object
  Directories containing the results from dataset-specific tasks.
Instance Method Summary
- #add_result(result_type, save = true) ⇒ Object
  Look for the result with symbol key result_type and register it in the dataset.
- #done_preprocessing?(save = false) ⇒ Boolean
  Are all the dataset-specific tasks done? Passes save to #add_result.
- #each_result(&blk) ⇒ Object
  Executes the two-argument block blk for each result, passing the key symbol and the MiGA::Result.
- #first_preprocessing(save = false) ⇒ Object
  Returns the key symbol of the first registered result (sorted by the execution order).
- #get_result(result_type) ⇒ Object
  Gets the result of type result_type for the dataset as a MiGA::Result.
- #ignore_task?(task) ⇒ Boolean
  Should I ignore task for this dataset?
- #info ⇒ Object
  Get standard metadata values for the dataset as Array.
- #initialize(project, name, is_ref = true, metadata = {}) ⇒ Dataset (constructor)
  Create a MiGA::Dataset object in a project MiGA::Project with a uniquely identifying name.
- #is_multi? ⇒ Boolean
  Is this dataset known to be multi-organism?
- #is_nonmulti? ⇒ Boolean
  Is this dataset known to be single-organism?
- #is_ref? ⇒ Boolean
  Is this dataset a reference?
- #next_preprocessing(save = false) ⇒ Object
  Returns the key symbol of the next task that needs to be executed.
- #profile_advance(save = false) ⇒ Object
  Returns an array indicating the stage of each task (sorted by execution order).
- #remove! ⇒ Object
  Deletes the dataset with all its contents (including results) and returns nil.
- #result(k) ⇒ Object
  Get the result MiGA::Result in this dataset identified by the symbol k.
- #results ⇒ Object
  Get all the results (Array of MiGA::Result) in this dataset.
- #save ⇒ Object
  Save any changes you’ve made in the dataset.
- #type ⇒ Object
  Get the type of dataset as Symbol.
Methods included from DatasetResult
Methods inherited from MiGA
CITATION, DEBUG, DEBUG_OFF, DEBUG_ON, DEBUG_TRACE_OFF, DEBUG_TRACE_ON, FULL_VERSION, LONG_VERSION, VERSION, VERSION_DATE, clean_fasta_file, initialized?, #result_files_exist?, root_path, tabulate
Constructor Details
#initialize(project, name, is_ref = true, metadata = {}) ⇒ Dataset
Create a MiGA::Dataset object in a project MiGA::Project with a uniquely identifying name. is_ref indicates if the dataset is to be treated as reference (true, default) or query (false). Pass any additional metadata as a Hash.
# File 'lib/miga/dataset.rb', line 104
def initialize(project, name, is_ref=true, metadata={})
  raise "Invalid name '#{name}', please use only alphanumerics and " +
    "underscores." unless name.miga_name?
  @project = project
  @name = name
  metadata[:ref] = is_ref
  @metadata = MiGA::Metadata.new(
    File.expand_path("metadata/#{name}.json", project.path),
    metadata )
  warn "Warning: Unrecognized dataset type: #{type}." if
    !type.nil? and @@KNOWN_TYPES[type].nil?
end
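The constructor rejects names before doing anything else. The sketch below illustrates that guard in isolation. It does not use MiGA itself: `valid_dataset_name?` and `check_dataset_name!` are hypothetical helpers, and the regular expression is an assumption inferred only from the wording of the error message ("alphanumerics and underscores"), not from the definition of `miga_name?`.

```ruby
# Hypothetical stand-in for the name.miga_name? check in the constructor.
# The pattern is an assumption based on the error message text only.
def valid_dataset_name?(name)
  !!(name =~ /\A[A-Za-z0-9_]+\z/)
end

# Mirrors the guard clause of the constructor: raise on a bad name,
# otherwise hand the name back unchanged.
def check_dataset_name!(name)
  return name if valid_dataset_name?(name)
  raise "Invalid name '#{name}', please use only alphanumerics and " \
    "underscores."
end

%w[E_coli_K12 sample.1 my-dataset].each do |candidate|
  puts "#{candidate}: #{valid_dataset_name?(candidate)}"
end
```

Under this assumed pattern, names containing dots, dashes, or spaces would need to be rewritten (for example with underscores) before creating the dataset.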
Instance Attribute Details
#metadata ⇒ Object (readonly)
MiGA::Metadata with information about the dataset.
# File 'lib/miga/dataset.rb', line 97
def metadata
  @metadata
end
#name ⇒ Object (readonly)
Datasets are uniquely identified by name in a project.
# File 'lib/miga/dataset.rb', line 93
def name
  @name
end
#project ⇒ Object (readonly)
MiGA::Project that contains the dataset.
# File 'lib/miga/dataset.rb', line 89
def project
  @project
end
Class Method Details
.exist?(project, name) ⇒ Boolean
Does the project already have a dataset with that name?
# File 'lib/miga/dataset.rb', line 75
def self.exist?(project, name)
  File.exist? project.path + "/metadata/" + name + ".json"
end
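The existence check is purely file-based: a dataset exists when its metadata JSON file is present under the project path. A minimal sketch of the same test against a temporary directory follows. `FakeProject` and `dataset_exist?` are hypothetical stand-ins introduced for illustration, so the example runs without MiGA installed.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for MiGA::Project: only #path is needed here.
FakeProject = Struct.new(:path)

# Same test as the listing above, written with File.join.
def dataset_exist?(project, name)
  File.exist?(File.join(project.path, "metadata", "#{name}.json"))
end

Dir.mktmpdir do |dir|
  project = FakeProject.new(dir)
  FileUtils.mkdir_p(File.join(dir, "metadata"))
  before = dataset_exist?(project, "sample_1")
  # Creating the metadata file is what makes the dataset "exist".
  File.write(File.join(dir, "metadata", "sample_1.json"), "{}")
  after = dataset_exist?(project, "sample_1")
  puts "before: #{before}, after: #{after}"
end
```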
.INFO_FIELDS ⇒ Object
Standard fields of metadata for datasets.
# File 'lib/miga/dataset.rb', line 81
def self.INFO_FIELDS
  %w(name created updated type ref user description comments)
end
.KNOWN_TYPES ⇒ Object
Supported dataset types.
# File 'lib/miga/dataset.rb', line 40
def self.KNOWN_TYPES ; @@KNOWN_TYPES end
.PREPROCESSING_TASKS ⇒ Object
Returns an Array of tasks to be executed before project-wide tasks.
# File 'lib/miga/dataset.rb', line 54
def self.PREPROCESSING_TASKS ; @@PREPROCESSING_TASKS ; end
.RESULT_DIRS ⇒ Object
Directories containing the results from dataset-specific tasks.
# File 'lib/miga/dataset.rb', line 18
def self.RESULT_DIRS ; @@RESULT_DIRS end
Instance Method Details
#add_result(result_type, save = true) ⇒ Object
Look for the result with symbol key result_type and register it in the dataset. If save is false, it doesn’t register the result, but it still returns the result if it is already registered. Returns MiGA::Result or nil.
# File 'lib/miga/dataset.rb', line 189
def add_result(result_type, save=true)
  return nil if @@RESULT_DIRS[result_type].nil?
  base = File.expand_path("data/#{@@RESULT_DIRS[result_type]}/#{name}",
    project.path)
  r_pre = MiGA::Result.load("#{base}.json")
  return r_pre if (r_pre.nil? and not save) or not r_pre.nil?
  return nil unless result_files_exist?(base, ".done")
  r = self.send("add_result_#{result_type}", base)
  r.save unless r.nil?
  r
end
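The early returns in the listing above can be read as a small decision table. The sketch below restates them as a pure function so each path is easy to see. `add_result_branch` and its keyword arguments are hypothetical names introduced here; the function only models the control flow and does not touch files or MiGA::Result.

```ruby
# Models which path add_result takes, as read from the listing.
#   known:      result_type is a key of RESULT_DIRS
#   registered: "<base>.json" already loads as a result
#   save:       the save argument passed by the caller
#   files_done: result_files_exist?(base, ".done") holds
def add_result_branch(known:, registered:, save:, files_done:)
  return :nil_unknown_type unless known
  return :existing_result if registered
  return :nil_not_registered unless save
  return :nil_files_incomplete unless files_done
  :newly_registered
end

puts add_result_branch(known: true, registered: false,
                       save: true, files_done: true)
```

In this reading, the output files are only inspected when `save` is true and no result is registered yet.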
#done_preprocessing?(save = false) ⇒ Boolean
Are all the dataset-specific tasks done? Passes save to #add_result.
# File 'lib/miga/dataset.rb', line 242
def done_preprocessing?(save=false)
  !first_preprocessing(save).nil? and next_preprocessing(save).nil?
end
#each_result(&blk) ⇒ Object
Executes the two-argument block blk for each result, passing the key symbol and the MiGA::Result.
# File 'lib/miga/dataset.rb', line 178
def each_result(&blk)
  @@RESULT_DIRS.keys.each do |k|
    blk.call(k, result(k)) unless result(k).nil?
  end
end
#first_preprocessing(save = false) ⇒ Object
Returns the key symbol of the first registered result (sorted by the execution order). This typically corresponds to the result used as the initial input. Passes save to #add_result.
# File 'lib/miga/dataset.rb', line 210
def first_preprocessing(save=false)
  @@PREPROCESSING_TASKS.find do |t|
    not ignore_task?(t) and not add_result(t, save).nil?
  end
end
#get_result(result_type) ⇒ Object
Gets the result of type result_type for the dataset as a MiGA::Result. This is equivalent to add_result(result_type, false).
# File 'lib/miga/dataset.rb', line 204
def get_result(result_type) ; add_result(result_type, false) ; end
#ignore_task?(task) ⇒ Boolean
Should I ignore task for this dataset?
# File 'lib/miga/dataset.rb', line 233
def ignore_task?(task)
  return !metadata["run_#{task}"] unless metadata["run_#{task}"].nil?
  ( (@@EXCLUDE_NOREF_TASKS.include?(task) and not is_ref?) or
    (@@ONLY_MULTI_TASKS.include?(task) and not is_multi?) or
    (@@ONLY_NONMULTI_TASKS.include?(task) and not is_nonmulti?))
end
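The rules combine three task lists with an explicit per-dataset override. The following self-contained sketch reproduces that logic over a plain Hash so the cases can be tried without MiGA. The top-level constants and the `MULTI_BY_TYPE` table are copies or condensations of the class constants, and the plain-Hash metadata with symbol keys such as `:run_mytaxa` is an assumption made for this illustration.

```ruby
EXCLUDE_NOREF_TASKS = [:mytaxa_scan]
ONLY_NONMULTI_TASKS = [:mytaxa_scan, :distances]
ONLY_MULTI_TASKS    = [:mytaxa]
# Condensed from KNOWN_TYPES: type => multi-organism?
MULTI_BY_TYPE = { genome: false, metagenome: true, virome: true,
                  scgenome: false, popgenome: false }

# metadata is a plain Hash here (an assumption for this sketch).
def ignore_task?(task, metadata)
  # An explicit run_<task> flag overrides every other rule.
  flag = metadata[:"run_#{task}"]
  return !flag unless flag.nil?
  multi = MULTI_BY_TYPE[metadata[:type]] # nil for missing or unknown types
  is_multi = (multi == true)
  is_nonmulti = (multi == false)
  (EXCLUDE_NOREF_TASKS.include?(task) && !metadata[:ref]) ||
    (ONLY_MULTI_TASKS.include?(task) && !is_multi) ||
    (ONLY_NONMULTI_TASKS.include?(task) && !is_nonmulti)
end

genome = { type: :genome, ref: true }
puts ignore_task?(:mytaxa, genome)    # multi-only task on a genome
puts ignore_task?(:distances, genome) # single-organism task on a genome
```

Note that a dataset with a missing or unrecognized type is neither multi nor non-multi, so both the multi-only and the non-multi-only tasks are ignored for it.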
#info ⇒ Object
Get standard metadata values for the dataset as Array.
# File 'lib/miga/dataset.rb', line 138
def info
  MiGA::Dataset.INFO_FIELDS.map do |k|
    (k=="name") ? self.name : self.metadata[k.to_sym]
  end
end
#is_multi? ⇒ Boolean
Is this dataset known to be multi-organism?
# File 'lib/miga/dataset.rb', line 150
def is_multi?
  return false if self.metadata[:type].nil? or
    @@KNOWN_TYPES[self.metadata[:type]].nil?
  @@KNOWN_TYPES[self.metadata[:type]][:multi]
end
#is_nonmulti? ⇒ Boolean
Is this dataset known to be single-organism?
# File 'lib/miga/dataset.rb', line 158
def is_nonmulti?
  return false if self.metadata[:type].nil? or
    @@KNOWN_TYPES[self.metadata[:type]].nil?
  !@@KNOWN_TYPES[self.metadata[:type]][:multi]
end
#is_ref? ⇒ Boolean
Is this dataset a reference?
# File 'lib/miga/dataset.rb', line 146
def is_ref? ; !!self.metadata[:ref] ; end
#next_preprocessing(save = false) ⇒ Object
Returns the key symbol of the next task that needs to be executed. Passes save to #add_result.
# File 'lib/miga/dataset.rb', line 219
def next_preprocessing(save=false)
  after_first = false
  first = first_preprocessing(save)
  return nil if first.nil?
  @@PREPROCESSING_TASKS.each do |t|
    next if ignore_task? t
    return t if after_first and add_result(t, save).nil?
    after_first = (after_first or (t==first))
  end
  nil
end
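The scan in the listing above can be tried on plain arrays. In this sketch the sets of registered and ignored tasks are passed in explicitly (`done` and `ignored` are hypothetical parameters replacing the calls to #add_result and #ignore_task?), and `done_preprocessing?` is included because it is defined in terms of the other two.

```ruby
PREPROCESSING_TASKS = [:raw_reads, :trimmed_reads, :read_quality,
  :trimmed_fasta, :assembly, :cds, :essential_genes, :ssu, :mytaxa,
  :mytaxa_scan, :distances, :stats]

# First non-ignored task with a registered result: the initial input.
def first_preprocessing(done, ignored = [])
  PREPROCESSING_TASKS.find { |t| !ignored.include?(t) && done.include?(t) }
end

# First non-ignored task after the initial input without a result.
def next_preprocessing(done, ignored = [])
  first = first_preprocessing(done, ignored)
  return nil if first.nil?
  after_first = false
  PREPROCESSING_TASKS.each do |t|
    next if ignored.include?(t)
    return t if after_first && !done.include?(t)
    after_first ||= (t == first)
  end
  nil
end

def done_preprocessing?(done, ignored = [])
  !first_preprocessing(done, ignored).nil? &&
    next_preprocessing(done, ignored).nil?
end

done = [:assembly, :cds]
puts next_preprocessing(done, [:mytaxa]).inspect
```

Tasks that precede the initial input (here the read-level steps before `:assembly`) are never reported as pending, which is what allows a dataset to start from an assembly rather than from raw reads.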
#profile_advance(save = false) ⇒ Object
Returns an array indicating the stage of each task (sorted by execution order). The values are integers:
- 0 for an undefined result (a task before the initial input).
- 1 for a registered result (a completed task).
- 2 for a queued result (a task yet to be executed).
It passes save to #add_result.
# File 'lib/miga/dataset.rb', line 253
def profile_advance(save=false)
  first_task = first_preprocessing(save)
  return Array.new(@@PREPROCESSING_TASKS.size, 0) if first_task.nil?
  adv = []
  state = 0
  next_task = next_preprocessing(save)
  @@PREPROCESSING_TASKS.each do |task|
    state = 1 if first_task==task
    state = 2 if !next_task.nil? and next_task==task
    adv << state
  end
  adv
end
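A small sketch of the state sweep may help in reading the returned array. Here the task list, the first registered task, and the next pending task are passed in directly (these parameters are hypothetical; the method itself obtains them from #first_preprocessing and #next_preprocessing).

```ruby
# Walks the tasks in execution order, switching the running state to 1
# at the first registered task and to 2 at the next pending task.
def profile_advance(tasks, first_task, next_task)
  return Array.new(tasks.size, 0) if first_task.nil?
  state = 0
  tasks.map do |task|
    state = 1 if task == first_task
    state = 2 if !next_task.nil? && task == next_task
    state
  end
end

tasks = [:raw_reads, :trimmed_reads, :assembly, :cds, :stats]
p profile_advance(tasks, :assembly, :stats)
```

Because the state only ever moves forward, every task from the next pending one onward is reported as 2 in this reading of the listing, and ignored tasks are not distinguished from the others.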
#remove! ⇒ Object
Deletes the dataset with all its contents (including results) and returns nil.
# File 'lib/miga/dataset.rb', line 131
def remove!
  self.results.each{ |r| r.remove! }
  self.metadata.remove!
end
#result(k) ⇒ Object
Get the result MiGA::Result in this dataset identified by the symbol k.
# File 'lib/miga/dataset.rb', line 166
def result(k)
  return nil if @@RESULT_DIRS[k.to_sym].nil?
  MiGA::Result.load(project.path + "/data/" + @@RESULT_DIRS[k.to_sym] +
    "/" + name + ".json")
end
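The lookup boils down to a fixed path layout: project path, `data`, the directory registered for the result key, and the dataset name with a `.json` extension. The sketch below builds that path only. `result_json_path` is a hypothetical helper, `RESULT_DIRS` here is a three-entry excerpt of the class constant, and the project path is a made-up example.

```ruby
# Excerpt of the class constant, enough for the illustration.
RESULT_DIRS = {
  assembly: "05.assembly",
  cds: "06.cds",
  ssu: "07.annotation/01.function/02.ssu"
}

# Returns the JSON path a result would be loaded from, or nil when the
# key is not a known result type. Accepts a Symbol or a String key.
def result_json_path(project_path, name, key)
  dir = RESULT_DIRS[key.to_sym]
  return nil if dir.nil?
  File.join(project_path, "data", dir, "#{name}.json")
end

puts result_json_path("/tmp/my_project", "sample_1", :ssu)
```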
#results ⇒ Object
Get all the results (Array of MiGA::Result) in this dataset.
# File 'lib/miga/dataset.rb', line 174
def results ; @@RESULT_DIRS.keys.map{ |k| result k }.compact ; end
#save ⇒ Object
Save any changes you’ve made in the dataset.
# File 'lib/miga/dataset.rb', line 118
def save
  self.metadata[:type] = :metagenome if !metadata[:tax].nil? and
    !metadata[:tax][:ns].nil? and metadata[:tax][:ns]=="COMMUNITY"
  self.metadata.save
end
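Besides writing the metadata, the listing above forces the dataset type to `:metagenome` whenever the taxonomy namespace is "COMMUNITY". The following sketch isolates that rule on a plain nested Hash. `infer_type!` is a hypothetical helper, and representing the taxonomy as a Hash with an `:ns` key is an assumption made for illustration.

```ruby
# Applies the type rule from #save to a plain Hash and returns it.
# Any previously set :type is overwritten when the rule matches.
def infer_type!(metadata)
  tax = metadata[:tax]
  if !tax.nil? && tax[:ns] == "COMMUNITY"
    metadata[:type] = :metagenome
  end
  metadata
end

md = { type: :genome, tax: { ns: "COMMUNITY" } }
p infer_type!(md)[:type]
```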
#type ⇒ Object
Get the type of dataset as Symbol.
# File 'lib/miga/dataset.rb', line 126
def type ; metadata[:type] ; end