Class: MiGA::Dataset

Inherits:
MiGA
  • Object
Includes:
Result
Defined in:
lib/miga/dataset/base.rb,
lib/miga/dataset.rb

Overview

Dataset representation in MiGA.

Defined Under Namespace

Modules: Base, Result

Constant Summary

Constants included from MiGA

CITATION, VERSION, VERSION_DATE, VERSION_NAME

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Result

#add_result, #cleanup_distances!, #done_preprocessing?, #each_result, #first_preprocessing, #get_result, #next_preprocessing, #profile_advance, #result, #results

Methods inherited from MiGA

CITATION, DEBUG, DEBUG_OFF, DEBUG_ON, DEBUG_TRACE_OFF, DEBUG_TRACE_ON, FULL_VERSION, LONG_VERSION, VERSION, VERSION_DATE, clean_fasta_file, initialized?, #result_files_exist?, root_path, script_path, seqs_length, tabulate

Constructor Details

#initialize(project, name, is_ref = true, metadata = {}) ⇒ Dataset

Create a MiGA::Dataset object in a project MiGA::Project with a uniquely identifying name. is_ref indicates if the dataset is to be treated as reference (true, default) or query (false). Pass any additional metadata as a Hash.



# File 'lib/miga/dataset.rb', line 50

def initialize(project, name, is_ref=true, metadata={})
  raise "Invalid name '#{name}', please use only alphanumerics and " +
    "underscores." unless name.miga_name?
  @project = project
  @name = name
  metadata[:ref] = is_ref
  @metadata = MiGA::Metadata.new(
    File.expand_path("metadata/#{name}.json", project.path), metadata )
end
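
The name check in the constructor can be illustrated with a minimal, self-contained sketch. It assumes the rule is exactly what the error message states (alphanumerics and underscores only). `valid_dataset_name?` and `check_dataset_name!` are hypothetical helpers, not part of MiGA, and the real `String#miga_name?` may differ in detail.

```ruby
# Hypothetical stand-in for the name check. The pattern is an assumption
# derived from the error message, not MiGA's own implementation.
def valid_dataset_name?(name)
  name.match?(/\A[A-Za-z0-9_]+\z/)
end

# Raises with a message of the same shape as the constructor's when the
# name is rejected, and returns the name otherwise.
def check_dataset_name!(name)
  unless valid_dataset_name?(name)
    raise "Invalid name '#{name}', please use only alphanumerics and " \
      "underscores."
  end
  name
end

puts valid_dataset_name?("E_coli_K12")   # true
puts valid_dataset_name?("E. coli K12")  # false: contains a dot and spaces
```

Checking a candidate name this way before calling the constructor avoids rescuing the exception afterwards.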

Instance Attribute Details

#metadata ⇒ Object (readonly)

MiGA::Metadata with information about the dataset.

# File 'lib/miga/dataset.rb', line 43

def metadata
  @metadata
end

#name ⇒ Object (readonly)

Datasets are uniquely identified by name in a project.

# File 'lib/miga/dataset.rb', line 39

def name
  @name
end

#project ⇒ Object (readonly)

MiGA::Project that contains the dataset.

# File 'lib/miga/dataset.rb', line 35

def project
  @project
end

Class Method Details

.exist?(project, name) ⇒ Boolean

Does the project already have a dataset with that name?

Returns:

  • (Boolean)


# File 'lib/miga/dataset.rb', line 19

def exist?(project, name)
  project.dataset_names.include? name
end
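
A typical use of this check is as a guard before creating a dataset. The sketch below is self-contained and does not load MiGA: `StubProject` and `dataset_exist?` are hypothetical stand-ins, and the only assumption is that a project responds to `dataset_names` with an Array of Strings, as the source above suggests.

```ruby
# Hypothetical stand-in for a project: anything that responds to
# `dataset_names` with an Array of Strings.
StubProject = Struct.new(:dataset_names)

# Same membership test as the method above, written against the stub.
def dataset_exist?(project, name)
  project.dataset_names.include?(name)
end

STUB_PROJECT = StubProject.new(%w[genome_a genome_b])

# Guard: report whether each candidate name is already taken.
%w[genome_a genome_c].each do |name|
  status = dataset_exist?(STUB_PROJECT, name) ? "already exists" : "is available"
  puts "#{name} #{status}"
end
```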

.INFO_FIELDS ⇒ Object

Standard fields of metadata for datasets.

# File 'lib/miga/dataset.rb', line 25

def INFO_FIELDS
  %w(name created updated type ref user description comments)
end

.KNOWN_TYPES ⇒ Object

# File 'lib/miga/dataset/base.rb', line 9

def KNOWN_TYPES ; @@KNOWN_TYPES ; end

.PREPROCESSING_TASKS ⇒ Object

# File 'lib/miga/dataset/base.rb', line 10

def PREPROCESSING_TASKS ; @@PREPROCESSING_TASKS ; end

.RESULT_DIRS ⇒ Object

# File 'lib/miga/dataset/base.rb', line 8

def RESULT_DIRS ; @@RESULT_DIRS ; end

Instance Method Details

#closest_relatives(how_many = 1, ref_project = false) ⇒ Object

Returns an Array of up to how_many pairs (two-element Arrays), sorted by AAI in descending order:

  • 0: A String with the name of the reference dataset.

  • 1: A Float with the AAI.

This function is currently supported only for query datasets when ref_project is false (default), and only for reference datasets when ref_project is true. It returns nil if the analysis is not supported, which includes multi-organism datasets and datasets that lack the required result.



# File 'lib/miga/dataset.rb', line 130

def closest_relatives(how_many=1, ref_project=false)
  return nil if (is_ref? != ref_project) or is_multi?
  r = result(ref_project ? :taxonomy : :distances)
  return nil if r.nil?
  db = SQLite3::Database.new(r.file_path :aai_db)
  db.execute("SELECT seq2, aai FROM aai WHERE seq2 != ? " +
    "GROUP BY seq2 ORDER BY aai DESC LIMIT ?", [name, how_many])
end
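
The ranking performed by the SQL query can be approximated in plain Ruby. This is a sketch under stated assumptions: `AAI_ROWS` is a hypothetical in-memory stand-in for the rows of the `aai` table, `top_relatives` is not a MiGA method, and `uniq` keeps the first row per name, whereas `GROUP BY seq2` in SQLite does not guarantee which row is kept.

```ruby
# Hypothetical rows in the form [seq2, aai].
AAI_ROWS = [
  ["ds_a", 71.2],
  ["query1", 100.0],
  ["ds_b", 93.5],
  ["ds_c", 88.0]
]

# Drops the dataset itself, keeps one row per name, orders by AAI
# (highest first), and keeps at most `how_many` pairs.
def top_relatives(rows, own_name, how_many = 1)
  rows.reject { |seq2, _| seq2 == own_name }
      .uniq { |seq2, _| seq2 }
      .sort_by { |_, aai| -aai }
      .first(how_many)
end

p top_relatives(AAI_ROWS, "query1", 2)
```

Each element of the result has the same shape as the pairs described above: the name at index 0 and the AAI at index 1.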

#ignore_task?(task) ⇒ Boolean

Should I ignore task for this dataset?

Returns:

  • (Boolean)


# File 'lib/miga/dataset.rb', line 114

def ignore_task?(task)
  return !metadata["run_#{task}"] unless metadata["run_#{task}"].nil?
  return true if task==:taxonomy and project.metadata[:ref_project].nil?
  pattern = [true, false]
  ( [@@_EXCLUDE_NOREF_TASKS_H[task], is_ref?     ]==pattern or
    [@@_ONLY_MULTI_TASKS_H[task],    is_multi?   ]==pattern or
    [@@_ONLY_NONMULTI_TASKS_H[task], is_nonmulti?]==pattern )
end
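
The `[flag, predicate]==pattern` comparisons are compact, so a standalone sketch may help. A task is skipped when it is restricted to a class of datasets (`true`) and the dataset is not in that class (`false`). The tables, task names, and `ignore_task_sketch?` below are hypothetical, since the real class variables are not shown on this page, and the sketch leaves out the `:taxonomy` rule that depends on the project.

```ruby
# Hypothetical task tables with illustrative names and memberships.
EXCLUDE_NOREF = { ref_only_task: true }
ONLY_MULTI    = { multi_only_task: true }
ONLY_NONMULTI = { single_only_task: true }

# An explicit "run_<task>" metadata entry takes precedence over the tables.
# Otherwise the task is ignored if any pair equals [true, false].
def ignore_task_sketch?(task, metadata, is_ref:, is_multi:, is_nonmulti:)
  override = metadata["run_#{task}"]
  return !override unless override.nil?
  pattern = [true, false]
  [EXCLUDE_NOREF[task], is_ref] == pattern ||
    [ONLY_MULTI[task], is_multi] == pattern ||
    [ONLY_NONMULTI[task], is_nonmulti] == pattern
end

# A single-organism query dataset, expressed as plain flags.
QUERY_FLAGS = { is_ref: false, is_multi: false, is_nonmulti: true }

puts ignore_task_sketch?(:ref_only_task, {}, **QUERY_FLAGS)
puts ignore_task_sketch?(:ref_only_task, { "run_ref_only_task" => true }, **QUERY_FLAGS)
```

In this sketch the first call reports that the task is ignored because the dataset is not a reference, and the second shows the metadata override forcing the task to run.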

#info ⇒ Object

Get standard metadata values for the dataset as Array.

# File 'lib/miga/dataset.rb', line 82

def info
  MiGA::Dataset.INFO_FIELDS.map do |k|
    (k=="name") ? self.name : metadata[k.to_sym]
  end
end
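
To make the shape of the returned Array concrete, here is a minimal sketch that reproduces the mapping without loading MiGA. The field list is copied from INFO_FIELDS above. `SAMPLE_METADATA` and `info_row` are hypothetical, and a plain Hash with Symbol keys stands in for MiGA::Metadata.

```ruby
# Field names as listed for INFO_FIELDS above.
INFO_FIELD_NAMES = %w(name created updated type ref user description comments)

# Hypothetical metadata for illustration only.
SAMPLE_METADATA = { type: :genome, ref: true, user: "alice" }

# The name comes from the dataset itself. Every other field is looked up in
# the metadata, and fields that are not set come back as nil.
def info_row(name, metadata)
  INFO_FIELD_NAMES.map do |k|
    k == "name" ? name : metadata[k.to_sym]
  end
end

p info_row("genome_a", SAMPLE_METADATA)
```

The values are positional, so pairing them with the field names (for example with `zip`) gives a labeled view.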

#is_multi? ⇒ Boolean

Is this dataset known to be multi-organism?

Returns:

  • (Boolean)

# File 'lib/miga/dataset.rb', line 98

def is_multi?
  return false if metadata[:type].nil? or
    @@KNOWN_TYPES[type].nil?
  @@KNOWN_TYPES[type][:multi]
end

#is_nonmulti? ⇒ Boolean

Is this dataset known to be single-organism?

Returns:

  • (Boolean)

# File 'lib/miga/dataset.rb', line 106

def is_nonmulti?
  return false if metadata[:type].nil? or
    @@KNOWN_TYPES[type].nil?
  !@@KNOWN_TYPES[type][:multi]
end
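
Note that #is_multi? and #is_nonmulti? are not strict opposites: both return false when the type is missing or unknown. The sketch below illustrates this with a hypothetical `TYPE_TABLE` that only mimics the shape of `@@KNOWN_TYPES` as used above. Its entries, and the `multi_sketch?` and `nonmulti_sketch?` helpers, are assumptions for illustration.

```ruby
# Hypothetical type table (type => { multi: Boolean }).
TYPE_TABLE = {
  genome:     { multi: false },
  metagenome: { multi: true }
}

def multi_sketch?(type)
  return false if type.nil? || TYPE_TABLE[type].nil?
  TYPE_TABLE[type][:multi]
end

def nonmulti_sketch?(type)
  return false if type.nil? || TYPE_TABLE[type].nil?
  !TYPE_TABLE[type][:multi]
end

# Known types answer exactly one question with true. Unknown or missing
# types answer both with false.
[:genome, :metagenome, :unregistered, nil].each do |t|
  puts "#{t.inspect}: multi=#{multi_sketch?(t)} nonmulti=#{nonmulti_sketch?(t)}"
end
```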

#is_query? ⇒ Boolean

Is this dataset a query (non-reference)?

Returns:

  • (Boolean)

# File 'lib/miga/dataset.rb', line 94

def is_query? ; !metadata[:ref] ; end

#is_ref? ⇒ Boolean

Is this dataset a reference?

Returns:

  • (Boolean)

# File 'lib/miga/dataset.rb', line 90

def is_ref? ; !!metadata[:ref] ; end
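
Both predicates read the same `:ref` metadata entry, so they are always complementary. A minimal sketch, using plain Hashes in place of MiGA::Metadata (`ref_sketch?` and `query_sketch?` are hypothetical names):

```ruby
# Double negation coerces any stored value to true or false.
def ref_sketch?(metadata)
  !!metadata[:ref]
end

# Single negation: a false or missing :ref entry counts as a query.
def query_sketch?(metadata)
  !metadata[:ref]
end

[{ ref: true }, { ref: false }, {}].each do |md|
  puts "#{md.inspect}: ref=#{ref_sketch?(md)} query=#{query_sketch?(md)}"
end
```

A missing `:ref` entry is therefore treated as a query in this sketch, although the constructor shown earlier always sets the entry.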

#remove! ⇒ Object

Deletes the dataset with all its contents (including results) and returns nil.

# File 'lib/miga/dataset.rb', line 75

def remove!
  self.results.each{ |r| r.remove! }
  self.metadata.remove!
end

#save ⇒ Object

Save any changes you’ve made in the dataset.

# File 'lib/miga/dataset.rb', line 62

def save
  self.metadata[:type] = :metagenome if !metadata[:tax].nil? and
    !metadata[:tax][:ns].nil? and metadata[:tax][:ns]=="COMMUNITY"
  self.metadata.save
end
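
Before writing, #save applies one rule: a dataset whose taxonomy namespace is "COMMUNITY" is forced to type :metagenome. The sketch below isolates that rule with nested Hashes standing in for the metadata and its taxonomy entry. `apply_type_rule!` is a hypothetical helper, and "OTHER_NS" is a made-up namespace value used only for contrast.

```ruby
# Applies the type rule in place and returns the metadata. Nothing is
# written to disk in this sketch.
def apply_type_rule!(metadata)
  tax = metadata[:tax]
  if !tax.nil? && !tax[:ns].nil? && tax[:ns] == "COMMUNITY"
    metadata[:type] = :metagenome
  end
  metadata
end

p apply_type_rule!({ type: :genome, tax: { ns: "COMMUNITY" } })
p apply_type_rule!({ type: :genome, tax: { ns: "OTHER_NS" } })
```

Any type set earlier is overwritten when the rule matches, so a manually assigned type does not survive a save for such datasets.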

#type ⇒ Object

Get the type of dataset as Symbol.

# File 'lib/miga/dataset.rb', line 70

def type ; metadata[:type] ; end