Class: GeneValidator::LengthClusterValidation

Inherits:
ValidationTest
Defined in:
lib/genevalidator/validation_length_cluster.rb

Overview

This class contains the methods necessary for length validation by clustering the lengths of the BLAST hits.

Instance Attribute Summary

Attributes inherited from ValidationTest

#cli_name, #description, #header, #hits, #prediction, #run_time, #short_header, #type, #validation_report

Instance Method Summary

Constructor Details

#initialize(prediction, hits) ⇒ LengthClusterValidation

Initializes the object.

Params:

prediction

a Sequence object representing the BLAST query

hits

an array of Sequence objects (representing BLAST hits)

The source comment also lists type (type of the predicted sequence, :nucleotide or :protein) and filename (String with the name of the FASTA file); neither appears in the signature above.



# File 'lib/genevalidator/validation_length_cluster.rb', line 80

def initialize(prediction, hits)
  super
  @short_header = 'LengthCluster'
  @header       = 'Length Cluster'
  @description  = 'Check whether the prediction length fits most of the' \
                  ' BLAST hit lengths, by 1D hierarchical clusterization.' \
                  ' Meaning of the output displayed: Query_length' \
                  ' [Main Cluster Length Interval]'
  @cli_name     = 'lenc'
end

Instance Attribute Details

#clusters ⇒ Object (readonly)

Returns the value of attribute clusters.



# File 'lib/genevalidator/validation_length_cluster.rb', line 70

def clusters
  @clusters
end

#max_density_cluster ⇒ Object (readonly)

Returns the value of attribute max_density_cluster.



# File 'lib/genevalidator/validation_length_cluster.rb', line 71

def max_density_cluster
  @max_density_cluster
end

Instance Method Details

#clusterization_by_length(_debug = false, lst = @hits, predicted_seq = @prediction) ⇒ Object

Clusters a list of sequences by length. Params:

debug (optional)

true to display debug information, false by default

lst

array of Query objects

predicted_seq

Query object

Output

output 1

array of Cluster objects

output 2

the index of the most dense cluster



# File 'lib/genevalidator/validation_length_cluster.rb', line 143

def clusterization_by_length(_debug = false,
                             lst = @hits,
                             predicted_seq = @prediction)
  fail TypeError unless lst[0].is_a?(Query) && predicted_seq.is_a?(Query)

  contents = lst.map { |x| x.length_protein.to_i }.sort { |a, b| a <=> b }

  hc = HierarchicalClusterization.new(contents)
  clusters = hc.hierarchical_clusterization

  max_density             = 0
  max_density_cluster_idx = 0
  clusters.each_with_index do |item, i|
    next unless item.density > max_density
    max_density             = item.density
    max_density_cluster_idx = i
  end

  [clusters, max_density_cluster_idx]

rescue TypeError => error
  error_location = error.backtrace[0].scan(%r{([^/]+:\d+):.*})[0][0]
  $stderr.puts "Type error at #{error_location}."
  $stderr.puts ' Possible cause: one of the arguments of the' \
               ' "clusterization_by_length" method has not the proper type.'
  exit 1
end
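
The selection of the most dense cluster in the method above can be tried in isolation. The following is a minimal sketch, assuming a stand-in `DensityStub` struct in place of the library's Cluster class (only `density` is modeled) and a hypothetical helper name `most_dense_index`; it is not part of GeneValidator.

```ruby
# Stand-in for the library's Cluster class (assumption): only `density` is modeled.
DensityStub = Struct.new(:density)

# Hypothetical helper mirroring the loop in clusterization_by_length:
# returns the index of the first cluster whose density is strictly the highest.
def most_dense_index(clusters)
  max_density = 0
  max_idx     = 0
  clusters.each_with_index do |cluster, i|
    next unless cluster.density > max_density
    max_density = cluster.density
    max_idx     = i
  end
  max_idx
end

clusters = [DensityStub.new(0.2), DensityStub.new(1.5), DensityStub.new(1.5)]
puts most_dense_index(clusters) # prints 1
```

Because the comparison is strict, a later cluster with an equal density does not replace an earlier one, and index 0 is returned when no cluster has a density above zero.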

#plot_histo_clusters(clusters = @clusters, max_density_cluster = @max_density_cluster, prediction = @prediction) ⇒ Object

Generates the JSON data used for plotting the histogram of the length distribution, given a list of Cluster objects.

Params:

clusters

array of Cluster objects

max_density_cluster

index of the most dense cluster

prediction

Sequence object

Output

Plot object

The source comment also lists output (plot_path, where to save the graph), which is not a parameter in the signature above.



# File 'lib/genevalidator/validation_length_cluster.rb', line 180

def plot_histo_clusters(clusters = @clusters,
                        max_density_cluster = @max_density_cluster,
                        prediction = @prediction)

  data = clusters.each_with_index.map { |cluster, i|
    cluster.lengths.collect { |k, v|
      { 'key' => k, 'value' => v, 'main' => (i == max_density_cluster) }
    }
  }

  Plot.new(data,
           :bars,
           'Length Cluster Validation: Distribution of BLAST hit lengths',
           'Query Sequence, black;Most Dense Cluster,red;Other Hits, blue',
           'Sequence Length',
           'Number of Sequences',
           prediction.length_protein)
end
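
To make the shape of `data` concrete, here is a small sketch that repeats the mapping from the method above on stand-in objects. `LengthsStub` is hypothetical, and it assumes that `lengths` is a Hash from hit length to the number of hits with that length; the page itself does not state this.

```ruby
require 'json'

# Stand-in for the library's Cluster class (assumption): `lengths` maps a
# hit length to the number of hits with that length.
LengthsStub = Struct.new(:lengths)

clusters = [
  LengthsStub.new({ 100 => 2, 105 => 1 }),
  LengthsStub.new({ 300 => 4 })
]
max_density_cluster = 1 # index of the most dense cluster

# One inner array per cluster, one hash per distinct length; 'main' flags
# the entries that belong to the most dense cluster.
data = clusters.each_with_index.map do |cluster, i|
  cluster.lengths.map do |length, count|
    { 'key' => length, 'value' => count, 'main' => (i == max_density_cluster) }
  end
end

puts JSON.pretty_generate(data)
```

The result is a nested array, so a consumer can tell which bars belong to which cluster without a separate lookup.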

#run ⇒ Object

Validates the length of the predicted gene by comparing the length of the prediction to the most dense cluster. The most dense cluster is obtained by hierarchical clusterization. Plots are generated if required (see plot variable).

Output: LengthClusterValidationOutput object



# File 'lib/genevalidator/validation_length_cluster.rb', line 98

def run
  fail NotEnoughHitsError unless hits.length >= 5
  fail unless prediction.is_a?(Query) && hits[0].is_a?(Query)

  start = Time.now
  # get [clusters, max_density_cluster_idx]
  clusterization = clusterization_by_length

  @clusters = clusterization[0]
  @max_density_cluster = clusterization[1]
  limits = @clusters[@max_density_cluster].get_limits
  query_length = @prediction.length_protein

  @validation_report = LengthClusterValidationOutput.new(@short_header,
                                                         @header,
                                                         @description,
                                                         query_length,
                                                         limits)
  plot1 = plot_histo_clusters
  @validation_report.plot_files.push(plot1)

  @validation_report.run_time = Time.now - start

  @validation_report

rescue NotEnoughHitsError
  @validation_report = ValidationReport.new('Not enough evidence', :warning,
                                            @short_header, @header,
                                            @description)
rescue
  @validation_report = ValidationReport.new('Unexpected error', :error,
                                            @short_header, @header,
                                            @description)
  @validation_report.errors.push 'Unexpected Error'
end
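
The control flow of the method above (a guard on the number of hits, then a warning or error report on failure) can be sketched without the library. Everything named here is hypothetical: `NotEnoughHitsStub`, `length_check`, and the `:pass`/`:fail` statuses are illustrations, and the rule that a query passes when its length falls inside the main cluster interval is an assumption, since the actual comparison lives in LengthClusterValidationOutput and is not shown on this page.

```ruby
# Hypothetical stand-in for the library's NotEnoughHitsError.
class NotEnoughHitsStub < StandardError; end

# Hypothetical helper: `limits` is the [lower, upper] interval of the main
# cluster, taken as given. Returns a small hash instead of a report object.
def length_check(query_length, hit_lengths, limits)
  raise NotEnoughHitsStub unless hit_lengths.length >= 5
  raise TypeError, 'query_length must be an Integer' unless query_length.is_a?(Integer)

  # Assumed rule: the prediction passes when its length lies in the interval.
  within = query_length.between?(limits[0], limits[1])
  { status: within ? :pass : :fail,
    message: "#{query_length} [#{limits[0]} - #{limits[1]}]" }
rescue NotEnoughHitsStub
  { status: :warning, message: 'Not enough evidence' }
rescue StandardError
  { status: :error, message: 'Unexpected error' }
end

p length_check(120, [100, 110, 115, 125, 130], [95, 130])
p length_check(120, [100], [95, 130])
```

The specific rescue clause comes first, as in the original, so that too few hits yields a warning and only other failures are reported as errors.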