Class: GeneValidator::LengthClusterValidation

Inherits:
ValidationTest
Extended by:
Forwardable
Defined in:
lib/genevalidator/validation_length_cluster.rb

Overview

This class contains the methods needed to validate the length of a prediction by clustering the lengths of its BLAST hits.

Instance Attribute Summary

Attributes inherited from ValidationTest

#cli_name, #description, #header, #hits, #prediction, #run_time, #short_header, #type, #validation_report

Instance Method Summary

Constructor Details

#initialize(prediction, hits) ⇒ LengthClusterValidation

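Initializes the object. Params:

type

type of the predicted sequence (:nucleotide or :protein)

prediction

a Sequence object representing the BLAST query

hits

a vector of Sequence objects (representing BLAST hits)

filename

String with the name of the FASTA file

Note that the signature above accepts only prediction and hits; type and filename are documented but are not parameters of this constructor.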



# File 'lib/genevalidator/validation_length_cluster.rb', line 85

def initialize(prediction, hits)
  super
  @short_header = 'LengthCluster'
  @header       = 'Length Cluster'
  @description  = 'Check whether the prediction length fits most of the' \
                  ' BLAST hit lengths, by 1D hierarchical clusterization.' \
                  ' Meaning of the output displayed: Query_length' \
                  ' [Main Cluster Length Interval]'
  @cli_name     = 'lenc'
end

Instance Attribute Details

#clusters ⇒ Object (readonly)

Returns the value of attribute clusters.



# File 'lib/genevalidator/validation_length_cluster.rb', line 75

def clusters
  @clusters
end

#max_density_cluster ⇒ Object (readonly)

Returns the value of attribute max_density_cluster.



# File 'lib/genevalidator/validation_length_cluster.rb', line 76

def max_density_cluster
  @max_density_cluster
end

Instance Method Details

#clusterization_by_length(_debug = false, lst = @hits, predicted_seq = @prediction) ⇒ Object

Clusters a list of sequences by length. Params:

_debug (optional)

true to display debug information, false by default

lst

array of Query objects (defaults to @hits)

predicted_seq

Query object (defaults to @prediction)

Output (returned together as a two-element array)

output 1

array of Cluster objects

output 2

the index of the most dense cluster



# File 'lib/genevalidator/validation_length_cluster.rb', line 147

def clusterization_by_length(_debug = false,
                             lst = @hits,
                             predicted_seq = @prediction)
  raise TypeError unless lst[0].is_a?(Query) && predicted_seq.is_a?(Query)

  contents = lst.map { |x| x.length_protein.to_i }.sort { |a, b| a <=> b }

  hc = HierarchicalClusterization.new(contents)
  clusters = hc.hierarchical_clusterization

  max_density             = 0
  max_density_cluster_idx = 0
  clusters.each_with_index do |item, i|
    next unless item.density > max_density
    max_density             = item.density
    max_density_cluster_idx = i
  end

  [clusters, max_density_cluster_idx]
rescue TypeError => error
  error_location = error.backtrace[0].scan(%r{([^/]+:\d+):.*})[0][0]
  warn "Type error at #{error_location}."
  warn ' Possible cause: one of the arguments of the' \
               ' "clusterization_by_length" method has not the proper type.'
  exit 1
end
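The selection step above (keep the index of the first cluster whose density is strictly the highest) can be tried in isolation with the sketch below. Everything prefixed with Sketch or sketch_ is hypothetical: SketchCluster is a stand-in for the library's Cluster class, its density formula is an assumption, and the gap-based grouping is a deliberately naive replacement for HierarchicalClusterization, not the library's algorithm.

```ruby
# Hypothetical stand-in for GeneValidator's Cluster class: it only stores
# a histogram of lengths (length => number of hits with that length).
SketchCluster = Struct.new(:lengths) do
  # Assumed density measure: number of hits per unit of length spanned.
  def density
    span = lengths.keys.max - lengths.keys.min + 1
    lengths.values.sum.to_f / span
  end
end

# Naive 1D grouping (NOT the hierarchical algorithm used by the library):
# the sorted lengths are split wherever the gap to the previous value
# exceeds max_gap.
def sketch_clusters(hit_lengths, max_gap = 10)
  groups = hit_lengths.sort.slice_when { |a, b| b - a > max_gap }
  groups.map do |group|
    histogram = group.each_with_object(Hash.new(0)) { |len, h| h[len] += 1 }
    SketchCluster.new(histogram)
  end
end

# Same selection loop as in clusterization_by_length: remember the index
# of the first cluster with the strictly highest density.
def densest_cluster_index(clusters)
  max_density = 0
  max_idx     = 0
  clusters.each_with_index do |cluster, i|
    next unless cluster.density > max_density
    max_density = cluster.density
    max_idx     = i
  end
  max_idx
end

clusters = sketch_clusters([40, 100, 100, 101, 101, 102, 400, 405])
idx      = densest_cluster_index(clusters)
```

With these made-up lengths the values fall into three groups, and the group around 100 to 102 has the highest density under the assumed formula, so idx points at the middle cluster.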

#plot_histo_clusters(clusters = @clusters, max_density_cluster = @max_density_cluster, prediction = @prediction) ⇒ Object

Generates a JSON file containing the data used for plotting the histogram of the length distribution, given a list of Cluster objects. Params:

output

plot_path where to save the graph

clusters

array of Cluster objects

max_density_cluster

index of the most dense cluster

prediction

Sequence object

Output

Plot object



# File 'lib/genevalidator/validation_length_cluster.rb', line 183

def plot_histo_clusters(clusters = @clusters,
                        max_density_cluster = @max_density_cluster,
                        prediction = @prediction)

  data = clusters.each_with_index.map do |cluster, i|
    cluster.lengths.collect do |k, v|
      { 'key' => k, 'value' => v, 'main' => (i == max_density_cluster) }
    end
  end

  Plot.new(data,
           :bars,
           'Length Cluster Validation: Distribution of BLAST hit lengths',
           'Query Sequence, black;Most Dense Cluster,red;Other Hits, blue',
           'Sequence Length',
           'Number of Sequences',
           prediction.length_protein)
end
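To make the shape of the data argument concrete, the sketch below runs the same mapping over plain hashes. The histograms and the index are made up for illustration and stand in for Cluster#lengths and @max_density_cluster; the JSON step only shows that the structure serializes cleanly and is not a claim about how Plot stores it.

```ruby
require 'json'

# Hypothetical cluster histograms (length => number of hits), standing in
# for the hashes returned by Cluster#lengths.
cluster_lengths = [
  { 40 => 1 },
  { 100 => 2, 101 => 2, 102 => 1 },
  { 400 => 1, 405 => 1 }
]
max_density_cluster = 1 # index of the most dense cluster (assumed)

# Same mapping as in plot_histo_clusters: one array per cluster, one hash
# per distinct length, flagged with 'main' for the most dense cluster.
data = cluster_lengths.each_with_index.map do |lengths, i|
  lengths.map do |k, v|
    { 'key' => k, 'value' => v, 'main' => (i == max_density_cluster) }
  end
end

json = JSON.generate(data)
```

The result is a nested array (clusters, then bars), so a consumer that wants a single series has to flatten it first.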

#run ⇒ Object

Validates the length of the predicted gene by comparing the length of the prediction to the most dense cluster. The most dense cluster is obtained by hierarchical clusterization. Plots are generated if required (see plot variable).

Output

LengthClusterValidationOutput object



# File 'lib/genevalidator/validation_length_cluster.rb', line 103

def run
  raise NotEnoughHitsError if hits.length < opt[:min_blast_hits]
  raise unless prediction.is_a?(Query) && hits[0].is_a?(Query)

  start = Time.now
  # get [clusters, max_density_cluster_idx]
  clusterization = clusterization_by_length

  @clusters = clusterization[0]
  @max_density_cluster = clusterization[1]
  limits = @clusters[@max_density_cluster].get_limits
  query_length = @prediction.length_protein

  @validation_report = LengthClusterValidationOutput.new(@short_header,
                                                         @header,
                                                         @description,
                                                         query_length,
                                                         limits)
  plot1 = plot_histo_clusters
  @validation_report.plot_files.push(plot1)

  @validation_report.run_time = Time.now - start

  @validation_report
rescue NotEnoughHitsError
  @validation_report = ValidationReport.new('Not enough evidence', :warning,
                                            @short_header, @header,
                                            @description)
rescue StandardError
  @validation_report = ValidationReport.new('Unexpected error', :error,
                                            @short_header, @header,
                                            @description)
  @validation_report.errors.push 'Unexpected Error'
end
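The control flow of run (a specific rescue for too few hits, then a catch-all rescue for anything else) can be sketched without the rest of the library. The error class, the report struct and the method below are hypothetical and exist only for this illustration; they are not GeneValidator's NotEnoughHitsError or ValidationReport.

```ruby
# Hypothetical error class and report type, used only for illustration.
class SketchNotEnoughHitsError < StandardError; end
SketchReport = Struct.new(:message, :status)

def sketch_run(hit_lengths, min_hits)
  raise SketchNotEnoughHitsError if hit_lengths.length < min_hits
  unless hit_lengths.all? { |len| len.is_a?(Integer) }
    raise TypeError, 'lengths must be integers'
  end

  SketchReport.new("#{hit_lengths.min}-#{hit_lengths.max}", :ok)
rescue SketchNotEnoughHitsError
  # The more specific clause must come first, otherwise the catch-all
  # below would swallow it.
  SketchReport.new('Not enough evidence', :warning)
rescue StandardError
  SketchReport.new('Unexpected error', :error)
end

few   = sketch_run([100, 120], 5)
bad   = sketch_run([100, 'x', 120], 2)
valid = sketch_run([100, 120, 110], 2)
```

One detail worth checking in the listing above: a Ruby method returns the value of the last expression evaluated, so in the StandardError branch of run the return value is whatever errors.push returns rather than @validation_report itself. Callers that read the validation_report attribute are unaffected.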