Class: GeneValidator::LengthClusterValidation
- Inherits: ValidationTest
  - Object
  - ValidationTest
  - GeneValidator::LengthClusterValidation
- Defined in: lib/genevalidator/validation_length_cluster.rb
Overview
This class contains the methods needed to validate the length of a prediction by clustering the lengths of its BLAST hits.
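To make the idea of clustering hit lengths concrete, here is a minimal sketch of one simple 1D approach: sort the lengths and start a new cluster wherever the gap between neighbours exceeds a threshold (equivalent to single-linkage clustering cut at that distance). This is an illustration only and is not necessarily what HierarchicalClusterization does; the method name `cluster_lengths`, the `max_gap` threshold, and the use of cluster size as a stand-in for density are all assumptions.

```ruby
# Illustrative only: not GeneValidator's HierarchicalClusterization.
# Groups sorted lengths into clusters, starting a new cluster whenever the
# gap to the previous length is larger than max_gap (hypothetical parameter).
def cluster_lengths(lengths, max_gap)
  lengths.sort.slice_when { |a, b| b - a > max_gap }.to_a
end

hit_lengths = [300, 120, 310, 305, 298, 640, 125]
clusters = cluster_lengths(hit_lengths, 20)

# Assumption: cluster size stands in for the library's density measure.
densest = clusters.max_by(&:size)

p clusters # => [[120, 125], [298, 300, 305, 310], [640]]
p densest  # => [298, 300, 305, 310]
```

A prediction of length 302 would sit inside the densest group here, while one of length 640 would fall into an isolated cluster.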
Instance Attribute Summary
-
#clusters ⇒ Object
readonly
Returns the value of attribute clusters.
-
#max_density_cluster ⇒ Object
readonly
Returns the value of attribute max_density_cluster.
Attributes inherited from ValidationTest
#cli_name, #description, #header, #hits, #prediction, #run_time, #short_header, #type, #validation_report
Instance Method Summary
-
#clusterization_by_length(_debug = false, lst = @hits, predicted_seq = @prediction) ⇒ Object
Clusters a list of sequences by length.
Params:
_debug (optional): true to display debug information, false by default
lst: array of Query objects
predicted_seq: Query object
Output:
output 1: array of Cluster objects
output 2: the index of the most dense cluster
-
#initialize(prediction, hits) ⇒ LengthClusterValidation
constructor
Initializes the object.
Params:
type: type of the predicted sequence (:nucleotide or :protein)
prediction: a Sequence object representing the BLAST query
hits: a vector of Sequence objects (representing BLAST hits)
filename: String with the name of the FASTA file
-
#plot_histo_clusters(clusters = @clusters, max_density_cluster = @max_density_cluster, prediction = @prediction) ⇒ Object
Generates a JSON file containing the data used for plotting the histogram of the length distribution, given a list of Cluster objects.
Params:
output: plot_path where to save the graph
clusters: array of Cluster objects
max_density_cluster: index of the most dense cluster
prediction: Sequence object
Output: Plot object
-
#run ⇒ Object
Validates the length of the predicted gene by comparing the length of the prediction to the most dense cluster. The most dense cluster is obtained by hierarchical clusterization. Plots are generated if required (see the plot variable).
Output: LengthClusterValidationOutput object
Constructor Details
#initialize(prediction, hits) ⇒ LengthClusterValidation
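Initializes the object.
Params:
type: type of the predicted sequence (:nucleotide or :protein)
prediction: a Sequence object representing the BLAST query
hits: a vector of Sequence objects (representing BLAST hits)
filename: String with the name of the FASTA file
Note: the signature takes only prediction and hits; type and filename are listed in the source comment but are not parameters of this method. #run additionally requires prediction and hits[0] to be Query instances.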
# File 'lib/genevalidator/validation_length_cluster.rb', line 80
def initialize(prediction, hits)
  super
  @short_header = 'LengthCluster'
  @header       = 'Length Cluster'
  @description  = 'Check whether the prediction length fits most of the' \
                  ' BLAST hit lengths, by 1D hierarchical clusterization.' \
                  ' Meaning of the output displayed: Query_length' \
                  ' [Main Cluster Length Interval]'
  @cli_name     = 'lenc'
end
Instance Attribute Details
#clusters ⇒ Object (readonly)
Returns the value of attribute clusters.
# File 'lib/genevalidator/validation_length_cluster.rb', line 70
def clusters
  @clusters
end
#max_density_cluster ⇒ Object (readonly)
Returns the value of attribute max_density_cluster.
# File 'lib/genevalidator/validation_length_cluster.rb', line 71
def max_density_cluster
  @max_density_cluster
end
Instance Method Details
#clusterization_by_length(_debug = false, lst = @hits, predicted_seq = @prediction) ⇒ Object
Clusters a list of sequences by length.
Params:
_debug (optional): true to display debug information, false by default
lst: array of Query objects
predicted_seq: Query object
Output:
output 1: array of Cluster objects
output 2: the index of the most dense cluster
# File 'lib/genevalidator/validation_length_cluster.rb', line 143
def clusterization_by_length(_debug = false, lst = @hits,
                             predicted_seq = @prediction)
  fail TypeError unless lst[0].is_a?(Query) && predicted_seq.is_a?(Query)

  contents = lst.map { |x| x.length_protein.to_i }.sort { |a, b| a <=> b }

  hc = HierarchicalClusterization.new(contents)
  clusters = hc.hierarchical_clusterization

  max_density = 0
  max_density_cluster_idx = 0
  clusters.each_with_index do |item, i|
    next unless item.density > max_density
    max_density = item.density
    max_density_cluster_idx = i
  end

  [clusters, max_density_cluster_idx]
rescue TypeError => error
  error_location = error.backtrace[0].scan(%r{([^/]+:\d+):.*})[0][0]
  $stderr.puts "Type error at #{error_location}."
  $stderr.puts ' Possible cause: one of the arguments of the' \
               ' "clusterization_by_length" method has not the proper type.'
  exit 1
end
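The selection loop in the listing above can be tried in isolation. The following sketch reproduces only that step, using a hypothetical `DensityStub` struct in place of the real Cluster class (only a `density` reader is assumed); the helper name `most_dense_cluster_index` is also made up for illustration.

```ruby
# Hypothetical stand-in for GeneValidator's Cluster class: only the
# `density` reader used by the selection loop is modelled here.
DensityStub = Struct.new(:name, :density)

# Returns the index of the cluster with the highest density.
# The strict `>` comparison means the first of several tied clusters wins,
# and an empty list yields index 0.
def most_dense_cluster_index(clusters)
  max_density = 0
  max_idx = 0
  clusters.each_with_index do |cluster, i|
    next unless cluster.density > max_density
    max_density = cluster.density
    max_idx = i
  end
  max_idx
end

stub_clusters = [
  DensityStub.new('short', 0.4),
  DensityStub.new('main', 2.5),
  DensityStub.new('long', 2.5)
]

puts most_dense_cluster_index(stub_clusters) # prints 1
```

Because the comparison is strict, a later cluster with the same density never replaces an earlier one, which is why index 1 rather than 2 is printed.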
#plot_histo_clusters(clusters = @clusters, max_density_cluster = @max_density_cluster, prediction = @prediction) ⇒ Object
Generates a JSON file containing the data used for plotting the histogram of the length distribution, given a list of Cluster objects.
Params:
output: plot_path where to save the graph
clusters: array of Cluster objects
max_density_cluster: index of the most dense cluster
prediction: Sequence object
Output: Plot object
# File 'lib/genevalidator/validation_length_cluster.rb', line 180
def plot_histo_clusters(clusters = @clusters,
                        max_density_cluster = @max_density_cluster,
                        prediction = @prediction)
  data = clusters.each_with_index.map { |cluster, i|
    cluster.lengths.collect { |k, v|
      { 'key' => k, 'value' => v, 'main' => (i == max_density_cluster) }
    }
  }

  Plot.new(data,
           :bars,
           'Length Cluster Validation: Distribution of BLAST hit lengths',
           'Query Sequence, black;Most Dense Cluster,red;Other Hits, blue',
           'Sequence Length',
           'Number of Sequences',
           prediction.length_protein)
end
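To see the shape of the `data` structure built in the listing above, the sketch below runs the same mapping on toy input. `HistoStub` is a hypothetical stand-in for Cluster, and it assumes that `lengths` behaves like a Hash from sequence length to number of hits, which is inferred from the 'key'/'value' pairs rather than confirmed by this page. The helper name `histogram_data` is likewise illustrative.

```ruby
require 'json'

# Hypothetical stand-in for Cluster. Assumption: `lengths` maps a sequence
# length to the number of hits that have that length.
HistoStub = Struct.new(:lengths)

# Builds one array of bar descriptions per cluster; bars that belong to the
# most dense cluster are flagged with 'main' => true.
def histogram_data(clusters, max_density_cluster)
  clusters.each_with_index.map do |cluster, i|
    cluster.lengths.map do |length, count|
      { 'key' => length, 'value' => count, 'main' => (i == max_density_cluster) }
    end
  end
end

histo_clusters = [
  HistoStub.new({ 120 => 1 }),
  HistoStub.new({ 300 => 4, 310 => 2 })
]

histo_data = histogram_data(histo_clusters, 1)
puts JSON.generate(histo_data)
```

The result is a nested array (one inner array per cluster), so a plotting front end can colour the main cluster differently from the other hits.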
#run ⇒ Object
Validates the length of the predicted gene by comparing the length of the prediction to the most dense cluster. The most dense cluster is obtained by hierarchical clusterization. Plots are generated if required (see the plot variable).
Output: LengthClusterValidationOutput object
# File 'lib/genevalidator/validation_length_cluster.rb', line 98
def run
  fail NotEnoughHitsError unless hits.length >= 5
  fail unless prediction.is_a?(Query) && hits[0].is_a?(Query)

  start = Time.now
  # get [clusters, max_density_cluster_idx]
  clusterization = clusterization_by_length

  @clusters = clusterization[0]
  @max_density_cluster = clusterization[1]
  limits = @clusters[@max_density_cluster].get_limits
  query_length = @prediction.length_protein

  @validation_report = LengthClusterValidationOutput.new(@short_header,
                                                         @header,
                                                         @description,
                                                         query_length,
                                                         limits)
  plot1 = plot_histo_clusters
  @validation_report.plot_files.push(plot1)

  @validation_report.run_time = Time.now - start

  @validation_report
rescue NotEnoughHitsError
  @validation_report = ValidationReport.new('Not enough evidence', :warning,
                                            @short_header, @header,
                                            @description)
rescue
  @validation_report = ValidationReport.new('Unexpected error', :error,
                                            @short_header, @header,
                                            @description)
  @validation_report.errors.push 'Unexpected Error'
end
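The listing above hands `query_length` and `limits` to LengthClusterValidationOutput, whose decision logic is not shown on this page. As a rough sketch of what such a comparison might look like, the code below assumes that `limits` is a `[lower, upper]` pair and that the check is inclusive; both are assumptions, as are the names `TooFewHitsError`, `MIN_HITS`, and `length_within_cluster?`. Only the minimum of 5 hits is taken from the guard in #run.

```ruby
# Local stand-in for NotEnoughHitsError; the real class is defined elsewhere
# in GeneValidator.
class TooFewHitsError < StandardError; end

# Mirrors the `hits.length >= 5` guard in #run.
MIN_HITS = 5

# Assumption: `limits` is a [lower, upper] pair of lengths and the check is
# inclusive. The real decision lives in LengthClusterValidationOutput.
def length_within_cluster?(query_length, limits, hit_count)
  raise TooFewHitsError, 'Not enough evidence' if hit_count < MIN_HITS

  lower, upper = limits
  query_length.between?(lower, upper)
end

puts length_within_cluster?(305, [290, 320], 12) # prints true
puts length_within_cluster?(150, [290, 320], 12) # prints false

begin
  length_within_cluster?(305, [290, 320], 3)
rescue TooFewHitsError => e
  puts e.message # prints Not enough evidence
end
```

Unlike this sketch, #run does not let the error propagate: it rescues it and returns a ValidationReport with a :warning status.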