Class: GeneValidator::LengthClusterValidation
- Inherits: ValidationTest
  - Object
    - ValidationTest
      - GeneValidator::LengthClusterValidation
- Extended by: Forwardable
- Defined in: lib/genevalidator/validation_length_cluster.rb
Overview
This class contains the methods needed to validate the length of a predicted gene by clusterizing the lengths of its BLAST hits.
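The HierarchicalClusterization class that performs the clusterization is not shown on this page. As a rough illustration only, the sketch below groups sorted hit lengths with single-linkage agglomerative clustering. The function name `cluster_lengths` and the `max_gap` stopping threshold are hypothetical; the library may use a different linkage or stopping rule.

```ruby
# Minimal sketch of 1D agglomerative (single-linkage) clustering of hit
# lengths. ASSUMPTION: merging stops once the closest pair of neighbouring
# clusters is more than max_gap apart; the real HierarchicalClusterization
# class may use a different linkage or stopping rule.
def cluster_lengths(lengths, max_gap)
  # Start with one singleton cluster per length, in sorted order.
  clusters = lengths.sort.map { |len| [len] }

  loop do
    break if clusters.size < 2

    # In 1D, the single-linkage distance between two adjacent clusters is
    # the gap between the last element of one and the first of the next.
    gaps = clusters.each_cons(2).map { |a, b| b.first - a.last }
    min_gap = gaps.min
    break if min_gap > max_gap

    # Merge the closest adjacent pair and repeat.
    i = gaps.index(min_gap)
    clusters[i] = clusters[i] + clusters[i + 1]
    clusters.delete_at(i + 1)
  end

  clusters
end

hit_lengths = [300, 305, 310, 312, 150, 152, 600]
p cluster_lengths(hit_lengths, 20)
```

With a gap threshold of 20, these lengths end up in three groups: [150, 152], [300, 305, 310, 312] and [600]; the middle group is the one a density criterion would most likely favour.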
Instance Attribute Summary
- #clusters ⇒ Object (readonly)
  Returns the value of attribute clusters.
- #max_density_cluster ⇒ Object (readonly)
  Returns the value of attribute max_density_cluster.
Attributes inherited from ValidationTest
#cli_name, #description, #header, #hits, #prediction, #run_time, #short_header, #type, #validation_report
Instance Method Summary
- #clusterization_by_length(_debug = false, lst = @hits, predicted_seq = @prediction) ⇒ Object
  Clusterizes a list of sequences by length.
  Params:
  - _debug: (optional) true to display debug information, false by default
  - lst: array of Query objects
  - predicted_seq: Query object
  Output:
  - output 1: array of Cluster objects
  - output 2: the index of the most dense cluster
- #initialize(prediction, hits) ⇒ LengthClusterValidation (constructor)
  Initializes the object.
  Params:
  - prediction: a Sequence object representing the BLAST query
  - hits: a vector of Sequence objects (representing BLAST hits)
- #plot_histo_clusters(clusters = @clusters, max_density_cluster = @max_density_cluster, prediction = @prediction) ⇒ Object
  Builds the data used to plot the histogram of the length distribution, given a list of Cluster objects.
  Params:
  - clusters: array of Cluster objects
  - max_density_cluster: index of the most dense cluster
  - prediction: Sequence object
  Output: Plot object.
- #run ⇒ Object
  Validates the length of the predicted gene by comparing it to the most dense cluster of BLAST hit lengths. The most dense cluster is obtained by hierarchical clusterization. A histogram of the hit lengths is generated with plot_histo_clusters and added to the report's plot files.
  Output: LengthClusterValidationOutput object.
Constructor Details
#initialize(prediction, hits) ⇒ LengthClusterValidation
Initializes the object.
Params:
- prediction: a Sequence object representing the BLAST query
- hits: a vector of Sequence objects (representing BLAST hits)
```ruby
# File 'lib/genevalidator/validation_length_cluster.rb', line 85
def initialize(prediction, hits)
  super
  @short_header = 'LengthCluster'
  @header       = 'Length Cluster'
  @description  = 'Check whether the prediction length fits most of the' \
                  ' BLAST hit lengths, by 1D hierarchical clusterization.' \
                  ' Meaning of the output displayed: Query_length' \
                  ' [Main Cluster Length Interval]'
  @cli_name     = 'lenc'
end
```
Instance Attribute Details
#clusters ⇒ Object (readonly)
Returns the value of attribute clusters.
```ruby
# File 'lib/genevalidator/validation_length_cluster.rb', line 75
def clusters
  @clusters
end
```
#max_density_cluster ⇒ Object (readonly)
Returns the value of attribute max_density_cluster.
```ruby
# File 'lib/genevalidator/validation_length_cluster.rb', line 76
def max_density_cluster
  @max_density_cluster
end
```
Instance Method Details
#clusterization_by_length(_debug = false, lst = @hits, predicted_seq = @prediction) ⇒ Object
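Clusterizes a list of sequences by length.
Params:
- _debug: (optional) true to display debug information, false by default; the method body does not currently use it
- lst: array of Query objects
- predicted_seq: Query object
Output:
- output 1: array of Cluster objects
- output 2: the index of the most dense cluster
If lst[0] or predicted_seq is not a Query, the method prints a type warning and exits the process with status 1.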
```ruby
# File 'lib/genevalidator/validation_length_cluster.rb', line 147
def clusterization_by_length(_debug = false,
                             lst = @hits,
                             predicted_seq = @prediction)
  raise TypeError unless lst[0].is_a?(Query) && predicted_seq.is_a?(Query)

  contents = lst.map { |x| x.length_protein.to_i }.sort { |a, b| a <=> b }

  hc = HierarchicalClusterization.new(contents)
  clusters = hc.hierarchical_clusterization

  max_density = 0
  max_density_cluster_idx = 0
  clusters.each_with_index do |item, i|
    next unless item.density > max_density
    max_density = item.density
    max_density_cluster_idx = i
  end

  [clusters, max_density_cluster_idx]
rescue TypeError => error
  error_location = error.backtrace[0].scan(%r{([^/]+:\d+):.*})[0][0]
  warn "Type error at #{error_location}."
  warn ' Possible cause: one of the arguments of the' \
       ' "clusterization_by_length" method has not the proper type.'
  exit 1
end
```
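The cluster-selection loop in clusterization_by_length can be exercised on its own. In this sketch `DensityClusterStub` is a hypothetical stand-in for the library's Cluster class, and densities are supplied directly because this page does not show how Cluster#density is computed.

```ruby
# Hypothetical stand-in for the library's Cluster class: only the
# `density` reader used by clusterization_by_length is modelled.
DensityClusterStub = Struct.new(:lengths, :density)

# Mirrors the selection loop in clusterization_by_length. The strict `>`
# comparison means ties keep the earliest cluster, and index 0 is returned
# when no cluster has a positive density (including an empty list).
def max_density_index(clusters)
  max_density = 0
  max_density_cluster_idx = 0
  clusters.each_with_index do |item, i|
    next unless item.density > max_density

    max_density = item.density
    max_density_cluster_idx = i
  end
  max_density_cluster_idx
end

clusters = [
  DensityClusterStub.new({ 150 => 1, 152 => 1 }, 1.0),
  DensityClusterStub.new({ 300 => 2, 305 => 1, 310 => 1 }, 4.0),
  DensityClusterStub.new({ 600 => 1 }, 1.0)
]
puts max_density_index(clusters) # prints 1
```

Because the initial maximum is 0, a list in which every cluster has zero density silently selects cluster 0 rather than signalling that no cluster qualified.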
#plot_histo_clusters(clusters = @clusters, max_density_cluster = @max_density_cluster, prediction = @prediction) ⇒ Object
Builds the data used to plot the histogram of the length distribution, given a list of Cluster objects.
Params:
- clusters: array of Cluster objects
- max_density_cluster: index of the most dense cluster
- prediction: Sequence object
Output: Plot object
```ruby
# File 'lib/genevalidator/validation_length_cluster.rb', line 183
def plot_histo_clusters(clusters = @clusters,
                        max_density_cluster = @max_density_cluster,
                        prediction = @prediction)
  data = clusters.each_with_index.map do |cluster, i|
    cluster.lengths.collect do |k, v|
      { 'key' => k, 'value' => v, 'main' => (i == max_density_cluster) }
    end
  end

  Plot.new(data,
           :bars,
           'Length Cluster Validation: Distribution of BLAST hit lengths',
           'Query Sequence, black;Most Dense Cluster,red;Other Hits, blue',
           'Sequence Length',
           'Number of Sequences',
           prediction.length_protein)
end
```
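The `data` argument that plot_histo_clusters passes to Plot.new is a nested array of bar records. This sketch reproduces that transformation without the Plot class, assuming (hypothetically) that Cluster#lengths is a Hash mapping each length to the number of hits with that length; `LengthsClusterStub` and `histogram_data` are illustrative names, not library API.

```ruby
# Hypothetical stand-in for Cluster: lengths is { length => number of hits }.
LengthsClusterStub = Struct.new(:lengths)

# Same transformation as plot_histo_clusters: one array of bar records per
# cluster, with 'main' set to true only for bars of the most dense cluster.
def histogram_data(clusters, max_density_cluster)
  clusters.each_with_index.map do |cluster, i|
    cluster.lengths.collect do |k, v|
      { 'key' => k, 'value' => v, 'main' => (i == max_density_cluster) }
    end
  end
end

clusters = [
  LengthsClusterStub.new({ 150 => 1 }),
  LengthsClusterStub.new({ 300 => 2, 305 => 1 })
]
p histogram_data(clusters, 1)
```

Keeping one inner array per cluster preserves the grouping, while the 'main' flag is what lets the plot colour the most dense cluster differently from the other hits.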
#run ⇒ Object
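Validates the length of the predicted gene by comparing it to the most dense cluster of BLAST hit lengths. The most dense cluster is obtained by hierarchical clusterization. A histogram of the hit lengths is generated with plot_histo_clusters and added to the report's plot files.
Output: LengthClusterValidationOutput object. If there are fewer hits than opt[:min_blast_hits], a 'Not enough evidence' warning report is returned instead; any other error produces an 'Unexpected error' report.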
```ruby
# File 'lib/genevalidator/validation_length_cluster.rb', line 103
def run
  raise NotEnoughHitsError if hits.length < opt[:min_blast_hits]
  raise unless prediction.is_a?(Query) && hits[0].is_a?(Query)

  start = Time.now
  # get [clusters, max_density_cluster_idx]
  clusterization = clusterization_by_length
  @clusters = clusterization[0]
  @max_density_cluster = clusterization[1]
  limits = @clusters[@max_density_cluster].get_limits
  query_length = @prediction.length_protein

  @validation_report = LengthClusterValidationOutput.new(@short_header,
                                                          @header,
                                                          @description,
                                                          query_length,
                                                          limits)
  plot1 = plot_histo_clusters
  @validation_report.plot_files.push(plot1)

  @validation_report.run_time = Time.now - start
  @validation_report
rescue NotEnoughHitsError
  @validation_report = ValidationReport.new('Not enough evidence', :warning,
                                            @short_header, @header,
                                            @description)
rescue StandardError
  @validation_report = ValidationReport.new('Unexpected error', :error,
                                            @short_header, @header,
                                            @description)
  @validation_report.errors.push 'Unexpected Error'
end
```
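This page does not show how LengthClusterValidationOutput decides whether the query length fits. One plausible reading of the description ("Query_length [Main Cluster Length Interval]") is a closed-interval check. The helper below is hypothetical and assumes that get_limits returns a [min, max] pair; the real output class may apply a different rule.

```ruby
# Hypothetical check: ASSUMES `limits` is the [min, max] pair returned by
# Cluster#get_limits and that a prediction "fits" when its length lies
# inside that closed interval. Not the library's actual implementation.
def length_fits_cluster?(query_length, limits)
  min, max = limits
  query_length.between?(min, max)
end

puts length_fits_cluster?(305, [300, 312]) # prints true
puts length_fits_cluster?(150, [300, 312]) # prints false
```

A closed interval treats predictions whose length equals a cluster boundary as fitting; a stricter validator might instead require some margin inside the limits.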