Class: Bio::DB::Vcf

Inherits:
Object
Defined in:
lib/bio/util/bio-gngm.rb

Overview

Extends the Bio::DB::Vcf class from bio-samtools with additional methods. A Vcf object represents one line of the VCF format described at www.1000genomes.org/node/101. The underlying Bio::DB::Vcf object returns all the information in the VCF line, but the implementation here assumes at most one variant at each position and ignores positions at which there may be multiple variants. The Vcf format is used only when the Bio::Util::Gngm object requests information about indels through the SAMtools mpileup method.

Instance Method Details

#alternatives ⇒ Object

List of alternate alleles at this locus, obtained by splitting the vcf.alt attribute string on commas. Returns an empty array if the split raises an error.

Example

  vcf.alt = "ACT,TCA"
  vcf.alternatives #=> ["ACT", "TCA"]
  vcf.alt = "T"
  vcf.alternatives #=> ["T"]



# File 'lib/bio/util/bio-gngm.rb', line 123

def alternatives
  self.alt.split(",") rescue []
end

#gq ⇒ Object

Returns the genotype quality score from the sample data (as defined by the Vcf GQ attribute) for the first sample in the Vcf only. Returns 0.0 if the value is unavailable.



# File 'lib/bio/util/bio-gngm.rb', line 146

def gq 
  self.samples["1"]["GQ"].to_f rescue 0.0
end

#is_indel?(options) ⇒ Boolean

Returns true if the position is a variant, at least one entry in the alt column differs in length from the ref column, and the position passes pass_quality?(options).

Returns:

  • (Boolean)


# File 'lib/bio/util/bio-gngm.rb', line 186

def is_indel?(options)
  return true if self.variant? and self.alternatives.any? {|x| x.length != self.ref.length} and self.pass_quality?(options)
  false
end

#is_mnp?(options) ⇒ Boolean

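Returns true if every entry in the alt column has the same length as the ref column and the position passes pass_quality?(options). Unlike is_indel?, the code shown here does not call variant? first.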

Returns:

  • (Boolean)


# File 'lib/bio/util/bio-gngm.rb', line 174

def is_mnp?(options)
  return true if self.alternatives.all? {|x| x.length == self.ref.length} and self.pass_quality?(options)
  false
end

#is_snp?(options) ⇒ Boolean

Returns true if the ref column has a length of 1 and is_mnp?(options) is true.

Returns:

  • (Boolean)


# File 'lib/bio/util/bio-gngm.rb', line 180

def is_snp?(options)
  return true if self.is_mnp?(options) and self.ref.length == 1
  false
end
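
The three predicates above differ only in how they compare allele lengths. As a rough, self-contained sketch of that length logic, the hypothetical classify helper below (not part of bio-gngm) works on plain REF and ALT strings and leaves out the pass_quality? filter. Unlike is_mnp?, which is also true for SNPs, it returns mutually exclusive labels.

```ruby
# Hypothetical helper, not part of bio-gngm: labels a REF/ALT pair using the
# same length comparisons as is_indel?, is_mnp? and is_snp?.
# The quality filter (pass_quality?) is deliberately left out.
def classify(ref, alt)
  return :not_variant if alt == "."
  alternatives = alt.split(",")
  # Any alternate allele of a different length makes the position an indel.
  return :indel if alternatives.any? { |a| a.length != ref.length }
  # All alleles have the same length: length 1 is a SNP, longer is an MNP.
  ref.length == 1 ? :snp : :mnp
end

classify("G", "A")      # => :snp
classify("GT", "AC")    # => :mnp
classify("G", "GTT,A")  # => :indel
classify("T", ".")      # => :not_variant
```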

#mq ⇒ Object

Returns the mean Mapping Quality from the reads over this position as defined by the Vcf MQ attribute.



# File 'lib/bio/util/bio-gngm.rb', line 141

def mq
  self.info["MQ"].to_f rescue 0.0
end

#non_ref_allele_count ⇒ Object

Returns the depth of reads containing the non-reference allele, i.e. the sum of the last two figures in the DP4 attribute. Returns 0.0 if DP4 is missing or cannot be parsed.



# File 'lib/bio/util/bio-gngm.rb', line 128

def non_ref_allele_count
  self.info["DP4"].split(",")[2..3].inject {|sum,n| sum.to_f + n.to_f } rescue 0.0
end

#non_ref_allele_freq ⇒ Object

Returns the non-reference allele frequency based on the depth of reads used for the genotype call, i.e. vcf.non_ref_allele_count / vcf.used_depth. When both values are 0.0 the division yields NaN rather than 0.0.



# File 'lib/bio/util/bio-gngm.rb', line 136

def non_ref_allele_freq
  self.non_ref_allele_count / self.used_depth
end
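
The arithmetic behind non_ref_allele_count, used_depth and non_ref_allele_freq can be worked through on a plain hash. The sketch below is only an illustration: the info hash and its DP4 value are made up, and the zero-depth guard is an addition that the method above does not have.

```ruby
# Stand-in for the INFO hash of one VCF record; the DP4 value is made up.
# DP4 holds four counts: ref-forward, ref-reverse, alt-forward, alt-reverse.
info = { "DP4" => "3,2,10,5" }

counts        = info["DP4"].split(",").map(&:to_f)  # [3.0, 2.0, 10.0, 5.0]
used_depth    = counts.sum                          # 20.0
non_ref_count = counts[2..3].sum                    # 15.0

# Guard against 0.0 / 0.0, which is NaN in Ruby (an assumption about the
# behaviour a caller might want; the documented method has no such guard).
freq = used_depth.zero? ? 0.0 : non_ref_count / used_depth
puts freq  # prints 0.75
```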

#pass_quality?(options) ⇒ Boolean

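Returns true if the position passes all four quality criteria: used_depth, mq, non_ref_allele_count and qual must each be greater than or equal to the corresponding option.

Options and Defaults:

  • :min_depth => 2

  • :min_non_ref_count => 2

  • :mapping_quality => 10

  • :min_snp_quality (compared against qual; no default is listed here)

The method body shown below reads each option directly and does not fill in defaults itself, so a missing key is compared as nil. Pass all four keys in the hash.

Example

  vcf.pass_quality?(:min_depth => 5, :min_non_ref_count => 2, :mapping_quality => 25, :min_snp_quality => 20)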

Returns:

  • (Boolean)


# File 'lib/bio/util/bio-gngm.rb', line 169

def pass_quality?(options)
  (self.used_depth >= options[:min_depth] and self.mq >= options[:mapping_quality] and self.non_ref_allele_count >= options[:min_non_ref_count] and self.qual >= options[:min_snp_quality])
end
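
Because pass_quality? reads the options hash directly, one way to make sure every key is present is to merge the caller's overrides into a hash of defaults before the call. The sketch below assumes two hypothetical helpers, default_quality_options and quality_options, that are not part of bio-gngm. The value given for :min_snp_quality is also an assumption, since no default is documented for it.

```ruby
# Hypothetical helper, not part of bio-gngm. The first three values are the
# documented defaults; :min_snp_quality => 20 is an assumed placeholder.
def default_quality_options
  {
    :min_depth         => 2,
    :min_non_ref_count => 2,
    :mapping_quality   => 10,
    :min_snp_quality   => 20
  }
end

# Hypothetical helper: fills in any keys the caller left out.
def quality_options(overrides = {})
  default_quality_options.merge(overrides)
end

opts = quality_options(:min_depth => 5)
# opts now holds all four keys and could be passed on, e.g.
# vcf.pass_quality?(opts)
```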

#pl ⇒ Object

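Returns the Phred-scaled likelihood of the first non-reference allele (as defined by the Vcf PL attribute) for the first sample in the Vcf only. The value is taken from the second comma-separated entry of the PL string; 0.0 is returned if it is unavailable.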



# File 'lib/bio/util/bio-gngm.rb', line 151

def pl
  self.samples["1"]["PL"].split(",")[1].to_f rescue 0.0
end
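
Both gq and pl look up the sample keyed "1" and fall back to 0.0 when anything in the chain fails. The sketch below reproduces that lookup on a plain nested hash. The samples hash and its values are made up for illustration and only mimic the shape implied by the code above.

```ruby
# Stand-in for vcf.samples: sample id => { FORMAT key => value string }.
# All values here are made up.
samples = { "1" => { "GT" => "1/1", "GQ" => "99", "PL" => "60,5,0" } }

gq = samples["1"]["GQ"].to_f                # 99.0
pl = samples["1"]["PL"].split(",")[1].to_f  # 5.0, the second PL entry

# A missing sample makes the chain raise NoMethodError on nil; the rescue
# modifier turns that into 0.0, so a real 0.0 and "missing" look identical.
missing = (samples["2"]["PL"].split(",")[1].to_f rescue 0.0)  # 0.0
```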

#to_s ⇒ Object

Returns a short, space-separated string giving the chromosome, position, reference sequence, alt sequence(s) and info string of the Vcf object.



# File 'lib/bio/util/bio-gngm.rb', line 107

def to_s
  "#{self.chrom} #{self.pos} #{self.ref} #{self.alt} #{self.info}"
end

#used_depth ⇒ Object

The depth of reads actually used in the genotype call by Vcftools, calculated as the sum of the values in the DP4 attribute. Returns 0.0 if DP4 is missing or cannot be parsed.



# File 'lib/bio/util/bio-gngm.rb', line 112

def used_depth
  self.info["DP4"].split(",").inject {|sum,n| sum.to_f + n.to_f} rescue 0.0
end

#variant? ⇒ Boolean

Returns true if the alt column of the Vcf is not ".". Returns false if the comparison raises an error.

Examples

  vcf record = 20 14370 rs6054257 G A 29 PASS …
  vcf.variant? #=> true
  vcf record = 20 1230237 . T . 47 PASS …
  vcf.variant? #=> false

Returns:

  • (Boolean)


# File 'lib/bio/util/bio-gngm.rb', line 102

def variant?
  not self.alt == "." rescue false
end