Class: Ensembl::Core::Transcript
- Inherits:
- DBConnection
- Inheritance chain:
- Object, ActiveRecord::Base, DBConnection, Ensembl::Core::Transcript
- Includes:
- Sliceable
- Defined in:
- lib/ensembl/core/transcript.rb
Overview
DESCRIPTION
The Transcript class provides an interface to the transcript table. This table contains mappings of transcripts for a Gene to a SeqRegion.
This class uses ActiveRecord to access data in the Ensembl database. See the general documentation of the Ensembl module for more information on what this means and what methods are available.
This class includes the mixin Sliceable, which means that it is mapped to a SeqRegion object and a Slice can be created for objects of this class. See Sliceable and Slice for more information.
USAGE
#TODO
Class Method Summary
-
.find_all_by_stable_id(stable_id) ⇒ Object
DESCRIPTION The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id.
-
.find_by_stable_id(stable_id) ⇒ Object
DESCRIPTION The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number).
Instance Method Summary
-
#cdna2genomic(pos) ⇒ Object
DESCRIPTION The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.
-
#cds2genomic(pos) ⇒ Object
DESCRIPTION The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.
-
#cds_seq ⇒ Object
DESCRIPTION The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.
-
#coding_region_cdna_end ⇒ Object
DESCRIPTION The Transcript#coding_region_cdna_end returns the stop position of the CDS in cDNA coordinates.
-
#coding_region_cdna_start ⇒ Object
DESCRIPTION The Transcript#coding_region_cdna_start returns the start position of the CDS in cDNA coordinates.
-
#coding_region_genomic_end ⇒ Object
DESCRIPTION The Transcript#coding_region_genomic_end returns the stop position of the CDS in genomic coordinates.
-
#coding_region_genomic_start ⇒ Object
DESCRIPTION The Transcript#coding_region_genomic_start returns the start position of the CDS in genomic coordinates.
-
#display_label ⇒ Object
(also: #display_name, #label, #name)
DESCRIPTION The Transcript#display_label method returns the default name of the transcript.
-
#exon_for_cdna_position(pos) ⇒ Object
DESCRIPTION The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA.
-
#exon_for_genomic_position(pos) ⇒ Object
DESCRIPTION The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position.
-
#exons ⇒ Object
The Transcript#exons method returns the exons for this transcript in the order of their ranks in the exon_transcript table.
-
#five_prime_utr_seq ⇒ Object
DESCRIPTION The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.
-
#genomic2cdna(pos) ⇒ Object
DESCRIPTION The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.
-
#genomic2cds(pos) ⇒ Object
DESCRIPTION The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.
-
#genomic2pep(pos) ⇒ Object
DESCRIPTION The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript.
-
#introns ⇒ Object
The Transcript#introns method returns the introns for this transcript as a sorted array of Intron objects.
-
#pep2genomic(pos) ⇒ Object
DESCRIPTION The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript.
-
#protein_seq ⇒ Object
DESCRIPTION The Transcript#protein_seq method returns the sequence of the protein of the transcript.
-
#seq ⇒ Object
DESCRIPTION The Transcript#seq method returns the full sequence of all concatenated exons.
-
#stable_id ⇒ Object
The Transcript#stable_id method returns the stable ID of the transcript.
-
#three_prime_utr_seq ⇒ Object
DESCRIPTION The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.
Methods included from Sliceable
#length, #project, #slice, #start, #stop, #strand, #transform
Class Method Details
.find_all_by_stable_id(stable_id) ⇒ Object
DESCRIPTION
The Transcript.find_all_by_stable_id class method returns an array of transcripts with the given stable_id. If none are found, an empty array is returned.
    # File 'lib/ensembl/core/transcript.rb', line 148
    def self.find_all_by_stable_id(stable_id)
      answer = Array.new
      transcript_stable_id_objects = Ensembl::Core::TranscriptStableId.find_all_by_stable_id(stable_id)
      transcript_stable_id_objects.each do |transcript_stable_id_object|
        answer.push(Ensembl::Core::Transcript.find(transcript_stable_id_object.transcript_id))
      end
      return answer
    end
.find_by_stable_id(stable_id) ⇒ Object
DESCRIPTION
The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number). If the stable ID is not found, it returns nil. If several transcripts share the stable ID, the first one is returned.
    # File 'lib/ensembl/core/transcript.rb', line 161
    def self.find_by_stable_id(stable_id)
      all = self.find_all_by_stable_id(stable_id)
      if all.length == 0
        return nil
      else
        return all[0]
      end
    end
Instance Method Details
#cdna2genomic(pos) ⇒ Object
DESCRIPTION
The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.
Arguments:
- pos: position on the cDNA (required)
Returns:
- integer
    # File 'lib/ensembl/core/transcript.rb', line 339
    def cdna2genomic(pos)
      #FIXME: Still have to check for when pos is outside of scope of cDNA.
      # Identify the exon we're looking at.
      exon_with_target = self.exon_for_cdna_position(pos)

      accumulated_position = 0
      self.exons.each do |exon|
        if exon == exon_with_target
          answer = exon.start + ( pos - accumulated_position )
          return answer
        else
          accumulated_position += exon.length
        end
      end
    end
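The exon walk used above can be tried without a database connection. The following is a minimal sketch, not the library's implementation: `SketchExon` and `sketch_cdna2genomic` are hypothetical names, and the sketch assumes forward-strand exons, inclusive genomic coordinates and 1-based cDNA positions, which may differ from the offset convention of the method above.

```ruby
# Hypothetical stand-in for an exon: inclusive genomic start/stop, forward strand.
SketchExon = Struct.new(:start, :stop) do
  def length
    stop - start + 1
  end
end

# Map a 1-based cDNA position onto the genome by walking the exons in rank order.
def sketch_cdna2genomic(exons, pos)
  raise ArgumentError, "position must be >= 1" if pos < 1

  consumed = 0 # cDNA bases covered by the exons already passed
  exons.each do |exon|
    if pos <= consumed + exon.length
      return exon.start + (pos - consumed - 1)
    end
    consumed += exon.length
  end
  raise ArgumentError, "position #{pos} lies outside the cDNA"
end

exons = [SketchExon.new(100, 109), SketchExon.new(200, 204)]
puts sketch_cdna2genomic(exons, 1)  # prints 100 (first base of exon 1)
puts sketch_cdna2genomic(exons, 11) # prints 200 (first base of exon 2)
```

Unlike the method above, the sketch raises for positions beyond the last exon, which is the case the FIXME comment refers to.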
#cds2genomic(pos) ⇒ Object
DESCRIPTION
The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.
Arguments:
- pos: position on the CDS (required)
Returns:
- integer
    # File 'lib/ensembl/core/transcript.rb', line 362
    def cds2genomic(pos)
      return self.cdna2genomic(pos + self.coding_region_cdna_start)
    end
#cds_seq ⇒ Object
DESCRIPTION
The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.
    # File 'lib/ensembl/core/transcript.rb', line 199
    def cds_seq
      cds_length = self.coding_region_cdna_end - self.coding_region_cdna_start + 1

      return self.seq[(self.coding_region_cdna_start - 1), cds_length]
    end
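The slicing arithmetic used by cds_seq, five_prime_utr_seq and three_prime_utr_seq can be checked on a plain string. This is a sketch under stated assumptions: `sketch_split_cdna` is a hypothetical helper, the sequence is made up, and the CDS boundaries are taken as 1-based inclusive cDNA positions, as the code above implies.

```ruby
# Split a cDNA string into 5'UTR, CDS and 3'UTR, given the 1-based, inclusive
# cDNA positions of the first and last CDS base (hypothetical helper).
def sketch_split_cdna(cdna, cds_start, cds_end)
  five_prime_utr  = cdna[0, cds_start - 1]                      # everything before the CDS
  cds             = cdna[cds_start - 1, cds_end - cds_start + 1] # start index, then length
  three_prime_utr = cdna[cds_end..-1]                           # everything after the CDS
  [five_prime_utr, cds, three_prime_utr]
end

# Made-up 12-base cDNA whose CDS spans cDNA positions 4..9.
five, cds, three = sketch_split_cdna('GGGATGTAACCC', 4, 9)
puts five  # prints GGG
puts cds   # prints ATGTAA
puts three # prints CCC
```

The three pieces concatenate back to the full cDNA, which is a quick way to check the boundaries.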
#coding_region_cdna_end ⇒ Object
DESCRIPTION
The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_end, the CDS stop position is always at the border of the 3’UTR. So for genes on the reverse strand, the CDS stop position reported here corresponds to a genomic position to the “left” of the CDS start position.
    # File 'lib/ensembl/core/transcript.rb', line 287
    def coding_region_cdna_end
      answer = 0

      self.exons.each do |exon|
        if exon == self.translation.end_exon
          answer += self.translation.seq_end
          return answer
        else
          answer += exon.length
        end
      end
    end
#coding_region_cdna_start ⇒ Object
DESCRIPTION
The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_start, the CDS start position is always at the border of the 5’UTR. So for genes on the reverse strand, the CDS start position reported here corresponds to a genomic position to the “right” of the CDS stop position.
    # File 'lib/ensembl/core/transcript.rb', line 266
    def coding_region_cdna_start
      answer = 0

      self.exons.each do |exon|
        if exon == self.translation.start_exon
          answer += self.translation.seq_start
          return answer
        else
          answer += exon.length
        end
      end
    end
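The accumulation above adds up the lengths of all exons that precede the start exon and then adds the offset of the CDS start within that exon. A minimal sketch with plain structs follows; `LengthExon` and `sketch_coding_region_cdna_start` are hypothetical names and the numbers are invented for illustration.

```ruby
# Hypothetical stand-in for an exon that only knows its name and length.
LengthExon = Struct.new(:name, :length)

# Sum the lengths of the exons before the start exon, then add the
# offset of the CDS start inside the start exon (seq_start).
def sketch_coding_region_cdna_start(exons, start_exon, seq_start)
  offset = 0
  exons.each do |exon|
    return offset + seq_start if exon == start_exon
    offset += exon.length
  end
  nil # the start exon is not part of this exon list
end

exons = [LengthExon.new('e1', 50), LengthExon.new('e2', 30), LengthExon.new('e3', 40)]
# The CDS starts at base 5 of the second exon: 50 + 5.
puts sketch_coding_region_cdna_start(exons, exons[1], 5) # prints 55
```

The same walk with the end exon and seq_end gives the value described for coding_region_cdna_end.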
#coding_region_genomic_end ⇒ Object
DESCRIPTION
The Transcript#coding_region_genomic_end method returns the stop position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_end, the CDS stop position is always “right” of the start position. So for transcripts on the reverse strand, the CDS stop position is at the border of the 5’UTR instead of the 3’UTR.
    # File 'lib/ensembl/core/transcript.rb', line 250
    def coding_region_genomic_end
      strand = self.translation.start_exon.seq_region_strand
      if strand == 1
        return self.translation.end_exon.seq_region_start + ( self.translation.seq_end - 1 )
      else
        return self.translation.start_exon.seq_region_end - ( self.translation.seq_start - 1 )
      end
    end
#coding_region_genomic_start ⇒ Object
DESCRIPTION
The Transcript#coding_region_genomic_start method returns the start position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_start, the CDS start position is always “left” of the end position. So for transcripts on the reverse strand, the CDS start position is at the border of the 3’UTR instead of the 5’UTR.
    # File 'lib/ensembl/core/transcript.rb', line 234
    def coding_region_genomic_start
      strand = self.translation.start_exon.seq_region_strand
      if strand == 1
        return self.translation.start_exon.seq_region_start + ( self.translation.seq_start - 1 )
      else
        return self.translation.end_exon.seq_region_end - ( self.translation.seq_end - 1 )
      end
    end
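The strand handling of the two genomic methods can be illustrated side by side. The sketch below mirrors the arithmetic shown above on plain structs; `CodingExon` and `sketch_coding_region_genomic_range` are hypothetical names and the coordinates are invented.

```ruby
# Hypothetical stand-in for an exon with genomic coordinates and strand (1 or -1).
CodingExon = Struct.new(:seq_region_start, :seq_region_end, :seq_region_strand)

# Return [genomic_start, genomic_end] of the CDS, with start always left of end.
# seq_start and seq_end are 1-based offsets inside the start and end exon.
def sketch_coding_region_genomic_range(start_exon, end_exon, seq_start, seq_end)
  if start_exon.seq_region_strand == 1
    left  = start_exon.seq_region_start + (seq_start - 1)
    right = end_exon.seq_region_start + (seq_end - 1)
  else
    # On the reverse strand the end exon lies to the left on the genome.
    left  = end_exon.seq_region_end - (seq_end - 1)
    right = start_exon.seq_region_end - (seq_start - 1)
  end
  [left, right]
end

forward = sketch_coding_region_genomic_range(
  CodingExon.new(100, 199, 1), CodingExon.new(300, 399, 1), 11, 50
)
reverse = sketch_coding_region_genomic_range(
  CodingExon.new(300, 399, -1), CodingExon.new(100, 199, -1), 11, 50
)
p forward # prints [110, 349]
p reverse # prints [150, 389]
```

In the reverse-strand case the left value is derived from the end exon, which is why the genomic start borders the 3’UTR there.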
#display_label ⇒ Object Also known as: display_name, label, name
DESCRIPTION
The Transcript#display_label method returns the default name of the transcript.
    # File 'lib/ensembl/core/transcript.rb', line 137
    def display_label
      return Xref.find(self.display_xref_id).display_label
    end
#exon_for_cdna_position(pos) ⇒ Object
DESCRIPTION
The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA. It raises a RuntimeError if the position lies beyond the last exon.
    # File 'lib/ensembl/core/transcript.rb', line 319
    def exon_for_cdna_position(pos)
      # FIXME: Still have to check for when pos is outside of scope of cDNA.
      accumulated_exon_length = 0

      self.exons.each do |exon|
        accumulated_exon_length += exon.length
        if accumulated_exon_length > pos
          return exon
        end
      end
      raise RuntimeError, "Position outside of cDNA scope"
    end
#exon_for_genomic_position(pos) ⇒ Object
DESCRIPTION
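The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position. It returns the Exon object, or nil if the position falls in an intron. It raises a RuntimeError if the position lies outside the range given by coding_region_genomic_start and coding_region_genomic_end.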
    # File 'lib/ensembl/core/transcript.rb', line 304
    def exon_for_genomic_position(pos)
      if pos < coding_region_genomic_start or pos > coding_region_genomic_end
        raise RuntimeError, "Position has to be within transcript"
      end
      self.exons.each do |exon|
        if exon.start <= pos and exon.stop >= pos
          return exon
        end
      end
      return nil
    end
#exons ⇒ Object
The Transcript#exons method returns the exons for this transcript in the order of their ranks in the exon_transcript table.
Arguments:
- none
Returns:
- sorted array of Exon objects
    # File 'lib/ensembl/core/transcript.rb', line 103
    def exons
      if @exons.nil?
        @exons = self.exon_transcripts.sort_by{|et| et.rank.to_i}.collect{|et| et.exon}
      end
      return @exons
    end
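The method above sorts the join records by rank and caches the result in an instance variable. A small sketch of that pattern follows; `ExonLink` and `SketchTranscript` are hypothetical stand-ins for the exon_transcript records and the transcript, and the string ranks are an assumption chosen to show why the `to_i` conversion matters.

```ruby
# Hypothetical stand-in for a row of the exon_transcript table.
ExonLink = Struct.new(:rank, :exon)

class SketchTranscript
  def initialize(links)
    @links = links
  end

  # Sort by numeric rank once and reuse the cached array on later calls.
  def exons
    @exons ||= @links.sort_by { |link| link.rank.to_i }.map(&:exon)
  end
end

links = [ExonLink.new('10', :exon_j), ExonLink.new('2', :exon_b), ExonLink.new('1', :exon_a)]
transcript = SketchTranscript.new(links)
p transcript.exons # prints [:exon_a, :exon_b, :exon_j]
```

Without the numeric conversion, string ranks would sort as '1', '10', '2' and put the tenth exon second.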
#five_prime_utr_seq ⇒ Object
DESCRIPTION
The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.
    # File 'lib/ensembl/core/transcript.rb', line 208
    def five_prime_utr_seq
      return self.seq[0, self.coding_region_cdna_start - 1]
    end
#genomic2cdna(pos) ⇒ Object
DESCRIPTION
The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.
Arguments:
- pos: position on the chromosome (required)
Returns:
- integer
    # File 'lib/ensembl/core/transcript.rb', line 384
    def genomic2cdna(pos)
      #FIXME: Still have to check for when pos is outside of scope of cDNA.
      # Identify the exon we're looking at.
      exon_with_target = self.exon_for_genomic_position(pos)

      accumulated_position = 0
      self.exons.each do |exon|
        if exon == exon_with_target
          accumulated_position += ( pos - exon.start )
          return accumulated_position
        else
          accumulated_position += exon.length
        end
      end
      # Raise (rather than return) the error when no exon covers the position.
      raise RuntimeError, "Position outside of cDNA scope"
    end
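The reverse mapping can be sketched the same way as the cDNA-to-genomic direction. This is an illustration only: `GenomicExon` and `sketch_genomic2cdna` are hypothetical names, and the sketch assumes forward-strand exons and 1-based cDNA positions, which may not match the offset convention of the method above. It returns nil for intronic positions instead of raising.

```ruby
# Hypothetical stand-in for an exon: inclusive genomic start/stop, forward strand.
GenomicExon = Struct.new(:start, :stop) do
  def length
    stop - start + 1
  end
end

# Map a genomic position to a 1-based cDNA position, or nil if no exon covers it.
def sketch_genomic2cdna(exons, pos)
  consumed = 0 # cDNA bases covered by the exons already passed
  exons.each do |exon|
    if pos >= exon.start && pos <= exon.stop
      return consumed + (pos - exon.start) + 1
    end
    consumed += exon.length
  end
  nil
end

exons = [GenomicExon.new(100, 109), GenomicExon.new(200, 204)]
p sketch_genomic2cdna(exons, 100) # prints 1
p sketch_genomic2cdna(exons, 200) # prints 11
p sketch_genomic2cdna(exons, 150) # prints nil (intronic)
```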
#genomic2cds(pos) ⇒ Object
DESCRIPTION
The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.
Arguments:
- pos: position on the chromosome (required)
Returns:
- integer
    # File 'lib/ensembl/core/transcript.rb', line 408
    def genomic2cds(pos)
      return self.genomic2cdna(pos) - self.coding_region_cdna_start
    end
#genomic2pep(pos) ⇒ Object
DESCRIPTION
The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript.
Arguments:
- pos: position on the chromosome (required)
Returns:
- nothing yet; the method currently raises NotImplementedError
    # File 'lib/ensembl/core/transcript.rb', line 419
    def genomic2pep(pos)
      raise NotImplementedError
    end
#introns ⇒ Object
The Transcript#introns method returns the introns for this transcript.
Arguments:
- none
Returns:
- sorted array of Intron objects
    # File 'lib/ensembl/core/transcript.rb', line 114
    def introns
      if @introns.nil?
        @introns = Array.new
        if self.exons.length > 1
          self.exons.each_with_index do |exon, index|
            next if index == 0
            @introns.push(Intron.new(self.exons[index - 1], exon))
          end
        end
      end
      return @introns
    end
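The loop above builds one intron for each pair of consecutive exons. The same pairing can be expressed with `each_cons(2)`, as in this sketch; `RankedExon`, `SketchIntron` and `sketch_introns` are hypothetical names, and this is an alternative formulation rather than the library's code.

```ruby
# Hypothetical stand-ins: an exon identified by its rank, and an intron
# defined by the two exons that flank it.
RankedExon = Struct.new(:rank)
SketchIntron = Struct.new(:previous_exon, :next_exon)

# One intron per pair of neighbouring exons; zero or one exon gives an empty array.
def sketch_introns(exons)
  exons.each_cons(2).map do |previous_exon, next_exon|
    SketchIntron.new(previous_exon, next_exon)
  end
end

exons = [RankedExon.new(1), RankedExon.new(2), RankedExon.new(3)]
introns = sketch_introns(exons)
puts introns.length                                                   # prints 2
puts introns.map { |i| "#{i.previous_exon.rank}-#{i.next_exon.rank}" }.join(',') # prints 1-2,2-3
```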
#pep2genomic(pos) ⇒ Object
DESCRIPTION
The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript.
Arguments:
- pos: position on the peptide (required)
Returns:
- nothing yet; the method currently raises NotImplementedError
    # File 'lib/ensembl/core/transcript.rb', line 373
    def pep2genomic(pos)
      raise NotImplementedError
    end
#protein_seq ⇒ Object
DESCRIPTION
The Transcript#protein_seq method returns the sequence of the protein of the transcript.
    # File 'lib/ensembl/core/transcript.rb', line 222
    def protein_seq
      return Bio::Sequence::NA.new(self.cds_seq).translate.seq
    end
#seq ⇒ Object
DESCRIPTION
The Transcript#seq method returns the full sequence of all concatenated exons.
    # File 'lib/ensembl/core/transcript.rb', line 186
    def seq
      if @seq.nil?
        @seq = ''
        self.exons.each do |exon|
          @seq += exon.seq
        end
      end
      return @seq
    end
#stable_id ⇒ Object
The Transcript#stable_id method returns the stable ID of the transcript.
Arguments:
- none
Returns:
- String
    # File 'lib/ensembl/core/transcript.rb', line 131
    def stable_id
      return self.transcript_stable_id.stable_id
    end
#three_prime_utr_seq ⇒ Object
DESCRIPTION
The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.
    # File 'lib/ensembl/core/transcript.rb', line 215
    def three_prime_utr_seq
      return self.seq[self.coding_region_cdna_end..-1]
    end