Class: Ensembl::Core::Transcript

Inherits:
DBConnection
  • Object
Includes:
Sliceable
Defined in:
lib/ensembl/core/transcript.rb

Overview

DESCRIPTION

The Transcript class provides an interface to the transcript table. This table contains mappings of transcripts for a Gene to a SeqRegion.

This class uses ActiveRecord to access data in the Ensembl database. See the general documentation of the Ensembl module for more information on what this means and what methods are available.

This class includes the mixin Sliceable, which means that it is mapped to a SeqRegion object and a Slice can be created for objects of this class. See Sliceable and Slice for more information.

USAGE

#TODO

Class Method Summary

Instance Method Summary

Methods included from Sliceable

#length, #project, #slice, #start, #stop, #strand, #transform

Methods inherited from DBConnection

connect

Class Method Details

.find_all_by_stable_id(stable_id) ⇒ Object

DESCRIPTION

The Transcript.find_all_by_stable_id class method returns an array of all transcripts with the given stable_id. If none are found, it returns an empty array.



# File 'lib/ensembl/core/transcript.rb', line 148

def self.find_all_by_stable_id(stable_id)
  answer = Array.new
  # Look up the matching stable ID rows, then fetch the transcript each row points to.
  transcript_stable_id_objects = Ensembl::Core::TranscriptStableId.find_all_by_stable_id(stable_id)
  transcript_stable_id_objects.each do |transcript_stable_id_object|
    answer.push(Ensembl::Core::Transcript.find(transcript_stable_id_object.transcript_id))
  end

  return answer
end

.find_by_stable_id(stable_id) ⇒ Object

DESCRIPTION

The Transcript.find_by_stable_id class method fetches a Transcript object based on its stable ID (i.e. the “ENST” accession number). If the stable ID is not found, it returns nil. If several transcripts share the stable ID, the first one is returned.



# File 'lib/ensembl/core/transcript.rb', line 161

def self.find_by_stable_id(stable_id)
  all = self.find_all_by_stable_id(stable_id)
  if all.length == 0
    return nil
  else
    return all[0]
  end
end
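
The relationship between the two finders can be sketched without a database. The following is only an illustration: `SketchTranscript`, `StableIdRow`, `TRANSCRIPTS`, `STABLE_ID_ROWS` and the stable IDs are hypothetical in-memory stand-ins for the ActiveRecord models, and only the control flow mirrors the listings above.

```ruby
# Hypothetical in-memory stand-ins for the transcript and
# transcript_stable_id tables; not part of the Ensembl API.
SketchTranscript = Struct.new(:id, :biotype)
StableIdRow = Struct.new(:transcript_id, :stable_id)

TRANSCRIPTS = {
  1 => SketchTranscript.new(1, 'protein_coding'),
  2 => SketchTranscript.new(2, 'protein_coding')
}
STABLE_ID_ROWS = [
  StableIdRow.new(1, 'ENST_EXAMPLE_A'),
  StableIdRow.new(2, 'ENST_EXAMPLE_B')
]

# Returns every transcript whose stable ID matches, or [] if none do.
def find_all_by_stable_id(stable_id)
  STABLE_ID_ROWS
    .select { |row| row.stable_id == stable_id }
    .map { |row| TRANSCRIPTS.fetch(row.transcript_id) }
end

# Returns the first match, or nil when nothing matches.
def find_by_stable_id(stable_id)
  find_all_by_stable_id(stable_id).first
end

hit = find_by_stable_id('ENST_EXAMPLE_A')
miss = find_by_stable_id('ENST_MISSING')
puts hit.id        # prints 1
puts miss.inspect  # prints nil
```

Callers of find_by_stable_id should therefore check for nil before using the result.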

Instance Method Details

#cdna2genomic(pos) ⇒ Object

DESCRIPTION

The Transcript#cdna2genomic method converts cDNA coordinates to genomic coordinates for this transcript.


Arguments:

  • pos

    position on the cDNA (required)

Returns

integer



# File 'lib/ensembl/core/transcript.rb', line 339

def cdna2genomic(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_cdna_position(pos)
  
  accumulated_position = 0
  self.exons.each do |exon|
    if exon == exon_with_target
      answer = exon.start + ( pos - accumulated_position )
      return answer
    else
      accumulated_position += exon.length
    end
  end
end
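
The arithmetic in the listing above can be followed on a toy transcript. This is a minimal sketch, assuming a forward-strand transcript and a zero-based `pos` (which is how the listing's comparisons behave as written); `SketchExon` and `cdna_to_genomic` are hypothetical names, not part of the library.

```ruby
# Hypothetical stand-in for an exon: genomic start and length only.
SketchExon = Struct.new(:start, :length)

# Same walk as the listing: skip whole exons, then add the remaining
# offset to the start of the exon that contains pos.
# Assumes a forward-strand transcript and a zero-based pos.
def cdna_to_genomic(exons, pos)
  accumulated = 0
  exons.each do |exon|
    return exon.start + (pos - accumulated) if pos < accumulated + exon.length
    accumulated += exon.length
  end
  raise ArgumentError, 'Position outside of cDNA scope'
end

exons = [SketchExon.new(100, 50), SketchExon.new(300, 30)]
puts cdna_to_genomic(exons, 0)   # prints 100
puts cdna_to_genomic(exons, 60)  # prints 310
```

Position 60 lies 10 bases into the second exon (the first exon contributes 50 bases), so it maps to 300 + 10.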

#cds2genomic(pos) ⇒ Object

DESCRIPTION

The Transcript#cds2genomic method converts CDS coordinates to genomic coordinates for this transcript.


Arguments:

  • pos

    position on the CDS (required)

Returns

integer

# File 'lib/ensembl/core/transcript.rb', line 362

def cds2genomic(pos)
  return self.cdna2genomic(pos + self.coding_region_cdna_start)
end

#cds_seq ⇒ Object

DESCRIPTION

The Transcript#cds_seq method returns the coding sequence of the transcript, i.e. the concatenated sequence of all exons minus the UTRs.



# File 'lib/ensembl/core/transcript.rb', line 199

def cds_seq
  cds_length = self.coding_region_cdna_end - self.coding_region_cdna_start + 1
  
  return self.seq[(self.coding_region_cdna_start - 1), cds_length]
end

#coding_region_cdna_end ⇒ Object

DESCRIPTION

The Transcript#coding_region_cdna_end method returns the stop position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_end, the CDS stop position in cDNA coordinates is always at the border of the 3’UTR. So for transcripts on the reverse strand, the CDS stop position in cDNA coordinates corresponds to a genomic position “left” of the CDS start.



# File 'lib/ensembl/core/transcript.rb', line 287

def coding_region_cdna_end
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.end_exon
      answer += self.translation.seq_end
      return answer
    else
      answer += exon.length
    end
  end
end

#coding_region_cdna_start ⇒ Object

DESCRIPTION

The Transcript#coding_region_cdna_start method returns the start position of the CDS in cDNA coordinates. Note that, in contrast to Transcript#coding_region_genomic_start, the CDS start position in cDNA coordinates is always at the border of the 5’UTR. So for transcripts on the reverse strand, the CDS start position in cDNA coordinates corresponds to a genomic position “right” of the CDS stop.



# File 'lib/ensembl/core/transcript.rb', line 266

def coding_region_cdna_start
  answer = 0
  
  self.exons.each do |exon|
    if exon == self.translation.start_exon
      answer += self.translation.seq_start
      return answer
    else
      answer += exon.length
    end
  end
  
end
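
Both cDNA boundary methods use the same accumulation: sum the lengths of the exons that precede the target exon, then add the offset stored in the translation. A minimal sketch, assuming hypothetical stand-ins (`CdsExon`, `CdsTranslation`, `cdna_offset`) and made-up exon lengths in place of the database-backed objects:

```ruby
# Hypothetical stand-ins; the real objects come from the database.
CdsExon = Struct.new(:name, :length)
# seq_start and seq_end are 1-based offsets within their exons.
CdsTranslation = Struct.new(:start_exon, :seq_start, :end_exon, :seq_end)

# Lengths of all exons before the target exon, plus the offset inside it.
def cdna_offset(exons, target_exon, offset_in_exon)
  preceding = exons.take_while { |exon| exon != target_exon }
  preceding.sum(&:length) + offset_in_exon
end

e1 = CdsExon.new('e1', 50)
e2 = CdsExon.new('e2', 30)
e3 = CdsExon.new('e3', 40)
exons = [e1, e2, e3]
translation = CdsTranslation.new(e2, 5, e3, 10)

cdna_start = cdna_offset(exons, translation.start_exon, translation.seq_start)
cdna_end = cdna_offset(exons, translation.end_exon, translation.seq_end)
puts cdna_start                 # prints 55
puts cdna_end                   # prints 90
puts cdna_end - cdna_start + 1  # prints 36 (CDS length)
```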

#coding_region_genomic_end ⇒ Object

DESCRIPTION

The Transcript#coding_region_genomic_end returns the stop position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_end, the CDS stop position is always “right” of the start position. So for transcripts on the reverse strand, the CDS stop position is at the border of the 5’UTR instead of the 3’UTR.



# File 'lib/ensembl/core/transcript.rb', line 250

def coding_region_genomic_end
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.end_exon.seq_region_start + ( self.translation.seq_end - 1 )
  else
    return self.translation.start_exon.seq_region_end - ( self.translation.seq_start - 1 )
  end
end

#coding_region_genomic_start ⇒ Object

DESCRIPTION

The Transcript#coding_region_genomic_start returns the start position of the CDS in genomic coordinates. Note that, in contrast to Transcript#coding_region_cdna_start, the CDS start position is always “left” of the end position. So for transcripts on the reverse strand, the CDS start position is at the border of the 3’UTR instead of the 5’UTR.



# File 'lib/ensembl/core/transcript.rb', line 234

def coding_region_genomic_start
  strand = self.translation.start_exon.seq_region_strand
  if strand == 1
    return self.translation.start_exon.seq_region_start + ( self.translation.seq_start - 1 )
  else
    return self.translation.end_exon.seq_region_end - ( self.translation.seq_end - 1 )
  end
end
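
The strand-dependent arithmetic of the two genomic boundary methods can be checked on small numbers. The sketch below assumes hypothetical stand-ins (`GenomicExon`, `GenomicTranslation`, `genomic_cds_bounds`) and made-up coordinates; it combines the two listings into one helper that returns both boundaries.

```ruby
# Hypothetical stand-ins for exon and translation records.
GenomicExon = Struct.new(:seq_region_start, :seq_region_end, :seq_region_strand)
GenomicTranslation = Struct.new(:start_exon, :seq_start, :end_exon, :seq_end)

# Returns [genomic_start, genomic_end] of the CDS, with start <= end
# on either strand, using the same expressions as the listings.
def genomic_cds_bounds(t)
  if t.start_exon.seq_region_strand == 1
    [t.start_exon.seq_region_start + (t.seq_start - 1),
     t.end_exon.seq_region_start + (t.seq_end - 1)]
  else
    [t.end_exon.seq_region_end - (t.seq_end - 1),
     t.start_exon.seq_region_end - (t.seq_start - 1)]
  end
end

# Forward strand: the first exon in transcript order is leftmost on the genome.
forward = GenomicTranslation.new(GenomicExon.new(100, 199, 1), 11,
                                 GenomicExon.new(300, 399, 1), 20)
# Reverse strand: the first exon in transcript order is rightmost on the genome.
reverse = GenomicTranslation.new(GenomicExon.new(300, 399, -1), 11,
                                 GenomicExon.new(100, 199, -1), 20)

p genomic_cds_bounds(forward)  # prints [110, 319]
p genomic_cds_bounds(reverse)  # prints [180, 389]
```

On the reverse strand the offsets are counted down from seq_region_end, which is why the genomic start comes from the end exon.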

#display_label ⇒ Object Also known as: display_name, label, name

DESCRIPTION

The Transcript#display_label method returns the default name of the transcript.



# File 'lib/ensembl/core/transcript.rb', line 137

def display_label
  return Xref.find(self.display_xref_id).display_label
end

#exon_for_cdna_position(pos) ⇒ Object

DESCRIPTION

The Transcript#exon_for_cdna_position method identifies the exon that covers a given position of the cDNA. It raises a RuntimeError if the position lies beyond the last exon.

Raises:

  • (RuntimeError)


# File 'lib/ensembl/core/transcript.rb', line 319

def exon_for_cdna_position(pos)
  # FIXME: Still have to check for when pos is outside of scope of cDNA.
  accumulated_exon_length = 0
  
  self.exons.each do |exon|
    accumulated_exon_length += exon.length
    if accumulated_exon_length > pos
      return exon
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end

#exon_for_genomic_position(pos) ⇒ Object

DESCRIPTION

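The Transcript#exon_for_genomic_position method identifies the exon that covers a given genomic position. It returns the exon object, or nil if the position falls in an intron. It raises a RuntimeError if the position lies outside the genomic boundaries of the coding region.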



# File 'lib/ensembl/core/transcript.rb', line 304

def exon_for_genomic_position(pos)
  if pos < coding_region_genomic_start or pos > coding_region_genomic_end
    raise RuntimeError, "Position has to be within transcript"
  end
  self.exons.each do |exon|
    if exon.start <= pos and exon.stop >= pos
      return exon
    end
  end
  return nil
end
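
The exon lookup itself is a plain interval test. A minimal sketch of that step, assuming hypothetical names (`RangeExon`, `exon_covering`) and made-up coordinates; it leaves out the boundary check that the listing performs first:

```ruby
# Hypothetical stand-in for an exon with inclusive genomic coordinates.
RangeExon = Struct.new(:name, :start, :stop)

# Returns the first exon whose inclusive range covers pos, or nil
# when pos falls between exons (i.e. in an intron).
def exon_covering(exons, pos)
  exons.find { |exon| pos.between?(exon.start, exon.stop) }
end

exons = [RangeExon.new('e1', 100, 149), RangeExon.new('e2', 300, 329)]
puts exon_covering(exons, 120).name     # prints e1
puts exon_covering(exons, 200).inspect  # prints nil
```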

#exons ⇒ Object

The Transcript#exons method returns the exons for this transcript in the order of their ranks in the exon_transcript table.


Arguments

none

Returns

sorted array of Exon objects



# File 'lib/ensembl/core/transcript.rb', line 103

def exons
  if @exons.nil?
    @exons = self.exon_transcripts.sort_by{|et| et.rank.to_i}.collect{|et| et.exon}
  end
  return @exons
end
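
The listing above sorts by numeric rank and caches the result in an instance variable. A minimal sketch of that pattern, assuming hypothetical stand-ins (`RankRow`, `RankedTranscript`) for the exon_transcript rows and the transcript; the `sort_calls` counter exists only to make the caching visible:

```ruby
# Hypothetical stand-in for a row of the exon_transcript table.
RankRow = Struct.new(:rank, :exon)

class RankedTranscript
  attr_reader :sort_calls

  def initialize(rows)
    @rows = rows
    @sort_calls = 0
  end

  # Sorts once by numeric rank and reuses the cached array afterwards.
  def exons
    @exons ||= begin
      @sort_calls += 1
      @rows.sort_by { |row| row.rank.to_i }.map(&:exon)
    end
  end
end

rows = [RankRow.new('10', 'c'), RankRow.new('2', 'b'), RankRow.new('1', 'a')]
transcript = RankedTranscript.new(rows)
p transcript.exons        # prints ["a", "b", "c"]
transcript.exons
puts transcript.sort_calls  # prints 1
```

The `to_i` conversion matters when ranks arrive as strings: a plain string sort would place '10' before '2'.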

#five_prime_utr_seq ⇒ Object

DESCRIPTION

The Transcript#five_prime_utr_seq method returns the sequence of the 5’UTR of the transcript.



# File 'lib/ensembl/core/transcript.rb', line 208

def five_prime_utr_seq
  return self.seq[0, self.coding_region_cdna_start - 1]
end

#genomic2cdna(pos) ⇒ Object

DESCRIPTION

The Transcript#genomic2cdna method converts genomic coordinates to cDNA coordinates for this transcript.


Arguments:

  • pos

    position on the chromosome (required)

Returns

integer

# File 'lib/ensembl/core/transcript.rb', line 384

def genomic2cdna(pos)
  #FIXME: Still have to check for when pos is outside of scope of cDNA.
  # Identify the exon we're looking at.
  exon_with_target = self.exon_for_genomic_position(pos)
  
  accumulated_position = 0
  self.exons.each do |exon|
    if exon == exon_with_target
      accumulated_position += ( pos - exon.start )
      return accumulated_position
    else
      accumulated_position += exon.length
    end
  end
  raise RuntimeError, "Position outside of cDNA scope"
end
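
The reverse mapping can be followed on the same kind of toy data. This is a minimal sketch, assuming a forward-strand transcript and a zero-based result; `SpanExon` and `genomic_to_cdna` are hypothetical names, and the sketch raises when the position is not inside any exon.

```ruby
# Hypothetical stand-in for an exon with inclusive genomic coordinates.
SpanExon = Struct.new(:start, :stop) do
  def length
    stop - start + 1
  end
end

# Adds up the lengths of the exons before the one containing pos,
# then adds the offset of pos inside that exon.
# Assumes a forward-strand transcript.
def genomic_to_cdna(exons, pos)
  accumulated = 0
  exons.each do |exon|
    return accumulated + (pos - exon.start) if pos.between?(exon.start, exon.stop)
    accumulated += exon.length
  end
  raise ArgumentError, 'Position is not inside an exon'
end

exons = [SpanExon.new(100, 149), SpanExon.new(300, 329)]
puts genomic_to_cdna(exons, 100)  # prints 0
puts genomic_to_cdna(exons, 310)  # prints 60
```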

#genomic2cds(pos) ⇒ Object

DESCRIPTION

The Transcript#genomic2cds method converts genomic coordinates to CDS coordinates for this transcript.


Arguments:

  • pos

    position on the chromosome (required)

Returns

integer

# File 'lib/ensembl/core/transcript.rb', line 408

def genomic2cds(pos)
  return self.genomic2cdna(pos) - self.coding_region_cdna_start
end

#genomic2pep(pos) ⇒ Object

DESCRIPTION

The Transcript#genomic2pep method converts genomic coordinates to peptide coordinates for this transcript.


Arguments:

  • pos

    position on the chromosome (required)

Returns

Raises:

  • (NotImplementedError)


# File 'lib/ensembl/core/transcript.rb', line 419

def genomic2pep(pos)
  raise NotImplementedError
end

#introns ⇒ Object

The Transcript#introns method returns the introns for this transcript, one for each pair of consecutive exons.


Arguments

none

Returns

sorted array of Intron objects



# File 'lib/ensembl/core/transcript.rb', line 114

def introns
  if @introns.nil?
    @introns = Array.new
    if self.exons.length > 1
      self.exons.each_with_index do |exon, index|
        next if index == 0
        @introns.push(Intron.new(self.exons[index - 1], exon))
      end
    end
  end
  return @introns
end
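
Pairing each exon with its predecessor can also be written with `each_cons`. The sketch below is an alternative formulation, not the library's code; `SketchIntron` and `introns_between` are hypothetical names, and exons are represented by plain strings.

```ruby
# Hypothetical stand-in for Intron: just records the flanking exons.
SketchIntron = Struct.new(:previous_exon, :next_exon)

# One intron per pair of consecutive exons; a single-exon
# transcript yields an empty array.
def introns_between(exons)
  exons.each_cons(2).map { |previous, following| SketchIntron.new(previous, following) }
end

puts introns_between(%w[e1 e2 e3]).length  # prints 2
puts introns_between(%w[e1]).length        # prints 0
```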

#pep2genomic(pos) ⇒ Object

DESCRIPTION

The Transcript#pep2genomic method converts peptide coordinates to genomic coordinates for this transcript.


Arguments:

  • pos

    position on the peptide (required)

Returns

Raises:

  • (NotImplementedError)


# File 'lib/ensembl/core/transcript.rb', line 373

def pep2genomic(pos)
  raise NotImplementedError
end

#protein_seq ⇒ Object

DESCRIPTION

The Transcript#protein_seq method returns the sequence of the protein of the transcript.



# File 'lib/ensembl/core/transcript.rb', line 222

def protein_seq
  return Bio::Sequence::NA.new(self.cds_seq).translate.seq
end

#seq ⇒ Object

DESCRIPTION

The Transcript#seq method returns the full sequence of all concatenated exons.



# File 'lib/ensembl/core/transcript.rb', line 186

def seq
  if @seq.nil?
    @seq = ''
    self.exons.each do |exon|
      @seq += exon.seq
    end
  end
  return @seq
end

#stable_id ⇒ Object

The Transcript#stable_id method returns the stable ID of the transcript.


Arguments

none

Returns

String



# File 'lib/ensembl/core/transcript.rb', line 131

def stable_id
  return self.transcript_stable_id.stable_id
end

#three_prime_utr_seq ⇒ Object

DESCRIPTION

The Transcript#three_prime_utr_seq method returns the sequence of the 3’UTR of the transcript.



# File 'lib/ensembl/core/transcript.rb', line 215

def three_prime_utr_seq
  return self.seq[self.coding_region_cdna_end..-1]
end
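
The slices used by #five_prime_utr_seq, #cds_seq and #three_prime_utr_seq partition the transcript sequence. A minimal sketch with a made-up sequence and made-up 1-based boundaries (`cdna_start` and `cdna_end` stand in for the values of coding_region_cdna_start and coding_region_cdna_end):

```ruby
# Made-up example values; UTR bases are lowercase for readability.
seq = 'ggggATGAAATAGcc'
cdna_start = 5   # 1-based position of the first CDS base
cdna_end = 13    # 1-based position of the last CDS base

five_prime_utr = seq[0, cdna_start - 1]
cds = seq[cdna_start - 1, cdna_end - cdna_start + 1]
three_prime_utr = seq[cdna_end..-1]

puts five_prime_utr   # prints gggg
puts cds              # prints ATGAAATAG
puts three_prime_utr  # prints cc
puts five_prime_utr + cds + three_prime_utr == seq  # prints true
```

Because Ruby strings are indexed from zero, the 1-based start is reduced by one, while the 1-based end can be used directly as the first index of the 3’UTR.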