Class: Ensembl::Core::Slice
- Inherits:
-
Object
- Object
- Ensembl::Core::Slice
- Defined in:
- lib/ensembl/core/slice.rb,
lib/ensembl/core/project.rb
Overview
DESCRIPTION
From the perl API tutorial (www.ensembl.org/info/software/core/core_tutorial.html): “A Slice object represents a continuous region of a genome. Slices can be used to obtain sequence, features or other information from a particular region of interest.”
In contrast to almost all other classes of Ensembl::Core, the Slice class is not based on ActiveRecord.
USAGE
chr4 = SeqRegion.find_by_name('4')
my_slice = Slice.new(chr4, 95000, 98000, -1)
puts my_slice.display_name #--> 'chromosome:4:Btau_3.1:95000:98000:1'
Instance Attribute Summary collapse
-
#seq ⇒ Object
DESCRIPTION Get the sequence of the Slice as a Bio::Sequence::NA object.
-
#seq_region ⇒ Object
Returns the value of attribute seq_region.
-
#start ⇒ Object
Returns the value of attribute start.
-
#stop ⇒ Object
Returns the value of attribute stop.
-
#strand ⇒ Object
Returns the value of attribute strand.
Class Method Summary collapse
-
.fetch_all(coord_system_name = 'chromosome', version = nil) ⇒ Object
DESCRIPTION Create an array of all Slices for a given coordinate system.
-
.fetch_by_gene_stable_id(gene_stable_id, flanking_seq_length = 0) ⇒ Object
DESCRIPTION Create a Slice based on a Gene.
-
.fetch_by_region(coord_system_name, seq_region_name, start = nil, stop = nil, strand = 1, version = nil) ⇒ Object
DESCRIPTION Create a Slice without first creating the SeqRegion object.
-
.fetch_by_transcript_stable_id(transcript_stable_id, flanking_seq_length = 0) ⇒ Object
DESCRIPTION Create a Slice based on a Transcript.
Instance Method Summary collapse
-
#display_name ⇒ Object
(also: #to_s)
DESCRIPTION The display_name method returns a full name of this slice, containing the name of the coordinate system, the sequence region, start and stop positions on that sequence region and the strand.
-
#dna_align_features(analysis_name = nil) ⇒ Object
DESCRIPTION Get all DnaAlignFeatures that are located on a Slice for a given Analysis.
-
#excise(ranges) ⇒ Object
DESCRIPTION The Slice#excise method removes a bit of a slice and returns the remainder as separate slices.
-
#get_objects(target_class, table_name, inclusive = false) ⇒ Object
Don’t use this method yourself.
-
#initialize(seq_region, start = 1, stop = seq_region.length, strand = 1) ⇒ Slice
constructor
DESCRIPTION Create a new Slice object from scratch.
-
#length ⇒ Object
DESCRIPTION Get the length of a slice.
-
#method_missing(method_name, *args) ⇒ Object
– As there should be ‘getters’ for a lot of classes, we’ll implement this with method_missing.
-
#misc_features(code) ⇒ Object
DESCRIPTION Get all MiscFeatures that are located on a Slice for a given MiscSet.
-
#overlaps?(other_slice) ⇒ Boolean
DESCRIPTION The Slice#overlaps? method checks if this slice overlaps another one.
-
#project(coord_system_name) ⇒ Object
DESCRIPTION The Slice#project method is used to transfer coordinates from one coordinate system to another.
-
#protein_align_features(analysis_name) ⇒ Object
DESCRIPTION Get all ProteinAlignFeatures that are located on a Slice for a given Analysis.
- #repeatmasked_seq ⇒ Object
-
#split(max_size = 100000, overlap = 0) ⇒ Object
DESCRIPTION Creates overlapping subslices for a given Slice.
-
#sub_slice(start = self.start, stop = self.stop) ⇒ Object
DESCRIPTION Take a sub_slice from an existing one.
-
#within?(other_slice) ⇒ Boolean
DESCRIPTION The Slice#within? method checks if this slice is contained withing another one.
Constructor Details
#initialize(seq_region, start = 1, stop = seq_region.length, strand = 1) ⇒ Slice
DESCRIPTION
Create a new Slice object from scratch.
USAGE
chr4 = SeqRegion.find_by_name('4')
my_slice = Slice.new(chr4, 95000, 98000, -1)
Arguments:
-
seq_region: SeqRegion object
-
start: start position of the Slice on the SeqRegion (default = 1)
-
stop: stop position of the Slice on the SeqRegion (default: end of SeqRegion)
-
strand: strand of the Slice relative to the SeqRegion (default = 1)
- Returns
-
Slice object
46 47 48 49 50 51 52 53 54 55 56 57 58 |
# File 'lib/ensembl/core/slice.rb', line 46 def initialize(seq_region, start = 1, stop = seq_region.length, strand = 1) if start.nil? start = 1 end if stop.nil? stop = seq_region.length end unless seq_region.class == Ensembl::Core::SeqRegion raise 'First argument has to be a Ensembl::Core::SeqRegion object' end @seq_region, @start, @stop, @strand = seq_region, start, stop, strand @seq = nil end |
Dynamic Method Handling
This class handles dynamic methods through the method_missing method
#method_missing(method_name, *args) ⇒ Object
– As there should be ‘getters’ for a lot of classes, we’ll implement this with method_missing. For some of the original methods, see the end of this file.
The optional argument is either ‘true’ or ‘false’ (default = false). False if the features have to be completely contained within the slice; true if just a partly overlap is sufficient. ++ Don’t use this method yourself.
414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 |
# File 'lib/ensembl/core/slice.rb', line 414 def method_missing(method_name, *args) table_name = method_name.to_s.singularize class_name = table_name.camelcase # Convert to the class object target_class = nil ObjectSpace.each_object(Class) do |o| if o.name =~ /^Ensembl::Core::#{class_name}$/ target_class = o end end # If it exists, see if it implements Sliceable if ! target_class.nil? and target_class.include?(Sliceable) inclusive = false if [TrueClass, FalseClass].include?(args[0].class) inclusive = args[0] end return self.get_objects(target_class, table_name, inclusive) end raise NoMethodError end |
Instance Attribute Details
#seq ⇒ Object
DESCRIPTION
Get the sequence of the Slice as a Bio::Sequence::NA object.
If the Slice is on a CoordSystem that is not seq_level, it will try to project it coordinates to the CoordSystem that does. At this moment, this is only done if there is a direct link between the two coordinate systems. (The perl API allows for following an indirect link as well.)
Caution: Bio::Sequence::NA makes the sequence downcase!!
USAGE
my_slice.seq.seq.to_s
- Arguments
-
none
- Returns
-
Bio::Sequence::NA object
322 323 324 |
# File 'lib/ensembl/core/slice.rb', line 322 def seq @seq end |
#seq_region ⇒ Object
Returns the value of attribute seq_region.
26 27 28 |
# File 'lib/ensembl/core/slice.rb', line 26 def seq_region @seq_region end |
#start ⇒ Object
Returns the value of attribute start.
26 27 28 |
# File 'lib/ensembl/core/slice.rb', line 26 def start @start end |
#stop ⇒ Object
Returns the value of attribute stop.
26 27 28 |
# File 'lib/ensembl/core/slice.rb', line 26 def stop @stop end |
#strand ⇒ Object
Returns the value of attribute strand.
26 27 28 |
# File 'lib/ensembl/core/slice.rb', line 26 def strand @strand end |
Class Method Details
.fetch_all(coord_system_name = 'chromosome', version = nil) ⇒ Object
DESCRIPTION
Create an array of all Slices for a given coordinate system.
USAGE
slices = Slice.fetch_all('chromosome')
Arguments:
- coord_system_name
-
name of coordinate system (default = chromosome)
- coord_system_version
-
version of coordinate system (default = nil)
- Returns
-
an array of Ensembl::Core::Slice objects
146 147 148 149 150 151 152 153 154 155 156 157 158 159 |
# File 'lib/ensembl/core/slice.rb', line 146 def self.fetch_all(coord_system_name = 'chromosome', version = nil) answer = Array.new if version.nil? coord_system = Ensembl::Core::CoordSystem.find_by_name(coord_system_name) else coord_system = Ensembl::Core::CoordSystem.find_by_name_and_version(coord_system_name, version) end coord_system.seq_regions.each do |seq_region| answer.push(Ensembl::Core::Slice.new(seq_region)) end return answer end |
.fetch_by_gene_stable_id(gene_stable_id, flanking_seq_length = 0) ⇒ Object
DESCRIPTION
Create a Slice based on a Gene
USAGE
my_slice = Slice.fetch_by_gene_stable_id('ENSG00000184895')
Arguments:
-
gene_stable_id: Ensembl gene stable_id (required)
- Returns
-
Ensembl::Core::Slice object
109 110 111 112 113 114 115 |
# File 'lib/ensembl/core/slice.rb', line 109 def self.fetch_by_gene_stable_id(gene_stable_id, flanking_seq_length = 0) gene_stable_id = Ensembl::Core::GeneStableId.find_by_stable_id(gene_stable_id) gene = gene_stable_id.gene seq_region = gene.seq_region return Ensembl::Core::Slice.new(seq_region, gene.seq_region_start - flanking_seq_length, gene.seq_region_end + flanking_seq_length, gene.seq_region_strand) end |
.fetch_by_region(coord_system_name, seq_region_name, start = nil, stop = nil, strand = 1, version = nil) ⇒ Object
DESCRIPTION
Create a Slice without first creating the SeqRegion object.
USAGE
my_slice_1 = Slice.fetch_by_region('chromosome','4',95000,98000,1)
Arguments:
-
coord_system: name of CoordSystem (required)
-
seq_region: name of SeqRegion (required)
-
start: start of Slice on SeqRegion (default = 1)
-
stop: stop of Slice on SeqRegion (default = end of SeqRegion)
-
strand: strand of Slice on SeqRegion
- Returns
-
Ensembl::Core::Slice object
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 |
# File 'lib/ensembl/core/slice.rb', line 74 def self.fetch_by_region(coord_system_name, seq_region_name, start = nil, stop = nil, strand = 1, version = nil) all_coord_systems = Ensembl::Core::CoordSystem.find_all_by_name(coord_system_name) coord_system = nil if version.nil? # Take the version with the lowest rank coord_system = all_coord_systems.sort_by{|cs| cs.version}.reverse.shift else coord_system = all_coord_systems.select{|cs| cs.version == version}[0] end unless coord_system.class == Ensembl::Core::CoordSystem = "Couldn't find a Ensembl::Core::CoordSystem object with name '" + coord_system_name + "'" if ! version.nil? += " and version '" + version + "'" end raise end seq_region = Ensembl::Core::SeqRegion.find_by_name_and_coord_system_id(seq_region_name, coord_system.id) #seq_region = Ensembl::Core::SeqRegion.find_by_sql("SELECT * FROM seq_region WHERE name = '" + seq_region_name + "' AND coord_system_id = " + coord_system.id.to_s)[0] unless seq_region.class == Ensembl::Core::SeqRegion raise "Couldn't find a Ensembl::Core::SeqRegion object with the name '" + seq_region_name + "'" end return Ensembl::Core::Slice.new(seq_region, start, stop, strand) end |
.fetch_by_transcript_stable_id(transcript_stable_id, flanking_seq_length = 0) ⇒ Object
DESCRIPTION
Create a Slice based on a Transcript
USAGE
my_slice = Slice.fetch_by_transcript_stable_id('ENST00000383673')
Arguments:
-
transcript_stable_id: Ensembl transcript stable_id (required)
- Returns
-
Ensembl::Core::Slice object
127 128 129 130 131 132 133 |
# File 'lib/ensembl/core/slice.rb', line 127 def self.fetch_by_transcript_stable_id(transcript_stable_id, flanking_seq_length = 0) transcript_stable_id = Ensembl::Core::TranscriptStableId.find_by_stable_id(transcript_stable_id) transcript = transcript_stable_id.transcript seq_region = transcript.seq_region return Ensembl::Core::Slice.new(seq_region, transcript.seq_region_start - flanking_seq_length, transcript.seq_region_end + flanking_seq_length, transcript.seq_region_strand) end |
Instance Method Details
#display_name ⇒ Object Also known as: to_s
DESCRIPTION
The display_name method returns a full name of this slice, containing the name of the coordinate system, the sequence region, start and stop positions on that sequence region and the strand. E.g. for a slice of bovine chromosome 4 from position 95000 to 98000 on the reverse strand, the display_name would look like: chromosome:4:Btau_3.1:95000:98000:-1
USAGE
puts my_slice.display_name
- Arguments
-
none
- Result
-
String
191 192 193 |
# File 'lib/ensembl/core/slice.rb', line 191 def display_name return [self.seq_region.coord_system.name, self.seq_region.coord_system.version, self.seq_region.name, self.start.to_s, self.stop.to_s, self.strand.to_s].join(':') end |
#dna_align_features(analysis_name = nil) ⇒ Object
DESCRIPTION
Get all DnaAlignFeatures that are located on a Slice for a given Analysis.
Pitfall: just looks at the CoordSystem that the Slice is located on. For example, if a Slice is located on a SeqRegion on the ‘chromosome’ CoordSystem, but all dna_align_features are annotated on SeqRegions of the ‘scaffold’ CoordSystem, this method will return an empty array.
USAGE
my_slice.dna_align_features('Vertrna').each do |feature|
puts feature.to_yaml
end
Arguments:
-
code: name of analysis
- Returns
-
array of DnaAlignFeature objects
561 562 563 564 565 566 567 568 |
# File 'lib/ensembl/core/slice.rb', line 561 def dna_align_features(analysis_name = nil) if analysis_name.nil? return DnaAlignFeature.find_by_sql('SELECT * FROM dna_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s) else analysis = Analysis.find_by_logic_name(analysis_name) return DnaAlignFeature.find_by_sql('SELECT * FROM dna_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s + ' AND analysis_id = ' + analysis.id.to_s) end end |
#excise(ranges) ⇒ Object
DESCRIPTION
The Slice#excise method removes a bit of a slice and returns the remainder as separate slices.
USAGE
original_slice = Slice.fetch_by_region('chromosome','X',1,10000)
new_slices = original_slice.excise([500..750, 1050..1075])
new_slices.each do |s|
puts s.display_name
end
# result:
# chromosome:X:1:499:1
# chromosome:X:751:1049:1
# chromosome:X:1076:10000:1
Arguments:
-
ranges: array of ranges (required)
- Returns
-
array of Slice objects
277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 |
# File 'lib/ensembl/core/slice.rb', line 277 def excise(ranges) if ranges.class != Array raise RuntimeError, "Argument should be an array of ranges" end ranges.each do |r| if r.class != Range raise RuntimeError, "Argument should be an array of ranges" end end answer = Array.new previous_excised_stop = self.start - 1 ranges.sort_by{|r| r.first}.each do |r| subslice_start = previous_excised_stop + 1 if subslice_start <= r.first - 1 answer.push(Slice.new(self.seq_region, subslice_start, r.first - 1)) end previous_excised_stop = r.last if r.last > self.stop return answer end end subslice_start = previous_excised_stop + 1 answer.push(Slice.new(self.seq_region, subslice_start, self.stop)) return answer end |
#get_objects(target_class, table_name, inclusive = false) ⇒ Object
Don’t use this method yourself.
440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 |
# File 'lib/ensembl/core/slice.rb', line 440 def get_objects(target_class, table_name, inclusive = false) answer = Array.new # Get all the coord_systems with this type of features on them coord_system_ids_with_features = MetaCoord.find_all_by_table_name(table_name).collect{|mc| mc.coord_system_id} # Get the features of the original slice if coord_system_ids_with_features.include?(self.seq_region.coord_system_id) sql = '' if inclusive sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{self.seq_region.id.to_s}\nAND (( seq_region_start BETWEEN \#{self.start.to_s} AND \#{self.stop.to_s} )\nOR ( seq_region_end BETWEEN \#{self.start.to_s} AND \#{self.stop.to_s} )\nOR ( seq_region_start <= \#{self.start.to_s} AND seq_region_end >= \#{self.stop.to_s} )\n )\n" else sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{self.seq_region.id.to_s}\nAND seq_region_start >= \#{self.start.to_s}\nAND seq_region_end <= \#{self.stop.to_s} \n" end answer.push(target_class.find_by_sql(sql)) coord_system_ids_with_features.delete(self.seq_region.coord_system_id) end # Transform the original slice to other coord systems and get those # features as well. At the moment, only 'direct' projections can be made. # Later, I'm hoping to add functionality for following a path from one # coord_system to another if they're not directly linked in the assembly # table. coord_system_ids_with_features.each do |target_coord_system_id| target_slices = self.project(CoordSystem.find(target_coord_system_id).name) target_slices.each do |slice| if slice.class == Slice if inclusive sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{slice.seq_region.id.to_s}\nAND (( seq_region_start BETWEEN \#{slice.start.to_s} AND \#{slice.stop.to_s} )\nOR ( seq_region_end BETWEEN \#{slice.start.to_s} AND \#{slice.stop.to_s} )\nOR ( seq_region_start <= \#{slice.start.to_s} AND seq_region_end >= \#{slice.stop.to_s} )\n )\n" else sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{slice.seq_region.id.to_s}\nAND seq_region_start >= \#{slice.start.to_s}\nAND seq_region_end <= \#{slice.stop.to_s} \n" end answer.push(target_class.find_by_sql(sql)) end end end answer.flatten! answer.uniq! return answer end |
#length ⇒ Object
DESCRIPTION
Get the length of a slice
USAGE
chr4 = SeqRegion.find_by_name('4')
my_slice = Slice.new(chr4, 95000, 98000, -1)
puts my_slice.length
- Arguments
-
none
- Returns
-
Integer
175 176 177 |
# File 'lib/ensembl/core/slice.rb', line 175 def length return self.stop - self.start + 1 end |
#misc_features(code) ⇒ Object
DESCRIPTION
Get all MiscFeatures that are located on a Slice for a given MiscSet.
Pitfall: just looks at the CoordSystem that the Slice is located on. For example, if a Slice is located on a SeqRegion on the ‘chromosome’ CoordSystem, but all misc_features are annotated on SeqRegions of the ‘scaffold’ CoordSystem, this method will return an empty array.
USAGE
my_slice.misc_features('encode').each do |feature|
puts feature.to_yaml
end
Arguments:
-
code: code of MiscSet
- Returns
-
array of MiscFeature objects
525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 |
# File 'lib/ensembl/core/slice.rb', line 525 def misc_features(code) answer = Array.new if code.nil? self.seq_region.misc_features.each do |mf| if mf.seq_region_start > self.start and mf.seq_region_end < self.stop answer.push(mf) end end else self.seq_region.misc_features.each do |mf| if mf.misc_sets[0].code == code if mf.seq_region_start > self.start and mf.seq_region_end < self.stop answer.push(mf) end end end end return answer end |
#overlaps?(other_slice) ⇒ Boolean
DESCRIPTION
The Slice#overlaps? method checks if this slice overlaps another one. The other slice has to be on the same coordinate system
USAGE
slice_a = Slice.fetch_by_region('chromosome','X',1,1000)
slice_b = Slice.fetch_by_region('chromosome','X',900,1500)
if slice_a.overlaps?(slice_b)
puts "There slices overlap"
end
- Arguments
-
another slice
- Returns
-
true or false
209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 |
# File 'lib/ensembl/core/slice.rb', line 209 def overlaps?(other_slice) if ! other_slice.class == Slice raise RuntimeError, "The Slice#overlaps? method takes a Slice object as its arguments." end if self.seq_region.coord_system != other_slice.seq_region.coord_system raise RuntimeError, "The argument slice of Slice#overlaps? has to be in the same coordinate system, but were " + self.seq_region.coord_system.name + " and " + other_slice.seq_region.coord_system.name end self_range = self.start .. self.stop other_range = other_slice.start .. other_slice.stop if self_range.include?(other_slice.start) or other_range.include?(self.start) return true else return false end end |
#project(coord_system_name) ⇒ Object
DESCRIPTION
The Slice#project method is used to transfer coordinates from one coordinate system to another. Suppose you have a slice on a contig in human (let’s say on contig AC000031.6.1.38703) and you want to know the coordinates on the chromosome. This is a projection of coordinates from a higher ranked coordinate system to a lower ranked coordinate system. Projections can also be done from a chromosome to the contig level. However, it might be possible that more than one contig has to be included and that there exist gaps between the contigs. The output of this method therefore is an array of Slice and Gap objects.
At the moment, projections can only be done if the two coordinate systems are linked directly in the ‘assembly’ table.
USAGE
# Get a contig slice in cow and project to scaffold level
# (i.e. going from a high rank coord system to a lower rank coord
# system)
source_slice = Slice.fetch_by_region('contig', 'AAFC03020247', 42, 2007)
target_slices = source_slice.project('scaffold')
puts target_slices.length #--> 1
puts target_slices[0].display_name #--> scaffold:ChrUn.003.3522:6570:8535:1
# Get a chromosome slice in cow and project to scaffold level
# (i.e. going from a low rank coord system to a higher rank coord
# system)
# The region 96652152..98000000 on BTA4 is covered by 2 scaffolds
# that are separated by a gap.
source_slice = Slice.fetch_by_region('chromosome','4', 96652152, 98000000)
target_slices = source_slice.project('scaffold')
puts target_slices.length #--> 3
first_bit, second_bit, third_bit = target_slices
puts first_bit.display_name #--> scaffold:Btau_3.1:Chr4.003.105:42:599579:1
puts second_bit.class #--> Gap
puts third_bit.display_name #--> scaffold:Btau_3.1:Chr4.003.106:1:738311:1
Arguments:
- coord_system_name
-
name of coordinate system to project
coordinates to
- Returns
-
an array consisting of Slices and, if necessary, Gaps
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 |
# File 'lib/ensembl/core/project.rb', line 53 def project(coord_system_name) answer = Array.new # an array of slices source_coord_system = self.seq_region.coord_system target_coord_system = nil if coord_system_name == 'toplevel' target_coord_system = CoordSystem.find_toplevel coord_system_name = target_coord_system.name elsif coord_system_name == 'seqlevel' target_coord_system = CoordSystem.find_seqlevel coord_system_name = target_coord_system.name else target_coord_system = CoordSystem.find_by_name(coord_system_name) end if target_coord_system.rank < source_coord_system.rank # We're going from component to assembly, which is easy. assembly_links = self.seq_region.assembly_links_as_component(coord_system_name) if assembly_links.length == 0 return [] else assembly_links.each do |assembly_link| target_seq_region = assembly_link.asm_seq_region target_start = self.start + assembly_link.asm_start - assembly_link.cmp_start target_stop = self.stop + assembly_link.asm_start - assembly_link.cmp_start target_strand = self.strand * assembly_link.ori # 1x1=>1, 1x-1=>-1, -1x-1=>1 answer.push(Slice.new(target_seq_region, target_start, target_stop, target_strand)) end end else # If we're going from assembly to component, the answer of the target method # is an array consisting of Slices intermitted with Gaps. # ASSEMBLY_EXCEPTIONS # CAUTION: there are exceptions to the assembly (stored in the assembly_exception) # table which make things a little bit more difficult... For example, # in human, the assembly data for the pseudo-autosomal region (PAR) of # Y is *not* stored in the assembly table. Instead, there is a record # in the assembly_exception table that says: "For chr Y positions 1 # to 2709520, use chr X:1-2709520 for the assembly data." # As a solution, what we'll do here, is split the assembly up in blocks: # if a slice covers both the PAR and the allosomal region, we'll make # two subslices (let's call them blocks not to intercede with the # Slice#subslices method) and project these independently. assembly_exceptions = AssemblyException.find_all_by_seq_region_id(self.seq_region.id) if assembly_exceptions.length > 0 # Check if this bit of the original slice is covered in the # assembly_exception table. overlapping_exceptions = Array.new assembly_exceptions.each do |ae| if Slice.new(self.seq_region, ae.seq_region_start, ae.seq_region_end).overlaps?(self) if ae.exc_type == 'HAP' raise NotImplementedError, "The haplotype exceptions are not implemented (yet). You can't project this slice." end overlapping_exceptions.push(ae) end end if overlapping_exceptions.length > 0 # First get all assembly blocks from chromosome Y source_assembly_blocks = self.excise(overlapping_exceptions.collect{|e| e.seq_region_start .. e.seq_region_end}) # And insert the blocks of chromosome X all_assembly_blocks = Array.new #both for chr X and Y # First do all exceptions between the first and last block previous_block = nil source_assembly_blocks.sort_by{|b| b.start}.each do |b| if previous_block.nil? all_assembly_blocks.push(b) previous_block = b next end # Find the exception record exception = nil assembly_exceptions.each do |ae| if ae.seq_region_end == b.start - 1 exception = ae break end end new_slice_start = exception.exc_seq_region_start + ( previous_block.stop - exception.seq_region_start ) new_slice_stop = exception.exc_seq_region_start + ( b.start - exception.seq_region_start ) new_slice_strand = self.strand * exception.ori new_slice = Slice.fetch_by_region(self.seq_region.coord_system.name, SeqRegion.find(exception.exc_seq_region_id).name, new_slice_start, new_slice_stop, new_slice_strand) all_assembly_blocks.push(new_slice) all_assembly_blocks.push(b) previous_block = b end # And then see if we have to add an additional one at the start or end first_block = source_assembly_blocks.sort_by{|b| b.start}[0] if first_block.start > self.start exception = assembly_exceptions.sort_by{|ae| ae.seq_region_start}[0] new_slice_start = exception.exc_seq_region_start + ( self.start - exception.seq_region_start ) new_slice_stop = exception.exc_seq_region_start + ( first_block.start - 1 - exception.seq_region_start ) new_slice_strand = self.strand * exception.ori new_slice = Slice.fetch_by_region(self.seq_region.coord_system.name, SeqRegion.find(exception.exc_seq_region_id).name, new_slice_start, new_slice_stop, new_slice_strand) all_assembly_blocks.unshift(new_slice) end last_block = source_assembly_blocks.sort_by{|b| b.start}[-1] if last_block.stop < self.stop exception = assembly_exceptions.sort_by{|ae| ae.seq_region_start}[-1] new_slice_start = exception.exc_seq_region_start + ( last_block.stop + 1 - exception.seq_region_start ) new_slice_stop = exception.exc_seq_region_start + ( self.stop - exception.seq_region_start ) new_slice_strand = self.strand * exception.ori new_slice = Slice.fetch_by_region(self.seq_region.coord_system.name, SeqRegion.find(exception.exc_seq_region_id).name, new_slice_start, new_slice_stop, new_slice_strand) all_assembly_blocks.shift(new_slice) end answer = Array.new all_assembly_blocks.each do |b| answer.push(b.project(coord_system_name)) end answer.flatten! return answer end end # END OF ASSEMBLY_EXCEPTIONS # Get all AssemblyLinks starting from this assembly and for which # the cmp_seq_region.coord_system is what we want. assembly_links = self.seq_region.assembly_links_as_assembly(coord_system_name) # Now reject all the components that lie _before_ the source, then # reject all the components that lie _after_ the source. # Then sort based on their positions. sorted_overlapping_assembly_links = assembly_links.reject{|al| al.asm_end < self.start}.reject{|al| al.asm_start > self.stop}.sort_by{|al| al.asm_start} if sorted_overlapping_assembly_links.length == 0 return [] end # What we'll do, is create slices for all the underlying components, # including the first and the last one. At first, the first and last # components are added in their entirity and will only be cropped afterwards. previous_stop = nil sorted_overlapping_assembly_links.each_index do |i| this_link = sorted_overlapping_assembly_links[i] if i == 0 answer.push(Slice.new(this_link.cmp_seq_region, this_link.cmp_start, this_link.cmp_end, this_link.ori)) next end previous_link = sorted_overlapping_assembly_links[i-1] # If there is a gap with the previous link: add a gap if this_link.asm_start > ( previous_link.asm_end + 1 ) gap_size = this_link.asm_start - previous_link.asm_end - 1 answer.push(Gap.new(CoordSystem.find_by_name(coord_system_name), gap_size)) end # And add the component itself as a Slice answer.push(Slice.new(this_link.cmp_seq_region, this_link.cmp_start, this_link.cmp_end, this_link.ori)) end # Now see if we have to crop the first and/or last slice first_link = sorted_overlapping_assembly_links[0] if self.start > first_link.asm_start if first_link.ori == -1 answer[0].stop = first_link.cmp_start + ( first_link.asm_end - self.start ) else answer[0].start = first_link.cmp_start + ( self.start - first_link.asm_start ) end end last_link = sorted_overlapping_assembly_links[-1] if self.stop < last_link.asm_end if last_link.ori == -1 answer[-1].start = last_link.cmp_start + ( last_link.asm_end - self.stop) else answer[-1].stop = last_link.cmp_start + ( self.stop - last_link.asm_start ) end end # And check if we have to add Ns at the front and/or back if self.start < first_link.asm_start gap_size = first_link.asm_start - self.start answer.unshift(Gap.new(CoordSystem.find_by_name(coord_system_name), gap_size)) end if self.stop > last_link.asm_end gap_size = self.stop - last_link.asm_end answer.push(Gap.new(CoordSystem.find_by_name(coord_system_name), gap_size)) end end return answer end |
#protein_align_features(analysis_name) ⇒ Object
DESCRIPTION
Get all ProteinAlignFeatures that are located on a Slice for a given Analysis.
Pitfall: just looks at the CoordSystem that the Slice is located on. For example, if a Slice is located on a SeqRegion on the ‘chromosome’ CoordSystem, but all protein_align_features are annotated on SeqRegions of the ‘scaffold’ CoordSystem, this method will return an empty array.
USAGE
my_slice.protein_align_features('Uniprot').each do |feature|
puts feature.to_yaml
end
Arguments:
-
code: name of analysis
- Returns
-
array of ProteinAlignFeature objects
586 587 588 589 590 591 592 593 |
# File 'lib/ensembl/core/slice.rb', line 586 def protein_align_features(analysis_name) if analysis_name.nil? return ProteinAlignFeature.find_by_sql('SELECT * FROM protein_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s) else analysis = Analysis.find_by_logic_name(analysis_name) return ProteinAlignFeature.find_by_sql('SELECT * FROM protein_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s + ' AND analysis_id = ' + analysis.id.to_s) end end |
#repeatmasked_seq ⇒ Object
357 358 359 |
# File 'lib/ensembl/core/slice.rb', line 357 def repeatmasked_seq raise NotImplementedError end |
#split(max_size = 100000, overlap = 0) ⇒ Object
DESCRIPTION
Creates overlapping subslices for a given Slice.
USAGE
my_slice.split(50000, 250).each do |sub_slice|
puts sub_slice.display_name
end
Arguments:
-
max_size: maximal size of subslices (default: 100000)
-
overlap: overlap in bp between consecutive subslices (default: 0)
- Returns
-
array of Ensembl::Core::Slice objects
389 390 391 392 393 394 395 396 397 398 |
# File 'lib/ensembl/core/slice.rb', line 389 def split(max_size = 100000, overlap = 0) sub_slices = Array.new i = 0 self.start.step(self.length, max_size - overlap - 1) do |i| sub_slices.push(self.sub_slice(i, i + max_size - 1)) end i -= (overlap + 1) sub_slices.push(self.sub_slice(i + max_size)) return sub_slices end |
#sub_slice(start = self.start, stop = self.stop) ⇒ Object
DESCRIPTION
Take a sub_slice from an existing one.
USAGE
my_sub_slice = my_slice.sub_slice(400,500)
Arguments:
-
start: start of subslice relative to slice (default: start of slice)
-
stop: stop of subslice relative to slice (default: stop of slice)
- Returns
-
Ensembl::Core::Slice object
372 373 374 |
# File 'lib/ensembl/core/slice.rb', line 372 def sub_slice(start = self.start, stop = self.stop) return self.class.new(self.seq_region, start, stop, self.strand) end |
#within?(other_slice) ⇒ Boolean
DESCRIPTION
The Slice#within? method checks if this slice is contained withing another one. The other slice has to be on the same coordinate system
USAGE
slice_a = Slice.fetch_by_region('chromosome','X',1,1000)
slice_b = Slice.fetch_by_region('chromosome','X',900,950)
if slice_b.overlaps?(slice_a)
puts "Slice b is within slice a"
end
- Arguments
-
another slice
- Returns
-
true or false
240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 |
# File 'lib/ensembl/core/slice.rb', line 240 def within?(other_slice) if ! other_slice.class == Slice raise RuntimeError, "The Slice#overlaps? method takes a Slice object as its arguments." end if self.seq_region.coord_system != other_slice.seq_region.coord_system raise RuntimeError, "The argument slice of Slice#overlaps? has to be in the same coordinate system, but were " + self.seq_region.coord_system.name + " and " + other_slice.seq_region.coord_system.name end self_range = self.start .. self.stop other_range = other_slice.start .. other_slice.stop if other_range.include?(self.start) and other_range.include?(self.stop) return true else return false end end |