Class: Ensembl::Core::Slice

Inherits:
Object
  • Object
show all
Defined in:
lib/ensembl/core/slice.rb,
lib/ensembl/core/project.rb

Overview

DESCRIPTION

From the perl API tutorial (www.ensembl.org/info/software/core/core_tutorial.html): “A Slice object represents a continuous region of a genome. Slices can be used to obtain sequence, features or other information from a particular region of interest.”

In contrast to almost all other classes of Ensembl::Core, the Slice class is not based on ActiveRecord.

USAGE

chr4 = SeqRegion.find_by_name('4')
my_slice = Slice.new(chr4, 95000, 98000, -1)
puts my_slice.display_name    #--> 'chromosome:4:Btau_3.1:95000:98000:1'

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(seq_region, start = 1, stop = seq_region.length, strand = 1) ⇒ Slice

DESCRIPTION

Create a new Slice object from scratch.

USAGE

chr4 = SeqRegion.find_by_name('4')
my_slice = Slice.new(chr4, 95000, 98000, -1)

Arguments:

  • seq_region: SeqRegion object

  • start: start position of the Slice on the SeqRegion (default = 1)

  • stop: stop position of the Slice on the SeqRegion (default: end of SeqRegion)

  • strand: strand of the Slice relative to the SeqRegion (default = 1)

Returns

Slice object



46
47
48
49
50
51
52
53
54
55
56
57
58
# File 'lib/ensembl/core/slice.rb', line 46

def initialize(seq_region, start = 1, stop = seq_region.length, strand = 1)
  if start.nil?
    start = 1
  end
  if stop.nil?
    stop = seq_region.length
  end
  unless seq_region.class == Ensembl::Core::SeqRegion
    raise 'First argument has to be a Ensembl::Core::SeqRegion object'
  end
  @seq_region, @start, @stop, @strand = seq_region, start, stop, strand
  @seq = nil
end

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(method_name, *args) ⇒ Object

– As there should be ‘getters’ for a lot of classes, we’ll implement this with method_missing. For some of the original methods, see the end of this file.

The optional argument is either ‘true’ or ‘false’ (default = false). False if the features have to be completely contained within the slice; true if just a partly overlap is sufficient. ++ Don’t use this method yourself.

Raises:

  • (NoMethodError)


414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
# File 'lib/ensembl/core/slice.rb', line 414

def method_missing(method_name, *args)
  table_name = method_name.to_s.singularize
  class_name = table_name.camelcase

  # Convert to the class object

  target_class = nil
  ObjectSpace.each_object(Class) do |o|
    if o.name =~ /^Ensembl::Core::#{class_name}$/
      target_class = o
    end
  end

  # If it exists, see if it implements Sliceable

  if ! target_class.nil? and target_class.include?(Sliceable)
    inclusive = false
    if [TrueClass, FalseClass].include?(args[0].class)
      inclusive = args[0]
    end
    return self.get_objects(target_class, table_name, inclusive)
  end

  raise NoMethodError

end

Instance Attribute Details

#seqObject

DESCRIPTION

Get the sequence of the Slice as a Bio::Sequence::NA object.

If the Slice is on a CoordSystem that is not seq_level, it will try to project it coordinates to the CoordSystem that does. At this moment, this is only done if there is a direct link between the two coordinate systems. (The perl API allows for following an indirect link as well.)

Caution: Bio::Sequence::NA makes the sequence downcase!!

USAGE

my_slice.seq.seq.to_s

Arguments

none

Returns

Bio::Sequence::NA object



322
323
324
# File 'lib/ensembl/core/slice.rb', line 322

def seq
  @seq
end

#seq_regionObject

Returns the value of attribute seq_region.



26
27
28
# File 'lib/ensembl/core/slice.rb', line 26

def seq_region
  @seq_region
end

#startObject

Returns the value of attribute start.



26
27
28
# File 'lib/ensembl/core/slice.rb', line 26

def start
  @start
end

#stopObject

Returns the value of attribute stop.



26
27
28
# File 'lib/ensembl/core/slice.rb', line 26

def stop
  @stop
end

#strandObject

Returns the value of attribute strand.



26
27
28
# File 'lib/ensembl/core/slice.rb', line 26

def strand
  @strand
end

Class Method Details

.fetch_all(coord_system_name = 'chromosome', version = nil) ⇒ Object

DESCRIPTION

Create an array of all Slices for a given coordinate system.

USAGE

slices = Slice.fetch_all('chromosome')

Arguments:

  • coord_system_name

    name of coordinate system (default = chromosome)

  • coord_system_version

    version of coordinate system (default = nil)

Returns

an array of Ensembl::Core::Slice objects



146
147
148
149
150
151
152
153
154
155
156
157
158
159
# File 'lib/ensembl/core/slice.rb', line 146

def self.fetch_all(coord_system_name = 'chromosome', version = nil)
  answer = Array.new
  if version.nil?
    coord_system = Ensembl::Core::CoordSystem.find_by_name(coord_system_name)
  else
    coord_system = Ensembl::Core::CoordSystem.find_by_name_and_version(coord_system_name, version)
  end

  coord_system.seq_regions.each do |seq_region|
    answer.push(Ensembl::Core::Slice.new(seq_region))
  end

  return answer
end

.fetch_by_gene_stable_id(gene_stable_id, flanking_seq_length = 0) ⇒ Object

DESCRIPTION

Create a Slice based on a Gene

USAGE

my_slice = Slice.fetch_by_gene_stable_id('ENSG00000184895')

Arguments:

  • gene_stable_id: Ensembl gene stable_id (required)

Returns

Ensembl::Core::Slice object



109
110
111
112
113
114
115
# File 'lib/ensembl/core/slice.rb', line 109

def self.fetch_by_gene_stable_id(gene_stable_id, flanking_seq_length = 0)
  gene_stable_id = Ensembl::Core::GeneStableId.find_by_stable_id(gene_stable_id)
  gene = gene_stable_id.gene
  seq_region = gene.seq_region

  return Ensembl::Core::Slice.new(seq_region, gene.seq_region_start - flanking_seq_length, gene.seq_region_end + flanking_seq_length, gene.seq_region_strand)
end

.fetch_by_region(coord_system_name, seq_region_name, start = nil, stop = nil, strand = 1, version = nil) ⇒ Object

DESCRIPTION

Create a Slice without first creating the SeqRegion object.

USAGE

my_slice_1 = Slice.fetch_by_region('chromosome','4',95000,98000,1)

Arguments:

  • coord_system: name of CoordSystem (required)

  • seq_region: name of SeqRegion (required)

  • start: start of Slice on SeqRegion (default = 1)

  • stop: stop of Slice on SeqRegion (default = end of SeqRegion)

  • strand: strand of Slice on SeqRegion

Returns

Ensembl::Core::Slice object



74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
# File 'lib/ensembl/core/slice.rb', line 74

def self.fetch_by_region(coord_system_name, seq_region_name, start = nil, stop = nil, strand = 1, version = nil)
  all_coord_systems = Ensembl::Core::CoordSystem.find_all_by_name(coord_system_name)
  coord_system = nil
  if version.nil? # Take the version with the lowest rank

    coord_system = all_coord_systems.sort_by{|cs| cs.version}.reverse.shift
  else
    coord_system = all_coord_systems.select{|cs| cs.version == version}[0]
  end
  unless coord_system.class == Ensembl::Core::CoordSystem
    message = "Couldn't find a Ensembl::Core::CoordSystem object with name '" + coord_system_name + "'"
    if ! version.nil?
      message += " and version '" + version + "'"
    end
    raise message
  end

  seq_region = Ensembl::Core::SeqRegion.find_by_name_and_coord_system_id(seq_region_name, coord_system.id)
  #seq_region = Ensembl::Core::SeqRegion.find_by_sql("SELECT * FROM seq_region WHERE name = '" + seq_region_name + "' AND coord_system_id = " + coord_system.id.to_s)[0]

  unless seq_region.class == Ensembl::Core::SeqRegion
    raise "Couldn't find a Ensembl::Core::SeqRegion object with the name '" + seq_region_name + "'"
  end

  return Ensembl::Core::Slice.new(seq_region, start, stop, strand)
end

.fetch_by_transcript_stable_id(transcript_stable_id, flanking_seq_length = 0) ⇒ Object

DESCRIPTION

Create a Slice based on a Transcript

USAGE

my_slice = Slice.fetch_by_transcript_stable_id('ENST00000383673')

Arguments:

  • transcript_stable_id: Ensembl transcript stable_id (required)

Returns

Ensembl::Core::Slice object



127
128
129
130
131
132
133
# File 'lib/ensembl/core/slice.rb', line 127

def self.fetch_by_transcript_stable_id(transcript_stable_id, flanking_seq_length = 0)
  transcript_stable_id = Ensembl::Core::TranscriptStableId.find_by_stable_id(transcript_stable_id)
  transcript = transcript_stable_id.transcript
  seq_region = transcript.seq_region

  return Ensembl::Core::Slice.new(seq_region, transcript.seq_region_start - flanking_seq_length, transcript.seq_region_end + flanking_seq_length, transcript.seq_region_strand)
end

Instance Method Details

#display_nameObject Also known as: to_s

DESCRIPTION

The display_name method returns a full name of this slice, containing the name of the coordinate system, the sequence region, start and stop positions on that sequence region and the strand. E.g. for a slice of bovine chromosome 4 from position 95000 to 98000 on the reverse strand, the display_name would look like: chromosome:4:Btau_3.1:95000:98000:-1

USAGE

puts my_slice.display_name

Arguments

none

Result

String



191
192
193
# File 'lib/ensembl/core/slice.rb', line 191

def display_name
  return [self.seq_region.coord_system.name, self.seq_region.coord_system.version, self.seq_region.name, self.start.to_s, self.stop.to_s, self.strand.to_s].join(':')
end

#dna_align_features(analysis_name = nil) ⇒ Object

DESCRIPTION

Get all DnaAlignFeatures that are located on a Slice for a given Analysis.

Pitfall: just looks at the CoordSystem that the Slice is located on. For example, if a Slice is located on a SeqRegion on the ‘chromosome’ CoordSystem, but all dna_align_features are annotated on SeqRegions of the ‘scaffold’ CoordSystem, this method will return an empty array.

USAGE

my_slice.dna_align_features('Vertrna').each do |feature|
  puts feature.to_yaml
end

Arguments:

  • code: name of analysis

Returns

array of DnaAlignFeature objects



561
562
563
564
565
566
567
568
# File 'lib/ensembl/core/slice.rb', line 561

def dna_align_features(analysis_name = nil)
  if analysis_name.nil?
    return DnaAlignFeature.find_by_sql('SELECT * FROM dna_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s)
  else
    analysis = Analysis.find_by_logic_name(analysis_name)
    return DnaAlignFeature.find_by_sql('SELECT * FROM dna_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s + ' AND analysis_id = ' + analysis.id.to_s)
  end
end

#excise(ranges) ⇒ Object

DESCRIPTION

The Slice#excise method removes a bit of a slice and returns the remainder as separate slices.

USAGE

original_slice = Slice.fetch_by_region('chromosome','X',1,10000)
new_slices = original_slice.excise([500..750, 1050..1075])
new_slices.each do |s|
  puts s.display_name
end

# result:
#   chromosome:X:1:499:1
#   chromosome:X:751:1049:1
#   chromosome:X:1076:10000:1

Arguments:

  • ranges: array of ranges (required)

Returns

array of Slice objects



277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
# File 'lib/ensembl/core/slice.rb', line 277

def excise(ranges)
  if ranges.class != Array
    raise RuntimeError, "Argument should be an array of ranges"
  end
  ranges.each do |r|
    if r.class != Range
      raise RuntimeError, "Argument should be an array of ranges"
    end
  end

  answer = Array.new
  previous_excised_stop = self.start - 1
  ranges.sort_by{|r| r.first}.each do |r|
    subslice_start = previous_excised_stop + 1
    if subslice_start <= r.first - 1
      answer.push(Slice.new(self.seq_region, subslice_start, r.first - 1))
    end
    previous_excised_stop = r.last
    if r.last > self.stop
      return answer
    end
  end
  subslice_start = previous_excised_stop + 1
  answer.push(Slice.new(self.seq_region, subslice_start, self.stop))
  return answer
end

#get_objects(target_class, table_name, inclusive = false) ⇒ Object

Don’t use this method yourself.



440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
# File 'lib/ensembl/core/slice.rb', line 440

def get_objects(target_class, table_name, inclusive = false)
  answer = Array.new

  
  # Get all the coord_systems with this type of features on them

  coord_system_ids_with_features = MetaCoord.find_all_by_table_name(table_name).collect{|mc| mc.coord_system_id}

  # Get the features of the original slice

  if coord_system_ids_with_features.include?(self.seq_region.coord_system_id)
    sql = ''
    if inclusive
      sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{self.seq_region.id.to_s}\nAND (( seq_region_start BETWEEN \#{self.start.to_s} AND \#{self.stop.to_s} )\nOR   ( seq_region_end BETWEEN \#{self.start.to_s} AND \#{self.stop.to_s} )\nOR   ( seq_region_start <= \#{self.start.to_s} AND seq_region_end >= \#{self.stop.to_s} )\n    )\n"
    else
      sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{self.seq_region.id.to_s}\nAND seq_region_start >= \#{self.start.to_s}\nAND seq_region_end <= \#{self.stop.to_s}   \n"
    end
    answer.push(target_class.find_by_sql(sql))
    coord_system_ids_with_features.delete(self.seq_region.coord_system_id)
  end

  # Transform the original slice to other coord systems and get those

  # features as well. At the moment, only 'direct' projections can be made.

  # Later, I'm hoping to add functionality for following a path from one

  # coord_system to another if they're not directly linked in the assembly

  # table.

  coord_system_ids_with_features.each do |target_coord_system_id|
    target_slices = self.project(CoordSystem.find(target_coord_system_id).name)
    target_slices.each do |slice|
      if slice.class == Slice
        if inclusive
          sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{slice.seq_region.id.to_s}\nAND (( seq_region_start BETWEEN \#{slice.start.to_s} AND \#{slice.stop.to_s} )\nOR   ( seq_region_end BETWEEN \#{slice.start.to_s} AND \#{slice.stop.to_s} )\nOR   ( seq_region_start <= \#{slice.start.to_s} AND seq_region_end >= \#{slice.stop.to_s} )\n    )\n"
        else
          sql = "SELECT * FROM \#{table_name}\nWHERE seq_region_id = \#{slice.seq_region.id.to_s}\nAND seq_region_start >= \#{slice.start.to_s}\nAND seq_region_end <= \#{slice.stop.to_s}   \n"
        end 
          answer.push(target_class.find_by_sql(sql))
      end
    end
  end

  answer.flatten!
  answer.uniq!

  return answer
end

#lengthObject

DESCRIPTION

Get the length of a slice

USAGE

chr4 = SeqRegion.find_by_name('4')
my_slice = Slice.new(chr4, 95000, 98000, -1)
puts my_slice.length

Arguments

none

Returns

Integer



175
176
177
# File 'lib/ensembl/core/slice.rb', line 175

def length
  return self.stop - self.start + 1
end

#misc_features(code) ⇒ Object

DESCRIPTION

Get all MiscFeatures that are located on a Slice for a given MiscSet.

Pitfall: just looks at the CoordSystem that the Slice is located on. For example, if a Slice is located on a SeqRegion on the ‘chromosome’ CoordSystem, but all misc_features are annotated on SeqRegions of the ‘scaffold’ CoordSystem, this method will return an empty array.

USAGE

my_slice.misc_features('encode').each do |feature|
  puts feature.to_yaml
end

Arguments:

  • code: code of MiscSet

Returns

array of MiscFeature objects



525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
# File 'lib/ensembl/core/slice.rb', line 525

def misc_features(code)
  answer = Array.new
  if code.nil?
    self.seq_region.misc_features.each do |mf|
      if mf.seq_region_start > self.start and mf.seq_region_end < self.stop
        answer.push(mf)
      end
    end
  else
    self.seq_region.misc_features.each do |mf|
      if mf.misc_sets[0].code == code
        if mf.seq_region_start > self.start and mf.seq_region_end < self.stop
          answer.push(mf)
        end
      end
    end
  end
  return answer
end

#overlaps?(other_slice) ⇒ Boolean

DESCRIPTION

The Slice#overlaps? method checks if this slice overlaps another one. The other slice has to be on the same coordinate system

USAGE

slice_a = Slice.fetch_by_region('chromosome','X',1,1000)
slice_b = Slice.fetch_by_region('chromosome','X',900,1500)
if slice_a.overlaps?(slice_b)
  puts "There slices overlap"
end

Arguments

another slice

Returns

true or false

Returns:

  • (Boolean)


209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
# File 'lib/ensembl/core/slice.rb', line 209

def overlaps?(other_slice)
  if ! other_slice.class == Slice
    raise RuntimeError, "The Slice#overlaps? method takes a Slice object as its arguments."
  end
  if self.seq_region.coord_system != other_slice.seq_region.coord_system
    raise RuntimeError, "The argument slice of Slice#overlaps? has to be in the same coordinate system, but were " + self.seq_region.coord_system.name + " and " + other_slice.seq_region.coord_system.name
  end

  self_range = self.start .. self.stop
  other_range = other_slice.start .. other_slice.stop

  if self_range.include?(other_slice.start) or other_range.include?(self.start)
    return true
  else
    return false
  end
end

#project(coord_system_name) ⇒ Object

DESCRIPTION

The Slice#project method is used to transfer coordinates from one coordinate system to another. Suppose you have a slice on a contig in human (let’s say on contig AC000031.6.1.38703) and you want to know the coordinates on the chromosome. This is a projection of coordinates from a higher ranked coordinate system to a lower ranked coordinate system. Projections can also be done from a chromosome to the contig level. However, it might be possible that more than one contig has to be included and that there exist gaps between the contigs. The output of this method therefore is an array of Slice and Gap objects.

At the moment, projections can only be done if the two coordinate systems are linked directly in the ‘assembly’ table.

USAGE

# Get a contig slice in cow and project to scaffold level
# (i.e. going from a high rank coord system to a lower rank coord
# system)
source_slice = Slice.fetch_by_region('contig', 'AAFC03020247', 42, 2007)
target_slices = source_slice.project('scaffold')
puts target_slices.length     #--> 1
puts target_slices[0].display_name #--> scaffold:ChrUn.003.3522:6570:8535:1

# Get a chromosome slice in cow and project to scaffold level
# (i.e. going from a low rank coord system to a higher rank coord
# system)
# The region 96652152..98000000 on BTA4 is covered by 2 scaffolds
# that are separated by a gap.
source_slice = Slice.fetch_by_region('chromosome','4', 96652152, 98000000)
target_slices = source_slice.project('scaffold')
puts target_slices.length     #--> 3
first_bit, second_bit, third_bit = target_slices
puts first_bit.display_name     #--> scaffold:Btau_3.1:Chr4.003.105:42:599579:1
puts second_bit.class       #--> Gap
puts third_bit.display_name     #--> scaffold:Btau_3.1:Chr4.003.106:1:738311:1

Arguments:

  • coord_system_name

    name of coordinate system to project

    coordinates to

Returns

an array consisting of Slices and, if necessary, Gaps



53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
# File 'lib/ensembl/core/project.rb', line 53

def project(coord_system_name)
  answer = Array.new # an array of slices

  source_coord_system = self.seq_region.coord_system
  target_coord_system = nil
  if coord_system_name == 'toplevel'
    target_coord_system = CoordSystem.find_toplevel
    coord_system_name = target_coord_system.name
  elsif coord_system_name == 'seqlevel'
    target_coord_system = CoordSystem.find_seqlevel
    coord_system_name = target_coord_system.name
  else
    target_coord_system = CoordSystem.find_by_name(coord_system_name)
  end

  if target_coord_system.rank < source_coord_system.rank
    # We're going from component to assembly, which is easy.

    assembly_links = self.seq_region.assembly_links_as_component(coord_system_name)
    
    if assembly_links.length == 0
      return []
    else
      assembly_links.each do |assembly_link|
        target_seq_region = assembly_link.asm_seq_region
        target_start = self.start + assembly_link.asm_start - assembly_link.cmp_start
        target_stop = self.stop + assembly_link.asm_start - assembly_link.cmp_start
        target_strand = self.strand * assembly_link.ori # 1x1=>1, 1x-1=>-1, -1x-1=>1

        
        answer.push(Slice.new(target_seq_region, target_start, target_stop, target_strand))
      end
    end
    
  else
    # If we're going from assembly to component, the answer of the target method

    # is an array consisting of Slices intermitted with Gaps.


    # ASSEMBLY_EXCEPTIONS

    # CAUTION: there are exceptions to the assembly (stored in the assembly_exception)

    # table which make things a little bit more difficult... For example,

    # in human, the assembly data for the pseudo-autosomal region (PAR) of

    # Y is *not* stored in the assembly table. Instead, there is a record

    # in the assembly_exception table that says: "For chr Y positions 1

    # to 2709520, use chr X:1-2709520 for the assembly data."

    # As a solution, what we'll do here, is split the assembly up in blocks:

    # if a slice covers both the PAR and the allosomal region, we'll make

    # two subslices (let's call them blocks not to intercede with the

    # Slice#subslices method) and project these independently.

    assembly_exceptions = AssemblyException.find_all_by_seq_region_id(self.seq_region.id)
    if assembly_exceptions.length > 0
      # Check if this bit of the original slice is covered in the

      # assembly_exception table.

      overlapping_exceptions = Array.new
      assembly_exceptions.each do |ae|
        if Slice.new(self.seq_region, ae.seq_region_start, ae.seq_region_end).overlaps?(self)
          if ae.exc_type == 'HAP'
            raise NotImplementedError, "The haplotype exceptions are not implemented (yet). You can't project this slice."
          end
          overlapping_exceptions.push(ae)
        end
      end

      if overlapping_exceptions.length > 0
        # First get all assembly blocks from chromosome Y

        source_assembly_blocks = self.excise(overlapping_exceptions.collect{|e| e.seq_region_start .. e.seq_region_end})
        # And insert the blocks of chromosome X

        all_assembly_blocks = Array.new #both for chr X and Y

        # First do all exceptions between the first and last block

        previous_block = nil
        source_assembly_blocks.sort_by{|b| b.start}.each do |b|
          if previous_block.nil?
            all_assembly_blocks.push(b)
            previous_block = b
            next
          end
          # Find the exception record

          exception = nil
          assembly_exceptions.each do |ae|
            if ae.seq_region_end == b.start - 1
              exception = ae
              break
            end
          end

          new_slice_start = exception.exc_seq_region_start + ( previous_block.stop - exception.seq_region_start )
          new_slice_stop = exception.exc_seq_region_start + ( b.start - exception.seq_region_start )
          new_slice_strand = self.strand * exception.ori
          new_slice = Slice.fetch_by_region(self.seq_region.coord_system.name, SeqRegion.find(exception.exc_seq_region_id).name, new_slice_start, new_slice_stop, new_slice_strand)

          all_assembly_blocks.push(new_slice)
          all_assembly_blocks.push(b)
          previous_block = b
        end

        # And then see if we have to add an additional one at the start or end

        first_block = source_assembly_blocks.sort_by{|b| b.start}[0]
        if first_block.start > self.start
          exception = assembly_exceptions.sort_by{|ae| ae.seq_region_start}[0]
          new_slice_start = exception.exc_seq_region_start + ( self.start - exception.seq_region_start )
          new_slice_stop = exception.exc_seq_region_start + ( first_block.start - 1 - exception.seq_region_start )
          new_slice_strand = self.strand * exception.ori
          new_slice = Slice.fetch_by_region(self.seq_region.coord_system.name, SeqRegion.find(exception.exc_seq_region_id).name, new_slice_start, new_slice_stop, new_slice_strand)

          all_assembly_blocks.unshift(new_slice)
        end

        last_block = source_assembly_blocks.sort_by{|b| b.start}[-1]
        if last_block.stop < self.stop
          exception = assembly_exceptions.sort_by{|ae| ae.seq_region_start}[-1]
          new_slice_start = exception.exc_seq_region_start + ( last_block.stop + 1 - exception.seq_region_start )
          new_slice_stop = exception.exc_seq_region_start + ( self.stop - exception.seq_region_start )
          new_slice_strand = self.strand * exception.ori
          new_slice = Slice.fetch_by_region(self.seq_region.coord_system.name, SeqRegion.find(exception.exc_seq_region_id).name, new_slice_start, new_slice_stop, new_slice_strand)

          all_assembly_blocks.shift(new_slice)
        end

        answer = Array.new
        all_assembly_blocks.each do |b|
          answer.push(b.project(coord_system_name))
        end
        answer.flatten!

        return answer
      end

    end
    # END OF ASSEMBLY_EXCEPTIONS


    # Get all AssemblyLinks starting from this assembly and for which

    # the cmp_seq_region.coord_system is what we want.

    assembly_links = self.seq_region.assembly_links_as_assembly(coord_system_name)

    # Now reject all the components that lie _before_ the source, then

    # reject all the components that lie _after_ the source.

    # Then sort based on their positions.

    sorted_overlapping_assembly_links = assembly_links.reject{|al| al.asm_end < self.start}.reject{|al| al.asm_start > self.stop}.sort_by{|al| al.asm_start}
    if sorted_overlapping_assembly_links.length == 0
      return []
    end

    # What we'll do, is create slices for all the underlying components,

    # including the first and the last one. At first, the first and last

    # components are added in their entirity and will only be cropped afterwards.

    previous_stop = nil
    sorted_overlapping_assembly_links.each_index do |i|
      this_link = sorted_overlapping_assembly_links[i]
      if i == 0
        answer.push(Slice.new(this_link.cmp_seq_region, this_link.cmp_start, this_link.cmp_end, this_link.ori))
        next
      end
      previous_link = sorted_overlapping_assembly_links[i-1]

      # If there is a gap with the previous link: add a gap

      if this_link.asm_start > ( previous_link.asm_end + 1 )
        gap_size = this_link.asm_start - previous_link.asm_end - 1
        answer.push(Gap.new(CoordSystem.find_by_name(coord_system_name), gap_size))
      end

      # And add the component itself as a Slice

      answer.push(Slice.new(this_link.cmp_seq_region, this_link.cmp_start, this_link.cmp_end, this_link.ori))
    end

    # Now see if we have to crop the first and/or last slice

    first_link = sorted_overlapping_assembly_links[0]
    if self.start > first_link.asm_start
      if first_link.ori == -1
        answer[0].stop = first_link.cmp_start + ( first_link.asm_end - self.start )
      else
        answer[0].start = first_link.cmp_start + ( self.start - first_link.asm_start )
      end
    end

    last_link = sorted_overlapping_assembly_links[-1]
    if self.stop < last_link.asm_end
      if last_link.ori == -1
        answer[-1].start = last_link.cmp_start + ( last_link.asm_end - self.stop)
      else
        answer[-1].stop = last_link.cmp_start + ( self.stop - last_link.asm_start )
      end
    end

    # And check if we have to add Ns at the front and/or back

    if self.start < first_link.asm_start
      gap_size = first_link.asm_start - self.start
      answer.unshift(Gap.new(CoordSystem.find_by_name(coord_system_name), gap_size))
    end
    if self.stop > last_link.asm_end
      gap_size = self.stop - last_link.asm_end
      answer.push(Gap.new(CoordSystem.find_by_name(coord_system_name), gap_size))
    end
  end
  return answer

end

#protein_align_features(analysis_name) ⇒ Object

DESCRIPTION

Get all ProteinAlignFeatures that are located on a Slice for a given Analysis.

Pitfall: just looks at the CoordSystem that the Slice is located on. For example, if a Slice is located on a SeqRegion on the ‘chromosome’ CoordSystem, but all protein_align_features are annotated on SeqRegions of the ‘scaffold’ CoordSystem, this method will return an empty array.

USAGE

my_slice.protein_align_features('Uniprot').each do |feature|
  puts feature.to_yaml
end

Arguments:

  • code: name of analysis

Returns

array of ProteinAlignFeature objects



586
587
588
589
590
591
592
593
# File 'lib/ensembl/core/slice.rb', line 586

def protein_align_features(analysis_name)
  if analysis_name.nil?
    return ProteinAlignFeature.find_by_sql('SELECT * FROM protein_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s)
  else
    analysis = Analysis.find_by_logic_name(analysis_name)
    return ProteinAlignFeature.find_by_sql('SELECT * FROM protein_align_feature WHERE seq_region_id = ' + self.seq_region.id.to_s + ' AND seq_region_start >= ' + self.start.to_s + ' AND seq_region_end <= ' + self.stop.to_s + ' AND analysis_id = ' + analysis.id.to_s)
  end
end

#repeatmasked_seqObject

Raises:

  • (NotImplementedError)


357
358
359
# File 'lib/ensembl/core/slice.rb', line 357

def repeatmasked_seq
  raise NotImplementedError
end

#split(max_size = 100000, overlap = 0) ⇒ Object

DESCRIPTION

Creates overlapping subslices for a given Slice.

USAGE

my_slice.split(50000, 250).each do |sub_slice|
  puts sub_slice.display_name
end

Arguments:

  • max_size: maximal size of subslices (default: 100000)

  • overlap: overlap in bp between consecutive subslices (default: 0)

Returns

array of Ensembl::Core::Slice objects



389
390
391
392
393
394
395
396
397
398
# File 'lib/ensembl/core/slice.rb', line 389

def split(max_size = 100000, overlap = 0)
  sub_slices = Array.new
  i = 0
  self.start.step(self.length, max_size - overlap - 1) do |i|
    sub_slices.push(self.sub_slice(i, i + max_size - 1))
  end
  i -= (overlap + 1)
  sub_slices.push(self.sub_slice(i + max_size))
  return sub_slices
end

#sub_slice(start = self.start, stop = self.stop) ⇒ Object

DESCRIPTION

Take a sub_slice from an existing one.

USAGE

my_sub_slice = my_slice.sub_slice(400,500)

Arguments:

  • start: start of subslice relative to slice (default: start of slice)

  • stop: stop of subslice relative to slice (default: stop of slice)

Returns

Ensembl::Core::Slice object



372
373
374
# File 'lib/ensembl/core/slice.rb', line 372

def sub_slice(start = self.start, stop = self.stop)
  return self.class.new(self.seq_region, start, stop, self.strand)
end

#within?(other_slice) ⇒ Boolean

DESCRIPTION

The Slice#within? method checks if this slice is contained withing another one. The other slice has to be on the same coordinate system

USAGE

slice_a = Slice.fetch_by_region('chromosome','X',1,1000)
slice_b = Slice.fetch_by_region('chromosome','X',900,950)
if slice_b.overlaps?(slice_a)
  puts "Slice b is within slice a"
end

Arguments

another slice

Returns

true or false

Returns:

  • (Boolean)


240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
# File 'lib/ensembl/core/slice.rb', line 240

def within?(other_slice)
  if ! other_slice.class == Slice
    raise RuntimeError, "The Slice#overlaps? method takes a Slice object as its arguments."
  end
  if self.seq_region.coord_system != other_slice.seq_region.coord_system
    raise RuntimeError, "The argument slice of Slice#overlaps? has to be in the same coordinate system, but were " + self.seq_region.coord_system.name + " and " + other_slice.seq_region.coord_system.name
  end

  self_range = self.start .. self.stop
  other_range = other_slice.start .. other_slice.stop

  if other_range.include?(self.start) and other_range.include?(self.stop)
    return true
  else
    return false
  end
end