Class: Bio::Locations

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/bio/location.rb

Overview

Description

The Bio::Locations class is a container for Bio::Location objects: creating a Bio::Locations object (based on a GenBank style position string) will spawn an array of Bio::Location objects.

Usage

 locations = Bio::Locations.new('join(complement(500..550), 600..625)')
 locations.each do |loc|
   puts "class = " + loc.class.to_s
   puts "range = #{loc.from}..#{loc.to} (strand = #{loc.strand})"
 end
 # Output would be:
 #   class = Bio::Location
 #   range = 500..550 (strand = -1)
 #   class = Bio::Location
 #   range = 600..625 (strand = 1)

 # For the following three location strings, print the span and range
 ['one-of(898,900)..983',
  'one-of(5971..6308,5971..6309)',
  '8050..one-of(10731,10758,10905,11242)'].each do |loc|
   location = Bio::Locations.new(loc)
   puts location.span
   puts location.range
 end

GenBank location descriptor classification

Definition of the position notation of the GenBank location format

According to the GenBank manual ‘gbrel.txt’, position notations were classified into 10 patterns - (A) to (J).

3.4.12.2 Feature Location

  The second column of the feature descriptor line designates the
location of the feature in the sequence. The location descriptor
begins at position 22. Several conventions are used to indicate
sequence location.

  Base numbers in location descriptors refer to numbering in the entry,
which is not necessarily the same as the numbering scheme used in the
published report. The first base in the presented sequence is numbered
base 1. Sequences are presented in the 5' to 3' direction.

Location descriptors can be one of the following:

(A) 1. A single base;

(B) 2. A contiguous span of bases;

(C) 3. A site between two bases;

(D) 4. A single base chosen from a range of bases;

(E) 5. A single base chosen from among two or more specified bases;

(F) 6. A joining of sequence spans;

(G) 7. A reference to an entry other than the one to which the feature
     belongs (i.e., a remote entry), followed by a location descriptor
     referring to the remote sequence;

(H) 8. A literal sequence (a string of bases enclosed in quotation marks).

Description commented with pattern IDs.

(C)   A site between two residues, such as an endonuclease cleavage site, is
    indicated by listing the two bases separated by a carat (e.g., 23^24).

(D)   A single residue chosen from a range of residues is indicated by the
    number of the first and last bases in the range separated by a single
    period (e.g., 23.79). The symbols < and > indicate that the end point
(I) of the range is beyond the specified base number.

(B)   A contiguous span of bases is indicated by the number of the first and
    last bases in the range separated by two periods (e.g., 23..79). The
(I) symbols < and > indicate that the end point of the range is beyond the
    specified base number. Starting and ending positions can be indicated
    by base number or by one of the operators described below.

      Operators are prefixes that specify what must be done to the indicated
    sequence to locate the feature. The following are the operators
    available, along with their most common format and a description.

(J) complement (location): The feature is complementary to the location
    indicated. Complementary strands are read 5' to 3'.

(F) join (location, location, .. location): The indicated elements should
    be placed end to end to form one contiguous sequence.

(F) order (location, location, .. location): The elements are found in the
    specified order in the 5' to 3' direction, but nothing is implied about
    the rationality of joining them.

(F) group (location, location, .. location): The elements are related and
    should be grouped together, but no order is implied.

(E) one-of (location, location, .. location): The element can be any one,
  but only one, of the items listed.

Reduction strategy of the position notations

  • (A) Location n

  • (B) Location n..m

  • (C) Location n^m

  • (D) (n.m) => Location n

  • (E)

    • one-of(n,m,..) => Location n

    • one-of(n..m,..) => Location n..m

  • (F)

    • order(loc,loc,..) => join(loc, loc,..)

    • group(loc,loc,..) => join(loc, loc,..)

    • join(loc,loc,..) => Sequence

  • (G) ID:loc => Location with ID

  • (H) “atgc” => Location only with Sequence

  • (I)

    • <n => Location n with lt flag

    • >n => Location n with gt flag

    • <n..m => Location n..m with lt flag

    • n..>m => Location n..m with gt flag

    • <n..>m => Location n..m with lt, gt flag

  • (J) complement(loc) => Sequence

  • (K) replace(loc, str) => Location with replacement Sequence
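
A few of the reductions listed above can be sketched with plain string substitutions. This is a minimal illustration only: reduce_notation is a hypothetical helper, not part of BioRuby, and the library's own preprocessing may handle more cases and differ in detail.

```ruby
# Hypothetical helper (not a BioRuby method): applies reductions
# (D), (E) and (F) from the list above to a GenBank position string.
def reduce_notation(position)
  str = position.gsub(/\s+/, '')                      # drop whitespace
  str = str.gsub(/\((\d+)\.(\d+)\)/, '\1')            # (D) (n.m)            => n
  str = str.gsub(/one-of\(([^,()]+),[^()]*\)/, '\1')  # (E) one-of(a,b,..)   => a
  str.gsub(/\b(?:order|group)\(/, 'join(')            # (F) order(/group(    => join(
end

puts reduce_notation('one-of(898,900)..983')                   # 898..983
puts reduce_notation('one-of(5971..6308,5971..6309)')          # 5971..6308
puts reduce_notation('8050..one-of(10731,10758,10905,11242)')  # 8050..10731
puts reduce_notation('order(1..10, group(20..30,40))')         # join(1..10,join(20..30,40))
```

Only the first alternative of a one-of is kept, which matches the "=> Location n" rule in the list.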

Constructor Details

#initialize(position) ⇒ Locations

Parses a GenBank style position string and returns a Bio::Locations object, which contains a list of Bio::Location objects.

locations = Bio::Locations.new('join(complement(500..550), 600..625)')

Arguments:

  • (required) position: GenBank style position string (an Array of Bio::Location objects is also accepted)

Returns

Bio::Locations object



# File 'lib/bio/location.rb', line 346

def initialize(position)
  @operator = nil
  if position.is_a? Array
    @locations = position
  else
    position   = gbl_cleanup(position)	# preprocessing
    @locations = gbl_pos2loc(position)	# create an Array of Bio::Location objects
  end
end

Instance Attribute Details

#locations ⇒ Object

(Array) An Array of Bio::Location objects



# File 'lib/bio/location.rb', line 357

def locations
  @locations
end

#operator ⇒ Object

(Symbol or nil) Operator. nil (means :join), :order, or :group (obsolete).



# File 'lib/bio/location.rb', line 361

def operator
  @operator
end

Instance Method Details

#==(other) ⇒ Object

Returns true if other is equal to self; otherwise returns false.


Arguments:

  • (required) other: any object

Returns

true or false



# File 'lib/bio/location.rb', line 381

def ==(other)
  return true if super(other)
  return false unless other.instance_of?(self.class)
  if self.locations == other.locations and
      self.operator == other.operator then
    true
  else
    false
  end
end

#[](n) ⇒ Object

Returns the nth Bio::Location object.



# File 'lib/bio/location.rb', line 400

def [](n)
  @locations[n]
end

#absolute(n, type = nil) ⇒ Object

Converts relative position in the locus to position in the whole of the DNA sequence.

This method can, for example, be used to relate positions in RNA to those in the DNA sequence. With the optional :aa flag, n is interpreted as an amino acid position, and the position of the first nucleotide of the corresponding codon is returned.

loc = Bio::Locations.new('complement(12838..13533)')
puts loc.absolute(10)          # => 13524
puts loc.absolute(10, :aa)     # => 13506

Arguments:

  • (required) n: nucleotide position within the locus

  • type: pass :aa if n is an amino acid position rather than a nucleotide position

Returns

position within the whole of the sequence



# File 'lib/bio/location.rb', line 490

def absolute(n, type = nil)
  case type
  when :location
    ;
  when :aa
    n = (n - 1) * 3 + 1
    rel2abs(n)
  else
    rel2abs(n)
  end
end
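
The arithmetic behind the example values can be sketched for the special case of a single complement(from..to) span. Both helpers below are hypothetical stand-ins written for this illustration; the private rel2abs used by the library is not shown here and handles the general case.

```ruby
# Hypothetical stand-in for rel2abs, valid only for one complement(from..to)
# span: relative position 1 is the last base of the span.
def rel2abs_on_complement(n, to)
  to - (n - 1)
end

# Same conversion as the :aa branch above: amino acid n starts at the
# first base of codon n.
def aa_to_nt(n)
  (n - 1) * 3 + 1
end

# complement(12838..13533)
puts rel2abs_on_complement(10, 13533)            # 13524
puts rel2abs_on_complement(aa_to_nt(10), 13533)  # 13506
```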

#each ⇒ Object

Iterates over each Bio::Location object.



# File 'lib/bio/location.rb', line 393

def each
  @locations.each do |x|
    yield(x)
  end
end

#equals?(other) ⇒ Boolean

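Evaluates equality of Bio::Locations objects by comparing their sorted lists of locations, so the order of the locations is ignored.

Returns:

  • (Boolean) true or false; nil if other is not a Bio::Locations object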


# File 'lib/bio/location.rb', line 364

def equals?(other)
  if ! other.kind_of?(Bio::Locations)
    return nil
  end
  if self.sort == other.sort
    return true
  else
    return false
  end
end
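
The difference between an order-sensitive comparison of the location lists (as in #==) and the sort-based comparison used here can be sketched with plain [from, to] pairs standing in for Bio::Location objects. The helper names are hypothetical and not part of BioRuby.

```ruby
# Hypothetical helpers: arrays of [from, to] pairs stand in for the
# lists of Bio::Location objects.
def same_in_order?(a, b)
  a == b              # element-by-element, order matters
end

def same_ignoring_order?(a, b)
  a.sort == b.sort    # sort first, so order is ignored
end

a = [[500, 550], [600, 625]]
b = [[600, 625], [500, 550]]

puts same_in_order?(a, b)        # false
puts same_ignoring_order?(a, b)  # true
```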

#first ⇒ Object

Returns the first Bio::Location object.



# File 'lib/bio/location.rb', line 405

def first
  @locations.first
end

#last ⇒ Object

Returns the last Bio::Location object.



# File 'lib/bio/location.rb', line 410

def last
  @locations.last
end

#length ⇒ Object (also known as: size)

Returns the length of the spliced RNA.



# File 'lib/bio/location.rb', line 429

def length
  len = 0
  @locations.each do |x|
    if x.sequence
      len += x.sequence.size
    else
      len += (x.to - x.from + 1)
    end
  end
  len
end
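
As a rough illustration of the summation, the sketch below uses a small Struct in place of Bio::Location. LenLoc and spliced_length are names invented for this example, not library API.

```ruby
# LenLoc is a hypothetical stand-in for Bio::Location with only the
# fields the length calculation needs.
LenLoc = Struct.new(:from, :to, :sequence)

def spliced_length(locations)
  locations.sum do |loc|
    # A replacement sequence counts with its own size; otherwise the
    # inclusive span from..to is counted.
    loc.sequence ? loc.sequence.size : loc.to - loc.from + 1
  end
end

# join(complement(500..550), 600..625): 51 + 26 bases
puts spliced_length([LenLoc.new(500, 550), LenLoc.new(600, 625)])  # 77
```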

#range ⇒ Object

Similar to span, but returns a Range object (min..max).



# File 'lib/bio/location.rb', line 423

def range
  min, max = span
  min..max
end

#relative(n, type = nil) ⇒ Object

Converts absolute position in the whole of the DNA sequence to relative position in the locus.

This method can, for example, be used to relate positions in a DNA sequence to those in RNA. With the optional :aa flag, the method returns the position of the associated amino acid rather than that of the nucleotide.

loc = Bio::Locations.new('complement(12838..13533)')
puts loc.relative(13524)        # => 10
puts loc.relative(13506, :aa)   # => 10

Arguments:

  • (required) n: nucleotide position within the whole of the sequence

  • type: pass :aa to have the method return the position in amino acid coordinates

Returns

position within the location



# File 'lib/bio/location.rb', line 458

def relative(n, type = nil)
  case type
  when :location
    ;
  when :aa
    if n = abs2rel(n)
      (n - 1) / 3 + 1
    else
      nil
    end
  else
    abs2rel(n)
  end
end
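
The :aa branch groups nucleotide positions into codons by integer division. The sketch below shows that grouping and, for the special case of a single complement(from..to) span, the conversion from absolute to relative coordinates. Both helpers are hypothetical; the library's private abs2rel is not shown here and covers the general case.

```ruby
# Hypothetical stand-in for abs2rel, valid only for one complement(from..to)
# span: the last base of the span is relative position 1.
def abs2rel_on_complement(n, to)
  to - n + 1
end

# Same conversion as the :aa branch above (integer division).
def nt_to_aa(n)
  (n - 1) / 3 + 1
end

p (1..6).map { |n| nt_to_aa(n) }                        # [1, 1, 1, 2, 2, 2]

# complement(12838..13533)
puts abs2rel_on_complement(13524, 13533)                # 10
puts nt_to_aa(abs2rel_on_complement(13506, 13533))      # 10 (relative nucleotide 28)
```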

#span ⇒ Object

Returns an Array containing overall min and max position [min, max] of this Bio::Locations object.



# File 'lib/bio/location.rb', line 416

def span
  span_min = @locations.min { |a,b| a.from <=> b.from }
  span_max = @locations.max { |a,b| a.to   <=> b.to   }
  return span_min.from, span_max.to
end
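
A minimal sketch of the same min/max calculation, using a Struct in place of Bio::Location. SpanLoc and span_of are hypothetical names used only for this illustration.

```ruby
# SpanLoc is a hypothetical stand-in for Bio::Location.
SpanLoc = Struct.new(:from, :to)

def span_of(locations)
  # Smallest start and largest end over all locations.
  [locations.map(&:from).min, locations.map(&:to).max]
end

# join(complement(500..550), 600..625)
min, max = span_of([SpanLoc.new(500, 550), SpanLoc.new(600, 625)])
p [min, max]   # [500, 625]
p(min..max)    # 500..625, the form returned by #range
```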

#to_s ⇒ Object

String representation.

Note: In some cases, it cannot distinguish “complement(join(…))” from “join(complement(..))”, or “complement(order(…))” from “order(complement(..))”.


Returns

String



# File 'lib/bio/location.rb', line 511

def to_s
  return '' if @locations.empty?
  complement_join = false
  locs = @locations
  if locs.size >= 2 and locs.inject(true) do |flag, loc|
      # check if each location is complement
      (flag && (loc.strand == -1) && !loc.xref_id)
    end and locs.inject(locs[0].from) do |pos, loc|
      if pos then
        (pos >= loc.from) ? loc.from : false
      else
        false
      end
    end then
    locs = locs.reverse
    complement_join = true
  end
  locs = locs.collect do |loc|
    lt = loc.lt ? '<' : ''
    gt = loc.gt ? '>' : ''
    str = if loc.from == loc.to then
            "#{lt}#{gt}#{loc.from.to_i}"
          elsif loc.carat then
            "#{lt}#{loc.from.to_i}^#{gt}#{loc.to.to_i}"
          else
            "#{lt}#{loc.from.to_i}..#{gt}#{loc.to.to_i}"
          end
    if loc.xref_id and !loc.xref_id.empty? then
      str = "#{loc.xref_id}:#{str}"
    end
    if loc.strand == -1 and !complement_join then
      str = "complement(#{str})"
    end
    if loc.sequence then
      str = "replace(#{str},\"#{loc.sequence}\")"
    end
    str
  end
  if locs.size >= 2 then
    op = (self.operator || 'join').to_s
    result = "#{op}(#{locs.join(',')})"
  else
    result = locs[0]
  end
  if complement_join then
    result = "complement(#{result})"
  end
  result
end
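
The per-location part of this formatting can be sketched in isolation. The example below mirrors the branch structure shown above for a single location, but it is a simplification: FmtLoc and format_location are hypothetical names, and the xref_id, replace and join/complement wrapping steps are left out.

```ruby
# FmtLoc is a hypothetical stand-in for Bio::Location.
FmtLoc = Struct.new(:from, :to, :strand, :lt, :gt, :carat)

def format_location(loc)
  lt = loc.lt ? '<' : ''
  gt = loc.gt ? '>' : ''
  str = if loc.from == loc.to
          "#{lt}#{gt}#{loc.from}"             # single base
        elsif loc.carat
          "#{lt}#{loc.from}^#{gt}#{loc.to}"   # site between two bases
        else
          "#{lt}#{loc.from}..#{gt}#{loc.to}"  # contiguous span
        end
  loc.strand == -1 ? "complement(#{str})" : str
end

puts format_location(FmtLoc.new(23, 79, 1, true, true, false))       # <23..>79
puts format_location(FmtLoc.new(23, 24, 1, false, false, true))      # 23^24
puts format_location(FmtLoc.new(500, 550, -1, false, false, false))  # complement(500..550)
```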