Class: Bio::AssemblyGraphAlgorithms::BubblyAssembler::Bubble

Inherits:
Object
  • Object
show all
Includes:
FinishM::Logging
Defined in:
lib/assembly/bubbly_assembler.rb

Overview

Tim - use ‘waiting train’ algorithm (made up by me). Problems collect the nodes they visit, adding them to hashes of ‘ubiquitous’ and ‘visited’ nodes (metaphor: ‘train’ (problem) visiting ‘stations’ (nodes)). Each time a problem is dequeued, new problems are enqueued for all neighbours to the problem node (metaphor: ‘trains’ (problems) magically duplicate for each path to a new ‘station’ (node) (methaphor breaks a bit here)). At each step the algorithm dequeues a problem, prioritising problems by shortest distance of any path to the problem node, meaning if a a problem is enqueued for a node that is already known, then that problem is prioritised (metaphor: when a train leaves a station (problem is deqeued) other ‘trains’ will wait in case it catches up, or otherwise reaches a more distant station). If a new problem is enqueued for a problem node that is currently in enqueued, the new problem is added to known problems removed from queue, and when a problem is dequeued, its ubiquitous and visited nodes are set to the ubiquitous and visited nodes of all known problems for the node (metaphor: the carriages of all trains at a station are merged into one train). Cycles occur when a problem reaches a node that is in its visited nodes hash (metaphor: a station that one of the train carriages has previously visited). Queued cyclic problems are added to known problems and then dropped. Bubble is converged when all current problems have a ubiquitous node in common (metaphor: all carriages of all current trains have visited a station).

Instance Attribute Summary collapse

Instance Method Summary collapse

Methods included from FinishM::Logging

#log

Constructor Details

#initialize(bubble_root, options = {}) ⇒ Bubble

Returns a new instance of Bubble.



496
497
498
499
500
501
502
503
# File 'lib/assembly/bubbly_assembler.rb', line 496

def initialize(bubble_root, options = {})
  @queue = DS::AnyPriorityQueue.new {|a,b| a<=b}
  @known_problems = {}
  @current_problems = Set.new
  @num_legit_forks = 0
  @max_cycles = options[:max_cycles] || DEFAULT_MAX_CYCLES
  @root = bubble_root
end

Instance Attribute Details

#converging_oriented_node_settableObject (readonly)

The DynamicProgrammingProblem this bubble converges on



491
492
493
# File 'lib/assembly/bubbly_assembler.rb', line 491

def converging_oriented_node_settable
  @converging_oriented_node_settable
end

#is_reverseObject (readonly)

The DynamicProgrammingProblem this bubble converges on



491
492
493
# File 'lib/assembly/bubbly_assembler.rb', line 491

def is_reverse
  @is_reverse
end

#num_legit_forksObject

how many legit forks have been explored



494
495
496
# File 'lib/assembly/bubbly_assembler.rb', line 494

def num_legit_forks
  @num_legit_forks
end

#rootObject (readonly)

The DynamicProgrammingProblem this bubble converges on



491
492
493
# File 'lib/assembly/bubbly_assembler.rb', line 491

def root
  @root
end

Instance Method Details

#[](index) ⇒ Object

This doesn’t make sense unless this is a converged bubble and the index == -1 because otherwise there is multiple answers



759
760
761
762
763
764
765
# File 'lib/assembly/bubbly_assembler.rb', line 759

def [](index)
  raise unless index == -1
  return Bio::Velvet::Graph::OrientedNodeTrail::OrientedNode.new(
    @converging_oriented_node_settable[0],
    @converging_oriented_node_settable[1]
    )
end

#circuitous?Boolean

Does this (coverged) bubble contain any circuits?

Returns:

  • (Boolean)


787
788
789
790
791
792
793
794
# File 'lib/assembly/bubbly_assembler.rb', line 787

def circuitous?
  raise unless converged?
  if @circuitous.nil?
    each_path({:max_cycles => 0}) {|| break if @circuitous}
    @circuitous ||= false
  end
  @circuitous
end

#converge_on(dynamic_programming_problem) ⇒ Object

Finish off the bubble, assuming convergent_on? the given problem == true



562
563
564
565
566
567
# File 'lib/assembly/bubbly_assembler.rb', line 562

def converge_on(dynamic_programming_problem)
  @converging_oriented_node_settable = dynamic_programming_problem.to_settable
  #free some memory
  @queue = nil
  @current_problems = nil
end

#converged?Boolean

Returns:

  • (Boolean)


730
731
732
# File 'lib/assembly/bubbly_assembler.rb', line 730

def converged?
  !@converging_oriented_node_settable.nil?
end

#convergent_on?(dynamic_programming_problem) ⇒ Boolean

return true if the given problem converges the bubble, else false

Returns:

  • (Boolean)


552
553
554
555
556
557
558
559
# File 'lib/assembly/bubbly_assembler.rb', line 552

def convergent_on?(dynamic_programming_problem)
  settable =  dynamic_programming_problem.to_settable

  @queue.each do |problem| #convergent until not
    return false unless ubiquitous_oriented_nodes(problem).include? settable
  end
  return true
end

#converging_oriented_nodeObject

Return the OrientedNode that converges this bubble, behaviour undefined if bubble is not converged



736
737
738
# File 'lib/assembly/bubbly_assembler.rb', line 736

def converging_oriented_node
  @known_problems[@converging_oriented_node_settable][0].path[-1]
end

#coverageObject

Coverage of a bubble is the coverage of each node in the bubble each weighted by their length



798
799
800
801
802
803
804
805
806
807
# File 'lib/assembly/bubbly_assembler.rb', line 798

def coverage
  sum = 0.0
  length = 0
  oriented_nodes do |onode|
    node_length = onode.node.length_alone
    sum += onode.node.coverage * node_length
    length += node_length
  end
  return sum / length
end

#each_path(options = {}) ⇒ Object

Iterate over the paths returning each as an OrientedNodeTrail. Assumes the path is convergent.



610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
# File 'lib/assembly/bubbly_assembler.rb', line 610

def each_path(options = {})
  raise unless converged?
  max_cycles = options[:max_cycles] || @max_cycles

  # Metric used to prioritise each_path
  comparator = lambda do |problem1, problem2|
    onode1 = nil
    onode2 = nil
    if problem1.path.length == 1 and problem2.path.length > 1
      # Here the comparison cannot be made on 2nd last node coverages
      # since one of the paths goes straight from the initial to the terminal
      # node. Choose instead based on if the second last node has higher or lower
      # coverage than the final node
      onode1 = problem1.path[-1]
      onode2 = problem2.path[-2]
    elsif problem2.path.length == 1 and problem1.path.length > 1
      onode1 = problem1.path[-2]
      onode2 = problem2.path[-1]
    else
      onode1 = problem1.path[-2]
      onode2 = problem2.path[-2]
    end
    #log.debug "Comparing nodes #{onode1.node.node_id} and #{onode2.node.node_id}" if log.debug?

    if onode1.node.coverage == onode2.node.coverage
      -(onode1.node.node_id <=> onode2.node.node_id)
    else
      onode1.node.coverage <=> onode2.node.coverage
    end
  end

  log.debug "Iterating through each path of bubble" if log.debug?

  # Tim - use stack and push paths with lowest coverage first
  stack = DS::Stack.new
  counter = Bio::AssemblyGraphAlgorithms::SingleCoherentPathsBetweenNodesFinder::CycleCounter.new max_cycles
  initial_solution = @known_problems[@converging_oriented_node_settable][0]
  stack.push [initial_solution.path, []]
  converging_onode = converging_oriented_node
  #log.debug "Pushed to stack #{initial_solution.path.to_shorthand}" if log.debug?


  while path_parts = stack.pop
    direct_node_trail = path_parts[0]
    second_part = path_parts[1]
    #log.debug "Popped #{direct_node_trail.to_shorthand} and [#{second_part.collect{|o| o.to_shorthand}.join(',') }]" if log.debug?


    if direct_node_trail.trail.length == 0

      # check for cycles through bubble root
      if second_part.include? @root
        #log.debug "Found cycle through bubble root." if log.debug?
        @circuitous = true unless @circuitous
        if max_cycles == 0 or max_cycles < counter.path_cycle_count([@root]+second_part)
          #log.debug "Not finishing cyclic path with too many cycles." if log.debug?
          next
        end
      end

      yield_path = Bio::Velvet::Graph::OrientedNodeTrail.new
      yield_path.trail = second_part
      if @is_reverse
        yield_path = yield_path.reverse
      end
      log.debug "Yielded #{yield_path.to_shorthand}" if log.debug?
      yield yield_path
    else
      # go down the path, looking for other paths
      head_onode = direct_node_trail.trail[-1]
      new_second_part = [head_onode]+second_part
      if second_part.length > 1 and head_onode == converging_oriented_node
        #log.debug "Ignoring path with cycle through converged node." if log.debug?
        next
      end
      if second_part.include? head_onode
        #log.debug "Cycle at node #{head_onode.node_id} in path #{second_part.collect{|onode| onode.node.node_id}.join(',')}." if log.debug?
        @circuitous = true unless @circuitous
        if max_cycles == 0 or max_cycles < counter.path_cycle_count(new_second_part)
          #log.debug "Not finishing cyclic path with too many cycles." if log.debug?
          next
        end
      end

      new_problems = @known_problems[head_onode.to_settable]
      #log.debug "Found new problems: #{new_problems.collect{|prob| prob.to_shorthand}.join(' ') }" if log.debug?

      problem_leads = Set.new
      filtered_problems = new_problems.reject do |new_problem|
        # Only enqueue paths where the second-to-head onode is not already queued
        unless new_problem.path.length < 2
          lead_settable = new_problem.path[-2].to_settable
          if problem_leads.include? lead_settable
            #log.debug "Ignoring duplicate neighbour problem #{new_problem.to_shorthand}" if log.debug?
            next true
          end
          problem_leads << lead_settable
        end
        false
      end

      filtered_problems.sort(&comparator).each do |new_problem|
        # TODO: deal with circuits
        new_trail = Bio::Velvet::Graph::OrientedNodeTrail.new
        new_trail.trail = new_problem.path[0...-1]
        #log.debug "Enqueuing #{new_trail.to_shorthand} and [#{new_second_part.collect{|o| o.to_shorthand}.join(',') }]" if log.debug?
        stack.push [new_trail, new_second_part]
      end
    end
  end
end

#enqueue(dynamic_programming_problem) ⇒ Object



536
537
538
539
540
541
542
543
544
545
546
547
548
# File 'lib/assembly/bubbly_assembler.rb', line 536

def enqueue(dynamic_programming_problem)
  settable = dynamic_programming_problem.to_settable


  @known_problems[settable] ||= []
  @known_problems[settable].push dynamic_programming_problem

  # don't requeue current problem or circular problem
  unless dynamic_programming_problem.circular_path_detected == true or @current_problems.include? settable
    @queue.enqueue dynamic_programming_problem, shortest_problem_distance(dynamic_programming_problem)
    @current_problems << settable
  end
end

#num_known_problemsObject



603
604
605
# File 'lib/assembly/bubbly_assembler.rb', line 603

def num_known_problems
  @known_problems.length
end

#oriented_nodesObject

yield or failing that return an Array of the list of oriented_nodes found in at least one path in this (presumed converged) bubble



571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
# File 'lib/assembly/bubbly_assembler.rb', line 571

def oriented_nodes
  raise unless converged?
  seen_nodes = {}
  stack = DS::Stack.new
  initial_solution = @known_problems[@converging_oriented_node_settable][0]
  converging_onode = initial_solution.path[-1]
  stack.push converging_onode

  while onode = stack.pop
    settable = onode.to_settable
    next if seen_nodes.key?(settable)

    if block_given?
      if @is_reverse
        yield onode.reverse
      else
        yield onode
      end
    end

    seen_nodes[settable] = onode

    # queue neighbours for paths that don't contain the converging onode
    @known_problems[settable].each do |dpp|
      stack.push dpp.path[-2] unless dpp.path.length < 2 or dpp.path[0...-1].include? converging_onode
    end
  end

  return nil if block_given?
  return seen_nodes.values
end

#pathsObject



722
723
724
725
726
727
728
# File 'lib/assembly/bubbly_assembler.rb', line 722

def paths
  to_return = []
  each_path do |path|
    to_return.push path
  end
  to_return
end

#reference_trail(max_cycles = @max_cycles) ⇒ Object

Return one trail that exemplifies the paths through this bubble. Current method of path selection is simply greedy, taking the highest coverage node at each fork (or failing that the node with the lower node_id).



770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
# File 'lib/assembly/bubbly_assembler.rb', line 770

def reference_trail(max_cycles = @max_cycles)
  raise unless converged?

  converging_onode = converging_oriented_node
  log.debug "Finding reference trail from node #{converging_onode.node.node_id}" if log.debug?

  reference_trail = nil
  each_path do |path|
    #break when first path is found
    reference_trail = path
    break
  end

  return reference_trail
end

#reverse!Object



752
753
754
755
# File 'lib/assembly/bubbly_assembler.rb', line 752

def reverse!
  @is_reverse ||= false
  @is_reverse = !@is_reverse
end

#shiftObject

Return the next closest dynamic programming problem, removing it from the bubble



507
508
509
510
511
512
513
514
515
# File 'lib/assembly/bubbly_assembler.rb', line 507

def shift
  prob = @queue.shift
  unless prob.nil?
    prob.ubiquitous_oriented_nodes = ubiquitous_oriented_nodes(prob)
    prob.visited_oriented_nodes = visited_oriented_nodes(prob)
    @current_problems.delete prob.to_settable
  end
  return prob
end

#shortest_problem_distance(prob) ⇒ Object



531
532
533
534
# File 'lib/assembly/bubbly_assembler.rb', line 531

def shortest_problem_distance(prob)
  # prioritise by the shortest distance for current problem
  @known_problems[prob.to_settable].collect{|prob| prob.distance}.min
end

#to_shorthandObject



740
741
742
743
744
745
746
747
748
749
750
# File 'lib/assembly/bubbly_assembler.rb', line 740

def to_shorthand
  shorts = []
  if converged?
    shorts = paths.sort{|a,b| a.to_shorthand <=> b.to_shorthand }.collect{|path| path.to_shorthand}
  else
    @queue.each do |problem|
      shorts.push problem.to_shorthand
    end
  end
  return "{#{shorts.join('|') }}"
end

#ubiquitous_oriented_nodes(prob) ⇒ Object



524
525
526
527
528
529
# File 'lib/assembly/bubbly_assembler.rb', line 524

def ubiquitous_oriented_nodes(prob)
  #only ubiquitous nodes from relevant problems
  @known_problems[prob.to_settable].reduce(prob.ubiquitous_oriented_nodes) do |memo, problem|
    memo & problem.ubiquitous_oriented_nodes
  end
end

#visited_oriented_nodes(prob) ⇒ Object



517
518
519
520
521
522
# File 'lib/assembly/bubbly_assembler.rb', line 517

def visited_oriented_nodes(prob)
  #all visited nodes for relevant problems
  @known_problems[prob.to_settable].reduce(prob.ubiquitous_oriented_nodes) do |memo, problem|
    memo + problem.ubiquitous_oriented_nodes
  end
end