Class: Bio::AssemblyGraphAlgorithms::BubblyAssembler::Bubble

Inherits:

Object

Object
Bio::AssemblyGraphAlgorithms::BubblyAssembler::Bubble

show all

Includes:: FinishM::Logging

Defined in:: lib/assembly/bubbly_assembler.rb

Overview

Tim - use ‘waiting train’ algorithm (made up by me). Problems collect the nodes they visit, adding them to hashes of ‘ubiquitous’ and ‘visited’ nodes (metaphor: ‘train’ (problem) visiting ‘stations’ (nodes)). Each time a problem is dequeued, new problems are enqueued for all neighbours to the problem node (metaphor: ‘trains’ (problems) magically duplicate for each path to a new ‘station’ (node) (methaphor breaks a bit here)). At each step the algorithm dequeues a problem, prioritising problems by shortest distance of any path to the problem node, meaning if a a problem is enqueued for a node that is already known, then that problem is prioritised (metaphor: when a train leaves a station (problem is deqeued) other ‘trains’ will wait in case it catches up, or otherwise reaches a more distant station). If a new problem is enqueued for a problem node that is currently in enqueued, the new problem is added to known problems removed from queue, and when a problem is dequeued, its ubiquitous and visited nodes are set to the ubiquitous and visited nodes of all known problems for the node (metaphor: the carriages of all trains at a station are merged into one train). Cycles occur when a problem reaches a node that is in its visited nodes hash (metaphor: a station that one of the train carriages has previously visited). Queued cyclic problems are added to known problems and then dropped. Bubble is converged when all current problems have a ubiquitous node in common (metaphor: all carriages of all current trains have visited a station).

Instance Attribute Summary collapse

#converging_oriented_node_settable ⇒ Object readonly

The DynamicProgrammingProblem this bubble converges on.
#is_reverse ⇒ Object readonly

The DynamicProgrammingProblem this bubble converges on.
#num_legit_forks ⇒ Object

how many legit forks have been explored.
#root ⇒ Object readonly

The DynamicProgrammingProblem this bubble converges on.

Instance Method Summary collapse

#[](index) ⇒ Object

This doesn’t make sense unless this is a converged bubble and the index == -1 because otherwise there is multiple answers.
#circuitous? ⇒ Boolean

Does this (coverged) bubble contain any circuits?.
#converge_on(dynamic_programming_problem) ⇒ Object

Finish off the bubble, assuming convergent_on? the given problem == true.
#converged? ⇒ Boolean
#convergent_on?(dynamic_programming_problem) ⇒ Boolean

return true if the given problem converges the bubble, else false.
#converging_oriented_node ⇒ Object

Return the OrientedNode that converges this bubble, behaviour undefined if bubble is not converged.
#coverage ⇒ Object

Coverage of a bubble is the coverage of each node in the bubble each weighted by their length.
#each_path(options = {}) ⇒ Object

Iterate over the paths returning each as an OrientedNodeTrail.
#enqueue(dynamic_programming_problem) ⇒ Object
#initialize(bubble_root, options = {}) ⇒ Bubble constructor

A new instance of Bubble.
#num_known_problems ⇒ Object
#oriented_nodes ⇒ Object

yield or failing that return an Array of the list of oriented_nodes found in at least one path in this (presumed converged) bubble.
#paths ⇒ Object
#reference_trail(max_cycles = @max_cycles) ⇒ Object

Return one trail that exemplifies the paths through this bubble.
#reverse! ⇒ Object
#shift ⇒ Object

Return the next closest dynamic programming problem, removing it from the bubble.
#shortest_problem_distance(prob) ⇒ Object
#to_shorthand ⇒ Object
#ubiquitous_oriented_nodes(prob) ⇒ Object
#visited_oriented_nodes(prob) ⇒ Object

Methods included from FinishM::Logging

#log

Constructor Details

#initialize(bubble_root, options = {}) ⇒ `Bubble`

Returns a new instance of Bubble.

# File 'lib/assembly/bubbly_assembler.rb', line 496

def initialize(bubble_root, options = {})
  @queue = DS::AnyPriorityQueue.new {|a,b| a<=b}
  @known_problems = {}
  @current_problems = Set.new
  @num_legit_forks = 0
  @max_cycles = options[:max_cycles] || DEFAULT_MAX_CYCLES
  @root = bubble_root
end

Instance Attribute Details

#converging_oriented_node_settable ⇒ `Object` (readonly)

The DynamicProgrammingProblem this bubble converges on



491
492
493

# File 'lib/assembly/bubbly_assembler.rb', line 491

def converging_oriented_node_settable
  @converging_oriented_node_settable
end

#is_reverse ⇒ `Object` (readonly)

The DynamicProgrammingProblem this bubble converges on



491
492
493

# File 'lib/assembly/bubbly_assembler.rb', line 491

def is_reverse
  @is_reverse
end

#num_legit_forks ⇒ `Object`

how many legit forks have been explored



494
495
496

# File 'lib/assembly/bubbly_assembler.rb', line 494

def num_legit_forks
  @num_legit_forks
end

#root ⇒ `Object` (readonly)

The DynamicProgrammingProblem this bubble converges on



491
492
493

# File 'lib/assembly/bubbly_assembler.rb', line 491

def root
  @root
end

Instance Method Details

#[](index) ⇒ `Object`

This doesn’t make sense unless this is a converged bubble and the index == -1 because otherwise there is multiple answers

# File 'lib/assembly/bubbly_assembler.rb', line 759

def [](index)
  raise unless index == -1
  return Bio::Velvet::Graph::OrientedNodeTrail::OrientedNode.new(
    @converging_oriented_node_settable[0],
    @converging_oriented_node_settable[1]
    )
end

#circuitous? ⇒ `Boolean`

Does this (coverged) bubble contain any circuits?

Returns:

(Boolean)

# File 'lib/assembly/bubbly_assembler.rb', line 787

def circuitous?
  raise unless converged?
  if @circuitous.nil?
    each_path({:max_cycles => 0}) {|| break if @circuitous}
    @circuitous ||= false
  end
  @circuitous
end

#converge_on(dynamic_programming_problem) ⇒ `Object`

Finish off the bubble, assuming convergent_on? the given problem == true

# File 'lib/assembly/bubbly_assembler.rb', line 562

def converge_on(dynamic_programming_problem)
  @converging_oriented_node_settable = dynamic_programming_problem.to_settable
  #free some memory
  @queue = nil
  @current_problems = nil
end

#converged? ⇒ `Boolean`

Returns:

(Boolean)



730
731
732

# File 'lib/assembly/bubbly_assembler.rb', line 730

def converged?
  !@converging_oriented_node_settable.nil?
end

#convergent_on?(dynamic_programming_problem) ⇒ `Boolean`

return true if the given problem converges the bubble, else false

Returns:

(Boolean)

# File 'lib/assembly/bubbly_assembler.rb', line 552

def convergent_on?(dynamic_programming_problem)
  settable =  dynamic_programming_problem.to_settable

  @queue.each do |problem| #convergent until not
    return false unless ubiquitous_oriented_nodes(problem).include? settable
  end
  return true
end

#converging_oriented_node ⇒ `Object`

Return the OrientedNode that converges this bubble, behaviour undefined if bubble is not converged



736
737
738

# File 'lib/assembly/bubbly_assembler.rb', line 736

def converging_oriented_node
  @known_problems[@converging_oriented_node_settable][0].path[-1]
end

#coverage ⇒ `Object`

Coverage of a bubble is the coverage of each node in the bubble each weighted by their length

# File 'lib/assembly/bubbly_assembler.rb', line 798

def coverage
  sum = 0.0
  length = 0
  oriented_nodes do |onode|
    node_length = onode.node.length_alone
    sum += onode.node.coverage * node_length
    length += node_length
  end
  return sum / length
end

#each_path(options = {}) ⇒ `Object`

Iterate over the paths returning each as an OrientedNodeTrail. Assumes the path is convergent.

# File 'lib/assembly/bubbly_assembler.rb', line 610

def each_path(options = {})
  raise unless converged?
  max_cycles = options[:max_cycles] || @max_cycles

  # Metric used to prioritise each_path
  comparator = lambda do |problem1, problem2|
    onode1 = nil
    onode2 = nil
    if problem1.path.length == 1 and problem2.path.length > 1
      # Here the comparison cannot be made on 2nd last node coverages
      # since one of the paths goes straight from the initial to the terminal
      # node. Choose instead based on if the second last node has higher or lower
      # coverage than the final node
      onode1 = problem1.path[-1]
      onode2 = problem2.path[-2]
    elsif problem2.path.length == 1 and problem1.path.length > 1
      onode1 = problem1.path[-2]
      onode2 = problem2.path[-1]
    else
      onode1 = problem1.path[-2]
      onode2 = problem2.path[-2]
    end
    #log.debug "Comparing nodes #{onode1.node.node_id} and #{onode2.node.node_id}" if log.debug?

    if onode1.node.coverage == onode2.node.coverage
      -(onode1.node.node_id <=> onode2.node.node_id)
    else
      onode1.node.coverage <=> onode2.node.coverage
    end
  end

  log.debug "Iterating through each path of bubble" if log.debug?

  # Tim - use stack and push paths with lowest coverage first
  stack = DS::Stack.new
  counter = Bio::AssemblyGraphAlgorithms::SingleCoherentPathsBetweenNodesFinder::CycleCounter.new max_cycles
  initial_solution = @known_problems[@converging_oriented_node_settable][0]
  stack.push [initial_solution.path, []]
  converging_onode = converging_oriented_node
  #log.debug "Pushed to stack #{initial_solution.path.to_shorthand}" if log.debug?


  while path_parts = stack.pop
    direct_node_trail = path_parts[0]
    second_part = path_parts[1]
    #log.debug "Popped #{direct_node_trail.to_shorthand} and [#{second_part.collect{|o| o.to_shorthand}.join(',') }]" if log.debug?


    if direct_node_trail.trail.length == 0

      # check for cycles through bubble root
      if second_part.include? @root
        #log.debug "Found cycle through bubble root." if log.debug?
        @circuitous = true unless @circuitous
        if max_cycles == 0 or max_cycles < counter.path_cycle_count([@root]+second_part)
          #log.debug "Not finishing cyclic path with too many cycles." if log.debug?
          next
        end
      end

      yield_path = Bio::Velvet::Graph::OrientedNodeTrail.new
      yield_path.trail = second_part
      if @is_reverse
        yield_path = yield_path.reverse
      end
      log.debug "Yielded #{yield_path.to_shorthand}" if log.debug?
      yield yield_path
    else
      # go down the path, looking for other paths
      head_onode = direct_node_trail.trail[-1]
      new_second_part = [head_onode]+second_part
      if second_part.length > 1 and head_onode == converging_oriented_node
        #log.debug "Ignoring path with cycle through converged node." if log.debug?
        next
      end
      if second_part.include? head_onode
        #log.debug "Cycle at node #{head_onode.node_id} in path #{second_part.collect{|onode| onode.node.node_id}.join(',')}." if log.debug?
        @circuitous = true unless @circuitous
        if max_cycles == 0 or max_cycles < counter.path_cycle_count(new_second_part)
          #log.debug "Not finishing cyclic path with too many cycles." if log.debug?
          next
        end
      end

      new_problems = @known_problems[head_onode.to_settable]
      #log.debug "Found new problems: #{new_problems.collect{|prob| prob.to_shorthand}.join(' ') }" if log.debug?

      problem_leads = Set.new
      filtered_problems = new_problems.reject do |new_problem|
        # Only enqueue paths where the second-to-head onode is not already queued
        unless new_problem.path.length < 2
          lead_settable = new_problem.path[-2].to_settable
          if problem_leads.include? lead_settable
            #log.debug "Ignoring duplicate neighbour problem #{new_problem.to_shorthand}" if log.debug?
            next true
          end
          problem_leads << lead_settable
        end
        false
      end

      filtered_problems.sort(&comparator).each do |new_problem|
        # TODO: deal with circuits
        new_trail = Bio::Velvet::Graph::OrientedNodeTrail.new
        new_trail.trail = new_problem.path[0...-1]
        #log.debug "Enqueuing #{new_trail.to_shorthand} and [#{new_second_part.collect{|o| o.to_shorthand}.join(',') }]" if log.debug?
        stack.push [new_trail, new_second_part]
      end
    end
  end
end

#enqueue(dynamic_programming_problem) ⇒ `Object`

# File 'lib/assembly/bubbly_assembler.rb', line 536

def enqueue(dynamic_programming_problem)
  settable = dynamic_programming_problem.to_settable


  @known_problems[settable] ||= []
  @known_problems[settable].push dynamic_programming_problem

  # don't requeue current problem or circular problem
  unless dynamic_programming_problem.circular_path_detected == true or @current_problems.include? settable
    @queue.enqueue dynamic_programming_problem, shortest_problem_distance(dynamic_programming_problem)
    @current_problems << settable
  end
end

#num_known_problems ⇒ `Object`



603
604
605

# File 'lib/assembly/bubbly_assembler.rb', line 603

def num_known_problems
  @known_problems.length
end

#oriented_nodes ⇒ `Object`

yield or failing that return an Array of the list of oriented_nodes found in at least one path in this (presumed converged) bubble

# File 'lib/assembly/bubbly_assembler.rb', line 571

def oriented_nodes
  raise unless converged?
  seen_nodes = {}
  stack = DS::Stack.new
  initial_solution = @known_problems[@converging_oriented_node_settable][0]
  converging_onode = initial_solution.path[-1]
  stack.push converging_onode

  while onode = stack.pop
    settable = onode.to_settable
    next if seen_nodes.key?(settable)

    if block_given?
      if @is_reverse
        yield onode.reverse
      else
        yield onode
      end
    end

    seen_nodes[settable] = onode

    # queue neighbours for paths that don't contain the converging onode
    @known_problems[settable].each do |dpp|
      stack.push dpp.path[-2] unless dpp.path.length < 2 or dpp.path[0...-1].include? converging_onode
    end
  end

  return nil if block_given?
  return seen_nodes.values
end

#paths ⇒ `Object`

# File 'lib/assembly/bubbly_assembler.rb', line 722

def paths
  to_return = []
  each_path do |path|
    to_return.push path
  end
  to_return
end

#reference_trail(max_cycles = @max_cycles) ⇒ `Object`

Return one trail that exemplifies the paths through this bubble. Current method of path selection is simply greedy, taking the highest coverage node at each fork (or failing that the node with the lower node_id).

# File 'lib/assembly/bubbly_assembler.rb', line 770

def reference_trail(max_cycles = @max_cycles)
  raise unless converged?

  converging_onode = converging_oriented_node
  log.debug "Finding reference trail from node #{converging_onode.node.node_id}" if log.debug?

  reference_trail = nil
  each_path do |path|
    #break when first path is found
    reference_trail = path
    break
  end

  return reference_trail
end

#reverse! ⇒ `Object`

# File 'lib/assembly/bubbly_assembler.rb', line 752

def reverse!
  @is_reverse ||= false
  @is_reverse = !@is_reverse
end

#shift ⇒ `Object`

Return the next closest dynamic programming problem, removing it from the bubble

# File 'lib/assembly/bubbly_assembler.rb', line 507

def shift
  prob = @queue.shift
  unless prob.nil?
    prob.ubiquitous_oriented_nodes = ubiquitous_oriented_nodes(prob)
    prob.visited_oriented_nodes = visited_oriented_nodes(prob)
    @current_problems.delete prob.to_settable
  end
  return prob
end

#shortest_problem_distance(prob) ⇒ `Object`

# File 'lib/assembly/bubbly_assembler.rb', line 531

def shortest_problem_distance(prob)
  # prioritise by the shortest distance for current problem
  @known_problems[prob.to_settable].collect{|prob| prob.distance}.min
end

#to_shorthand ⇒ `Object`

# File 'lib/assembly/bubbly_assembler.rb', line 740

def to_shorthand
  shorts = []
  if converged?
    shorts = paths.sort{|a,b| a.to_shorthand <=> b.to_shorthand }.collect{|path| path.to_shorthand}
  else
    @queue.each do |problem|
      shorts.push problem.to_shorthand
    end
  end
  return "{#{shorts.join('|') }}"
end

#ubiquitous_oriented_nodes(prob) ⇒ `Object`

# File 'lib/assembly/bubbly_assembler.rb', line 524

def ubiquitous_oriented_nodes(prob)
  #only ubiquitous nodes from relevant problems
  @known_problems[prob.to_settable].reduce(prob.ubiquitous_oriented_nodes) do |memo, problem|
    memo & problem.ubiquitous_oriented_nodes
  end
end

#visited_oriented_nodes(prob) ⇒ `Object`

# File 'lib/assembly/bubbly_assembler.rb', line 517

def visited_oriented_nodes(prob)
  #all visited nodes for relevant problems
  @known_problems[prob.to_settable].reduce(prob.ubiquitous_oriented_nodes) do |memo, problem|
    memo + problem.ubiquitous_oriented_nodes
  end
end

Class: Bio::AssemblyGraphAlgorithms::BubblyAssembler::Bubble

Overview

Instance Attribute Summary collapse

Instance Method Summary collapse

Methods included from FinishM::Logging

Constructor Details

#initialize(bubble_root, options = {}) ⇒ Bubble

Instance Attribute Details

#converging_oriented_node_settable ⇒ Object (readonly)

#is_reverse ⇒ Object (readonly)

#num_legit_forks ⇒ Object

#root ⇒ Object (readonly)

Instance Method Details

#[](index) ⇒ Object

#circuitous? ⇒ Boolean

#converge_on(dynamic_programming_problem) ⇒ Object

#converged? ⇒ Boolean

#convergent_on?(dynamic_programming_problem) ⇒ Boolean

#converging_oriented_node ⇒ Object

#coverage ⇒ Object

#each_path(options = {}) ⇒ Object

#enqueue(dynamic_programming_problem) ⇒ Object

#num_known_problems ⇒ Object

#oriented_nodes ⇒ Object

#paths ⇒ Object

#reference_trail(max_cycles = @max_cycles) ⇒ Object

#reverse! ⇒ Object

#shift ⇒ Object

#shortest_problem_distance(prob) ⇒ Object

#to_shorthand ⇒ Object

#ubiquitous_oriented_nodes(prob) ⇒ Object

#visited_oriented_nodes(prob) ⇒ Object

#initialize(bubble_root, options = {}) ⇒ `Bubble`

#converging_oriented_node_settable ⇒ `Object` (readonly)

#is_reverse ⇒ `Object` (readonly)

#num_legit_forks ⇒ `Object`

#root ⇒ `Object` (readonly)

#[](index) ⇒ `Object`

#circuitous? ⇒ `Boolean`

#converge_on(dynamic_programming_problem) ⇒ `Object`

#converged? ⇒ `Boolean`

#convergent_on?(dynamic_programming_problem) ⇒ `Boolean`

#converging_oriented_node ⇒ `Object`

#coverage ⇒ `Object`

#each_path(options = {}) ⇒ `Object`

#enqueue(dynamic_programming_problem) ⇒ `Object`

#num_known_problems ⇒ `Object`

#oriented_nodes ⇒ `Object`

#paths ⇒ `Object`

#reference_trail(max_cycles = @max_cycles) ⇒ `Object`

#reverse! ⇒ `Object`

#shift ⇒ `Object`

#shortest_problem_distance(prob) ⇒ `Object`

#to_shorthand ⇒ `Object`

#ubiquitous_oriented_nodes(prob) ⇒ `Object`

#visited_oriented_nodes(prob) ⇒ `Object`