Class: Bio::AssemblyGraphAlgorithms::ContigPrinter

Inherits: Object
Includes: FinishM::Logging
Defined in: lib/assembly/contig_printer.rb

Defined Under Namespace

Classes: AnchoredConnection, PrintableConnection, Variant

Instance Method Summary

Methods included from FinishM::Logging

#log

Instance Method Details

#one_connection_between_two_contigs(graph, contig1, anchored_connection, contig2, sequences) ⇒ Object

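Like #ready_two_contigs_and_connections, except that it assumes there is only a single connection (path) between the two sides. It raises an error if anchored_connection has more than one path. It returns only the joined sequence (the first element of #ready_two_contigs_and_connections' return value), which is nil if the path's sequence could not be extracted.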



# File 'lib/assembly/contig_printer.rb', line 162

def one_connection_between_two_contigs(graph, contig1, anchored_connection, contig2, sequences)
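  raise "programming error: only one path expected here" if anchored_connection.paths.length > 1
  # Keep only the joined sequence and discard the variants. This is nil
  # when the path sequence could not be extracted from the graph.
  ready_two_contigs_and_connections(graph, contig1, anchored_connection, contig2, sequences)[0]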
end
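The contract of this wrapper can be illustrated with a self-contained sketch. `FakeConnection`, `join_all` and `join_single` are hypothetical stand-ins invented for illustration and are not part of finishm. `join_all` plays the role of #ready_two_contigs_and_connections and returns `[sequence, variants]`. It returns `[nil, nil]` when a path's sequence is unavailable, which is modelled here, as an assumption, by a `nil` path.

```ruby
# Hypothetical stand-in for AnchoredConnection: only `paths` is modelled.
FakeConnection = Struct.new(:paths)

# Hypothetical stand-in for ready_two_contigs_and_connections: returns
# [sequence, variants], or [nil, nil] if any path has no usable sequence.
def join_all(contig1, connection, contig2)
  return [nil, nil] if connection.paths.any?(&:nil?)
  [contig1 + connection.paths[0] + contig2, []]
end

# Mirrors one_connection_between_two_contigs: refuse multiple paths, then
# keep only element 0 (the sequence) and drop the variants.
def join_single(contig1, connection, contig2)
  raise "programming error: only one path expected here" if connection.paths.length > 1
  join_all(contig1, connection, contig2)[0]
end

puts join_single("AAAA", FakeConnection.new(["CC"]), "TTTT") # prints AAAACCTTTT
p join_single("AAAA", FakeConnection.new([nil]), "TTTT")     # prints nil
begin
  join_single("AAAA", FakeConnection.new(["CC", "CG"]), "TTTT")
rescue RuntimeError => e
  puts e.message # prints programming error: only one path expected here
end
```

The wrapped method returns `[nil, nil]` on failure, so callers of the single-connection version should check for a `nil` result before using it.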

#ready_two_contigs_and_connections(graph, contig1, anchored_connection, contig2, sequences) ⇒ Object

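Given two contigs and an anchored connection between them, return the joined sequence together with the variants of the path, as a two-element array [sequence, variants]. The joined sequence consists of contig1, the consensus path across the gap, and then contig2. If the sequence of a path cannot be extracted from the graph, [nil, nil] is returned instead.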

        ---------->         <--------           start and end probes (ends of probe sequences may not form part of final path). Directions not variable.
--------------------->NNNN------------------->  original sequence to be gapfilled (contig1, NNNN, contig2). Directions not variable
    -----------                 ------->        path across the gap. Direction not variable
               \               /
                --------------
    ---------->|<-----|----->|--------->        nodes that make up the path (directions and boundaries variable)
  stage1|           stage2           |stage3    stages of sequence construction in this method

Much like #one_connection_between_two_contigs, except that it can handle multiple connections (though not zero connections).
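Once the offsets are known, the three stages in the diagram reduce to string slicing. The sketch below assumes both "extra bits" are empty, which is the normal case (the listing warns when they are not). The helper names `stage1`, `splice` and `remap_variant_position` are hypothetical, and the sequences and offsets are made up for illustration.

```ruby
# Stage 1: keep contig1 up to where the start probe begins on it.
# start_probe_contig_offset counts bases trimmed from the end of contig1;
# 0 is special-cased because contig1[0...-0] is an empty string.
def stage1(contig1, start_probe_contig_offset)
  return contig1 if start_probe_contig_offset.zero?
  contig1[0...-start_probe_contig_offset]
end

# Stages 1-3 joined: trimmed contig1, the path chunk between the two probe
# offsets (half-open range, as in the listing), then contig2 from
# end_probe_contig_offset onwards. Extra bits are assumed empty.
def splice(contig1, start_probe_contig_offset, path_sequence,
           begin_offset, end_offset, contig2, end_probe_contig_offset)
  stage1(contig1, start_probe_contig_offset) +
    path_sequence[begin_offset...end_offset] +
    contig2[end_probe_contig_offset..-1]
end

# Same arithmetic as the listing's variant correction (with empty extra
# bits). If path_position is 0-based, the result is a 1-based position in
# the joined sequence; the listing does not state the convention.
def remap_variant_position(path_position, begin_offset, stage1_length)
  path_position - begin_offset + stage1_length + 1
end

contig1 = "AAAACCCC"       # start probe "CCCC" is its last 4 bases
contig2 = "TTTTGGGG"       # end probe "TTTT" is its first 4 bases
path    = "GGCCCCAGTTTTGG" # path across the gap; "AG" fills it

puts splice(contig1, 4, path, 2, 12, contig2, 4) # prints AAAACCCCAGTTTTGGGG
puts remap_variant_position(6, 2, 4)             # prints 9 (the "A" at path index 6)
```

The special case for an offset of 0 matters because `-0 == 0` in Ruby, so `contig1[0...-0]` selects nothing rather than the whole contig.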



# File 'lib/assembly/contig_printer.rb', line 67

def ready_two_contigs_and_connections(graph, contig1, anchored_connection, contig2, sequences)
  variants = []

  log.debug "Working with anchored_connection: #{anchored_connection.inspect}" if log.debug?

  # Stage 1 - contig1 up to where the path begins. start_probe_contig_offset
  # is the number of bases trimmed from the end of contig1.
  if anchored_connection.start_probe_contig_offset == 0
    # 0 is a special case: -0 == 0, so contig1[0...-0] would be empty instead of the whole contig
    to_return = contig1
  else
    to_return = contig1[0...-(anchored_connection.start_probe_contig_offset)]
  end
  log.debug "After first chunk of sequence added, sequence is #{to_return.length}bp long" if log.debug?

  # Stage 2 - path sequence, starting with the start probe and ending
  # with the end probe
  begin
    example_path = anchored_connection.paths[0]

    # Find start index
    begin_onode = example_path[0]
    begin_noded_read = anchored_connection.start_probe_noded_read
    raise "programming error: start probe noded read is missing" if begin_noded_read.nil?
    extra_bit_on_start = ''
    if begin_noded_read.start_coord != 0
      log.warn "Unexpectedly, the start of the start probe did not form part of the path, which is a little suspicious"
      # Keep the probe bases before start_coord so they are not lost from the output
      extra_bit_on_start = sequences[begin_noded_read.read_id][0...begin_noded_read.start_coord]
    end
    offset_of_begin_probe_on_path = nil
    # xor read direction on node, and node direction on path
    if (begin_noded_read.direction == true) ^ begin_onode.starts_at_start?
      offset_of_begin_probe_on_path = begin_onode.node.corresponding_contig_length - begin_noded_read.offset_from_start_of_node
      # extra bit on read needs to be reverse complemented
      extra_bit_on_start = Bio::Sequence::NA.new(extra_bit_on_start).reverse_complement.to_s.upcase unless extra_bit_on_start == ''
    else
      offset_of_begin_probe_on_path = begin_noded_read.offset_from_start_of_node
    end

    path_sequence, variants = sequences_to_variants_conservative(
      anchored_connection.paths.collect do |path|
        seq = nil
        begin
          seq = path.sequence
        rescue Bio::Velvet::Graph::OrientedNodeTrail::InsufficientLengthException => e
          log.warn "Failed to join two contigs together because of inability to get sequence out of a trail of nodes. In the past this has been caused by low coverage thus making finishM inappropriate, so returning an unconnected contig now. However, this may be legitimate in the case of an unlucky misassembly at both ends of the contigs being joined, so please report this error to the author."
          return nil, nil
        end
        seq
      end
      )
    log.debug "Reference path has a sequence length #{path_sequence.length}" if log.debug?

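    # Correct variants' positions to be relative to the full joined sequence,
    # not just the path sequence. Any extra_bit_on_start is placed between
    # stage 1 and the path chunk, so its length shifts the positions too.
    variants.each do |variant|
      variant.position = variant.position - offset_of_begin_probe_on_path + to_return.length + extra_bit_on_start.length + 1
    end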

    # Find end index
    end_onode = example_path[-1]
    end_noded_read = anchored_connection.end_probe_noded_read
    raise "programming error: end probe noded read is missing" if end_noded_read.nil?
    extra_bit_on_end = ''
    if end_noded_read.start_coord != 0
      log.warn "Unexpectedly, the end of the end probe did not form part of the path, which is a little suspicious"
      # Keep the probe bases before start_coord so they are not lost from the output
      extra_bit_on_end = sequences[end_noded_read.read_id][0...end_noded_read.start_coord]
    end
    # TODO: example_path may differ in length (bp) from the reference
    # path_sequence, in which case this offset may need correcting; it is
    # not yet clear whether this is a bug. A test for this case has been
    # started but not finished, and it currently fails.
    offset_of_end_node_on_path = example_path[0...-1].reduce(0){|sum, onode| sum + onode.node.length_alone}
    if (end_noded_read.direction == false) ^ end_onode.starts_at_start?
      offset_of_end_node_on_path += end_noded_read.offset_from_start_of_node
      extra_bit_on_end = Bio::Sequence::NA.new(extra_bit_on_end).reverse_complement.to_s.upcase unless extra_bit_on_end == ''
    else
      offset_of_end_node_on_path += end_onode.node.corresponding_contig_length - end_noded_read.offset_from_start_of_node
    end

    log.debug "Found start index #{offset_of_begin_probe_on_path} and end index #{offset_of_end_node_on_path}" if log.debug?
    to_return += extra_bit_on_start +
      path_sequence[offset_of_begin_probe_on_path...offset_of_end_node_on_path] +
      extra_bit_on_end
    log.debug "After path chunk of sequence added, sequence is #{to_return.length}bp long" if log.debug?
  end #end stage 2

  # Stage 3 - contig2 from end_probe_contig_offset onwards
  to_return += contig2[anchored_connection.end_probe_contig_offset..-1]
  log.debug "After last chunk of sequence added, sequence is #{to_return.length}bp long" if log.debug?

  return to_return, variants
end
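The orientation handling in stage 2 is easiest to follow with plain numbers. The sketch below mirrors the listing's offset formulas and uses hypothetical Struct stand-ins (`Node`, `ONode`, `NodedRead`) for the bio-velvet objects, with illustrative lengths. It copies the listing's arithmetic and does not independently derive bio-velvet's coordinate conventions. `revcomp` is a simplified plain-Ruby replacement for `Bio::Sequence::NA#reverse_complement` that handles only A, C, G, T and N.

```ruby
# Hypothetical plain-data stand-ins for the bio-velvet objects read by the
# listing; only the attributes it uses are modelled.
Node = Struct.new(:length_alone, :corresponding_contig_length)
ONode = Struct.new(:node, :forward) do
  # The listing calls starts_at_start? on oriented nodes.
  def starts_at_start?
    forward
  end
end
NodedRead = Struct.new(:direction, :offset_from_start_of_node)

# Offset of the start probe within the path sequence (first node of the path).
def begin_probe_offset(first_onode, read)
  if (read.direction == true) ^ first_onode.starts_at_start?
    first_onode.node.corresponding_contig_length - read.offset_from_start_of_node
  else
    read.offset_from_start_of_node
  end
end

# Offset of the end of the end probe within the path sequence: length_alone
# of every node before the last, plus the probe's offset inside the last node.
def end_probe_offset(path, read)
  last = path[-1]
  before_last = path[0...-1].sum { |onode| onode.node.length_alone }
  if (read.direction == false) ^ last.starts_at_start?
    before_last + read.offset_from_start_of_node
  else
    before_last + last.node.corresponding_contig_length - read.offset_from_start_of_node
  end
end

# Simplified reverse complement: A/C/G/T/N only, result upcased as in the listing.
def revcomp(seq)
  seq.reverse.tr("ACGTNacgtn", "TGCANtgcan").upcase
end

trail = [ONode.new(Node.new(100, 130), true), ONode.new(Node.new(50, 80), true)]
puts begin_probe_offset(trail[0], NodedRead.new(true, 10)) # prints 10
puts end_probe_offset(trail, NodedRead.new(false, 20))     # prints 160
puts revcomp("aacg")                                       # prints CGTT
```

The XOR combines two orientations: a read that points against its node, on a node traversed in reverse, ends up pointing forwards along the path. The offset is counted from the opposite end of the node only when exactly one of the two is reversed.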