Class: Bio::AssemblyGraphAlgorithms::ContigPrinter
- Inherits: Object
- Includes: FinishM::Logging
- Defined in: lib/assembly/contig_printer.rb
Defined Under Namespace
Classes: AnchoredConnection, PrintableConnection, Variant
Instance Method Summary
- #one_connection_between_two_contigs(graph, contig1, anchored_connection, contig2, sequences) ⇒ Object
  Like ready_two_contigs_and_connections, except that it assumes there is only a single connection between the two sides.
- #ready_two_contigs_and_connections(graph, contig1, anchored_connection, contig2, sequences) ⇒ Object
  Given two contigs, return a consensus path and variants of the path.
Methods included from FinishM::Logging
Instance Method Details
#one_connection_between_two_contigs(graph, contig1, anchored_connection, contig2, sequences) ⇒ Object
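Like ready_two_contigs_and_connections, except that it assumes there is only a single connection between the two sides. It raises an error if the anchored connection has more than one path, and returns only the joined sequence (the variants are discarded).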
```ruby
# File 'lib/assembly/contig_printer.rb', line 162

def one_connection_between_two_contigs(graph, contig1, anchored_connection, contig2, sequences)
  raise "programming error: only one path expected here" if anchored_connection.paths.length > 1
  # ready_two_contigs_and_connections returns [sequence, variants]; keep only the sequence
  return ready_two_contigs_and_connections(graph, contig1, anchored_connection, contig2, sequences)[0]
end
```
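A minimal sketch of this contract, with stand-ins so it runs without finishm: `FakeConnection` and `fake_ready` are hypothetical substitutes for `AnchoredConnection` and `ready_two_contigs_and_connections`, and only the guard and the `[0]` selection are taken from the source above. Note that the guard only rejects more than one path; a connection with zero paths passes it and is left to `ready_two_contigs_and_connections`, which is documented as not handling that case.

```ruby
# Hypothetical stand-in for AnchoredConnection; only #paths is used here.
FakeConnection = Struct.new(:paths)

# Hypothetical stand-in for ready_two_contigs_and_connections, returning the
# same [joined_sequence, variants] pair shape.
def fake_ready(connection)
  ["joined via #{connection.paths.first}", []]
end

# Same shape as one_connection_between_two_contigs: reject multiple paths,
# then keep element 0 (the sequence) and drop element 1 (the variants).
def fake_one_connection(connection)
  raise "programming error: only one path expected here" if connection.paths.length > 1
  fake_ready(connection)[0]
end

puts fake_one_connection(FakeConnection.new([:path_a])) # prints joined via path_a
begin
  fake_one_connection(FakeConnection.new([:path_a, :path_b]))
rescue RuntimeError => e
  puts e.message # prints the "only one path expected" error message
end
```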
#ready_two_contigs_and_connections(graph, contig1, anchored_connection, contig2, sequences) ⇒ Object
Given two contigs, return a consensus path and variants of the path.
```text
          ---------->                <--------          start and end probes (ends of probe sequences may not form part of final path). Directions not variable.
--------------------->NNNN------------------->          original sequence to be gap-filled (contig1, NNNN, contig2). Directions not variable.
          -----------                ------->           path across the gap. Direction not variable.
                     \              /
                      --------------
          ---------->|<-----|----->|--------->          nodes that make up the path (directions and boundaries variable).
   stage1|               stage2               |stage3   stages of sequence construction in this method.
```
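Much like one_connection_between_two_contigs, except that it can handle multiple connections (but not zero connections). It returns a two-element array: the joined sequence (stage 1, then stage 2, then stage 3) and the variants, whose positions are shifted to be relative to the joined sequence rather than to the path sequence alone. If the sequence of any node trail cannot be extracted (Bio::Velvet::Graph::OrientedNodeTrail::InsufficientLengthException), it logs a warning and returns [nil, nil].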
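The three stages amount to string slicing and concatenation. The sketch below reproduces only that slicing on plain Ruby strings; the helper names (`stage1`, `stage2`, `stage3`, `stitch`) and the example offsets are hypothetical, and in the real method the path offsets come from the node and probe bookkeeping rather than being passed in. It also shows why an offset of 0 must be special-cased: `-0` is just `0` in Ruby, so `contig1[0...-0]` is the empty string rather than the whole contig.

```ruby
# Stage 1: contig1 up to where the start probe begins. An offset of 0 keeps
# the whole contig, because contig1[0...-0] == contig1[0...0] == "".
def stage1(contig1, start_probe_contig_offset)
  return contig1 if start_probe_contig_offset.zero?
  contig1[0...-start_probe_contig_offset]
end

# Stage 2: the slice of the path sequence between the two probe offsets,
# wrapped in any probe overhang that was not part of the path.
def stage2(path_sequence, begin_offset, end_offset, extra_start = '', extra_end = '')
  extra_start + path_sequence[begin_offset...end_offset] + extra_end
end

# Stage 3: contig2 from the end-probe offset to its end.
def stage3(contig2, end_probe_contig_offset)
  contig2[end_probe_contig_offset..-1]
end

def stitch(contig1, start_offset, path_sequence, begin_offset, end_offset, contig2, end_offset_on_contig2)
  stage1(contig1, start_offset) +
    stage2(path_sequence, begin_offset, end_offset) +
    stage3(contig2, end_offset_on_contig2)
end

# contig1 ends with the start probe CCCCC, contig2 starts with the end probe
# TTTTT, and the path runs probe-to-probe across the gap GGGGG.
joined = stitch('AAAAACCCCC', 5, 'CCCCCGGGGGTTTTT', 0, 15, 'TTTTTAAAAA', 5)
puts joined # prints AAAAACCCCCGGGGGTTTTTAAAAA
```

Because the path already contains both probes, stages 1 and 3 must each stop short of (or start after) the probe region of their contig; otherwise the probe sequence would appear twice in the joined contig.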
```ruby
# File 'lib/assembly/contig_printer.rb', line 67

def ready_two_contigs_and_connections(graph, contig1, anchored_connection, contig2, sequences)
  to_return = ''
  variants = []
  log.debug "Working with anchored_connection: #{anchored_connection.inspect}" if log.debug?

  # Stage1 - contig1 before the path begins
  to_return = nil
  if anchored_connection.start_probe_contig_offset == 0
    # 0 is a special case because negative 0 doesn't make sense
    to_return = contig1
  else
    to_return = contig1[0...-(anchored_connection.start_probe_contig_offset)]
  end
  log.debug "After first chunk of sequence added, sequence is #{to_return.length}bp long" if log.debug?

  # Stage2 - path sequence, beginning and ending with
  # beginning and ending probes
  begin
    example_path = anchored_connection.paths[0]

    # Find start index
    begin_onode = example_path[0]
    begin_noded_read = anchored_connection.start_probe_noded_read
    raise if begin_noded_read.nil?
    extra_bit_on_start = ''
    if begin_noded_read.start_coord != 0
      log.warn "Unexpectedly the start of the start probe did not form part of the path, which is a little suspicious"
      extra_bit_on_start = sequences[begin_noded_read.read_id][0...begin_noded_read.start_coord]
    end
    offset_of_begin_probe_on_path = nil
    # xor read direction on node, and node direction on path
    if (begin_noded_read.direction == true) ^ begin_onode.starts_at_start?
      offset_of_begin_probe_on_path = begin_onode.node.corresponding_contig_length - begin_noded_read.offset_from_start_of_node
      # extra bit on read needs to be reverse complemented
      extra_bit_on_start = Bio::Sequence::NA.new(extra_bit_on_start).reverse_complement.to_s.upcase unless extra_bit_on_start == ''
    else
      offset_of_begin_probe_on_path = begin_noded_read.offset_from_start_of_node
    end

    path_sequence, variants = sequences_to_variants_conservative(
      anchored_connection.paths.collect do |path|
        seq = nil
        begin
          seq = path.sequence
        rescue Bio::Velvet::Graph::OrientedNodeTrail::InsufficientLengthException => e
          log.warn "Failed to join two contigs together because of inability to get sequence out of a trail of nodes. In the past this has been caused by low coverage thus making finishM inappropriate, so returning an unconnected contig now. However, this may be legitimate in the case of an unlucky misassembly at both ends of the contigs being joined, so please report this error to the author."
          return nil, nil
        end
        seq
      end
    )
    log.debug "Reference path has a sequence length #{path_sequence.length}" if log.debug?

    # Correct variants' positions to be relative to the full contig,
    # not just the path sequence
    variants.each do |variant|
      variant.position = variant.position - offset_of_begin_probe_on_path + to_return.length + 1
    end

    # Find end index
    end_onode = example_path[-1]
    end_noded_read = anchored_connection.end_probe_noded_read
    raise if end_noded_read.nil?
    extra_bit_on_end = ''
    if end_noded_read.start_coord != 0
      log.warn "Unexpectedly the end of the end probe did not form part of the path, which is a little suspicious"
      extra_bit_on_end = sequences[end_noded_read.read_id][0...end_noded_read.start_coord]
    end
    # Potentially the example_path has a different length than the reference sequence in bp.
    # Correct this ? Or not a bug? confused. I hate this method. TODO.
    # There is a test for this which is unwritten but it fails
    offset_of_end_node_on_path = example_path[0...-1].reduce(0){|sum, onode| sum += onode.node.length_alone}
    if (end_noded_read.direction == false) ^ end_onode.starts_at_start?
      offset_of_end_node_on_path += end_noded_read.offset_from_start_of_node
      extra_bit_on_end = Bio::Sequence::NA.new(extra_bit_on_end).reverse_complement.to_s.upcase unless extra_bit_on_end == ''
    else
      offset_of_end_node_on_path += end_onode.node.corresponding_contig_length - end_noded_read.offset_from_start_of_node
    end
    log.debug "Found start index #{offset_of_begin_probe_on_path} and end index #{offset_of_end_node_on_path}" if log.debug?

    to_return += extra_bit_on_start+
      path_sequence[offset_of_begin_probe_on_path...offset_of_end_node_on_path]+
      extra_bit_on_end
    log.debug "After path chunk of sequence added, sequence is #{to_return.length}bp long" if log.debug?
  end #end stage 2

  # Stage 3
  to_return += contig2[anchored_connection.end_probe_contig_offset..-1]
  log.debug "After last chunk of sequence added, sequence is #{to_return.length}bp long" if log.debug?

  return to_return, variants
end
```
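The orientation bookkeeping in Stage 2 is the least obvious part of the method. The sketch below restates it with plain booleans and integers so the arithmetic can be checked in isolation. It is a hedged illustration, not part of finishm: `revcomp`, `begin_probe_offset`, `end_probe_offset` and `shift_variant_position` are hypothetical helpers, the booleans stand in for the noded read's `direction` and the oriented node's `starts_at_start?`, and `revcomp` is a plain-Ruby substitute for `Bio::Sequence::NA#reverse_complement` that only handles A, C, G, T and N.

```ruby
# Hypothetical plain-Ruby substitute for
# Bio::Sequence::NA.new(seq).reverse_complement.to_s.upcase (ACGTN only).
def revcomp(seq)
  seq.upcase.tr('ACGTN', 'TGCAN').reverse
end

# Mirrors the start-probe branch: the XOR of the read's direction on its node
# and the node's direction on the path decides whether the offset is counted
# from the far end of the first node and whether the overhang is
# reverse-complemented.
def begin_probe_offset(node_length, offset_from_start, read_forward, node_forward, overhang = '')
  if read_forward ^ node_forward
    [node_length - offset_from_start, overhang.empty? ? overhang : revcomp(overhang)]
  else
    [offset_from_start, overhang]
  end
end

# Mirrors the end-probe branch: every node before the last contributes its
# full length, then the offset within the last node is added.
def end_probe_offset(preceding_lengths, node_length, offset_from_start, read_forward, node_forward, overhang = '')
  base = preceding_lengths.sum
  if (!read_forward) ^ node_forward
    [base + offset_from_start, overhang.empty? ? overhang : revcomp(overhang)]
  else
    [base + node_length - offset_from_start, overhang]
  end
end

# Mirrors the variant correction: moves a position on the path sequence into
# the coordinates of the joined contig.
def shift_variant_position(position_on_path, begin_offset, stage1_length)
  position_on_path - begin_offset + stage1_length + 1
end

p begin_probe_offset(100, 30, true, true)         # prints [30, ""]
p begin_probe_offset(100, 30, true, false, 'AAC') # prints [70, "GTT"]
p end_probe_offset([50, 40], 60, 10, false, true) # prints [140, ""]
p shift_variant_position(45, 30, 200)             # prints 216
```

The end-probe test uses `direction == false` where the start-probe test uses `direction == true`, presumably because the end probe is expected to point back towards the gap (the `<--------` arrow in the diagram); the sketch simply copies that asymmetry rather than explaining it.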