Class: RGFA

Inherits:
Object
Includes:
Comments, Connectivity, Containments, Headers, LinearPaths, Lines, Links, LoggerSupport, Multiplication, Paths, RGL, Segments, RGFATools
Defined in:
lib/rgfa.rb,
lib/rgfatools.rb

Overview

Main class of the RGFA library.

RGFA provides a representation of a GFA graph. It supports creating a graph from scratch, reading it from and writing it to files or strings, and several operations on the graph. The examples below show how to create an RGFA object from scratch or from a GFA file, write it to a file, output its string representation or a statistics report, and control the validation level.

Interacting with the graph

Examples:

Creating an empty RGFA object

gfa = RGFA.new

Parsing and writing GFA format

gfa = RGFA.from_file(filename) # parse GFA file
gfa.to_file(filename) # write to GFA file
puts gfa # show GFA representation of RGFA object

Basic statistics report

puts gfa.info # print report
puts gfa.info(true) # compact format, in one line

Validation

gfa = RGFA.from_file(filename, validate: 1) # default level is 2
gfa.validate = 3 # change validation level
gfa.turn_off_validations # equivalent to gfa.validate = 0
gfa.validate! # run post-validations (e.g. check segment names in links)

Defined Under Namespace

Modules: Comments, Connectivity, Containments, FieldParser, FieldValidator, FieldWriter, Headers, LinearPaths, Lines, Links, LoggerSupport, Multiplication, Paths, Segments, Sequence

Classes: ByteArray, CIGAR, DuplicatedLabelError, Error, FieldArray, Line, LineMissingError, Logger, NumericArray, OrientedSegment, SegmentEnd, SegmentEndsPath, SegmentInfo

Constant Summary

Constants included from RGFATools::Multiplication

RGFATools::Multiplication::LINKS_DISTRIBUTION_POLICY

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from RGFATools::PBubbles

#remove_p_bubble, #remove_p_bubbles

Methods included from RGFATools::LinearPaths

#merge_linear_path

Methods included from RGFATools::SuperfluousLinks

#enforce_all_mandatory_links, #enforce_segment_mandatory_links, #remove_self_link, #remove_self_links

Methods included from RGFATools::Multiplication

#multiply_extended, #multiply_with_rgfatools

Methods included from RGFATools::InvertibleSegments

#randomly_orient_invertible, #randomly_orient_invertibles

Methods included from RGFATools::CopyNumber

#apply_copy_number, #apply_copy_numbers, #compute_copy_numbers, #delete_low_coverage_segments, #set_count_unit_length, #set_default_count_tag

Methods included from RGFATools::Artifacts

#remove_dead_ends, #remove_small_components

Methods included from LoggerSupport

#enable_progress_logging, #progress_log, #progress_log_end, #progress_log_init

Methods included from Multiplication

#multiply

Methods included from Connectivity

#connected_components, #connectivity, #cut_link?, #cut_segment?, #segment_connected_component, #split_connected_components

Methods included from LinearPaths

#linear_path, #linear_paths, #merge_linear_path, #merge_linear_paths

Methods included from Paths

#delete_path, #path, #path!, #paths, #paths_with

Methods included from Containments

#contained_in, #containing, #containment, #containment!, #containments, #containments_between, #delete_containment

Methods included from Links

#delete_link, #delete_other_links, #link, #link!, #link_from_to, #link_from_to!, #links, #links_between, #links_from, #links_from_to, #links_of, #links_to, #neighbours

Methods included from Segments

#connected_segments, #delete_segment, #segment, #segment!, #segments, #unconnect_segments

Methods included from Headers

#delete_headers, #header, #headers

Methods included from Lines

#<<, #rename, #rm

Methods included from Comments

#comments, #delete_comment, #delete_comments

Constructor Details

#initialize(validate: 2) ⇒ RGFA

Returns a new instance of RGFA.

Parameters:

  • validate (Integer) (defaults to: 2)

    the validation level; see “Validation level” under RGFA::Line#initialize.



# File 'lib/rgfa.rb', line 107

def initialize(validate: 2)
  @validate = validate
  init_headers
  @segments = {}
  @links = []
  @containments = []
  @paths = {}
  @comments = []
  @segments_first_order = false
  @progress = false
  @default = {:count_tag => :RC, :unit_length => 1}
  @extensions_enabled = false
end

Instance Attribute Details

#validate ⇒ Object

Returns the value of attribute validate.



# File 'lib/rgfa.rb', line 101

def validate
  @validate
end

Class Method Details

.from_file(filename, validate: 2) ⇒ RGFA

Creates an RGFA instance by parsing the file with the specified filename.

Parameters:

  • filename (String)
  • validate (Integer) (defaults to: 2)

    the validation level; see “Validation level” under RGFA::Line#initialize.

Returns:

  • (RGFA)

Raises:

  • if the file cannot be opened for reading



# File 'lib/rgfa.rb', line 207

def self.from_file(filename, validate: 2)
  gfa = RGFA.new(validate: validate)
  gfa.read_file(filename)
  return gfa
end

Instance Method Details

#==(other) ⇒ Boolean

Compare two RGFA instances.

Returns:

  • (Boolean)

    true if the segments, links, containments, headers and paths of the two instances are equal (comments are not compared).



# File 'lib/rgfa.rb', line 288

def ==(other)
  segments == other.segments and
    links == other.links and
    containments == other.containments and
    headers == other.headers and
    paths == other.paths
end

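#clone ⇒ RGFA

Creates a copy of the RGFA instance by serializing it to a string and parsing it again; the validation level, progress logging and segments-first ordering settings are carried over to the copy.

Returns:

  • (RGFA)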



# File 'lib/rgfa.rb', line 175

def clone
  cpy = to_s.to_rgfa(validate: 0)
  cpy.validate = @validate
  cpy.enable_progress_logging if @progress
  cpy.require_segments_first_order if @segments_first_order
  return cpy
end

#disable_extensions ⇒ void

This method returns an undefined value.

Disable RGFATools extensions of RGFA methods



# File 'lib/rgfatools.rb', line 98

def disable_extensions
  @extensions_enabled = false
end

#enable_extensions ⇒ void

This method returns an undefined value.

Enable RGFATools extensions of RGFA methods



# File 'lib/rgfatools.rb', line 92

def enable_extensions
  @extensions_enabled = true
end

#info(short = false) ⇒ String

Returns basic statistics about the sequences and topology of the graph.

Compact output has the following keys:

  • ns: number of segments

  • nl: number of links

  • cc: number of connected components

  • de: number of dead ends

  • tl: total length of segment sequences

  • 50: N50 segment sequence length

The normal output is a table with the same information plus some additional items: the percentage of dead ends, the length of the largest connected component, and the shortest, longest and 1st/2nd/3rd quartile segment sequence lengths.

Parameters:

  • short (Boolean) (defaults to: false)

    compact output as a single text line

Returns:

  • (String)

    sequence and topology information collected from the graph.
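The one-line output of info(true) is a tab-separated list of key=value pairs, which makes it easy to post-process in scripts. The following is a minimal sketch, assuming every value is an integer; parse_short_info is a hypothetical helper, not part of RGFA, and the input string uses illustrative values.

```ruby
# Hypothetical helper, not part of RGFA: converts the one-line output of
# gfa.info(true) into a Hash with Symbol keys and Integer values.
# Assumes every value is an integer.
def parse_short_info(line)
  line.strip.split("\t").each_with_object({}) do |field, stats|
    key, value = field.split("=", 2)
    stats[key.to_sym] = Integer(value)
  end
end

# Illustrative input in the documented compact format
stats = parse_short_info("ns=3\tnl=2\tcc=1\tde=2\ttl=300\t50=100")
puts stats[:ns]   # prints 3
puts stats[:"50"] # prints 100 (the N50 value)
```

Note that the N50 key is "50", so it becomes the quoted symbol :"50".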



# File 'lib/rgfa.rb', line 242

def info(short = false)
  q, n50, tlen = lenstats
  nde = n_dead_ends()
  pde = "%.2f%%" % ((nde.to_f*100) / (segments.size*2))
  cc = connected_components()
  cc.map!{|c|c.map{|sn|segment!(sn).length!}.inject(:+)}
  if short
    return "ns=#{segments.size}\t"+
           "nl=#{links.size}\t"+
           "cc=#{cc.size}\t"+
           "de=#{nde}\t"+
           "tl=#{tlen}\t"+
           "50=#{n50}"
  end
  retval = []
  retval << "Segment count:               #{segments.size}"
  retval << "Links count:                 #{links.size}"
  retval << "Total length (bp):           #{tlen}"
  retval << "Dead ends:                   #{nde}"
  retval << "Percentage dead ends:        #{pde}"
  retval << "Connected components:        #{cc.size}"
  # cc holds the total length of each component; take the maximum
  retval << "Largest component (bp):      #{cc.max}"
  retval << "N50 (bp):                    #{n50}"
  retval << "Shortest segment (bp):       #{q[0]}"
  retval << "Lower quartile segment (bp): #{q[1]}"
  retval << "Median segment (bp):         #{q[2]}"
  retval << "Upper quartile segment (bp): #{q[3]}"
  retval << "Longest segment (bp):        #{q[4]}"
  return retval.join("\n")
end
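The length statistics used by info come from lenstats, whose source is not shown in this section. A minimal sketch of how an N50 value and a five-number summary of segment lengths can be computed, assuming the usual N50 definition; length_stats is a hypothetical name, and the rounded-index rule used for the quartiles is an assumed convention (RGFA may compute quartiles differently).

```ruby
# Hypothetical stand-in for lenstats (not RGFA's code): returns
# [quartiles, n50, total_length] for an array of segment lengths.
def length_stats(lengths)
  raise ArgumentError, "no segment lengths given" if lengths.empty?
  sorted = lengths.sort
  total = sorted.sum
  # N50: walking from the longest segment down, the length at which the
  # cumulative length first reaches half of the total.
  cumulative = 0
  n50 = sorted.reverse.find { |len| (cumulative += len) * 2 >= total }
  # Min, lower quartile, median, upper quartile and max, each taken as the
  # element at the rounded fractional index (an assumed convention).
  quartiles = [0.0, 0.25, 0.5, 0.75, 1.0].map do |p|
    sorted[((sorted.size - 1) * p).round]
  end
  [quartiles, n50, total]
end

q, n50, tlen = length_stats([30, 10, 50, 20, 40])
puts q.inspect # prints [10, 20, 30, 40, 50]
puts n50       # prints 40
puts tlen      # prints 150
```

Here 50 + 40 = 90 is the first cumulative sum reaching half of 150, so the N50 is 40.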

#n_dead_ends ⇒ Integer

Counts the dead ends.

Here, a dead end is a segment end (beginning or end) that is not connected by any link.

Returns:

  • (Integer)

    number of dead ends in the graph



# File 'lib/rgfa.rb', line 279

def n_dead_ends
  segments.inject(0) do |n,s|
    [:E, :B].each {|e| n+= 1 if links_of([s.name, e]).empty?}
    n
  end
end
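The same counting rule can be expressed independently of RGFA. The sketch below assumes a simplified, hypothetical data model in which each link has already been resolved to the two segment ends it joins, each written as [segment_name, :B] or [segment_name, :E] (in GFA 1, for instance, a link "s1 + s2 +" joins the end of s1 to the beginning of s2). count_dead_ends is an illustrative name, not an RGFA method.

```ruby
require "set"

# Hypothetical sketch (not RGFA's code): counts segment ends that are not
# touched by any link. Each link is a pair of segment ends, and a segment
# end is [segment_name, :B] or [segment_name, :E].
def count_dead_ends(segment_names, links)
  connected = Set.new
  links.each do |from_end, to_end|
    connected << from_end << to_end
  end
  segment_names.sum do |name|
    [:B, :E].count { |e| !connected.include?([name, e]) }
  end
end

# s1:E -> s2:B is the only link; s1:B, s2:E and both ends of s3 are dead ends
links = [[[:s1, :E], [:s2, :B]]]
puts count_dead_ends([:s1, :s2, :s3], links) # prints 4
```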

#path_names ⇒ Array<Symbol>

Lists the names of all path lines in the graph.

Returns:

  • (Array<Symbol>)



# File 'lib/rgfa.rb', line 145

def path_names
  @paths.keys.compact
end

#read_file(filename) ⇒ self

Populates the RGFA instance by reading the file with the specified filename.

Parameters:

  • filename (String)

Returns:

  • (self)

Raises:

  • if the file cannot be opened for reading



# File 'lib/rgfa.rb', line 187

def read_file(filename)
  if @progress
    # count lines in Ruby, so filenames with spaces or shell characters are safe
    linecount = File.foreach(filename).count
    progress_log_init(:read_file, "lines", linecount,
                      "Parse file with #{linecount} lines")
  end
  File.foreach(filename) do |line|
    self << line.chomp
    progress_log(:read_file) if @progress
  end
  progress_log_end(:read_file) if @progress
  validate! if @validate >= 1
  self
end
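read_file passes each chomped line to <<. As a much-simplified illustration of the first step of parsing a GFA line, the hypothetical parse_gfa_lines below only splits each tab-separated line and groups its fields under the record-type letter (H, S, L, C, P, ...). It performs none of the field parsing or validation RGFA does and is not RGFA's implementation.

```ruby
# Hypothetical, much-simplified parser (not RGFA's code): groups the
# tab-separated fields of each GFA line under its record type letter.
# No field validation is performed.
def parse_gfa_lines(lines)
  records = Hash.new { |hash, type| hash[type] = [] }
  lines.each do |line|
    line = line.chomp
    next if line.empty?
    type, *fields = line.split("\t")
    records[type] << fields
  end
  records
end

gfa_text = "H\tVN:Z:1.0\nS\ts1\tACGT\nS\ts2\tGG\nL\ts1\t+\ts2\t+\t0M\n"
records = parse_gfa_lines(gfa_text.each_line)
puts records["S"].size     # prints 2
puts records["L"].first[2] # prints s2
```

To read from a file instead, File.foreach(filename) can be passed, since without a block it returns an enumerator over the lines.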

#require_segments_first_order ⇒ void

This method returns an undefined value.

Require that the links, containments and paths referring to a segment are added after the segment. Default: do not require any particular ordering.



# File 'lib/rgfa.rb', line 126

def require_segments_first_order
  @segments_first_order = true
end
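To make the ordering requirement concrete, the hypothetical checker below reports GFA 1 link (L) and containment (C) lines that refer to a segment not yet defined by an earlier S line. In both record types the two segment names are the second and fourth tab-separated fields. Path (P) lines are skipped for brevity; this is an illustration of the rule, not how RGFA enforces it.

```ruby
# Hypothetical checker (not RGFA's code): returns the 1-based numbers of
# L and C lines that reference a segment not defined by an earlier S line.
def segments_first_violations(lines)
  seen = {}
  violations = []
  lines.each_with_index do |line, i|
    fields = line.chomp.split("\t")
    case fields[0]
    when "S"
      seen[fields[1]] = true
    when "L", "C"
      # fields[1] and fields[3] are the two segment names in GFA 1 L/C lines
      violations << i + 1 unless seen[fields[1]] && seen[fields[3]]
    end
  end
  violations
end

gfa_lines = [
  "S\ts1\tACGT",
  "L\ts1\t+\ts2\t+\t0M", # s2 is not defined yet
  "S\ts2\tGG",
  "L\ts1\t+\ts2\t+\t0M"
]
p segments_first_violations(gfa_lines) # prints [2]
```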

#segment_names ⇒ Array<Symbol>

Lists the names of all segments in the graph.

Returns:

  • (Array<Symbol>)



# File 'lib/rgfa.rb', line 139

def segment_names
  @segments.keys.compact
end

#to_file(filename) ⇒ void

This method returns an undefined value.

Writes the RGFA to the file with the specified filename, overwriting the file if it exists.

Parameters:

  • filename (String)

Raises:

  • if the file cannot be opened for writing



# File 'lib/rgfa.rb', line 218

def to_file(filename)
  File.open(filename, "w") {|f| each_line {|l| f.puts l}}
end

#to_rgfa ⇒ self

Returns the RGFA instance itself.

Returns:

  • (self)


# File 'lib/rgfa.rb', line 169

def to_rgfa
  self
end

#to_s ⇒ String

Returns the string representation of the RGFA in GFA format, conforming to the current specification.

Returns:

  • (String)



# File 'lib/rgfa.rb', line 161

def to_s
  s = ""
  each_line {|line| s << line.to_s; s << "\n"}
  return s
end

#turn_off_validations ⇒ void

This method returns an undefined value.

Set the validation level to 0. See “Validation level” under RGFA::Line#initialize.



# File 'lib/rgfa.rb', line 133

def turn_off_validations
  @validate = 0
end

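#validate! ⇒ void

This method returns an undefined value.

Performs post-validation of the RGFA: validates the segment references (e.g. the segment names in links) and the links used by paths.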

Raises:

  • if validation fails



# File 'lib/rgfa.rb', line 152

def validate!
  validate_segment_references!
  validate_path_links!
  return nil
end
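Unless require_segments_first_order is set, a link may appear before the segments it refers to, so such references can presumably only be checked once the whole graph has been read. The sketch below illustrates that kind of check on a hypothetical data model of plain segment names and [from, to] link pairs; check_link_references and UnknownSegmentError are illustrative names, and RGFA's own validate_segment_references! is not shown in this section.

```ruby
require "set"

# Hypothetical illustration of a post-validation step (not RGFA's code):
# once all lines are read, every segment name used by a link must exist.
class UnknownSegmentError < StandardError; end

def check_link_references(segment_names, links)
  known = Set.new(segment_names)
  links.each do |from, to|
    [from, to].each do |name|
      next if known.include?(name)
      raise UnknownSegmentError, "link refers to unknown segment #{name}"
    end
  end
  nil
end

check_link_references([:s1, :s2], [[:s1, :s2]]) # passes, returns nil
begin
  check_link_references([:s1, :s2], [[:s1, :s3]])
rescue UnknownSegmentError => e
  puts e.message # prints: link refers to unknown segment s3
end
```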