Class: Bio::Ngs::Cufflinks::Compare

Inherits:
Object
Includes:
Command::Wrapper
Defined in:
lib/bio/appl/ngs/cufflinks.rb

Overview

cuffcompare v1.0.2 (2335)


Usage: cuffcompare [-r <reference_mrna.gtf>] [-R] [-T] [-V] [-s <seq_path>]
   [-o <outprefix>] [-p <cprefix>]
   {-i <input_gtf_list> | <input1.gtf> [<input2.gtf> .. <inputN.gtf>]}

Cuffcompare provides classification, reference annotation mapping and various
statistics for Cufflinks transfrags.
Cuffcompare clusters and tracks transfrags across multiple samples, writing
matching transcripts (intron chains) into <outprefix>.tracking, and a GTF
file <outprefix>.combined.gtf containing a nonredundant set of transcripts
across all input files (with a single representative transfrag chosen
for each clique of matching transfrags across samples).

Options:

-i    provide a text file with a list of Cufflinks GTF files to process instead
      of expecting them as command line arguments (useful when a large number
      of GTF files should be processed)
-r    a set of known mRNAs to use as a reference for assessing
      the accuracy of mRNAs or gene models given in <input.gtf>
-R    for -r option, reduce the set of reference transcripts to
      only those found to overlap any of the input loci
-M    discard (ignore) single-exon transfrags and reference transcripts
-N    discard (ignore) single-exon reference transcripts
-s    <seq_path> can be a multi-fasta file with all the genomic sequences or
      a directory containing multiple single-fasta files (one file per contig);
      lower case bases will be used to classify input transcripts as repeats
-d    max distance (range) for grouping transcript start sites (100)
-p    the name prefix to use for consensus transcripts in the
      <outprefix>.combined.gtf file (default: 'TCONS')
-C    include the “contained” transcripts in the .combined.gtf file
-G    generic GFF input file(s) (do not assume Cufflinks GTF)
-T    do not generate .tmap and .refmap files for each input file
-V    verbose processing mode (showing all GFF parsing warnings)
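
As an illustration of how the usage line above translates into an argument vector, here is a minimal sketch. The helper name `cuffcompare_argv` and all file names are assumptions made for this example; they are not part of Bio::Ngs or of the Command::Wrapper interface, whose exact calling conventions are not shown on this page.

```ruby
# Hypothetical helper (not part of Bio::Ngs): assembles a cuffcompare
# argument vector from a few of the options listed in the usage text.
def cuffcompare_argv(inputs, reference: nil, outprefix: nil, reduce_reference: false)
  argv = ["cuffcompare"]
  argv += ["-r", reference] if reference          # -r <reference_mrna.gtf>
  argv << "-R" if reference && reduce_reference   # -R is documented as a modifier of -r
  argv += ["-o", outprefix] if outprefix          # -o <outprefix>
  argv + inputs                                   # <input1.gtf> [<input2.gtf> ..]
end

# Placeholder file names, chosen only for the example.
argv = cuffcompare_argv(["sample1.gtf", "sample2.gtf"],
                        reference: "genes.gtf",
                        outprefix: "cmp",
                        reduce_reference: true)
puts argv.join(" ")
# prints "cuffcompare -r genes.gtf -R -o cmp sample1.gtf sample2.gtf"

# Running it would need cuffcompare on the PATH, e.g. system(*argv)
```

Passing the arguments as an array rather than one interpolated string avoids shell quoting problems when paths contain spaces.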

Class Method Summary

Methods included from Command::Wrapper

#class_name, #default_options, included, #initialize, #normalize_params, #options, #options=, #output, #params, #params=, #path, #path=, #pipe_ahead, #pipe_ahead=, #pipe_ahead?, #program, #reset_params, #run, #sub_program, #thor_task, #to_cmd_ary, #use_aliases?

Class Method Details

.build_compare_kb(gtf) ⇒ Object

Builds a hash of associations from a GTF file generated by Cuffcompare, dumps it with Marshal to the file named by kb_name(gtf), and returns it. Returns nil if the GTF file does not exist.

Entries keyed by gene_id hold the gene name and a nested hash of transcripts. Example:

:XLOC_000001=>{:gene_name=>:"RP11-304M2.1", :transcripts=>{:TCONS_00000001=>{:oid=>:ENST00000519787, :nearest_ref=>:ENST00000519787}}}

The other entries are plain hashes:

  • transcript_id: gene_id, gene_name, oid, nearest_ref
  • gene_name: gene_id, transcript_id, oid, nearest_ref
  • oid: gene_id, transcript_id, gene_name, nearest_ref
  • nearest_ref: gene_id, transcript_id, gene_name, oid

Note: exons and coordinates are not saved.



# File 'lib/bio/appl/ngs/cufflinks.rb', line 555

def build_compare_kb(gtf)
  unless File.exist?(gtf)
    STDERR.puts "File #{gtf} doesn't exist."
    return nil
  end

  dict = {} # build a hash with the combinations of data extracted from the GTF file: XLOC, TCONS, ENST, SYMBOL
  File.open(gtf,'r') do |f|
    f.each_line do |line|
      line=~/gene_id (.*?);/
      gene_id = $1.gsub(/"/,'').to_sym
      line=~/transcript_id (.*?);/
      transcript_id = $1.gsub(/"/,'').to_sym
      line=~/gene_name (.*?);/
      gene_name = $1.gsub(/"/,'').to_sym
      line=~/oId (.*?);/
      oid=$1.gsub(/"/,'').to_sym
      line=~/nearest_ref (.*?);/
      nearest_ref = $1.gsub(/"/,'').to_sym
      unless dict.key?(gene_id)
        dict[gene_id]={:gene_name=>gene_name,:transcripts=>{}}
      end
      unless dict[gene_id][:transcripts].key?(transcript_id)
        dict[gene_id][:transcripts][transcript_id]={:oid=>oid, :nearest_ref=>nearest_ref}
      end
      dict[transcript_id]={:gene_id=>gene_id, :gene_name=>gene_name, :oid=>oid, :nearest_ref=>nearest_ref}
      dict[gene_name]={:gene_id=>gene_id, :transcript_id=>transcript_id, :oid=>oid, :nearest_ref=>nearest_ref}
      dict[oid]={:gene_id=>gene_id, :transcript_id=>transcript_id, :gene_name=>gene_name, :nearest_ref=>nearest_ref}
      dict[nearest_ref]={:gene_id=>gene_id, :transcript_id=>transcript_id, :oid=>oid, :gene_name=>gene_name}
    end#lines
  end#file
  kb_filename = kb_name(gtf)
  File.open(kb_filename,'wb') do |fkb|
    #fkb.write(dict.to_json)
    Marshal.dump(dict,fkb)
  end #fkb
  dict
end
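
To make the per-line extraction easier to follow, the sketch below applies the same idea to a single attribute string. The sample line is made up to match the example entry in the description, and `gtf_attr` is a hypothetical helper, not a method of this class. Unlike the listing above, it returns nil for a missing attribute instead of raising on `$1.gsub`. Hash key names follow the description (`oid`, `nearest_ref`).

```ruby
# Made-up attribute column, shaped like the example entry in the description.
line = 'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; ' \
       'gene_name "RP11-304M2.1"; oId "ENST00000519787"; nearest_ref "ENST00000519787";'

# Hypothetical helper: value of a GTF attribute as a symbol, or nil when absent.
def gtf_attr(line, name)
  match = line.match(/\b#{Regexp.escape(name)} "?([^";]*)"?;/)
  match && match[1].to_sym
end

gene_id       = gtf_attr(line, "gene_id")
transcript_id = gtf_attr(line, "transcript_id")
gene_name     = gtf_attr(line, "gene_name")
oid           = gtf_attr(line, "oId")
nearest_ref   = gtf_attr(line, "nearest_ref")

dict = {}
# Gene-level entry: gene name plus a nested hash of its transcripts.
dict[gene_id] = { gene_name: gene_name,
                  transcripts: { transcript_id => { oid: oid, nearest_ref: nearest_ref } } }
# Transcript-level entry: a flat hash pointing back to the gene.
dict[transcript_id] = { gene_id: gene_id, gene_name: gene_name,
                        oid: oid, nearest_ref: nearest_ref }

p dict[:XLOC_000001][:transcripts].keys   # transcripts known for the gene
p dict[:TCONS_00000001][:gene_id]         # reverse lookup from transcript to gene
p gtf_attr(line, "class_code")            # absent attribute gives nil
```

Because every identifier type shares one hash, a lookup does not need to know in advance whether a key is an XLOC, TCONS, or reference ID.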

.exists_kb?(gtf) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/bio/appl/ngs/cufflinks.rb', line 542

def exists_kb?(gtf)
  File.exist?(kb_name(gtf))
end

.kb_name(gtf) ⇒ Object



# File 'lib/bio/appl/ngs/cufflinks.rb', line 538

def kb_name(gtf)
  gtf.sub(/\.[a-zA-Z0-9]*$/,".kb")
end
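
The substitution above is easiest to understand on concrete paths. The sketch copies the same expression into a standalone lambda; the file names are placeholders chosen for the example.

```ruby
# Same substitution as kb_name, copied into a lambda for illustration.
kb_name = ->(gtf) { gtf.sub(/\.[a-zA-Z0-9]*$/, ".kb") }

# Only the final extension is replaced.
puts kb_name.call("cuffcmp.combined.gtf")     # prints "cuffcmp.combined.kb"

# A file name without an extension does not match, so the path is
# returned unchanged, even when a directory name contains a dot.
puts kb_name.call("/data/run.1/transcripts")  # prints "/data/run.1/transcripts"
```

One consequence worth keeping in mind: for an input without an extension, the knowledge-base name equals the GTF path, so build_compare_kb as listed would open the input file itself for writing.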

.load_compare_kb(gtf) ⇒ Object

Returns the hash of associations read with Marshal from the file named by kb_name(gtf):

  • gene_id: transcript_id, gene_name, oid, nearest_ref
  • transcript_id: gene_id, gene_name, oid, nearest_ref
  • gene_name: gene_id, transcript_id, oid, nearest_ref
  • oid: gene_id, transcript_id, gene_name, nearest_ref
  • nearest_ref: gene_id, transcript_id, gene_name, oid



# File 'lib/bio/appl/ngs/cufflinks.rb', line 600

def load_compare_kb(gtf)
  #TODO rescue Exceptions
  kb_filename = kb_name(gtf)
  # Binary mode, since the file holds Marshal data; the block's value is returned.
  File.open(kb_filename,'rb') do |kb_dump|
    Marshal.load(kb_dump)
  end
end
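
The dump and load steps form a round trip, which the following self-contained sketch reproduces with a temporary directory and a tiny hand-written hash. The file name and hash contents are assumptions for the example. The commented lines at the end show how the class methods documented on this page might be combined; that part is untested here because it needs the bio-ngs gem and a real Cuffcompare GTF.

```ruby
require "tmpdir"

# Tiny stand-in for the hash that build_compare_kb produces.
dict = { XLOC_000001: { gene_name: :"RP11-304M2.1", transcripts: {} } }

loaded = Dir.mktmpdir do |dir|
  kb_path = File.join(dir, "cuffcmp.combined.kb")

  # Dump: binary mode keeps the Marshal byte stream intact on every platform.
  File.open(kb_path, "wb") { |f| Marshal.dump(dict, f) }

  # Load: only do this for files you created yourself, because
  # Marshal.load can instantiate arbitrary objects from untrusted data.
  File.open(kb_path, "rb") { |f| Marshal.load(f) }
end

puts loaded == dict   # prints "true"

# Possible load-or-build pattern with the class methods on this page:
#   compare = Bio::Ngs::Cufflinks::Compare
#   kb = compare.exists_kb?(gtf) ? compare.load_compare_kb(gtf) : compare.build_compare_kb(gtf)
```

Symbols and nested hashes survive the round trip unchanged, which is why the knowledge base can be queried with the same symbol keys after loading.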