Class: Mascot::DAT::Peptides
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/mascot/dat/peptides.rb
Overview
A parser for the peptide spectrum match (PSM) results of a Mascot DAT file. Unlike the other sections of a DAT file, this section should not be loaded into memory as one big chunk. It is often quite large, so it is meant to be accessed one PSM at a time through Enumerable methods.
From the Mascot documentation, the following represents a reasonably complete PSM:
q1_p1_db=01 # two-digit index of the search DB, zero-padded
q1_p1=missed cleavages, (-1 indicates no match)
peptide Mr,
delta,
number of ions matched,
peptide string,
peaks used from Ions1,
variable modifications string,
ions score,
ion series found,
peaks used from Ions2,
peaks used from Ions3;
“accession string”:frame number:start:end:multiplicity, # data for first protein
“accession string”:frame number:start:end:multiplicity, # data for second protein, etc.
q1_p1_et_mods=modification mass,
neutral loss mass,
modification description
q1_p1_primary_nl=neutral loss string
q1_p1_drange=startPos:endPos
q1_p1_terms=residue,residue:residue,residue # flanking AA for each protein, in order
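To make the layout above concrete, here is a minimal sketch that splits a single `qN_pM=` line into its peptide fields and protein references. It is not the library's parser (that is Mascot::DAT::PSM.parse); the method name `parse_psm_line`, the field labels in `PSM_FIELDS`, and the sample line are all hypothetical, and the sketch assumes accessions are wrapped in straight double quotes.

```ruby
# Illustrative labels for the comma-separated fields, in the order listed above.
PSM_FIELDS = %i[missed_cleavages peptide_mr delta ions_matched peptide
                peaks_ions1 var_mods ions_score ion_series
                peaks_ions2 peaks_ions3].freeze

# Hypothetical helper: parse one "qN_pM=..." line into a Hash.
# Returns nil for non-PSM keys and for "-1" (no match).
def parse_psm_line(line)
  key, value = line.chomp.split("=", 2)
  m = key.match(/\Aq(\d+)_p(\d+)\z/)
  return nil if m.nil? || value == "-1"

  # Peptide fields come before the semicolon, protein references after it.
  peptide_part, protein_part = value.split(";", 2)
  fields = PSM_FIELDS.zip(peptide_part.split(",")).to_h

  # Accessions may contain commas, so match the quoted form instead of splitting.
  pattern = /"([^"]*)":(-?\d+):(-?\d+):(-?\d+):(-?\d+)/
  proteins = protein_part.to_s.scan(pattern).map do |acc, frame, start, stop, mult|
    { accession: acc, frame: frame.to_i, start: start.to_i,
      end: stop.to_i, multiplicity: mult.to_i }
  end

  { query: m[1].to_i, rank: m[2].to_i, fields: fields, proteins: proteins }
end

# Made-up sample line, for illustration only.
sample = 'q1_p1=0,1234.5678,0.0012,7,PEPTIDEK,18,0000000000,45.67,' \
         '0002000000000000000,0,0;"P12345":0:10:17:1,"Q67890":0:55:62:1'
psm = parse_psm_line(sample)
puts psm[:fields][:peptide]
puts psm[:proteins].map { |pr| pr[:accession] }.join(", ")
```

All field values are left as strings here; converting scores and masses to numbers is a choice left to the caller.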
Instance Attribute Summary
-
#byteoffset ⇒ Object
readonly
The byte offset at which the peptides section starts in the DAT file.
-
#endbytepos ⇒ Object
readonly
The byte position at which the peptides section ends.
-
#psmidx ⇒ Object
readonly
An index of the byte positions of the PSM entries, stored as a nested array addressed by query number and peptide number (psmidx[q][p]).
Instance Method Summary
-
#each ⇒ Object
Iterate through all of the PSM entries in the DAT file.
- #index_psm_positions ⇒ Object
-
#initialize(dat_file, byteoffset, cache_psm_index = true) ⇒ Peptides
constructor
To create a peptides enumerable, you need to pass in the dat file handle and the byte offset of the peptides section.
-
#next_psm ⇒ Object
Returns the next PSM from the DAT file.
-
#psm(q, p) ⇒ Object
Return a specific PSM identified for query q and peptide number p.
- #rewind ⇒ Object
Constructor Details
#initialize(dat_file, byteoffset, cache_psm_index = true) ⇒ Peptides
To create a peptides enumerable, you need to pass in the dat file handle and the byte offset of the peptides section.
# File 'lib/mascot/dat/peptides.rb', line 39
def initialize(dat_file, byteoffset, cache_psm_index=true)
  @byteoffset = byteoffset
  @endbytepos = nil
  @file = dat_file
  @file.pos = @byteoffset
  @curr_psm = [1,1]
  @psmidx = []
  @cache_psm_index = cache_psm_index
  index_psm_positions()
end
Instance Attribute Details
#byteoffset ⇒ Object (readonly)
The byte offset at which the peptides section starts in the DAT file.
# File 'lib/mascot/dat/peptides.rb', line 35
def byteoffset
  @byteoffset
end
#endbytepos ⇒ Object (readonly)
The byte position at which the peptides section ends. It is nil until the section has been indexed.
# File 'lib/mascot/dat/peptides.rb', line 35
def endbytepos
  @endbytepos
end
#psmidx ⇒ Object (readonly)
An index of the byte positions of the PSM entries, stored as a nested array addressed by query number and peptide number (psmidx[q][p]).
# File 'lib/mascot/dat/peptides.rb', line 35
def psmidx
  @psmidx
end
Instance Method Details
#each ⇒ Object
Iterate through all of the Mascot::DAT::PSM entries in the DAT file.
# File 'lib/mascot/dat/peptides.rb', line 117
def each
  @file.pos = @byteoffset
  while @file.pos < @endbytepos
    psm = next_psm()
    next if psm.nil?
    yield psm
  end
end
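Because the class defines #each and includes Enumerable, methods such as select, map, and lazy become available without holding every record in memory. The sketch below shows the same pattern on a stand-in class; `ScoredLines` and its comma-separated sample data are hypothetical and only mimic the shape of the real class, which yields Mascot::DAT::PSM objects.

```ruby
require "stringio"

# Hypothetical stand-in: streams records from an IO, one per line.
class ScoredLines
  include Enumerable

  def initialize(io)
    @io = io
  end

  # Enumerable builds select, map, first, lazy, etc. on top of this method.
  def each
    return enum_for(:each) unless block_given?

    @io.rewind
    @io.each_line { |line| yield line.chomp.split(",") }
  end
end

# Made-up sample data: peptide string, score.
io = StringIO.new("PEPTIDEK,40.0\nPEPTLDEK,12.5\nSAMPLER,55.1\n")
records = ScoredLines.new(io)

# Eager filter: walks the whole stream once.
high = records.select { |rec| rec[1].to_f > 30 }.map { |rec| rec[0] }

# Lazy filter: stops reading as soon as the first hit is found.
first_hit = records.lazy.select { |rec| rec[1].to_f > 50 }.first

puts high.inspect
puts first_hit.inspect
```

The `enum_for` guard is an addition in this sketch; the #each shown above always expects a block.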
#index_psm_positions ⇒ Object
# File 'lib/mascot/dat/peptides.rb', line 52
def index_psm_positions
  # create an in-memory index of PSM byteoffsets
  q,p = 0
  @boundary_line = @file.readline
  @boundary = Regexp.new(@boundary_line)
  @file.each do |line|
    break if line =~ @boundary
    if @cache_psm_index
      line =~ /q(\d+)_p(\d+)/
      i,j = $1.to_i, $2.to_i
      next if q == i && p == j
      unless @psmidx[i].kind_of? Array
        q = i
        @psmidx[q] = []
      end
      @psmidx[i][j] = @file.pos - line.length
      q,p = i,j
    end
  end
  @endbytepos = @file.pos - @boundary_line.length
  rewind
end
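The indexing idea in the method above, recording where each PSM starts so it can be reached later with a seek, can be tried in isolation. The following is a minimal sketch on a StringIO, not the library's code: the helper `index_offsets`, the `--boundary` marker, and the sample records are all assumptions made for illustration.

```ruby
require "stringio"

# Hypothetical helper: build idx[q][p] = byte offset of the "qN_pM=" line.
def index_offsets(io, boundary)
  idx = []
  loop do
    pos = io.pos # offset of the line about to be read
    line = io.gets
    break if line.nil? || line.start_with?(boundary)

    # Only the primary "qN_pM=" line starts a PSM; "_terms" etc. are skipped.
    m = line.match(/\Aq(\d+)_p(\d+)=/)
    next unless m

    q = m[1].to_i
    p = m[2].to_i
    (idx[q] ||= [])[p] = pos
  end
  idx
end

# Made-up sample section, for illustration only.
data = <<~DAT
  q1_p1=0,1000.5,0.01,5,PEPTIDEK,10,0000000000,40.0,0002,0,0;"A":0:1:8:1
  q1_p1_terms=K,A
  q1_p2=0,1000.6,0.02,4,PEPTLDEK,10,0000000000,35.0,0002,0,0;"B":0:1:8:1
  q1_p2_terms=R,G
  q2_p1=-1
  --boundary
DAT

io = StringIO.new(data)
idx = index_offsets(io, "--boundary")

# Random access: jump straight to query 1, peptide 2.
io.pos = idx[1][2]
first_line = io.gets
puts first_line
```

One design difference: the sketch reads the position before each line instead of subtracting `line.length` afterwards, since String#length counts characters rather than bytes and the two differ when a line contains multibyte characters.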
#next_psm ⇒ Object
Returns the next Mascot::DAT::PSM from the DAT file. If there is no other PSM, then it returns nil.
# File 'lib/mascot/dat/peptides.rb', line 90
def next_psm
  return nil if @file.pos >= @endbytepos
  # get the initial values for query & rank
  tmp = []
  tmp << @file.readline.chomp
  k,v = tmp[0].split "="
  # skip when there are no peptides (value equals -1)
  return nil if v == "-1"
  tmp[0] =~ /q(\d+)_p(\d+)/
  q = $1
  p = $2
  tmp_pos = @file.pos
  @file.each do |l|
    break if l =~ @boundary
    break unless l =~ /^q#{q}_p#{p}/
    tmp << l.chomp
    tmp_pos = @file.pos
  end
  @file.pos = tmp_pos
  Mascot::DAT::PSM.parse(tmp)
end
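The core of the method above is a read-ahead loop: collect consecutive lines that share the same q/p prefix, then move the file position back to the first line that belongs to the next PSM. A minimal sketch of that technique follows; `read_psm_lines` is a hypothetical helper that returns the raw lines rather than a parsed Mascot::DAT::PSM, and the sample input is made up.

```ruby
require "stringio"

# Hypothetical helper: return the lines of the next PSM, or nil when the
# stream is exhausted, the line is not a PSM line, or the value is "-1".
def read_psm_lines(io)
  first = io.gets
  return nil if first.nil?

  first = first.chomp
  m = first.match(/\Aq(\d+)_p(\d+)/)
  return nil if m.nil? || first.end_with?("=-1")

  # The trailing underscore keeps "q1_p1" from also matching "q1_p10".
  prefix = "q#{m[1]}_p#{m[2]}_"
  lines = [first]
  loop do
    pos = io.pos
    line = io.gets
    break if line.nil?

    unless line.start_with?(prefix)
      io.pos = pos # un-read the line that belongs to the next PSM
      break
    end
    lines << line.chomp
  end
  lines
end

# Made-up sample input, for illustration only.
io = StringIO.new("q1_p1=0,1\nq1_p1_terms=K,A\nq1_p2=0,2\nq2_p1=-1\n")
while (group = read_psm_lines(io))
  puts group.inspect
end
```

As with #next_psm, a "-1" entry yields nil, so the `while` loop here stops at the first query without a match; a caller that wants to continue past such entries needs to tell them apart from the end of the section, which #each does by comparing the file position with the section's end.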
#psm(q, p) ⇒ Object
Return a specific Mascot::DAT::PSM identified for query q and peptide number p
# File 'lib/mascot/dat/peptides.rb', line 83
def psm q,p
  @file.pos = @psmidx[q][p]
  next_psm
end
#rewind ⇒ Object
# File 'lib/mascot/dat/peptides.rb', line 75
def rewind
  @file.pos = @byteoffset + @boundary_line.length
end