Class: Mascot::DAT::Peptides

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/mascot/dat/peptides.rb

Overview

A parser for the peptide spectrum match (PSM) results of a Mascot DAT file. Unlike the other sections of a DAT file, this section should not be loaded into memory as one big chunk: it is often quite large, so it is read incrementally through Enumerable methods.

According to the Mascot documentation, a reasonably complete PSM entry looks like this:

q1_p1_db=01  # two-digit, zero-padded index of the search database
q1_p1=missed cleavages, (-1 indicates no match)
      peptide Mr,
      delta,
      number of ions matched,
      peptide string,
      peaks used from Ions1,
      variable modifications string,
      ions score,
      ion series found,
      peaks used from Ions2,
      peaks used from Ions3;
      "accession string":frame number:start:end:multiplicity, # data for first protein
      "accession string":frame number:start:end:multiplicity, # data for second protein, etc.
q1_p1_et_mods=modification mass,
              neutral loss mass,
              modification description
q1_p1_primary_nl=neutral loss string
q1_p1_drange=startPos:endPos
q1_p1_terms=residue,residue:residue,residue # flanking residues for each protein, in order
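To make the field layout above concrete, here is a minimal sketch that splits the value of a `qN_pM` line into a Hash. The helper `parse_psm_value` and the field labels are hypothetical (they are not the gem's `Mascot::DAT::PSM` API), and the sample line is made up. Protein entries are matched with a regex so that any comma or colon inside a quoted accession does not break the split.

```ruby
# Hypothetical labels for the comma-separated part of a qN_pM value,
# in the order listed by the Mascot documentation.
PSM_FIELDS = %i[
  missed_cleavages peptide_mr delta ions_matched peptide peaks_ions1
  var_mods ions_score ion_series peaks_ions2 peaks_ions3
].freeze

# Numeric fields and the conversion applied to each; the rest stay Strings.
CONVERSIONS = {
  missed_cleavages: :to_i, peptide_mr: :to_f, delta: :to_f,
  ions_matched: :to_i, ions_score: :to_f
}.freeze

# Parse the text after "=" on a qN_pM line.
# Returns nil for "-1", which marks a query with no match.
def parse_psm_value(value)
  return nil if value == "-1"

  peptide_part, protein_part = value.split(";", 2)
  fields = PSM_FIELDS.zip(peptide_part.split(",")).to_h
  CONVERSIONS.each { |key, conv| fields[key] = fields[key].public_send(conv) }

  # Each protein is "accession":frame:start:end:multiplicity
  proteins = (protein_part || "").scan(/"([^"]*)":(-?\d+):(\d+):(\d+):(\d+)/)
  fields[:proteins] = proteins.map do |acc, frame, start, stop, mult|
    { accession: acc, frame: frame.to_i, start: start.to_i,
      end: stop.to_i, multiplicity: mult.to_i }
  end
  fields
end

# Made-up example line
line = 'q1_p1=0,1065.502625,0.001234,5,SAMPLEPK,12,0000000000,35.21,' \
       '0000002000000000000,0,0;"sp|P02769|ALBU_BOVIN":0:25:32:1'
_key, value = line.split("=", 2)
psm = parse_psm_value(value)
psm[:peptide] # => "SAMPLEPK"
```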

Constructor Details

#initialize(dat_file, byteoffset, cache_psm_index = true) ⇒ Peptides

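To create a Peptides enumerable, pass in the DAT file handle and the byte offset of the peptides section (the position of its opening MIME boundary line). When cache_psm_index is true (the default), the constructor also records the byte offset of every PSM so that #psm can seek to it directly.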



# File 'lib/mascot/dat/peptides.rb', line 39

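def initialize(dat_file, byteoffset, cache_psm_index=true)
  # offset of the section's opening MIME boundary line
  @byteoffset = byteoffset
  # end of the section; set by index_psm_positions
  @endbytepos = nil

  @file = dat_file

  @file.pos = @byteoffset
  @curr_psm = [1,1]
  # nested index of PSM offsets: @psmidx[query][rank]
  @psmidx = []
  @cache_psm_index = cache_psm_index
  # scan the section once to find its end (and optionally index every PSM)
  index_psm_positions()
end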
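Callers need the section's byte offset before they can construct a Peptides object. The sketch below finds it, assuming the usual Mascot MIME layout in which each section starts with a boundary line immediately followed by a `Content-Type: application/x-Mascot; name="..."` header. `section_byteoffset` is a hypothetical helper, not part of the gem.

```ruby
require "stringio"

# Hypothetical helper: return the byte offset of the boundary line that
# opens the named section (the line just before its Content-Type header),
# or nil if the section is not present.
def section_byteoffset(io, name)
  header = %(Content-Type: application/x-Mascot; name="#{name}")
  io.rewind
  prev_pos = nil # offset of the previously read line
  until io.eof?
    pos = io.pos
    return prev_pos if io.readline.chomp == header
    prev_pos = pos
  end
  nil
end

# Usage (hypothetical file name):
#   File.open("F001234.dat", "rb") do |f|
#     offset   = section_byteoffset(f, "peptides")
#     peptides = Mascot::DAT::Peptides.new(f, offset)
#   end
```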

Instance Attribute Details

#byteoffset ⇒ Object (readonly)

The byte offset in the DAT file at which the peptides section starts (its opening MIME boundary line).



# File 'lib/mascot/dat/peptides.rb', line 35

def byteoffset
  @byteoffset
end

#endbytepos ⇒ Object (readonly)

The byte offset at which the peptides section ends, i.e. the start of its closing MIME boundary line. It is set by #index_psm_positions.



# File 'lib/mascot/dat/peptides.rb', line 35

def endbytepos
  @endbytepos
end

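#psmidx ⇒ Object (readonly)

An index of the byte offsets of the PSM entries, stored as an array of arrays: psmidx[q][p] is the offset of the first line of the PSM for query q and rank p. It is only populated when cache_psm_index is true.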



# File 'lib/mascot/dat/peptides.rb', line 35

def psmidx
  @psmidx
end

Instance Method Details

#each ⇒ Object

Iterates through all of the Mascot::DAT::PSM entries in the peptides section, skipping queries that have no match.

Returns:

  • Enumerator



# File 'lib/mascot/dat/peptides.rb', line 117

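def each
  # without a block, return an Enumerator as documented
  return enum_for(:each) unless block_given?
  # start after the section's opening MIME boundary line
  rewind
  while @file.pos < @endbytepos
    psm = next_psm()
    next if psm.nil?
    yield psm
  end
end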
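To illustrate why the class streams through Enumerable instead of loading the section, here is a minimal stand-in (`LineRecords`, hypothetical) that yields one record per `qN_pM=` line from an IO. `Enumerable#first(2)` stops the underlying `each` after two records, so the rest of the data is never read. Calling `each` without a block returns an Enumerator via `enum_for`.

```ruby
require "stringio"

# Hypothetical stand-in for Peptides: streams records from an IO
# instead of reading the whole section into memory.
class LineRecords
  include Enumerable

  attr_reader :lines_read # how many lines each has consumed so far

  def initialize(io)
    @io = io
    @lines_read = 0
  end

  def each
    # Without a block, hand back an Enumerator (as Enumerable callers expect)
    return enum_for(:each) unless block_given?

    @io.rewind
    @io.each_line do |line|
      @lines_read += 1
      yield line.chomp if line.match?(/\Aq\d+_p\d+=/)
    end
  end
end

io = StringIO.new((1..1000).map { |q| "q#{q}_p1=..." }.join("\n"))
records = LineRecords.new(io)
records.first(2)   # => ["q1_p1=...", "q2_p1=..."]
records.lines_read # => 2; the other 998 lines were never read
```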

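#index_psm_positions ⇒ Object

Scans the peptides section once, recording where it ends in endbytepos and, if cache_psm_index is true, the byte offset of each PSM in psmidx. It is called by the constructor.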



# File 'lib/mascot/dat/peptides.rb', line 52

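def index_psm_positions
  # create an in-memory index of PSM byte offsets: @psmidx[q][p]
  q, p = 0, 0
  @boundary_line = @file.readline
  # escape the boundary so any regex metacharacters are matched literally
  @boundary = Regexp.new(Regexp.escape(@boundary_line.chomp))
  @file.each do |line|
    break if line =~ @boundary
    next unless @cache_psm_index
    # only qN_pM... lines belong to a PSM (skip MIME headers, blank lines)
    next unless line =~ /^q(\d+)_p(\d+)/
    i, j = $1.to_i, $2.to_i
    next if q == i && p == j # another line of the same PSM
    @psmidx[i] ||= []
    # IO#pos counts bytes, so step back by bytesize rather than length
    @psmidx[i][j] = @file.pos - line.bytesize
    q, p = i, j
  end
  @endbytepos = @file.pos - @boundary_line.bytesize
  rewind
end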
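The indexing pass above can be sketched in isolation as follows. `build_psm_index` is a hypothetical helper that assumes the IO is positioned on the section's opening boundary line and that the section ends at the next line beginning with that boundary (either the next section or the final `--` terminator). It makes one sequential read and stores only integers, which is what keeps the index small compared with the section itself.

```ruby
require "stringio"

# Hypothetical sketch: record where each qN_pM block starts,
# as index[q][p] = byte offset of its first line.
def build_psm_index(io)
  boundary = io.readline.chomp
  index = []
  last_key = nil
  io.each_line do |line|
    break if line.start_with?(boundary)   # next section (or end) reached
    m = line.match(/\Aq(\d+)_p(\d+)/)
    next unless m                         # headers, blank lines, etc.
    key = [m[1].to_i, m[2].to_i]
    next if key == last_key               # later line of the same PSM
    # io.pos is just past +line+; subtracting bytesize gives its start
    (index[key[0]] ||= [])[key[1]] = io.pos - line.bytesize
    last_key = key
  end
  index
end
```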

#next_psm ⇒ Object

Returns the next Mascot::DAT::PSM from the DAT file. It returns nil when there are no more PSMs, when the current query has no match (value -1), or when the line read is not PSM data.

Returns:

  • Mascot::DAT::PSM



# File 'lib/mascot/dat/peptides.rb', line 90

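def next_psm
  return nil if @file.pos >= @endbytepos
  # the first line of a block identifies the query & rank
  tmp = []
  tmp << @file.readline.chomp
  # skip lines that are not PSM data (MIME headers, blank lines)
  return nil unless tmp[0] =~ /^q(\d+)_p(\d+)/
  q = $1
  p = $2
  # skip when there are no peptides (value equals -1)
  k,v = tmp[0].split("=", 2)
  return nil if v == "-1"

  # collect the remaining lines of this PSM; the [_=] anchor keeps
  # q1_p1 from also matching q1_p10
  tmp_pos = @file.pos
  @file.each do |l|
    break if l =~ @boundary
    break unless l =~ /^q#{q}_p#{p}[_=]/
    tmp << l.chomp
    tmp_pos = @file.pos
  end
  # step back to the start of the first line not belonging to this PSM
  @file.pos = tmp_pos

  Mascot::DAT::PSM.parse(tmp)
end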
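The grouping rule that #next_psm relies on, a PSM block being the run of consecutive lines that share one `qN_pM` prefix, can be sketched on an in-memory array. `psm_block` is a hypothetical helper; the example also shows why the prefix needs a trailing `[_=]`.

```ruby
# Hypothetical sketch: return the lines of the PSM block starting at
# lines[start]. The trailing [_=] makes the prefix exact, so "q1_p1"
# does not also claim "q1_p10=..." lines.
def psm_block(lines, start = 0)
  m = lines[start].match(/\Aq(\d+)_p(\d+)[_=]/)
  return [] unless m # not a PSM line

  prefix = /\Aq#{m[1]}_p#{m[2]}[_=]/
  lines[start..-1].take_while { |line| line.match?(prefix) }
end

lines = ["q1_p1=a", "q1_p1_terms=K,-", "q1_p2=b", "q1_p2_terms=R,-"]
psm_block(lines)                   # => ["q1_p1=a", "q1_p1_terms=K,-"]
psm_block(lines, 2)                # => ["q1_p2=b", "q1_p2_terms=R,-"]
"q1_p10=c".match?(/\Aq1_p1/)       # => true (too loose)
"q1_p10=c".match?(/\Aq1_p1[_=]/)   # => false
```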

#psm(q, p) ⇒ Object

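Returns the Mascot::DAT::PSM for query q and peptide rank p, or nil if that PSM was not indexed or has no match. It relies on the PSM index, so the object must be created with cache_psm_index = true.

Parameters:

  • q

    Integer

  • p

    Integer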

Returns:

  • Mascot::DAT::PSM



# File 'lib/mascot/dat/peptides.rb', line 83

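def psm q,p
  # return nil (rather than raising) if the query/rank was never indexed
  offsets = @psmidx[q]
  return nil if offsets.nil? || offsets[p].nil?
  @file.pos = offsets[p]
  next_psm
end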
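Random access by query and rank amounts to one index lookup and one seek. Below is a minimal sketch with a hypothetical `read_psm_line` helper that returns only the first line of the block, assuming a nested `index[q][p]` of byte offsets like the one described for psmidx. `Array#dig` returns nil when any level is missing, so an unknown query or rank yields nil instead of raising.

```ruby
require "stringio"

# Hypothetical sketch: seek straight to the PSM for (q, p) and read
# its first line, or return nil if it was never indexed.
def read_psm_line(io, index, q, p)
  offset = index.dig(q, p)
  return nil if offset.nil?

  io.pos = offset
  io.readline.chomp
end

data  = "q1_p1=-1\nq2_p1=0,x\nq2_p2=1,y\n"
io    = StringIO.new(data)
index = [nil, [nil, 0], [nil, 9, 19]] # byte offsets of each line above
read_psm_line(io, index, 2, 2) # => "q2_p2=1,y"
read_psm_line(io, index, 3, 1) # => nil
```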

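#rewind ⇒ Object

Moves the file position to the first line after the opening MIME boundary of the peptides section.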



# File 'lib/mascot/dat/peptides.rb', line 75

def rewind
  # skip the boundary line; IO#pos counts bytes, hence bytesize
  @file.pos = @byteoffset + @boundary_line.bytesize
end
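The byte arithmetic in #rewind and the indexing pass depends on the fact that `IO#pos` counts bytes while `String#length` counts characters. The small sketch below shows the difference on a hypothetical line containing a multibyte UTF-8 character; `line_start` is a hypothetical helper.

```ruby
require "stringio"

# Hypothetical helper: given an IO positioned just after +line+,
# return the byte offset where that line started.
def line_start(io, line)
  io.pos - line.bytesize
end

# Hypothetical section text; "α" takes two bytes in UTF-8.
text  = "q1_p1_et_mods=15.994915,0.000000,Hypothetical mod α\nq1_p2=-1\n"
io    = StringIO.new(text)
first = io.readline
line_start(io, first)  # => 0
io.pos - first.length  # => 1, off by one because of the multibyte "α"
```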