Class: Mascot::DAT::Peptides

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/mascot/dat/peptides.rb

Overview

An iterator over the peptide-spectrum-match (PSM) results of a Mascot DAT file. Unlike the other sections of a DAT file, this section should not be loaded into memory all at once. It is often quite large, so it is meant to be read through the provided Enumerable or random access methods.
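The streaming access pattern described above can be sketched with a small stand-in class. `LineStream` and the sample lines are hypothetical and are not part of Mascot::DAT; the sketch only assumes that a class which defines `each` and includes Enumerable can yield records one at a time without holding the whole section.

```ruby
require 'stringio'

# Hypothetical stand-in for a streamed section; not part of Mascot::DAT.
class LineStream
  include Enumerable

  def initialize(io)
    @io = io
  end

  # Rewind, then yield one record at a time so the whole section is
  # never held in memory.
  def each
    return enum_for(:each) unless block_given?
    @io.rewind
    @io.each_line { |line| yield line.chomp }
  end
end

io = StringIO.new("q1_p1=a\nq1_p2=b\nq2_p1=c\n")
stream = LineStream.new(io)

# Enumerable methods come for free once #each is defined.
first_two = stream.first(2)
query_two = stream.select { |l| l.start_with?("q2_") }
```

With the real class, the equivalent calls would presumably be made on `my_dat.peptides`, which yields Mascot::DAT::PSM objects rather than raw lines.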

Constructor Details

#initialize(dat, section_label, cache_psm_index = true) ⇒ Peptides

Returns a new instance of Peptides.

Parameters:

  • dat (Mascot::DAT)

    Source DAT file

  • section_label (Symbol)

    Section header, one of :peptides or :decoy_peptides

  • cache_psm_index (Boolean) (defaults to: true)

    Whether to cache the PSM index



# File 'lib/mascot/dat/peptides.rb', line 20

def initialize(dat, section_label, cache_psm_index=true)
  # create our own filehandle, since other operations may interfere with the
  # read position of a shared one
  @dat = Mascot::DAT.open(dat.dat_file.path)
  @filehandle = @dat.dat_file
  @section_label = section_label
  self.rewind
  @curr_psm = [1,1]
  @psmidx = {}
  @endbytepos = Float::INFINITY
  if cache_psm_index
    index_psm_positions()
  end
end

Instance Attribute Details

#psmidx ⇒ Hash{ Fixnum => Hash{ Fixnum => Fixnum }} (readonly)

A nested Hash index of the byte offsets of the peptide-spectrum-match entries. The outer keys are query numbers and the inner keys are peptide ranks (both Fixnum), giving the structure:

{ query_number => { peptide_rank => byte_position } }

To access a particular entry, it is better to use the #psm method.
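As an illustration of the index layout, here is a minimal sketch that builds the same nested structure from an in-memory IO. `build_psm_index` is a hypothetical helper and the sample lines are made up; the library's own indexing code is not shown on this page and may differ.

```ruby
require 'stringio'

# Build { query_number => { peptide_rank => byte_position } } from an IO.
# Hypothetical helper, not the library's implementation.
def build_psm_index(io)
  index = Hash.new { |h, k| h[k] = {} }
  io.rewind
  until io.eof?
    pos = io.pos                      # offset of the line about to be read
    line = io.readline
    # only the primary qN_pM= line marks the start of an entry
    if line =~ /\Aq(\d+)_p(\d+)=/
      index[$1.to_i][$2.to_i] = pos
    end
  end
  index
end

io = StringIO.new("q1_p1=AAA\nq1_p1_terms=K,R\nq1_p2=BBB\nq2_p1=CCC\n")
index = build_psm_index(io)

# Random access: seek to the stored offset and read from there.
io.pos = index[1][2]
line = io.readline.chomp
```

Storing offsets rather than parsed entries keeps the index small even when the section itself is large.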

Returns:

  • (Hash{ Fixnum => Hash{ Fixnum => Fixnum }})

    The nested hash of query peptide match byte offsets



# File 'lib/mascot/dat/peptides.rb', line 15

def psmidx
  @psmidx
end

Instance Method Details

#each {|Mascot::DAT::PSM| ... } ⇒ Object

Iterate through all of the Mascot::DAT::PSM entries in the DAT file.

Yields:

  • (Mascot::DAT::PSM)


# File 'lib/mascot/dat/peptides.rb', line 87

def each
  self.rewind
  while psm = self.next_psm
    yield psm
  end
end

#next_psm ⇒ Mascot::DAT::PSM, NilClass

Returns the next Mascot::DAT::PSM from the DAT file. If there is no other PSM, then it returns nil.
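The read-ahead behaviour can be sketched as follows: collect the first line plus every following line that shares its qN_pM prefix, then step the cursor back when a line from the next entry is seen. `read_entry` is a hypothetical helper that returns raw lines rather than a Mascot::DAT::PSM, and it ignores section boundaries.

```ruby
require 'stringio'

# Read one entry: the first line plus all following lines that share its
# qN_pM prefix. Returns nil at end of input. Hypothetical helper.
def read_entry(io)
  return nil if io.eof?
  first = io.readline.chomp
  return nil unless first =~ /\Aq(\d+)_p(\d+)/
  # (?!\d) stops q1_p1 from also matching q1_p10
  prefix = /\Aq#{$1}_p#{$2}(?!\d)/
  buffer = [first]
  loop do
    pos = io.pos
    break if io.eof?
    line = io.readline.chomp
    unless line =~ prefix
      io.pos = pos            # un-read the line that belongs to the next entry
      break
    end
    buffer << line
  end
  buffer
end

io = StringIO.new("q1_p1=AAA\nq1_p1_terms=K,R\nq1_p2=BBB\n")
entry_a = read_entry(io)
entry_b = read_entry(io)
entry_c = read_entry(io)
```

Restoring the position after the look-ahead is what lets consecutive calls pick up exactly where the previous entry ended.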

Returns:

  • (Mascot::DAT::PSM, NilClass)


# File 'lib/mascot/dat/peptides.rb', line 57

def next_psm
  if @filehandle.pos >= @endbytepos
    return nil
  end
  # get the initial values for query & rank
  buffer = [@filehandle.readline.chomp]
  buffer[0] =~ /q(\d+)_p(\d+)/
  q,p = $1, $2
  @curr_psm = [q,p]
  prev_pos = @filehandle.pos
  @filehandle.each do |l|
    l.chomp!
    # break if we have reached the boundary
    if l =~ @boundary
      @endbytepos = @filehandle.pos - @dat.boundary_string.length
      break
    end
    # break if we are on another PSM; (?!\d) keeps q1_p1 from matching q1_p10
    break unless l =~ /^q#{q}_p#{p}(?!\d)/
    buffer << l
    prev_pos = @filehandle.pos
  end
  # rewind the cursor to the last hit
  @filehandle.pos = prev_pos
  # return the new PSM
  Mascot::DAT::PSM.new(buffer)
end

#psm(query_number, rank) ⇒ Mascot::DAT::PSM

Returns the Mascot::DAT::PSM for the given query number and peptide rank.

Examples:

my_dat.peptides.psm(1,1) # => Mascot::DAT::PSM for query 1 peptide 1

Parameters:

  • query_number (Fixnum)
  • rank (Fixnum)

Returns:

  • (Mascot::DAT::PSM)

Raises:

  • (Exception)

    if the given query number and rank are not present in the PSM index



# File 'lib/mascot/dat/peptides.rb', line 46

def psm(query_number, rank)
  if @psmidx[query_number] && @psmidx[query_number][rank]
    # jump to the indexed byte offset and read the entry from there
    @filehandle.pos = @psmidx[query_number][rank]
    next_psm
  else
    raise Exception.new "Invalid PSM specification (#{query_number},#{rank})"
  end
end

#rewind ⇒ Object

Rewind the cursor to the start of the peptides section (e.g. q1_p1=…)



# File 'lib/mascot/dat/peptides.rb', line 35

def rewind
  @dat.goto(@section_label)
  1.upto(2) { @filehandle.readline }
end