Class: Sequest::PepXML

Inherits: Object

Includes: SpecIDXML

Defined in: lib/ms/sequest/pepxml.rb

Defined Under Namespace

Classes: AAModification, MSMSPipelineAnalysis, MSMSRunSummary, Modifications, Parameters, SearchDatabase, SearchHit, SearchResult, SearchSummary, SpectrumQuery, TerminalModification

Constant Summary

DEF_VERSION =

Default_Options =

{
  :out_path => '.',
  #:backup_db_path => '.',
  # a PepXML option
  :pepxml_version => DEF_VERSION,  
  ## MSMSRunSummary options:
  # (:sample_enzyme is not set here; if given, it must be a string
  # recognized in sample_enzyme.rb or your own SampleEnzyme object)
  :ms_manufacturer => 'ThermoFinnigan',
  :ms_model => 'LCQ Deca XP Plus',
  :ms_ionization => 'ESI',
  :ms_mass_analyzer => 'Ion Trap',
  :ms_detector => 'UNKNOWN',
  :ms_data => '.',      # path to ms data files (raw or mzXML)
  :raw_data_type => "raw",
  :raw_data => ".mzXML", ## even if you don't have it?
  ## SearchSummary options:
  :out_data_type => "out", ## may be srf? pepXML may not recognize srf yet
  :out_data => ".tgz", ## may be srf?
  :copy_mzxml => false, # copy the mzXML file to out_path (creating it if necessary)
  :print => false, # write each PepXML object to file
}
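Each factory method below begins with `opts = Default_Options.merge(opts)`, so any key the caller passes overrides the default and omitted keys keep their default. A minimal sketch of that behavior, using a hypothetical `DEFAULTS` hash that copies only a few of the entries above:

```ruby
# Hypothetical stand-in for Sequest::PepXML::Default_Options (subset only).
DEFAULTS = {
  :out_path => '.',
  :ms_model => 'LCQ Deca XP Plus',
  :ms_mass_analyzer => 'Ion Trap',
  :copy_mzxml => false,
  :print => false,
}.freeze

# Caller-supplied keys win; everything else falls back to DEFAULTS.
# Hash#merge returns a new Hash, so later opts.delete calls on the result
# leave the defaults untouched.
def resolve_options(user_opts = {})
  DEFAULTS.merge(user_opts)
end

opts = resolve_options(:out_path => '/tmp/pepxml', :print => true)
opts[:out_path]  # => "/tmp/pepxml"
opts[:ms_model]  # => "LCQ Deca XP Plus"
```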

Class Attribute Summary

.pepxml_version ⇒ Object

Returns the value of attribute pepxml_version.

Instance Attribute Summary

#avg_parent ⇒ Object

Returns the value of attribute avg_parent.
#base_name ⇒ Object

the full path name (no extension).
#h_plus ⇒ Object

Returns the value of attribute h_plus.
#msms_pipeline_analysis ⇒ Object

Returns the value of attribute msms_pipeline_analysis.
#pepxml_version ⇒ Object

Returns the value of attribute pepxml_version.

Class Method Summary

._prot_num_and_first_prot_by_pep(pep_array) ⇒ Object

updates the private attrs _num_prots and _first_prot on bioworks pep objects.
.base_name_noext(file) ⇒ Object

Given a filename in any style (including Windows paths with backslashes), returns its base name with no file extension.
.make_base_name(path, filename) ⇒ Object

Joins the base name of filename onto path, using the same separator (/ or \) that path already uses.
.new_from_srf(srf, opts = {}) ⇒ Object

Creates a PepXML object from an SRF file; srf may be an SRF object or a filename. :ms_model is always read from the SRF header, and :ms_mass_analyzer is derived from it for LTQ Orbitrap and LCQ Deca XP instruments, overriding the defaults and anything passed in. See Sequest::PepXML::Default_Options for the other defaults; unless given, :out_path defaults to '.'.
.set_from_bioworks(bioworks_file, opts = {}) ⇒ Object

Takes an .srg or bioworks.xml file and returns a list of PepXML objects. If possible, ensures that an mzXML file is present for each pepXML file; with :print => true, writes each pepXML file. NOTE: num_tol_term and num_missed_cleavages are both calculated from the sample_enzyme.
.set_from_bioworks_xml(bioworks, params, opts = {}) ⇒ Object

Takes Bioworks 3.2/3.3 XML output (with no filters) and returns a list of PepXML objects. params is a sequest.params file (or Sequest::Params object); bioworks is an exported multi-consensus view bioworks.xml file. pepxml_version 0 corresponds to TPP 1.2.3 and 18 to TPP 2.8.2, 2.8.3 and 2.9.2; only 18 is currently supported.

Instance Method Summary

#date ⇒ Object
#doctype ⇒ Object

Returns the DOCTYPE declaration for pepxml_version == 0.
#fragment_mass_type ⇒ Object
#header ⇒ Object
#initialize(pepxml_version = DEF_VERSION, sequest_params_obj = nil) ⇒ PepXML constructor

If a block is given, msms_pipeline_analysis is set to the block's result (and base_name is copied from its run summary). If sequest_params_obj is given, set_mono_or_avg is called with it.
#precursor_mass_type ⇒ Object
#set_mono_or_avg(sequest_params_obj) ⇒ Object

sets @h_plus and @avg_parent from the sequest params object.
#spectrum_queries ⇒ Object

returns an array of spectrum queries.
#style_sheet ⇒ Object
#summary_xml ⇒ Object
#to_pepxml(file = nil) ⇒ Object

Returns the pepXML string, also writing it to file if one is given.
#xml_version ⇒ Object

Constructor Details

#initialize(pepxml_version = DEF_VERSION, sequest_params_obj = nil) ⇒ `PepXML`

If a block is given, msms_pipeline_analysis is set to the block's result (and base_name is copied from its run summary). If sequest_params_obj is given, set_mono_or_avg is called with it.

# File 'lib/ms/sequest/pepxml.rb', line 189

def initialize(pepxml_version=DEF_VERSION, sequest_params_obj=nil)
  self.class.pepxml_version = pepxml_version
  if sequest_params_obj
    set_mono_or_avg(sequest_params_obj)
  end
  if block_given?
    @msms_pipeline_analysis = yield
    @base_name = @msms_pipeline_analysis.msms_run_summary.base_name
  end
end
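The block form of the constructor can be illustrated with a small stand-in class. This is a sketch, not the library class: `MiniPepXML`, `RunSummary` and `Pipeline` are hypothetical, the default version of 18 is assumed for illustration (the value of DEF_VERSION is not shown on this page), and the set_mono_or_avg call is omitted.

```ruby
# Hypothetical stand-ins: only the fields the constructor reads are modeled.
RunSummary = Struct.new(:base_name)
Pipeline   = Struct.new(:msms_run_summary)

class MiniPepXML
  attr_reader :pepxml_version, :msms_pipeline_analysis, :base_name

  # 18 is assumed as the default purely for illustration.
  def initialize(pepxml_version = 18)
    @pepxml_version = pepxml_version
    # As in PepXML#initialize: the block's return value becomes the
    # pipeline, and base_name is copied from its run summary.
    if block_given?
      @msms_pipeline_analysis = yield
      @base_name = @msms_pipeline_analysis.msms_run_summary.base_name
    end
  end
end

obj = MiniPepXML.new { Pipeline.new(RunSummary.new('/data/out/020')) }
obj.base_name  # => "/data/out/020"
```

Without a block, msms_pipeline_analysis stays nil and must be assigned later, which is what new_from_srf and set_from_bioworks_xml do.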

Class Attribute Details

.pepxml_version ⇒ `Object`

Returns the value of attribute pepxml_version.

# File 'lib/ms/sequest/pepxml.rb', line 169

def pepxml_version
  @pepxml_version
end

Instance Attribute Details

#avg_parent ⇒ `Object`

Returns the value of attribute avg_parent.

# File 'lib/ms/sequest/pepxml.rb', line 178

def avg_parent
  @avg_parent
end

#base_name ⇒ `Object`

the full path name (no extension)

# File 'lib/ms/sequest/pepxml.rb', line 176

def base_name
  @base_name
end

#h_plus ⇒ `Object`

Returns the value of attribute h_plus.

# File 'lib/ms/sequest/pepxml.rb', line 177

def h_plus
  @h_plus
end

#msms_pipeline_analysis ⇒ `Object`

Returns the value of attribute msms_pipeline_analysis.

# File 'lib/ms/sequest/pepxml.rb', line 174

def msms_pipeline_analysis
  @msms_pipeline_analysis
end

#pepxml_version ⇒ `Object`

Returns the value of attribute pepxml_version.

# File 'lib/ms/sequest/pepxml.rb', line 174

def pepxml_version
  @pepxml_version
end

Class Method Details

._prot_num_and_first_prot_by_pep(pep_array) ⇒ `Object`

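Updates the private attrs _num_prots and _first_prot on bioworks pep objects: for each group of peps sharing an amino-acid sequence (aaseq), _num_prots is set to the number of distinct proteins across the group and _first_prot to the first of them. Ideally, we’d like these attributes to reside elsewhere, but for memory concerns, this is best for now.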

# File 'lib/ms/sequest/pepxml.rb', line 242

def self._prot_num_and_first_prot_by_pep(pep_array)
  pep_array.hash_by(:aaseq).each do |aasq, pep_arr|
    prts = []
    pep_arr.each { |pep| prts.push( *(pep.prots) ) }
    prts.uniq!
    _size = prts.size 
    pep_arr.each do |pep|
      pep._num_prots = _size
      pep._first_prot = prts.first
    end
  end
end
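The grouping above can be sketched with plain Ruby. `hash_by` is not a core Ruby method; this sketch assumes `hash_by(:aaseq)` groups like `Enumerable#group_by(&:aaseq)`, and the `Pep` struct is a hypothetical stand-in for Bioworks pep objects.

```ruby
# Hypothetical peptide record; the real objects are Bioworks peps.
Pep = Struct.new(:aaseq, :prots, :_num_prots, :_first_prot)

# Assumes hash_by(:aaseq) groups like Enumerable#group_by(&:aaseq).
def prot_num_and_first_prot_by_pep(peps)
  peps.group_by(&:aaseq).each_value do |group|
    # Union of proteins across every pep sharing this amino-acid sequence.
    prots = group.flat_map(&:prots).uniq
    group.each do |pep|
      pep._num_prots = prots.size
      pep._first_prot = prots.first
    end
  end
  peps
end

peps = [
  Pep.new('PEPTIDE', ['protA', 'protB']),
  Pep.new('PEPTIDE', ['protB', 'protC']),
  Pep.new('OTHER',   ['protD']),
]
prot_num_and_first_prot_by_pep(peps)
peps.map(&:_num_prots)  # => [3, 3, 1]
```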

.base_name_noext(file) ⇒ `Object`

Given a filename in any style (including Windows paths with backslashes), returns its base name with no file extension.

# File 'lib/ms/sequest/pepxml.rb', line 769

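def self.base_name_noext(file)
  # use gsub (not gsub!) so the caller's string is not modified in place
  file = file.gsub("\\", '/')
  # NOTE: ^ inside [...] is a literal caret, so the regex strips the trailing
  # run of word characters and dots from the earliest dot it can match
  # (e.g. "a.b.RAW" -> "a")
  File.basename(file).sub(/\.[\w^\.]+$/, '')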
end
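A standalone sketch of the same logic, useful for checking edge cases. It keeps the original regex but avoids mutating the argument (the listed source uses `gsub!`, which rewrites the caller's string and raises on frozen strings). The second helper, `base_name_last_ext_only`, is a hypothetical alternative built on `File.extname` that strips only the final extension; it is shown for comparison, not as the library's behavior.

```ruby
# Same regex as base_name_noext, but non-mutating.
def base_name_noext(file)
  unix_style = file.gsub("\\", '/')
  File.basename(unix_style).sub(/\.[\w^\.]+$/, '')
end

# Hypothetical alternative: strip only the last extension.
def base_name_last_ext_only(file)
  name = File.basename(file.gsub("\\", '/'))
  File.basename(name, File.extname(name))
end

base_name_noext('C:\\Xcalibur\\data\\020.RAW')  # => "020"
base_name_noext('my.sample.RAW')                # => "my"
base_name_last_ext_only('my.sample.RAW')        # => "my.sample"
```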

.make_base_name(path, filename) ⇒ `Object`

Joins the base name of filename onto path, using the same separator (/ or \) that path already uses.

# File 'lib/ms/sequest/pepxml.rb', line 744

def self.make_base_name(path, filename)
  sep = '/'
  if path.split('/').size < path.split("\\").size
    sep = "\\"
  end
  if path.end_with?(sep)
    path + File.basename(filename)
  else
    path + sep + File.basename(filename)
  end
end
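The separator heuristic can be exercised on its own. This sketch reproduces make_base_name's logic in compact form: whichever separator splits path into more pieces is taken as the path's native style, and no separator is added if path already ends with one.

```ruby
# Compact restatement of make_base_name's separator heuristic.
def make_base_name(path, filename)
  sep = path.split('/').size < path.split("\\").size ? "\\" : '/'
  joiner = path.end_with?(sep) ? '' : sep
  path + joiner + File.basename(filename)
end

make_base_name('/tmp/out/', '020')      # => "/tmp/out/020"
make_base_name('C:\\data\\out', '020')  # => "C:\\data\\out\\020"
```

Note that File.basename only strips the current platform's separators, so on Unix a filename containing backslashes is appended as-is.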

.new_from_srf(srf, opts = {}) ⇒ `Object`

Creates a PepXML object from an SRF file; srf may be an SRF object or a filename. :ms_model is always read from the SRF header, and :ms_mass_analyzer is derived from it for LTQ Orbitrap and LCQ Deca XP instruments, overriding the defaults and anything passed in. See Sequest::PepXML::Default_Options for the other defaults; unless given, :out_path defaults to '.'.

# File 'lib/ms/sequest/pepxml.rb', line 285

def self.new_from_srf(srf, opts={})
  opts = Default_Options.merge(opts)

  ## read the srf file
  if srf.is_a? String
    srf = SRF.new(srf)
  end

  ## set the outpath
  out_path = opts.delete(:out_path)

  params = srf.params

  ## check to see if we need backup_db
  backup_db_path = opts.delete(:backup_db_path)
  if !File.exist?(params.database) && backup_db_path
    params.database_path = backup_db_path
  end

  #######################################################################
  # PREPARE THE OPTIONS:
  #######################################################################
  ## remove items from the options hash that belong to PepXML/SearchSummary rather than MSMSRunSummary
  ppxml_version = opts.delete(:pepxml_version)
  out_data_type = opts.delete(:out_data_type)
  out_data = opts.delete(:out_data)

  ## Extract meta info from srf
  bn_noext = base_name_noext(srf.header.raw_filename)
  opts[:ms_model] = srf.header.model
  case opts[:ms_model]
  when /Orbitrap/
    opts[:ms_mass_analyzer] = 'Orbitrap'
  when /LCQ Deca XP/
    opts[:ms_mass_analyzer] = 'Ion Trap'
  end

  ## Create the base name
  full_base_name_no_ext = make_base_name( File.expand_path(out_path), bn_noext)
  opts[:base_name] = full_base_name_no_ext

  ## Create the search summary:
  search_summary_options = {
    :search_database => Sequest::PepXML::SearchDatabase.new(params),
    :base_name => full_base_name_no_ext,
    :out_data_type => out_data_type,
    :out_data => out_data
  }
  modifications_string = srf.header.modifications
  search_summary = Sequest::PepXML::SearchSummary.new( params, modifications_string, search_summary_options)

  # create the sample enzyme from the params object:
  sample_enzyme_obj = 
    if opts[:sample_enzyme]
      opts[:sample_enzyme]
    else
      params.sample_enzyme
    end
  opts[:sample_enzyme] = sample_enzyme_obj

  ## Create the pepxml obj and top level objects
  pepxml_obj = Sequest::PepXML.new(ppxml_version, params) 
  pipeline = Sequest::PepXML::MSMSPipelineAnalysis.new({:date=>nil,:summary_xml=> bn_noext +'.xml'})
  pepxml_obj.msms_pipeline_analysis = pipeline
  pipeline.msms_run_summary = Sequest::PepXML::MSMSRunSummary.new(opts)
  pipeline.msms_run_summary.search_summary = search_summary
  modifications_obj = search_summary.modifications

  ## name some common variables we'll need
  h_plus = pepxml_obj.h_plus
  avg_parent = pepxml_obj.avg_parent


  ## COPY MZXML FILES IF NECESSARY
  if opts[:copy_mzxml]
    mzxml_pathname_noext = File.join(opts[:ms_data], bn_noext)
    to_copy = MS::Converter::MzXML.file_to_mzxml(mzxml_pathname_noext)
    if to_copy
      FileUtils.cp to_copy, out_path
    else
      abort "Couldn't find mzXML file with base: #{mzxml_pathname_noext}\n" \
        "Perhaps you need to specify the location of the raw data\n" \
        "or need an mzXML converter (readw or t2x)"
    end
  end


  #######################################################################
  # CREATE the spectrum_queries_arr
  #######################################################################
  srf_index = srf.index
  out_files = srf.out_files
  spectrum_queries_arr = Array.new(srf.dta_files.size)
  files_with_hits_index = 0  ## will end up being 1 indexed

  deltacn_orig = opts[:deltacn_orig]
  deltacn_index = 
    if deltacn_orig ; 20
    else 19
    end

  srf.dta_files.each_with_index do |dta_file,dta_i|
    next if out_files[dta_i].num_hits == 0
    files_with_hits_index += 1

    precursor_neutral_mass = dta_file.mh - h_plus

    (start_scan, end_scan, charge) = srf_index[dta_i]
    sq_hash = {
      :spectrum => [bn_noext, start_scan, end_scan, charge].join('.'),
      :start_scan => start_scan,
      :end_scan => end_scan,
      :precursor_neutral_mass => precursor_neutral_mass,
      :assumed_charge => charge.to_i,
      :pepxml_version => ppxml_version,
      :index => files_with_hits_index,
    }

    spectrum_query = Sequest::PepXML::SpectrumQuery.new(sq_hash)


    hits = out_files[dta_i].hits

    search_hits = 
      if opts[:all_hits]
        Array.new(out_files[dta_i].num_hits)  # all hits
      else
        Array.new(1)  # top hit only
      end

    (0...(search_hits.size)).each do |hit_i|
      hit = hits[hit_i]
      # Get the proper deltacn and deltacnstar. Under the modified deltacn
      # schema (like bioworks), the Prophet deltacn is not the native Sequest
      # deltacn: it is the deltacn of the second-best hit. In newer srf files,
      # deltacn is already corrected for what Prophet wants, and
      # deltacn_orig_updated holds the original value.

      ## mass calculations:
      calc_neutral_pep_mass = hit[0] - h_plus


      sequence = hit.sequence

      #  NEED TO MODIFY SPLIT SEQUENCE TO DO MODS!
      ## THIS IS ALL INNER LOOP, so we make every effort at speed here:
      (prevaa, pepseq, nextaa) = SpecID::Pep.prepare_sequence(sequence)
      # 0=mh 1=deltacn_orig 2=sp 3=xcorr 4=id 5=num_other_loci 6=rsp 7=ions_matched 8=ions_total 9=sequence 10=prots 11=deltamass 12=ppm 13=aaseq 14=base_name 15=first_scan 16=last_scan 17=charge 18=srf 19=deltacn 20=deltacn_orig_updated

      sh_hash = {
        :hit_rank => hit_i+1,
        :peptide => pepseq,
        :peptide_prev_aa => prevaa,
        :peptide_next_aa => nextaa,
        :protein => hit[10].first.reference.split(" ").first, 
        :num_tot_proteins => hit[10].size,
        :num_matched_ions => hit[7],
        :tot_num_ions => hit[8],
        :calc_neutral_pep_mass => calc_neutral_pep_mass,
        :massdiff => precursor_neutral_mass - calc_neutral_pep_mass, 
        :num_tol_term => sample_enzyme_obj.num_tol_term(sequence),
        :num_missed_cleavages => sample_enzyme_obj.num_missed_cleavages(pepseq),
        :is_rejected => 0,
        # These are search score attributes:
        :xcorr => hit[3],
        :deltacn => hit[deltacn_index],
        :spscore => hit[2],
        :sprank => hit[6],
        :modification_info => modifications_obj.modification_info(SpecID::Pep.split_sequence(sequence)[1]),
      }
      unless deltacn_orig
        # no next hit? then its deltacnstar == 1
        sh_hash[:deltacnstar] = hits[hit_i+1].nil? ? '1' : '0'
      end
      search_hits[hit_i] = Sequest::PepXML::SearchHit.new(sh_hash) # there can be multiple hits
    end

    search_result = Sequest::PepXML::SearchResult.new
    search_result.search_hits = search_hits
    spectrum_query.search_results = [search_result]
    spectrum_queries_arr[files_with_hits_index] = spectrum_query
  end
  spectrum_queries_arr.compact!

  pipeline.msms_run_summary.spectrum_queries = spectrum_queries_arr
  pepxml_obj.base_name = pipeline.msms_run_summary.base_name

  pepxml_obj
end
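The per-hit score selection in new_from_srf can be sketched in isolation. This sketch models each hit as a plain 21-element array following the index comment in the listing (19=deltacn, 20=deltacn_orig_updated); real SRF hit objects may differ, and `hit_scores` and `spectrum_name` are hypothetical helpers, not library methods.

```ruby
# Index layout from the listing's comment: 19=deltacn 20=deltacn_orig_updated
DELTACN = 19
DELTACN_ORIG_UPDATED = 20

# Hypothetical helper mirroring the per-hit score logic in new_from_srf.
def hit_scores(hits, hit_i, deltacn_orig: false)
  hit = hits[hit_i]
  scores = { deltacn: hit[deltacn_orig ? DELTACN_ORIG_UPDATED : DELTACN] }
  # deltacnstar is only set for the Prophet-style deltacn; it is '1' when
  # there is no lower-ranked hit after this one.
  scores[:deltacnstar] = hits[hit_i + 1].nil? ? '1' : '0' unless deltacn_orig
  scores
end

# Spectrum names follow base.start_scan.end_scan.charge.
def spectrum_name(base, start_scan, end_scan, charge)
  [base, start_scan, end_scan, charge].join('.')
end

spectrum_name('020', 5, 5, 2)  # => "020.5.5.2"
```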

.set_from_bioworks(bioworks_file, opts = {}) ⇒ `Object`

Takes an .srg or bioworks.xml file and returns a list of PepXML objects. If possible, ensures that an mzXML file is present for each pepXML file; with :print => true, writes each pepXML file. NOTE: num_tol_term and num_missed_cleavages are both calculated from the sample_enzyme, so a No_Enzyme search may still pass a :sample_enzyme option to get them calculated.

# File 'lib/ms/sequest/pepxml.rb', line 488

def self.set_from_bioworks(bioworks_file, opts={})
  opts = Default_Options.merge(opts)

  ## Create the out_path directory if necessary
  unless File.exist? opts[:out_path]
    FileUtils.mkpath(opts[:out_path])
  end
  unless File.directory? opts[:out_path]
    abort "#{opts[:out_path]} must be a directory!"
  end

  spec_id = SpecID.new(bioworks_file)
  pepxml_objs = 
    if spec_id.is_a? Bioworks
      abort("must have opts[:params] set!") unless opts[:params]
      set_from_bioworks_xml(bioworks_file, opts[:params], opts)
    elsif spec_id.is_a? SRFGroup
      spec_id.srfs.map do |srf|
        new_from_srf(srf, opts) 
      end
    else
      abort "invalid object"
    end

  if opts[:print]
    pepxml_objs.each do |obj|
      obj.to_pepxml(obj.base_name + ".xml")
    end
  end
  pepxml_objs
end

.set_from_bioworks_xml(bioworks, params, opts = {}) ⇒ `Object`

Takes Bioworks 3.2/3.3 XML output (with no filters) and returns a list of PepXML objects. params is a sequest.params file (or Sequest::Params object); bioworks is an exported multi-consensus view bioworks.xml file. pepxml_version 0 corresponds to TPP 1.2.3 and 18 to TPP 2.8.2, 2.8.3 and 2.9.2; only 18 is currently supported.

# File 'lib/ms/sequest/pepxml.rb', line 527

def self.set_from_bioworks_xml(bioworks, params, opts={})
  opts = Default_Options.merge(opts)
  pepxml_version, ms_manufacturer, ms_model, ms_ionization, ms_mass_analyzer, ms_detector, raw_data_type, raw_data, out_data_type, out_data, ms_data, out_path = opts.values_at(:pepxml_version, :ms_manufacturer, :ms_model, :ms_ionization, :ms_mass_analyzer, :ms_detector, :raw_data_type, :raw_data, :out_data_type, :out_data, :ms_data, :out_path)



  out_path ||= '.'

  supported_versions = [18]

  unless supported_versions.include?(opts[:pepxml_version]) 
    abort "pepxml_version: #{pepxml_version} not currently supported.  Current support is for versions #{supported_versions.join(', ')}"
  end

  ## Turn params and bioworks_obj into objects if necessary:
  # Params:
  case params
  when Sequest::Params  # OK!
  when String then params = Sequest::Params.new(params)
  else abort "Don't recognize #{params} as object or string!"
  end
  # Bioworks:
  case bioworks
  when Bioworks  # OK!
  when String then bioworks = SpecID.new(bioworks)
  else abort "Don't recognize #{bioworks} as object or string!"
  end

  sample_enzyme_obj = 
    if opts[:sample_enzyme]
      opts[:sample_enzyme]
    else
      params.sample_enzyme
    end

  #puts "bioworks.peps.size: #{bioworks.peps.size}"; #puts "bioworks.prots.size: #{bioworks.prots.size}"; #puts "Bioworks.version: #{bioworks.version}"

  ## TURN THIS ON IF YOU THINK YOU MIGHT NOT BE GETTING PEPTIDES from
  ## bioworks
  #bioworks.peps.each { |pep| if pep.class != Bioworks::Pep ; puts "trying to pass as pep: "; p pep; abort "NOT a pep!" end }

  ## check to see if we need backup_db

  backup_db_path = opts.delete(:backup_db_path)
  if !File.exist?(params.database) && backup_db_path
    params.database_path = backup_db_path
  end

  ## Start
  split_bio_objs = []

  ## (num_prots_by_pep, prot_by_pep) = 
  #num_prots_by_pep.each do |k,v| puts "k: #{k} v: #{v}\n"; break end ; prot_by_pep.each do |k,v| puts "k: #{k} v: #{v}" ; break end ; abort "HERE"

  modifications_string = bioworks.modifications

  ## Create a hash of spectrum_query arrays by filename (this very big block):
  spectrum_queries_by_base_name = {}
  # Hash by the filenames to split into filenames:
  pepxml_objects = bioworks.peps.hash_by(:base_name).map do |base_name, pep_arr|

    search_summary = Sequest::PepXML::SearchSummary.new(params, modifications_string, {:search_database => Sequest::PepXML::SearchDatabase.new(params), :out_data_type => out_data_type, :out_data => out_data})
    modifications_obj = search_summary.modifications

    pepxml_obj = Sequest::PepXML.new(pepxml_version, params)
    full_base_name_no_ext = self.make_base_name( File.expand_path(out_path), base_name)

    case pepxml_version
    when 18
      pipeline =  Sequest::PepXML::MSMSPipelineAnalysis.new({:date=>nil,:summary_xml=>base_name+'.xml'})
      msms_run_summary = Sequest::PepXML::MSMSRunSummary.new({
        :base_name => full_base_name_no_ext,
        :ms_manufacturer => ms_manufacturer,
        :ms_model => ms_model,
        :ms_ionization => ms_ionization,
        :ms_mass_analyzer => ms_mass_analyzer,
        :ms_detector => ms_detector,
        :raw_data_type => raw_data_type,
        :raw_data => raw_data,
        :sample_enzyme => sample_enzyme_obj, # usually, params.sample_enzyme,
        :search_summary => search_summary,
      }) 
      pipeline.msms_run_summary = msms_run_summary
      pepxml_obj.msms_pipeline_analysis = pipeline
      pepxml_obj.msms_pipeline_analysis.msms_run_summary.search_summary.base_name =  full_base_name_no_ext
      pepxml_obj.base_name = full_base_name_no_ext
      pepxml_obj 
    end

    # Create a hash by pep object containing num_tot_proteins
    # This is only valid if all hits are present (no previous thresholding)
    # Since out2summary only acts on one folder at a time,
    # we should only do it for one folder at a time! (that's why we do this
    # here instead of globally)
    self._prot_num_and_first_prot_by_pep(pep_arr)
    prec_mz_arr = nil
    case x = bioworks.version
    when /3\.2/
      calc_prec_by = :prec_mz_arr
      # get the precursor_mz array for this filename
      mzxml_file = MS::Converter::MzXML.file_to_mzxml(File.join(ms_data, base_name))
      prec_mz_arr = MS::MSRun.precursor_mz_by_scan_num(mzxml_file)
    when /3\.3/
      calc_prec_by = :deltamass
    else
      abort "invalid BioworksBrowser version: #{x}"
    end

    if opts[:copy_mzxml]
      to_copy = MS::Converter::MzXML.file_to_mzxml(File.join(ms_data, base_name))
      if to_copy
        FileUtils.cp to_copy, out_path
      end
    end


    spectrum_queries_ar = pep_arr.hash_by(:first_scan, :last_scan, :charge).map do |key,arr|


      # Sort by xcorr and take the top hit (to mimic out2summary):

      arr = arr.sort_by {|pep| pep.xcorr.to_f } # ascending
      top_pep = arr.pop
      second_hit = arr.last # needed for deltacnstar


      case calc_prec_by
      when :prec_mz_arr
        precursor_neutral_mass = Sequest::PepXML::SpectrumQuery.calc_precursor_neutral_mass(calc_prec_by, top_pep.first_scan.to_i, top_pep.last_scan.to_i, prec_mz_arr, top_pep.charge, pepxml_obj.avg_parent)
      when :deltamass
        precursor_neutral_mass = Sequest::PepXML::SpectrumQuery.calc_precursor_neutral_mass(calc_prec_by, top_pep.mass.to_f, top_pep.deltamass.to_f, pepxml_obj.avg_parent)
      end

      calc_neutral_pep_mass = (top_pep.mass.to_f - pepxml_obj.h_plus)

      # deltacn & star:
      # (NOTE: OLD?? out2summary wants the deltacn of the 2nd best hit.)
      if second_hit 
        #top_pep.deltacn = second_hit.deltacn 
        deltacnstar = '0'
      else 
        top_pep.deltacn = '1.0'
        deltacnstar = '1'
      end
      # Create the nested structure of queries{results{hits}}
      # (Ruby's blocks work beautifully for things like this)
      spec_query = Sequest::PepXML::SpectrumQuery.new({
        :spectrum => [top_pep.base_name, top_pep.first_scan, top_pep.last_scan, top_pep.charge].join("."),
        :start_scan => top_pep.first_scan,
        :end_scan => top_pep.last_scan,
        :precursor_neutral_mass => precursor_neutral_mass,
        :assumed_charge => top_pep.charge,
        :pepxml_version => pepxml_version,
      }) 


      search_result = Sequest::PepXML::SearchResult.new 
      #puts "set MASSDIFF: "
      #p precursor_neutral_mass - calc_neutral_pep_mass
      ## Calculate some interdependent values;
      # NOTE: the bioworks mass is really M+H if two or more scans went
      # into the search_hit; calc_neutral_pep_mass is simply the avg of
      # precursor masses adjusted to be neutral
      (prevaa, pepseq, nextaa) = SpecID::Pep.prepare_sequence(top_pep.sequence)
      (num_matched_ions, tot_num_ions) = Sequest::PepXML::SearchHit.split_ions(top_pep.ions)
      search_hit = Sequest::PepXML::SearchHit.new({
        :hit_rank => 1,
        :peptide => pepseq,
        :peptide_prev_aa => prevaa,
        :peptide_next_aa => nextaa,
        :protein => top_pep._first_prot.reference.split(" ").first, 
        :num_tot_proteins => top_pep._num_prots,
        :num_matched_ions => num_matched_ions,
        :tot_num_ions => tot_num_ions,
        :calc_neutral_pep_mass => calc_neutral_pep_mass,
        :massdiff => precursor_neutral_mass - calc_neutral_pep_mass,
        :num_tol_term => sample_enzyme_obj.num_tol_term(top_pep.sequence),
        :num_missed_cleavages => sample_enzyme_obj.num_missed_cleavages(pepseq),
        :is_rejected => 0,
        # These are search score attributes:
        :xcorr => top_pep.xcorr,
        :deltacn => top_pep.deltacn,
        :deltacnstar => deltacnstar,
        :spscore => top_pep.sp,
        :sprank => top_pep.rsp,
        :modification_info => modifications_obj.modification_info(SpecID::Pep.split_sequence(top_pep.sequence)[1]),
        :spectrum_query => spec_query,
      })
      search_result.search_hits = [search_hit] # there can be multiple search hits
      spec_query.search_results = [search_result]  # can be multiple search_results
      spec_query
    end

    # Sort by spectrum name and number the queries from 1, the order in
    # which out2summary typically emits results (I really dislike this order, however)
    spectrum_queries_ar = spectrum_queries_ar.sort_by {|sq| sq.spectrum }
    spectrum_queries_ar.each_with_index {|res,index| res.index = "#{index + 1}" }
    pipeline.msms_run_summary.spectrum_queries = spectrum_queries_ar
    pepxml_obj
  end ## collects pepxml_objs
  # summary_xml is base_name + ".xml"; all objects share out_path, so this sorts them by file name (e.g., "020.xml")
  pepxml_objects.sort_by {|obj| obj.summary_xml }
end
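The core of the per-file loop (group by scan range and charge, keep the highest-xcorr pep, derive deltacnstar, then number the queries in spectrum order) can be sketched with plain data. `BioPep` and `top_hits_by_spectrum` are hypothetical, `group_by` stands in for the library's `hash_by`, and each query is a Hash rather than a SpectrumQuery object.

```ruby
# Hypothetical peptide record with only the fields this step reads.
BioPep = Struct.new(:base_name, :first_scan, :last_scan, :charge, :xcorr)

# Groups peps by (first_scan, last_scan, charge), keeps the highest-xcorr
# pep per group, and numbers the results in spectrum-name order.
def top_hits_by_spectrum(peps)
  queries = peps.group_by { |pep| [pep.first_scan, pep.last_scan, pep.charge] }.map do |_key, group|
    sorted = group.sort_by { |pep| pep.xcorr.to_f }  # ascending
    top = sorted.pop
    deltacnstar = sorted.empty? ? '1' : '0'          # no second hit => '1'
    spectrum = [top.base_name, top.first_scan, top.last_scan, top.charge].join('.')
    { spectrum: spectrum, xcorr: top.xcorr, deltacnstar: deltacnstar }
  end
  queries = queries.sort_by { |q| q[:spectrum] }
  queries.each_with_index { |q, i| q[:index] = (i + 1).to_s }
end
```

Because spectrum is a String, the sort is lexicographic: "020.10.10.2" sorts before "020.9.9.2".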

Instance Method Details

#date ⇒ `Object`

# File 'lib/ms/sequest/pepxml.rb', line 213

def date
  Time.new.to_s
end

#doctype ⇒ `Object`

Returns the DOCTYPE declaration for pepxml_version == 0.

# File 'lib/ms/sequest/pepxml.rb', line 222

def doctype
  '<!DOCTYPE msms_pipeline_analysis SYSTEM "/usr/bin/msms_analysis3.dtd">' + "\n"
end

#fragment_mass_type ⇒ `Object`

# File 'lib/ms/sequest/pepxml.rb', line 739

def fragment_mass_type
  @params.fragment_mass_type
end

#header ⇒ `Object`

# File 'lib/ms/sequest/pepxml.rb', line 233

def header
  case self.class.pepxml_version
  when 18 ; xml_version + style_sheet
  end
end

#precursor_mass_type ⇒ `Object`

# File 'lib/ms/sequest/pepxml.rb', line 735

def precursor_mass_type
  @params.precursor_mass_type
end

#set_mono_or_avg(sequest_params_obj) ⇒ `Object`

sets @h_plus and @avg_parent from the sequest params object

# File 'lib/ms/sequest/pepxml.rb', line 201

def set_mono_or_avg(sequest_params_obj)
  # anything other than "monoisotopic" is treated as average mass
  @avg_parent = (sequest_params_obj.precursor_mass_type != "monoisotopic")
  @h_plus = @avg_parent ? SpecID::AVG[:h_plus] : SpecID::MONO[:h_plus]
end
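A sketch of how the mass-type choice feeds the neutral-mass calculation in new_from_srf (`precursor_neutral_mass = dta_file.mh - h_plus`). The two constants are assumptions for illustration: 1.007276 is the monoisotopic proton mass, and 1.00794 is an assumed average value; the library actually reads SpecID::MONO[:h_plus] and SpecID::AVG[:h_plus], whose values are not shown here. `mono_or_avg` is a hypothetical helper.

```ruby
# ASSUMED constants; the library uses SpecID::MONO / SpecID::AVG instead.
MONO_H_PLUS = 1.007276
AVG_H_PLUS  = 1.00794

# Returns [avg_parent, h_plus] following the rule in set_mono_or_avg:
# anything other than "monoisotopic" is treated as average mass.
def mono_or_avg(precursor_mass_type)
  avg_parent = precursor_mass_type != 'monoisotopic'
  [avg_parent, avg_parent ? AVG_H_PLUS : MONO_H_PLUS]
end

# Neutral precursor mass = MH+ minus h_plus, as in new_from_srf.
avg_parent, h_plus = mono_or_avg('monoisotopic')
neutral = 1000.5 - h_plus
```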

#spectrum_queries ⇒ `Object`

returns an array of spectrum queries

# File 'lib/ms/sequest/pepxml.rb', line 183

def spectrum_queries
  msms_pipeline_analysis.msms_run_summary.spectrum_queries
end

#style_sheet ⇒ `Object`

# File 'lib/ms/sequest/pepxml.rb', line 226

def style_sheet
  case self.class.pepxml_version
  when 18
  '<?xml-stylesheet type="text/xsl" href="/tools/bin/TPP/tpp/schema/pepXML_std.xsl"?>'
  end
end

#summary_xml ⇒ `Object`

# File 'lib/ms/sequest/pepxml.rb', line 731

def summary_xml
  base_name + ".xml"
end

#to_pepxml(file = nil) ⇒ `Object`

Returns the pepXML string, also writing it to file if one is given.

# File 'lib/ms/sequest/pepxml.rb', line 757

def to_pepxml(file=nil)
  string = header
  string << @msms_pipeline_analysis.to_pepxml

  if file
    File.write(file, string)
  end
  string
end
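How #header and #to_pepxml compose the output can be sketched without the library. `header_for` and `render_pepxml` are hypothetical; the strings are copied from #xml_version and #style_sheet above. One deliberate difference: the listed #to_pepxml would raise NoMethodError for any version other than 18, because #header returns nil and `nil << ...` fails, so this sketch treats a nil header as an empty string.

```ruby
require 'tmpdir'

XML_DECL = %(<?xml version="1.0" encoding="UTF-8"?>\n)
PEPXML_XSL = %(<?xml-stylesheet type="text/xsl" href="/tools/bin/TPP/tpp/schema/pepXML_std.xsl"?>)

# Mirrors #header: only version 18 gets a header; other versions yield nil.
def header_for(pepxml_version)
  XML_DECL + PEPXML_XSL if pepxml_version == 18
end

# Mirrors #to_pepxml, except that a nil header is treated as "".
def render_pepxml(pepxml_version, body, file = nil)
  string = header_for(pepxml_version).to_s + body
  File.write(file, string) if file
  string
end
```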

#xml_version ⇒ `Object`

# File 'lib/ms/sequest/pepxml.rb', line 217

def xml_version 
  '<?xml version="1.0" encoding="UTF-8"?>' + "\n"
end
