Class: SpecID::Filter
- Inherits: Object (hierarchy: Object, SpecID::Filter)
- Defined in: lib/spec_id/filter.rb
Constant Summary
- NUM_PROT_FPPR_ITERATIONS = 10
Class Method Summary
- .run_from_argv(argv) ⇒ Object
Instance Method Summary
-
#calculate_pep_fppr(pephits_ar, methods, args, false_pephits_ar = nil) ⇒ Object
methods should be passed as the prefix of the corresponding *_fppr method (e.g. :cys for cys_fppr). All methods should return [number_false, fppr]. Returns a hash (keyed by method) for each set of pephits. If :dcy is given as a method, the false pephits array is expected.
- #combined_score(filter_args) ⇒ Object
-
#cys_fppr(pephits, cys_bg_freq, cys_containing_freq) ⇒ Object
returns [#FP, FPPR].
-
#dcy_fppr(pephits, false_pephits) ⇒ Object
returns [#FP, FPPR].
- #file_to_prefiltered_spec_id(file, opt) ⇒ Object
- #filter_legend(fppr_methods) ⇒ Object
- #filter_params_string(filter_args, fppr_methods) ⇒ Object
-
#filter_round(spec_ids, filter_args, args) ⇒ Object
fpr is a SpecID object holding the false positives; cysteines holds an aafreqs object or nil.
- #filter_spec_id(spec_id, filter_args, args) ⇒ Object
-
#fppr_by_cysteines(ac_num_with_cys, exp_num_with_cys, total_peptides, mean_fraction_true_cys = nil, std_fraction_true_cys = nil) ⇒ Object
(actual # with cys, expected # with cys, total # peptides, mean_fraction_of_cysteines_true, std). PepHit(C) = peptide containing cysteine. (# Total PepHit(C)) / (# Total PepHit) is proportional to (# Observed Bad Pep (C)) / (# Total Bad PepHit (X)). Returns the fppr and the total number false.
-
#fraction_false_by_cysteines(pephits, cys_bg_freq, cys_containing_freq) ⇒ Object
returns [total_number_false, fppr, fraction_expected]. pephits may also be a hash of pephits keyed on :aaseq.
-
#fraction_false_by_true_pos(pephits, true_pos_aaseqs_ar) ⇒ Object
returns [total_number_false, fppr]. pephits can be an array or a hash of peptides keyed on :aaseq.
-
#get_cys_freq(arg) ⇒ Object
takes a fasta file or a string ( to be cast as a float ).
-
#get_options(argv) ⇒ Object
If the arguments are good, returns [files_array, options]; otherwise prints the usage message and returns nil.
- #interactive_help ⇒ Object
- #out(string) ⇒ Object
-
#prep_reply(reply, base) ⇒ Object
Assumes reply is already chomped. Updates the 5 globals (the filter values held in base: x1, x2, x3, dcn, ppm).
-
#protein_fppr(num_peps_per_protein, number_false_peptides, num_iterations = 10) ⇒ Object
num_peps_per_protein is an array of the number of peptides per protein hit (these are the true hits). Assumes that the number follows a gaussian distribution (binomial distributions tend toward gaussians, I believe, at large N). Returns [mean_num_wrong, mean_fppr, stdev_num_wrong, stdev_fppr].
- #report_cysteines ⇒ Object
- #run_from_argv(argv) ⇒ Object
-
#short(num) ⇒ Object
returns the number formatted to three decimal places for display.
- #tmm_fppr(pephits) ⇒ Object
- #to_table(spec_id, args, normal_results, fppr_results, groups_reporting, fppr_methods, cat_labels) ⇒ Object
- #tps_fppr(pephits, true_pos_aaseqs_ar) ⇒ Object
Class Method Details
.run_from_argv(argv) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 156
    def self.run_from_argv(argv)
      obj = self.new
      obj.run_from_argv(argv)
    end
Instance Method Details
#calculate_pep_fppr(pephits_ar, methods, args, false_pephits_ar = nil) ⇒ Object
methods should be passed as the prefix of the corresponding *_fppr method (e.g. :cys for cys_fppr). All methods should return [number_false, fppr]. Returns a hash (keyed by method) for each set of pephits. If :dcy is given as a method, the false pephits array is expected.
    # File 'lib/spec_id/filter.rb', line 575
    def calculate_pep_fppr(pephits_ar, methods, args, false_pephits_ar=nil)
      cnt = 0
      pephits_ar.map do |ph|
        hash = {}
        methods.each do |mth|
          case mth
          when :dcy
            hash[mth.to_sym] = send("#{mth}_fppr".to_sym, ph, false_pephits_ar[cnt])
          when :cys
            hash[mth.to_sym] = send("#{mth}_fppr".to_sym, ph, *(args[:cys]) )
          when :tps
            hash[mth.to_sym] = send("#{mth}_fppr".to_sym, ph, (args[:tps]) )
          else
            hash[mth.to_sym] = send("#{mth}_fppr".to_sym, ph)
          end
        end
        cnt += 1
        hash
      end
    end
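The dispatch pattern in calculate_pep_fppr (build the method name from a symbol, then call it with send) can be tried in isolation. The following is a minimal sketch, not the library's code: FpprDispatchSketch, its calculate method, and the toy tps_fppr estimator are hypothetical names made up for illustration, and the peptide sequences are invented.

```ruby
# Hypothetical stand-in for SpecID::Filter; only the dispatch idea is shown.
class FpprDispatchSketch
  # Decoy estimate: the false hits are given directly. Returns [number_false, fppr].
  def dcy_fppr(pephits, false_pephits)
    fps = false_pephits.size
    [fps, fps.to_f / pephits.size]
  end

  # Toy estimator (an assumption): any sequence not in the known list is false.
  def tps_fppr(pephits, known_true)
    fps = pephits.count { |seq| !known_true.include?(seq) }
    [fps, fps.to_f / pephits.size]
  end

  # For each set of hits, build a hash { method => [number_false, fppr] }.
  def calculate(pephits_ar, methods, args, false_pephits_ar = nil)
    pephits_ar.each_with_index.map do |ph, idx|
      methods.each_with_object({}) do |mth, hash|
        extra = case mth
                when :dcy then [false_pephits_ar[idx]]
                when :tps then [args[:tps]]
                else []
                end
        hash[mth] = send("#{mth}_fppr", ph, *extra)
      end
    end
  end
end

results = FpprDispatchSketch.new.calculate(
  [%w[AAK CCR DDK EEK]], [:dcy, :tps], { tps: %w[AAK CCR DDK] }, [%w[XXK]]
)
p results
```

Using each_with_index in place of a manual counter keeps the pairing between a hit set and its decoy set explicit.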
#combined_score(filter_args) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 709
    def combined_score(filter_args)
      (x1, x2, x3, deltacn, ppm) = filter_args
      combined_score = x1 + x2 + x3 + 20.0*deltacn + 4000.0*(1.0/ppm)
    end
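As a quick arithmetic check of the formula, here is a small sketch that evaluates it for the default thresholds set in get_options (x1=1.0, x2=1.5, x3=2.0, deltacn=0.1, ppm=1000.0). The name combined_score_sketch is hypothetical; the formula itself is copied from the method above.

```ruby
# Same formula as combined_score, written as a free function for illustration.
def combined_score_sketch(filter_args)
  x1, x2, x3, deltacn, ppm = filter_args
  x1 + x2 + x3 + 20.0 * deltacn + 4000.0 * (1.0 / ppm)
end

# Default thresholds: the xcorr terms sum to 4.5, deltacn adds 20.0 * 0.1,
# and the ppm term adds 4000.0 / 1000.0.
defaults = [1.0, 1.5, 2.0, 0.1, 1000.0]
puts combined_score_sketch(defaults)

# A tighter ppm tolerance raises the score because ppm enters as 1/ppm.
puts combined_score_sketch([1.0, 1.5, 2.0, 0.1, 100.0])
```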
#cys_fppr(pephits, cys_bg_freq, cys_containing_freq) ⇒ Object
returns [#FP, FPPR]
    # File 'lib/spec_id/filter.rb', line 562
    def cys_fppr(pephits, cys_bg_freq, cys_containing_freq)
      (total_num_false, cys_fprate, fraction_of_expected) = fraction_false_by_cysteines(pephits, cys_bg_freq, cys_containing_freq)
      [total_num_false, cys_fprate]
    end
#dcy_fppr(pephits, false_pephits) ⇒ Object
returns [#FP, FPPR]
    # File 'lib/spec_id/filter.rb', line 552
    def dcy_fppr(pephits, false_pephits)
      fps = false_pephits.size
      [fps, fps.to_f/pephits.size]
    end
#file_to_prefiltered_spec_id(file, opt) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 757
    def file_to_prefiltered_spec_id(file, opt)
      spec_id = nil
      marshal_file = file + ".prefiltered.msh"
      if File.exist?(marshal_file)
        File.open(marshal_file) do |fh|
          spec_id = Marshal.load(fh)
        end
      else
        spec_id = SpecID.new(file)
        spec_id.passed_in_filename = file
        spec_id.top_peps_prefilter!
        ## marshal it!
        if opt.marshal
          File.open(marshal_file, "w") do |fh|
            Marshal.dump(spec_id,fh)
          end
        end
      end
      spec_id
    end
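The load-or-build caching used here can be sketched without SpecID. In the example below, load_or_build is a hypothetical helper (not part of the library) and the cached object is a plain hash; the cache file name only imitates the ".prefiltered.msh" suffix. Opening the file in binary mode is a choice made in this sketch, whereas the method above uses the default text mode.

```ruby
require 'tmpdir'

# Hypothetical helper: return the marshaled object if the cache file exists,
# otherwise build it with the block and optionally write the cache.
def load_or_build(cache_file, write_cache)
  if File.exist?(cache_file)
    File.open(cache_file, 'rb') { |fh| Marshal.load(fh) }
  else
    obj = yield
    File.open(cache_file, 'wb') { |fh| Marshal.dump(obj, fh) } if write_cache
    obj
  end
end

Dir.mktmpdir do |dir|
  cache = File.join(dir, 'demo.xml.prefiltered.msh')
  first  = load_or_build(cache, true) { { peps: %w[AAK CCR] } }       # built, then cached
  second = load_or_build(cache, true) { raise 'should not rebuild' }  # served from the cache
  puts first == second
end
```

Note that a stale cache file is never invalidated in this scheme: if the input file changes, the cache has to be deleted by hand.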
#filter_legend(fppr_methods) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 494
    def filter_legend(fppr_methods)
      lines = []
      lines << "Note: protein FPPR values are probably optimistic"
      lines << "[this implementation assumes an equal likelihood that a false peptide"
      lines << " comes from a protein with more hits as one with less (which is probably"
      lines << " not the case)]"
      lines << "* = deltacn_star = peptides with deltacn > 1.0 (no sibling hits)"
      if fppr_methods.size > 0
        lines << "Following are methods for determining false identification rate:"
        lines << ['dcy=decoy', 'cys=cysteine', 'tps=known_true_positives'].join(" ")
        ## when tmm is implemented:
        #lines << ['dcy=decoy', 'cys=cysteine', 'tmm=transmembrane', 'tps=known_true_positives'].join(" ")
      end
      lines.join("\n")
    end
#filter_params_string(filter_args, fppr_methods) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 678
    def filter_params_string(filter_args, fppr_methods)
      (x1, x2, x3, deltacn, ppm) = filter_args
      st = []
      st << "=========================================================================="
      st << " xcorr(1,2,3) >= #{x1},#{x2},#{x3} || deltacn >= #{deltacn} || ppm <= #{ppm} "
      st << ''
      st.join("\n")
      #st = []
      #st << ["xcorr(1,2,3) >= #{x1},#{x2},#{x3}", "deltacn >= #{deltacn}", "ppm <= #{ppm}"].join("\t")
      #st
    end
#filter_round(spec_ids, filter_args, args) ⇒ Object
fpr is a SpecID object holding the false positives; cysteines holds an aafreqs object or nil.
    # File 'lib/spec_id/filter.rb', line 598
    def filter_round(spec_ids, filter_args, args)
      # push fpr on the end for the calculations

      ## FILTER the NORMAL spec_id objects
      little_tables = []
      spec_ids.each_with_index do |spec_id, i|
        normal_results = filter_spec_id(spec_id, filter_args, args)

        ## FILTER the FALSE objects (if given)
        false_results = if args[:dcy]
          little_args_hash = args.dup
          false_results = filter_spec_id(args[:dcy][i], filter_args, little_args_hash)
        end

        ## HOW TO CALCULATE FPPR FOR EVERYTHING:
        # pephits      Fpephits                      C/Tpephits                      TPpephits
        # uniqaa       Funiqaa                       C/Tuniqaa                       TPuniqaa
        # prothits     ProtFPR(Fpephits, prothits)   ProtFPR(C/Tpephits, prothits)   ProtFPR(total-TPpephits, prothits)
        # OccProthits  ProtFPR(Funiqaa, OccProthits) ProtFPR(C/Tuniqaa, OccProthits) ProtFPR(total-TPuniqaa, OccProthits)
        # C/T = cysteine or Transmembrane method

        ## set up false results array
        if args[:dcy]
          fr_ar = [false_results[:pephits], false_results[:aaseq]]
        else
          fr_ar = nil
        end

        (pephits_fppr_results, aaseq_fppr_results) = calculate_pep_fppr([normal_results[:pephits], normal_results[:aaseq]], @fppr_methods, args, fr_ar)

        ## NORMAL prothits
        ## update prothits peptides
        updated_proteins = SpecID.passing_proteins(normal_results[:pephits], :update)
        pep_cnt_arr = updated_proteins.map {|v| v.peps.size }

        ## update occams prothits
        if args[:occams_razor]
          updated_occams_protein_triplets = SpecID::occams_razor(updated_proteins, true)
          occams_pep_cnt_arr = updated_occams_protein_triplets.map {|v| v[1].size }
          occams_prots = updated_occams_protein_triplets.map {|v| v[0] }
          normal_results[:occams_razor] = occams_prots
        end
        ## note that the original prot.peps arrays are obliterated by this.
        ## we would need to re-update if someone wanted these

        prothits_fppr_results = {}
        occams_results = {}
        @fppr_methods.each do |mth|
          prothits_fppr_results[mth] = protein_fppr(pep_cnt_arr, pephits_fppr_results[mth].first.ceil.to_i, NUM_PROT_FPPR_ITERATIONS)
          occams_results[mth] = protein_fppr(occams_pep_cnt_arr, aaseq_fppr_results[mth].first.ceil.to_i, NUM_PROT_FPPR_ITERATIONS) if args[:occams_razor]
        end

        fppr_results = {
          :pephits => pephits_fppr_results,
          :aaseq => aaseq_fppr_results,
          :prothits => prothits_fppr_results,
        }
        fppr_results[:occams_razor] = occams_results if args[:occams_razor]

        ## CHANGE ALL RESULTS INTO PERCENTAGES:
        fppr_results.each do |bk,hash|
          hash.each do |k,val|
            hash[k][1] = 100.0 * val[1]
          end
        end

        little_tables[i] = to_table( spec_id, args, normal_results, fppr_results, @groups_reporting, @fppr_methods, @cat_labels)
      end

      out filter_params_string(filter_args, @fppr_methods)
      little_tables.each do |tbl|
        out tbl.to_formatted_string(nil, ' ')
        out "-----------------------------------------------\n"
      end
      #big_table(spec_ids, filter_args, args, normal_results, groups_reporting, fppr_results, cat_labels)
    end
#filter_spec_id(spec_id, filter_args, args) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 537
    def filter_spec_id(spec_id, filter_args, args)
      results_hash = {}
      # that second argument is to update protein peptides
      pephits = spec_id.filter_sequest(filter_args)
      results_hash[:prothits] = SpecID.passing_proteins(pephits, :no_update)
      results_hash[:pephits] = pephits
      results_hash[:dcn_cnt] = pephits.select{|v| v.deltacn > 1.0}.size
      # be aware that this is a hash keyed by aaseq and values of arrays of
      # peptides sharing the same aaseq!
      results_hash[:aaseq] = pephits.hash_by(:aaseq)
      results_hash
    end
#fppr_by_cysteines(ac_num_with_cys, exp_num_with_cys, total_peptides, mean_fraction_true_cys = nil, std_fraction_true_cys = nil) ⇒ Object
(actual # with cys, expected # with cys, total # peptides, mean_fraction_of_cysteines_true, std)
PepHit(C) = Peptide containing cysteine

    # Total PepHit(C)                         # Observed Bad Pep (C)
    ------------------   proportional_to   ----------------------
    # Total PepHit                            # Total Bad PepHit (X)

returns the fppr and the total number false, as [fppr, total_number_false]
    # File 'lib/spec_id/filter.rb', line 378
    def fppr_by_cysteines(ac_num_with_cys, exp_num_with_cys, total_peptides, mean_fraction_true_cys=nil, std_fraction_true_cys=nil)

      # the number of bona fide BAD cysteine hits
      # (some of the cysteine hits (~5%) are true positives)

      ac_num_with_cys -= exp_num_with_cys * mean_fraction_true_cys if mean_fraction_true_cys
      if ac_num_with_cys < 0.0 ; ac_num_with_cys = 0.0 end
      total_number_false = (ac_num_with_cys * total_peptides).to_f/exp_num_with_cys
      fppr = total_number_false / total_peptides
      [fppr, total_number_false]
    end
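Solving the proportion for X gives X = (observed bad cysteine hits) * (total hits) / (expected cysteine hits), which is what the method computes. A worked sketch with made-up numbers follows; fppr_by_cysteines_sketch is a hypothetical name, and the unused std argument is dropped.

```ruby
# Mirrors the arithmetic of fppr_by_cysteines; returns [fppr, total_number_false].
def fppr_by_cysteines_sketch(actual_cys, expected_cys, total_peptides, mean_fraction_true_cys = nil)
  bad_cys = actual_cys.to_f
  # Optionally discount the share of cysteine hits assumed to be true positives.
  bad_cys -= expected_cys * mean_fraction_true_cys if mean_fraction_true_cys
  bad_cys = 0.0 if bad_cys < 0.0
  total_false = bad_cys * total_peptides / expected_cys
  [total_false / total_peptides, total_false]
end

# Invented numbers: 1000 hits, 200 expected to contain cysteine, 30 observed.
fppr, total_false = fppr_by_cysteines_sketch(30, 200, 1000)
puts "fppr=#{fppr.round(3)} total_false=#{total_false.round(1)}"

# Same data, assuming 5% of the expected cysteine hits are true positives.
fppr, total_false = fppr_by_cysteines_sketch(30, 200, 1000, 0.05)
puts "fppr=#{fppr.round(3)} total_false=#{total_false.round(1)}"
```

Since total_peptides cancels, the fppr reduces to (bad cysteine hits) / (expected cysteine hits): 30/200 in the first call and 20/200 in the second.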
#fraction_false_by_cysteines(pephits, cys_bg_freq, cys_containing_freq) ⇒ Object
returns [total_number_false, fppr, fraction_expected]. pephits may also be a hash of pephits keyed on :aaseq.
    # File 'lib/spec_id/filter.rb', line 464
    def fraction_false_by_cysteines(pephits, cys_bg_freq, cys_containing_freq)
      (ac, exp) = SpecID::AAFreqs.new.actual_and_expected_number_containing_cysteines(pephits, cys_bg_freq)

      fraction_of_expected = ac.to_f/exp
      (cys_fprate, total_num_false) = fppr_by_cysteines(ac, exp, pephits.size, cys_containing_freq)
      [total_num_false, cys_fprate, fraction_of_expected]
    end
#fraction_false_by_true_pos(pephits, true_pos_aaseqs_ar) ⇒ Object
returns [total_number_false, fppr]. pephits can be an array or a hash of peptides keyed on :aaseq.
    # File 'lib/spec_id/filter.rb', line 515
    def fraction_false_by_true_pos(pephits, true_pos_aaseqs_ar)
      if pephits.is_a? Hash
        seqs = pephits.keys
      else
        seqs = pephits.map do |v|
          v.aaseq
        end
      end
      real_tps = 0
      real_fps = 0
      # could also do with partition
      seqs.each do |pep_aaseq|
        if true_pos_aaseqs_ar.any? {|prot_aaseq| prot_aaseq.include? pep_aaseq}
          real_tps += 1
        else
          real_fps += 1
        end
      end
      real_fppr = real_fps.to_f/pephits.size
      [real_fps, real_fppr]
    end
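The source comment notes that the counting loop "could also do with partition". A minimal sketch of that variant is below; it works on plain sequence strings, fraction_false_sketch is a hypothetical name, and the protein and peptide sequences are invented.

```ruby
# A peptide counts as a true positive when it is a substring of any known
# true protein sequence; everything else is counted as false.
def fraction_false_sketch(peptide_seqs, true_protein_seqs)
  false_hits, _true_hits = peptide_seqs.partition do |pep|
    true_protein_seqs.none? { |prot| prot.include?(pep) }
  end
  [false_hits.size, false_hits.size.to_f / peptide_seqs.size]
end

proteins = %w[MKTAYIAKQR GAVLIMCFYW]
peptides = %w[TAYIAK LIMC WWWW KQR]
num_false, fppr = fraction_false_sketch(peptides, proteins)
puts "false=#{num_false} fppr=#{fppr}"
```

The substring scan is O(peptides x proteins); that is fine for a short list of known true proteins, which is the use case the -t option describes.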
#get_cys_freq(arg) ⇒ Object
takes a fasta file or a string ( to be cast as a float )
    # File 'lib/spec_id/filter.rb', line 301
    def get_cys_freq(arg)
      if File.exist? arg
        SpecID::AAFreqs.new(arg).aafreqs[:C]
      else
        arg.to_f
      end
    end
#get_options(argv) ⇒ Object
If the arguments are good, returns [files_array, options]; otherwise prints the usage message and returns nil.
    # File 'lib/spec_id/filter.rb', line 316
    def get_options(argv)
      dup_argv = argv.dup

      opt = OpenStruct.new
      opt.x1 = 1.0
      opt.x2 = 1.5
      opt.x3 = 2.0
      opt.c = 0.1
      opt.ppm = 1000.0
      opt.false = false

      opts = OptionParser.new do |op|
        op.banner = "usage: #{File.basename(__FILE__)} [OPTS] <bioworks.xml | bioworks.srg>"
        op.separator("prints number of peptides/proteins ID'd at given thresholds")
        op.separator "only top hit (by xcorr) per scan+charge is considered"
        #op.separator("** 'dcn*' is the number of peptides with deltacn == 1.1")
        #op.separator(" (these are peptides who are the only hit with xcorr > 0)")
        op.separator ""
        op.on("-1", "--xcorr1 N", Float, "xcorr for +1 charge d: #{opt.x1}") {|v| opt.x1 = v}
        op.on("-2", "--xcorr2 N", Float, "xcorr for +2 charge d: #{opt.x2}") {|v| opt.x2 = v}
        op.on("-3", "--xcorr3 N", Float, "xcorr for +3 charge d: #{opt.x3}") {|v| opt.x3 = v}
        op.on("-c", "--deltacn N", Float, ">= deltacn d: #{opt.c}") {|v| opt.c = v}
        op.on("-p", "--ppm N", Float, "<= ppm d: #{opt.ppm}") {|v| opt.ppm = v}
        op.separator " if bioworks.xml, = 10^6deltamass/mass"
        op.on("-i", "--interactive", "interactive filtering") {|v| opt.i = v}
        op.on("-f", "--false a,b,c", Array, "prot prefixes or filenames of decoys") {|v| opt.false = v}
        op.separator(" last given will apply to remaining files")
        op.on("-y", "--cys <fasta_file|freq,[bkg]>", Array, "report fpr by expected cysteine freq") do |v|
          v[0] = get_cys_freq(v[0])
          opt.cys = v
        end
        op.separator(" freq = freq of cysteine as amino acid")
        op.separator(" [bkg] = freq of cys containing peps d: 0.0")
        op.on("--filters_file <file>", "(no -i) file with list of interactive input") {|v| opt.filters_file = v}
        op.on("-t", "--tps <fasta>", "fasta file containing true hits") {|v| opt.tps = v }
        #op.on("--tmm <toppred.out>", "toppred.out file with transmembr. topology") {|v| opt.tps = v }
        op.on("--yaml", "spits out yaml-ized data") {|v| opt.tabulate = v }
        op.on("--combined_score", "shows the combined score") {|v| opt.combined_score = v }
        op.on("--marshal", "will write marshaled data or read existing") {|v| opt.marshal = v }
        op.on("--log <file>", "also writes all output to file") {|v| opt.log = v }
        op.on("--protein_summary", "writes passing proteins to .summary.html files") {|v| opt.protein_summary = v }
        op.on("-z", "--occams_razor", "will show minimal set of proteins") {|v| opt.occams_razor = v }
      end

      opts.parse!(dup_argv)

      if dup_argv.size < 1
        puts opts
        return nil
      end

      [dup_argv, opt]
    end
#interactive_help ⇒ Object
    # File 'lib/spec_id/filter.rb', line 778
    def interactive_help
      string = []
      string << "********************************************************"
      string << "INTERACTIVE FILTERING HELP:"
      string << "enter: <x1> <x2> <x3> <dcn> <ppm>"
      string << "or : x1:<x1> x2:<x2> x3:<x3> dcn:<dcn> ppm:<ppm>"
      string << "or : dcn:<dcn>"
      string << "or : <x1> <x2> ppm:<ppm>"
      string << "etc..."
      string << "<enter> to (re)run current values"
      string << "'q' to quit"
      string << "********************************************************"
      string.join("\n")
    end
#out(string) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 293
    def out(string)
      puts string
      if @logfh
        @logfh.puts string
      end
    end
#prep_reply(reply, base) ⇒ Object
Assumes reply is already chomped. Updates the 5 globals (the filter values held in base: x1, x2, x3, dcn, ppm).
    # File 'lib/spec_id/filter.rb', line 716
    def prep_reply(reply, base)
      if reply == 'q' ; exit ; end
      if reply =~ /^\s*$/
        base
      elsif reply
        arr = reply.split(/\s+/)
        to_change = []
        to_change_hash = {}
        arr.each do |it|
          if it.include? ':'
            (k,v) = it.split(':')
            to_change_hash[k] = v
          else
            to_change << it
          end
        end
        to_change.each_with_index do |tc,i|
          begin
            base[i] = tc.to_f
          rescue NoMethodError
            out "BAD ARG: #{tc}"
            return false
          end
        end
        to_change_hash.each do |k,v|
          case k
          when 'x1' ; base[0] = v
          when 'x2' ; base[1] = v
          when 'x3' ; base[2] = v
          when 'dcn' ; base[3] = v
          when 'ppm' ; base[4] = v
          else
            out "BAD ARG: #{k}:#{v}"
          end
        end
        base.map {|v| v.to_f }
      else
        false
      end
    end
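The reply grammar (positional values fill x1, x2, x3, dcn, ppm from the left; key:value tokens set one slot by name) can be sketched as a side-effect-free function. This is a variant under stated assumptions, not the method above: parse_reply_sketch and FILTER_KEYS are hypothetical names, the sketch returns nil on bad input instead of printing "BAD ARG", it does not mutate base, and it uses the strict Float() conversion, whereas the original's String#to_f silently turns unparsable text into 0.0.

```ruby
FILTER_KEYS = %w[x1 x2 x3 dcn ppm].freeze

# Returns a new array of five floats, base itself for a blank reply,
# or nil when a token cannot be interpreted.
def parse_reply_sketch(reply, base)
  return base if reply.strip.empty?

  updated = base.dup
  position = 0
  reply.split.each do |token|
    if token.include?(':')
      key, value = token.split(':', 2)
      index = FILTER_KEYS.index(key)
      return nil unless index
      updated[index] = Float(value)
    else
      return nil if position >= FILTER_KEYS.size
      updated[position] = Float(token)
      position += 1
    end
  end
  updated
rescue ArgumentError, TypeError
  nil # Float() rejected a malformed number
end

base = [1.0, 1.5, 2.0, 0.1, 1000.0]
p parse_reply_sketch('2.0 2.5 ppm:50', base)
p parse_reply_sketch('dcn:0.2', base)
p parse_reply_sketch('oops', base)
```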
#protein_fppr(num_peps_per_protein, number_false_peptides, num_iterations = 10) ⇒ Object
num_peps_per_protein is an array of the number of peptides per protein hit (these are the true hits). Assumes that the number follows a gaussian distribution (binomial distributions tend toward gaussians, I believe, at large N). Returns [mean_num_wrong, mean_fppr, stdev_num_wrong, stdev_fppr].
    # File 'lib/spec_id/filter.rb', line 395
    def protein_fppr( num_peps_per_protein, number_false_peptides, num_iterations=10)

      ## Check for more false peptides than peptides in our proteins:
      total_protein_peps = 0
      contained = num_peps_per_protein.each do |num|
        total_protein_peps += num
      end
      ## All peptides will be wrong every time!
      ## which means all proteins will be wrong every time!
      if number_false_peptides >= total_protein_peps
        # [all proteins wrong, fppr=1.0
        return [num_peps_per_protein.size, 1.0, 0.0, 0.0]
      end

      num_prots = num_peps_per_protein.size
      sample = VecD.new(num_iterations)

      # indexed by peptide_number, pointing to a protein's peptide_count
      # we shuffle the indices and then walk along until we are finished
      # then we count how many proteins still have peptides

      # we create an array to hold the peptide number for each protein, then we
      # can reference the same entity when subtracting the peptides in the
      # algorithm
      cont_pep_num_per_prot_ars = (0...num_iterations).map do |i|
        total_protein_peps = 0
        contained = num_peps_per_protein.map do |num|
          [num]
        end
      end

      cont_num_by_pep_index_ars = cont_pep_num_per_prot_ars.map do |ar|
        index_count = 0
        pc_ar = []
        ar.each do |contained_num|
          contained_num.first.times do
            pc_ar[index_count] = contained_num
            index_count += 1
          end
        end
        pc_ar
      end

      indices = (0...(cont_num_by_pep_index_ars.first.size)).map {|x| x }

      (0...num_iterations).each do |i|
        num_false = 0
        indices.shuffle!
        pc = cont_num_by_pep_index_ars[i]

        number_false_peptides.times do |shuffle_index|
          #big_i = indices[shuffle_index]
          pc[indices[shuffle_index]][0] -= 1
        end
        cont_pep_num_per_prot_ars[i].each do |contained_pep_count|
          if contained_pep_count.first == 0
            num_false += 1
          end
        end
        sample[i] = num_false
      end
      (mean_num_wrong, stdev) = sample.sample_stats
      mean_fppr = mean_num_wrong / num_prots
      stdev_fppr = stdev / num_prots
      [mean_num_wrong, mean_fppr, stdev, stdev_fppr]
    end
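The simulation removes number_false_peptides peptides at random, then counts the proteins left with no peptides; repeating this gives a mean and spread. A compact sketch of the same idea using only the standard library is shown below. protein_fppr_sketch is a hypothetical name, Array#sample replaces the shuffled index walk, and the n-1 (sample) standard deviation is an assumption, since the behavior of VecD#sample_stats is not shown in this page.

```ruby
# Returns [mean_num_wrong, mean_fppr, stdev_num_wrong, stdev_fppr].
def protein_fppr_sketch(num_peps_per_protein, number_false_peptides,
                        num_iterations = 10, rng = Random.new)
  num_prots = num_peps_per_protein.size
  total_peps = num_peps_per_protein.sum
  # Every peptide is false, so every protein is false in every iteration.
  return [num_prots, 1.0, 0.0, 0.0] if number_false_peptides >= total_peps

  # One entry per peptide, holding the index of the protein it belongs to.
  owner_of_peptide = num_peps_per_protein.each_with_index.flat_map { |n, prot| [prot] * n }

  wrong_counts = Array.new(num_iterations) do
    remaining = num_peps_per_protein.dup
    # Draw the false peptides without replacement and remove them.
    owner_of_peptide.sample(number_false_peptides, random: rng).each { |prot| remaining[prot] -= 1 }
    remaining.count(0) # proteins with no peptides left are counted as false
  end

  mean = wrong_counts.sum.to_f / num_iterations
  variance = wrong_counts.sum { |c| (c - mean)**2 } / [num_iterations - 1, 1].max
  stdev = Math.sqrt(variance)
  [mean, mean / num_prots, stdev, stdev / num_prots]
end

# Invented example: four proteins with 5, 3, 1 and 1 peptides, 2 false peptides.
p protein_fppr_sketch([5, 3, 1, 1], 2, 1000, Random.new(42))
```

As the legend printed by filter_legend notes, drawing false peptides uniformly treats a peptide from a many-hit protein as equally likely to be false as one from a single-hit protein, so the estimate is probably optimistic.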
#report_cysteines ⇒ Object
    # File 'lib/spec_id/filter.rb', line 472
    def report_cysteines
      #### UNDERWAY:::
      cys_tps = pep_nums[i] - total_num_false

      puts "CYSTEINE FPR: "
      puts " (# peps containing >= 1 cysteines)"
      puts " actual: #{ac}"
      puts "fraction of expected: #{short(fraction_of_expected)}"
      puts " expected # FP's: " + short(total_num_false)
      puts " estimated FPR: " + short( 100.0*cys_fprate ) + " % "
      puts "combined_score = x1 + x2 + x3 + 20.0*deltacn + 4000.0*(1.0/ppm)"
      puts "Combined Score & FPR"
      puts "#{combined_score}\t#{cys_fprate}"
      puts "Combined Score & fraction of expected"
      #puts "#{combined_score} #{fraction_of_expected}"
      to_write_cys_find = ["WRITE_CYS_FIND:", combined_score, fraction_of_expected]
      puts to_write_cys_find.join("\t") if WRITE_CYS_FIND
      puts(['TABULATE:', combined_score, pep_tps, pep_fpr, cys_tps, cys_fprate, '', x1, x2, x3, deltacn, ppm].join("\t")) if opt.tabulate
    end
#run_from_argv(argv) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 161
    def run_from_argv(argv)
      reply = get_options(argv)
      return unless reply
      files, opt = reply

      #files = ARGV.map {|file| file }
      #ARGV.clear

      $stderr.puts "reading files (can take a minute or two for large files)..." if $VERBOSE
      spec_ids = files.map do |file|
        spec_id = file_to_prefiltered_spec_id(file, opt)
        spec_id
      end

      ## the options hash
      hash = {}

      if opt.cys
        if opt.cys[1]
          opt.cys[1] = opt.cys[1].to_f
        else
          opt.cys[1] = 0.0
        end
        hash[:cys] = opt.cys
      end

      hash[:tps] = if opt.tps
        Fasta.new.read_file(opt.tps).prots.map do |prot|
          prot.aaseq.chomp
        end
      end

      hash[:dcy] = if opt.false
        new_spec_ids = []
        prefixes_or_files = SpecID.extend_args(opt.false, files.size)
        false_spec_ids = spec_ids.zip(prefixes_or_files).map do |spec_id, prefix_or_file|
          if File.exist? prefix_or_file
            new_spec_ids << spec_id
            file_to_prefiltered_spec_id(prefix_or_file, opt)
          else
            (tps, fps) = spec_id.classify_by_prefix(:peps, prefix_or_file)
            fps_specid = spec_id.class.new
            tps_specid = spec_id.class.new
            fps_specid.peps = fps
            tps_specid.peps = tps
            new_spec_ids << tps_specid
            fps_specid
          end
        end
        spec_ids = new_spec_ids
        false_spec_ids
      end

      defaults = {
        :dcy => nil, # { spec_id => false_spec_id }
        :cys => nil, # [cys_background_freq, cys_containing_freq]
        :tps => nil,
        :tmm => nil,
        :occams_razor => opt.occams_razor,
      }
      args = defaults.merge hash

      base_args = [opt.x1, opt.x2, opt.x3, opt.c, opt.ppm]

      #################################################### <--
      @fppr_methods = [:tmm, :tps, :cys, :dcy].select do |x|
        args[x]
      end
      @groups_reporting = [:pephits, :aaseq, :prothits]
      @groups_reporting.push( :occams_razor ) if args[:occams_razor]
      @cat_labels = {
        :pephits => 'pep_hits',
        :prothits => 'prot_hits',
        :aaseq => 'uniq_aa_hits',
        :occams_razor => 'occams_prot_hits',
      }
      #################################################### <--

      if opt.log
        @logfh = File.open(opt.log, 'w')
      else
        @logfh = nil
      end

      #########################################
      # PRINT FILTER LEGEND
      out filter_legend(@fppr_methods)
      #########################################

      if opt.filters_file
        lines = IO.readlines(opt.filters_file)
        lines.each do |line|
          line.chomp!
          answer = prep_reply(line, base_args)
          next if answer == false
          base_args = answer
          filter_round(spec_ids, base_args, args)
        end
      elsif opt.i
        ## CLEAR ARGV (since otherwise, gets reads it!)
        ARGV.clear
        out interactive_help
        reply = "nil"
        loop do
          b = base_args
          out "#{b[0]} #{b[1]} #{b[2]} dcn:#{b[3]} ppm:#{b[4]}"
          loop do
            reply = gets.chomp
            answer = prep_reply(reply, base_args)
            if answer == false
              out interactive_help
            else
              base_args = answer
              filter_round(spec_ids, base_args, args)
              break
            end
          end
        end
      else
        filter_round(spec_ids, base_args, args)
      end

      if opt.log
        @logfh.close
      end
    end
#short(num) ⇒ Object
returns the number formatted to three decimal places for display
    # File 'lib/spec_id/filter.rb', line 310
    def short(num)
      sprintf( "%.3f",num)
    end
#tmm_fppr(pephits) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 557
    def tmm_fppr(pephits)
      abort "NEED TO IMPLEMENT"
    end
#to_table(spec_id, args, normal_results, fppr_results, groups_reporting, fppr_methods, cat_labels) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 690
    def to_table(spec_id, args, normal_results, fppr_results, groups_reporting, fppr_methods, cat_labels)
      #table is in the form: { column heading => [ values ] }
      title = spec_id.passed_in_filename
      col_labels = ['num', *(fppr_methods.map{|v| "#{v}%" })]
      row_labels = groups_reporting.map {|grp| cat_labels[grp]}
      dt = groups_reporting.map do |grp|
        line = [normal_results[grp].size]
        fppr_methods.each do |mth|
          line << fppr_results[grp][mth][1]
        end
        line
      end
      Table.new(dt, row_labels, col_labels, title)
      #puts(['TABULATE:', combined_score, pep_tps, pep_fppr, real_tps, real_fppr, '', x1, x2, x3, deltacn, ppm].join("\t")) if opt.tabulate
    end
#tps_fppr(pephits, true_pos_aaseqs_ar) ⇒ Object
    # File 'lib/spec_id/filter.rb', line 567
    def tps_fppr(pephits, true_pos_aaseqs_ar)
      fraction_false_by_true_pos(pephits, true_pos_aaseqs_ar)
    end