Class: BioDSL::MaskSeq
- Inherits: Object
- Defined in: lib/BioDSL/commands/mask_seq.rb
Overview
Mask sequences in the stream based on quality scores.
mask_seq masks sequences in the stream using either hard masking or soft masking (default). Hard masking replaces each residue whose quality score is below the specified quality_min with an N, while soft masking replaces such residues with their lower-case form. The sequences are the values of SEQ keys and the quality scores are the values of SCORES keys. The SCORES are encoded as ASCII characters ranging from '!' to 'I', indicating scores from 0 to 40 (Phred+33 encoding).
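A minimal Ruby sketch of the masking rule described above, assuming Phred+33 decoding (score = ASCII code minus 33). The method name `mask_residues` is hypothetical and is not part of BioDSL's API; it only illustrates how soft and hard masking treat residues below quality_min.

```ruby
# Hypothetical helper illustrating the mask_seq rule; not BioDSL's own code.
# Assumes SCORES use Phred+33 encoding: '!' (ASCII 33) => 0, 'I' (ASCII 73) => 40.
def mask_residues(seq, scores, quality_min: 20, mask: :soft)
  raise ArgumentError, 'SEQ and SCORES differ in length' unless seq.length == scores.length

  seq.each_char.with_index.map do |residue, i|
    score = scores[i].ord - 33 # decode Phred+33
    if score >= quality_min
      residue                  # good enough quality: keep as-is
    elsif mask == :hard
      'N'                      # hard masking: replace with N
    else
      residue.downcase         # soft masking: lower case
    end
  end.join
end

seq    = 'TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT'
scores = '!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHI'
puts mask_residues(seq, scores)
# => ttggtcgctcgctccgcgacCTCAGATCAGACGTGGGCGAT
```

With these example scores, residues 0 to 19 score 0 to 19 and are therefore masked at the default cutoff of 20.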
Usage
mask_seq([quality_min: <uint>[, mask: <:soft|:hard>]])
Options
- quality_min: <uint> - Minimum quality score; residues scoring below it are masked (default=20).
- mask: <:soft|:hard> - Soft or hard masking (default=soft).
Examples
Consider the following FASTQ entry in the file test.fq:
@HWI-EAS157_20FFGAAXX:2:1:888:434
TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT
+HWI-EAS157_20FFGAAXX:2:1:888:434
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI
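For illustration, a four-line FASTQ entry like the one above maps onto the record keys that appear in the dumps below (SEQ_NAME, SEQ, SEQ_LEN, SCORES). The parser is a hypothetical sketch, not BioDSL's read_fastq, and assumes the sequence and quality strings each fit on a single line.

```ruby
# Hypothetical minimal FASTQ parser; BioDSL's read_fastq is more complete.
# Assumes each entry spans exactly four lines (no wrapped sequences).
def parse_fastq_entry(lines)
  raise ArgumentError, 'expected 4 lines' unless lines.size == 4

  header, seq, plus, scores = lines.map(&:chomp)
  raise ArgumentError, 'missing @ header' unless header.start_with?('@')
  raise ArgumentError, 'missing + separator' unless plus.start_with?('+')
  raise ArgumentError, 'SEQ and SCORES differ in length' unless seq.length == scores.length

  # Drop the leading '@' to get the sequence name.
  { SEQ_NAME: header[1..-1], SEQ: seq, SEQ_LEN: seq.length, SCORES: scores }
end

entry = [
  '@HWI-EAS157_20FFGAAXX:2:1:888:434',
  'TTGGTCGCTCGCTCCGCGACCTCAGATCAGACGTGGGCGAT',
  '+HWI-EAS157_20FFGAAXX:2:1:888:434',
  '!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHI'
]
record = parse_fastq_entry(entry)
p record[:SEQ_LEN] # => 41
```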
We can read in this sequence using read_fastq and then soft mask it with mask_seq like this:
BD.new.read_fastq(input: "test.fq").mask_seq.dump.run
{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
:SEQ=>"ttggtcgctcgctccgcgacCTCAGATCAGACGTGGGCGAT",
:SEQ_LEN=>41,
:SCORES=>"!\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI"}
Using the quality_min option we can change the cutoff:
BD.new.read_fastq(input: "test.fq").mask_seq(quality_min: 25).dump.run
{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
:SEQ=>"ttggtcgctcgctccgcgacctcagATCAGACGTGGGCGAT",
:SEQ_LEN=>41,
:SCORES=>"!\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI"}
Using the mask option we can apply hard masking instead:
BD.new.read_fastq(input: "test.fq").mask_seq(mask: :hard).dump.run
{:SEQ_NAME=>"HWI-EAS157_20FFGAAXX:2:1:888:434",
:SEQ=>"NNNNNNNNNNNNNNNNNNNNCTCAGATCAGACGTGGGCGAT",
:SEQ_LEN=>41,
:SCORES=>"!\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI"}
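Because the example's scores rise by exactly one per position (0 to 40), the number of masked residues in these examples equals quality_min: 20 at the default cutoff and 25 at quality_min: 25. A small Ruby check of that arithmetic, assuming Phred+33 decoding; `count_below` is a hypothetical helper for illustration only.

```ruby
# Count residues whose Phred+33 score falls below a cutoff (illustrative only).
def count_below(scores, quality_min)
  scores.each_char.count { |c| c.ord - 33 < quality_min }
end

scores = '!"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHI'

[20, 25].each do |q|
  puts "quality_min #{q}: #{count_below(scores, q)} of #{scores.length} residues masked"
end
# quality_min 20: 20 of 41 residues masked
# quality_min 25: 25 of 41 residues masked
```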
Constant Summary
- STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out masked)
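The STATS keys name counters that the command updates while streaming, and lmb additionally derives masked_percent from masked and residues_in. A minimal sketch of that bookkeeping, assuming each counter starts at zero; the real status_init helper is not shown on this page, so `init_stats` below is a hypothetical stand-in.

```ruby
STATS = %i(records_in records_out sequences_in sequences_out residues_in residues_out masked)

# Hypothetical stand-in for status_init: one zeroed counter per STATS key.
def init_stats(keys)
  keys.each_with_object({}) { |key, h| h[key] = 0 }
end

status = init_stats(STATS)

# Example counts matching the soft-masking example: 41 residues, 20 masked.
status[:residues_in] += 41
status[:masked] += 20

# Same formula as in lmb: percentage of input residues that were masked.
status[:masked_percent] = (100 * status[:masked].to_f / status[:residues_in]).round(2)
p status[:masked_percent] # => 48.78
```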
Instance Method Summary
- #initialize(options) ⇒ MaskSeq (constructor) - Constructor for MaskSeq.
- #lmb ⇒ Proc - Return command lambda for mask_seq.
Constructor Details
#initialize(options) ⇒ MaskSeq
Constructor for MaskSeq.
# File 'lib/BioDSL/commands/mask_seq.rb', line 95
def initialize(options)
  @options = options
  defaults
  @mask = @options[:mask].to_sym # accepts :soft/:hard or their string forms
end
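The constructor converts the mask option to a symbol, so "hard" and :hard should behave the same. A hedged sketch of that normalization, assuming the defaults listed under Options (quality_min: 20, mask: soft); `normalize_options` and `DEFAULTS` are hypothetical names, and BioDSL's own defaults and option checking may differ in detail.

```ruby
# Hypothetical option normalization mirroring the documented defaults.
DEFAULTS = { quality_min: 20, mask: :soft }.freeze

def normalize_options(options = {})
  opts = DEFAULTS.merge(options)   # caller's values override the defaults
  opts[:mask] = opts[:mask].to_sym # accept "hard" as well as :hard

  unless %i(soft hard).include?(opts[:mask])
    raise ArgumentError, "mask must be :soft or :hard, got #{opts[:mask].inspect}"
  end
  unless opts[:quality_min].is_a?(Integer) && opts[:quality_min] >= 0
    raise ArgumentError, 'quality_min must be a non-negative integer'
  end

  opts
end

p normalize_options(mask: 'hard')[:mask] # => :hard
```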
Instance Method Details
#lmb ⇒ Proc
Return command lambda for mask_seq.
# File 'lib/BioDSL/commands/mask_seq.rb', line 107
def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      # Only records carrying both a sequence and quality scores are masked.
      mask_seq(record) if record[:SEQ] && record[:SCORES]

      output << record

      @status[:records_out] += 1
    end

    @status[:masked_percent] =
      (100 * @status[:masked].to_f / @status[:residues_in]).round(2)
  end
end
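lmb returns a lambda that is called with an input stream, an output stream and a status hash. The self-contained sketch below follows the same pattern, with plain arrays standing in for BioDSL's streams and a reduced set of counters; `make_mask_lambda` is a hypothetical name, and the masking step reimplements the soft/hard rule from the overview rather than calling BioDSL's private mask_seq. As a design choice, it reports 0.0 for masked_percent when no residues were seen instead of dividing by zero.

```ruby
# Hypothetical re-creation of the lmb pattern; arrays stand in for streams.
# Assumes SEQ and SCORES have equal length and SCORES use Phred+33.
def make_mask_lambda(quality_min: 20, mask: :soft)
  lambda do |input, output, status|
    status.merge!(records_in: 0, records_out: 0, residues_in: 0, masked: 0)

    input.each do |record|
      status[:records_in] += 1

      if record[:SEQ] && record[:SCORES]
        seq = record[:SEQ].dup
        status[:residues_in] += seq.length
        record[:SCORES].each_char.with_index do |c, i|
          next if c.ord - 33 >= quality_min # score at or above cutoff: keep
          seq[i] = mask == :hard ? 'N' : seq[i].downcase
          status[:masked] += 1
        end
        record[:SEQ] = seq
      end

      output << record # records without SEQ/SCORES pass through unchanged
      status[:records_out] += 1
    end

    total = status[:residues_in]
    status[:masked_percent] =
      total.zero? ? 0.0 : (100 * status[:masked].to_f / total).round(2)
  end
end

records = [
  { SEQ: 'ACGTACGT', SCORES: 'II!!II!!' },
  { SEQ_NAME: 'no_scores' }
]
output = []
status = {}
make_mask_lambda(mask: :hard).call(records, output, status)
p output.first[:SEQ]      # => "ACNNACNN"
p status[:masked_percent] # => 50.0
```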