Module: Ms::Fasta
- Defined in:
- lib/ms/fasta.rb
Overview
A convenience class for working with fasta formatted sequence databases. the file which includes this class also includes Enumerable with Bio::FlatFile so you can do things like this:
accessions = Ms::Fasta.open("file.fasta") do |fasta|
fasta.map(&:accession)
end
A few aliases are added to Bio::FastaFormat
entry.header == entry.definition
entry.sequence == entry.seq
Ms::Fasta.new accepts both an IO object or a String (a fasta formatted string itself)
# taking an io object:
File.open("file.fasta") do |io|
fasta = Ms::Fasta.new(io)
... do something with it
end
# taking a string
string = ">id1 a simple header\nAAASDDEEEDDD\n>id2 header again\nPPPPPPWWWWWWTTTTYY\n"
fasta = Ms::Fasta.new(string)
(simple, not_simple) = fasta.partition {|entry| entry.header =~ /simple/ }
Class Method Summary collapse
-
.foreach(file, &block) ⇒ Object
yields each Bio::FastaFormat object in turn.
-
.new(io) ⇒ Object
takes an IO object or a string that is the fasta data itself.
-
.open(file, &block) ⇒ Object
opens the flatfile and yields a Bio::FlatFile object.
-
.protein_lengths_and_descriptions(file) ⇒ Object
returns two hashes [id_to_length, id_to_description] faster (~4x) than official route.
Class Method Details
.foreach(file, &block) ⇒ Object
yields each Bio::FastaFormat object in turn
46 47 48 49 50 |
# File 'lib/ms/fasta.rb', line 46 def self.foreach(file, &block) Bio::FlatFile.open(Bio::FastaFormat, file) do |fasta| fasta.each(&block) end end |
.new(io) ⇒ Object
takes an IO object or a string that is the fasta data itself
53 54 55 56 |
# File 'lib/ms/fasta.rb', line 53 def self.new(io) io = StringIO.new(io) if io.is_a?(String) Bio::FlatFile.new(Bio::FastaFormat, io) end |
.open(file, &block) ⇒ Object
opens the flatfile and yields a Bio::FlatFile object
41 42 43 |
# File 'lib/ms/fasta.rb', line 41 def self.open(file, &block) Bio::FlatFile.open(Bio::FastaFormat, file, &block) end |
.protein_lengths_and_descriptions(file) ⇒ Object
returns two hashes [id_to_length, id_to_description] faster (~4x) than official route.
60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 |
# File 'lib/ms/fasta.rb', line 60 def self.protein_lengths_and_descriptions(file) protid_to_description = {} protid_to_length = {} re = /^>([^\s]+) (.*)/ ids = [] lengths = [] current_length = nil IO.foreach(file) do |line| line.chomp! if md=re.match(line) lengths << current_length current_id = md[1] ids << current_id current_length = 0 protid_to_description[current_id] = md[2] else current_length += line.size end end lengths << current_length lengths.shift # remove the first nil entry [Hash[ids.zip(lengths).to_a], protid_to_description] end |