Class: Bio::Lasergene

Inherits:
Object
Defined in:
lib/bio/db/lasergene.rb

Overview

bio/db/lasergene.rb - Interface for DNAStar Lasergene sequence file format

Author

Trevor Wennblom <[email protected]>

Copyright

Copyright © 2007 Center for Biomedical Research Informatics, University of Minnesota (cbri.umn.edu)

License

The Ruby License

Description

Bio::Lasergene reads DNAStar Lasergene-formatted sequence files (.seq files). It expects exactly one sequence per file.

Usage

require 'bio'
filename = 'MyFile.seq'
lseq = Bio::Lasergene.new( IO.readlines(filename) )
lseq.entry_id  # => "Contig 1"
lseq.seq  # => ATGACGTATCCAAAGAGGCGTTACC

Comments

I’m only aware of the following three kinds of Lasergene file formats. Feel free to send me other examples that may not currently be accounted for.

File format 1:

## begin ##
"Contig 1" (1,934)
  Contig Length:                  934 bases
  Average Length/Sequence:        467 bases
  Total Sequence Length:         1869 bases
  Top Strand:                       2 sequences
  Bottom Strand:                    2 sequences
  Total:                            4 sequences
^^
ATGACGTATCCAAAGAGGCGTTACCGGAGAAGAAGACACCGCCCCCGCAGTCCTCTTGGCCAGATCCTCCGCCGCCGCCCCTGGCTCGTCCACCCCCGCCACAGTTACCGCTGGAGAAGGAAAAATGGCATCTTCAWCACCCGCCTATCCCGCAYCTTCGGAWRTACTATCAAGCGAACCACAGTCAGAACGCCCTCCTGGGCGGTGGACATGATGAGATTCAATATTAATGACTTTCTTCCCCCAGGAGGGGGCTCAAACCCCCGCTCTGTGCCCTTTGAATACTACAGAATAAGAAAGGTTAAGGTTGAATTCTGGCCCTGCTCCCCGATCACCCAGGGTGACAGGGGAATGGGCTCCAGTGCTGWTATTCTAGMTGATRRCTTKGTAACAAAGRCCACAGCCCTCACCTATGACCCCTATGTAAACTTCTCCTCCCGCCATACCATAACCCAGCCCTTCTCCTACCRCTCCCGYTACTTTACCCCCAAACCTGTCCTWGATKCCACTATKGATKACTKCCAACCAAACAACAAAAGAAACCAGCTGTGGSTGAGACTACAWACTGCTGGAAATGTAGACCWCGTAGGCCTSGGCACTGCGTKCGAAAACAGTATATACGACCAGGAATACAATATCCGTGTMACCATGTATGTACAATTCAGAGAATTTAATCTTAAAGACCCCCCRCTTMACCCKTAATGAATAATAAMAACCATTACGAAGTGATAAAAWAGWCTCAGTAATTTATTYCATATGGAAATTCWSGGCATGGGGGGGAAAGGGTGACGAACKKGCCCCCTTCCTCCSTSGMYTKTTCYGTAGCATTCYTCCAMAAYACCWAGGCAGYAMTCCTCCSATCAAGAGcYTSYACAGCTGGGACAGCAGTTGAGGAGGACCATTCAAAGGGGGTCGGATTGCTGGTAATCAGA
## end ##

File format 2:

## begin ##
^^:                                  350,935
Contig 1 (1,935)
  Contig Length:                  935 bases
  Average Length/Sequence:        580 bases
  Total Sequence Length:         2323 bases
  Top Strand:                       2 sequences
  Bottom Strand:                    2 sequences
  Total:                            4 sequences
^^
ATGTCGGGGAAATGCTTGACCGCGGGCTACTGCTCATCATTGCTTTCTTTGTGGTATATCGTGCCGTTCTGTTTTGCTGTGCTCGTCAACGCCAGCGGCGACAGCAGCTCTCATTTTCAGTCGATTTATAACTTGACGTTATGTGAGCTGAATGGCACGAACTGGCTGGCAGACAACTTTAACTGGGCTGTGGAGACTTTTGTCATCTTCCCCGTGTTGACTCACATTGTTTCCTATGGTGCACTCACTACCAGTCATTTTCTTGACACAGTTGGTCTAGTTACTGTGTCTACCGCCGGGTTTTATCACGGGCGGTACGTCTTGAGTAGCATCTACGCGGTCTGTGCTCTGGCTGCGTTGATTTGCTTCGCCATCAGGTTTGCGAAGAACTGCATGTCCTGGCGCTACTCTTGCACTAGATACACCAACTTCCTCCTGGACACCAAGGGCAGACTCTATCGTTGGCGGTCGCCTGTCATCATAGAGAAAGGGGGTAAGGTTGAGGTCGAAGGTCATCTGATCGATCTCAAAAGAGTTGTGCTTGATGGCTCTGTGGCGACACCTTTAACCAGAGTTTCAGCGGAACAATGGGGTCGTCCCTAGACGACTTTTGCCATGATAGTACAGCCCCACAGAAGGTGCTCTTGGCGTTTTCCATCACCTACACGCCAGTGATGATATATGCCCTAAAGGTAAGCCGCGGCCGACTTTTGGGGCTTCTGCACCTTTTGATTTTTTTGAACTGTGCCTTTACTTTCGGGTACATGACATTCGTGCACTTTCGGAGCACGAACAAGGTCGCGCTCACTATGGGAGCAGTAGTCGCACTCCTTTGGGGGGTGTACTCAGCCATAGAAACCTGGAAATTCATCACCTCCAGATGCCGTTGTGCTTGCTAGGCCGCAAGTACATTCTGGCCCCTGCCCACCACGTTG
## end ##

File format 3 (non-standard Lasergene header):

## begin ##
LOCUS       PRU87392               15411 bp    RNA     linear   VRL 17-NOV-2000
DEFINITION  Porcine reproductive and respiratory syndrome virus strain VR-2332,
            complete genome.
ACCESSION   U87392 AF030244 U00153
VERSION     U87392.3  GI:11192298
[...cut...]
     3'UTR           15261..15411
     polyA_site      15409
ORIGIN      
^^
atgacgtataggtgttggctctatgccttggcatttgtattgtcaggagctgtgaccattggcacagcccaaaacttgctgcacagaaacacccttctgtgatagcctccttcaggggagcttagggtttgtccctagcaccttgcttccggagttgcactgctttacggtctctccacccctttaaccatgtctgggatacttgatcggtgcacgtgtacccccaatgccagggtgtttatggcggagggccaagtctactgcacacgatgcctcagtgcacggtctctccttcccctgaacctccaagtttctgagctcggggtgctaggcctattctacaggcccgaagagccactccggtggacgttgccacgtgcattccccactgttgagtgctcccccgccggggcctgctggctttctgcaatctttccaatcgcacgaatgaccagtggaaacctgaacttccaacaaagaatggtacgggtcgcagctgagctttacagagccggccagctcacccctgcagtcttgaaggctctacaagtttatgaacggggttgccgctggtaccccattgttggacctgtccctggagtggccgttttcgccaattccctacatgtgagtgataaacctttcccgggagcaactcacgtgttgaccaacctgccgctcccgcagagacccaagcctgaagacttttgcccctttgagtgtgctatggctactgtctatgacattggtcatgacgccgtcatgtatgtggccgaaaggaaagtctcctgggcccctcgtggcggggatgaagtgaaatttgaagctgtccccggggagttgaagttgattgcgaaccggctccgcacctccttcccgccccaccacacagtggacatgtctaagttcgccttcacagcccctgggtgtggtgtttctatgcgggtcgaacgccaacacggctgccttcccgctgacactgtccctgaaggcaactgctggtggagcttgtttgacttgcttccactggaagttcagaacaaagaaattcgccatgctaaccaatttggctaccagaccaagcatggtgtctctggcaagtacctacagcggaggctgca[...cut...]
## end ##
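In all three formats above, a line consisting of ^^ separates the header from the sequence, and format 2 additionally opens with a ^^: line. The following is a minimal sketch of that split in plain Ruby. It is not the Bio::Lasergene implementation: the helper name split_lasergene is hypothetical, and keeping the ^^: line as part of the header is an assumption.

```ruby
# Hypothetical helper, not part of Bio::Lasergene: split the lines of a
# .seq file into header lines and a single sequence string.
def split_lasergene(lines)
  header = []
  sequence = String.new
  in_sequence = false

  lines.each do |raw|
    line = raw.chomp
    if in_sequence
      # Everything after the separator is sequence data.
      sequence << line.strip
    elsif line.start_with?('^^:')
      # Format 2 only: leading "^^:" line. Assumption: treat it as header.
      header << line
    elsif line.start_with?('^^')
      # A bare "^^" line ends the header in all three formats.
      in_sequence = true
    else
      header << line
    end
  end

  [header, sequence]
end

# Shortened version of file format 2, for illustration only.
format_2 = [
  "^^:                                  350,935\n",
  "Contig 1 (1,935)\n",
  "  Contig Length:                  935 bases\n",
  "^^\n",
  "ATGTCGGGGAAATGC\n"
]

header, sequence = split_lasergene(format_2)
puts header.length
puts sequence
```

The order of the two start_with? checks matters, because a line beginning with ^^: also begins with ^^.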

Constant Summary

DELIMITER_1 = '^\^\^:'

Match ‘^^:’ at the beginning of a line

DELIMITER_2 = '^\^\^'

Match ‘^^’ at the beginning of a line
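Both constants are plain strings holding regular-expression source. As a rough illustration of what they match, the sketch below compiles the same two patterns with Regexp.new and tests them against sample lines. The local variable names are illustrative, and how the class itself applies the constants is not shown here.

```ruby
# Same pattern strings as DELIMITER_1 and DELIMITER_2, held in local
# variables so the sketch runs without the bio gem.
delimiter_1 = '^\^\^:'
delimiter_2 = '^\^\^'

re1 = Regexp.new(delimiter_1)
re2 = Regexp.new(delimiter_2)

samples = [
  "^^:                                  350,935", # first line of file format 2
  "^^",                                           # header/sequence separator
  "Contig 1 (1,935)"                              # ordinary header line
]

# For each sample, record whether each pattern matches.
results = samples.map { |line| [line.match?(re1), line.match?(re2)] }
p results
# => [[true, true], [false, true], [false, false]]
```

Because the second pattern also matches any line that starts with ^^:, code that needs to tell the two kinds of line apart has to test the more specific pattern first.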

Constructor Details

#initialize(lines) ⇒ Lasergene

Returns a new instance of Lasergene.



# File 'lib/bio/db/lasergene.rb', line 124

def initialize(lines)
  process(lines)
end

Instance Attribute Details

#average_length ⇒ Object (readonly)

Average length per sequence

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 103

def average_length
  @average_length
end

#bottom_strand_sequences ⇒ Object (readonly)

Number of bottom strand sequences

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 115

def bottom_strand_sequences
  @bottom_strand_sequences
end

#comments ⇒ Object (readonly)

Entire header before the sequence



# File 'lib/bio/db/lasergene.rb', line 86

def comments
  @comments
end

#contig_length ⇒ Object (readonly)

Contig length, length of present sequence

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 99

def contig_length
  @contig_length
end

#name ⇒ Object (readonly)

Name of sequence

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 95

def name
  @name
end

#sequence ⇒ Object (readonly)

Sequence

Bio::Sequence::NA or Bio::Sequence::AA object



# File 'lib/bio/db/lasergene.rb', line 91

def sequence
  @sequence
end

#top_strand_sequences ⇒ Object (readonly)

Number of top strand sequences

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 111

def top_strand_sequences
  @top_strand_sequences
end

#total_length ⇒ Object (readonly)

Length of parent sequence

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 107

def total_length
  @total_length
end

#total_sequences ⇒ Object (readonly)

Number of sequences

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 119

def total_sequences
  @total_sequences
end
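
Most of the attributes above are described as parsed from the standard Lasergene header. As an illustration only, the sketch below shows one way such a header (file format 1 or 2) could be turned into named fields. The helper parse_standard_header is hypothetical, and the mapping from header labels to attribute names is an assumption based on the descriptions above, not a copy of the library's parser.

```ruby
# Hypothetical helper, not the library's parser: extract the name and the
# numeric fields from a standard Lasergene header.
def parse_standard_header(header_lines)
  # Assumed mapping from header labels to the attribute names documented above.
  labels = {
    'Contig Length'           => :contig_length,
    'Average Length/Sequence' => :average_length,
    'Total Sequence Length'   => :total_length,
    'Top Strand'              => :top_strand_sequences,
    'Bottom Strand'           => :bottom_strand_sequences,
    'Total'                   => :total_sequences
  }

  fields = {}
  header_lines.each do |line|
    if (m = line.match(/^"?(.+?)"?\s+\(\d+,\d+\)\s*$/))
      # Name line, e.g. "Contig 1" (1,934), with or without quotes.
      fields[:name] = m[1]
    elsif (m = line.match(/^\s*([^:]+):\s+(\d+)\s+(?:bases|sequences)\s*$/))
      # Labelled numeric line, e.g. "Contig Length:   934 bases".
      key = labels[m[1]]
      fields[key] = m[2].to_i if key
    end
  end
  fields
end

# Header of file format 1, as shown in the Comments section.
header = [
  '"Contig 1" (1,934)',
  '  Contig Length:                  934 bases',
  '  Average Length/Sequence:        467 bases',
  '  Total Sequence Length:         1869 bases',
  '  Top Strand:                       2 sequences',
  '  Bottom Strand:                    2 sequences',
  '  Total:                            4 sequences'
]

fields = parse_standard_header(header)
p fields
```

Lines that fit neither pattern, such as a GenBank-style header from file format 3, are simply skipped, so the returned hash is empty for a non-standard header.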

Instance Method Details

#entry_id ⇒ Object

Name of sequence

  • Parsed from standard Lasergene header



# File 'lib/bio/db/lasergene.rb', line 147

def entry_id
  @name
end

#seq ⇒ Object

Sequence

Bio::Sequence::NA or Bio::Sequence::AA object



# File 'lib/bio/db/lasergene.rb', line 141

def seq
  @sequence
end

#standard_comment? ⇒ Boolean

Is the comment header recognized as standard Lasergene format?

Arguments

  • none

Returns:

  • (Boolean): true or false


# File 'lib/bio/db/lasergene.rb', line 134

def standard_comment?
  @standard_comment
end