Class: Bio::GCG::Seq

Inherits:
Object
Defined in:
lib/bio/appl/gcg/seq.rb

Overview

Bio::GCG::Seq

Parser class for the GCG sequence file format (.seq or .pep).

References

  • Information about GCG Wisconsin Package(R): www.accelrys.com/products/gcg_wisconsin_package

  • EMBOSS sequence formats: www.hgmp.mrc.ac.uk/Software/EMBOSS/Themes/SequenceFormats.html

  • BioPerl document: docs.bioperl.org/releases/bioperl-1.2.3/Bio/SeqIO/gcg.html

Constant Summary

DELIMITER = RS = nil

Delimiter used by Bio::FlatFile.


Constructor Details

#initialize(str) ⇒ Seq

Creates a new instance of this class. str must be a string in GCG sequence format.



# File 'lib/bio/appl/gcg/seq.rb', line 38

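def initialize(str)
  @heading = str[/.*/] # first line, e.g. '!!NA_SEQUENCE 1.0'
  str = str.sub(/.*/, '') # drop the heading (works on a copy)
  # Cut everything up to the line ending in '..'; what remains is sequence data.
  str.sub!(/.*\.\.$/m, '')
  # $& holds the text just removed; deleting its '..' line leaves the definition.
  @definition = $&.to_s.sub(/^.*\.\.$/, '').to_s
  desc = $&.to_s # $& is now the '..' line matched by the sub above
  if m = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/.match(desc) then
    @entry_id = m[1].to_s.strip
    @length   = (m[2] ? m[2].to_i : nil)
    @date     = m[3].to_s.strip
    @seq_type = m[4]
    @checksum = (m[5] ? m[5].to_i : nil)
  end
  @data = str # raw sequence block; cleaned lazily by #seq
  @seq = nil
  @definition.strip!
end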
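The descriptor-line match in the constructor can be tried in isolation. The following is a minimal sketch in plain Ruby that does not load BioRuby; the sample line, the constant names and the helper parse_descriptor are hypothetical and exist only for illustration, while the pattern is copied from the constructor above.

```ruby
# Hypothetical sample descriptor line; the values are made up for illustration.
DESCRIPTOR = 'gi-1234567  Length: 8  January 01, 2000 12:00  Type: N  Check: 2644  ..'

# Same pattern as in the constructor above.
HEADER_PATTERN = /(.+)\s+Length\:\s+(\d+)\s+(.+)\s+Type\:\s+(\w)\s+Check\:\s+(\d+)/

# Hypothetical helper (not part of BioRuby): returns the parsed fields as a
# Hash, or nil when the line does not look like a GCG descriptor line.
def parse_descriptor(line)
  m = HEADER_PATTERN.match(line)
  return nil unless m

  {
    entry_id: m[1].strip,  # greedy (.+) may keep trailing blanks, so strip
    length:   m[2].to_i,
    date:     m[3].strip,
    seq_type: m[4],
    checksum: m[5].to_i
  }
end

p parse_descriptor(DESCRIPTOR)
```

When the line does not match, the constructor simply leaves entry_id, length, date, seq_type and checksum unset; the sketch mirrors that by returning nil.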

Instance Attribute Details

#checksum ⇒ Object (readonly)

“Check:” field: the checksum of the current sequence.



# File 'lib/bio/appl/gcg/seq.rb', line 74

def checksum
  @checksum
end

#date ⇒ Object (readonly)

Date field of this entry.



# File 'lib/bio/appl/gcg/seq.rb', line 67

def date
  @date
end

#definition ⇒ Object (readonly)

Description field.



# File 'lib/bio/appl/gcg/seq.rb', line 60

def definition
  @definition
end

#entry_id ⇒ Object (readonly)

ID field.



# File 'lib/bio/appl/gcg/seq.rb', line 57

def entry_id
  @entry_id
end

#heading ⇒ Object (readonly)

Heading line ('!!NA_SEQUENCE 1.0' or similar).



# File 'lib/bio/appl/gcg/seq.rb', line 78

def heading
  @heading
end

#length ⇒ Object (readonly)

“Length:” field. Note that this value may differ from the actual sequence length.



# File 'lib/bio/appl/gcg/seq.rb', line 64

def length
  @length
end

#seq_type ⇒ Object (readonly)

“Type:” field, which indicates the sequence type: “N” for a nucleic acid sequence, “P” for a protein sequence.



# File 'lib/bio/appl/gcg/seq.rb', line 71

def seq_type
  @seq_type
end

Class Method Details

.calc_checksum(str) ⇒ Object

Calculates the checksum of the given string.



# File 'lib/bio/appl/gcg/seq.rb', line 141

def self.calc_checksum(str)
  # Reference: Bio::SeqIO::gcg of BioPerl-1.2.3
  idx = 0
  sum = 0
  str.upcase.tr('^A-Z.~', '').each_byte do |c|
    idx += 1
    sum += idx * c
    idx = 0 if idx >= 57
  end
  (sum % 10000)
end
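A worked example may make the weighting easier to follow. This is a standalone sketch in plain Ruby that restates the loop above without loading BioRuby; the method name gcg_checksum is hypothetical.

```ruby
# Standalone restatement of the checksum loop shown above.
# gcg_checksum is a hypothetical name used only for this sketch.
def gcg_checksum(str)
  idx = 0
  sum = 0
  # Keep only letters, '.' and '~'; case is ignored because of upcase.
  str.upcase.tr('^A-Z.~', '').each_byte do |byte|
    idx += 1
    sum += idx * byte      # weight each byte by its position, 1..57
    idx = 0 if idx >= 57   # the weights start over after 57 characters
  end
  sum % 10000
end

# 'ACGT' in ASCII is 65, 67, 71, 84:
#   65*1 + 67*2 + 71*3 + 84*4 = 748
puts gcg_checksum('ACGT')        # prints 748
puts gcg_checksum('ac gt 1234')  # prints 748 (blanks and digits are dropped)
```

Because digits and whitespace are removed before summing, the position numbers and block spacing of a formatted GCG body do not affect the result.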

.to_gcg(hash) ⇒ Object

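Creates new text in GCG sequence format from the given hash. Keys other than :seq can be omitted: as the source below shows, :seq_type defaults to 'P' (and is ignored when :seq is a Bio::Sequence::NA or Bio::Sequence::AA object), and :date defaults to the current time.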

Examples:

Bio::GCG::Seq.to_gcg(:definition=>'H.sapiens DNA',
                     :seq_type=>'N', :entry_id=>'gi-1234567',
                     :seq=>seq, :date=>date)


# File 'lib/bio/appl/gcg/seq.rb', line 161

def self.to_gcg(hash)
  seq = hash[:seq]
  if seq.is_a?(Bio::Sequence::NA) then
    seq_type = 'N'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq_type = 'P'
  else
    seq_type = (hash[:seq_type] or 'P')
  end
  if seq_type == 'N' then
    head = '!!NA_SEQUENCE 1.0'
  else
    head = '!!AA_SEQUENCE 1.0'
  end
  date = (hash[:date] or Time.now.strftime('%B %d, %Y %H:%M'))
  entry_id = hash[:entry_id].to_s.strip
  len = seq.length
  checksum = self.calc_checksum(seq)
  definition = hash[:definition].to_s.strip
  seq = seq.upcase.gsub(/.{1,50}/, "\\0\n")
  seq.gsub!(/.{10}/, "\\0 ")
  w = len.to_s.size + 1
  i = 1
  seq.gsub!(/^/) { |x| s = sprintf("\n%*d ", w, i); i += 50; s }

  [ head, "\n", definition, "\n\n",
    "#{entry_id}  Length: #{len}  #{date}  " \
    "Type: #{seq_type}  Check: #{checksum}  ..\n",
    seq, "\n" ].join('')
end
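The layout of the sequence body (50 residues per line, a space after each block of 10, and a right-aligned start position on each line) comes from the three gsub calls above. The sketch below isolates just that step in plain Ruby; format_gcg_body is a hypothetical helper, not a BioRuby method.

```ruby
# Hypothetical helper that reproduces only the body layout step of to_gcg.
def format_gcg_body(seq)
  len = seq.length
  body = seq.upcase.gsub(/.{1,50}/, "\\0\n")  # at most 50 residues per line
  body.gsub!(/.{10}/, "\\0 ")                  # a space after each full block of 10
  width = len.to_s.size + 1                    # column width for the position number
  pos = 1
  # Prefix every line with a blank line and its 1-based start position.
  body.gsub!(/^/) do
    prefix = format("\n%*d ", width, pos)
    pos += 50
    prefix
  end
  body
end

puts format_gcg_body('ACGTACGTAC' * 6)
```

For the 60-residue input, the first line starts at position 1 and the second at position 51; each full block of 10 is followed by a space, including the last block on a line.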

Instance Method Details

#aaseq ⇒ Object

If you know the sequence is AA, use this method. Returns a Bio::Sequence::AA object.

If you call naseq on a protein sequence, or aaseq on a nucleic acid sequence, a RuntimeError is raised.



# File 'lib/bio/appl/gcg/seq.rb', line 108

def aaseq
  if seq.is_a?(Bio::Sequence::AA) then
    @seq
  else
    raise 'seq_type != \'P\''
  end
end

#naseq ⇒ Object

If you know the sequence is NA, use this method. Returns a Bio::Sequence::NA object.

If you call naseq on a protein sequence, or aaseq on a nucleic acid sequence, a RuntimeError is raised.



# File 'lib/bio/appl/gcg/seq.rb', line 121

def naseq
  if seq.is_a?(Bio::Sequence::NA) then
    @seq
  else
    raise 'seq_type != \'N\''
  end
end

#seq ⇒ Object

Sequence data. The class of the sequence is Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence::Generic, according to the sequence type.



# File 'lib/bio/appl/gcg/seq.rb', line 88

def seq
  unless @seq then
    case @seq_type
    when 'N', 'n'
      k = Bio::Sequence::NA
    when 'P', 'p'
      k = Bio::Sequence::AA
    else
      k = Bio::Sequence
    end
    @seq = k.new(@data.tr('^-a-zA-Z.~', ''))
  end
  @seq
end
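Before the sequence object is built, the raw body is cleaned with a single tr call that deletes every character except letters, '-', '.' and '~'. The sketch below shows that step on its own in plain Ruby; the sample body and the name strip_gcg_layout are hypothetical.

```ruby
# Hypothetical raw body, as it might remain after the header has been removed.
RAW_BODY = "\n\n  1 acgtacgtac gt..~~ACGT \n 51 nn-nn\n"

# Hypothetical helper: same tr call as in #seq above. With an empty second
# argument, String#tr deletes the selected characters; the leading '^' negates
# the set, so everything except '-', a-z, A-Z, '.' and '~' is removed.
def strip_gcg_layout(data)
  data.tr('^-a-zA-Z.~', '')
end

puts strip_gcg_layout(RAW_BODY)  # prints acgtacgtacgt..~~ACGTnn-nn
```

Position numbers, blanks and newlines disappear, while gap symbols and the original letter case are preserved.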

#validate_checksum ⇒ Object

Validates the checksum. Returns true if validation succeeds, false otherwise.



# File 'lib/bio/appl/gcg/seq.rb', line 132

def validate_checksum
  checksum == self.class.calc_checksum(seq)
end