Class: Bio::NBRF

Inherits:
DB
Defined in:
lib/bio/db/nbrf.rb

Overview

Sequence data class for NBRF/PIR flatfile format.

Constant Summary

DELIMITER = RS = "\n>"

Delimiter of each entry. Bio::FlatFile uses it.

DELIMITER_OVERRUN = 1

(Integer) excess read size included in DELIMITER.

Methods inherited from DB

#exists?, #fetch, #get, open, #tags

Constructor Details

#initialize(str) ⇒ NBRF

Creates a new NBRF object. It stores the comment and sequence information from one entry of the NBRF/PIR format string. If the argument contains more than one entry, only the first entry is used.

# File 'lib/bio/db/nbrf.rb', line 45

def initialize(str)
  str = str.sub(/\A[\r\n]+/, '') # remove first void lines
  line1, line2, rest = str.split(/^/, 3)

  rest = rest.to_s
  rest.sub!(/^>.*/m, '') # remove trailing entries for sure
  @entry_overrun = $&
  rest.sub!(/\*\s*\z/, '') # remove last '*' and "\n"
  @data = rest

  @definition = line2.to_s.chomp
  if /^>?([A-Za-z0-9]{2})\;(.*)/ =~ line1.to_s then
    @seq_type = $1
    @entry_id = $2
  end
end
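
The splitting steps of the constructor can be followed on a plain String. The following is a minimal sketch, not BioRuby code: the helper name split_first_nbrf_entry and the sample entry are made up for illustration, and only the regular expressions are taken from the listing above.

```ruby
# Hypothetical helper, not part of BioRuby: mirrors the splitting steps of
# the constructor on a plain String and returns the pieces as a Hash.
def split_first_nbrf_entry(text)
  text = text.sub(/\A[\r\n]+/, '')          # drop leading blank lines
  line1, line2, rest = text.split(/^/, 3)   # header, description, remainder
  rest = rest.to_s
  overrun = rest[/^>.*/m]                   # start of the next entry, if any
  rest = rest.sub(/^>.*/m, '').sub(/\*\s*\z/, '')
  header = line1.to_s.match(/^>?([A-Za-z0-9]{2});(.*)/)
  {
    seq_type:   header && header[1],
    entry_id:   header && header[2],
    definition: line2.to_s.chomp,
    data:       rest,
    overrun:    overrun
  }
end

# Made-up two-entry input; only the first entry is parsed.
text = ">P1;EXAMPLE1\nHypothetical protein\nMKVLA AGIVG\nLLAQ*\n" \
       ">P1;EXAMPLE2\nSecond entry\nMK*\n"
first = split_first_nbrf_entry(text)
```

In this sketch the trailing "*" is stripped from the data, while whitespace inside the sequence is kept until the sequence is actually requested.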

Instance Attribute Details

#data ⇒ Object

Raw sequence data of the entry, as stored by the constructor (whitespace and digits are not yet removed).

# File 'lib/bio/db/nbrf.rb', line 77

def data
  @data
end

#definition ⇒ Object

Returns the description line of the NBRF/PIR formatted data.

# File 'lib/bio/db/nbrf.rb', line 74

def definition
  @definition
end

#entry_id ⇒ Object Also known as: accession

Returns the ID described in the entry.

# File 'lib/bio/db/nbrf.rb', line 70

def entry_id
  @entry_id
end

#entry_overrun ⇒ Object (readonly)

Piece of the next entry. Bio::FlatFile uses it.

# File 'lib/bio/db/nbrf.rb', line 80

def entry_overrun
  @entry_overrun
end

#seq_type ⇒ Object

Returns the sequence type described in the entry.

P1 (protein), F1 (protein fragment)
DL (DNA linear), DC (DNA circular)
RL (RNA linear), RC (RNA circular)
N3 (tRNA), N1 (other functional RNA)

# File 'lib/bio/db/nbrf.rb', line 67

def seq_type
  @seq_type
end

Class Method Details

.to_nbrf(hash) ⇒ Object

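Creates NBRF/PIR formatted text from a hash. Any parameter can be omitted. The keys read are :seq_type, :seq, :entry_id, :definition, and :width. If :seq_type is omitted, it is inferred from the class of :seq: P1 for Bio::Sequence::AA, RL or DL for Bio::Sequence::NA (RL if the sequence contains "u"), and XX otherwise. Sequence lines are wrapped at :width characters (default 70); passing nil or false for :width disables wrapping.

# File 'lib/bio/db/nbrf.rb', line 167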

def self.to_nbrf(hash)
  seq_type = hash[:seq_type]
  seq = hash[:seq]
  unless seq_type
    if seq.is_a?(Bio::Sequence::AA) then
      seq_type = 'P1'
    elsif seq.is_a?(Bio::Sequence::NA) then
      seq_type = /u/i =~ seq ? 'RL' : 'DL'
    else
      seq_type = 'XX'
    end
  end
  width = hash.has_key?(:width) ? hash[:width] : 70
  if width then
    seq = seq.to_s + "*"
    seq.gsub!(Regexp.new(".{1,#{width}}"), "\\0\n")
  else
    seq = seq.to_s + "*\n"
  end
  ">#{seq_type};#{hash[:entry_id]}\n#{hash[:definition]}\n#{seq}"
end
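
The line-wrapping step can be tried in isolation on a plain String. This is only a sketch under stated assumptions: wrap_nbrf_sequence is a hypothetical helper name, not a BioRuby method, and it reproduces just the "*" terminator and the width handling from the listing above.

```ruby
# Hypothetical helper, not part of BioRuby: appends the NBRF terminator "*"
# and wraps the result at `width` characters, as to_nbrf does.
def wrap_nbrf_sequence(seq, width = 70)
  body = seq.to_s + "*"
  return body + "\n" unless width          # nil or false: no wrapping
  body.gsub(/.{1,#{width}}/, "\\0\n")      # newline after each chunk
end

wrapped   = wrap_nbrf_sequence("ACDEFGHIKL", 4)
unwrapped = wrap_nbrf_sequence("ACDEFGHIKL", nil)
```

Because the terminator is appended before wrapping, a sequence whose length is an exact multiple of the width ends with a line that contains only "*".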

Instance Method Details

#aalen ⇒ Object

Returns the length of the protein (amino acid) sequence. If you call aalen for a nucleic acid sequence, a RuntimeError is raised. Use this method only if you know whether the sequence is NA or AA.

# File 'lib/bio/db/nbrf.rb', line 157

def aalen
  aaseq.length
end

#aaseq ⇒ Object

Returns the protein (amino acid) sequence. If you call aaseq for a nucleic acid sequence, a RuntimeError is raised. Use this method only if you know whether the sequence is NA or AA.

# File 'lib/bio/db/nbrf.rb', line 143

def aaseq
  if seq.is_a?(Bio::Sequence::NA) then
    raise 'not protein but nucleic sequence'
  elsif seq.is_a?(Bio::Sequence::AA) then
    seq
  else
    Bio::Sequence::AA.new(seq)
  end
end

#entry ⇒ Object Also known as: to_s

Returns the stored entry as NBRF/PIR formatted text (same as to_s).

# File 'lib/bio/db/nbrf.rb', line 84

def entry
  @entry = ">#{@seq_type or 'XX'};#{@entry_id}\n#{definition}\n#{@data}*\n"
end

#length ⇒ Object

Returns sequence length.

# File 'lib/bio/db/nbrf.rb', line 115

def length
  seq.length
end

#nalen ⇒ Object

Returns the length of the nucleic acid sequence. If you call nalen for a protein sequence, a RuntimeError is raised. Use this method only if you know whether the sequence is NA or AA.

# File 'lib/bio/db/nbrf.rb', line 135

def nalen
  naseq.length
end

#naseq ⇒ Object

Returns the nucleic acid sequence. If you call naseq for a protein sequence, a RuntimeError is raised. Use this method only if you know whether the sequence is NA or AA.

# File 'lib/bio/db/nbrf.rb', line 122

def naseq
  if seq.is_a?(Bio::Sequence::AA) then
    raise 'not nucleic but protein sequence'
  elsif seq.is_a?(Bio::Sequence::NA) then
    seq
  else
    Bio::Sequence::NA.new(seq)
  end
end
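
The guard in naseq follows a simple three-way pattern: reject the wrong type, pass the right type through, and convert anything else. A minimal sketch of that pattern follows; DemoAA, DemoNA, and demo_naseq are stand-ins invented for illustration, not the real Bio::Sequence classes.

```ruby
# Stand-in classes for illustration only; BioRuby's real classes are
# Bio::Sequence::AA and Bio::Sequence::NA.
class DemoAA < String; end
class DemoNA < String; end

# Hypothetical counterpart of naseq operating on the stand-in classes.
def demo_naseq(seq)
  case seq
  when DemoAA then raise 'not nucleic but protein sequence' # RuntimeError
  when DemoNA then seq                                       # already nucleic
  else DemoNA.new(seq)                                       # untyped: convert
  end
end

converted = demo_naseq("atgc")
```

A bare raise with a String message produces a RuntimeError, which is why the descriptions above name that exception class.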

#seq ⇒ Object

Returns sequence data. Returns Bio::Sequence::NA, Bio::Sequence::AA or Bio::Sequence, according to the sequence type.

# File 'lib/bio/db/nbrf.rb', line 107

def seq
  unless defined?(@seq)
    @seq = seq_class.new(@data.tr(" \t\r\n0-9", '')) # lazy clean up
  end
  @seq
end

#seq_class ⇒ Object

Returns Bio::Sequence::AA, Bio::Sequence::NA, or Bio::Sequence, depending on sequence type.

# File 'lib/bio/db/nbrf.rb', line 91

def seq_class
  case @seq_type
  when /[PF]1/
    # protein
    Sequence::AA
  when /[DR][LC]/, /N[13]/
    # nucleic
    Sequence::NA
  else
    Sequence
  end
end
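
The dispatch on the two-character type code can be checked without BioRuby installed. In this sketch the helper nbrf_kind is hypothetical and returns symbols instead of sequence classes; the patterns are copied from the listing above.

```ruby
# Hypothetical helper, not part of BioRuby: classifies an NBRF/PIR type code
# with the same patterns as seq_class, returning a Symbol instead of a class.
def nbrf_kind(seq_type)
  case seq_type
  when /[PF]1/             then :protein
  when /[DR][LC]/, /N[13]/ then :nucleic
  else                          :unknown   # includes nil and 'XX'
  end
end

kinds = %w[P1 F1 DL DC RL RC N3 N1 XX].map { |code| [code, nbrf_kind(code)] }.to_h
```

The patterns are not anchored, so a longer string that merely contains one of the codes would also match; the constructor limits the stored type to two characters, which keeps this from mattering in practice.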