Class: Bio::HMMER::Model

Inherits:
Object
  • Object
show all
Defined in:
lib/bio/appl/hmmer/model.rb

Overview

Example HMM model file, taken from PFAM 26.

HMMER3/b [3.0 | March 2010] NAME 1-cysPrx_C ACC PF10417.4 DESC C-terminal domain of 1-Cys peroxiredoxin LENG 40 ALPH amino RF no CS yes MAP yes DATE Mon Sep 26 01:36:52 2011 NSEQ 86 EFFN 26.221497 CKSUM 2893062708 GA 20.40 20.40; TC 20.40 20.50; NC 20.30 20.30; BM hmmbuild HMM.ann SEED.ann SM hmmsearch -Z 15929002 -E 1000 –cpu 4 HMM pfamseq STATS LOCAL MSV -7.4458 0.71948 STATS LOCAL VITERBI -7.8857 0.71948 STATS LOCAL FORWARD -4.3017 0.71948 HMM A C D E F G H I K L M N P Q R S T V W Y

          m->m     m->i     m->d     i->m     i->i     d->m     d->d
COMPO   2.25274  4.33630  2.74834  2.65826  3.87771  2.70273  3.95751  3.25125  2.56848  2.82101  4.06536  3.21194  2.50202  2.97228  3.39798  2.99665  2.70159  2.62185  3.52465  3.78187
        2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503
        0.00153  6.88065  7.60300  0.61958  0.77255  0.00000        *
    1   0.35002  6.53529  7.25102  7.30399  7.52698  2.51105  7.66616  7.05965  7.14684  6.75377  3.79371  6.32251  6.23186  7.14685  7.02553  1.76469  5.22656  6.02391  8.86510  7.90937      1 - H
        2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503
        0.00153  6.88065  7.60300  0.61958  0.77255  0.48576  0.95510
    2   3.84702  5.95842  6.58626  5.97181  2.06407  5.79758  6.11757  2.43011  5.76033  0.58540  3.17079  5.96610  6.11889  5.84497  5.74893  5.11667  4.83456  2.97676  3.47925  3.13829      2 - H
        2.68618  4.42225  2.77519  2.73123  3.46354  2.40513  3.72494  3.29354  2.67741  2.69355  4.24690  2.90347  2.73739  3.18146  2.89801  2.37887  2.77519  2.98518  4.58477  3.61503
        0.00153  6.88065  7.60300  0.61958  0.77255  0.48576  0.95510

Defined Under Namespace

Classes: ParseException

Constant Summary collapse

TAGS =
%w(NAME ACC DESC LENG ALPH RF CS MAP DATE NSEQ EFFN CKSUM GA TC NC BM SM STATS_LOCAL_MSV)

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Instance Attribute Details

#alphabetObject

Letters used in the model (amino acids). The order of this array is the same as the order



50
51
52
# File 'lib/bio/appl/hmmer/model.rb', line 50

def alphabet
  @alphabet
end

#insert_emissionsObject

Probabilities for inserts in the model (second line of each position in the main model part of the HMMER3/b file format)



67
68
69
# File 'lib/bio/appl/hmmer/model.rb', line 67

def insert_emissions
  @insert_emissions
end

#match_emissionsObject

Probabilities for matches in the model (first line of each position in the main model part of the HMMER3/b file format) You probably want to use #match_probability not read this variable directly.

For speed reasons, you may want to use this data directly, however.

#match_emissions is an Array of Array objects: model#[amino_acid_index] => Probability of match

amino_acid_index is an index of #alphabet



64
65
66
# File 'lib/bio/appl/hmmer/model.rb', line 64

def match_emissions
  @match_emissions
end

#state_transitionsObject

Probabilities for state transitions in the model (third line of each position in the main model part of the HMMER3/b file format)



70
71
72
# File 'lib/bio/appl/hmmer/model.rb', line 70

def state_transitions
  @state_transitions
end

#statsObject

An array of 4 element arrays from STATS lines (e.g. STATS LOCAL FORWARD -4.3017 0.71948) => [‘LOCAL’,‘FORWARD’,-4.3017,0.71948]



47
48
49
# File 'lib/bio/appl/hmmer/model.rb', line 47

def stats
  @stats
end

#transitionsObject

“m->m”, “m->i”, “m->d”, “i->m”, “i->i”, “d->m”, “d->d”


53
54
55
# File 'lib/bio/appl/hmmer/model.rb', line 53

def transitions
  @transitions
end

Class Method Details

.models(multiple_model_text) ⇒ Object

A reader interface for multiple HMMs into individual model objects

Examples

# Iterator
Bio::HMMER::Model.models(big_hmm_text) do |mode|
  model
end

# Array
reports = Bio::HMMER::Model.models(reports_text)


84
85
86
87
88
89
90
91
92
93
94
# File 'lib/bio/appl/hmmer/model.rb', line 84

def self.models(multiple_model_text)
  ary = []
  multiple_model_text.each_line("\n//\n") do |report|
    if block_given?
      yield Bio::HMMER::Model.parse(report)
    else
      ary << Bio::HMMER::Model.parse(report)
    end
  end
  return ary
end

.parse(io) ⇒ Object

Parse a HMM file fed in through some IO, and return an instantiated Bio::HMMER::Model with all the informations filled in

e.g. PF10417.4 has length 40 (as of PFAM v26):

Bio::Hmmer::Model.parse(File.open('test/data/PF10417.4.hmm')).leng => 40


112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
# File 'lib/bio/appl/hmmer/model.rb', line 112

def self.parse(io)
  model = Bio::HMMER::Model.new

  lines = nil
  if io.kind_of?(String)
    lines = io.split("\n")
  else
    lines = io.read.split("\n")
  end

  unless lines[0].match(/^HMMER3\/b/)
    raise ParseException, "Only HMMER3/b files are currently supported, sorry. The first line of the HMMER model file causing the problem is #{lines[0]}"
  end
  lineno = 1

  # Majority of tags at the top of the model
  while TAGS.include?(lines[lineno].split(/ +/)[0])
    splits = lines[lineno].strip.split(/ +/)
    lit = splits[0].downcase
    tag = TAGS.select{|t| t==lit}[0]
    value = splits[1..(splits.length-1)].join(' ')

    # cast appropriately
    value = value.to_i if %w(leng nseq cksum).include?(lit)
    value = value.to_f if %w(effn).include?(lit)
    value = [splits[1].to_f, splits[2].to_f] if %w(ga tc nc).include?(lit)

    model.send("#{lit}=".to_sym,value)
    lineno += 1
  end

  # STATS_LOCAL_MSV STATS_LOCAL_VITERBI STATS_LOCAL_FORWARD
  while matches = lines[lineno].match(/^STATS ([^ ]+) ([^ ]+) +([\-\d\.]+) +([\-\d\.]+)$/)
    model.stats ||= []
    model.stats.push [matches[1], matches[2], matches[3].to_f, matches[4].to_f]
    lineno += 1
  end

  # Next expect HMM   A C D ...
  splits = lines[lineno].strip.split(/ +/)
  unless splits[0]=='HMM'
    raise ParseException, "Unexpected HMM file line encountered, expected the beginning of the main model section e.g. HMM          A        C        D        E ...: #{lines[lineno]}"
  end
  model.alphabet = splits[1..(splits.length-1)]
  lineno += 1

  # m->m     m->i     m->d     i->m     i->i     d->m     d->d
  splits = lines[lineno].strip.split(/ +/)
  unless splits.length == 7
    raise ParseException, "Unexpected line in HMM file, expected '            m->m     m->i     m->d     i->m     i->i     d->m     d->d', found #{lines[lineno]}"
  end
  model.transitions = splits
  lineno += 1

  #skip the next lines since my application does not require these
  splits = lines[lineno].strip.split(/ +/)
  lineno += 1 if splits[0]=='COMPO'
  lineno += 2

  model.match_emissions = []
  model.insert_emissions = []
  model.state_transitions = []

  # Iterate through the model, one line at a time
  model_iterate = 0
  while !(lines[lineno].match(/^\/\//))
    splits = lines[lineno].strip.split(/ +/)

    unless splits.length > model.alphabet.length
      raise ParseException, "Unexpected line format encountered in the main model: #{lines[lineno]}"
    end

    # Check that the state numbers are in order
    model_iterate += 1
    unless splits[0].to_i == model_iterate
      raise ParseException, "Unexpected model state number: #{splits[0]}"
    end
    probabilities = splits[1..model.alphabet.length].collect do |s|
    # From the HMMER manual:
    # All probability parameters are all stored as negative natural log probabilities with
    # five digits of precision to the right of the decimal point, rounded. For example,
    # a probability of 0.25 is stored as − log 0.25 = 1.38629.
      Math.exp(-(s.to_f))
    end
    model.match_emissions.push probabilities
    lineno += 1

    splits = lines[lineno].strip.split(/ +/)
    unless splits.length == model.alphabet.length
      raise ParseException, "Unexpected line format encountered in the main model: #{lines[lineno]}"
    end
    model.insert_emissions.push splits.collect{|s| 10.0**-(s.to_f)}
    lineno += 1

    splits = lines[lineno].strip.split(/ +/)
    unless splits.length == model.transitions.length
      raise ParseException, "Unexpected line format encountered in the main model: #{lines[lineno]}"
    end
    model.state_transitions.push splits.collect{|s| 10.0**-(s.to_f)}
    lineno += 1
  end

  return model
end

Instance Method Details

#match_probability(hmm_position, letter) ⇒ Object

Return the probability given in the HMM of the match probability of a particular position.

Note that the hmm_position is 1-based - the first position is 1, not 0

Assumes that the letter supplied is in the HMM’s alphabet



101
102
103
104
# File 'lib/bio/appl/hmmer/model.rb', line 101

def match_probability(hmm_position, letter)
  index = @alphabet.index(letter)
  match_emissions[hmm_position-1][index]
end