Class: Bio::PAML::Codeml::Report

Inherits:
Bio::PAML::Common::Report show all
Defined in:
lib/bio/appl/paml/codeml/report.rb

Overview

Description

Run PAML codeml and get the results from the output file. The Codeml::Report object is returned by Bio::PAML::Codeml.query. For example

codeml = Bio::PAML::Codeml.new('codeml', :runmode => 0, 
    :RateAncestor => 1, :alpha => 0.5, :fix_alpha => 0)
result = codeml.query(alignment, tree)

where alignment and tree are Bioruby objects. This class assumes we have a buffer containing the output of codeml.

References

Phylogenetic Analysis by Maximum Likelihood (PAML) is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. Suggestion citation

Yang, Z. 1997
PAML: a program package for phylogenetic analysis by maximum likelihood 
CABIOS 13:555-556

abacus.gene.ucl.ac.uk/software/paml.html

Examples

– The following is not shown in the documentation

>> require 'bio'
>> require 'bio/test/biotestfile'
>> buf = BioTestFile.read('paml/codeml/models/results0-3.txt')

++

Invoke Bioruby's PAML codeml parser, after having read the contents of the codeml result file into buf (for example using File.read)

>> c = Bio::PAML::Codeml::Report.new(buf)

Do we have two models?

>> c.models.size
=> 2
>> c.models[0].name
=> "M0"
>> c.models[1].name
=> "M3"

Check the general information

>> c.num_sequences
=> 6
>> c.num_codons
=> 134
>> c.descr
=> "M0-3"

Test whether the second model M3 is significant over M0

>> c.significant
=> true

Now fetch the results of the first model M0, and check its values

>> m0 = c.models[0]
>> m0.tree_length
=> 1.90227
>> m0.lnL
=> -1125.800375
>> m0.omega
=> 0.58589
>> m0.dN_dS
=> 0.58589
>> m0.kappa
=> 2.14311
>> m0.alpha
=> nil

We also have a tree (as a string)

>> m0.tree
=> "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.400074): 0.000004, PITG_23257T0: 0.952614): 0.000004, PITG_23264T0: 0.445507): 0.000004, PITG_23267T0: 0.011814, PITG_23293T0: 0.092242);"

Check the M3 and its specific values

>> m3 = c.models[1]
>> m3.lnL
=> -1070.964046
>> m3.classes.size
=> 3
>> m3.classes[0]
=> {:w=>0.00928, :p=>0.56413}

And the tree

>> m3.tree
=> "((((PITG_23265T0: 0.000004, PITG_23253T0: 0.762597): 0.000004, PITG_23257T0: 2.721710): 0.000004, PITG_23264T0: 0.924326): 0.014562, PITG_23267T0: 0.000004, PITG_23293T0: 0.237433);"

Next take the overall posterior analysis

>> c.nb_sites.size
=> 44
>> c.nb_sites[0].to_a
=> [17, "I", 0.988, 3.293]

or by field

>> codon = c.nb_sites[0]
>> codon.position
=> 17
>> codon.probability
=> 0.988
>> codon.dN_dS
=> 3.293

with aliases

>> codon.p
=> 0.988
>> codon.w
=> 3.293

Now we generate special string 'graph' for positive selection. The following returns a string the length of the input alignment and shows the locations of positive selection:

>> c.nb_sites.graph[0..32]
=> "                **    *       * *"

And with dN/dS (high values are still an asterisk *)

>> c.nb_sites.graph_omega[0..32]
=> "                3*    6       6 2"

We also provide the raw buffers to adhere to the principle of unexpected use. Test the raw buffers for content:

>> c.header.to_s =~ /seed/
=> 1
>> m0.to_s =~ /one-ratio/
=> 3
>> m3.to_s =~ /discrete/
=> 3
>> c.footer.to_s =~ /Bayes/
=> 16

Finally we do a test on an M7+M8 run. Again, after loading the results file into buf

>> buf78 = BioTestFile.read('paml/codeml/models/results7-8.txt')

++

Invoke Bioruby's PAML codeml parser

>> c = Bio::PAML::Codeml::Report.new(buf78)

Do we have two models?

>> c.models.size
=> 2
>> c.models[0].name
=> "M7"
>> c.models[1].name
=> "M8"

Assert the results are significant

>> c.significant
=> true

Compared to M0/M3 there are some differences. The important ones are the parameters and the full Bayesian result available for M7/M8. This is the naive Bayesian result:

>> c.nb_sites.size
=> 10

And this is the full Bayesian result:

>> c.sites.size
=> 30
>> c.sites[0].to_a
=> [17, "I", 0.672, 2.847]
>> c.sites.graph[0..32]
=> "                **    *       * *"

Note the differences of omega with earlier M0-M3 naive Bayesian analysis:

>> c.sites.graph_omega[0..32]
=> "                24    3       3 2"

The locations are the same, but the omega differs.

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(buf) ⇒ Report

Parse codeml output file passed with buf, where buf contains the content of a codeml result file

Raises:



222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
# File 'lib/bio/appl/paml/codeml/report.rb', line 222

def initialize buf
  # split the main buffer into sections for each model, header and footer.
  sections = buf.split("\nModel ")
  model_num = sections.size-1
  raise ReportError,"Incorrect codeml data models=#{model_num}" if model_num > 2
  foot2 = sections[model_num].split("\nNaive ")
  if foot2.size == 2
    # We have a dual model
    sections[model_num] = foot2[0]
    @footer = 'Naive '+foot2[1]
    @models = []
    sections[1..-1].each do | model_buf |
      @models.push Model.new(model_buf)
    end
  else
    # A single model is run
    sections = buf.split("\nTREE #")
    model_num = sections.size-1
    raise ReportError,"Can not parse single model file" if model_num != 1
    @models = []
    @models.push sections[1]
    @footer = sections[1][/Time used/,1]
    @single = ReportSingle.new(buf)
  end
  @header = sections[0]
end

Instance Attribute Details

Returns the value of attribute footer



218
219
220
# File 'lib/bio/appl/paml/codeml/report.rb', line 218

def footer
  @footer
end

#headerObject (readonly)

Returns the value of attribute header



218
219
220
# File 'lib/bio/appl/paml/codeml/report.rb', line 218

def header
  @header
end

#modelsObject (readonly)

Returns the value of attribute models



218
219
220
# File 'lib/bio/appl/paml/codeml/report.rb', line 218

def models
  @models
end

Instance Method Details

#alphaObject

compatibility call for older interface (single models only)



325
326
327
# File 'lib/bio/appl/paml/codeml/report.rb', line 325

def alpha
  @single.alpha
end

#descrObject

Give a short description of the models, for example 'M0-3'



250
251
252
253
254
255
256
257
258
259
260
# File 'lib/bio/appl/paml/codeml/report.rb', line 250

def descr
  num = @models.size
  case num
    when 0 
      'No model'
    when 1 
      @models[0].name
    else 
      @models[0].name + '-' + @models[1].modelnum.to_s
  end
end

#nb_sitesObject

Return a PositiveSites (naive empirical bayesian) object



273
274
275
# File 'lib/bio/appl/paml/codeml/report.rb', line 273

def nb_sites
  PositiveSites.new("Naive Empirical Bayes (NEB)",@footer,num_codons)
end

#num_codonsObject

Return the number of condons in the codeml alignment



263
264
265
# File 'lib/bio/appl/paml/codeml/report.rb', line 263

def num_codons
  @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[5].to_i/3
end

#num_sequencesObject

Return the number of sequences in the codeml alignment



268
269
270
# File 'lib/bio/appl/paml/codeml/report.rb', line 268

def num_sequences
  @header.scan(/seed used = \d+\n\s+\d+\s+\d+/).to_s.split[4].to_i
end

#significantObject

If the number of models is two we can calculate whether the result is statistically significant, or not, at the 1% significance level. For example, for M7-8 the LRT statistic, or twice the log likelihood difference between the two compared models, may be compared against chi-square, with critical value 9.21 at the 1% significance level.

Here we support a few likely combinations, M0-3, M1-2 and M7-8, used most often in literature. For other combinations, or a different significance level, you'll have to calculate chi-square yourself.

Returns true or false. If no result is calculated this method raises an error

Raises:



294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
# File 'lib/bio/appl/paml/codeml/report.rb', line 294

def significant
  raise ReportError,"Wrong number of models #{@models.size}" if @models.size != 2
  lnL1 = @models[0].lnL
  model1 = @models[0].modelnum
  lnL2 = @models[1].lnL
  model2 = @models[1].modelnum
  case [model1, model2]
    when [0,3]
      2*(lnL2-lnL1) > 13.2767   # chi2: p=0.01, df=4
    when [1,2]
      2*(lnL2-lnL1) >  9.2103   # chi2: p=0.01, df=2
    when [7,8]
      2*(lnL2-lnL1) >  9.2103   # chi2: p=0.01, df=2
    else
      raise ReportError,"Significance calculation for #{descr} not supported"
  end
end

#sitesObject

Return a PositiveSites Bayes Empirical Bayes (BEB) analysis



278
279
280
# File 'lib/bio/appl/paml/codeml/report.rb', line 278

def sites
  PositiveSites.new("Bayes Empirical Bayes (BEB)",@footer,num_codons)
end

#treeObject

compatibility call for older interface (single models only)



330
331
332
# File 'lib/bio/appl/paml/codeml/report.rb', line 330

def tree
  @single.tree
end

#tree_lengthObject

compatibility call for older interface (single models only)



320
321
322
# File 'lib/bio/appl/paml/codeml/report.rb', line 320

def tree_length
  @single.tree_length
end

#tree_log_likelihoodObject

compatibility call for older interface (single models only)



315
316
317
# File 'lib/bio/appl/paml/codeml/report.rb', line 315

def tree_log_likelihood
  @single.tree_log_likelihood
end