Class: Bio::Go

Inherits:
Object
Defined in:
lib/go.rb

Defined Under Namespace

Classes: SubsumeTester

Instance Method Summary

Constructor Details

#initialize ⇒ Go

Returns a new instance of Go.



# File 'lib/go.rb', line 7

def initialize
  @r = RSRuby.instance
  @r.library('GO.db')
end

Instance Method Details

#ancestors_cc(primary_go_id) ⇒ Object

Return an array of GO ids corresponding to the ancestor terms of the given GO term in the cellular component ontology. This is not as efficient as it could be, because it probably fetches the parents of a single id multiple times.



# File 'lib/go.rb', line 151

def ancestors_cc(primary_go_id)
  go_get(primary_go_id, 'GOCCANCESTOR')
end
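
The repeated-lookup cost mentioned above could be reduced by caching results per id. The following is only a sketch under assumptions: `MemoizedLookup` is a hypothetical helper that is not part of Bio::Go, and the block passed to it stands in for a call such as `ancestors_cc`.

```ruby
# Hypothetical helper (not part of Bio::Go): remembers the result of a
# lookup so that each id is only fetched once.
class MemoizedLookup
  attr_reader :misses

  # The block is the expensive lookup, e.g. a call into R.
  def initialize(&lookup)
    @lookup = lookup
    @cache  = {}
    @misses = 0
  end

  # Return the cached result for id, running the lookup on first use only.
  def [](id)
    @cache.fetch(id) do
      @misses += 1
      @cache[id] = @lookup.call(id)
    end
  end
end

# Stand-in lookup that fabricates a parent name; a real caller would
# delegate to something like go.ancestors_cc(id) instead.
ancestors = MemoizedLookup.new { |id| ["#{id}-parent"] }
ancestors['GO:0000001']
ancestors['GO:0000001'] # served from the cache, no second lookup
```

The cache is only safe as long as the underlying GO.db data does not change during the lifetime of the object.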

#biological_process_offspring(go_term) ⇒ Object

Return an array of GO identifiers that are the offspring (all the descendants) of the given GO term, assuming it is a biological process GO term.



# File 'lib/go.rb', line 46

def biological_process_offspring(go_term)
  go_get(go_term, 'GOBPOFFSPRING')
end

#cc_pdb_to_go(pdb_id) ⇒ Object

Retrieve the cellular component GO annotations associated with a PDB id, using Bio::Fetch to query PDB and UniProtKB at EBI.



# File 'lib/go.rb', line 97

def cc_pdb_to_go(pdb_id)
  # retrieve the pdb file from EBI, to extract the UniprotKB Identifiers
  pdb = Bio::Fetch.new('http://www.ebi.ac.uk/cgi-bin/dbfetch').fetch('pdb', pdb_id)
  
  # parse the PDB and return the uniprot accessions (there may be >1 because of chains)
  uniprots = Bio::PDB.new(pdb).dbref.select{|s| s.database=='UNP'}.collect{|s| s.dbAccession}
  
  gos = []
  uniprots.uniq.each do |uniprot|
    u = Bio::Fetch.new('http://www.ebi.ac.uk/cgi-bin/dbfetch').fetch('uniprot', uniprot)
    
    unp = Bio::SPTR.new(u)
    
    gos.push unp.dr('GO').select{|a|
      a['Version'].match(/^C\:/)
    }.collect{ |g|
      g['Accession']
    }
  end
  
  return gos.flatten.uniq
end
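
The filtering step at the end of this method can be tried without network access. The sketch below assumes cross-reference records shaped like the hashes read above (an 'Accession' key and a 'Version' key holding an aspect-prefixed label); `SAMPLE_DR` and `accessions_for_aspect` are hypothetical names introduced for illustration.

```ruby
# Toy cross-reference records, shaped like the hashes the method above
# reads. They are hard-coded here instead of being fetched from EBI.
SAMPLE_DR = [
  { 'Accession' => 'GO:0005634', 'Version' => 'C:nucleus' },
  { 'Accession' => 'GO:0003677', 'Version' => 'F:DNA binding' },
  { 'Accession' => 'GO:0005737', 'Version' => 'C:cytoplasm' },
  { 'Accession' => 'GO:0005634', 'Version' => 'C:nucleus' }
].freeze

# Keep only the records whose 'Version' field starts with the given
# single-letter aspect prefix, and return their unique accessions.
def accessions_for_aspect(records, aspect = 'C')
  prefix = "#{aspect}:"
  records.select { |r| r['Version'].to_s.start_with?(prefix) }
         .map { |r| r['Accession'] }
         .uniq
end

cc_ids = accessions_for_aspect(SAMPLE_DR)
puts cc_ids.inspect
```

Passing 'F' or 'P' as the second argument would select the other aspects in the same way, assuming they use the same prefix convention.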

#cellular_component_offspring(go_term) ⇒ Object

Return an array of GO identifiers that are the offspring (all the descendants) of the given GO term, assuming it is a cellular component GO term.



# File 'lib/go.rb', line 32

def cellular_component_offspring(go_term)
  go_get(go_term, 'GOCCOFFSPRING')
end

#cordial_cc(primary_go_id) ⇒ Object

Return an array of the ancestors of the GO term, together with the ancestors of any of the GO term’s offspring, in no particular order. This is useful for finding out whether a term has an annotation that does not overlap with a particular GO term. For instance, ‘membrane’ is cordial with ‘nucleus’, since both are ancestors of ‘nuclear membrane’. However, ‘mitochondrion’ and ‘nucleus’ are not cordial, since they share no common offspring.



# File 'lib/go.rb', line 162

def cordial_cc(primary_go_id)
  # cordial can be direct ancestors of a term - then the common term
  # is this term itself
  cordial_ids = ancestors_cc(primary_go_id)
  
  # collect all ancestors of all offspring
  offspring = cellular_component_offspring(primary_go_id)
  offspring.each do |o|
    cordial_ids.push ancestors_cc(o)
    cordial_ids.push o
  end
  
  # remove the term itself and any children - they are not
  # merely cordial
  cordial_ids = cordial_ids.flatten.uniq.reject do |i|
    offspring.include?(i) or primary_go_id==i
  end
  
  # return a uniq array of cordial terms
  cordial_ids
end
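
The same steps can be followed on a small in-memory ontology. This is a minimal sketch, assuming a made-up five-term hierarchy in place of GO.db; `PARENTS`, `toy_ancestors`, `toy_offspring` and `toy_cordial` are hypothetical names used only for illustration.

```ruby
# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
  'cell'             => [],
  'membrane'         => ['cell'],
  'nucleus'          => ['cell'],
  'mitochondrion'    => ['cell'],
  'nuclear membrane' => ['membrane', 'nucleus']
}.freeze

# All ancestors of a term: its parents, their parents, and so on.
def toy_ancestors(term)
  PARENTS.fetch(term).flat_map { |p| [p] + toy_ancestors(p) }.uniq
end

# All offspring of a term: every term that has it as an ancestor.
def toy_offspring(term)
  PARENTS.keys.select { |t| toy_ancestors(t).include?(term) }
end

# Mirrors the steps of cordial_cc on the toy data: collect the term's
# ancestors plus each offspring and its ancestors, then drop the term
# itself and its offspring.
def toy_cordial(term)
  offspring = toy_offspring(term)
  ids = toy_ancestors(term)
  offspring.each { |o| ids.concat(toy_ancestors(o)) << o }
  ids.uniq.reject { |i| offspring.include?(i) || i == term }
end

puts toy_cordial('nucleus').inspect
```

In this toy hierarchy ‘membrane’ appears in the result for ‘nucleus’ because both are ancestors of ‘nuclear membrane’, while ‘mitochondrion’ does not.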

#go_get(go_term, partition) ⇒ Object

Generic method for retrieving the entry for a GO term from one of the GO.db mappings, e.g. go_get(‘GO:0042717’, ‘GOCCCHILDREN’)



# File 'lib/go.rb', line 52

def go_get(go_term, partition)
  answers = @r.eval_R("get('#{go_term}', #{partition})")
  # An integer rather than an array comes back, for some reason, when there
  # are no children. Bignum was removed in Ruby 3.2, so test for Integer.
  return [] if answers.kind_of?(Integer)
  return answers
end

#go_offspring(go_id) ⇒ Object

Return an array of GO identifiers that are the offspring (all the descendants) of the given GO term from any ontology (cellular component, biological process or molecular function).



# File 'lib/go.rb', line 15

def go_offspring(go_id)
  o = ontology_abbreviation(go_id)
  case o
  when 'MF'
    return molecular_function_offspring(go_id)
  when 'CC'
    return cellular_component_offspring(go_id)
  when 'BP'
    return biological_process_offspring(go_id)
  else
    raise Exception, "Unknown ontology abbreviation found: #{o} for go id: #{go_id}"
  end
end

#molecular_function_offspring(go_term) ⇒ Object

Return an array of GO identifiers that are the offspring (all the descendants) of the given GO term, assuming it is a molecular function GO term.



# File 'lib/go.rb', line 39

def molecular_function_offspring(go_term)
  go_get(go_term, 'GOMFOFFSPRING')
end

#ontology_abbreviation(go_id) ⇒ Object

Return ‘MF’, ‘CC’ or ‘BP’ corresponding to the ontology that the given GO id belongs to.



# File 'lib/go.rb', line 144

def ontology_abbreviation(go_id)
  @r.eval_R("Ontology(get('#{go_id}', GOTERM))")
end

#primary_go_id(go_id_or_synonym_id) ⇒ Object

Given a GO ID that may be a secondary ID or synonym, such as GO:0048253, return the corresponding primary GO ID (GO:0050333), so that the offspring methods can be used properly.



# File 'lib/go.rb', line 60

def primary_go_id(go_id_or_synonym_id)
  # > get('GO:0048253', GOSYNONYM)
  #GOID: GO:0050333
  #Term: thiamin-triphosphatase activity
  #Ontology: MF
  #Definition: Catalysis of the reaction: thiamin triphosphate + H2O =
  #    thiamin diphosphate + phosphate.
  #Synonym: thiamine-triphosphatase activity
  #Synonym: thiamine-triphosphate phosphohydrolase activity
  #Synonym: ThTPase activity
  #Synonym: GO:0048253
  #Secondary: GO:0048253
  
  # A performance note:
  # According to some tests that I ran, finding GOID by searching GOTERM
  # is much faster than by GOSYNONYM.
  
  begin
    # Assume it is a primary ID, as it likely will be most of the time.
    return @r.eval_R("GOID(get('#{go_id_or_synonym_id}', GOTERM))")
  rescue RException
    # If no primary ID is found, try finding it by synonym. Raise RException if none is found.
    begin
      return @r.eval_R("GOID(get('#{go_id_or_synonym_id}', GOSYNONYM))")
    rescue RException => e
      raise RException, "#{e.message}: GO Identifier '#{go_id_or_synonym_id}' does not appear to be a primary ID nor synonym. Is the GO.db database up to date?"
    end  
  end
end
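
The lookup order used here (primary table first, synonym table as a fallback, error if both miss) can be illustrated without R. This is a sketch under assumptions: the two hashes are hypothetical stand-ins for the GOTERM and GOSYNONYM mappings, and `toy_primary_id` is not part of Bio::Go.

```ruby
# Hypothetical in-memory stand-ins for the GOTERM and GOSYNONYM mappings.
# Both map an id to the primary id of the term it refers to.
TOY_GOTERM    = { 'GO:0050333' => 'GO:0050333' }.freeze
TOY_GOSYNONYM = { 'GO:0048253' => 'GO:0050333' }.freeze

# Try the primary table first, since most ids are expected to be primary,
# then fall back to the synonym table, and raise if neither knows the id.
def toy_primary_id(id)
  TOY_GOTERM.fetch(id) do
    TOY_GOSYNONYM.fetch(id) do
      raise KeyError, "'#{id}' is neither a primary ID nor a synonym"
    end
  end
end

puts toy_primary_id('GO:0048253')
```

Checking the more common case first keeps the usual path to a single lookup, which matches the performance note in the method's comments.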

#subsume?(subsumer_go_id, subsumee_go_id) ⇒ Boolean

Does the subsumer subsume the subsumee? That is, is the subsumee the same term as the subsumer, or one of its descendants in the GO tree?

To test repeatedly whether one GO term subsumes others, it may be faster to use subsume_tester.

Returns:

  • (Boolean)


# File 'lib/go.rb', line 125

def subsume?(subsumer_go_id, subsumee_go_id)
  # map both ids to their primary (non-synonym) ids
  primaree = self.primary_go_id(subsumee_go_id)
  primarer = self.primary_go_id(subsumer_go_id)
  
  # return if they are the same - the obvious case
  return true if primaree == primarer
  
  # return whether the subsumee is a descendant of the subsumer
  return go_offspring(primarer).include?(primaree)
end

#subsume_tester(subsumer_go_id, check_for_synonym = true) ⇒ Object

Return a subsume tester for a given GO term. Using the tester is faster than repeatedly calling subsume? because the list of children is cached.



# File 'lib/go.rb', line 139

def subsume_tester(subsumer_go_id, check_for_synonym=true)
  Go::SubsumeTester.new(self, subsumer_go_id, check_for_synonym)
end
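
The source of SubsumeTester is not shown on this page, so the following is only a sketch of the caching idea, not the actual class. `ToySubsumeTester` and its lookup argument are hypothetical; the lookup stands in for a call such as `go_offspring`.

```ruby
require 'set'

# Hypothetical tester (not the actual Bio::Go::SubsumeTester): fetches the
# offspring of the subsumer once and answers later queries from memory.
class ToySubsumeTester
  # offspring_lookup is any callable mapping a term to an array of offspring.
  def initialize(subsumer, offspring_lookup)
    @subsumer  = subsumer
    @offspring = Set.new(offspring_lookup.call(subsumer)) # fetched once
  end

  # True if the subsumee is the subsumer itself or one of its offspring.
  def subsume?(subsumee)
    subsumee == @subsumer || @offspring.include?(subsumee)
  end
end

# Stand-in lookup over made-up data; a real caller would query GO.db.
lookup = lambda do |term|
  { 'nucleus' => ['nuclear membrane', 'nucleolus'] }.fetch(term, [])
end

tester = ToySubsumeTester.new('nucleus', lookup)
puts tester.subsume?('nucleolus')
puts tester.subsume?('mitochondrion')
```

Storing the offspring in a Set makes each query a constant-time membership test, which is where the benefit over repeated subsume? calls would come from.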

#term(go_id) ⇒ Object

Retrieve the string description (term name) of the given GO identifier.



# File 'lib/go.rb', line 91

def term(go_id)
  @r.eval_R("Term(get('#{go_id}', GOTERM))")
end