Module: OboParser::Utilities
- Defined in:
- lib/utilities.rb
Constant Summary
- HOMOLONTO_HEADER =
Two column translation tools
%{
format-version: 1.2
auto-generated-by: obo_parser
default-namespace: fix_me

[Typedef]
id: OGEE:has_member
name: has_member
is_a: OBO_REL:relationship
def: "C has_member C', C is an homology group and C' is a biological object" []
comment: "We leave open the possibility that an homology group is a biological object. Thus, an homology group C may have C' has_member, with C' being an homology group."
is_transitive: true
is_anti_symmetric: true
}
Class Method Summary
-
.column_translate(options = {}) ⇒ String
Takes a two column input file, references it to two ontologies, and provides a report.
-
.cytoscapify(options = {}) ⇒ Object
Takes a Hash of OBO ontology files and an Array of relationships, and writes two input files (a network and node properties) for Cytoscape.
-
.dump_comparison_by_id(cutoff = 0, files = []) ⇒ String
Summarizes the labels used for each id in a two column, tab-delimited format. Providing a cutoff reports only those ids with more than that many distinct labels (a cutoff of 1 reports ids with more than one label). Does not (yet) include synonyms, though this could easily be added.
-
.hashify_pairs(options = {}) ⇒ Hash
Takes a two column input file, references it to two ontologies, and returns a hash.
-
.homolonto_stanza(id, name, *members) ⇒ String
Returns a HomolOnto Stanza.
-
.parents(options = {}) ⇒ Hash
Takes a two column input file, references it to two ontologies, and returns a report that identifies data pairs that have parents who are also a data pair given a provided property/relation type.
-
.shared_labels(files = []) ⇒ String
Returns all labels found in all passed ontologies.
-
.term_stanza_from_file(id, file) ⇒ String
Given a Term id and a String representing an OBO file returns that stanza.
Class Method Details
.column_translate(options = {}) ⇒ String
Takes a two column input file, references it to two ontologies, and provides a report.
Example use
file = File.read('HAO_TGMA_list.txt')
col1_obo = File.read('hao.obo')
col2_obo = File.read('tgma.obo')
OboParser::Utilities.column_translate(:data => file, :col1_obo => col1_obo, :col2_obo => col2_obo, :output => :homolonto)
Output types
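There are several output report types:
:xls - Translates the columns in the data file to the form given in :translate_to (:id or :label), matching the first column against col1_obo and the second against col2_obo, and places the matching term stanza beside each value. Writes an Excel file named after :output_filename (the default 'foo' gives foo.xls).
:homolonto - Prints a HomolOnto-compatible file to STDOUT. Forces :translate_to to :id.
:cols - Prints a two column, tab-delimited format to STDOUT. This is the default.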
# File 'lib/utilities.rb', line 126
def self.column_translate(options = {})
  opt = {
    :data => nil,
    :col1_obo => nil,
    :col2_obo => nil,
    :translate_to => :id,        # also :label
    :output => :cols,            # also :xls, :homolonto, :parent_match
    :parent_match_to => :is_a,   # only used when :output == :parent_match
    :output_filename => 'foo',
    :index_start => 0
  }.merge!(options)

  c1obo = parse_obo_file(opt[:col1_obo])
  c2obo = parse_obo_file(opt[:col2_obo])

  case opt[:output]
  when :xls
    Spreadsheet.client_encoding = 'UTF-8'
    book = Spreadsheet::Workbook.new
    sheet = book.create_worksheet
  when :homolonto
    s = HOMOLONTO_HEADER
    opt[:translate_to] = :id # force this in this mode
  end

  i = opt[:index_start]
  v1 = nil # a label like 'head'
  v2 = nil
  c1 = nil # an id 'FOO:123'
  c2 = nil

  opt[:data].split(/\n/).each do |row|
    i += 1
    c1, c2 = row.split(/\t/).map(&:strip)

    if c1.nil? || c2.nil?
      puts
      next
    end

    # the conversion
    if opt[:translate_to] == :id
      if c1 =~ /.*\:.*/ # it's an id, leave it
        v1 = c1
      else
        v1 = c1obo.term_hash[c1]
      end
      if c2 =~ /.*\:.*/
        v2 = c2
      else
        v2 = c2obo.term_hash[c2]
      end
    else
      if c1 =~ /.*\:.*/
        v1 = c1obo.id_hash[c1]
      else
        v1 = c1
      end
      if c2 =~ /.*\:.*/
        v2 = c2obo.id_hash[c2]
      else
        v2 = c2
      end
    end

    case opt[:output]
    when :cols
      puts "#{v1}\t#{v2}"
    when :xls
      sheet[i,0] = v1
      sheet[i,1] = OboParser::Utilities.term_stanza_from_file(v1, opt[:col1_obo])
      sheet[i,2] = v2
      sheet[i,3] = OboParser::Utilities.term_stanza_from_file(v2, opt[:col2_obo])
    when :homolonto
      s << OboParser::Utilities.homolonto_stanza(i, c1obo.id_hash[v1] , v1, v2) # "#{c1obo.id_hash[v1]} ! #{c2obo.id_hash[v2]}"
      s << "\n\n"
    end
  end

  case opt[:output]
  when :xls
    book.write "#{opt[:output_filename]}.xls"
  when :homolonto
    puts s + "\n"
  end

  true
end
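The conversion step in the listing above can be tried without any ontology files. The following is a minimal sketch, not the library's code: `to_id` and `translate_rows` are hypothetical helper names, plain Hashes stand in for the parsed ontologies' `term_hash` (label to id), and the ids are made up. It covers only the `:translate_to => :id` direction.

```ruby
# Translate one cell to an id. A value containing a colon is treated as an id
# and passed through; anything else is looked up as a label.
def to_id(cell, term_hash)
  cell.include?(':') ? cell : term_hash[cell]
end

# Split tab-delimited rows and translate both columns.
# Rows that do not have two columns are skipped.
def translate_rows(data, col1_terms, col2_terms)
  data.split(/\n/).map do |row|
    c1, c2 = row.split(/\t/).map(&:strip)
    next if c1.nil? || c2.nil?
    [to_id(c1, col1_terms), to_id(c2, col2_terms)]
  end.compact
end

# Hypothetical label => id lookups standing in for two ontologies.
col1_terms = { 'head' => 'FOO:1' }
col2_terms = { 'cranium' => 'BAR:7' }

translate_rows("head\tBAR:2\nFOO:3\tcranium\n", col1_terms, col2_terms).each do |v1, v2|
  puts "#{v1}\t#{v2}"
end
```

A label with no entry in the lookup translates to nil, which is also what a Hash lookup in the listing would give.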
.cytoscapify(options = {}) ⇒ Object
Takes a Hash of OBO ontology files and an Array of relationships, and writes two input files for Cytoscape: a network (edges.eda) and node properties (nodes.tab). Both are written relative to the current working directory. Returns false if either :ontologies or :properties is empty.
Example use
OboParser::Utilities.cytoscapify(
  :ontologies => {
    'HAO' => File.read('input/hao.obo'),
    'TADS' => File.read('input/tads.obo'),
    'TGMA' => File.read('input/tgma.obo'),
    'FBBT' => File.read('input/fbbt.obo')
  },
  :properties => ['is_a', 'part_of'])
TODO: @return File1, File2, Filen
# File 'lib/utilities.rb', line 301
def self.cytoscapify(options = {})
  opt = {
    :ontologies => {},
    :properties => []
  }.merge!(options)

  return false if opt[:properties].empty?
  return false if opt[:ontologies].empty?

  nodes = File.new("nodes.tab", "w+")
  edges = File.new("edges.eda", "w+")

  opt[:ontologies].keys.each do |k|
    obo_file = parse_obo_file(opt[:ontologies][k])

    obo_file.terms.each do |t|
      nodes.puts [t.id.value, t.name.value, k].join("\t") + "\n"

      t.relationships.each do |rel, id|
        edges.puts [t.id.value, "(#{rel})", id].join("\t") + "\n" if opt[:properties].include?(rel)
      end
    end
  end

  nodes.close
  edges.close

  true
end
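To see what the two output files contain, here is a small sketch that writes the same tab-delimited rows to in-memory buffers instead of nodes.tab and edges.eda. The `Term` struct and `write_network` are hypothetical stand-ins introduced for illustration; real parsed terms expose `id.value` and `name.value` as in the listing above.

```ruby
require 'stringio'

# Hypothetical stand-in for a parsed term: an id, a name, and
# [relation, target id] pairs.
Term = Struct.new(:id, :name, :relationships)

# Write one node row per term and one edge row per relationship whose
# relation is in the requested properties.
def write_network(ontologies, properties, nodes_io, edges_io)
  ontologies.each do |prefix, terms|
    terms.each do |t|
      nodes_io.puts [t.id, t.name, prefix].join("\t")
      t.relationships.each do |rel, target|
        edges_io.puts [t.id, "(#{rel})", target].join("\t") if properties.include?(rel)
      end
    end
  end
end

nodes = StringIO.new
edges = StringIO.new
terms = [Term.new('FOO:1', 'head', [['is_a', 'FOO:0'], ['develops_from', 'FOO:9']])]
write_network({ 'FOO' => terms }, ['is_a', 'part_of'], nodes, edges)

puts nodes.string # one row: id, name, ontology key
puts edges.string # only the is_a edge; develops_from was not requested
```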
.dump_comparison_by_id(cutoff = 0, files = []) ⇒ String
Summarizes the labels used for each id in a two column, tab-delimited format. Providing a cutoff reports only those ids with more than that many distinct labels (a cutoff of 1 reports ids with more than one label). Does not (yet) include synonyms, though this could easily be added.
Example use
of1 = File.read('foo1.obo')
of2 = File.read('foo2.obo')
of3 = File.read('foo3.obo')
of4 = File.read('foo4.obo')
OboParser::Utilities.dump_comparison_by_id(0, [of1, of2, of3, of4])
# File 'lib/utilities.rb', line 22
def self.dump_comparison_by_id(cutoff = 0, files = [])
  return '' if files.size < 1

  of = []
  files.each_with_index do |f, i|
    of[i] = parse_obo_file(f)
  end

  all_data = {}

  of.each do |f|
    tmp_hash = f.id_hash
    tmp_hash.keys.each do |id|
      if all_data[id]
        all_data[id].push(tmp_hash[id])
      else
        all_data[id] = [tmp_hash[id]]
      end
    end
  end

  all_data.keys.sort.each do |k|
    if all_data[k].uniq.size > cutoff
      puts "#{k}\t#{all_data[k].uniq.join(', ')}"
    end
  end
end
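The effect of the cutoff is easiest to see on small data. This is a sketch under the assumption that each ontology is reduced to a plain id to label Hash (a stand-in for `id_hash`); `labels_by_id` and `comparison_lines` are hypothetical names, and the sketch returns the report lines rather than printing them.

```ruby
# Collect every label seen for each id across several id => label hashes.
def labels_by_id(id_hashes)
  id_hashes.each_with_object({}) do |id_hash, all|
    id_hash.each { |id, label| (all[id] ||= []) << label }
  end
end

# Keep only ids with more than `cutoff` distinct labels and format them
# as "id<TAB>label, label" lines, sorted by id.
def comparison_lines(id_hashes, cutoff = 0)
  all = labels_by_id(id_hashes)
  all.keys.sort
     .select { |id| all[id].uniq.size > cutoff }
     .map { |id| "#{id}\t#{all[id].uniq.join(', ')}" }
end

a = { 'FOO:1' => 'head', 'FOO:2' => 'eye' }
b = { 'FOO:1' => 'cranium', 'FOO:2' => 'eye' }

puts comparison_lines([a, b], 0) # every id, with its distinct labels
puts comparison_lines([a, b], 1) # only FOO:1, which has two distinct labels
```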
.hashify_pairs(options = {}) ⇒ Hash
Takes a two column input file, references it to two ontologies, and returns a hash
Example use
file = File.read('HAO_TGMA_list.txt')
col1_obo = File.read('hao.obo')
col2_obo = File.read('tgma.obo')
OboParser::Utilities.hashify_pairs(:data => file, :col1_obo => col1_obo, :col2_obo => col2_obo)
# File 'lib/utilities.rb', line 229
def self.hashify_pairs(options = {})
  opt = {
    :data => nil,
    :col1_obo => nil,
    :col2_obo => nil,
  }.merge!(options)

  c1obo = parse_obo_file(opt[:col1_obo])
  c2obo = parse_obo_file(opt[:col2_obo])

  hash = Hash.new

  i = opt[:index_start]
  v1 = nil # a label like 'head'
  v2 = nil
  c1 = nil # an id 'FOO:123'
  c2 = nil

  opt[:data].split(/\n/).each do |row|
    i += 1
    c1, c2 = row.split(/\t/).map(&:strip)

    if c1.nil? || c2.nil?
      next
    end

    # the conversion
    if c1 =~ /.*\:.*/ # it's an id, leave it
      v1 = c1
    else
      v1 = c1obo.term_hash[c1]
    end
    if c2 =~ /.*\:.*/
      v2 = c2
    else
      v2 = c2obo.term_hash[c2]
    end

    hash.merge!(c1 => c2)
  end

  return hash
end
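In the listing above the returned hash is keyed on the raw column values (`c1 => c2`), not on the translated `v1` and `v2`. A minimal sketch of just that pairing step follows; `pairs_to_hash` is a hypothetical name and the ids are made up.

```ruby
# Build a first-column => second-column Hash from tab-delimited text.
# Surrounding whitespace is stripped; rows without two columns are skipped.
def pairs_to_hash(data)
  data.split(/\n/).each_with_object({}) do |row, hash|
    c1, c2 = row.split(/\t/).map(&:strip)
    next if c1.nil? || c2.nil?
    hash[c1] = c2
  end
end

pairs = pairs_to_hash("FOO:1\tBAR:1\nFOO:2 \t BAR:2\nlonely\n")
pairs.each { |left, right| puts "#{left} => #{right}" }
```

If the same first-column value appears twice, the later row wins, which matches the behavior of `merge!` in the listing.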
.homolonto_stanza(id, name, *members) ⇒ String
Returns a HomolOnto stanza, or the string 'NOT ENOUGH RELATIONSHIPS' when fewer than two members are given.
# File 'lib/utilities.rb', line 280
def self.homolonto_stanza(id, name, *members)
  return 'NOT ENOUGH RELATIONSHIPS' if members.length < 2
  s = []
  s << '[Term]'
  s << "id: HOG:#{id}"
  s << "name: #{name}"
  members.each do |m|
    s << "relationship: has_member #{m}"
  end
  s.join("\n")
end
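The method needs nothing from the parser, so its output is easy to preview. The sketch below is a standalone copy of the same logic as a top-level method (so it runs without the gem); the member ids are made up.

```ruby
# Standalone copy of the stanza builder, for illustration only.
def homolonto_stanza(id, name, *members)
  return 'NOT ENOUGH RELATIONSHIPS' if members.length < 2
  lines = ['[Term]', "id: HOG:#{id}", "name: #{name}"]
  members.each { |m| lines << "relationship: has_member #{m}" }
  lines.join("\n")
end

puts homolonto_stanza(1, 'head', 'FOO:1', 'BAR:1')
# [Term]
# id: HOG:1
# name: head
# relationship: has_member FOO:1
# relationship: has_member BAR:1

puts homolonto_stanza(2, 'eye', 'FOO:2')
# NOT ENOUGH RELATIONSHIPS
```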
.parents(options = {}) ⇒ Hash
Takes a two column input file, references it to two ontologies, and returns a report that identifies data pairs that have parents who are also a data pair given a provided property/relation type.
Example use
file = File.read('HAO_TGMA_list.txt')
col1_obo = File.read('hao.obo')
col2_obo = File.read('tgma.obo')
foo = OboParser::Utilities.parents(:data => file, :col1_obo => col1_obo, :col2_obo => col2_obo, :property => 'is_a')
puts "-- NO (#{foo[:no].size})\n"
puts foo[:no].join("\n")
puts "-- YES (#{foo[:yes].size})\n"
puts foo[:yes].join("\n")
# File 'lib/utilities.rb', line 356
def self.parents(options = {})
  opt = {
    :data => nil,
    :col1_obo => nil,
    :col2_obo => nil,
    :property => nil
  }.merge!(options)

  return false if opt[:property].nil?

  c1obo = parse_obo_file(opt[:col1_obo])
  c2obo = parse_obo_file(opt[:col2_obo])

  result = {:yes => [], :no => [], :unplaced => []}
  # update
  hash = hashify_pairs(:data => opt[:data], :col1_obo => opt[:col1_obo], :col2_obo => opt[:col2_obo])

  obo1_hash = c1obo.id_index
  obo2_hash = c2obo.id_index

  hash.keys.each do |k|
    a = k
    b = hash[a]

    ids_1 = []
    ids_2 = []

    if !obo1_hash[a]
      puts "can't find #{k}\n"
      next
    end

    if !obo2_hash[b]
      puts "can't find #{k}\n"
      next
    end

    obo1_hash[a].relationships.each do |rel, id|
      if rel == opt[:property]
        ids_1.push id
      end
    end

    obo2_hash[b].relationships.each do |rel, id|
      if rel == opt[:property]
        ids_2.push id
      end
    end

    unplaced = true

    ids_1.each do |c|
      ids_2.each do |d|
        t = "#{a} -> #{b}"
        if hash[c] == d
          result[:yes].push(t)
          unplaced = false
          next # don't add again after we find a hit
        else
          result[:no].push(t)
          unplaced = false
        end
      end
    end
    result[:unplaced]
  end

  result
end
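The core test, whether the parents of a data pair are themselves a data pair, can be sketched with plain Hashes. Everything here is an assumption made for illustration: `classify_pairs` is a hypothetical name, `parents1` and `parents2` stand in for the relationships read from each ontology for the chosen property, and the `:unplaced` bucket is left out because the listing above never appends to it.

```ruby
# pairs:    id in ontology 1 => id in ontology 2 (the data pairs)
# parents1: id => parent ids in ontology 1 for the chosen relation
# parents2: id => parent ids in ontology 2 for the chosen relation
def classify_pairs(pairs, parents1, parents2)
  result = { yes: [], no: [] }
  pairs.each do |a, b|
    parents1.fetch(a, []).each do |c|
      parents2.fetch(b, []).each do |d|
        # :yes when the two parents are themselves a data pair
        key = pairs[c] == d ? :yes : :no
        result[key] << "#{a} -> #{b}"
      end
    end
  end
  result
end

pairs    = { 'FOO:1' => 'BAR:1', 'FOO:2' => 'BAR:2', 'FOO:3' => 'BAR:3' }
parents1 = { 'FOO:2' => ['FOO:1'], 'FOO:3' => ['FOO:1'] }
parents2 = { 'BAR:2' => ['BAR:1'], 'BAR:3' => ['BAR:9'] }

report = classify_pairs(pairs, parents1, parents2)
puts "-- YES (#{report[:yes].size})"
puts report[:yes]
puts "-- NO (#{report[:no].size})"
puts report[:no]
```

As in the listing, a pair with several parents on either side can be reported more than once, because every parent combination is checked.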
.shared_labels(files = []) ⇒ String
Returns all labels found in all passed ontologies. Does not yet include synonyms.
Example use
of1 = File.read('fly_anatomy.obo')
of2 = File.read('hao.obo')
of3 = File.read('mosquito_anatomy.obo')
OboParser::Utilities.shared_labels([of1, of3])
# File 'lib/utilities.rb', line 61
def self.shared_labels(files = [])
  comparison = {}

  files.each do |f|
    o = parse_obo_file(f)
    o.term_hash.keys.each do |k|
      tmp = k.gsub(/adult/, "").strip
      tmp = k.gsub(/embryonic\/larval/, "").strip
      if comparison[tmp]
        comparison[tmp] += 1
      else
        comparison.merge!(tmp => 1)
      end
    end
  end

  match = []
  comparison.keys.each do |k|
    if comparison[k] == files.size
      match.push k
    end
  end

  puts match.sort.join("\n")
  puts "\n#{match.length} total."
end
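The counting approach in the listing (a label is shared when its count equals the number of files) can be sketched on plain arrays of labels. `labels_in_all` is a hypothetical name, the arrays stand in for the keys of each ontology's `term_hash`, and the stage-name stripping ("adult", "embryonic/larval") done in the listing is deliberately omitted here.

```ruby
# Return the labels that occur in every list, sorted.
# Each list is de-duplicated first so a repeated label counts once per list.
def labels_in_all(label_lists)
  counts = Hash.new(0)
  label_lists.each do |labels|
    labels.uniq.each { |label| counts[label] += 1 }
  end
  counts.select { |_label, n| n == label_lists.size }.keys.sort
end

shared = labels_in_all([%w[head eye wing], %w[eye head leg]])
puts shared.join("\n")
puts "\n#{shared.length} total."
```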
.term_stanza_from_file(id, file) ⇒ String
Given a Term id and a String representing an OBO file returns that stanza.
# File 'lib/utilities.rb', line 436
def self.term_stanza_from_file(id, file)
  foo = ""
  file =~ /(^\[Term\]\s*?id:\s*?#{id}.*?)(^\[Term\]|^\[Typedef\])/im
  foo = $1 if !$1.nil?
  foo.gsub(/\n\r/,"\n")
end
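The regular expression captures from a `[Term]` header whose `id:` line matches, up to the next `[Term]` or `[Typedef]` header. The sketch below applies the same pattern to a made-up OBO fragment; `term_stanza` and `SAMPLE_OBO` are hypothetical names, and the final `gsub` from the listing is left out.

```ruby
# A made-up OBO fragment: two terms followed by a typedef.
SAMPLE_OBO = <<~OBO_TEXT
  [Term]
  id: FOO:1
  name: head

  [Term]
  id: FOO:2
  name: eye

  [Typedef]
  id: part_of
  name: part_of
OBO_TEXT

# Same pattern as in the listing; returns "" when nothing matches.
def term_stanza(id, text)
  text =~ /(^\[Term\]\s*?id:\s*?#{id}.*?)(^\[Term\]|^\[Typedef\])/im
  $1.to_s
end

puts term_stanza('FOO:1', SAMPLE_OBO)
puts term_stanza('FOO:2', SAMPLE_OBO)
```

Because the pattern requires a following `[Term]` or `[Typedef]` header, a term that is the last stanza in the text yields an empty string with this pattern. The id is also interpolated into the pattern unescaped, so it is matched as a prefix of the `id:` value.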