Class: OpenTox::Dataset
Overview
Ruby wrapper for OpenTox Dataset Webservices (opentox.org/dev/apis/api-1.2/dataset).
Direct Known Subclasses
Instance Attribute Summary
-
#compounds ⇒ Object
readonly
Returns the value of attribute compounds.
-
#data_entries ⇒ Object
readonly
Returns the value of attribute data_entries.
-
#features ⇒ Object
readonly
Returns the value of attribute features.
-
#metadata ⇒ Object
readonly
Returns the value of attribute metadata.
Attributes included from OpenTox
Class Method Summary
-
.all(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ Array
Get all datasets from a service.
-
.create(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ OpenTox::Dataset
Create an empty dataset and save it at the dataset service (assigns URI to dataset).
-
.create_from_csv_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Create dataset from CSV file (format specification: toxcreate.org/help) - loads data_entries, compounds, features - sets metadata (warnings) for parser errors - you will have to set remaining metadata manually.
-
.exist?(uri, subjectid = nil) ⇒ Boolean
Replaces find as an existence check: it is faster and does NOT raise an unauthorized exception.
-
.find(uri, subjectid = nil) ⇒ OpenTox::Dataset
Find a dataset and load all data.
-
.from_json(json, subjectid = nil) ⇒ Object
Create a new dataset from a JSON string.
-
.merge(dataset1, dataset2, metadata, subjectid = nil, features1 = nil, features2 = nil, compounds1 = nil, compounds2 = nil) ⇒ Object
Merges two datasets into a new dataset (by default, all compounds and features are used). Precondition: both datasets are fully loaded. Example: if you want no features from dataset2, pass an empty array as features2.
Instance Method Summary
-
#accept_values(feature) ⇒ Array
Returns the accept_values of a feature, i.e. the classification domain (all possible feature values).
-
#add(compound, feature, value) ⇒ Object
Insert a statement (compound_uri,feature_uri,value).
-
#add_compound(compound) ⇒ Object
Add a new compound.
-
#add_feature(feature, metadata = {}) ⇒ Object
Add a feature.
-
#add_feature_metadata(feature, metadata) ⇒ Object
Add/modify metadata for a feature.
-
#add_metadata(metadata) ⇒ Object
Add/modify metadata, existing entries will be overwritten.
-
#complete_data_entries ⇒ Object
Complete feature values by adding zeroes.
-
#copy_hash(hash) ⇒ Object
Copy a hash (e.g. from JSON) into a dataset (rewrites URI).
-
#delete(subjectid = nil) ⇒ Object
Delete dataset at the dataset service.
-
#feature_name(feature) ⇒ String
Get name (DC.title) of a feature.
-
#feature_type(subjectid = nil) ⇒ String
Detect feature type(s) in the dataset.
-
#initialize(uri = nil, subjectid = nil) ⇒ OpenTox::Dataset
constructor
Create dataset with optional URI.
-
#load_all(subjectid = nil) ⇒ Object
Load all data (metadata, data_entries, compounds and features) from URI.
-
#load_compounds(subjectid = nil) ⇒ Array
Load and return only compound URIs from the dataset service.
-
#load_csv(csv, subjectid = nil) ⇒ OpenTox::Dataset
Load CSV string (format specification: toxcreate.org/help) - loads data_entries, compounds, features - sets metadata (warnings) for parser errors - you will have to set remaining metadata manually.
-
#load_features(subjectid = nil) ⇒ Hash
Load and return only features from the dataset service.
-
#load_json(json) ⇒ Object
Load a JSON representation into the dataset.
-
#load_metadata(subjectid = nil) ⇒ Hash
Load and return only metadata of a Dataset object.
-
#load_rdfxml(rdfxml, subjectid = nil) ⇒ Object
Load an RDF/XML string (written to a temporary file and parsed with load_rdfxml_file).
-
#load_rdfxml_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Load RDF/XML representation from a file.
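-
#load_sdf(sdf, subjectid = nil) ⇒ Object
Load an SDF string (saves the dataset first if it has no URI yet, to obtain one for creating features).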
-
#load_spreadsheet(book, subjectid = nil) ⇒ OpenTox::Dataset
Load Spreadsheet book (created with roo gem roo.rubyforge.org/, excel format specification: toxcreate.org/help) - loads data_entries, compounds, features - sets metadata (warnings) for parser errors - you will have to set remaining metadata manually.
-
#load_yaml(yaml) ⇒ OpenTox::Dataset
Load YAML representation into the dataset.
-
#save(subjectid = nil) ⇒ String
Save dataset at the dataset service - creates a new dataset if uri is not set - overwrites dataset if uri exists.
-
#split(compounds, features, metadata, subjectid = nil) ⇒ OpenTox::Dataset
Creates a new dataset by splitting the current dataset, i.e. using only a subset of compounds and features.
-
#title ⇒ Object
Get the title (DC.title) of the dataset.
-
#to_csv ⇒ String
Get CSV string representation (data_entries only, metadata will be discarded).
-
#to_json ⇒ Object
Get JSON representation (uri, metadata, data_entries, compounds and features).
-
#to_ntriples ⇒ String
Get OWL-DL in ntriples format.
-
#to_rdfxml ⇒ String
Get OWL-DL in RDF/XML format.
-
#to_sdf ⇒ String
Get SDF representation of compounds.
-
#to_spreadsheet ⇒ Spreadsheet::Workbook
Get Spreadsheet representation.
-
#to_urilist ⇒ Object
Get a URI list of the compounds.
-
#to_xls ⇒ Spreadsheet::Workbook
Get Excel representation (alias for to_spreadsheet).
Methods included from OpenTox
Constructor Details
#initialize(uri = nil, subjectid = nil) ⇒ OpenTox::Dataset
Create dataset with optional URI. Does not load data into the dataset; you will need to call one of the load_* methods to pull data from a service or to insert it from other representations.

    # File 'lib/dataset.rb', line 17
    def initialize(uri=nil,subjectid=nil)
      super uri
      @features = {}
      @compounds = []
      @data_entries = {}
    end
Instance Attribute Details
#compounds ⇒ Object (readonly)
Returns the value of attribute compounds.

    # File 'lib/dataset.rb', line 8
    def compounds
      @compounds
    end
#data_entries ⇒ Object (readonly)
Returns the value of attribute data_entries.

    # File 'lib/dataset.rb', line 8
    def data_entries
      @data_entries
    end
#features ⇒ Object (readonly)
Returns the value of attribute features.

    # File 'lib/dataset.rb', line 8
    def features
      @features
    end
#metadata ⇒ Object (readonly)
Returns the value of attribute metadata.

    # File 'lib/dataset.rb', line 8
    def metadata
      @metadata
    end
Class Method Details
.all(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ Array
Get all datasets from a service

    # File 'lib/dataset.rb', line 82
    def self.all(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
      RestClientWrapper.get(uri,{:accept => "text/uri-list",:subjectid => subjectid}).to_s.each_line.collect{|u| Dataset.new(u.chomp, subjectid)}
    end
.create(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ OpenTox::Dataset
Create an empty dataset and save it at the dataset service (assigns URI to dataset)

    # File 'lib/dataset.rb', line 29
    def self.create(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
      dataset = Dataset.new(nil,subjectid)
      dataset.save(subjectid)
      dataset
    end
.create_from_csv_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Create dataset from CSV file (format specification: toxcreate.org/help)
-
loads data_entries, compounds, features
-
sets metadata (warnings) for parser errors
-
you will have to set remaining metadata manually

    # File 'lib/dataset.rb', line 41
    def self.create_from_csv_file(file, subjectid=nil)
      dataset = Dataset.create(CONFIG[:services]["opentox-dataset"], subjectid)
      parser = Parser::Spreadsheets.new
      parser.dataset = dataset
      parser.load_csv(File.open(file).read)
      dataset.save(subjectid)
      dataset
    end
.exist?(uri, subjectid = nil) ⇒ Boolean
Replaces find as an existence check: it is faster and does NOT raise an unauthorized exception

    # File 'lib/dataset.rb', line 69
    def self.exist?(uri, subjectid=nil)
      return false unless uri
      dataset = Dataset.new(uri, subjectid)
      begin
        dataset.load_metadata( subjectid ).size > 0
      rescue
        false
      end
    end
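The control flow of .exist? (false for a nil URI, otherwise "metadata is non-empty", with any error while loading treated as "does not exist") can be sketched in plain Ruby. This is a hedged illustration, not the OpenTox API: fetch_metadata is a hypothetical callable standing in for Dataset#load_metadata so that the sketch runs without a dataset service.

```ruby
# Sketch of the .exist? pattern. `fetch_metadata` is a hypothetical stand-in
# for Dataset#load_metadata; any StandardError (e.g. an authorization or
# network error) is reported as "does not exist" instead of being raised.
def exist_sketch?(uri, fetch_metadata)
  return false unless uri
  begin
    fetch_metadata.call(uri).size > 0
  rescue StandardError
    false
  end
end

found   = ->(_uri) { { "title" => "demo" } }
empty   = ->(_uri) { {} }
failing = ->(_uri) { raise "401 Unauthorized" }

puts exist_sketch?("http://example.org/dataset/1", found)    # prints true
puts exist_sketch?("http://example.org/dataset/1", failing)  # prints false
```

Because the check is size > 0, a dataset whose metadata comes back empty is also reported as non-existent.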
.find(uri, subjectid = nil) ⇒ OpenTox::Dataset
Find a dataset and load all data. This can be time-consuming; use Dataset.new together with one of the load_* methods for fine-grained control over data loading.

    # File 'lib/dataset.rb', line 59
    def self.find(uri, subjectid=nil)
      return nil unless uri
      dataset = Dataset.new(uri, subjectid)
      dataset.load_all(subjectid)
      dataset
    end
.from_json(json, subjectid = nil) ⇒ Object
Create a new dataset from a JSON string

    # File 'lib/dataset.rb', line 50
    def self.from_json(json, subjectid=nil)
      dataset = Dataset.new(nil,subjectid)
      dataset.copy_hash Yajl::Parser.parse(json)
      dataset
    end
.merge(dataset1, dataset2, metadata, subjectid = nil, features1 = nil, features2 = nil, compounds1 = nil, compounds2 = nil) ⇒ Object
Merges two datasets into a new dataset (by default, all compounds and features are used). Precondition: both datasets are fully loaded. Example: if you want no features from dataset2, pass an empty array as features2

    # File 'lib/dataset.rb', line 388
    def self.merge( dataset1, dataset2, metadata, subjectid=nil, features1=nil, features2=nil, compounds1=nil, compounds2=nil )
      features1 = dataset1.features.keys unless features1
      features2 = dataset2.features.keys unless features2
      compounds1 = dataset1.compounds unless compounds1
      compounds2 = dataset2.compounds unless compounds2
      data_combined = OpenTox::Dataset.create(CONFIG[:services]["opentox-dataset"],subjectid)
      LOGGER.debug("merging datasets #{dataset1.uri} and #{dataset2.uri} to #{data_combined.uri}")
      [[dataset1, features1, compounds1], [dataset2, features2, compounds2]].each do |dataset,features,compounds|
        compounds.each{|c| data_combined.add_compound(c)}
        features.each do |f|
          m = dataset.features[f]
          m[OT.hasSource] = dataset.uri unless m[OT.hasSource]
          data_combined.add_feature(f,m)
          compounds.each do |c|
            dataset.data_entries[c][f].each do |v|
              data_combined.add(c,f,v)
            end if dataset.data_entries[c] and dataset.data_entries[c][f]
          end
        end
      end
      metadata = {} unless metadata
      metadata[OT.hasSource] = "Merge from #{dataset1.uri} and #{dataset2.uri}" unless metadata[OT.hasSource]
      data_combined.add_metadata(metadata)
      data_combined.save(subjectid)
      data_combined
    end
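The merge loop is easier to follow on plain data. The sketch below is a hedged, self-contained approximation, not the OpenTox API: datasets are plain Hashes with hypothetical "uri", "features", "compounds" and "data_entries" keys, and the string "hasSource" stands in for OT.hasSource. It skips dataset creation, logging, dataset-level metadata and saving, and it copies each feature's metadata instead of writing hasSource into the source dataset's own feature hash as the listed code does.

```ruby
# Hypothetical plain-Hash sketch of the .merge logic. Each dataset is a Hash
# with "uri", "features" (feature => metadata), "compounds" and "data_entries"
# (compound => feature => [values]); "hasSource" stands in for OT.hasSource.
def merge_sketch(ds1, ds2, features1: nil, features2: nil, compounds1: nil, compounds2: nil)
  merged = { "features" => {}, "compounds" => [], "data_entries" => {} }
  [
    [ds1, features1 || ds1["features"].keys, compounds1 || ds1["compounds"]],
    [ds2, features2 || ds2["features"].keys, compounds2 || ds2["compounds"]]
  ].each do |ds, features, compounds|
    compounds.each { |c| merged["compounds"] << c unless merged["compounds"].include?(c) }
    features.each do |f|
      # Record the source dataset unless the feature metadata already names one.
      merged["features"][f] = { "hasSource" => ds["uri"] }.merge(ds["features"][f] || {})
      compounds.each do |c|
        values = ds["data_entries"].dig(c, f)
        next unless values
        ((merged["data_entries"][c] ||= {})[f] ||= []).concat(values)
      end
    end
  end
  merged
end

ds1 = { "uri" => "d1", "features" => { "fa" => {} }, "compounds" => %w[c1 c2],
        "data_entries" => { "c1" => { "fa" => [1] }, "c2" => { "fa" => [2] } } }
ds2 = { "uri" => "d2", "features" => { "fb" => {} }, "compounds" => %w[c2 c3],
        "data_entries" => { "c2" => { "fb" => [5] } } }

all = merge_sketch(ds1, ds2)
no_ds2_features = merge_sketch(ds1, ds2, features2: [])  # compounds of ds2 are still added
```

An empty array is truthy in Ruby, so passing features2: [] really selects no features from the second dataset, while its compounds are still added.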
Instance Method Details
#accept_values(feature) ⇒ Array
Returns the accept_values of a feature, i.e. the classification domain (all possible feature values)

    # File 'lib/dataset.rb', line 194
    def accept_values(feature)
      accept_values = features[feature][OT.acceptValue]
      accept_values.sort if accept_values
      accept_values
    end
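As listed, accept_values calls accept_values.sort but discards the result. Array#sort returns a new array rather than sorting in place, so the method returns the values in their stored order. If sorted values are intended, a minimal sketch that returns the sorted copy could look like the one below. The plain-Hash layout and the "acceptValue" key are assumptions standing in for the OpenTox feature metadata and OT.acceptValue. Unlike the listing, dig returns nil for an unknown feature instead of raising NoMethodError.

```ruby
# Hypothetical sketch: return a feature's accept values in sorted order.
# "acceptValue" stands in for OT.acceptValue.
def sorted_accept_values(features, feature)
  values = features.dig(feature, "acceptValue")
  values&.sort # Array#sort returns a new array; the stored list is left unchanged
end

features = { "activity" => { "acceptValue" => %w[inactive active] } }
sorted_accept_values(features, "activity")
```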
#add(compound, feature, value) ⇒ Object
Insert a statement (compound_uri,feature_uri,value)

    # File 'lib/dataset.rb', line 295
    def add (compound,feature,value)
      @compounds << compound unless @compounds.include? compound
      @features[feature] = {} unless @features[feature]
      @data_entries[compound] = {} unless @data_entries[compound]
      @data_entries[compound][feature] = [] unless @data_entries[compound][feature]
      @data_entries[compound][feature] << value if value!=nil
    end
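The listed #add maintains a two-level hash in which data_entries[compound][feature] is an Array of values. A nil value still creates the (empty) value list. A minimal sketch of that layout in plain Ruby follows; the class name EntrySketch and the identifiers "c1"/"f1" are hypothetical placeholders, not part of OpenTox.

```ruby
# Hypothetical stand-in (not part of OpenTox) for the nested layout that #add
# maintains: data_entries[compound][feature] is an Array of values.
class EntrySketch
  attr_reader :compounds, :features, :data_entries

  def initialize
    @compounds = []
    @features = {}
    @data_entries = {}
  end

  def add(compound, feature, value)
    @compounds << compound unless @compounds.include?(compound)
    @features[feature] ||= {}
    @data_entries[compound] ||= {}
    @data_entries[compound][feature] ||= []
    # A nil value registers the cell but stores nothing in it.
    @data_entries[compound][feature] << value unless value.nil?
  end
end

ds = EntrySketch.new
ds.add("c1", "f1", 1.5)
ds.add("c1", "f1", 2.0)  # a second value for the same cell is appended
ds.add("c2", "f1", nil)  # creates an empty value list for c2/f1
```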
#add_compound(compound) ⇒ Object
Add a new compound

    # File 'lib/dataset.rb', line 337
    def add_compound (compound)
      @compounds << compound unless @compounds.include? compound
    end
#add_feature(feature, metadata = {}) ⇒ Object
Add a feature

    # File 'lib/dataset.rb', line 314
    def add_feature(feature,metadata={})
      @features[feature] = metadata
    end
#add_feature_metadata(feature, metadata) ⇒ Object
Add/modify metadata for a feature

    # File 'lib/dataset.rb', line 331
    def add_feature_metadata(feature,metadata)
      metadata.each { |k,v| @features[feature][k] = v }
    end
#add_metadata(metadata) ⇒ Object
Add/modify metadata, existing entries will be overwritten

    # File 'lib/dataset.rb', line 307
    def add_metadata(metadata)
      metadata.each { |k,v| @metadata[k] = v }
    end
#complete_data_entries ⇒ Object
Complete feature values by adding zeroes

    # File 'lib/dataset.rb', line 319
    def complete_data_entries
      all_features = @features.keys
      @data_entries.each { |c, e|
        (Set.new(all_features.collect)).subtract(Set.new e.keys).to_a.each { |f|
          self.add(c,f,0)
        }
      }
    end
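The completion step is a set difference: for each compound, the features it lacks are all dataset features minus the features already present in its entry, and each of them receives a 0. Only compounds that already appear in data_entries are completed. A hedged plain-Hash sketch of the same rule follows; the method name and the sample data are hypothetical.

```ruby
require "set"

# Hypothetical plain-Hash sketch of #complete_data_entries: each compound that
# already has an entry gets a 0 for every feature it lacks.
def complete_entries(features, data_entries)
  all_features = Set.new(features.keys)
  data_entries.each_value do |entry|
    (all_features - Set.new(entry.keys)).each do |f|
      (entry[f] ||= []) << 0
    end
  end
  data_entries
end

features = { "f1" => {}, "f2" => {} }
entries  = { "c1" => { "f1" => [3] }, "c2" => { "f2" => [7] } }
complete_entries(features, entries)
```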
#copy_hash(hash) ⇒ Object
Copy a hash (e.g. from JSON) into a dataset (rewrites URI)

    # File 'lib/dataset.rb', line 445
    def copy_hash(hash)
      @metadata = hash["metadata"]
      @data_entries = hash["data_entries"]
      @compounds = hash["compounds"]
      @features = hash["features"]
      if @uri
        self.uri = @uri
      else
        @uri = hash["metadata"][XSD.anyURI]
      end
    end
#delete(subjectid = nil) ⇒ Object
Delete dataset at the dataset service

    # File 'lib/dataset.rb', line 440
    def delete(subjectid=nil)
      RestClientWrapper.delete(@uri, :subjectid => subjectid)
    end
#feature_name(feature) ⇒ String
Get name (DC.title) of a feature

    # File 'lib/dataset.rb', line 281
    def feature_name(feature)
      @features[feature][DC.title]
    end
#feature_type(subjectid = nil) ⇒ String
Detect feature type(s) in the dataset

    # File 'lib/dataset.rb', line 202
    def feature_type(subjectid=nil)
      load_features(subjectid)
      feature_types = @features.collect{|f,metadata| metadata[RDF.type]}.flatten.uniq
      if feature_types.include?(OT.NominalFeature)
        "classification"
      elsif feature_types.include?(OT.NumericFeature)
        "regression"
      else
        "unknown"
      end
    end
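The detection rule collects the RDF types of all features. If any feature is nominal, the dataset is "classification"; otherwise, if any is numeric, it is "regression"; otherwise it is "unknown". The sketch below applies that rule to plain Hashes. The string constants are hypothetical placeholders for RDF.type, OT.NominalFeature and OT.NumericFeature, not their real URIs.

```ruby
# Hypothetical sketch of the detection rule in #feature_type. The string
# constants are placeholders for RDF.type, OT.NominalFeature and OT.NumericFeature.
RDF_TYPE = "rdf:type"
NOMINAL  = "ot:NominalFeature"
NUMERIC  = "ot:NumericFeature"

def detect_feature_type(features)
  types = features.values.flat_map { |meta| Array(meta[RDF_TYPE]) }.uniq
  if types.include?(NOMINAL)
    "classification"  # any nominal feature wins
  elsif types.include?(NUMERIC)
    "regression"
  else
    "unknown"
  end
end
```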
#load_all(subjectid = nil) ⇒ Object
Load all data (metadata, data_entries, compounds and features) from URI

    # File 'lib/dataset.rb', line 157
    def load_all(subjectid=nil)
      if (CONFIG[:json_hosts].include?(URI.parse(@uri).host))
        copy_hash Yajl::Parser.parse(RestClientWrapper.get(@uri, {:accept => "application/json", :subjectid => subjectid}))
      else
        parser = Parser::Owl::Dataset.new(@uri, subjectid)
        copy parser.load_uri(subjectid)
      end
    end
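#load_all, #load_features and #save all branch on whether the dataset URI's host appears in CONFIG[:json_hosts]. Listed hosts are accessed as JSON; all others go through the OWL/RDF parser and serializer. A hedged sketch of just that decision follows. The json_hosts array is a placeholder for the real configuration, and the method name and return symbols are hypothetical.

```ruby
require "uri"

# Hypothetical sketch of the representation choice made by #load_all,
# #load_features and #save: hosts listed in json_hosts (standing in for
# CONFIG[:json_hosts]) are accessed as JSON, all others as OWL/RDF.
def representation_for(dataset_uri, json_hosts)
  json_hosts.include?(URI.parse(dataset_uri).host) ? :json : :owl
end

json_hosts = ["localhost"]  # placeholder configuration
representation_for("http://localhost:4002/dataset/1", json_hosts)  # => :json
representation_for("http://example.org/dataset/1", json_hosts)     # => :owl
```

Only the host is compared, so port and path play no role in the choice.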
#load_compounds(subjectid = nil) ⇒ Array
Load and return only compound URIs from the dataset service

    # File 'lib/dataset.rb', line 168
    def load_compounds(subjectid=nil)
      # fix for datasets like http://apps.ideaconsult.net:8080/ambit2/dataset/272?max=50
      u = URI::parse(uri)
      u.path = File.join(u.path,"compounds")
      u = u.to_s
      RestClientWrapper.get(u,{:accept=> "text/uri-list", :subjectid => subjectid}).to_s.each_line do |compound_uri|
        @compounds << compound_uri.chomp
      end
      @compounds.uniq!
    end
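The "fix" comment refers to dataset URIs that carry a query string. Joining "compounds" onto the whole URI string would land after the query, whereas joining it onto the path component keeps the query at the end. A small sketch of the difference follows, using only the stdlib URI class; the helper name compounds_uri is hypothetical.

```ruby
require "uri"

# Append "compounds" to the path component only, so a query string such as
# ?max=50 stays at the end of the URI (helper name is hypothetical).
def compounds_uri(dataset_uri)
  u = URI.parse(dataset_uri)
  u.path = File.join(u.path, "compounds")
  u.to_s
end

uri = "http://apps.ideaconsult.net:8080/ambit2/dataset/272?max=50"
puts compounds_uri(uri)           # http://apps.ideaconsult.net:8080/ambit2/dataset/272/compounds?max=50
puts File.join(uri, "compounds")  # http://apps.ideaconsult.net:8080/ambit2/dataset/272?max=50/compounds
```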
#load_csv(csv, subjectid = nil) ⇒ OpenTox::Dataset
Load CSV string (format specification: toxcreate.org/help)
-
loads data_entries, compounds, features
-
sets metadata (warnings) for parser errors
-
you will have to set remaining metadata manually

    # File 'lib/dataset.rb', line 128
    def load_csv(csv, subjectid=nil)
      save(subjectid) unless @uri # get a uri for creating features
      parser = Parser::Spreadsheets.new
      parser.dataset = self
      parser.load_csv(csv)
    end
#load_features(subjectid = nil) ⇒ Hash
Load and return only features from the dataset service

    # File 'lib/dataset.rb', line 181
    def load_features(subjectid=nil)
      if (CONFIG[:json_hosts].include?(URI.parse(@uri).host))
        @features = Yajl::Parser.parse(RestClientWrapper.get(File.join(@uri,"features"), {:accept => "application/json", :subjectid => subjectid}))
      else
        parser = Parser::Owl::Dataset.new(@uri, subjectid)
        @features = parser.load_features(subjectid)
      end
      @features
    end
#load_json(json) ⇒ Object
Load a JSON representation into the dataset

    # File 'lib/dataset.rb', line 93
    def load_json(json)
      copy_hash Yajl::Parser.parse(json)
    end
#load_metadata(subjectid = nil) ⇒ Hash
Load and return only metadata of a Dataset object

    # File 'lib/dataset.rb', line 150
    def load_metadata(subjectid=nil)
      add_metadata Parser::Owl::Dataset.new(@uri, subjectid).load_metadata(subjectid)
      self.uri = @uri if @uri # keep uri
      @metadata
    end
#load_rdfxml(rdfxml, subjectid = nil) ⇒ Object
Load an RDF/XML string (written to a temporary file and parsed with load_rdfxml_file)

    # File 'lib/dataset.rb', line 97
    def load_rdfxml(rdfxml, subjectid=nil)
      raise "rdfxml data is empty" if rdfxml.to_s.size==0
      file = Tempfile.new("ot-rdfxml")
      file.puts rdfxml
      file.close
      load_rdfxml_file file, subjectid
      file.delete
    end
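The listed #load_rdfxml deletes its temporary file only after load_rdfxml_file returns, so the file is left behind if parsing raises. A sketch of the same temp-file pattern with cleanup in an ensure clause follows, using the stdlib Tempfile. The method name is hypothetical, and the block stands in for the RDF/XML parser: it receives the path, whereas the listed code passes the Tempfile and reads file.path.

```ruby
require "tempfile"

# Sketch of the temp-file pattern in #load_rdfxml with guaranteed cleanup.
# The block is a hypothetical stand-in for the parser and receives the path.
def with_rdfxml_tempfile(rdfxml)
  raise "rdfxml data is empty" if rdfxml.to_s.empty?
  file = Tempfile.new("ot-rdfxml")
  begin
    file.puts rdfxml
    file.close
    yield file.path
  ensure
    file.close! # closes (if still open) and deletes the file, even if the block raised
  end
end

content = with_rdfxml_tempfile("<rdf:RDF/>") { |path| File.read(path) }
```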
#load_rdfxml_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Load RDF/XML representation from a file

    # File 'lib/dataset.rb', line 109
    def load_rdfxml_file(file, subjectid=nil)
      parser = Parser::Owl::Dataset.new @uri, subjectid
      parser.uri = file.path
      copy parser.load_uri(subjectid)
    end
#load_sdf(sdf, subjectid = nil) ⇒ Object
Load an SDF string (saves the dataset first if it has no URI yet, to obtain one for creating features)

    # File 'lib/dataset.rb', line 115
    def load_sdf(sdf,subjectid=nil)
      save(subjectid) unless @uri # get a uri for creating features
      parser = Parser::Sdf.new
      parser.dataset = self
      parser.load_sdf(sdf)
    end
#load_spreadsheet(book, subjectid = nil) ⇒ OpenTox::Dataset
Load Spreadsheet book (created with roo gem roo.rubyforge.org/, excel format specification: toxcreate.org/help)
-
loads data_entries, compounds, features
-
sets metadata (warnings) for parser errors
-
you will have to set remaining metadata manually

    # File 'lib/dataset.rb', line 141
    def load_spreadsheet(book, subjectid=nil)
      save(subjectid) unless @uri # get a uri for creating features
      parser = Parser::Spreadsheets.new
      parser.dataset = self
      parser.load_spreadsheet(book)
    end
#load_yaml(yaml) ⇒ OpenTox::Dataset
Load YAML representation into the dataset

    # File 'lib/dataset.rb', line 89
    def load_yaml(yaml)
      copy YAML.load(yaml)
    end
#save(subjectid = nil) ⇒ String
Save dataset at the dataset service
-
creates a new dataset if uri is not set
-
overwrites dataset if uri exists

    # File 'lib/dataset.rb', line 419
    def save(subjectid=nil)
      # TODO: rewrite feature URI's ??
      @compounds.uniq!
      if @uri
        if (CONFIG[:json_hosts].include?(URI.parse(@uri).host))
          #LOGGER.debug self.to_json
          RestClientWrapper.post(@uri,self.to_json,{:content_type => "application/json", :subjectid => subjectid})
        else
          File.open("ot-post-file.rdf","w+") { |f| f.write(self.to_rdfxml); @path = f.path }
          task_uri = RestClient.post(@uri, {:file => File.new(@path)},{:accept => "text/uri-list" , :subjectid => subjectid}).to_s.chomp
          Task.find(task_uri).wait_for_completion
          self.uri = RestClientWrapper.get(task_uri,{:accept => 'text/uri-list', :subjectid => subjectid})
        end
      else
        # create dataset if uri is empty
        self.uri = RestClientWrapper.post(CONFIG[:services]["opentox-dataset"],{:subjectid => subjectid}).to_s.chomp
      end
      @uri
    end
#split(compounds, features, metadata, subjectid = nil) ⇒ OpenTox::Dataset
Creates a new dataset by splitting the current dataset, i.e. using only a subset of compounds and features

    # File 'lib/dataset.rb', line 347
    def split( compounds, features, metadata, subjectid=nil)
      LOGGER.debug "split dataset using "+compounds.size.to_s+"/"+@compounds.size.to_s+" compounds"
      raise "no new compounds selected" unless compounds and compounds.size>0
      dataset = OpenTox::Dataset.create(CONFIG[:services]["opentox-dataset"],subjectid)
      if features.size==0
        compounds.each{ |c| dataset.add_compound(c) }
      else
        compounds.each do |c|
          features.each do |f|
            if @data_entries[c]==nil or @data_entries[c][f]==nil
              dataset.add(c,f,nil)
            else
              @data_entries[c][f].each do |v|
                dataset.add(c,f,v)
              end
            end
          end
        end
      end
      # set feature metadata in new dataset accordingly (including accept values)
      features.each do |f|
        self.features[f].each do |k,v|
          dataset.features[f][k] = v
        end
      end
      dataset.add_metadata(metadata)
      dataset.save(subjectid)
      dataset
    end
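The selection logic of #split can be sketched on plain Hashes: every selected compound is kept, every selected feature gets a (possibly empty) value list, and the selected features' metadata, including accept values, is copied over. This is a hedged approximation, not the OpenTox API. The Hash keys and the "acceptValue" key are hypothetical stand-ins, and dataset creation, logging, dataset metadata and saving are omitted.

```ruby
# Hypothetical plain-Hash sketch of #split: keep only the selected compounds
# and features; a selected feature without values gets an empty list.
def split_sketch(ds, compounds, features)
  raise "no new compounds selected" unless compounds && compounds.size > 0
  out = { "features" => {}, "compounds" => [], "data_entries" => {} }
  compounds.each do |c|
    out["compounds"] << c unless out["compounds"].include?(c)
    features.each do |f|
      cell = ((out["data_entries"][c] ||= {})[f] ||= [])
      cell.concat(ds["data_entries"].dig(c, f) || [])
    end
  end
  # Copy feature metadata (e.g. accept values) for the selected features.
  features.each { |f| out["features"][f] = (ds["features"][f] || {}).dup }
  out
end

ds = { "features" => { "fa" => { "acceptValue" => %w[a b] }, "fb" => {} },
       "compounds" => %w[c1 c2 c3],
       "data_entries" => { "c1" => { "fa" => ["a"], "fb" => [1] }, "c2" => { "fa" => ["b"] } } }
subset = split_sketch(ds, %w[c1 c3], ["fa"])
```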
#title ⇒ Object
Get the title (DC.title) of the dataset

    # File 'lib/dataset.rb', line 285
    def title
      @metadata[DC.title]
    end
#to_csv ⇒ String
Get CSV string representation (data_entries only, metadata will be discarded)

    # File 'lib/dataset.rb', line 234
    def to_csv
      Serializer::Spreadsheets.new(self).to_csv
    end
#to_json ⇒ Object
Get JSON representation (uri, metadata, data_entries, compounds and features)

    # File 'lib/dataset.rb', line 216
    def to_json
      Yajl::Encoder.encode({:uri => @uri, :metadata => @metadata, :data_entries => @data_entries, :compounds => @compounds, :features => @features})
    end
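#to_json writes symbol keys, and after parsing they come back as plain strings. That is the form #copy_hash reads ("metadata", "data_entries", "compounds", "features"). As listed, copy_hash does not read the top-level "uri" key; it keeps the current @uri or falls back to hash["metadata"][XSD.anyURI]. The sketch below shows the round trip with Ruby's stdlib json library instead of Yajl (a substitution for illustration only); all values are placeholders.

```ruby
require "json"

# Sketch of the #to_json / #copy_hash round trip using Ruby's stdlib JSON
# (the listed code uses Yajl). The values below are placeholders.
json = JSON.generate({
  uri: "http://example.org/dataset/1",
  metadata: { "title" => "demo" },
  data_entries: { "c1" => { "f1" => [1] } },
  compounds: ["c1"],
  features: { "f1" => {} }
})

hash = JSON.parse(json)
# Symbol keys come back as strings, which is what #copy_hash expects.
%w[metadata data_entries compounds features].each do |key|
  puts "#{key}: #{hash.fetch(key).inspect}"
end
```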
#to_ntriples ⇒ String
Get OWL-DL in ntriples format

    # File 'lib/dataset.rb', line 240
    def to_ntriples
      s = Serializer::Owl.new
      s.add_dataset(self)
      s.to_ntriples
    end
#to_rdfxml ⇒ String
Get OWL-DL in RDF/XML format

    # File 'lib/dataset.rb', line 248
    def to_rdfxml
      s = Serializer::Owl.new
      s.add_dataset(self)
      s.to_rdfxml
    end
#to_sdf ⇒ String
Get SDF representation of compounds

    # File 'lib/dataset.rb', line 256
    def to_sdf
      sum=""
      @compounds.each{ |c|
        sum << OpenTox::Compound.new(c).to_inchi
        sum << OpenTox::Compound.new(c).to_sdf.sub(/\n\$\$\$\$/,'')
        @data_entries[c].each{ |f,v|
          sum << "> <\"#{f}\">\n"
          sum << v.join(", ")
          sum << "\n\n"
        }
        sum << "$$$$\n"
      }
      sum
    end
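Each SD record produced by #to_sdf consists of the compound's molfile block with its "$$$$" terminator stripped, one data item per feature (header line, comma-separated values, blank line), and a closing "$$$$". The sketch below assembles a single record under stated assumptions: the molblock string is a placeholder rather than a valid molfile, and the method name is hypothetical. It omits the to_inchi string the listed code prepends, and it tolerates a compound without data entries, for which the listing would raise NoMethodError.

```ruby
# Hypothetical sketch of one SD record as assembled by #to_sdf: the molfile
# block without its "$$$$" terminator, one data item per feature, then "$$$$".
def sdf_record(molblock, entries)
  record = +""
  record << molblock.sub(/\n\$\$\$\$/, "")
  (entries || {}).each do |feature, values|
    record << "> <\"#{feature}\">\n"  # data header, feature name wrapped in quotes as listed
    record << values.join(", ")       # multiple values are comma-separated
    record << "\n\n"
  end
  record << "$$$$\n"
end

molblock = "demo\nM  END\n$$$$\n"   # placeholder, not a valid molfile
print sdf_record(molblock, { "http://example.org/feature/1" => [1.5, 2] })
```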
#to_spreadsheet ⇒ Spreadsheet::Workbook
Get Spreadsheet representation

    # File 'lib/dataset.rb', line 222
    def to_spreadsheet
      Serializer::Spreadsheets.new(self).to_spreadsheet
    end
#to_urilist ⇒ Object
Get a URI list of the compounds

    # File 'lib/dataset.rb', line 271
    def to_urilist
      @compounds.inject { |sum, c|
        sum << OpenTox::Compound.new(c).uri
        sum + "\n"
      }
    end
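As listed, to_urilist calls inject without an initial value. The first compound URI string itself therefore becomes the accumulator, and sum << ... appends the second URI to that stored string, so no newline separates the first two URIs. A minimal sketch of a one-URI-per-line list follows. It assumes each entry of compounds is already a compound URI (the listed code maps it through OpenTox::Compound.new(c).uri), and the method name is hypothetical.

```ruby
# Hypothetical sketch of a text/uri-list body: one compound URI per line.
# Unlike Enumerable#inject without an initial value, this neither uses the
# first URI as accumulator nor mutates the strings stored in `compounds`.
def uri_list(compounds)
  compounds.map { |c| "#{c}\n" }.join
end

compounds = ["http://example.org/compound/1", "http://example.org/compound/2"]
print uri_list(compounds)
```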
#to_xls ⇒ Spreadsheet::Workbook
Get Excel representation (alias for to_spreadsheet)

    # File 'lib/dataset.rb', line 228
    def to_xls
      to_spreadsheet
    end