Class: OpenTox::Dataset

Inherits:
Object
Includes:
OpenTox
Defined in:
lib/dataset.rb

Overview

Ruby wrapper for OpenTox Dataset Webservices (opentox.org/dev/apis/api-1.2/dataset).

Direct Known Subclasses

LazarPrediction

Instance Attribute Summary

Attributes included from OpenTox

#uri

Class Method Summary

Instance Method Summary

Methods included from OpenTox

sign_in, text_to_html

Constructor Details

#initialize(uri = nil, subjectid = nil) ⇒ OpenTox::Dataset

Creates a dataset with an optional URI. No data is loaded into the dataset: call one of the load_* methods to pull data from a service or to import it from another representation.

Examples:

Create an empty dataset

dataset = OpenTox::Dataset.new

Create an empty dataset with URI

dataset = OpenTox::Dataset.new("http://webservices.in-silico.ch/dataset/1")

Parameters:

  • uri (optional, String) (defaults to: nil)

    Dataset URI



# File 'lib/dataset.rb', line 17

def initialize(uri=nil,subjectid=nil)
  super uri
  @features = {}
  @compounds = []
  @data_entries = {}
end

Instance Attribute Details

#compounds ⇒ Object (readonly)

Returns the value of attribute compounds.

# File 'lib/dataset.rb', line 8

def compounds
  @compounds
end

#data_entries ⇒ Object (readonly)

Returns the value of attribute data_entries.

# File 'lib/dataset.rb', line 8

def data_entries
  @data_entries
end

#features ⇒ Object (readonly)

Returns the value of attribute features.

# File 'lib/dataset.rb', line 8

def features
  @features
end

#metadata ⇒ Object (readonly)

Returns the value of attribute metadata.

# File 'lib/dataset.rb', line 8

def metadata
  @metadata
end

Class Method Details

.all(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ Array

Get all datasets from a service

Parameters:

  • uri (optional, String) (defaults to: CONFIG[:services]["opentox-dataset"])

    URI of the dataset service, defaults to the service specified in the configuration

Returns:

  • (Array)

    Array of dataset objects without data (use one of the load_* methods to pull data from the server)

# File 'lib/dataset.rb', line 82

def self.all(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
  RestClientWrapper.get(uri,{:accept => "text/uri-list",:subjectid => subjectid}).to_s.each_line.collect{|u| Dataset.new(u.chomp, subjectid)}
end
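
The service answers with a text/uri-list body, which the method turns into one Dataset per line. The line handling on its own can be sketched without any web service call; parse_uri_list is a hypothetical helper name and the example.org URIs are placeholders.

```ruby
# Splits a text/uri-list response body into one URI string per line.
# Blank lines are dropped here; the listing above keeps every line.
def parse_uri_list(body)
  body.to_s.each_line.map(&:chomp).reject(&:empty?)
end

body = "http://example.org/dataset/1\nhttp://example.org/dataset/2\n"
parse_uri_list(body).each { |uri| puts uri }
```

Each resulting string is what would be handed to Dataset.new, so the returned objects carry a URI but no data.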

.create(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ OpenTox::Dataset

Create an empty dataset and save it at the dataset service (assigns URI to dataset)

Examples:

Create new dataset and save it to obtain a URI

dataset = OpenTox::Dataset.create

Parameters:

  • uri (optional, String) (defaults to: CONFIG[:services]["opentox-dataset"])

    URI of the dataset service

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 29

def self.create(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
  dataset = Dataset.new(nil,subjectid)
  dataset.save(subjectid)
  dataset
end

.create_from_csv_file(file, subjectid = nil) ⇒ OpenTox::Dataset

Create dataset from CSV file (format specification: toxcreate.org/help)

  • loads data_entries, compounds, features

  • sets metadata (warnings) for parser errors

  • you will have to set remaining metadata manually

Parameters:

  • file (String)

    CSV file path

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 41

def self.create_from_csv_file(file, subjectid=nil) 
  dataset = Dataset.create(CONFIG[:services]["opentox-dataset"], subjectid)
  parser = Parser::Spreadsheets.new
  parser.dataset = dataset
  parser.load_csv(File.open(file).read)
  dataset.save(subjectid)
  dataset
end

.exist?(uri, subjectid = nil) ⇒ Boolean

Replaces find as an existence check: it is faster and does NOT raise an unauthorized exception.

Parameters:

  • uri (String)

    Dataset URI

Returns:

  • (Boolean)

    true if the dataset exists and the user has GET rights, false otherwise

# File 'lib/dataset.rb', line 69

def self.exist?(uri, subjectid=nil)
  return false unless uri
  dataset = Dataset.new(uri, subjectid)
  begin
    dataset.load_metadata( subjectid ).size > 0
  rescue
    false
  end
end
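
The "return false instead of raising" behaviour can be sketched in isolation. In this minimal example, probe is a hypothetical helper and the block stands in for the metadata request; nothing here calls a real service.

```ruby
# Returns true when the lookup yields a non-empty result, and false when
# the URI is missing or the lookup raises (e.g. not found, not authorized).
def probe(uri)
  return false unless uri
  begin
    yield(uri).size > 0
  rescue StandardError
    false
  end
end

puts probe("http://example.org/dataset/1") { |_uri| { "title" => "demo" } }
puts probe("http://example.org/dataset/2") { |_uri| raise "401 Unauthorized" }
```

Because every error is mapped to false, a caller cannot distinguish a missing dataset from one it may not read.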

.find(uri, subjectid = nil) ⇒ OpenTox::Dataset

Find a dataset and load all data. This can be time-consuming; use Dataset.new together with one of the load_* methods for fine-grained control over data loading.

Parameters:

  • uri (String)

    Dataset URI

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 59

def self.find(uri, subjectid=nil)
  return nil unless uri
  dataset = Dataset.new(uri, subjectid)
  dataset.load_all(subjectid)
  dataset
end

.from_json(json, subjectid = nil) ⇒ Object



# File 'lib/dataset.rb', line 50

def self.from_json(json, subjectid=nil) 
  dataset = Dataset.new(nil,subjectid)
  dataset.copy_hash Yajl::Parser.parse(json)
  dataset
end

.merge(dataset1, dataset2, metadata, subjectid = nil, features1 = nil, features2 = nil, compounds1 = nil, compounds2 = nil) ⇒ Object

Merges two datasets into a new dataset (by default, all compounds and features are used).

Precondition: both datasets are fully loaded.

Example: if you want no features from dataset2, pass an empty array as features2.

Parameters:

  • dataset1 (OpenTox::Dataset)

    to merge

  • dataset2 (OpenTox::Dataset)

    to merge

  • metadata (Hash)
  • subjectid (optional, String) (defaults to: nil)
  • features1 (optional, Array)

    if specified, only these features of dataset1 are used

  • features2 (optional, Array)

    if specified, only these features of dataset2 are used

  • compounds1 (optional, Array)

    if specified, only these compounds of dataset1 are used

  • compounds2 (optional, Array)

    if specified, only these compounds of dataset2 are used

# File 'lib/dataset.rb', line 388

def self.merge( dataset1, dataset2, metadata, subjectid=nil, features1=nil, features2=nil, compounds1=nil, compounds2=nil )
  features1 = dataset1.features.keys unless features1
  features2 = dataset2.features.keys unless features2
  compounds1 = dataset1.compounds unless compounds1
  compounds2 = dataset2.compounds unless compounds2
  data_combined = OpenTox::Dataset.create(CONFIG[:services]["opentox-dataset"],subjectid)
  LOGGER.debug("merging datasets #{dataset1.uri} and #{dataset2.uri} to #{data_combined.uri}")
  [[dataset1, features1, compounds1], [dataset2, features2, compounds2]].each do |dataset,features,compounds|
    compounds.each{|c| data_combined.add_compound(c)}
    features.each do |f|
      m = dataset.features[f]
      m[OT.hasSource] = dataset.uri unless m[OT.hasSource]
      data_combined.add_feature(f,m)
      compounds.each do |c|
        dataset.data_entries[c][f].each do |v|
          data_combined.add(c,f,v)
        end if dataset.data_entries[c] and dataset.data_entries[c][f]
      end
    end
  end
  metadata = {} unless metadata
  metadata[OT.hasSource] = "Merge from #{dataset1.uri} and #{dataset2.uri}" unless metadata[OT.hasSource]
  data_combined.add_metadata(metadata)
  data_combined.save(subjectid)
  data_combined
end
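
The effect of the feature and compound selections is easier to see on plain hashes. The following is a minimal sketch only: merge_entries and the source-hash layout are hypothetical, it covers data entries but not feature metadata, and it performs no service calls.

```ruby
# Merges the selected compounds/features of several sources.
# Each source is a hash with:
#   :entries   => { compound => { feature => [values] } }
#   :features  => feature keys to take from this source
#   :compounds => compound keys to take from this source
def merge_entries(*sources)
  combined = {}
  sources.each do |src|
    src[:compounds].each do |c|
      combined[c] ||= {} # compounds are kept even without values
      src[:features].each do |f|
        values = src[:entries].dig(c, f) || []
        values.each { |v| (combined[c][f] ||= []) << v }
      end
    end
  end
  combined
end

first  = { entries: { "c1" => { "f1" => [1] } },
           features: ["f1"], compounds: ["c1"] }
second = { entries: { "c1" => { "f2" => [2] }, "c2" => { "f2" => [3] } },
           features: [], compounds: ["c1", "c2"] }
p merge_entries(first, second)
```

Passing an empty feature list for the second source, as above, keeps its compounds but none of its values, which matches the "empty array as features2" case in the description.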

Instance Method Details

#accept_values(feature) ⇒ Array

returns the accept_values of a feature, i.e. the classification domain / all possible feature values

Parameters:

  • feature (String)

    the URI of the feature

Returns:

  • (Array)

    Array of strings, or nil if no value is set (e.g. when the feature is numeric)

# File 'lib/dataset.rb', line 194

def accept_values(feature)
  accept_values = features[feature][OT.acceptValue]
  accept_values.sort if accept_values
  accept_values
end

#add(compound, feature, value) ⇒ Object

Insert a statement (compound_uri,feature_uri,value)

Examples:

Insert a statement (compound_uri,feature_uri,value)

dataset.add "http://webservices.in-silico.ch/compound/InChI=1S/C6Cl6/c7-1-2(8)4(10)6(12)5(11)3(1)9", "http://webservices.in-silico.ch/dataset/1/feature/hamster_carcinogenicity", 1

Parameters:

  • compound (String)

    Compound URI

  • feature (String)

    Feature URI

  • value (Boolean, Float)

    Feature value



# File 'lib/dataset.rb', line 295

def add (compound,feature,value)
  @compounds << compound unless @compounds.include? compound
  @features[feature] = {}  unless @features[feature]
  @data_entries[compound] = {} unless @data_entries[compound]
  @data_entries[compound][feature] = [] unless @data_entries[compound][feature]
  @data_entries[compound][feature] << value if value!=nil
end
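
The bookkeeping done by add can be sketched with plain Ruby containers. MiniDataset below is a hypothetical stand-in, not part of the library: it mirrors only the in-memory structure from the listing above, and the example.org URIs are placeholders.

```ruby
# Hypothetical stand-in for OpenTox::Dataset; mirrors only the in-memory
# bookkeeping of #add and performs no web service calls.
class MiniDataset
  attr_reader :compounds, :features, :data_entries

  def initialize
    @compounds = []     # compound URIs, in insertion order
    @features = {}      # feature URI => metadata hash
    @data_entries = {}  # compound URI => { feature URI => [values] }
  end

  def add(compound, feature, value)
    @compounds << compound unless @compounds.include?(compound)
    @features[feature] ||= {}
    @data_entries[compound] ||= {}
    @data_entries[compound][feature] ||= []
    # nil registers the compound/feature pair without storing a value
    @data_entries[compound][feature] << value unless value.nil?
  end
end

d = MiniDataset.new
d.add("http://example.org/compound/1", "http://example.org/feature/a", 1)
d.add("http://example.org/compound/1", "http://example.org/feature/a", 0)
d.add("http://example.org/compound/2", "http://example.org/feature/a", nil)
p d.data_entries
```

Values accumulate in an array per compound/feature pair, so repeated calls record repeated measurements rather than overwriting earlier ones.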

#add_compound(compound) ⇒ Object

Add a new compound

Parameters:

  • compound (String)

    Compound URI



# File 'lib/dataset.rb', line 337

def add_compound (compound)
  @compounds << compound unless @compounds.include? compound
end

#add_feature(feature, metadata = {}) ⇒ Object

Add a feature

Parameters:

  • feature (String)

    Feature URI

  • metadata (Hash) (defaults to: {})

    Hash with feature metadata



# File 'lib/dataset.rb', line 314

def add_feature(feature,metadata={})
  @features[feature] = metadata
end

#add_feature_metadata(feature, metadata) ⇒ Object

Add/modify metadata for a feature

Parameters:

  • feature (String)

    Feature URI

  • metadata (Hash)

    Hash with feature metadata



# File 'lib/dataset.rb', line 331

def add_feature_metadata(feature,metadata)
  metadata.each { |k,v| @features[feature][k] = v }
end

#add_metadata(metadata) ⇒ Object

Add/modify metadata, existing entries will be overwritten

Examples:

dataset.add_metadata({DC.title => "any_title", DC.creator => "my_email"})

Parameters:

  • metadata (Hash)

    Hash mapping predicate_uris to values



# File 'lib/dataset.rb', line 307

def add_metadata(metadata)
  metadata.each { |k,v| @metadata[k] = v }
end

#complete_data_entries ⇒ Object

Complete feature values by adding zeroes

# File 'lib/dataset.rb', line 319

def complete_data_entries
  all_features = @features.keys
  @data_entries.each { |c, e|
    (Set.new(all_features.collect)).subtract(Set.new e.keys).to_a.each { |f|
      self.add(c,f,0)
    }
  }
end
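
What "completing" means can be shown on a plain nested hash. This is a minimal sketch under the assumption that data entries have the shape compound => { feature => [values] }; complete_entries is a hypothetical helper, not a library method.

```ruby
require "set"

# Adds a [0] entry for every known feature that a compound has no entry for.
# data_entries: { compound => { feature => [values] } } (modified in place)
def complete_entries(feature_uris, data_entries)
  all_features = Set.new(feature_uris)
  data_entries.each do |_compound, entries|
    (all_features - Set.new(entries.keys)).each do |feature|
      entries[feature] = [0]
    end
  end
  data_entries
end

entries = { "c1" => { "f1" => [1] }, "c2" => { "f2" => [3.5] } }
p complete_entries(["f1", "f2"], entries)
```

As in the listing above, only compounds that already appear in the data entries are completed; a compound with no entry at all is not touched.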

#copy_hash(hash) ⇒ Object

Copy a hash (e.g. from JSON) into a dataset (rewrites URI)

# File 'lib/dataset.rb', line 445

def copy_hash(hash)
  @metadata = hash["metadata"]
  @data_entries = hash["data_entries"]
  @compounds = hash["compounds"]
  @features = hash["features"]
  if @uri
    self.uri = @uri 
  else
    @uri = hash["metadata"][XSD.anyURI]
  end
end
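
copy_hash reads string keys such as "metadata" and "data_entries". That is what a JSON parser returns, even though #to_json builds its hash with symbol keys. The round trip can be sketched as follows; stdlib JSON is used here as a stand-in for Yajl (an assumption for the sake of a self-contained example), and encode_dataset is a hypothetical helper.

```ruby
require "json"

# Stand-in for Yajl::Encoder.encode: serializes the same five fields that
# #to_json emits, using symbol keys as in the library listing.
def encode_dataset(uri, metadata, data_entries, compounds, features)
  JSON.generate({ uri: uri, metadata: metadata, data_entries: data_entries,
                  compounds: compounds, features: features })
end

json = encode_dataset(
  "http://example.org/dataset/1",
  { "title" => "demo" },
  { "http://example.org/compound/1" => { "http://example.org/feature/a" => [1] } },
  ["http://example.org/compound/1"],
  { "http://example.org/feature/a" => {} }
)
hash = JSON.parse(json) # keys come back as strings
p hash.keys
```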

#delete(subjectid = nil) ⇒ Object

Delete dataset at the dataset service



# File 'lib/dataset.rb', line 440

def delete(subjectid=nil)
  RestClientWrapper.delete(@uri, :subjectid => subjectid)
end

#feature_name(feature) ⇒ String

Get name (DC.title) of a feature

Parameters:

  • feature (String)

    Feature URI

Returns:

  • (String)

# File 'lib/dataset.rb', line 281

def feature_name(feature)
  @features[feature][DC.title]
end

#feature_type(subjectid = nil) ⇒ String

Detect feature type(s) in the dataset

Returns:

  • (String)

    "classification", "regression", "mixed" or "unknown"

# File 'lib/dataset.rb', line 202

def feature_type(subjectid=nil)
  load_features(subjectid)
  feature_types = @features.collect{|f,metadata| metadata[RDF.type]}.flatten.uniq
  if feature_types.include?(OT.NominalFeature)
    "classification"
  elsif feature_types.include?(OT.NumericFeature)
    "regression"
  else
    "unknown"
  end
end
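
The branch order in the listing can be sketched without the RDF vocabulary. In this minimal example the NOMINAL and NUMERIC constants, the "type" key, and detect_feature_type are hypothetical stand-ins for OT.NominalFeature, OT.NumericFeature, RDF.type and the method itself.

```ruby
# Hypothetical stand-ins for the OT.NominalFeature / OT.NumericFeature URIs.
NOMINAL = "NominalFeature"
NUMERIC = "NumericFeature"

# features: { feature URI => { "type" => String or Array of Strings } }
def detect_feature_type(features)
  types = features.values.map { |meta| meta["type"] }.flatten.uniq
  if types.include?(NOMINAL)
    "classification"
  elsif types.include?(NUMERIC)
    "regression"
  else
    "unknown"
  end
end

puts detect_feature_type("f1" => { "type" => NUMERIC })
puts detect_feature_type("f1" => { "type" => NUMERIC }, "f2" => { "type" => [NOMINAL] })
```

With this ordering, a dataset holding both nominal and numeric features is reported as "classification"; the listing above has no branch that returns "mixed".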

#load_all(subjectid = nil) ⇒ Object

Load all data (metadata, data_entries, compounds and features) from URI



# File 'lib/dataset.rb', line 157

def load_all(subjectid=nil)
  if (CONFIG[:json_hosts].include?(URI.parse(@uri).host))
    copy_hash Yajl::Parser.parse(RestClientWrapper.get(@uri, {:accept => "application/json", :subjectid => subjectid}))
  else
    parser = Parser::Owl::Dataset.new(@uri, subjectid)
    copy parser.load_uri(subjectid)
  end
end

#load_compounds(subjectid = nil) ⇒ Array

Load and return only compound URIs from the dataset service

Returns:

  • (Array)

    Compound URIs in the dataset



# File 'lib/dataset.rb', line 168

def load_compounds(subjectid=nil)
  # fix for datasets like http://apps.ideaconsult.net:8080/ambit2/dataset/272?max=50
  u = URI::parse(uri)
  u.path = File.join(u.path,"compounds")
  u = u.to_s
  RestClientWrapper.get(u,{:accept=> "text/uri-list", :subjectid => subjectid}).to_s.each_line do |compound_uri|
    @compounds << compound_uri.chomp
  end
  @compounds.uniq!
end
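
The URI handling at the top of the listing matters when the dataset URI carries a query string, as in the comment's example. A minimal sketch using only Ruby's standard uri library (compounds_uri is a hypothetical helper name):

```ruby
require "uri"

# Appends "compounds" to the path while leaving the query string in place.
def compounds_uri(dataset_uri)
  u = URI.parse(dataset_uri)
  u.path = File.join(u.path, "compounds")
  u.to_s
end

puts compounds_uri("http://apps.ideaconsult.net:8080/ambit2/dataset/272?max=50")
# prints http://apps.ideaconsult.net:8080/ambit2/dataset/272/compounds?max=50
```

Joining "compounds" onto the whole URI string instead would place the segment after ?max=50 and produce an invalid request.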

#load_csv(csv, subjectid = nil) ⇒ OpenTox::Dataset

Load CSV string (format specification: toxcreate.org/help)

  • loads data_entries, compounds, features

  • sets metadata (warnings) for parser errors

  • you will have to set remaining metadata manually

Parameters:

  • csv (String)

    CSV representation of the dataset

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 128

def load_csv(csv, subjectid=nil) 
  save(subjectid) unless @uri # get a uri for creating features
  parser = Parser::Spreadsheets.new
  parser.dataset = self
  parser.load_csv(csv)
end

#load_features(subjectid = nil) ⇒ Hash

Load and return only features from the dataset service

Returns:

  • (Hash)

    Features of the dataset



# File 'lib/dataset.rb', line 181

def load_features(subjectid=nil)
  if (CONFIG[:json_hosts].include?(URI.parse(@uri).host))
    @features = Yajl::Parser.parse(RestClientWrapper.get(File.join(@uri,"features"), {:accept => "application/json", :subjectid => subjectid}))
  else
    parser = Parser::Owl::Dataset.new(@uri, subjectid)
    @features = parser.load_features(subjectid)
  end
  @features
end

#load_json(json) ⇒ Object



# File 'lib/dataset.rb', line 93

def load_json(json)
  copy_hash Yajl::Parser.parse(json)
end

#load_metadata(subjectid = nil) ⇒ Hash

Load and return only metadata of a Dataset object

Returns:

  • (Hash)

    Metadata of the dataset



# File 'lib/dataset.rb', line 150

def load_metadata(subjectid=nil)
  add_metadata Parser::Owl::Dataset.new(@uri, subjectid).load_metadata(subjectid)
  self.uri = @uri if @uri # keep uri
  @metadata
end

#load_rdfxml(rdfxml, subjectid = nil) ⇒ Object



# File 'lib/dataset.rb', line 97

def load_rdfxml(rdfxml, subjectid=nil)
  raise "rdfxml data is empty" if rdfxml.to_s.size==0
  file = Tempfile.new("ot-rdfxml")
  file.puts rdfxml
  file.close
  load_rdfxml_file file, subjectid
  file.delete
end

#load_rdfxml_file(file, subjectid = nil) ⇒ OpenTox::Dataset

Load RDF/XML representation from a file

Parameters:

  • file (String)

    File with RDF/XML representation of the dataset

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 109

def load_rdfxml_file(file, subjectid=nil)
  parser = Parser::Owl::Dataset.new @uri, subjectid
  parser.uri = file.path
  copy parser.load_uri(subjectid)
end

#load_sdf(sdf, subjectid = nil) ⇒ Object



# File 'lib/dataset.rb', line 115

def load_sdf(sdf,subjectid=nil)
  save(subjectid) unless @uri # get a uri for creating features
  parser = Parser::Sdf.new
  parser.dataset = self
  parser.load_sdf(sdf)
end

#load_spreadsheet(book, subjectid = nil) ⇒ OpenTox::Dataset

Load a spreadsheet book (created with the roo gem, roo.rubyforge.org/; Excel format specification: toxcreate.org/help)

  • loads data_entries, compounds, features

  • sets metadata (warnings) for parser errors

  • you will have to set remaining metadata manually

Parameters:

  • book (Excel)

    Excel workbook object (created with roo gem)

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 141

def load_spreadsheet(book, subjectid=nil)
  save(subjectid) unless @uri # get a uri for creating features
  parser = Parser::Spreadsheets.new
  parser.dataset = self
  parser.load_spreadsheet(book)
end

#load_yaml(yaml) ⇒ OpenTox::Dataset

Load YAML representation into the dataset

Parameters:

  • yaml (String)

    YAML representation of the dataset

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 89

def load_yaml(yaml)
  copy YAML.load(yaml)
end

#save(subjectid = nil) ⇒ String

Save dataset at the dataset service

  • creates a new dataset if uri is not set

  • overwrites dataset if uri exists

Returns:

  • (String)

# File 'lib/dataset.rb', line 419

def save(subjectid=nil)
  # TODO: rewrite feature URI's ??
  @compounds.uniq!
  if @uri
    if (CONFIG[:json_hosts].include?(URI.parse(@uri).host))
      #LOGGER.debug self.to_json
      RestClientWrapper.post(@uri,self.to_json,{:content_type =>  "application/json", :subjectid => subjectid})
    else
      File.open("ot-post-file.rdf","w+") { |f| f.write(self.to_rdfxml); @path = f.path }
      task_uri = RestClient.post(@uri, {:file => File.new(@path)},{:accept => "text/uri-list" , :subjectid => subjectid}).to_s.chomp
      Task.find(task_uri).wait_for_completion
      self.uri = RestClientWrapper.get(task_uri,{:accept => 'text/uri-list', :subjectid => subjectid})
    end
  else
    # create dataset if uri is empty
    self.uri = RestClientWrapper.post(CONFIG[:services]["opentox-dataset"],{:subjectid => subjectid}).to_s.chomp
  end
  @uri
end

#split(compounds, features, metadata, subjectid = nil) ⇒ OpenTox::Dataset

Creates a new dataset by splitting the current dataset, i.e. using only a subset of its compounds and features

Parameters:

  • compounds (Array)

    List of compound URIs

  • features (Array)

    List of feature URIs

  • metadata (Hash)

    Hash containing the metadata for the new dataset

  • subjectid (String) (defaults to: nil)

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 347

def split( compounds, features, metadata, subjectid=nil)
  LOGGER.debug "split dataset using "+compounds.size.to_s+"/"+@compounds.size.to_s+" compounds"
  raise "no new compounds selected" unless compounds and compounds.size>0
  dataset = OpenTox::Dataset.create(CONFIG[:services]["opentox-dataset"],subjectid)
  if features.size==0
    compounds.each{ |c| dataset.add_compound(c) }
  else
    compounds.each do |c|
      features.each do |f|
        if @data_entries[c]==nil or @data_entries[c][f]==nil
          dataset.add(c,f,nil)
        else
          @data_entries[c][f].each do |v|
            dataset.add(c,f,v)
          end
        end
      end
    end
  end
  # set feature metadata in new dataset accordingly (including accept values)      
  features.each do |f|
    self.features[f].each do |k,v|
      dataset.features[f][k] = v
    end
  end
  dataset.add_metadata(metadata)
  dataset.save(subjectid)
  dataset
end
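
The subset selection can be sketched on a plain nested hash. This is a simplified illustration, not the library method: split_entries is a hypothetical helper, it covers data entries only, and unlike the listing it also creates an (empty) entry for each compound when no features are selected.

```ruby
# Returns the data entries restricted to the given compounds and features.
# Missing compound/feature pairs become empty value lists.
def split_entries(data_entries, compounds, features)
  if compounds.nil? || compounds.empty?
    raise ArgumentError, "no new compounds selected"
  end
  compounds.each_with_object({}) do |c, subset|
    subset[c] = {}
    features.each do |f|
      subset[c][f] = Array(data_entries.dig(c, f)).dup
    end
  end
end

entries = { "c1" => { "f1" => [1], "f2" => [2] }, "c2" => { "f1" => [0] } }
p split_entries(entries, ["c1", "c3"], ["f1"])
```

The values are copied, so later changes to the subset do not leak back into the original data entries.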

#title ⇒ Object

# File 'lib/dataset.rb', line 285

def title
  @metadata[DC.title]
end

#to_csv ⇒ String

Get CSV string representation (data_entries only, metadata will be discarded)

Returns:

  • (String)

    CSV representation

# File 'lib/dataset.rb', line 234

def to_csv
  Serializer::Spreadsheets.new(self).to_csv
end

#to_json ⇒ Object

# File 'lib/dataset.rb', line 216

def to_json
  Yajl::Encoder.encode({:uri => @uri, :metadata => @metadata, :data_entries => @data_entries, :compounds => @compounds, :features => @features})
end

#to_ntriples ⇒ String

Get OWL-DL in N-Triples format

Returns:

  • (String)

    N-Triples representation

# File 'lib/dataset.rb', line 240

def to_ntriples
  s = Serializer::Owl.new
  s.add_dataset(self)
  s.to_ntriples
end

#to_rdfxml ⇒ String

Get OWL-DL in RDF/XML format

Returns:

  • (String)

    RDF/XML representation

# File 'lib/dataset.rb', line 248

def to_rdfxml
  s = Serializer::Owl.new
  s.add_dataset(self)
  s.to_rdfxml
end

#to_sdf ⇒ String

Get SDF representation of compounds

Returns:

  • (String)

    SDF representation

# File 'lib/dataset.rb', line 256

def to_sdf
  sum=""
  @compounds.each{ |c|
    sum << OpenTox::Compound.new(c).to_inchi
    sum << OpenTox::Compound.new(c).to_sdf.sub(/\n\$\$\$\$/,'')
    @data_entries[c].each{ |f,v|
      sum << ">  <\"#{f}\">\n"
      sum << v.join(", ")
      sum << "\n\n"
    }
    sum << "$$$$\n"
  }
  sum
end

#to_spreadsheet ⇒ Spreadsheet::Workbook

Get Spreadsheet representation

Returns:

  • (Spreadsheet::Workbook)

    Workbook which can be written with the spreadsheet gem (data_entries only, metadata will be discarded)

# File 'lib/dataset.rb', line 222

def to_spreadsheet
  Serializer::Spreadsheets.new(self).to_spreadsheet
end

#to_urilist ⇒ Object

# File 'lib/dataset.rb', line 271

def to_urilist
  @compounds.inject { |sum, c|
    sum << OpenTox::Compound.new(c).uri
    sum + "\n"
  }
end

#to_xls ⇒ Spreadsheet::Workbook

Get Excel representation (alias for to_spreadsheet)

Returns:

  • (Spreadsheet::Workbook)

    Workbook which can be written with the spreadsheet gem (data_entries only, metadata will be discarded)

# File 'lib/dataset.rb', line 228

def to_xls
  to_spreadsheet
end