Class: OpenTox::Dataset

Inherits:
Object
Includes:
OpenTox
Defined in:
lib/dataset.rb

Overview

Ruby wrapper for OpenTox Dataset Webservices (opentox.org/dev/apis/api-1.2/dataset).

Direct Known Subclasses

LazarPrediction

Instance Attribute Summary

Attributes included from OpenTox

#uri

Class Method Summary

Instance Method Summary

Methods included from OpenTox

login, text_to_html

Constructor Details

#initialize(uri = nil, subjectid = nil) ⇒ OpenTox::Dataset

Create a dataset with an optional URI. This does not load data into the dataset; call one of the load_* methods to pull data from a service or to insert it from another representation.

Examples:

Create an empty dataset

dataset = OpenTox::Dataset.new

Create an empty dataset with URI

dataset = OpenTox::Dataset.new("http://webservices.in-silico.ch/dataset/1")

Parameters:

  • uri (optional, String) (defaults to: nil)

    Dataset URI



# File 'lib/dataset.rb', line 17

def initialize(uri=nil,subjectid=nil)
  super uri
  @features = {}
  @compounds = []
  @data_entries = {}
end

Instance Attribute Details

#compounds ⇒ Object (readonly)

Returns the value of attribute compounds.



# File 'lib/dataset.rb', line 8

def compounds
  @compounds
end

#data_entries ⇒ Object (readonly)

Returns the value of attribute data_entries.



# File 'lib/dataset.rb', line 8

def data_entries
  @data_entries
end

#features ⇒ Object (readonly)

Returns the value of attribute features.



# File 'lib/dataset.rb', line 8

def features
  @features
end

#metadata ⇒ Object (readonly)

Returns the value of attribute metadata.



# File 'lib/dataset.rb', line 8

def metadata
  @metadata
end

Class Method Details

.all(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ Array

Get all datasets from a service

Parameters:

  • uri (optional, String) (defaults to: CONFIG[:services]["opentox-dataset"])

    URI of the dataset service, defaults to service specified in configuration

Returns:

  • (Array)

    Array of dataset objects without data (use one of the load_* methods to pull data from the server)

# File 'lib/dataset.rb', line 76

def self.all(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
  RestClientWrapper.get(uri,{:accept => "text/uri-list",:subjectid => subjectid}).to_s.each_line.collect{|u| Dataset.new(u, subjectid)}
end

.create(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ OpenTox::Dataset

Create an empty dataset and save it at the dataset service (assigns URI to dataset)

Examples:

Create new dataset and save it to obtain a URI

dataset = OpenTox::Dataset.create

Parameters:

  • uri (optional, String) (defaults to: CONFIG[:services]["opentox-dataset"])

    Dataset service URI

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 29

def self.create(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
  dataset = Dataset.new(nil,subjectid)
  dataset.save(subjectid)
  dataset
end

.create_from_csv_file(file, subjectid = nil) ⇒ OpenTox::Dataset

Create dataset from CSV file (format specification: toxcreate.org/help)

  • loads data_entries, compounds, features

  • sets metadata (warnings) for parser errors

  • you will have to set remaining metadata manually

Parameters:

  • file (String)

    CSV file path

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 41

def self.create_from_csv_file(file, subjectid=nil) 
  dataset = Dataset.create(CONFIG[:services]["opentox-dataset"], subjectid)
  parser = Parser::Spreadsheets.new
  parser.dataset = dataset
  parser.load_csv(File.open(file).read)
  dataset.save(subjectid)
  dataset
end

.exist?(uri, subjectid = nil) ⇒ Boolean

Check whether a dataset exists. Replaces find as an existence check: it is faster and does NOT raise an unauthorized exception.

Parameters:

  • uri (String)

    Dataset URI

Returns:

  • (Boolean)

    true if the dataset exists and the user has GET rights, false otherwise

# File 'lib/dataset.rb', line 63

def self.exist?(uri, subjectid=nil)
  return false unless uri
  dataset = Dataset.new(uri, subjectid)
  begin
    dataset.load_metadata( subjectid ).size > 0
  rescue
    false
  end
end
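
The "return false instead of raising" behaviour described above can be sketched in isolation. The helper name `non_empty?` is hypothetical and not part of the library; the block stands in for the metadata request.

```ruby
# Hypothetical helper (not part of OpenTox): true when the block yields a
# non-empty result, false when the result is empty or a StandardError is raised.
def non_empty?
  yield.size > 0
rescue StandardError
  false
end

found   = non_empty? { { "title" => "example" } }  # metadata came back
missing = non_empty? { {} }                        # nothing came back
denied  = non_empty? { raise "401 Unauthorized" }  # request failed
```

The trade-off is that a caller cannot tell a missing dataset from a failed or unauthorized request, which matches the documented "does NOT raise" contract.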

.find(uri, subjectid = nil) ⇒ OpenTox::Dataset

Find a dataset and load all data. This can be time-consuming; use Dataset.new together with one of the load_* methods for fine-grained control over data loading.

Parameters:

  • uri (String)

    Dataset URI

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 53

def self.find(uri, subjectid=nil)
  return nil unless uri
  dataset = Dataset.new(uri, subjectid)
  dataset.load_all(subjectid)
  dataset
end

Instance Method Details

#add(compound, feature, value) ⇒ Object

Insert a statement (compound_uri,feature_uri,value)

Examples:

Insert a statement (compound_uri,feature_uri,value)

dataset.add "http://webservices.in-silico.ch/compound/InChI=1S/C6Cl6/c7-1-2(8)4(10)6(12)5(11)3(1)9", "http://webservices.in-silico.ch/dataset/1/feature/hamster_carcinogenicity", true

Parameters:

  • compound (String)

    Compound URI

  • feature (String)

    Feature URI

  • value (Boolean, Float)

    Feature value



# File 'lib/dataset.rb', line 235

def add (compound,feature,value)
  @compounds << compound unless @compounds.include? compound
  @features[feature] = {}  unless @features[feature]
  @data_entries[compound] = {} unless @data_entries[compound]
  @data_entries[compound][feature] = [] unless @data_entries[compound][feature]
  @data_entries[compound][feature] << value if value!=nil
end
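
The nested structure that #add builds can be illustrated with a stand-alone sketch. `MiniDataset` is a hypothetical stand-in that copies only the bookkeeping from the listing above; it is not the OpenTox class, and the example.org URIs are placeholders.

```ruby
# Hypothetical stand-in mirroring the bookkeeping of Dataset#add (not the real class).
class MiniDataset
  attr_reader :compounds, :features, :data_entries

  def initialize
    @compounds = []
    @features = {}
    @data_entries = {}
  end

  # Record one (compound, feature, value) statement. A nil value registers
  # the compound and the feature but stores no value.
  def add(compound, feature, value)
    @compounds << compound unless @compounds.include?(compound)
    @features[feature] ||= {}
    @data_entries[compound] ||= {}
    @data_entries[compound][feature] ||= []
    @data_entries[compound][feature] << value unless value.nil?
  end
end

ds = MiniDataset.new
ds.add("http://example.org/compound/1", "http://example.org/feature/a", true)
ds.add("http://example.org/compound/1", "http://example.org/feature/a", false)
ds.add("http://example.org/compound/2", "http://example.org/feature/a", nil)
```

Each compound maps to a hash of features, and each feature maps to an array, so repeated measurements for the same compound and feature accumulate rather than overwrite each other.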

#add_compound(compound) ⇒ Object

Add a new compound

Parameters:

  • compound (String)

    Compound URI



# File 'lib/dataset.rb', line 267

def add_compound (compound)
  @compounds << compound unless @compounds.include? compound
end

#add_feature(feature, metadata = {}) ⇒ Object

Add a feature

Parameters:

  • feature (String)

    Feature URI

  • metadata (Hash) (defaults to: {})

    Hash with feature metadata



# File 'lib/dataset.rb', line 254

def add_feature(feature,metadata={})
  @features[feature] = metadata
end

#add_feature_metadata(feature, metadata) ⇒ Object

Add/modify metadata for a feature

Parameters:

  • feature (String)

    Feature URI

  • metadata (Hash)

    Hash with feature metadata



# File 'lib/dataset.rb', line 261

def add_feature_metadata(feature,metadata)
  metadata.each { |k,v| @features[feature][k] = v }
end

#add_metadata(metadata) ⇒ Object

Add or modify metadata; existing entries are overwritten

Examples:

dataset.add_metadata({DC.title => "any_title", DC.creator => "my_email"})

Parameters:

  • metadata (Hash)

    Hash mapping predicate_uris to values



# File 'lib/dataset.rb', line 247

def add_metadata(metadata)
  metadata.each { |k,v| @metadata[k] = v }
end

#delete(subjectid = nil) ⇒ Object

Delete dataset at the dataset service



# File 'lib/dataset.rb', line 326

def delete(subjectid=nil)
  RestClientWrapper.delete(@uri, :subjectid => subjectid)
end

#feature_name(feature) ⇒ String

Get name (DC.title) of a feature

Parameters:

  • feature (String)

    Feature URI

Returns:

  • (String)

# File 'lib/dataset.rb', line 221

def feature_name(feature)
  @features[feature][DC.title]
end

#feature_type ⇒ String

Detect feature type(s) in the dataset

Returns:

  • (String)

    "classification", "regression", "mixed" or "unknown"

# File 'lib/dataset.rb', line 168

def feature_type
  feature_types = @features.collect{|f,metadata| metadata[OT.isA]}.uniq
  if feature_types.size > 1
    "mixed"
  else
    case feature_types.first
    when /NominalFeature/
      "classification"
    when /NumericFeature/
      "regression"
    else
      "unknown"
    end
  end
end
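
The classification rule can be tried on a plain hash. This is a minimal sketch: `IS_A` is a hypothetical stand-in for the `OT.isA` key, and the type strings are illustrative values chosen only to match the regular expressions in the listing.

```ruby
# Hypothetical stand-in for the OT.isA predicate used as the metadata key.
IS_A = "rdf:type"

# Classify a features hash (feature URI => metadata hash) with the same
# rules as the feature_type listing.
def feature_type(features)
  types = features.collect { |_uri, metadata| metadata[IS_A] }.uniq
  return "mixed" if types.size > 1

  case types.first
  when /NominalFeature/ then "classification"
  when /NumericFeature/ then "regression"
  else "unknown"
  end
end

nominal = { "f1" => { IS_A => "ot:NominalFeature" } }
numeric = { "f2" => { IS_A => "ot:NumericFeature" } }
```

Under these rules a feature with no type entry counts as its own distinct type, so combining it with a typed feature yields "mixed", while a dataset with no features at all yields "unknown".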

#load_all(subjectid = nil) ⇒ Object

Load all data (metadata, data_entries, compounds and features) from URI



# File 'lib/dataset.rb', line 140

def load_all(subjectid=nil)
  if (CONFIG[:yaml_hosts].include?(URI.parse(@uri).host))
    copy YAML.load(RestClientWrapper.get(@uri, {:accept => "application/x-yaml", :subjectid => subjectid}))
  else
    parser = Parser::Owl::Dataset.new(@uri, subjectid)
    copy parser.load_uri(subjectid)
  end
end
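
The host check that selects the representation can be sketched separately. `YAML_HOSTS` is a hypothetical stand-in for `CONFIG[:yaml_hosts]`, and the `application/rdf+xml` media type for the non-YAML branch is an assumption, since the listing delegates that request to the OWL parser.

```ruby
require "uri"

# Hypothetical stand-in for CONFIG[:yaml_hosts].
YAML_HOSTS = ["localhost"].freeze

# Pick the representation to request, based on the host of the dataset URI.
# The RDF/XML media type is an assumption for the non-YAML branch.
def preferred_format(dataset_uri, yaml_hosts = YAML_HOSTS)
  if yaml_hosts.include?(URI.parse(dataset_uri).host)
    "application/x-yaml"
  else
    "application/rdf+xml"
  end
end

local  = preferred_format("http://localhost:8080/dataset/1")
remote = preferred_format("http://example.org/dataset/1")
```

Note that only the host is compared, so the port and path of the dataset URI do not influence the choice.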

#load_compounds(subjectid = nil) ⇒ Array

Load and return only compound URIs from the dataset service

Returns:

  • (Array)

    Compound URIs in the dataset



# File 'lib/dataset.rb', line 151

def load_compounds(subjectid=nil)
  RestClientWrapper.get(File.join(uri,"compounds"),{:accept=> "text/uri-list", :subjectid => subjectid}).to_s.each_line do |compound_uri|
    @compounds << compound_uri.chomp
  end
  @compounds.uniq!
end
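
Parsing a text/uri-list response body into unique URIs can be sketched without a running service. The helper name `parse_uri_list` and the example.org URIs are hypothetical.

```ruby
# Hypothetical helper: turn a text/uri-list response body into an array
# of unique URIs, dropping line endings and blank lines.
def parse_uri_list(body)
  body.to_s.each_line.map(&:chomp).reject(&:empty?).uniq
end

body = "http://example.org/compound/1\n" \
       "http://example.org/compound/2\n" \
       "http://example.org/compound/1\n"
uris = parse_uri_list(body)
```

The sketch uses the non-destructive `uniq` on purpose: Ruby's `Array#uniq!` returns nil when no duplicates were removed, so a method that ends with `uniq!` does not always return the array.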

#load_csv(csv, subjectid = nil) ⇒ OpenTox::Dataset

Load CSV string (format specification: toxcreate.org/help)

  • loads data_entries, compounds, features

  • sets metadata (warnings) for parser errors

  • you will have to set remaining metadata manually

Parameters:

  • csv (String)

    CSV representation of the dataset

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 111

def load_csv(csv, subjectid=nil) 
  save(subjectid) unless @uri # get a uri for creating features
  parser = Parser::Spreadsheets.new
  parser.dataset = self
  parser.load_csv(csv)
end

#load_features(subjectid = nil) ⇒ Hash

Load and return only features from the dataset service

Returns:

  • (Hash)

    Features of the dataset



# File 'lib/dataset.rb', line 160

def load_features(subjectid=nil)
  parser = Parser::Owl::Dataset.new(@uri, subjectid)
  @features = parser.load_features(subjectid)
  @features
end

#load_metadata(subjectid = nil) ⇒ Hash

Load and return only metadata of a Dataset object

Returns:

  • (Hash)

    Metadata of the dataset



# File 'lib/dataset.rb', line 133

def load_metadata(subjectid=nil)
  add_metadata Parser::Owl::Dataset.new(@uri, subjectid).load_metadata(subjectid)
  self.uri = @uri if @uri # keep uri
  @metadata
end

#load_rdfxml(rdfxml) ⇒ Object



# File 'lib/dataset.rb', line 87

def load_rdfxml(rdfxml)
  raise "rdfxml data is empty" if rdfxml.to_s.size==0
  file = Tempfile.new("ot-rdfxml")
  file.puts rdfxml
  file.close
  load_rdfxml_file file
  file.delete
end
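
The write-to-tempfile step can be sketched on its own. `with_tempfile` is a hypothetical helper, not part of the library; unlike the listing above, it uses `ensure` so the temporary file is removed even when the block raises.

```ruby
require "tempfile"

# Hypothetical helper: write a string to a temporary file, yield the closed
# file to the block, and always remove the file afterwards.
def with_tempfile(content, prefix = "ot-rdfxml")
  raise ArgumentError, "content is empty" if content.to_s.empty?

  file = Tempfile.new(prefix)
  begin
    file.puts content
    file.close
    yield file
  ensure
    file.close unless file.closed?
    file.unlink
  end
end

saved_path = nil
text = with_tempfile("<rdf:RDF/>") do |f|
  saved_path = f.path     # remember the path to check the cleanup later
  File.read(f.path)       # the block's value becomes the return value
end
```

A parser that only needs a path can receive `f.path` inside the block, which corresponds to how the listing hands the file to load_rdfxml_file.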

#load_rdfxml_file(file, subjectid = nil) ⇒ OpenTox::Dataset

Load RDF/XML representation from a file

Parameters:

  • file (String)

    File with RDF/XML representation of the dataset

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 99

def load_rdfxml_file(file, subjectid=nil)
  parser = Parser::Owl::Dataset.new @uri, subjectid
  parser.uri = file.path
  copy parser.load_uri(subjectid)
end

#load_spreadsheet(book, subjectid = nil) ⇒ OpenTox::Dataset

Load a spreadsheet book (created with the roo gem, roo.rubyforge.org/; Excel format specification: toxcreate.org/help)

  • loads data_entries, compounds, features

  • sets metadata (warnings) for parser errors

  • you will have to set remaining metadata manually

Parameters:

  • book (Excel)

    Excel workbook object (created with roo gem)

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 124

def load_spreadsheet(book, subjectid=nil)
  save(subjectid) unless @uri # get a uri for creating features
  parser = Parser::Spreadsheets.new
  parser.dataset = self
  parser.load_spreadsheet(book)
end

#load_yaml(yaml) ⇒ OpenTox::Dataset

Load YAML representation into the dataset

Parameters:

  • yaml (String)

    YAML representation of the dataset

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 83

def load_yaml(yaml)
  copy YAML.load(yaml)
end
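
A YAML round trip can be illustrated with plain data. This is only a sketch: the hash layout below is an assumed stand-in for the compounds, features and data_entries structures, not the exact YAML that the dataset service emits.

```ruby
require "yaml"

# Plain-data stand-in for a dataset: only strings, arrays, hashes, floats
# and booleans, so the default loader accepts it.
data = {
  "compounds" => ["http://example.org/compound/1"],
  "features" => { "http://example.org/feature/a" => {} },
  "data_entries" => {
    "http://example.org/compound/1" => {
      "http://example.org/feature/a" => [true, 0.5]
    }
  }
}

yaml = data.to_yaml          # serialize to a YAML string
restored = YAML.load(yaml)   # parse back into plain Ruby objects
```

In recent Ruby versions (Psych 4 and later) `YAML.load` accepts only basic types by default, so a document that contains serialized Ruby objects would need `YAML.unsafe_load` or an explicit list of permitted classes.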

#save(subjectid = nil) ⇒ String

Save dataset at the dataset service

  • creates a new dataset if uri is not set

  • overwrites dataset if uri exists

Returns:

  • (String)

    Dataset URI

# File 'lib/dataset.rb', line 305

def save(subjectid=nil)
  # TODO: rewrite feature URI's ??
  @compounds.uniq!
  if @uri
    if (CONFIG[:yaml_hosts].include?(URI.parse(@uri).host))
      RestClientWrapper.post(@uri,self.to_yaml,{:content_type =>  "application/x-yaml", :subjectid => subjectid})
    else
      File.open("ot-post-file.rdf","w+") { |f| f.write(self.to_rdfxml); @path = f.path }
      task_uri = RestClient.post(@uri, {:file => File.new(@path)},{:accept => "text/uri-list" , :subjectid => subjectid}).to_s.chomp
      #task_uri = `curl -X POST -H "Accept:text/uri-list" -F "file=@#{@path};type=application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset`
      Task.find(task_uri).wait_for_completion
      self.uri = RestClientWrapper.get(task_uri,{:accept => 'text/uri-list', :subjectid => subjectid})
    end
  else
    # create dataset if uri is empty
    self.uri = RestClientWrapper.post(CONFIG[:services]["opentox-dataset"],{:subjectid => subjectid}).to_s.chomp
  end
  @uri
end

#split(compounds, features, metadata, subjectid = nil) ⇒ OpenTox::Dataset

Create a new dataset by splitting the current dataset, i.e. using only a subset of its compounds and features

Parameters:

  • compounds (Array)

    List of compound URIs

  • features (Array)

    List of feature URIs

  • metadata (Hash)

    Hash containing the metadata for the new dataset

  • subjectid (String) (defaults to: nil)

Returns:

  • (OpenTox::Dataset)

# File 'lib/dataset.rb', line 277

def split( compounds, features, metadata, subjectid=nil)
  LOGGER.debug "split dataset using "+compounds.size.to_s+"/"+@compounds.size.to_s+" compounds"
  raise "no new compounds selected" unless compounds and compounds.size>0
  dataset = OpenTox::Dataset.create(CONFIG[:services]["opentox-dataset"],subjectid)
  if features.size==0
    compounds.each{ |c| dataset.add_compound(c) }
  else
    compounds.each do |c|
      features.each do |f|
        unless @data_entries[c][f]
          dataset.add(c,f,nil)
        else
          @data_entries[c][f].each do |v|
            dataset.add(c,f,v)
          end
        end
      end
    end
  end
  dataset.add_metadata(metadata)
  dataset.save(subjectid)
  dataset
end
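
The subsetting step can be sketched on a plain nested hash. `subset_entries` is a hypothetical helper that works on a compound => feature => values hash and does not talk to a dataset service.

```ruby
# Hypothetical helper: copy only the selected compounds and features from a
# nested data_entries hash (compound => feature => [values]).
def subset_entries(data_entries, compounds, features)
  raise ArgumentError, "no new compounds selected" if compounds.nil? || compounds.empty?

  compounds.each_with_object({}) do |c, result|
    result[c] = {}
    features.each do |f|
      # Compounds or features without entries get an empty value list.
      values = data_entries.fetch(c, {})[f]
      result[c][f] = values ? values.dup : []
    end
  end
end

entries = {
  "c1" => { "f1" => [1.0], "f2" => [true] },
  "c2" => { "f1" => [2.0] }
}
subset = subset_entries(entries, ["c1", "c3"], ["f1"])
```

One deliberate difference from the listing: the sketch tolerates a compound that has no entries at all (here "c3"), whereas indexing `@data_entries[c][f]` for such a compound would fail on nil.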

#title ⇒ Object



# File 'lib/dataset.rb', line 225

def title
  @metadata[DC.title]
end

#to_csv ⇒ String

Get CSV string representation (data_entries only, metadata will be discarded)

Returns:

  • (String)

    CSV representation



# File 'lib/dataset.rb', line 198

def to_csv
  Serializer::Spreadsheets.new(self).to_csv
end

#to_ntriples ⇒ String

Get OWL-DL in N-Triples format

Returns:

  • (String)

    N-Triples representation



# File 'lib/dataset.rb', line 204

def to_ntriples
  s = Serializer::Owl.new
  s.add_dataset(self)
  s.to_ntriples
end

#to_rdfxml ⇒ String

Get OWL-DL in RDF/XML format

Returns:

  • (String)

    RDF/XML representation



# File 'lib/dataset.rb', line 212

def to_rdfxml
  s = Serializer::Owl.new
  s.add_dataset(self)
  s.to_rdfxml
end

#to_spreadsheet ⇒ Spreadsheet::Workbook

Get Spreadsheet representation

Returns:

  • (Spreadsheet::Workbook)

    Workbook which can be written with the spreadsheet gem (data_entries only, metadata will be discarded)

# File 'lib/dataset.rb', line 186

def to_spreadsheet
  Serializer::Spreadsheets.new(self).to_spreadsheet
end

#to_xls ⇒ Spreadsheet::Workbook

Get Excel representation (alias for to_spreadsheet)

Returns:

  • (Spreadsheet::Workbook)

    Workbook which can be written with the spreadsheet gem (data_entries only, metadata will be discarded)

# File 'lib/dataset.rb', line 192

def to_xls
  to_spreadsheet
end