Class: OpenTox::Dataset
Overview
Ruby wrapper for OpenTox Dataset Webservices (opentox.org/dev/apis/api-1.2/dataset).
Direct Known Subclasses
Instance Attribute Summary
-
#compounds ⇒ Object
readonly
Returns the value of attribute compounds.
-
#data_entries ⇒ Object
readonly
Returns the value of attribute data_entries.
-
#features ⇒ Object
readonly
Returns the value of attribute features.
-
#metadata ⇒ Object
readonly
Returns the value of attribute metadata.
Attributes included from OpenTox
Class Method Summary
-
.all(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ Array
Get all datasets from a service.
-
.create(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ OpenTox::Dataset
Create an empty dataset and save it at the dataset service (assigns URI to dataset).
-
.create_from_csv_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Create dataset from CSV file (format specification: toxcreate.org/help): loads data_entries, compounds and features; sets metadata (warnings) for parser errors; remaining metadata must be set manually.
-
.exist?(uri, subjectid = nil) ⇒ Boolean
Check whether a dataset exists. Replaces find as an existence check: it is faster and does NOT raise an unauthorized exception.
-
.find(uri, subjectid = nil) ⇒ OpenTox::Dataset
Find a dataset and load all data.
Instance Method Summary
-
#add(compound, feature, value) ⇒ Object
Insert a statement (compound_uri,feature_uri,value).
-
#add_compound(compound) ⇒ Object
Add a new compound.
-
#add_feature(feature, metadata = {}) ⇒ Object
Add a feature.
-
#add_feature_metadata(feature, metadata) ⇒ Object
Add/modify metadata for a feature.
-
#add_metadata(metadata) ⇒ Object
Add/modify metadata, existing entries will be overwritten.
-
#delete(subjectid = nil) ⇒ Object
Delete dataset at the dataset service.
-
#feature_name(feature) ⇒ String
Get name (DC.title) of a feature.
-
#feature_type ⇒ String
Detect feature type(s) in the dataset.
-
#initialize(uri = nil, subjectid = nil) ⇒ OpenTox::Dataset
constructor
Create dataset with optional URI.
-
#load_all(subjectid = nil) ⇒ Object
Load all data (metadata, data_entries, compounds and features) from URI.
-
#load_compounds(subjectid = nil) ⇒ Array
Load and return only compound URIs from the dataset service.
-
#load_csv(csv, subjectid = nil) ⇒ OpenTox::Dataset
Load CSV string (format specification: toxcreate.org/help): loads data_entries, compounds and features; sets metadata (warnings) for parser errors; remaining metadata must be set manually.
-
#load_features(subjectid = nil) ⇒ Hash
Load and return only features from the dataset service.
-
#load_metadata(subjectid = nil) ⇒ Hash
Load and return only metadata of a Dataset object.
-
#load_rdfxml(rdfxml) ⇒ Object
-
#load_rdfxml_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Load RDF/XML representation from a file.
-
#load_spreadsheet(book, subjectid = nil) ⇒ OpenTox::Dataset
Load Spreadsheet book (created with the roo gem, roo.rubyforge.org/; Excel format specification: toxcreate.org/help): loads data_entries, compounds and features; sets metadata (warnings) for parser errors; remaining metadata must be set manually.
-
#load_yaml(yaml) ⇒ OpenTox::Dataset
Load YAML representation into the dataset.
-
#save(subjectid = nil) ⇒ String
Save dataset at the dataset service: creates a new dataset if uri is not set, overwrites the dataset if uri exists.
-
#split(compounds, features, metadata, subjectid = nil) ⇒ OpenTox::Dataset
Creates a new dataset by splitting the current dataset, i.e. using only a subset of compounds and features.
-
#title ⇒ Object
-
#to_csv ⇒ String
Get CSV string representation (data_entries only, metadata will be discarded).
-
#to_ntriples ⇒ String
Get OWL-DL in ntriples format.
-
#to_rdfxml ⇒ String
Get OWL-DL in RDF/XML format.
-
#to_spreadsheet ⇒ Spreadsheet::Workbook
Get Spreadsheet representation.
-
#to_xls ⇒ Spreadsheet::Workbook
Get Excel representation (alias for to_spreadsheet).
Methods included from OpenTox
Constructor Details
#initialize(uri = nil, subjectid = nil) ⇒ OpenTox::Dataset
Create dataset with optional URI. Does not load data into the dataset - you will need to execute one of the load_* methods to pull data from a service or to insert it from other representations.
    # File 'lib/dataset.rb', line 17
    def initialize(uri=nil,subjectid=nil)
      super uri
      @features = {}
      @compounds = []
      @data_entries = {}
    end
Instance Attribute Details
#compounds ⇒ Object (readonly)
Returns the value of attribute compounds.
    # File 'lib/dataset.rb', line 8
    def compounds
      @compounds
    end
#data_entries ⇒ Object (readonly)
Returns the value of attribute data_entries.
    # File 'lib/dataset.rb', line 8
    def data_entries
      @data_entries
    end
#features ⇒ Object (readonly)
Returns the value of attribute features.
    # File 'lib/dataset.rb', line 8
    def features
      @features
    end
#metadata ⇒ Object (readonly)
Returns the value of attribute metadata.
    # File 'lib/dataset.rb', line 8
    def metadata
      @metadata
    end
Class Method Details
.all(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ Array
Get all datasets from a service
    # File 'lib/dataset.rb', line 76
    def self.all(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
      RestClientWrapper.get(uri,{:accept => "text/uri-list",:subjectid => subjectid}).to_s.each_line.collect{|u| Dataset.new(u, subjectid)}
    end
.create(uri = CONFIG[:services]["opentox-dataset"], subjectid = nil) ⇒ OpenTox::Dataset
Create an empty dataset and save it at the dataset service (assigns URI to dataset)
    # File 'lib/dataset.rb', line 29
    def self.create(uri=CONFIG[:services]["opentox-dataset"], subjectid=nil)
      dataset = Dataset.new(nil,subjectid)
      dataset.save(subjectid)
      dataset
    end
.create_from_csv_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Create dataset from CSV file (format specification: toxcreate.org/help)
- loads data_entries, compounds, features
- sets metadata (warnings) for parser errors
- you will have to set remaining metadata manually
    # File 'lib/dataset.rb', line 41
    def self.create_from_csv_file(file, subjectid=nil)
      dataset = Dataset.create(CONFIG[:services]["opentox-dataset"], subjectid)
      parser = Parser::Spreadsheets.new
      parser.dataset = dataset
      parser.load_csv(File.open(file).read)
      dataset.save(subjectid)
      dataset
    end
.exist?(uri, subjectid = nil) ⇒ Boolean
Check whether a dataset exists. Replaces find as an existence check: it is faster and does NOT raise an unauthorized exception
    # File 'lib/dataset.rb', line 63
    def self.exist?(uri, subjectid=nil)
      return false unless uri
      dataset = Dataset.new(uri, subjectid)
      begin
        dataset.load_metadata( subjectid ).size > 0
      rescue
        false
      end
    end
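The existence check above follows a common pattern: attempt a cheap load and treat any error as "does not exist". A minimal, self-contained sketch of that pattern is shown here. `sketch_exist?` and the `loader` lambdas are hypothetical stand-ins (they are not part of the library); `loader` plays the role of the metadata request made against the dataset service.

```ruby
# Hypothetical helper: returns true only if the loader yields a non-empty result.
# Any StandardError raised by the loader (e.g. not found, not authorized)
# is swallowed and reported as false, mirroring the begin/rescue in the listing.
def sketch_exist?(uri, loader)
  return false unless uri
  begin
    loader.call(uri).size > 0
  rescue
    false
  end
end

# Stand-in loaders (assumptions for illustration only)
found     = lambda { |uri| { "title" => "example dataset" } }
empty     = lambda { |uri| {} }
forbidden = lambda { |uri| raise "not authorized" }

exists_found     = sketch_exist?("http://example.org/dataset/1", found)
exists_empty     = sketch_exist?("http://example.org/dataset/1", empty)
exists_forbidden = sketch_exist?("http://example.org/dataset/1", forbidden)
exists_nil       = sketch_exist?(nil, found)
```

One consequence of this design is that the caller cannot distinguish a missing dataset from a failed or unauthorized request; both yield false.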
.find(uri, subjectid = nil) ⇒ OpenTox::Dataset
Find a dataset and load all data. This can be time-consuming; use Dataset.new together with one of the load_* methods for fine-grained control over data loading.
    # File 'lib/dataset.rb', line 53
    def self.find(uri, subjectid=nil)
      return nil unless uri
      dataset = Dataset.new(uri, subjectid)
      dataset.load_all(subjectid)
      dataset
    end
Instance Method Details
#add(compound, feature, value) ⇒ Object
Insert a statement (compound_uri,feature_uri,value)
    # File 'lib/dataset.rb', line 235
    def add (compound,feature,value)
      @compounds << compound unless @compounds.include? compound
      @features[feature] = {} unless @features[feature]
      @data_entries[compound] = {} unless @data_entries[compound]
      @data_entries[compound][feature] = [] unless @data_entries[compound][feature]
      @data_entries[compound][feature] << value if value!=nil
    end
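To make the resulting data layout concrete, the following sketch reproduces only the in-memory bookkeeping of #add in a stand-alone class. `SketchDataset` and the example.org URIs are hypothetical; the real class also talks to a dataset service, which is omitted here.

```ruby
# Hypothetical stand-in that mirrors only the in-memory bookkeeping of #add.
class SketchDataset
  attr_reader :compounds, :features, :data_entries

  def initialize
    @compounds = []      # list of compound URIs
    @features = {}       # feature URI => feature metadata hash
    @data_entries = {}   # compound URI => { feature URI => [values] }
  end

  # Same steps as the listing: register the compound and the feature,
  # create the nested containers on demand, append the value unless it is nil.
  def add(compound, feature, value)
    @compounds << compound unless @compounds.include?(compound)
    @features[feature] ||= {}
    @data_entries[compound] ||= {}
    @data_entries[compound][feature] ||= []
    @data_entries[compound][feature] << value unless value.nil?
  end
end

ds = SketchDataset.new
ds.add("http://example.org/compound/1", "http://example.org/feature/tox", true)
ds.add("http://example.org/compound/1", "http://example.org/feature/tox", false)
ds.add("http://example.org/compound/2", "http://example.org/feature/tox", nil)
```

Repeated statements for the same compound and feature accumulate in one array, and a nil value still registers the compound and feature but leaves the value list empty.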
#add_compound(compound) ⇒ Object
Add a new compound
    # File 'lib/dataset.rb', line 267
    def add_compound (compound)
      @compounds << compound unless @compounds.include? compound
    end
#add_feature(feature, metadata = {}) ⇒ Object
Add a feature
    # File 'lib/dataset.rb', line 254
    def add_feature(feature,metadata={})
      @features[feature] = metadata
    end
#add_feature_metadata(feature, metadata) ⇒ Object
Add/modify metadata for a feature
    # File 'lib/dataset.rb', line 261
    def add_feature_metadata(feature,metadata)
      metadata.each { |k,v| @features[feature][k] = v }
    end
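The per-key assignment used for feature metadata can be tried in isolation. The sketch below uses a plain Hash in place of the dataset's features attribute; the feature URI and metadata keys are made up for illustration. It also shows that, with this style of update, the feature apparently has to be registered first, since indexing into a missing entry yields nil.

```ruby
features = {}
feature  = "http://example.org/feature/tox"   # hypothetical feature URI

# Registering the feature (what add_feature does): store its metadata hash.
features[feature] = { "title" => "Toxicity" }

# Adding/modifying metadata: existing keys are overwritten, new keys are added.
{ "title" => "Acute toxicity", "units" => "mg/kg" }.each { |k, v| features[feature][k] = v }

# Updating a feature that was never registered fails, because
# features["..."] is nil and nil does not respond to []=.
update_error = nil
begin
  { "title" => "x" }.each { |k, v| features["http://example.org/feature/unregistered"][k] = v }
rescue NoMethodError => e
  update_error = e
end
```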
#add_metadata(metadata) ⇒ Object
Add/modify metadata, existing entries will be overwritten
    # File 'lib/dataset.rb', line 247
    def add_metadata(metadata)
      metadata.each { |k,v| @metadata[k] = v }
    end
#delete(subjectid = nil) ⇒ Object
Delete dataset at the dataset service
    # File 'lib/dataset.rb', line 326
    def delete(subjectid=nil)
      RestClientWrapper.delete(@uri, :subjectid => subjectid)
    end
#feature_name(feature) ⇒ String
Get name (DC.title) of a feature
    # File 'lib/dataset.rb', line 221
    def feature_name(feature)
      @features[feature][DC.title]
    end
#feature_type ⇒ String
Detect feature type(s) in the dataset
    # File 'lib/dataset.rb', line 168
    def feature_type
      feature_types = @features.collect{|f,metadata| metadata[OT.isA]}.uniq
      if feature_types.size > 1
        "mixed"
      else
        case feature_types.first
        when /NominalFeature/
          "classification"
        when /NumericFeature/
          "regression"
        else
          "unknown"
        end
      end
    end
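The type detection can be exercised without a dataset service. In this sketch, `IS_A` is a hypothetical stand-in for the OT.isA key, and the type URIs are illustrative values chosen only because they contain the substrings the case statement matches on.

```ruby
# Stand-in for the OT.isA metadata key (assumption for illustration).
IS_A = "isA"

# Collect the distinct type values of all features, then classify:
# more than one distinct value => "mixed", otherwise match on the type name.
def sketch_feature_type(features)
  types = features.collect { |_uri, metadata| metadata[IS_A] }.uniq
  return "mixed" if types.size > 1
  case types.first
  when /NominalFeature/ then "classification"
  when /NumericFeature/ then "regression"
  else "unknown"
  end
end

nominal = { "f1" => { IS_A => "http://example.org/types#NominalFeature" },
            "f2" => { IS_A => "http://example.org/types#NominalFeature" } }
numeric = { "f1" => { IS_A => "http://example.org/types#NumericFeature" } }
mixed   = nominal.merge("f3" => { IS_A => "http://example.org/types#NumericFeature" })

type_nominal = sketch_feature_type(nominal)
type_numeric = sketch_feature_type(numeric)
type_mixed   = sketch_feature_type(mixed)
type_empty   = sketch_feature_type({})
```

A dataset without features, or with features that carry no type value, falls through to "unknown" because nil matches neither pattern.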
#load_all(subjectid = nil) ⇒ Object
Load all data (metadata, data_entries, compounds and features) from URI
    # File 'lib/dataset.rb', line 140
    def load_all(subjectid=nil)
      if (CONFIG[:yaml_hosts].include?(URI.parse(@uri).host))
        copy YAML.load(RestClientWrapper.get(@uri, {:accept => "application/x-yaml", :subjectid => subjectid}))
      else
        parser = Parser::Owl::Dataset.new(@uri, subjectid)
        copy parser.load_uri(subjectid)
      end
    end
#load_compounds(subjectid = nil) ⇒ Array
Load and return only compound URIs from the dataset service
    # File 'lib/dataset.rb', line 151
    def load_compounds(subjectid=nil)
      RestClientWrapper.get(File.join(uri,"compounds"),{:accept=> "text/uri-list", :subjectid => subjectid}).to_s.each_line do |compound_uri|
        @compounds << compound_uri.chomp
      end
      @compounds.uniq!
    end
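The parsing step (one URI per line of a text/uri-list response, newline stripped, duplicates removed) can be sketched without any network access. `sketch_parse_uri_list` is a hypothetical helper and the response body is made up. Note that in Ruby, Array#uniq! returns nil when it removes nothing, so a method ending in `uniq!` may return nil rather than an Array; the sketch uses the non-destructive `uniq` to always return an Array. Whether this matters for callers of the library method is worth checking before relying on its return value.

```ruby
# Hypothetical helper: split a text/uri-list body into compound URIs.
def sketch_parse_uri_list(body, compounds = [])
  body.to_s.each_line { |line| compounds << line.chomp }
  compounds.uniq   # non-destructive: always returns an Array
end

# Made-up response body with one duplicate entry
body = "http://example.org/compound/1\n" \
       "http://example.org/compound/2\n" \
       "http://example.org/compound/1\n"

parsed = sketch_parse_uri_list(body)

# Array#uniq! returns nil if the array already contained no duplicates.
no_change = ["a", "b"].uniq!
changed   = ["a", "a"].uniq!
```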
#load_csv(csv, subjectid = nil) ⇒ OpenTox::Dataset
Load CSV string (format specification: toxcreate.org/help)
- loads data_entries, compounds, features
- sets metadata (warnings) for parser errors
- you will have to set remaining metadata manually
    # File 'lib/dataset.rb', line 111
    def load_csv(csv, subjectid=nil)
      save(subjectid) unless @uri # get a uri for creating features
      parser = Parser::Spreadsheets.new
      parser.dataset = self
      parser.load_csv(csv)
    end
#load_features(subjectid = nil) ⇒ Hash
Load and return only features from the dataset service
    # File 'lib/dataset.rb', line 160
    def load_features(subjectid=nil)
      parser = Parser::Owl::Dataset.new(@uri, subjectid)
      @features = parser.load_features(subjectid)
      @features
    end
#load_metadata(subjectid = nil) ⇒ Hash
Load and return only metadata of a Dataset object
    # File 'lib/dataset.rb', line 133
    def load_metadata(subjectid=nil)
      add_metadata Parser::Owl::Dataset.new(@uri, subjectid).load_metadata(subjectid)
      self.uri = @uri if @uri # keep uri
      @metadata
    end
#load_rdfxml(rdfxml) ⇒ Object
    # File 'lib/dataset.rb', line 87
    def load_rdfxml(rdfxml)
      raise "rdfxml data is empty" if rdfxml.to_s.size==0
      file = Tempfile.new("ot-rdfxml")
      file.puts rdfxml
      file.close
      load_rdfxml_file file
      file.delete
    end
#load_rdfxml_file(file, subjectid = nil) ⇒ OpenTox::Dataset
Load RDF/XML representation from a file
    # File 'lib/dataset.rb', line 99
    def load_rdfxml_file(file, subjectid=nil)
      parser = Parser::Owl::Dataset.new @uri, subjectid
      parser.uri = file.path
      copy parser.load_uri(subjectid)
    end
#load_spreadsheet(book, subjectid = nil) ⇒ OpenTox::Dataset
Load Spreadsheet book (created with the roo gem, roo.rubyforge.org/; Excel format specification: toxcreate.org/help)
- loads data_entries, compounds, features
- sets metadata (warnings) for parser errors
- you will have to set remaining metadata manually
    # File 'lib/dataset.rb', line 124
    def load_spreadsheet(book, subjectid=nil)
      save(subjectid) unless @uri # get a uri for creating features
      parser = Parser::Spreadsheets.new
      parser.dataset = self
      parser.load_spreadsheet(book)
    end
#load_yaml(yaml) ⇒ OpenTox::Dataset
Load YAML representation into the dataset
    # File 'lib/dataset.rb', line 83
    def load_yaml(yaml)
      copy YAML.load(yaml)
    end
#save(subjectid = nil) ⇒ String
Save dataset at the dataset service
- creates a new dataset if uri is not set
- overwrites dataset if uri exists
    # File 'lib/dataset.rb', line 305
    def save(subjectid=nil)
      # TODO: rewrite feature URI's ??
      @compounds.uniq!
      if @uri
        if (CONFIG[:yaml_hosts].include?(URI.parse(@uri).host))
          RestClientWrapper.post(@uri,self.to_yaml,{:content_type => "application/x-yaml", :subjectid => subjectid})
        else
          File.open("ot-post-file.rdf","w+") { |f| f.write(self.to_rdfxml); @path = f.path }
          task_uri = RestClient.post(@uri, {:file => File.new(@path)},{:accept => "text/uri-list" , :subjectid => subjectid}).to_s.chomp
          #task_uri = `curl -X POST -H "Accept:text/uri-list" -F "file=@#{@path};type=application/rdf+xml" http://apps.ideaconsult.net:8080/ambit2/dataset`
          Task.find(task_uri).wait_for_completion
          self.uri = RestClientWrapper.get(task_uri,{:accept => 'text/uri-list', :subjectid => subjectid})
        end
      else
        # create dataset if uri is empty
        self.uri = RestClientWrapper.post(CONFIG[:services]["opentox-dataset"],{:subjectid => subjectid}).to_s.chomp
      end
      @uri
    end
#split(compounds, features, metadata, subjectid = nil) ⇒ OpenTox::Dataset
Creates a new dataset by splitting the current dataset, i.e. using only a subset of compounds and features
    # File 'lib/dataset.rb', line 277
    def split( compounds, features, metadata, subjectid=nil)
      LOGGER.debug "split dataset using "+compounds.size.to_s+"/"+@compounds.size.to_s+" compounds"
      raise "no new compounds selected" unless compounds and compounds.size>0
      dataset = OpenTox::Dataset.create(CONFIG[:services]["opentox-dataset"],subjectid)
      if features.size==0
        compounds.each{ |c| dataset.add_compound(c) }
      else
        compounds.each do |c|
          features.each do |f|
            unless @data_entries[c][f]
              dataset.add(c,f,nil)
            else
              @data_entries[c][f].each do |v|
                dataset.add(c,f,v)
              end
            end
          end
        end
      end
      dataset.add_metadata(metadata)
      dataset.save(subjectid)
      dataset
    end
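The subset selection performed by split can be sketched on plain hashes. `sketch_split` is a hypothetical helper that returns an in-memory structure instead of creating and saving a dataset at a service. It deviates from the listing in one place, marked in the comments: a compound without any data entries is tolerated rather than causing an error.

```ruby
# Hypothetical helper: keep only the selected compounds and features.
# data_entries has the layout compound => { feature => [values] }.
def sketch_split(data_entries, compounds, features)
  raise "no new compounds selected" unless compounds && compounds.size > 0
  subset = { :compounds => [], :data_entries => {} }
  compounds.each do |c|
    subset[:compounds] << c unless subset[:compounds].include?(c)
    features.each do |f|
      # Deviation from the listing (assumption): a compound with no entries
      # is treated as having no values instead of raising on nil.
      values = (data_entries[c] || {})[f] || []
      subset[:data_entries][c] ||= {}
      subset[:data_entries][c][f] ||= []
      subset[:data_entries][c][f].concat(values)
    end
  end
  subset
end

source = {
  "c1" => { "f1" => [1.0], "f2" => [2.0] },
  "c2" => { "f1" => [3.0] }
}

only_c1       = sketch_split(source, ["c1"], ["f1"])
c2_both       = sketch_split(source, ["c2"], ["f1", "f2"])
compounds_only = sketch_split(source, ["c1", "c2"], [])
```

As in the listing, a selected feature with no value for a compound is still registered (with an empty value list), and an empty feature selection copies the compounds only.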
#title ⇒ Object
    # File 'lib/dataset.rb', line 225
    def title
      @metadata[DC.title]
    end
#to_csv ⇒ String
Get CSV string representation (data_entries only, metadata will be discarded)
    # File 'lib/dataset.rb', line 198
    def to_csv
      Serializer::Spreadsheets.new(self).to_csv
    end
#to_ntriples ⇒ String
Get OWL-DL in ntriples format
    # File 'lib/dataset.rb', line 204
    def to_ntriples
      s = Serializer::Owl.new
      s.add_dataset(self)
      s.to_ntriples
    end
#to_rdfxml ⇒ String
Get OWL-DL in RDF/XML format
    # File 'lib/dataset.rb', line 212
    def to_rdfxml
      s = Serializer::Owl.new
      s.add_dataset(self)
      s.to_rdfxml
    end
#to_spreadsheet ⇒ Spreadsheet::Workbook
Get Spreadsheet representation
    # File 'lib/dataset.rb', line 186
    def to_spreadsheet
      Serializer::Spreadsheets.new(self).to_spreadsheet
    end
#to_xls ⇒ Spreadsheet::Workbook
Get Excel representation (alias for to_spreadsheet)
    # File 'lib/dataset.rb', line 192
    def to_xls
      to_spreadsheet
    end