Class: OpenTox::Model::Validation

Inherits:

Object


Includes: Mongoid::Document, Mongoid::Timestamps, OpenTox

Defined in: lib/model.rb

Overview

Convenience class for generating and validating lazar models in a single step, and for predicting substances (compounds and nanoparticles), arrays of substances and datasets.

Class Method Summary

.from_csv_file(file) ⇒ OpenTox::Model::Validation

Create and validate a lazar model from a csv file with training data and a json file with metadata.
.from_enanomapper(training_dataset: nil, prediction_feature: nil, algorithms: nil) ⇒ OpenTox::Model::Validation

Create and validate a nano-lazar model, importing data from eNanoMapper if necessary.

Instance Method Summary

#algorithms ⇒ Hash

Get algorithms.
#classification? ⇒ TrueClass, FalseClass

Is it a classification model?
#crossvalidations ⇒ Array<OpenTox::CrossValidation>

Get crossvalidations.
#model ⇒ OpenTox::Model::Lazar

Get lazar model.
#predict(object) ⇒ Hash, ...

Predict a substance (compound or nanoparticle), an array of substances or a dataset.
#prediction_feature ⇒ OpenTox::Feature

Get prediction feature.
#regression? ⇒ TrueClass, FalseClass

Is it a regression model?
#repeated_crossvalidation ⇒ OpenTox::Validation::RepeatedCrossValidation

Get the repeated crossvalidation.
#training_dataset ⇒ OpenTox::Dataset

Get training dataset.

Class Method Details

.from_csv_file(file) ⇒ `OpenTox::Model::Validation`

Create and validate a lazar model from a csv file with training data and a json file with metadata

Parameters:

file (String) —

Path to a CSV file with two or three columns. The first column is optional and may contain an arbitrary substance ID. The next column should contain either SMILES or InChIs of the training compounds, followed by toxic activities (qualitative or quantitative) in the last column. Use -log10-transformed values for regression datasets. The first line should contain “ID” (optional), either “SMILES” or “InChI”, and the endpoint name (last column). Add metadata to a JSON file with the same basename containing the fields “species”, “endpoint”, “source”, “qmrf” (optional) and “unit” (regression only). You can find example training data in the data folder of lazar.

Returns:

(OpenTox::Model::Validation) —

lazar model with five independent 10-fold crossvalidations

Raises:

(ArgumentError) if the JSON metadata file with the same basename does not exist

# File 'lib/model.rb', line 502

def self.from_csv_file file
  metadata_file = file.sub(/csv$/,"json")
  raise ArgumentError, "No metadata file #{metadata_file}" unless File.exist? metadata_file
  model_validation = self.new JSON.parse(File.read(metadata_file))
  training_dataset = Dataset.from_csv_file file
  model = Lazar.create training_dataset: training_dataset
  model_validation[:model_id] = model.id
  model_validation[:repeated_crossvalidation_id] = OpenTox::Validation::RepeatedCrossValidation.create(model).id # full class name required
  model_validation.save
  model_validation
end
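The file layout described above can be checked before calling `from_csv_file`. The following is a minimal plain-Ruby sketch, not part of lazar: the helper `check_training_files`, the file names and the placeholder rows are hypothetical. It derives the metadata path with the same `sub(/csv$/, "json")` substitution that `from_csv_file` uses and checks for the metadata fields listed above.

```ruby
require "json"
require "tmpdir"

# Metadata fields required for every model; "qmrf" is optional and
# "unit" is only needed for regression datasets.
REQUIRED_FIELDS = %w[species endpoint source].freeze

# Hypothetical helper (not part of lazar): locate the JSON metadata file with
# the same substitution as from_csv_file and check the documented fields.
def check_training_files(csv_path, regression: false)
  metadata_path = csv_path.sub(/csv$/, "json")
  raise ArgumentError, "No metadata file #{metadata_path}" unless File.exist?(metadata_path)

  metadata = JSON.parse(File.read(metadata_path))
  required = regression ? REQUIRED_FIELDS + ["unit"] : REQUIRED_FIELDS
  missing = required - metadata.keys
  raise ArgumentError, "Missing metadata fields: #{missing.join(', ')}" unless missing.empty?

  metadata
end

Dir.mktmpdir do |dir|
  csv_path = File.join(dir, "example.csv")
  # Placeholder rows for illustration only, not real measurements.
  File.write(csv_path, "ID,SMILES,Activity\n1,CCO,active\n2,CCN,inactive\n")
  File.write(File.join(dir, "example.json"),
             JSON.generate({ "species" => "example species",
                             "endpoint" => "Activity",
                             "source" => "example source" }))
  metadata = check_training_files(csv_path)
  puts metadata["endpoint"] # prints Activity
end
```

The sketch only validates the metadata; parsing the CSV itself is left to `Dataset.from_csv_file`.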

.from_enanomapper(training_dataset: nil, prediction_feature: nil, algorithms: nil) ⇒ `OpenTox::Model::Validation`

Create and validate a nano-lazar model, importing data from eNanoMapper if necessary

nano-lazar methods are described in detail in https://github.com/enanomapper/nano-lazar-paper/blob/master/nano-lazar.pdf
*eNanoMapper import is currently broken because APIs and data formats are constantly changing and we have no resources to track these changes continuously!*

Parameters:

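training_dataset (OpenTox::Dataset, nil) (defaults to: nil)

Training data. If nil, the “Protein Corona Fingerprinting Predicts the Cellular Interaction of Gold and Silver Nanoparticles” dataset is looked up by name and imported from eNanoMapper if it is not found.

prediction_feature (OpenTox::Feature, nil) (defaults to: nil)

Endpoint to predict. If nil, the “log2(Net cell association)” feature of category “TOX” is used.

algorithms (Hash, nil) (defaults to: nil)

Algorithm settings passed unchanged to LazarRegression.create.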

Returns:

(OpenTox::Model::Validation) —

lazar model with five independent 10-fold crossvalidations

# File 'lib/model.rb', line 521

def self.from_enanomapper training_dataset: nil, prediction_feature: nil, algorithms: nil

  # find/import training_dataset
  training_dataset ||= Dataset.where(:name => "Protein Corona Fingerprinting Predicts the Cellular Interaction of Gold and Silver Nanoparticles").first
  unless training_dataset # try to import 
    Import::Enanomapper.import
    training_dataset = Dataset.where(name: "Protein Corona Fingerprinting Predicts the Cellular Interaction of Gold and Silver Nanoparticles").first
    raise ArgumentError, "Cannot import 'Protein Corona Fingerprinting Predicts the Cellular Interaction of Gold and Silver Nanoparticles' dataset" unless training_dataset
  end
  prediction_feature ||= Feature.where(name: "log2(Net cell association)", category: "TOX").first

  model_validation = self.new(
    :endpoint => prediction_feature.name,
    :source => prediction_feature.source,
    :species => "A549 human lung epithelial carcinoma cells",
    :unit => prediction_feature.unit
  )
  model = LazarRegression.create prediction_feature: prediction_feature, training_dataset: training_dataset, algorithms: algorithms
  model_validation[:model_id] = model.id
  repeated_cv = OpenTox::Validation::RepeatedCrossValidation.create model, 10, 5 # five independent 10-fold crossvalidations
  model_validation[:repeated_crossvalidation_id] = repeated_cv.id
  model_validation.save
  model_validation
end
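The dataset lookup in `from_enanomapper` follows a find-or-import pattern. The sketch below reproduces only that control flow in plain Ruby; `DatasetStore`, `find_by_name`, `import!` and `resolve_training_dataset` are hypothetical stand-ins for `Dataset.where(...).first` and `Import::Enanomapper.import`, not lazar APIs. Given the warning above, the import branch should be expected to fail in practice.

```ruby
DATASET_NAME = "Protein Corona Fingerprinting Predicts the Cellular Interaction of Gold and Silver Nanoparticles"

# Hypothetical in-memory store standing in for the Mongoid query and the importer.
class DatasetStore
  def initialize(importable: true)
    @datasets = {}
    @importable = importable
  end

  # Plays the role of Dataset.where(name: ...).first
  def find_by_name(name)
    @datasets[name]
  end

  # Plays the role of Import::Enanomapper.import; may add nothing if the import fails
  def import!
    @datasets[DATASET_NAME] = { name: DATASET_NAME } if @importable
  end
end

# Same control flow as from_enanomapper: use the given dataset, else look it
# up by name, else import and look it up again, else raise.
def resolve_training_dataset(store, training_dataset = nil)
  training_dataset ||= store.find_by_name(DATASET_NAME)
  unless training_dataset
    store.import!
    training_dataset = store.find_by_name(DATASET_NAME)
    raise ArgumentError, "Cannot import '#{DATASET_NAME}' dataset" unless training_dataset
  end
  training_dataset
end

store = DatasetStore.new
puts resolve_training_dataset(store)[:name] == DATASET_NAME # prints true
```

An explicitly passed dataset short-circuits both the lookup and the import.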

Instance Method Details

#algorithms ⇒ `Hash`

Get the algorithms of the underlying lazar model

Returns:

(Hash)



# File 'lib/model.rb', line 465

def algorithms
  model.algorithms
end

#classification? ⇒ `TrueClass`, `FalseClass`

Is it a classification model?

Returns:

(TrueClass, FalseClass)



# File 'lib/model.rb', line 495

def classification?
  model.is_a? LazarClassification
end

#crossvalidations ⇒ `Array<OpenTox::CrossValidation>`

Get crossvalidations

Returns:

(Array<OpenTox::CrossValidation>)



# File 'lib/model.rb', line 483

def crossvalidations
  repeated_crossvalidation.crossvalidations
end

#model ⇒ `OpenTox::Model::Lazar`

Get lazar model

Returns:

(OpenTox::Model::Lazar)



# File 'lib/model.rb', line 459

def model
  Lazar.find model_id
end

#predict(object) ⇒ `Hash`, ...

Predict a substance (compound or nanoparticle), an array of substances or a dataset

Parameters:

object (OpenTox::Compound, OpenTox::Nanoparticle, Array<OpenTox::Substance>, OpenTox::Dataset)

Returns:

(Hash, Array<Hash>, OpenTox::Dataset)



# File 'lib/model.rb', line 447

def predict object
  model.predict object
end
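`#predict` returns a `Hash` for a single substance, an `Array<Hash>` for an array and an `OpenTox::Dataset` for a dataset. The plain-Ruby sketch below illustrates that type-based dispatch only; `SketchCompound`, `SketchDataset` and `SketchModel` are hypothetical stand-ins, and the placeholder `predict_one` does no actual modelling.

```ruby
# Hypothetical stand-ins for OpenTox substances and datasets.
SketchCompound = Struct.new(:smiles)
SketchDataset = Struct.new(:substances, :predictions)

# Hypothetical model showing only the dispatch on the argument type.
class SketchModel
  def predict(object)
    case object
    when SketchCompound
      predict_one(object) # single substance => Hash
    when Array
      object.map { |substance| predict_one(substance) } # => Array<Hash>
    when SketchDataset
      # => new dataset carrying one prediction per substance
      SketchDataset.new(object.substances, object.substances.map { |s| predict_one(s) })
    else
      raise ArgumentError, "Cannot predict #{object.class}"
    end
  end

  private

  # Placeholder: a real model would derive the prediction from its training data.
  def predict_one(substance)
    { smiles: substance.smiles, value: nil }
  end
end

model = SketchModel.new
compound = SketchCompound.new("CCO")
p model.predict(compound).class            # prints Hash
p model.predict([compound, compound]).size # prints 2
```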

#prediction_feature ⇒ `OpenTox::Feature`

Get prediction feature

Returns:

(OpenTox::Feature)



# File 'lib/model.rb', line 471

def prediction_feature
  model.prediction_feature
end

#regression? ⇒ `TrueClass`, `FalseClass`

Is it a regression model?

Returns:

(TrueClass, FalseClass)



# File 'lib/model.rb', line 489

def regression?
  model.is_a? LazarRegression
end
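`#classification?` and `#regression?` decide the model type with `is_a?` on the stored lazar model. The sketch below mirrors the two predicates against a hypothetical stub hierarchy (`Lazar`, `LazarClassification`, `LazarRegression`, `ValidationSketch` are stand-ins, not the real OpenTox classes).

```ruby
# Hypothetical stub hierarchy standing in for OpenTox::Model::Lazar and its subclasses.
class Lazar; end
class LazarClassification < Lazar; end
class LazarRegression < Lazar; end

# Minimal wrapper with the same two predicates as OpenTox::Model::Validation.
class ValidationSketch
  attr_reader :model

  def initialize(model)
    @model = model
  end

  def classification?
    model.is_a? LazarClassification
  end

  def regression?
    model.is_a? LazarRegression
  end
end

v = ValidationSketch.new(LazarRegression.new)
puts v.regression?     # prints true
puts v.classification? # prints false
```

Because `is_a?` also matches subclasses, a subclass of `LazarRegression` counts as a regression model, while a model that belongs to neither subclass makes both predicates return false.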

#repeated_crossvalidation ⇒ `OpenTox::Validation::RepeatedCrossValidation`

Get the repeated crossvalidation

Returns:

(OpenTox::Validation::RepeatedCrossValidation)



# File 'lib/model.rb', line 477

def repeated_crossvalidation
  OpenTox::Validation::RepeatedCrossValidation.find repeated_crossvalidation_id # full class name required
end
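`#model` and `#repeated_crossvalidation` resolve ids stored on the validation document (`model_id`, `repeated_crossvalidation_id`) with a `find` call on each access. The sketch below mimics this id-reference pattern with a hypothetical in-memory `Registry` in place of Mongoid; `Registry` and `ValidationRecord` are illustrative names only.

```ruby
# Hypothetical in-memory registry standing in for Mongoid's find-by-id.
class Registry
  def initialize
    @records = {}
    @next_id = 0
  end

  # Stores a record and returns its new id, like saving a document.
  def save(record)
    @next_id += 1
    @records[@next_id] = record
    @next_id
  end

  # Raises KeyError for unknown ids.
  def find(id)
    @records.fetch(id)
  end
end

# Stores only ids and resolves them on every access,
# like #model and #repeated_crossvalidation.
class ValidationRecord
  attr_accessor :model_id, :repeated_crossvalidation_id

  def initialize(registry)
    @registry = registry
  end

  def model
    @registry.find(model_id)
  end

  def repeated_crossvalidation
    @registry.find(repeated_crossvalidation_id)
  end
end

registry = Registry.new
record = ValidationRecord.new(registry)
record.model_id = registry.save("lazar model")
record.repeated_crossvalidation_id = registry.save(["cv1", "cv2"])
puts record.model # prints lazar model
```

Because nothing is cached, each call to `#model`, and therefore to `#algorithms`, `#predict`, `#prediction_feature` and `#training_dataset`, performs a fresh lookup; memoizing the result would save queries at the cost of possibly stale data.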

#training_dataset ⇒ `OpenTox::Dataset`

Get training dataset

Returns:

(OpenTox::Dataset)



# File 'lib/model.rb', line 453

def training_dataset
  model.training_dataset
end
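`#algorithms`, `#predict`, `#prediction_feature` and `#training_dataset` are one-line forwards to the lazar model returned by `#model`. As a stylistic alternative, not how lib/model.rb is written, the same forwarding can be declared with Ruby's standard `Forwardable` module; `StubModel` and `DelegatingValidation` below are hypothetical.

```ruby
require "forwardable"

# Hypothetical stub standing in for an OpenTox::Model::Lazar instance.
StubModel = Struct.new(:algorithms, :prediction_feature, :training_dataset) do
  def predict(object)
    { input: object }
  end
end

# The four forwarding methods declared with Forwardable instead of explicit defs.
class DelegatingValidation
  extend Forwardable
  def_delegators :model, :algorithms, :predict, :prediction_feature, :training_dataset

  attr_reader :model

  def initialize(model)
    @model = model
  end
end

v = DelegatingValidation.new(StubModel.new({}, "endpoint", "training data"))
puts v.prediction_feature # prints endpoint
```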
