Class: OpenTox::Model::Validation

Inherits:
Object
Includes:
Mongoid::Document, Mongoid::Timestamps, OpenTox
Defined in:
lib/model.rb

Overview

Convenience class for generating and validating lazar models in a single step and for predicting substances (compounds and nanoparticles), arrays of substances and datasets.


Class Method Details

.from_csv_file(file) ⇒ OpenTox::Model::Validation

Create and validate a lazar model from a CSV file with training data and a JSON file with metadata.

Parameters:

  • file (String)

    Path to a CSV file with two or three columns. The first column is optional and may contain an arbitrary substance ID. The next column must contain either SMILES or InChIs of the training compounds, and the last column the toxic activities (qualitative or quantitative). Use -log10 transformed values for regression datasets. The first line is a header containing “ID” (optional), either “SMILES” or “InChI”, and the endpoint name (last column). Store the metadata in a JSON file with the same basename (e.g. training.json for training.csv) containing the fields “species”, “endpoint”, “source”, “qmrf” (optional) and “unit” (regression only). Example training data can be found in the data folder of lazar.
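A minimal sketch of writing a training CSV and its metadata JSON in the layout described above, using Ruby's standard csv and json libraries. All IDs, SMILES, activity values, the endpoint name "LC50", the unit string and the source URL are hypothetical placeholders, and the sketch assumes the raw activities are molar concentrations before the -log10 transform.

```ruby
require "csv"
require "json"

# Hypothetical raw regression data: [ID, SMILES, activity in mol/L]
raw = [
  ["c1", "CCO", 1.0e-3],
  ["c2", "c1ccccc1", 2.5e-5]
]

# Regression activities are expected as -log10 transformed values
def neg_log10(value)
  -Math.log10(value)
end

# Header: optional ID column, SMILES (or InChI), endpoint name last
csv_text = CSV.generate do |csv|
  csv << ["ID", "SMILES", "LC50"]
  raw.each { |id, smiles, activity| csv << [id, smiles, neg_log10(activity)] }
end

# Content for the JSON file with the same basename (e.g. training.json);
# "qmrf" is optional and "unit" is only needed for regression
metadata = {
  "species" => "Hypothetical species",
  "endpoint" => "LC50",
  "source" => "https://example.org/placeholder",
  "unit" => "-log10(mol/L)"
}
json_text = JSON.pretty_generate(metadata)

puts csv_text
puts json_text
```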

Returns:

  • (OpenTox::Model::Validation)

Raises:

  • (ArgumentError)

    if no JSON metadata file with the same basename exists

# File 'lib/model.rb', line 502

def self.from_csv_file file
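  # metadata is expected in a JSON file with the same basename
  metadata_file = file.sub(/csv$/,"json")
  raise ArgumentError, "No metadata file #{metadata_file}" unless File.exist? metadata_file
  model_validation = self.new JSON.parse(File.read(metadata_file))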
  training_dataset = Dataset.from_csv_file file
  model = Lazar.create training_dataset: training_dataset
  model_validation[:model_id] = model.id
  model_validation[:repeated_crossvalidation_id] = OpenTox::Validation::RepeatedCrossValidation.create(model).id # full class name required
  model_validation.save
  model_validation
end
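A self-contained sketch of the metadata lookup at the start of .from_csv_file: derive the JSON path with the same sub(/csv$/, "json") call and fail early if the file is missing. metadata_path_for is a hypothetical helper, and temporary files stand in for real training data.

```ruby
require "tmpdir"

# Hypothetical helper mirroring the metadata check in from_csv_file
def metadata_path_for(csv_path)
  json_path = csv_path.sub(/csv$/, "json")
  raise ArgumentError, "No metadata file #{json_path}" unless File.exist?(json_path)
  json_path
end

Dir.mktmpdir do |dir|
  csv_path = File.join(dir, "training.csv")
  File.write(csv_path, "SMILES,LC50\nCCO,3.0\n")

  begin
    metadata_path_for(csv_path)
  rescue ArgumentError => e
    puts e.message # no training.json yet
  end

  File.write(File.join(dir, "training.json"), '{"endpoint":"LC50"}')
  puts metadata_path_for(csv_path) # path ending in training.json
end
```

Note that /csv$/ is case-sensitive: for a file named data.CSV the path is returned unchanged, so the existence check passes on the CSV file itself and the failure only surfaces later when its content is parsed as JSON.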

.from_enanomapper(training_dataset: nil, prediction_feature: nil, algorithms: nil) ⇒ OpenTox::Model::Validation

Create and validate a nano-lazar model, importing data from eNanoMapper if necessary.

nano-lazar methods are described in detail in https://github.com/enanomapper/nano-lazar-paper/blob/master/nano-lazar.pdf
*eNanoMapper import is currently broken, because APIs and data formats are constantly changing and we have no resources to track these changes permanently!*
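The dataset handling in .from_enanomapper follows a find-or-import pattern. A minimal in-memory sketch of that control flow; DATASETS, find_dataset and import_all! are hypothetical stand-ins for the Dataset collection and Import::Enanomapper.import, and "Protein Corona" is a shortened placeholder name.

```ruby
# Hypothetical in-memory stand-ins for the dataset store and the importer
DATASETS = {}

def find_dataset(name)
  DATASETS[name]
end

def import_all!
  # a real importer would fetch remote data; this one only knows one dataset
  DATASETS["Protein Corona"] = { name: "Protein Corona" }
end

# 1. use the caller's dataset, else look it up by name
# 2. if it is still missing, run the import and look again
# 3. fail loudly if the import did not produce it
def find_or_import(name, given = nil)
  dataset = given || find_dataset(name)
  unless dataset
    import_all!
    dataset = find_dataset(name)
    raise ArgumentError, "Cannot import '#{name}' dataset" unless dataset
  end
  dataset
end

p find_or_import("Protein Corona")
```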

Parameters:

  • training_dataset (OpenTox::Dataset, nil) (defaults to: nil)
  • prediction_feature (OpenTox::Feature, nil) (defaults to: nil)
  • algorithms (Hash, nil) (defaults to: nil)

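Returns:

  • (OpenTox::Model::Validation)

Raises:

  • (ArgumentError)

    if the training dataset can neither be found nor imported
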
# File 'lib/model.rb', line 521

def self.from_enanomapper training_dataset: nil, prediction_feature: nil, algorithms: nil
  dataset_name = "Protein Corona Fingerprinting Predicts the Cellular Interaction of Gold and Silver Nanoparticles"

  # find/import training_dataset
  training_dataset ||= Dataset.where(name: dataset_name).first
  unless training_dataset # try to import
    Import::Enanomapper.import
    training_dataset = Dataset.where(name: dataset_name).first
    raise ArgumentError, "Cannot import '#{dataset_name}' dataset" unless training_dataset
  end
  prediction_feature ||= Feature.where(name: "log2(Net cell association)", category: "TOX").first

  model_validation = self.new(
    endpoint: prediction_feature.name,
    source: prediction_feature.source,
    species: "A549 human lung epithelial carcinoma cells",
    unit: prediction_feature.unit
  )
  model = LazarRegression.create prediction_feature: prediction_feature, training_dataset: training_dataset, algorithms: algorithms
  model_validation[:model_id] = model.id
  repeated_cv = OpenTox::Validation::RepeatedCrossValidation.create model, 10, 5
  model_validation[:repeated_crossvalidation_id] = repeated_cv.id
  model_validation.save
  model_validation
end
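.from_enanomapper validates the model with RepeatedCrossValidation.create model, 10, 5. This page does not say what 10 and 5 stand for; assuming they are the number of folds and the number of repeats, the sketch below shows generic repeated k-fold splitting over training indices. It illustrates the technique only, not lazar's implementation, and repeated_kfold is a hypothetical helper.

```ruby
# Generic repeated k-fold split over n training items.
# Returns one entry per repeat; each entry holds one
# [train_indices, test_indices] pair per fold.
def repeated_kfold(n, folds: 10, repeats: 5, seed: 1)
  rng = Random.new(seed)
  Array.new(repeats) do
    # a fresh random order for every repeat
    shuffled = (0...n).to_a.shuffle(random: rng)
    # round-robin assignment keeps fold sizes within one of each other
    test_sets = Array.new(folds) { [] }
    shuffled.each_with_index { |idx, i| test_sets[i % folds] << idx }
    # each fold is the test set once; all other items form the training set
    test_sets.map { |test| [(shuffled - test).sort, test.sort] }
  end
end

splits = repeated_kfold(25, folds: 10, repeats: 5)
puts splits.size       # number of repeats
puts splits.first.size # number of folds per repeat
```

With fewer items than folds some test sets would be empty, so n should be at least the number of folds.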

Instance Method Details

#algorithms ⇒ Hash

Get algorithms

Returns:

  • (Hash)


# File 'lib/model.rb', line 465

def algorithms
  model.algorithms
end
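#algorithms, #prediction_feature, #training_dataset and #predict all forward to #model. A sketch of the same delegation written with Ruby's standard Forwardable module; StubModel and ValidationSketch are hypothetical stand-ins, the algorithms hash contents are invented, and this is an alternative style rather than how lib/model.rb is written.

```ruby
require "forwardable"

# Hypothetical stand-in for a lazar model
StubModel = Struct.new(:algorithms, :prediction_feature, :training_dataset) do
  def predict(object)
    { object => :prediction }
  end
end

class ValidationSketch
  extend Forwardable
  # each delegated method calls #model, then forwards the call to its result
  def_delegators :model, :algorithms, :prediction_feature, :training_dataset, :predict

  def initialize(model)
    @model = model
  end

  def model
    @model
  end
end

v = ValidationSketch.new(StubModel.new({ "similarity" => "tanimoto" }, "LC50", "training set"))
p v.algorithms
p v.predict("CCO")
```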

#classification? ⇒ TrueClass, FalseClass

Is it a classification model?

Returns:

  • (TrueClass, FalseClass)


# File 'lib/model.rb', line 495

def classification?
  model.is_a? LazarClassification
end
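#classification? and #regression? are plain class-membership checks with is_a?. A sketch with hypothetical empty classes inside a Sketch module (not lazar's real classes) showing that a base-class instance, or nil, answers false to both, while is_a? is also true for instances of subclasses.

```ruby
module Sketch
  # Hypothetical hierarchy mirroring the lazar model types
  class Lazar; end
  class LazarClassification < Lazar; end
  class LazarRegression < Lazar; end

  # Same checks as #classification? and #regression?
  def self.classification?(model)
    model.is_a?(LazarClassification)
  end

  def self.regression?(model)
    model.is_a?(LazarRegression)
  end
end

p Sketch.classification?(Sketch::LazarClassification.new)
p Sketch.regression?(Sketch::Lazar.new)
```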

#crossvalidations ⇒ Array<OpenTox::CrossValidation>

Get crossvalidations

Returns:

  • (Array<OpenTox::CrossValidation>)



# File 'lib/model.rb', line 483

def crossvalidations
  repeated_crossvalidation.crossvalidations
end
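A sketch of how results from the individual crossvalidations of a repeated crossvalidation might be summarized. The metric name r_squared, the CrossValidationStub struct and the values are assumptions for illustration; this page does not document which statistics the crossvalidation objects expose.

```ruby
# Hypothetical stand-in: each crossvalidation exposes one summary statistic
CrossValidationStub = Struct.new(:r_squared)

# Mean and sample standard deviation of one metric across crossvalidations
# (needs at least two crossvalidations for the standard deviation)
def summarize(crossvalidations, metric)
  values = crossvalidations.map { |cv| cv.public_send(metric) }
  mean = values.sum / values.size.to_f
  variance = values.sum { |v| (v - mean)**2 } / (values.size - 1)
  { mean: mean, sd: Math.sqrt(variance) }
end

cvs = [0.60, 0.64, 0.62].map { |r2| CrossValidationStub.new(r2) }
p summarize(cvs, :r_squared)
```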

#model ⇒ OpenTox::Model::Lazar

Get lazar model

Returns:

  • (OpenTox::Model::Lazar)

# File 'lib/model.rb', line 459

def model
  Lazar.find model_id
end
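#model runs Lazar.find on every call, and #algorithms, #prediction_feature, #training_dataset and #predict each call #model again. A sketch contrasting that with per-instance memoization; CountingStore, ValidationSketch and cached_model are hypothetical, and a memoized model would become stale if model_id were changed after the first lookup.

```ruby
# Hypothetical store that counts lookups, standing in for Lazar.find
class CountingStore
  attr_reader :lookups

  def initialize(records)
    @records = records
    @lookups = 0
  end

  def find(id)
    @lookups += 1
    @records.fetch(id)
  end
end

class ValidationSketch
  def initialize(store, model_id)
    @store = store
    @model_id = model_id
  end

  # uncached, like #model: one lookup per call
  def model
    @store.find(@model_id)
  end

  # memoized alternative: one lookup per instance
  def cached_model
    @cached_model ||= @store.find(@model_id)
  end
end

store = CountingStore.new({ "m1" => :lazar_model })
v = ValidationSketch.new(store, "m1")
3.times { v.model }
3.times { v.cached_model }
puts store.lookups # 3 uncached lookups + 1 memoized lookup
```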

#predict(object) ⇒ Hash, ...

Predict a substance (compound or nanoparticle), an array of substances or a dataset



# File 'lib/model.rb', line 447

def predict object
  model.predict object
end
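#predict accepts a substance, an array of substances or a dataset. A sketch of type dispatch over those three input kinds; Substance, DatasetStub and predict_one are hypothetical placeholders, and the return shapes are chosen for illustration only, since this page gives the real return type only as "Hash, ...".

```ruby
# Hypothetical stand-ins for a single substance and a dataset
Substance = Struct.new(:smiles)
DatasetStub = Struct.new(:substances)

# Placeholder for a single-substance prediction
def predict_one(substance)
  { smiles: substance.smiles, value: nil }
end

# Dispatch on the argument type
def predict(object)
  case object
  when Substance then predict_one(object)
  when Array then object.map { |s| predict_one(s) }
  when DatasetStub then object.substances.map { |s| predict_one(s) }
  else raise ArgumentError, "Cannot predict #{object.class}"
  end
end

p predict(Substance.new("CCO"))
p predict([Substance.new("CCO"), Substance.new("CCN")]).size
```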

#prediction_feature ⇒ OpenTox::Feature

Get prediction feature

Returns:

  • (OpenTox::Feature)


# File 'lib/model.rb', line 471

def prediction_feature
  model.prediction_feature
end

#regression? ⇒ TrueClass, FalseClass

Is it a regression model?

Returns:

  • (TrueClass, FalseClass)


# File 'lib/model.rb', line 489

def regression?
  model.is_a? LazarRegression
end

#repeated_crossvalidation ⇒ OpenTox::Validation::RepeatedCrossValidation

Get repeated crossvalidation

Returns:

  • (OpenTox::Validation::RepeatedCrossValidation)



# File 'lib/model.rb', line 477

def repeated_crossvalidation
  OpenTox::Validation::RepeatedCrossValidation.find repeated_crossvalidation_id # full class name required
end
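The comment "full class name required" is consistent with Ruby's lexical constant lookup: inside OpenTox::Model::Validation, a bare Validation resolves to this class itself rather than to OpenTox::Validation. A self-contained sketch with empty stand-in classes; short_name_lookup and full_name_lookup are hypothetical method names.

```ruby
module OpenTox
  module Validation
    class RepeatedCrossValidation; end
  end

  module Model
    class Validation
      # "Validation" is found lexically in OpenTox::Model first, i.e. this
      # class, which has no RepeatedCrossValidation constant
      def self.short_name_lookup
        Validation::RepeatedCrossValidation
      end

      # the fully qualified name reaches the intended class
      def self.full_name_lookup
        OpenTox::Validation::RepeatedCrossValidation
      end
    end
  end
end

p OpenTox::Model::Validation.full_name_lookup
begin
  OpenTox::Model::Validation.short_name_lookup
rescue NameError => e
  puts "NameError: #{e.message}"
end
```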

#training_dataset ⇒ OpenTox::Dataset

Get training dataset

Returns:

  • (OpenTox::Dataset)


# File 'lib/model.rb', line 453

def training_dataset
  model.training_dataset
end