Class: Nimbus::Tree

Inherits:
Object
Defined in:
lib/nimbus/tree.rb

Overview

Tree object representing a random tree.

A tree is generated by the following steps:

  • 1: Calculate the loss function for the individuals in the node (the first node contains all the individuals).

  • 2: Take a random sample of the SNPs (of size m, with m << total count of SNPs).

  • 3: For every SNP in the sample, compute the loss function of splitting the node's individuals by their value for that SNP.

  • 4: If the split on the SNP with the minimum loss also has a lower loss than the node itself, split the node's individuals into three nodes based on their value for that SNP (0, 1 or 2).

  • 5: Repeat from step 1 for every node until:

    • a) the number of individuals in the node is smaller than the minimum node size, or

    • b) none of the SNP splits has a loss smaller than the loss of the node.

  • 6: When a node stops splitting, label it with the average phenotype value (for regression problems) or the majority class (for classification problems) of its individuals.
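The steps above can be sketched in plain Ruby for a regression problem. This is a minimal illustration under stated assumptions, not Nimbus' implementation: the method names sse and grow are hypothetical, the loss is assumed to be the sum of squared errors around the node mean (the overview does not name the loss function), and the tree is returned as nested hashes of the form { snp_index => [child_0, child_1, child_2] }, which is not necessarily the format Nimbus stores.

```ruby
# Minimal sketch of the tree-growing steps for a regression problem.
# Assumptions (not Nimbus' API): SNP values are 0, 1 or 2; the loss is the sum
# of squared errors around the node mean; the tree is returned as nested
# hashes { snp_index => [child_0, child_1, child_2] } with numeric leaves.

# Sum of squared errors of a list of phenotype values.
def sse(values)
  return 0.0 if values.empty?

  mean = values.sum.to_f / values.size
  values.sum { |v| (v - mean)**2 }
end

# ids:       non-empty array of individual ids in this node
# genotypes: { id => [snp_0_value, snp_1_value, ...] }
# fenotypes: { id => Float }
def grow(ids, genotypes, fenotypes, snp_total, sample_size, min_size, rng)
  values = ids.map { |id| fenotypes[id] }
  node_loss = sse(values)                                       # step 1
  leaf = values.sum.to_f / values.size                          # step 6 label (mean)
  return leaf if ids.size < min_size                            # stop rule a)

  snps = (0...snp_total).to_a.sample(sample_size, random: rng)  # step 2
  best_snp, best_groups, best_loss = nil, nil, Float::INFINITY
  snps.each do |snp|                                            # step 3
    groups = [0, 1, 2].map { |g| ids.select { |id| genotypes[id][snp] == g } }
    loss = groups.sum { |grp| sse(grp.map { |id| fenotypes[id] }) }
    if loss < best_loss
      best_snp, best_groups, best_loss = snp, groups, loss
    end
  end
  return leaf if best_loss >= node_loss                         # stop rule b)

  # Step 4: three children; step 5: recurse. An empty child keeps the parent mean.
  children = best_groups.map do |grp|
    grp.empty? ? leaf : grow(grp, genotypes, fenotypes, snp_total, sample_size, min_size, rng)
  end
  { best_snp => children }
end

# Toy data: SNPs 0 and 1 separate the phenotypes equally well; SNP 2 is constant.
genotypes = { 1 => [0, 2, 1], 2 => [0, 2, 1], 3 => [2, 0, 1], 4 => [2, 0, 1] }
fenotypes = { 1 => 1.0, 2 => 1.2, 3 => 3.0, 4 => 3.2 }
tree = grow(genotypes.keys, genotypes, fenotypes, 3, 3, 2, Random.new(42))
# tree splits once, on SNP 0 or SNP 1; its leaves are about 1.1, 2.1 (the empty
# genotype-1 group keeps the parent mean) and 3.1
```

Because a split is only accepted when its loss is strictly lower than the node loss, every accepted split produces strictly smaller children, so the recursion always terminates.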

Direct Known Subclasses

ClassificationTree, RegressionTree

Constant Summary

NODE_SPLIT_01_2 = "zero"
NODE_SPLIT_0_12 = "two"

Constructor Details

#initialize(options) ⇒ Tree

Initializes the Tree object with the configuration options received (as in Nimbus::Configuration.tree).

# File 'lib/nimbus/tree.rb', line 25

def initialize(options)
  @snp_total_count = options[:snp_total_count]
  @snp_sample_size = options[:snp_sample_size]
  @node_min_size = options[:tree_node_min_size]
end

Instance Attribute Details

#generalization_error ⇒ Object

Returns the value of attribute generalization_error.

# File 'lib/nimbus/tree.rb', line 18

def generalization_error
  @generalization_error
end

#id_to_fenotype ⇒ Object

Returns the value of attribute id_to_fenotype.

# File 'lib/nimbus/tree.rb', line 19

def id_to_fenotype
  @id_to_fenotype
end

#importances ⇒ Object

Returns the value of attribute importances.

# File 'lib/nimbus/tree.rb', line 18

def importances
  @importances
end

#individuals ⇒ Object

Returns the value of attribute individuals.

# File 'lib/nimbus/tree.rb', line 19

def individuals
  @individuals
end

#node_min_size ⇒ Object

Returns the value of attribute node_min_size.

# File 'lib/nimbus/tree.rb', line 18

def node_min_size
  @node_min_size
end

#predictions ⇒ Object

Returns the value of attribute predictions.

# File 'lib/nimbus/tree.rb', line 18

def predictions
  @predictions
end

#snp_sample_size ⇒ Object

Returns the value of attribute snp_sample_size.

# File 'lib/nimbus/tree.rb', line 18

def snp_sample_size
  @snp_sample_size
end

#snp_total_count ⇒ Object

Returns the value of attribute snp_total_count.

# File 'lib/nimbus/tree.rb', line 18

def snp_total_count
  @snp_total_count
end

#structure ⇒ Object

Returns the value of attribute structure.

# File 'lib/nimbus/tree.rb', line 18

def structure
  @structure
end

#used_snps ⇒ Object

Returns the value of attribute used_snps.

# File 'lib/nimbus/tree.rb', line 18

def used_snps
  @used_snps
end

Class Method Details

.traverse(tree_structure, data) ⇒ Object

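Class method that traverses a single individual through a tree structure. data holds the individual's SNP values indexed from 0, while the SNP keys in tree_structure are 1-based (the method reads data[key - 1]).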

Returns the prediction for that individual (the label of the final node reached by the individual).

Raises:

  • Nimbus::TreeError if the structure reached is neither a leaf (a Numeric or String label) nor a Hash with exactly one key.

# File 'lib/nimbus/tree.rb', line 57

def self.traverse(tree_structure, data)
  return tree_structure if tree_structure.is_a?(Numeric) || tree_structure.is_a?(String)

  raise Nimbus::TreeError, "Forest data has invalid structure. Please check your forest data (file)." if !(tree_structure.is_a?(Hash) && tree_structure.keys.size == 1)

  branch = tree_structure.values.first
  split_type = branch[1].to_s
  datum = data_traversing_value(data[tree_structure.keys.first - 1], split_type)

  return self.traverse(branch[datum], data)
end
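The recursion above reads each internal node as a one-key Hash whose key is a 1-based SNP index and whose value is an array holding the split type at index 1; leaves are Numeric or String labels. The helper data_traversing_value is not shown on this page, so the sketch below substitutes a hypothetical traversing_index. It assumes, from the constant names only, that split type "zero" (NODE_SPLIT_01_2) sends genotype 1 to the same child as genotype 0 (branch[0]) and "two" (NODE_SPLIT_0_12) sends it with genotype 2 (branch[2]). This is an interpretation, not the library's confirmed behavior, and ArgumentError stands in for Nimbus::TreeError so the block runs without the gem.

```ruby
NODE_SPLIT_01_2 = "zero"
NODE_SPLIT_0_12 = "two"

# Hypothetical stand-in for Nimbus::Tree.data_traversing_value (assumed
# semantics): maps a genotype (0, 1 or 2) to an index into the branch array.
def traversing_index(genotype, split_type)
  return 0 if genotype == 0
  return 2 if genotype == 2

  split_type == NODE_SPLIT_01_2 ? 0 : 2 # genotype 1 joins the "zero" or "two" side
end

# Same recursion as Tree.traverse: leaves are returned as they are.
def traverse(tree_structure, data)
  return tree_structure if tree_structure.is_a?(Numeric) || tree_structure.is_a?(String)
  unless tree_structure.is_a?(Hash) && tree_structure.keys.size == 1
    raise ArgumentError, "invalid tree structure"
  end

  snp, branch = tree_structure.first # SNP keys are 1-based
  index = traversing_index(data[snp - 1], branch[1].to_s)
  traverse(branch[index], data)
end

# Hypothetical two-level tree: split on SNP 3, then on SNP 1 on the low side.
tree = { 3 => [{ 1 => [10.5, NODE_SPLIT_0_12, 12.0] }, NODE_SPLIT_01_2, 20.0] }

traverse(tree, [0, 1, 2]) # SNP 3 = 2: high side, 20.0
traverse(tree, [1, 0, 1]) # SNP 3 = 1 joins the 0 side; SNP 1 = 1 joins the 2 side: 12.0
traverse(tree, [0, 2, 0]) # both values 0: 10.5
```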

Instance Method Details

#build_node(individuals_ids, y_hat) ⇒ Object

Creates a node by taking a random sample of the SNPs and computing the loss function of the split on every SNP in that sample. The method body is empty in this base class.

# File 'lib/nimbus/tree.rb', line 43

def build_node(individuals_ids, y_hat)
end

#estimate_importances(oob_ids) ⇒ Object

Estimates the importance of every SNP, using the out-of-bag individuals in oob_ids.

# File 'lib/nimbus/tree.rb', line 51

def estimate_importances(oob_ids)
end

#generalization_error_from_oob(oob_ids) ⇒ Object

Computes the generalization error of the tree, using the out-of-bag individuals in oob_ids.

# File 'lib/nimbus/tree.rb', line 47

def generalization_error_from_oob(oob_ids)
end
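Neither estimate_importances nor generalization_error_from_oob shows a body on this page. As a hedged illustration of what these two computations commonly mean in a random forest, the sketch below computes a tree's out-of-bag mean squared error and a Breiman-style permutation importance per SNP (the increase in OOB error after shuffling that SNP's values among the OOB individuals). Both the error measure and the importance technique are assumptions, not confirmed Nimbus behavior; oob_mse, permutation_importances and the predict callable are hypothetical names, with predict standing in for traversing the tree.

```ruby
# Hypothetical helpers (not Nimbus' API). Assumed data layout:
#   genotypes: { id => [snp_0_value, snp_1_value, ...] }
#   fenotypes: { id => Float }
#   predict:   callable mapping one genotype row to a numeric prediction

# Out-of-bag mean squared error of the tree's predictions.
def oob_mse(predict, oob_ids, genotypes, fenotypes)
  total = oob_ids.sum { |id| (predict.call(genotypes[id]) - fenotypes[id])**2 }
  total / oob_ids.size.to_f
end

# Permutation importance: OOB error after shuffling one SNP's values among
# the OOB individuals, minus the unshuffled OOB error.
def permutation_importances(predict, oob_ids, genotypes, fenotypes, snp_count, rng)
  base = oob_mse(predict, oob_ids, genotypes, fenotypes)
  (0...snp_count).map do |snp|
    shuffled = oob_ids.map { |id| genotypes[id][snp] }.shuffle(random: rng)
    permuted = oob_ids.each_with_index.map { |id, i|
      row = genotypes[id].dup
      row[snp] = shuffled[i]
      [id, row]
    }.to_h
    oob_mse(predict, oob_ids, permuted, fenotypes) - base
  end
end

# Toy data: the hypothetical predictor only looks at SNP 0.
predict   = ->(row) { row[0] == 2 ? 3.0 : 1.0 }
genotypes = { 1 => [0, 1], 2 => [0, 2], 3 => [2, 0], 4 => [2, 1] }
fenotypes = { 1 => 1.0, 2 => 1.0, 3 => 3.0, 4 => 3.0 }
oob_ids   = [1, 2, 3, 4]

error       = oob_mse(predict, oob_ids, genotypes, fenotypes) # 0.0 on this toy data
importances = permutation_importances(predict, oob_ids, genotypes, fenotypes, 2, Random.new(1))
# importances[1] is 0.0 because the predictor ignores SNP 1
```

Computing importance only on OOB individuals keeps the estimate independent of the sample the tree was grown on, which is the usual motivation for passing oob_ids to both methods.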

#seed(all_individuals, individuals_sample, ids_fenotypes) ⇒ Object

Creates the structure of the tree, as a hash of SNP splits and values.

It initializes the needed variables and then defines the first node of the tree; the rest of the tree structure is computed recursively by calling build_node for every node.

# File 'lib/nimbus/tree.rb', line 35

def seed(all_individuals, individuals_sample, ids_fenotypes)
  @individuals = all_individuals
  @id_to_fenotype = ids_fenotypes
  @predictions = {}
  @used_snps = []
end
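seed receives both all_individuals and an individuals_sample. In random forests the sample for each tree is commonly a bootstrap sample drawn with replacement, and the individuals never drawn form the out-of-bag set that methods such as generalization_error_from_oob take as oob_ids. How Nimbus draws the sample is not shown on this page, so the helper below (bootstrap_with_oob, a hypothetical name) is only a sketch under that assumption.

```ruby
# Hypothetical helper, not part of Nimbus: draws a bootstrap sample of
# individual ids (with replacement) and returns it with the out-of-bag ids.
def bootstrap_with_oob(all_ids, rng)
  sample = Array.new(all_ids.size) { all_ids[rng.rand(all_ids.size)] }
  oob = all_ids - sample # ids never drawn into the sample
  [sample, oob]
end

all_ids = (1..10).to_a
sample, oob = bootstrap_with_oob(all_ids, Random.new(7))
# sample may repeat ids; oob holds every id that was never drawn
```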