Class: DataModeler::DatasetGen
- Inherits:
- Object
- Includes:
- DataModeler::Dataset::ConvertingTimeAndIndices, DataModeler::Dataset::IteratingBasedOnNext
- Defined in:
- lib/data_modeler/dataset/dataset_gen.rb,
lib/data_modeler/exceptions.rb
Overview
Build train and test datasets for each run of the training.
This diagram shows how it works (win is the input + look_ahead window for the first training target):
----------------------------------------> data (time)
|win|train1|t1|           -> train starts after window, test after training
       |train2|t2|        -> train starts after window + 1 test set
          |train3|t3|     -> train starts after window + 2 test sets
Note how the test sets line up. This makes the test-result plots continuous, ensures that no model is tested on data it was trained on, and lets all data be used multiple times.
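The alignment in the diagram can be checked with a small standalone sketch. It reuses the index arithmetic shown in the #train and #test listings further down this page; the three constants are hypothetical values chosen for illustration, and MIN_ELIGIBLE_TRG stands in for the library's min_eligible_trg (the end of win in the diagram). How Dataset interprets first_idx and end_idx is not covered here.

```ruby
# Hypothetical values, not taken from the library.
MIN_ELIGIBLE_TRG = 3 # stand-in for min_eligible_trg (end of `win`)
TRAIN_SIZE = 6
TEST_SIZE = 2

# Same arithmetic as #train: each run shifts forward by one test set.
def train_range(nrun)
  first = MIN_ELIGIBLE_TRG + (nrun - 1) * TEST_SIZE
  [first, first + TRAIN_SIZE]
end

# Same arithmetic as #test: the test set starts where training ends.
def test_range(nrun)
  first = MIN_ELIGIBLE_TRG + (nrun - 1) * TEST_SIZE + TRAIN_SIZE
  [first, first + TEST_SIZE]
end

(1..3).each do |nrun|
  puts "run #{nrun}: train #{train_range(nrun).inspect}, test #{test_range(nrun).inspect}"
end
# run 1: train [3, 9], test [9, 11]
# run 2: train [5, 11], test [11, 13]
# run 3: train [7, 13], test [13, 15]
```

Each test range begins at the index where the previous one ends, which is the "line up" property described above.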
Defined Under Namespace
Classes: NoDataLeft, NotEnoughDataError
Instance Attribute Summary
- #data ⇒ Object (readonly)
  Returns the value of attribute data.
- #ds_args ⇒ Object (readonly)
  Returns the value of attribute ds_args.
- #first_idx ⇒ Object (readonly)
  Returns the value of attribute first_idx.
- #nrows ⇒ Object (readonly)
  Returns the value of attribute nrows.
- #test_size ⇒ Object (readonly)
  Returns the value of attribute test_size.
- #train_size ⇒ Object (readonly)
  Returns the value of attribute train_size.
Instance Method Summary
- #initialize(data, ds_args:, train_size:, test_size:, min_nruns: 1) ⇒ DatasetGen (constructor)
  train_size: how many points to predict for each training set. test_size: how many points to predict for each test set.
- #next ⇒ Array<Dataset, Dataset>
  Returns the next pair [trainset, testset] and increments the counter.
- #peek ⇒ Array<Dataset, Dataset>
  Returns the next pair [trainset, testset] without incrementing the counter.
- #test(nrun) ⇒ Dataset
  Builds the test set for run nrun.
- #to_a ⇒ Array<Array<Array<...>>>
  Returns an array of arrays (list of inputs-targets pairs).
- #to_ds_a ⇒ Array<Array<Dataset>>
  Returns an array of datasets.
- #train(nrun) ⇒ Dataset
  Builds the training set for run nrun.
Methods included from DataModeler::Dataset::ConvertingTimeAndIndices
Methods included from DataModeler::Dataset::IteratingBasedOnNext
Constructor Details
#initialize(data, ds_args:, train_size:, test_size:, min_nruns: 1) ⇒ DatasetGen
@train_size: how many points to predict for each training set
@test_size: how many points to predict for each test set
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 25
def initialize data, ds_args:, train_size:, test_size:, min_nruns: 1
  @data = data
  @ds_args = ds_args
  # there is no first_idx argument: this calls the reader, so @first_idx stays nil
  @first_idx = first_idx
  @train_size = train_size
  @test_size = test_size
  reset_iteration

  @nrows = data[:time].size
  validate_enough_data_for min_nruns
end
Instance Attribute Details
#data ⇒ Object (readonly)
Returns the value of attribute data.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def data
  @data
end
#ds_args ⇒ Object (readonly)
Returns the value of attribute ds_args.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def ds_args
  @ds_args
end
#first_idx ⇒ Object (readonly)
Returns the value of attribute first_idx.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def first_idx
  @first_idx
end
#nrows ⇒ Object (readonly)
Returns the value of attribute nrows.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def nrows
  @nrows
end
#test_size ⇒ Object (readonly)
Returns the value of attribute test_size.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def test_size
  @test_size
end
#train_size ⇒ Object (readonly)
Returns the value of attribute train_size.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def train_size
  @train_size
end
Instance Method Details
#next ⇒ Array<Dataset, Dataset>
Returns the next pair [trainset, testset] and increments the counter
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 73
def next
  peek.tap { @local_nrun += 1 }
end
#peek ⇒ Array<Dataset, Dataset>
Returns the next pair [trainset, testset]
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 67
def peek
  [self.train(@local_nrun), self.test(@local_nrun)]
end
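The difference between #peek and #next comes down to the peek.tap idiom in the listings above. The sketch below isolates it in a stand-in class: CounterGen and its string pairs are hypothetical placeholders for the real Dataset pairs, and starting the counter at 1 is an assumption, since reset_iteration is not shown on this page.

```ruby
# Stand-in generator: only the counter logic mirrors the listings above.
class CounterGen
  def initialize
    @local_nrun = 1 # assumption: runs are numbered from 1
  end

  # Placeholder pair; the real class builds two Dataset objects here.
  def peek
    ["train#{@local_nrun}", "test#{@local_nrun}"]
  end

  # tap returns the pair computed by peek; its block only advances the counter.
  def next
    peek.tap { @local_nrun += 1 }
  end
end

gen = CounterGen.new
p gen.peek # => ["train1", "test1"]
p gen.peek # => ["train1", "test1"] (peek does not advance)
p gen.next # => ["train1", "test1"] (returns the current pair, then advances)
p gen.peek # => ["train2", "test2"]
```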
#test(nrun) ⇒ Dataset
Builds the test set for the training. No bounds check is performed here: #train has already verified that there is enough data for the test set too.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 55
def test nrun
  first = min_eligible_trg + (nrun-1) * test_size + train_size
  last = first + test_size
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end
#to_a ⇒ Array<Array<Array<...>>>
Returns an array of arrays (list of inputs-targets pairs)
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 87
def to_a
  to_ds_a.collect do |train_test_for_run|
    train_test_for_run.collect &:to_a
  end
end
#to_ds_a ⇒ Array<Array<Dataset>>
Returns an array of datasets
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 84
alias_method :to_ds_a, :to_a
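The alias at line 84 precedes the #to_a definition at line 87, so to_ds_a keeps pointing at whichever to_a existed before the override. The sketch below shows that Ruby idiom in isolation. It assumes the earlier to_a comes from an included module; PlainList, FakeDataset and Gen are hypothetical stand-ins, not the library's classes.

```ruby
# Hypothetical mixin whose to_a returns the raw objects.
module PlainList
  def to_a
    pairs
  end
end

# Hypothetical dataset that converts itself to nested arrays.
class FakeDataset
  def initialize(rows)
    @rows = rows
  end

  def to_a
    @rows
  end
end

class Gen
  include PlainList
  attr_reader :pairs

  def initialize(pairs)
    @pairs = pairs
  end

  # Capture the mixin's to_a under a new name before overriding it.
  alias_method :to_ds_a, :to_a

  # Override: convert every dataset in every pair to plain arrays.
  def to_a
    to_ds_a.collect { |pair| pair.collect(&:to_a) }
  end
end

gen = Gen.new([[FakeDataset.new([1, 2]), FakeDataset.new([3])]])
p gen.to_ds_a.first.first.class # => FakeDataset
p gen.to_a                      # => [[[1, 2], [3]]]
```

Because alias_method copies the method that is visible at the time it runs, the later def to_a does not affect to_ds_a.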
#train(nrun) ⇒ Dataset
Builds training set for the training
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 43
def train nrun
  first = min_eligible_trg + (nrun-1) * test_size
  last = first + train_size
  # make sure there's enough data for both train and test
  raise NoDataLeft unless last + test_size < nrows
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end
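The guard in #train also determines how many runs a given amount of data allows. As a rough sketch, the same condition can be replayed without the library to count the runs that fit. NoDataLeft below is a local stand-in for the class listed under this namespace, and train_bounds, count_runs and all numeric values are hypothetical.

```ruby
# Local stand-in for DataModeler::DatasetGen::NoDataLeft.
class NoDataLeft < StandardError; end

# Same arithmetic and guard as the #train listing above.
def train_bounds(nrun, min_eligible_trg:, train_size:, test_size:, nrows:)
  first = min_eligible_trg + (nrun - 1) * test_size
  last = first + train_size
  raise NoDataLeft unless last + test_size < nrows
  [first, last]
end

# Counts how many consecutive runs pass the guard.
def count_runs(**opts)
  nrun = 0
  loop do
    train_bounds(nrun + 1, **opts) # raises once the data is exhausted
    nrun += 1
  end
rescue NoDataLeft
  nrun
end

p count_runs(min_eligible_trg: 3, train_size: 6, test_size: 2, nrows: 16) # => 3
p count_runs(min_eligible_trg: 3, train_size: 6, test_size: 2, nrows: 15) # => 2
```

The comparison is strict, so a run whose test set would end exactly at nrows is rejected, as the second call shows.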