Class: DataModeler::DatasetGen
- Inherits:
- Object
- Object
- DataModeler::DatasetGen
- Includes:
- ConvertingTimeAndIndices, IteratingBasedOnNext
- Defined in:
- lib/data_modeler/dataset/dataset_gen.rb,
lib/data_modeler/exceptions.rb
Overview
Build train and test datasets for each run of the training.
The diagram below shows how it works (win is the input + look_ahead window for the first training target):
----------------------------------------> data (time)
|win|train1|t1|       -> train starts after window, test after training
       |train2|t2|    -> train starts after window + 1 test set
          |train3|t3| -> train starts after window + 2 test sets
Note how the test sets line up. This keeps the test-result plots continuous, ensures that no model is tested on data it was trained on, and means all data is used multiple times.
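The layout in the diagram can be reproduced with a few lines of index arithmetic. The sketch below reuses the formulas from #train and #test further down this page; `run_bounds` is a hypothetical helper, and `min_eligible_trg` (computed inside the class, not shown here) is passed in as a plain number for illustration. The pairs are the raw `first_idx` and `end_idx` values, with no claim about whether Dataset treats `end_idx` as inclusive.

```ruby
# Hypothetical helper: returns the [first_idx, end_idx] values that #train
# and #test would compute for a given run, using the formulas on this page.
def run_bounds(nrun, min_eligible_trg:, train_size:, test_size:)
  train_first = min_eligible_trg + (nrun - 1) * test_size
  train_last  = train_first + train_size
  test_first  = train_last # same value as the formula in #test
  test_last   = test_first + test_size
  { train: [train_first, train_last], test: [test_first, test_last] }
end

# Assumed example values, chosen only to make the pattern visible.
BOUNDS = (1..3).map do |nrun|
  run_bounds(nrun, min_eligible_trg: 5, train_size: 20, test_size: 4)
end

BOUNDS.each_with_index do |b, i|
  puts "run #{i + 1}: train #{b[:train].inspect} test #{b[:test].inspect}"
end
# run 1: train [5, 25] test [25, 29]
# run 2: train [9, 29] test [29, 33]
# run 3: train [13, 33] test [33, 37]
```

Each test set begins at the index where the previous one ended, which is what makes the test results line up along the time axis.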
Defined Under Namespace
Classes: NoDataLeft, NotEnoughDataError
Instance Attribute Summary
-
#data ⇒ Object
readonly
Returns the value of attribute data.
-
#ds_args ⇒ Object
readonly
Returns the value of attribute ds_args.
-
#first_idx ⇒ Object
readonly
Returns the value of attribute first_idx.
-
#nrows ⇒ Object
readonly
Returns the value of attribute nrows.
-
#test_size ⇒ Object
readonly
Returns the value of attribute test_size.
-
#train_size ⇒ Object
readonly
Returns the value of attribute train_size.
Instance Method Summary
-
#initialize(data, ds_args:, train_size:, test_size:, min_nruns: 1) ⇒ DatasetGen
constructor
@train_size: how many points to predict for each training set.
@test_size: how many points to predict for each test set.
-
#next ⇒ Array<Dataset, Dataset>
Returns the next pair [trainset, testset] and increments the counter.
-
#peek ⇒ Array<Dataset, Dataset>
Returns the next pair [trainset, testset].
-
#test(nrun) ⇒ Dataset
Builds the test set for training run nrun.
-
#to_a ⇒ Array<Array<Array<...>>>
Returns every [trainset, testset] pair with each Dataset converted to an array.
-
#to_ds_a ⇒ Array<Array[Dataset]>
-
#train(nrun) ⇒ Dataset
Builds the training set for training run nrun.
Methods included from ConvertingTimeAndIndices
Methods included from IteratingBasedOnNext
Constructor Details
#initialize(data, ds_args:, train_size:, test_size:, min_nruns: 1) ⇒ DatasetGen
@train_size: how many points to predict for each training set.
@test_size: how many points to predict for each test set.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 25
def initialize data, ds_args:, train_size:, test_size:, min_nruns: 1
  @data = data
  @ds_args = ds_args
  # NOTE: there is no first_idx argument or local variable here, so this
  # calls the attribute reader and stores nil
  @first_idx = first_idx
  @train_size = train_size
  @test_size = test_size
  @local_nrun = 1 # used to iterate over nruns with #next
  @nrows = data[:time].size
  validate_enough_data_for min_nruns
end
Instance Attribute Details
#data ⇒ Object (readonly)
Returns the value of attribute data.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def data
  @data
end
#ds_args ⇒ Object (readonly)
Returns the value of attribute ds_args.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def ds_args
  @ds_args
end
#first_idx ⇒ Object (readonly)
Returns the value of attribute first_idx.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def first_idx
  @first_idx
end
#nrows ⇒ Object (readonly)
Returns the value of attribute nrows.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def nrows
  @nrows
end
#test_size ⇒ Object (readonly)
Returns the value of attribute test_size.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def test_size
  @test_size
end
#train_size ⇒ Object (readonly)
Returns the value of attribute train_size.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15
def train_size
  @train_size
end
Instance Method Details
#next ⇒ Array<Dataset, Dataset>
Returns the next pair [trainset, testset] and increments the counter.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 69
def next
  peek.tap { @local_nrun += 1 }
end
#peek ⇒ Array<Dataset, Dataset>
Returns the next pair [trainset, testset] without incrementing the counter.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 61
def peek
  [self.train(@local_nrun), self.test(@local_nrun)]
end
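The difference between #peek and #next comes down to the `peek.tap { ... }` idiom: the pair is built first, then the block advances the counter, and `tap` returns the pair that was already built. A minimal sketch of that behaviour follows; `PairGen` is a hypothetical stand-in that returns labels instead of Dataset objects, so it runs without the gem.

```ruby
# Hypothetical stand-in for DatasetGen with the same peek/next shape.
class PairGen
  def initialize
    @local_nrun = 1
  end

  # Builds the pair for the current run; does not change any state.
  def peek
    ["train#{@local_nrun}", "test#{@local_nrun}"]
  end

  # peek is evaluated first; tap then runs the block (bumping the counter)
  # and returns the pair that peek built.
  def next
    peek.tap { @local_nrun += 1 }
  end
end

gen = PairGen.new
p gen.peek # ["train1", "test1"]
p gen.peek # ["train1", "test1"], still run 1
p gen.next # ["train1", "test1"], and the counter moves to run 2
p gen.peek # ["train2", "test2"]
```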
#test(nrun) ⇒ Dataset
Builds the test set for training run nrun.
Note: no bounds check is needed here. The check made before training (see the guard in #train) already ensures there is enough data for the test set too.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 53
def test nrun
  first = min_eligible_trg + (nrun-1) * test_size + train_size
  last = first + test_size
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end
#to_a ⇒ Array<Array<Array<...>>>
Returns every [trainset, testset] pair with each Dataset converted to an array.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 80
def to_a
  to_ds_a.collect do |run|
    run.collect &:to_a
  end
end
#to_ds_a ⇒ Array<Array[Dataset]>
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 78
alias_method :to_ds_a, :to_a
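The alias sits on line 78, before #to_a is defined on line 80, and the order matters: `alias_method` binds the new name to whichever `to_a` exists at that moment. Presumably that is the version supplied by IteratingBasedOnNext, which this page does not show, so treat that as an assumption. The sketch below demonstrates the mechanism with hypothetical names (`CollectingPairs`, `Gen`) and symbols in place of Dataset objects.

```ruby
# Hypothetical module standing in for IteratingBasedOnNext.
module CollectingPairs
  def to_a
    [[:train_ds, :test_ds]]
  end
end

class Gen
  include CollectingPairs

  # Binds to_ds_a to the to_a visible right now: the module's version.
  alias_method :to_ds_a, :to_a

  # Redefining to_a afterwards leaves to_ds_a pointing at the old method.
  def to_a
    to_ds_a.collect { |run| run.collect(&:to_s) }
  end
end

p Gen.new.to_ds_a # [[:train_ds, :test_ds]]
p Gen.new.to_a    # [["train_ds", "test_ds"]]
```

Under that assumption, #to_ds_a keeps returning the runs as Dataset pairs while the redefined #to_a converts each Dataset further.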
#train(nrun) ⇒ Dataset
Builds the training set for training run nrun.
# File 'lib/data_modeler/dataset/dataset_gen.rb', line 41
def train nrun
  first = min_eligible_trg + (nrun-1) * test_size
  last = first + train_size
  # make sure there's enough data for both train and test
  raise NoDataLeft unless last + test_size < nrows
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end
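The NoDataLeft guard also determines how many runs a dataset can support. As a rough sketch, the helpers below (`run_fits?` and `max_runs`, both hypothetical names) apply the same strict comparison and count runs until it fails. `min_eligible_trg` is again passed in as a plain number, and a positive `test_size` is assumed, since the loop would not terminate otherwise.

```ruby
# Mirrors the guard in #train: a run is valid only if its training set plus
# the test set that follows it end strictly before nrows.
def run_fits?(nrun, nrows:, min_eligible_trg:, train_size:, test_size:)
  last = min_eligible_trg + (nrun - 1) * test_size + train_size
  last + test_size < nrows
end

# Counts valid runs by stepping forward until the guard fails.
# Assumes test_size > 0.
def max_runs(**args)
  nrun = 0
  nrun += 1 while run_fits?(nrun + 1, **args)
  nrun
end

puts max_runs(nrows: 100, min_eligible_trg: 10, train_size: 50, test_size: 10)
# prints 3
```

With these assumed numbers, run 4 would need its test set to end at index 100, which fails the strict `< nrows` comparison, so only three runs fit.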