Class: DataModeler::Dataset::Generator
- Inherits: Object
  - Object
    - DataModeler::Dataset::Generator
- Includes: ConvertingTimeAndIndices, IteratingBasedOnNext
- Defined in: lib/data_modeler/dataset/generator.rb, lib/data_modeler/helper.rb
Overview
Build train and test datasets for each run of the training.
Train and test sets are seen as moving windows on the data. Their alignment is designed to provide continuous test results over (most of) the data. The following diagram illustrates this: the test sets `t1`, `t2` and `t3` are aligned so that their results can be plotted continuously against the observations. (b) is the amount of data needed to cover the input + look_ahead window used for the first target.
data: ----------------------> (time, datapoints)
run1: (b)|train1|t1|       -> train starts after (b), test after training
run2:       |train2|t2|    -> train starts after (b) + 1 test set
run3:          |train3|t3| -> train starts after (b) + 2 test sets
Note how the test sets line up. This keeps the plot of test results continuous, while no model is tested on data it has been trained on. All data is used multiple times, alternately as training and as test data.
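The window arithmetic behind the diagram can be sketched as follows. This is a hypothetical illustration, not library code: `run_windows` is an invented helper, `b` stands in for `min_eligible_trg` (whose definition is not shown on this page), and windows are modeled as half-open Ruby ranges, which may not match how `Dataset` interprets `end_idx`. The offsets mirror the formulas used by `#train` and `#test`.

```ruby
# Hypothetical helper: returns [train_range, test_range] for a 1-based run.
# Assumption: `b` plays the role of min_eligible_trg; ranges are half-open.
def run_windows(nrun, b:, train_size:, test_size:)
  train_first = b + (nrun - 1) * test_size # each run shifts by one test set
  train_last  = train_first + train_size   # training window end
  test_last   = train_last + test_size     # test starts where training ends
  [(train_first...train_last), (train_last...test_last)]
end

runs = (1..3).map { |n| run_windows(n, b: 2, train_size: 6, test_size: 2) }
runs.each_with_index do |(train, test), i|
  puts "run#{i + 1}: train=#{train} test=#{test}"
end
# run1: train=2...8 test=8...10
# run2: train=4...10 test=10...12
# run3: train=6...12 test=12...14
```

Each test range begins exactly where the previous one ends, which is what makes the test results plottable as one continuous series.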
Defined Under Namespace
Classes: NoDataLeft, NotEnoughDataError
Instance Attribute Summary
- #data ⇒ Object (readonly)
  Returns the value of attribute data.
- #ds_args ⇒ Object (readonly)
  Returns the value of attribute ds_args.
- #first_idx ⇒ Object (readonly)
  Returns the value of attribute first_idx.
- #nrows ⇒ Object (readonly)
  Returns the value of attribute nrows.
- #test_size ⇒ Object (readonly)
  Returns the value of attribute test_size.
- #train_size ⇒ Object (readonly)
  Returns the value of attribute train_size.
Instance Method Summary
- #initialize(data, ds_args:, train_size:, test_size:, min_nruns: 1) ⇒ Generator (constructor)
  Returns a new instance of Generator.
- #next ⇒ Array<Dataset, Dataset>
  Returns the next pair `[trainset, testset]` and increments the counter.
- #peek ⇒ Array<Dataset, Dataset>
  Returns the next pair `[trainset, testset]`.
- #test(nrun) ⇒ Dataset
  Builds test sets for model testing.
- #to_a ⇒ Array<Array<Array<...>>>
  Returns an array of arrays (list of inputs-targets pairs).
- #to_ds_a ⇒ Array<Array<Dataset>>
  Returns an array of datasets.
- #train(nrun) ⇒ Dataset
  Builds training sets for model training.
Methods included from ConvertingTimeAndIndices
Methods included from IteratingBasedOnNext
Constructor Details
#initialize(data, ds_args:, train_size:, test_size:, min_nruns: 1) ⇒ Generator
Returns a new instance of Generator.
# File 'lib/data_modeler/dataset/generator.rb', line 30
def initialize data, ds_args:, train_size:, test_size:, min_nruns: 1
  @data = data
  @ds_args = ds_args
  # NOTE: `first_idx` is not a constructor parameter, so this is a method
  # call; with only the attr_reader shown here it evaluates to nil
  @first_idx = first_idx
  @train_size = train_size
  @test_size = test_size
  reset_iteration
  @nrows = data[:time].size
  validate_enough_data_for min_nruns
end
Instance Attribute Details
#data ⇒ Object (readonly)
Returns the value of attribute data.
# File 'lib/data_modeler/dataset/generator.rb', line 19
def data
  @data
end
#ds_args ⇒ Object (readonly)
Returns the value of attribute ds_args.
# File 'lib/data_modeler/dataset/generator.rb', line 19
def ds_args
  @ds_args
end
#first_idx ⇒ Object (readonly)
Returns the value of attribute first_idx.
# File 'lib/data_modeler/dataset/generator.rb', line 19
def first_idx
  @first_idx
end
#nrows ⇒ Object (readonly)
Returns the value of attribute nrows.
# File 'lib/data_modeler/dataset/generator.rb', line 19
def nrows
  @nrows
end
#test_size ⇒ Object (readonly)
Returns the value of attribute test_size.
# File 'lib/data_modeler/dataset/generator.rb', line 19
def test_size
  @test_size
end
#train_size ⇒ Object (readonly)
Returns the value of attribute train_size.
# File 'lib/data_modeler/dataset/generator.rb', line 19
def train_size
  @train_size
end
Instance Method Details
#next ⇒ Array<Dataset, Dataset>
Returns the next pair `[trainset, testset]` and increments the counter.
# File 'lib/data_modeler/dataset/generator.rb', line 80
def next
  peek.tap { @local_nrun += 1 }
end
#peek ⇒ Array<Dataset, Dataset>
Returns the next pair `[trainset, testset]`.
# File 'lib/data_modeler/dataset/generator.rb', line 74
def peek
  [self.train(@local_nrun), self.test(@local_nrun)]
end
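The difference between `#peek` and `#next` can be seen on a toy stand-in. `ToyGenerator` is hypothetical: strings replace real `Dataset` objects, and the starting value `@local_nrun = 1` is an assumption (the actual `reset_iteration` is not shown), chosen because `#train` and `#test` use `nrun - 1` as a zero-based offset.

```ruby
# Hypothetical stand-in reproducing only the counter logic of peek/next.
class ToyGenerator
  def initialize
    @local_nrun = 1 # assumed initial run number (reset_iteration not shown)
  end

  def train(nrun)
    "train#{nrun}" # placeholder for a training Dataset
  end

  def test(nrun)
    "test#{nrun}" # placeholder for a test Dataset
  end

  # Builds the pair for the current run without advancing.
  def peek
    [self.train(@local_nrun), self.test(@local_nrun)]
  end

  # Returns the same pair as peek, then advances to the next run.
  def next
    peek.tap { @local_nrun += 1 }
  end
end

gen = ToyGenerator.new
p gen.peek # ["train1", "test1"]
p gen.next # ["train1", "test1"]
p gen.next # ["train2", "test2"]
```

`tap` yields the pair to the block and then returns it unchanged, so the counter is incremented only after the current pair has been built.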
#test(nrun) ⇒ Dataset
Builds test sets for model testing.
`train` or `test` has no meaning alone, and `train` always comes first. Hence, `#train` checks whether enough `data` is available for both `train` + `test`; `#test` performs no such check itself.
# File 'lib/data_modeler/dataset/generator.rb', line 62
def test nrun
  # the test window starts where the training window of the same run ends
  first = min_eligible_trg + (nrun-1) * test_size + train_size
  last = first + test_size
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end
#to_a ⇒ Array<Array<Array<...>>]
Returns an array of arrays: for each run, the `[trainset, testset]` pair with each dataset converted via its own `#to_a` (list of inputs-targets pairs).
# File 'lib/data_modeler/dataset/generator.rb', line 94
def to_a
  to_ds_a.collect do |train_test_for_run|
    train_test_for_run.collect(&:to_a)
  end
end
#to_ds_a ⇒ Array<Array[Dataset]>
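Returns an array of datasets, one `[trainset, testset]` pair per run. The alias is created at line 91, before `#to_a` is redefined at line 94, so `#to_ds_a` keeps the original `#to_a` (presumably the one provided by `IteratingBasedOnNext`).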
# File 'lib/data_modeler/dataset/generator.rb', line 91
alias_method :to_ds_a, :to_a
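The alias-before-override pattern used by `#to_ds_a` and `#to_a` can be reproduced in isolation. Everything below is hypothetical: `Iterating` stands in for `IteratingBasedOnNext` and is assumed to collect `#next` until `StopIteration`, `ToyDataset#to_a` is assumed to return inputs-targets pairs, and `ToyGen` raises `StopIteration` where the real class would raise `NoDataLeft`.

```ruby
# Hypothetical stand-in for IteratingBasedOnNext: collects #next results.
module Iterating
  def to_a
    items = []
    loop { items << self.next } # Kernel#loop rescues StopIteration
    items
  end
end

# Hypothetical dataset whose #to_a yields [input, target] pairs.
ToyDataset = Struct.new(:inputs, :targets) do
  def to_a
    inputs.zip(targets)
  end
end

class ToyGen
  include Iterating

  def initialize(nruns)
    @nruns = nruns
    @local_nrun = 1
  end

  def next
    raise StopIteration if @local_nrun > @nruns
    n = @local_nrun
    @local_nrun += 1
    [ToyDataset.new([n], [n * 10]), ToyDataset.new([n + 1], [(n + 1) * 10])]
  end

  # Captured before the override below, so it keeps Iterating#to_a.
  alias_method :to_ds_a, :to_a

  # Overrides Iterating#to_a, converting every dataset to plain arrays.
  def to_a
    to_ds_a.collect { |train_test_for_run| train_test_for_run.collect(&:to_a) }
  end
end

p ToyGen.new(2).to_a # [[[[1, 10]], [[2, 20]]], [[[2, 20]], [[3, 30]]]]
```

Because `alias_method` copies the method body at the moment it runs, redefining `to_a` afterwards does not affect `to_ds_a`; each call consumes the iteration, so a fresh instance is used per conversion.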
#train(nrun) ⇒ Dataset
Builds training sets for model training.
`train` or `test` has no meaning alone, and `train` always comes first. Hence, `#train` checks whether enough `data` is available for both `train` + `test`, raising `NoDataLeft` otherwise.
# File 'lib/data_modeler/dataset/generator.rb', line 50
def train nrun
  first = min_eligible_trg + (nrun-1) * test_size
  last = first + train_size
  raise NoDataLeft unless last + test_size < nrows # make sure there's enough data
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end
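The `NoDataLeft` condition in `#train` determines how many runs a dataset can support. The helper below is hypothetical (not part of the library): it replays the documented formulas, with `min_trg` standing in for `min_eligible_trg`, whose definition is not shown. It may help when choosing `train_size` and `test_size`; the constructor's own `validate_enough_data_for` check is not shown here and may use a different criterion.

```ruby
# Hypothetical helper: counts runs for which #train would not raise NoDataLeft.
def available_runs(nrows:, min_trg:, train_size:, test_size:)
  nrun = 0
  loop do
    first = min_trg + nrun * test_size     # train first_idx of run nrun + 1
    last  = first + train_size             # train end_idx
    break unless last + test_size < nrows  # same condition as in #train
    nrun += 1
  end
  nrun
end

puts available_runs(nrows: 100, min_trg: 10, train_size: 50, test_size: 10) # 3
```

With these numbers the test windows end at 70, 80 and 90; a fourth run would need its test window to end at 100, which fails the strict `< nrows` check.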