Class: DataModeler::DatasetGen

Inherits:
Object
Includes:
DataModeler::Dataset::ConvertingTimeAndIndices, DataModeler::Dataset::IteratingBasedOnNext
Defined in:
lib/data_modeler/dataset/dataset_gen.rb,
lib/data_modeler/exceptions.rb

Overview

Build train and test datasets for each run of the training.

The diagram below shows how it works (win is the input+look_ahead window for the first training target):

----------------------------------------> data (time)
|win|train1|t1|       -> train starts after window, test after training
       |train2|t2|    -> train starts after window + 1 test set
          |train3|t3| -> train starts after window + 2 test sets

Note how the test sets line up. This keeps the test-result plots continuous, ensures that no model is tested on data it was trained on, and lets all data be used multiple times.
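The layout in the diagram can be reproduced with plain index arithmetic. The following is a minimal sketch, not library code: the constant values are hypothetical, runs are assumed to be numbered from 1 (as the `(nrun-1)` term in #train and #test suggests), and the ranges are assumed to be half-open.

```ruby
# Hypothetical values, chosen only for illustration.
MIN_ELIGIBLE_TRG = 3 # assumed size of the initial input+look_ahead window
TRAIN_SIZE = 6
TEST_SIZE = 2

# Returns [train_range, test_range] for a 1-based run number.
# Half-open ranges (first...last) are an assumption of this sketch.
def run_windows(nrun)
  train_first = MIN_ELIGIBLE_TRG + (nrun - 1) * TEST_SIZE
  train_last = train_first + TRAIN_SIZE
  test_first = train_last # the test set starts where training ends
  test_last = test_first + TEST_SIZE
  [train_first...train_last, test_first...test_last]
end

(1..3).each do |nrun|
  train, test = run_windows(nrun)
  puts "run #{nrun}: train #{train}, test #{test}"
end
```

Each run shifts by one test size, so the test range of one run ends exactly where the test range of the next begins.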

Defined Under Namespace

Classes: NoDataLeft, NotEnoughDataError

Instance Attribute Summary

Instance Method Summary

Methods included from DataModeler::Dataset::ConvertingTimeAndIndices

#idx, #time

Methods included from DataModeler::Dataset::IteratingBasedOnNext

#each, #map

Constructor Details

#initialize(data, ds_args:, train_size:, test_size:, min_nruns: 1) ⇒ DatasetGen

@train_size: how many points to predict for each training set
@test_size: how many points to predict for each test set

Parameters:

  • data (Hash-like)

    the data, in an object that is accessed by key and returns one time series per key. It must include a series named time and be sorted by it, and all series must have equal length.

  • ds_args (Hash)

    parameters for the Datasets: inputs, targets, first_idx, end_idx, ntimes. Check class Dataset for details.
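The requirements on data can be checked before building a generator. The sketch below is a hypothetical helper, not part of DataModeler, and assumes data is a plain Hash keyed by symbols, with :time as the time series (as in `data[:time]` in the constructor).

```ruby
# Hypothetical helper, not part of DataModeler: checks the documented
# requirements on `data` (a :time series, sorted, all series equally long).
def valid_series_data?(data)
  time = data[:time]
  return false if time.nil?

  # Every series must have the same length as :time.
  same_length = data.values.all? { |series| series.size == time.size }
  # :time must be in non-decreasing order.
  sorted = time.each_cons(2).all? { |a, b| a <= b }
  same_length && sorted
end

data = {
  time: [1, 2, 3, 4],
  price: [10.0, 10.5, 10.2, 10.8]
}
puts valid_series_data?(data)
```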



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 25

def initialize data, ds_args:, train_size:, test_size:, min_nruns: 1
  @data = data
  @ds_args = ds_args
  @first_idx = first_idx # NOTE: calls the first_idx reader, which is still nil here
  @train_size = train_size
  @test_size = test_size
  reset_iteration

  @nrows = data[:time].size
  validate_enough_data_for min_nruns
end

Instance Attribute Details

#data ⇒ Object (readonly)

Returns the value of attribute data.



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15

def data
  @data
end

#ds_args ⇒ Object (readonly)

Returns the value of attribute ds_args.



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15

def ds_args
  @ds_args
end

#first_idx ⇒ Object (readonly)

Returns the value of attribute first_idx.



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15

def first_idx
  @first_idx
end

#nrows ⇒ Object (readonly)

Returns the value of attribute nrows.



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15

def nrows
  @nrows
end

#test_size ⇒ Object (readonly)

Returns the value of attribute test_size.



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15

def test_size
  @test_size
end

#train_size ⇒ Object (readonly)

Returns the value of attribute train_size.



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 15

def train_size
  @train_size
end

Instance Method Details

#next ⇒ Array<Dataset, Dataset>

Returns the next pair [trainset, testset] and increments the counter.

Returns:

  • (Array<Dataset, Dataset>)



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 73

def next
  peek.tap { @local_nrun += 1 }
end

#peek ⇒ Array<Dataset, Dataset>

Returns the next pair [trainset, testset] without incrementing the counter.

Returns:

  • (Array<Dataset, Dataset>)



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 67

def peek
  [self.train(@local_nrun), self.test(@local_nrun)]
end
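
The difference between #peek and #next can be illustrated with a toy stand-in. This is a sketch only: ToyGen is hypothetical, returns a label instead of a [trainset, testset] pair, and assumes the run counter starts at 1.

```ruby
# Toy stand-in, not the real DatasetGen: it returns run labels instead of
# [trainset, testset] pairs, and assumes the counter starts at 1.
class ToyGen
  def initialize
    @local_nrun = 1
  end

  # Look at the current item without advancing the counter.
  def peek
    "run #{@local_nrun}"
  end

  # Return the current item, then advance the counter.
  # The tap block runs after peek has already produced its value.
  def next
    peek.tap { @local_nrun += 1 }
  end
end

gen = ToyGen.new
first_peek = gen.peek   # does not advance
first_next = gen.next   # same item as the peek, then advances
second_peek = gen.peek  # now sees the following run
puts [first_peek, first_next, second_peek].inspect
```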

#test(nrun) ⇒ Dataset

Note:

The check that enough data is left for the test set is already done before training, so it is not repeated here.

Builds the test set for the given run.

Parameters:

  • nrun (Integer)

    the run number; each run gets a different train+test pair

Returns:

  • (Dataset)



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 55

def test nrun
  first = min_eligible_trg + (nrun-1) * test_size + train_size
  last = first + test_size
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end

#to_a ⇒ Array<Array<Array<...>>>

Returns an array of arrays (list of inputs-targets pairs)

Returns:

  • (Array<Array<Array<...>>>)



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 87

def to_a
  to_ds_a.collect do |train_test_for_run|
    train_test_for_run.collect &:to_a
  end
end

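#to_ds_a ⇒ Array<Array<Dataset>>

Returns an array of [trainset, testset] Dataset pairs, one per run. The alias is created (line 84) before #to_a is redefined (line 87), so #to_ds_a keeps the earlier #to_a behavior.

Returns:

  • (Array<Array<Dataset>>)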



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 84

alias_method :to_ds_a, :to_a
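
Aliasing a method before overriding it is a general Ruby pattern. The sketch below uses toy classes unrelated to DataModeler (Base, Wrapper and to_raw_a are hypothetical names) to show that the alias keeps pointing at the original definition.

```ruby
# Toy classes, unrelated to DataModeler, showing alias-before-override.
class Base
  def to_a
    [[1, 2], [3, 4]]
  end
end

class Wrapper < Base
  # Captures the inherited to_a under a new name *before* the override below.
  alias_method :to_raw_a, :to_a

  # Overrides to_a, building on the captured original.
  def to_a
    to_raw_a.collect { |pair| pair.collect(&:to_s) }
  end
end

wrapper = Wrapper.new
puts wrapper.to_raw_a.inspect
puts wrapper.to_a.inspect
```

Had the alias been declared after the override, both names would refer to the new method and the call inside it would recurse.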

#train(nrun) ⇒ Dataset

Builds the training set for the given run.

Parameters:

  • nrun (Integer)

    the run number; each run gets a different train+test pair

Returns:

  • (Dataset)

Raises:

  • (NoDataLeft)

    when there’s not enough data left for a full train+test



# File 'lib/data_modeler/dataset/dataset_gen.rb', line 43

def train nrun
  first = min_eligible_trg + (nrun-1) * test_size
  last = first + train_size
  # make sure there's enough data for both train and test
  raise NoDataLeft unless last + test_size < nrows
  DataModeler::Dataset.new data, ds_args.merge(first_idx: first, end_idx: last)
end
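
The guard in #train bounds how many runs a dataset can provide. The following sketch is a hypothetical helper, not part of DataModeler; it applies the same condition (`last + test_size < nrows`) to count the runs, assuming runs are numbered from 1 and that min_eligible_trg is passed in as a plain integer.

```ruby
# Hypothetical helper, not part of DataModeler: counts how many runs pass
# the guard `last + test_size < nrows` used in #train.
def available_runs(nrows:, min_eligible_trg:, train_size:, test_size:)
  raise ArgumentError, "test_size must be positive" unless test_size.positive?

  nrun = 0
  loop do
    candidate = nrun + 1
    first = min_eligible_trg + (candidate - 1) * test_size
    last = first + train_size
    # Same condition that makes #train raise NoDataLeft.
    break unless last + test_size < nrows

    nrun = candidate
  end
  nrun
end

puts available_runs(nrows: 20, min_eligible_trg: 3, train_size: 6, test_size: 2)
```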