Module: Daru::Maths::Statistics::Vector

Included in:
Vector
Defined in:
lib/daru/maths/statistics/vector.rb

Instance Method Details

#acf(max_lags = nil) ⇒ Object

Calculates the autocorrelation coefficients of the series.

The first element is always 1, since that is the correlation of the series with itself.

Examples:

ts = Daru::Vector.new((1..100).map { rand })

ts.acf   # => array of 21 autocorrelations (lags 0 through 20)
ts.acf 3 # => array of 4 autocorrelations (lags 0 through 3)


# File 'lib/daru/maths/statistics/vector.rb', line 518

def acf(max_lags = nil)
  max_lags ||= (10 * Math.log10(size)).to_i

  (0..max_lags).map do |i|
    if i == 0
      1.0
    else
      m = self.mean
      # can't use Pearson coefficient since the mean for the lagged series should
      # be the same as the regular series
      ((self - m) * (self.lag(i) - m)).sum / self.variance_sample / (self.size - 1)
    end
  end
end
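
The computation above can be sketched on a plain Ruby Array, without Daru. This is only an illustration: acf_sketch is a hypothetical helper name, and it assumes the series has no missing values and is not constant.

```ruby
# Hypothetical helper: autocorrelations of a plain Array of numbers.
# Assumes no nil values and a non-constant series.
def acf_sketch(xs, max_lags = nil)
  n = xs.size
  max_lags ||= (10 * Math.log10(n)).to_i
  m  = xs.sum.to_f / n
  ss = xs.sum { |x| (x - m)**2 } # equals variance_sample * (n - 1)

  (0..max_lags).map do |k|
    next 1.0 if k.zero?
    # pair each x[t] with x[t - k], using the full-series mean for both
    (k...n).sum { |t| (xs[t] - m) * (xs[t - k] - m) } / ss
  end
end

acf_sketch([1, 2, 3, 4, 5], 2) # lag 0 is 1.0, lag 1 is about 0.4, lag 2 is about -0.1
```

The same mean and the same denominator are used at every lag, which is why the Pearson coefficient of the two shifted series is not used here.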

#acvf(demean = true, unbiased = true) ⇒ Object

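Computes the autocovariance of the series for lags 0 through (10 * log10(size)).to_i.

Arguments

  • demean: boolean (default: true). Pass false if the series should not be demeaned.

  • unbiased: boolean (default: true). Selects between the unbiased (true) and biased (false) form of the autocovariance.

Returns

Array of autocovariance values, one per lag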



# File 'lib/daru/maths/statistics/vector.rb', line 543

def acvf(demean = true, unbiased = true)
  # demean and unbiased are plain positional arguments, so no options hash is needed
  demeaned_series = demean ? self - self.mean : self

  n = (10 * Math.log10(size)).to_i + 1
  m = self.mean
  # d[i] is the divisor applied to the sum for lag i
  if unbiased
    d = Array.new(self.size, self.size)
  else
    d = ((1..self.size).to_a.reverse)[0..n]
  end

  0.upto(n - 1).map do |i|
    (demeaned_series * (self.lag(i) - m)).sum / d[i]
  end
end

#average_deviation_population(m = nil) ⇒ Object Also known as: adp



# File 'lib/daru/maths/statistics/vector.rb', line 205

def average_deviation_population m=nil
  type == :numeric or raise TypeError, "Vector must be numeric"
  m ||= mean
  (@data.inject( 0 ) { |memo, val| 
    @missing_values.has_key?(val) ? memo : ( val - m ).abs + memo
  }).quo( n_valid )
end

#box_cox_transformation(lambda) ⇒ Object

:nodoc:



# File 'lib/daru/maths/statistics/vector.rb', line 288

def box_cox_transformation lambda # :nodoc:
  raise "Should be a numeric" unless @type == :numeric

  self.recode do |x|
    if !x.nil?
      if(lambda == 0)
        Math.log(x)
      else
        (x ** lambda - 1).quo(lambda)
      end
    else
      nil
    end
  end
end

#center ⇒ Object

Center data by subtracting the mean from each non-nil value.



# File 'lib/daru/maths/statistics/vector.rb', line 270

def center
  self - mean
end

#coefficient_of_variation ⇒ Object Also known as: cov



# File 'lib/daru/maths/statistics/vector.rb', line 106

def coefficient_of_variation
  standard_deviation_sample / mean
end

#count(value = false) ⇒ Object

Counts the elements that satisfy a condition. If a block is given, returns the number of elements for which the block returns true. If a value is given, returns the frequency of that value. If neither is given, returns the number of non-nil elements in the Vector.



# File 'lib/daru/maths/statistics/vector.rb', line 114

def count value=false
  if block_given?
    @data.inject(0){ |memo, val| memo += 1 if yield val; memo}
  elsif value
    val = frequencies[value]
    val.nil? ? 0 : val
  else
    size - @missing_positions.size
  end
end
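
The three calling styles described above can be illustrated on a plain Array. count_sketch is a hypothetical stand-in, not the library method, and it treats nil as the only missing value.

```ruby
# Hypothetical helper mirroring the three calling styles on a plain Array.
def count_sketch(values, value = false)
  if block_given?
    values.count { |v| yield v } # elements for which the block is truthy
  elsif value
    values.count(value)          # frequency of one specific value
  else
    values.compact.size          # number of non-nil elements
  end
end

data = [1, 2, 2, nil, 3]
count_sketch(data)                          # => 4
count_sketch(data, 2)                       # => 2
count_sketch(data) { |v| !v.nil? && v > 1 } # => 3
```

As in the listing above, the block also receives nil elements, so it has to guard against them.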

#cumsum ⇒ Object

Calculate cumulative sum of Vector



# File 'lib/daru/maths/statistics/vector.rb', line 571

def cumsum
  result = []
  acc = 0
  @data.each do |d|
    if @missing_values.has_key?(d)
      result << nil
    else
      acc += d
      result << acc
    end
  end

  Daru::Vector.new(result, index: @index)
end
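
A minimal sketch of the same accumulation on a plain Array, assuming nil is the only missing value. cumsum_sketch is a hypothetical name used for illustration.

```ruby
# Hypothetical helper: running total that leaves nil positions as nil.
def cumsum_sketch(values)
  acc = 0
  values.map { |v| v.nil? ? nil : (acc += v) }
end

cumsum_sketch([1, 2, nil, 4]) # => [1, 3, nil, 7]
```

Missing positions stay nil in the result and do not reset the running total.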

#dichotomize(low = nil) ⇒ Object

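Dichotomize the vector into 0 and 1. Values less than or equal to the threshold become 0, values greater than it become 1, and nil values stay nil. The threshold defaults to the lowest value in the vector and can be overridden with the low parameter.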



# File 'lib/daru/maths/statistics/vector.rb', line 255

def dichotomize(low = nil)
  low ||= factors.min

  self.recode do |x|
    if x.nil? 
      nil
    elsif x > low
      1
    else
      0
    end
  end
end
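
The thresholding rule can be tried on a plain Array. dichotomize_sketch is a hypothetical helper that assumes nil is the only missing value.

```ruby
# Hypothetical helper: 1 for values above the threshold, 0 otherwise, nil kept as nil.
def dichotomize_sketch(values, low = nil)
  low ||= values.compact.min # default threshold: the lowest non-nil value
  values.map do |x|
    if x.nil?
      nil
    elsif x > low
      1
    else
      0
    end
  end
end

dichotomize_sketch([3, 1, nil, 2])    # => [1, 0, nil, 1]
dichotomize_sketch([3, 1, nil, 2], 2) # => [1, 0, nil, 0]
```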

#diff(max_lags = 1) ⇒ Daru::Vector

Differences the series. Note: the first difference of a series is X(t) - X(t-1), but the second difference is NOT X(t) - X(t-2). It is the first difference of the first difference:

> (X(t) - X(t-1)) - (X(t-1) - X(t-2))

Arguments

  • max_lags: integer (default: 1), number of times the series is differenced.

Examples:

Using #diff


ts = Daru::Vector.new((1..10).map { rand })
         # => [0.69, 0.23, 0.44, 0.71, ...]

ts.diff   # => [nil, -0.46, 0.21, 0.27, ...]

Returns:

  • (Daru::Vector)



# File 'lib/daru/maths/statistics/vector.rb', line 385

def diff(max_lags = 1)
  ts = self
  difference = []
  max_lags.times do
    difference = ts - ts.lag
    ts = difference
  end
  difference
end
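
The distinction between the second difference and X(t) - X(t-2) can be checked with a small plain-Array sketch. diff_sketch is a hypothetical helper, not the library method.

```ruby
# Hypothetical helper: repeated first differences on a plain Array.
def diff_sketch(xs, max_lags = 1)
  max_lags.times do
    xs = xs.each_with_index.map do |x, i|
      prev = i.zero? ? nil : xs[i - 1]
      x.nil? || prev.nil? ? nil : x - prev
    end
  end
  xs
end

squares = [1, 4, 9, 16, 25]
diff_sketch(squares)    # => [nil, 3, 5, 7, 9]
diff_sketch(squares, 2) # => [nil, nil, 2, 2, 2]
# X(t) - X(t-2) at index 2 would be 9 - 1 = 8, not 2
```

Each pass loses one more leading observation, so differencing k times leaves k nil values at the start.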

#ema(n = 10, wilder = false) ⇒ Daru::Vector

Exponential Moving Average. Calculates an exponential moving average of the series using a specified parameter. If wilder is false (the default), the EMA uses a smoothing value of 2 / (n + 1). If it is true, it uses the Welles Wilder smoother of 1 / n.

Warning for EMA usage: EMAs are unstable for small series, because they draw on many more than n observations. The series is stable if its size is >= 3.45 * (n + 1).

Examples:

Using ema


ts = Daru::Vector.new((1..100).map { rand })
         # => [0.69, 0.23, 0.44, 0.71, ...]

# first 9 observations are nil
ts.ema   # => [ ... nil, 0.509... , 0.433..., ... ]

Parameters:

  • n (Integer) (defaults to: 10)

    (10) Loopback length.

  • wilder (TrueClass, FalseClass) (defaults to: false)

    (false) If true, 1/n value is used for smoothing; if false, uses 2/(n+1) value

Returns:

  • (Daru::Vector)



# File 'lib/daru/maths/statistics/vector.rb', line 469

def ema(n = 10, wilder = false)
  smoother = wilder ? 1.0 / n : 2.0 / (n + 1)
  # need to start everything from the first non-nil observation
  start = @data.index { |i| i != nil }
  # first n - 1 observations are nil
  base = [nil] * (start + n - 1)
  # nth observation is just a moving average
  base << @data[start...(start + n)].inject(0.0) { |s, a| a.nil? ? s : s + a } / n
  (start + n).upto size - 1 do |i|
    base << self[i] * smoother + (1 - smoother) * base.last
  end

  Daru::Vector.new(base, index: @index)
end
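
The recurrence can be followed by hand with a plain-Array sketch. ema_sketch is a hypothetical helper, and it assumes the input has no nil values and at least n elements.

```ruby
# Hypothetical helper: EMA on a plain Array (no nil values, size >= n).
def ema_sketch(xs, n = 10, wilder = false)
  smoother = wilder ? 1.0 / n : 2.0 / (n + 1)
  out = [nil] * (n - 1)           # first n - 1 observations are nil
  out << xs.first(n).sum.to_f / n # nth observation: simple moving average
  (n...xs.size).each do |i|
    out << xs[i] * smoother + (1 - smoother) * out.last
  end
  out
end

ema_sketch([1, 2, 3, 4, 5], 3) # => [nil, nil, 2.0, 3.0, 4.0]
```

With n = 3 the default smoother is 2 / (3 + 1) = 0.5, so the fourth value is 4 * 0.5 + 0.5 * 2.0 = 3.0.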

#factors ⇒ Object

Retrieve unique values of non-nil data



# File 'lib/daru/maths/statistics/vector.rb', line 52

def factors
  only_valid.uniq.reset_index!
end

#freqs ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 86

def freqs
  Daru::Vector.new(frequencies)
end

#frequencies ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 76

def frequencies
  @data.inject({}) do |hash, element|
    unless element.nil?
      hash[element] ||= 0
      hash[element] += 1
    end
    hash
  end
end

#kurtosis(m = nil) ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 195

def kurtosis m=nil
  if @data.respond_to? :kurtosis
    @data.kurtosis
  else
    m ||= mean
    fo  = @data.inject(0){ |a, x| a + ((x - m) ** 4) }
    fo.quo((@size - @missing_positions.size) * standard_deviation_sample(m) ** 4) - 3
  end
end

#macd(fast = 12, slow = 26, signal = 9) ⇒ Object

Moving Average Convergence-Divergence. Calculates the MACD (moving average convergence-divergence) of the time series - this is a comparison of a fast EMA with a slow EMA.

Arguments

  • fast: integer, (default = 12) - fast component of MACD

  • slow: integer, (default = 26) - slow component of MACD

  • signal: integer, (default = 9) - signal component of MACD

Usage

ts = Daru::Vector.new((1..100).map { rand })
         # => [0.69, 0.23, 0.44, 0.71, ...]
ts.macd(13)

Returns

Array of two Daru::Vectors: the MACD series (fast EMA minus slow EMA) and its signal line (the EMA of the MACD series over signal periods)



# File 'lib/daru/maths/statistics/vector.rb', line 503

def macd(fast = 12, slow = 26, signal = 9)
  series = ema(fast) - ema(slow)
  [series, series.ema(signal)]
end

#max(return_type = :stored_type) ⇒ Object

Maximum element of the vector.

Parameters:

  • return_type (Symbol) (defaults to: :stored_type)

    Data type of the returned value. Defaults to returning only the maximum value; passing :vector returns a one-element Daru::Vector that maps the index of the maximum to the maximum value.



# File 'lib/daru/maths/statistics/vector.rb', line 61

def max return_type=:stored_type
  max_value = @data.max
  if return_type == :vector
    Daru::Vector.new({index_of(max_value) => max_value}, name: @name, dtype: @dtype)
  else
    max_value
  end
end

#max_index ⇒ Daru::Vector

Return a Vector with the max element and its index.

Returns:

  • (Daru::Vector)



# File 'lib/daru/maths/statistics/vector.rb', line 72

def max_index
  max :vector
end

#mean ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 8

def mean
  @data.mean
end

#median ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 28

def median
  @data.respond_to?(:median) ? @data.median : percentile(50)
end

#median_absolute_deviation ⇒ Object Also known as: mad



# File 'lib/daru/maths/statistics/vector.rb', line 37

def median_absolute_deviation
  m = median
  recode {|val| (val - m).abs }.median
end

#min ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 20

def min
  @data.min
end

#mode ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 32

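def mode
  # Pick the value with the highest frequency. Positions in frequencies.values
  # do not line up with positions in @data, so @data must not be indexed with them.
  value, _count = frequencies.max_by { |_value, count| count }
  value
end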

#percentile(q, strategy = :midpoint) ⇒ Object Also known as: percentil

Returns the value of the percentile q

Accepts an optional second argument specifying how to interpolate when the requested percentile lies between two data points a and b. Valid strategies are:

  • :midpoint (Default): (a + b) / 2

  • :linear : a + (b - a) * d where d is the decimal part of the index between a and b.

References

This is the NIST recommended method (en.wikipedia.org/wiki/Percentile#NIST_method)



# File 'lib/daru/maths/statistics/vector.rb', line 223

def percentile(q, strategy = :midpoint)
  sorted = only_valid(:array).sort

  case strategy
  when :midpoint
    v = (n_valid * q).quo(100)
    if(v.to_i!=v)
      sorted[v.to_i]
    else
      (sorted[(v-0.5).to_i].to_f + sorted[(v+0.5).to_i]).quo(2)
    end
  when :linear
    index = (q / 100.0) * (n_valid + 1)

    k = index.truncate
    d = index % 1

    if k == 0
      sorted[0]
    elsif k >= sorted.size
      sorted[-1]
    else
      sorted[k - 1] + d * (sorted[k] - sorted[k - 1])
    end
  else
    raise NotImplementedError.new "Unknown strategy #{strategy.to_s}"
  end
end
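
To see how the two strategies differ, the same arithmetic can be run on a plain Array. percentile_sketch is a hypothetical helper that follows the listing above and treats nil as the only missing value.

```ruby
# Hypothetical helper: both interpolation strategies on a plain Array.
def percentile_sketch(values, q, strategy = :midpoint)
  sorted = values.compact.sort
  n = sorted.size

  case strategy
  when :midpoint
    v = (n * q).quo(100)
    if v.to_i != v
      sorted[v.to_i] # v is not a whole number: take the element at index floor(v)
    else
      (sorted[(v - 0.5).to_i].to_f + sorted[(v + 0.5).to_i]).quo(2)
    end
  when :linear
    index = (q / 100.0) * (n + 1)
    k = index.truncate
    d = index % 1 # decimal part of the index
    if k == 0
      sorted[0]
    elsif k >= n
      sorted[-1]
    else
      sorted[k - 1] + d * (sorted[k] - sorted[k - 1])
    end
  else
    raise NotImplementedError, "Unknown strategy #{strategy}"
  end
end

data = [1, 2, 3, 4]
percentile_sketch(data, 50)          # => 2.5  (midpoint of 2 and 3)
percentile_sketch(data, 30)          # => 2
percentile_sketch(data, 25, :linear) # => 1.25 (index 1.25, so 1 + 0.25 * (2 - 1))
```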

#product ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 16

def product
  @data.product
end

#proportion(value = 1) ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 135

def proportion value=1
  frequencies[value].quo(n_valid).to_f
end

#proportions ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 90

def proportions
  len = n_valid
  # divide as floats; integer division would truncate every proportion to 0 or 1
  frequencies.inject({}) { |hash, arr| hash[arr[0]] = arr[1] / len.to_f; hash }
end

#range ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 24

def range
  max - min
end

#ranked ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 95

def ranked
  sum = 0
  r = frequencies.sort.inject( {} ) do |memo, val|
    memo[val[0]] = ((sum + 1) + (sum + val[1])).quo(2)
    sum += val[1]
    memo
  end

  recode { |e| r[e] }
end

#rolling(function, n = 10) ⇒ Daru::Vector

Apply a function over a rolling window of n observations (the loopback length).

Examples:

Using #rolling

ts = Daru::Vector.new((1..100).map { rand })
         # => [0.69, 0.23, 0.44, 0.71, ...]
# first 9 observations are nil
ts.rolling(:mean)    # => [ ... nil, 0.484... , 0.445... , 0.513 ... , ... ]

Parameters:

  • function (Symbol)

    The rolling function to be applied. Can be any function applicable to Daru::Vector (:mean, :median, :count, :min, :max, etc.)

  • n (Integer) (defaults to: 10)

    (10) A non-negative value which serves as the loopback length.

Returns:

  • (Daru::Vector)



# File 'lib/daru/maths/statistics/vector.rb', line 407

def rolling function, n=10
  Daru::Vector.new(
    [nil] * (n - 1) + 
    (0..(size - n)).map do |i|
      Daru::Vector.new(@data[i...(i + n)]).send(function)
    end, index: @index
  )
end
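
The windowing can be sketched on a plain Array with Enumerable#each_cons. rolling_sketch is a hypothetical helper that takes a block where the method above takes a symbol.

```ruby
# Hypothetical helper: apply a block to every window of n consecutive elements.
def rolling_sketch(xs, n = 10)
  # the first n - 1 positions have no complete window behind them
  [nil] * (n - 1) + xs.each_cons(n).map { |window| yield window }
end

rolling_sketch([1, 2, 3, 4, 5], 3) { |w| w.sum / w.size.to_f } # => [nil, nil, 2.0, 3.0, 4.0]
rolling_sketch([1, 2, 3, 4, 5], 3, &:max)                      # => [nil, nil, 3, 4, 5]
```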

#rolling_count ⇒ Object

Calculate rolling non-missing count

Parameters:

  • n (Integer)

    (10) Loopback length



# File 'lib/daru/maths/statistics/vector.rb', line 440

[:count, :mean, :median, :max, :min, :sum, :std, :variance].each do |meth|
  define_method("rolling_#{meth}".to_sym) do |n=10|
    rolling(meth, n)
  end
end

#rolling_max ⇒ Object

Calculate rolling max value

Parameters:

  • n (Integer)

    (10) Loopback length




#rolling_mean ⇒ Object

Calculate rolling average

Parameters:

  • n (Integer)

    (10) Loopback length




#rolling_median ⇒ Object

Calculate rolling median

Parameters:

  • n (Integer)

    (10) Loopback length




#rolling_min ⇒ Object

Calculate rolling min value

Parameters:

  • n (Integer)

    (10) Loopback length




#rolling_std ⇒ Object

Calculate rolling standard deviation

Parameters:

  • n (Integer)

    (10) Loopback length




#rolling_sum ⇒ Object

Calculate rolling sum

Parameters:

  • n (Integer)

    (10) Loopback length




#rolling_variance ⇒ Object

Calculate rolling variance

Parameters:

  • n (Integer)

    (10) Loopback length




#sample_with_replacement(sample = 1) ⇒ Object

Returns a random sample of the given size, drawn with replacement from the non-nil data.

In every trial, each item has the same probability of being selected.



# File 'lib/daru/maths/statistics/vector.rb', line 333

def sample_with_replacement(sample=1)
  if @data.respond_to? :sample_with_replacement
    @data.sample_with_replacement sample
  else
    valid = missing_positions.empty? ? self : self.only_valid
    vds = valid.size
    (0...sample).collect{ valid[rand(vds)] }
  end
end

#sample_without_replacement(sample = 1) ⇒ Object

Returns a random sample of the given size, drawn without replacement from the valid (non-nil) data.

Each element can be selected only once.

A sample of the same size as the valid data contains every valid element, in random order.



# File 'lib/daru/maths/statistics/vector.rb', line 349

def sample_without_replacement(sample=1)
  if @data.respond_to? :sample_without_replacement
    @data.sample_without_replacement sample
  else
    valid = missing_positions.empty? ? self : self.only_valid 
    raise ArgumentError, "Sample size couldn't be greater than n" if 
      sample > valid.size
    out  = []
    size = valid.size
    while out.size < sample
      value = rand(size)
      out.push(value) if !out.include?(value)
    end

    out.collect{|i| valid[i]}
  end
end
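
The fallback branch can be sketched on a plain Array as follows. sample_without_replacement_sketch is a hypothetical helper that treats nil as the only missing value.

```ruby
# Hypothetical helper: draw distinct positions until enough have been collected.
def sample_without_replacement_sketch(values, sample = 1)
  valid = values.compact
  raise ArgumentError, "Sample size couldn't be greater than n" if sample > valid.size

  picked = []
  while picked.size < sample
    i = rand(valid.size)
    picked << i unless picked.include?(i) # reject positions already drawn
  end
  picked.map { |i| valid[i] }
end

sample_without_replacement_sketch([1, 2, nil, 3, 4], 3) # three distinct non-nil elements
```

Uniqueness is enforced on positions, not on values, so a value that occurs twice in the data can still appear twice in the sample. Ruby's built-in Array#sample(n) draws without replacement in the same sense.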

#skew(m = nil) ⇒ Object

Calculate skewness using (sigma(xi - mean)^3)/((N)*std_dev_sample^3)



# File 'lib/daru/maths/statistics/vector.rb', line 185

def skew m=nil
  if @data.respond_to? :skew
    @data.skew
  else
    m ||= mean
    th  = @data.inject(0) { |memo, val| memo + ((val - m)**3) }
    th.quo ((@size - @missing_positions.size) * (standard_deviation_sample(m)**3))
  end
end
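
The formula can be evaluated directly on a plain Array. skew_sketch is a hypothetical helper that assumes no missing values and at least two distinct observations.

```ruby
# Hypothetical helper: sum((x - mean)^3) / (N * std_dev_sample^3) on a plain Array.
def skew_sketch(xs)
  n  = xs.size
  m  = xs.sum.to_f / n
  sd = Math.sqrt(xs.sum { |x| (x - m)**2 } / (n - 1)) # sample standard deviation
  xs.sum { |x| (x - m)**3 } / (n * sd**3)
end

skew_sketch([1, 2, 3])     # symmetric data: 0.0
skew_sketch([1, 2, 3, 10]) # long right tail: positive, about 0.66
```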

#standard_deviation_population(m = nil) ⇒ Object Also known as: sdp



# File 'lib/daru/maths/statistics/vector.rb', line 166

def standard_deviation_population m=nil
  m ||= mean
  if @data.respond_to? :standard_deviation_population
    @data.standard_deviation_population(m)
  else
    Math::sqrt(variance_population(m))
  end
end

#standard_deviation_sample(m = nil) ⇒ Object Also known as: sds, sd



# File 'lib/daru/maths/statistics/vector.rb', line 175

def standard_deviation_sample m=nil
  m ||= mean
  if @data.respond_to? :standard_deviation_sample
    @data.standard_deviation_sample m
  else
    Math::sqrt(variance_sample(m))
  end
end

#standard_error ⇒ Object Also known as: se



# File 'lib/daru/maths/statistics/vector.rb', line 43

def standard_error
  standard_deviation_sample/(Math::sqrt((n_valid)))
end

#standardize(use_population = false) ⇒ Object

Standardize data.

Arguments

  • use_population - Pass true to use the population standard deviation instead of the sample standard deviation.



# File 'lib/daru/maths/statistics/vector.rb', line 280

def standardize use_population=false
  m ||= mean
  sd = use_population ? sdp : sds
  return Daru::Vector.new([nil]*@size) if m.nil? or sd == 0.0

  vector_standardized_compute m, sd
end
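
A plain-Array sketch of the standardization, assuming nil is the only missing value and at least two valid observations are present. standardize_sketch is a hypothetical name.

```ruby
# Hypothetical helper: z-scores on a plain Array, nil kept as nil.
def standardize_sketch(values, use_population = false)
  valid = values.compact
  return [nil] * values.size if valid.empty?

  m  = valid.sum.to_f / valid.size
  ss = valid.sum { |x| (x - m)**2 }
  sd = Math.sqrt(ss / (use_population ? valid.size : valid.size - 1))
  return [nil] * values.size if sd == 0.0 # constant data cannot be standardized

  values.map { |x| x.nil? ? nil : (x - m) / sd }
end

standardize_sketch([2, 4, nil, 6]) # => [-1.0, 0.0, nil, 1.0]
```

Here the mean is 4 and the sample standard deviation is 2. Passing true as the second argument divides by the smaller population standard deviation, so the scores move further from zero.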

#sum ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 12

def sum
  @data.sum
end

#sum_of_squared_deviation ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 47

def sum_of_squared_deviation
  (@data.inject(0) { |a,x| x.square + a } - (sum.square.quo(n_valid)).to_f).to_f
end

#sum_of_squares(m = nil) ⇒ Object Also known as: ss



# File 'lib/daru/maths/statistics/vector.rb', line 159

def sum_of_squares(m=nil)
  m ||= mean
  @data.inject(0) { |memo, val| 
    @missing_values.has_key?(val) ? memo : (memo + (val - m)**2) 
  }
end

#value_counts ⇒ Object

Count the number of occurrences of each value in the Vector.



# File 'lib/daru/maths/statistics/vector.rb', line 126

def value_counts
  values = {}
  @data.each do |d|
    values[d] ? values[d] += 1 : values[d] = 1
  end

  Daru::Vector.new(values)
end

#variance_population(m = nil) ⇒ Object

Population variance with denominator (N)



# File 'lib/daru/maths/statistics/vector.rb', line 150

def variance_population m=nil
  m ||= mean
  if @data.respond_to? :variance_population
    @data.variance_population m
  else
    sum_of_squares(m).quo((n_valid)).to_f            
  end
end

#variance_sample(m = nil) ⇒ Object Also known as: variance

Sample variance with denominator (N-1)



# File 'lib/daru/maths/statistics/vector.rb', line 140

def variance_sample m=nil
  m ||= self.mean
  if @data.respond_to? :variance_sample
    @data.variance_sample m
  else
    sum_of_squares(m).quo((n_valid) - 1)
  end
end
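
The two denominators can be compared on a plain Array. The helper names below are hypothetical, and the input is assumed to contain no missing values.

```ruby
# Hypothetical helpers: both variances share the same sum of squares.
def sum_of_squares_sketch(xs)
  m = xs.sum.to_f / xs.size
  xs.sum { |x| (x - m)**2 }
end

def variance_population_sketch(xs)
  sum_of_squares_sketch(xs) / xs.size       # denominator N
end

def variance_sample_sketch(xs)
  sum_of_squares_sketch(xs) / (xs.size - 1) # denominator N - 1
end

data = [2, 4, 4, 4, 5, 5, 7, 9]             # mean 5, sum of squares 32
variance_population_sketch(data) # => 4.0
variance_sample_sketch(data)     # 32 / 7, about 4.571
```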

#vector_centered_compute(m) ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 319

def vector_centered_compute(m)
  if @data.respond_to? :vector_centered_compute
    @data.vector_centered_compute(m)
  else
    Daru::Vector.new @data.collect { |x| x.nil? ? nil : x.to_f-m },
      index: index, name: name, dtype: dtype
  end
end

#vector_percentile ⇒ Object

Replace each non-nil value in the vector with its percentile.



# File 'lib/daru/maths/statistics/vector.rb', line 305

def vector_percentile
  c = size - missing_positions.size
  ranked.recode! { |i| i.nil? ? nil : (i.quo(c)*100).to_f }
end

#vector_standardized_compute(m, sd) ⇒ Object



# File 'lib/daru/maths/statistics/vector.rb', line 310

def vector_standardized_compute(m,sd)
  if @data.respond_to? :vector_standardized_compute
    @data.vector_standardized_compute(m,sd)
  else
    Daru::Vector.new @data.collect { |x| x.nil? ? nil : (x.to_f - m).quo(sd) },
      index: index, name: name, dtype: dtype
  end
end