Class: DaruLite::Vector

Inherits:
Object
Extended by:
Gem::Deprecate
Includes:
Maths::Arithmetic::Vector, Maths::Statistics::Vector, Aggregatable, Calculatable, Convertible, Duplicatable, Fetchable, Filterable, Indexable, Iterable, Joinable, Missable, Queryable, Setable, Sortable, Enumerable
Defined in:
lib/daru_lite/vector.rb,
lib/daru_lite/vector/setable.rb,
lib/daru_lite/vector/iterable.rb,
lib/daru_lite/vector/joinable.rb,
lib/daru_lite/vector/missable.rb,
lib/daru_lite/vector/sortable.rb,
lib/daru_lite/vector/fetchable.rb,
lib/daru_lite/vector/indexable.rb,
lib/daru_lite/vector/queryable.rb,
lib/daru_lite/vector/filterable.rb,
lib/daru_lite/vector/convertible.rb,
lib/daru_lite/vector/aggregatable.rb,
lib/daru_lite/vector/calculatable.rb,
lib/daru_lite/vector/duplicatable.rb

Defined Under Namespace

Modules: Aggregatable, Calculatable, Convertible, Duplicatable, Fetchable, Filterable, Indexable, Iterable, Joinable, Missable, Queryable, Setable, Sortable

Constant Summary

DATE_REGEXP =
/^(\d{2}-\d{2}-\d{4}|\d{4}-\d{2}-\d{2})$/

Constants included from Sortable

Sortable::DEFAULT_SORTER

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Queryable

#all?, #any?, #empty?, #include_values?, #match

Methods included from Sortable

#reorder, #reorder!, #sort, #sort_by_index, #sorted_data

Methods included from Setable

#[]=, #set_at

Methods included from Missable

#has_missing_data?, #n_valid, #only_missing, #only_valid, #replace_nils, #replace_nils!, #rolling_fillna, #rolling_fillna!

Methods included from Joinable

#concat

Methods included from Iterable

#apply_method, #each, #each_index, #each_with_index, #map!, #recode, #recode!, #replace_values, #verify

Methods included from Indexable

#detach_index, #has_index?, #index=, #index_of, #indexes, #reindex, #reindex!, #reset_index!

Methods included from Filterable

#apply_where, #delete_if, #keep_if, #only_numerics, #reject_values, #uniq, #where

Methods included from Fetchable

#[], #at, #cut, #get_sub_vector, #head, #last, #positions, #split_by_separator, #split_by_separator_freq, #tail

Methods included from Duplicatable

#clone_structure, #dup

Methods included from Convertible

#to_a, #to_df, #to_h, #to_html, #to_html_tbody, #to_html_thead, #to_json, #to_matrix, #to_s

Methods included from Calculatable

#count_values, #numeric_summary, #object_summary, #summary

Methods included from Aggregatable

#group_by

Methods included from Maths::Statistics::Vector

#acf, #acvf, #average_deviation_population, #box_cox_transformation, #center, #coefficient_of_variation, #count, #covariance_population, #covariance_sample, #cumsum, #describe, #dichotomize, #diff, #ema, #emsd, #emv, #factors, #frequencies, #index_of_max, #index_of_max_by, #index_of_min, #index_of_min_by, #kurtosis, #macd, #max, #max_by, #max_index, #mean, #median, #median_absolute_deviation, #min, #min_by, #mode, #percent_change, #percentile, #product, #proportion, #proportions, #range, #ranked, #rolling, #rolling_count, #rolling_max, #rolling_mean, #rolling_median, #rolling_min, #rolling_std, #rolling_sum, #rolling_variance, #sample_with_replacement, #sample_without_replacement, #skew, #standard_deviation_population, #standard_deviation_sample, #standard_error, #standardize, #sum, #sum_of_squared_deviation, #sum_of_squares, #value_counts, #variance_population, #variance_sample, #vector_centered_compute, #vector_percentile, #vector_standardized_compute

Methods included from Maths::Arithmetic::Vector

#%, #*, #**, #+, #-, #/, #abs, #add, #exp, #round, #sqrt

Constructor Details

#initialize(source, opts = {}) ⇒ Vector

Create a Vector object.

Arguments

  • source - Supply elements in the form of an Array or a Hash. If an Array, a numeric index is created unless one is supplied in the options. Specifying more index elements than there are values in source inserts nil for the surplus index elements. If a Hash, its keys become the index elements and its values populate the vector.

Options

  • :name - Name of the vector

  • :index - Index of the vector

  • :dtype - The underlying data type. Can be :array. Default :array.

  • :missing_values - An Array of the values that are to be treated as ‘missing’. nil is the default missing value.

Usage

vecarr = DaruLite::Vector.new [1,2,3,4], index: [:a, :e, :i, :o]
vechsh = DaruLite::Vector.new({a: 1, e: 2, i: 3, o: 4})

Parameters:

  • source (Array, Hash)
    • Supply elements in the form of an Array or a Hash.



# File 'lib/daru_lite/vector.rb', line 163

def initialize(source, opts = {})
  if opts[:type] == :category
    # Initialize category type vector
    extend DaruLite::Category
    initialize_category source, opts
  else
    # Initialize non-category type vector
    initialize_vector source, opts
  end
end
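
The source handling described under Arguments can be sketched in plain Ruby. This is only an illustration under stated assumptions: `normalize_source` is a hypothetical helper, not part of DaruLite, and it returns an `[index, values]` pair instead of building a vector.

```ruby
# Hypothetical helper (not a DaruLite method): derive index labels and values
# from an Array or Hash source, following the rules in the Arguments text.
def normalize_source(source, index: nil)
  case source
  when Hash
    # Keys become the index, values populate the vector.
    [source.keys, source.values]
  when Array
    # Without an explicit index, fall back to a numeric one.
    index ||= (0...source.size).to_a
    # Surplus index elements are filled with nil.
    padding = Array.new([index.size - source.size, 0].max)
    [index, source + padding]
  else
    raise ArgumentError, "unsupported source #{source.class}"
  end
end

normalize_source([1, 2, 3, 4], index: %i[a e i o])
normalize_source({ a: 1, e: 2, i: 3, o: 4 })
```

The sketch does not cover an index that is shorter than the source; how the library treats that case is not stated here.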

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(name, *args) ⇒ Object



# File 'lib/daru_lite/vector.rb', line 566

def method_missing(name, *args, &)
  if name =~ /^([^=]+)=/
    self[Regexp.last_match(1).to_sym] = args[0]
  elsif has_index?(name)
    self[name]
  else
    super
  end
end
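
To see what this dynamic dispatch enables, here is a minimal plain-Ruby sketch that applies the same pattern to a small stand-in class. `LabelledValues` is hypothetical and stores its data in a Hash; only the `method_missing` and `respond_to_missing?` logic mirrors the listings on this page.

```ruby
# Hypothetical stand-in for a labelled vector, used only to show the pattern.
class LabelledValues
  def initialize(values, index)
    @table = index.zip(values).to_h
  end

  def [](label)
    @table.fetch(label)
  end

  def []=(label, value)
    @table[label] = value
  end

  def has_index?(label)
    @table.key?(label)
  end

  # `obj.label = x` becomes `obj[:label] = x`; `obj.label` becomes `obj[:label]`.
  def method_missing(name, *args, &block)
    if (m = name.to_s.match(/^([^=]+)=/))
      self[m[1].to_sym] = args[0]
    elsif has_index?(name)
      self[name]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || has_index?(name) || super
  end
end

v = LabelledValues.new([1, 2], %i[a e])
v.a       # reads the value stored under :a
v.e = 20  # writes through []=
```

Note that, as in the listing, any name ending in `=` is treated as an assignment, so `respond_to?` answers true for every setter-style name.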

Instance Attribute Details

#data ⇒ Object (readonly)

Store vector data in an array



# File 'lib/daru_lite/vector.rb', line 134

def data
  @data
end

#dtype ⇒ Object (readonly)

The underlying dtype of the Vector. Can be :array.



# File 'lib/daru_lite/vector.rb', line 124

def dtype
  @dtype
end

#index ⇒ Object (readonly)

The row index. Can be either DaruLite::Index or DaruLite::MultiIndex.



# File 'lib/daru_lite/vector.rb', line 122

def index
  @index
end

#labels ⇒ Object

Store a hash of labels for values. Supplementary only. Recommend using index for proper usage.



# File 'lib/daru_lite/vector.rb', line 132

def labels
  @labels
end

#missing_positions ⇒ Object (readonly)

An Array of the positions in the vector that are being treated as ‘missing’.



# File 'lib/daru_lite/vector.rb', line 127

def missing_positions
  @missing_positions
end

#name ⇒ Object (readonly)

The name of the DaruLite::Vector. String.



# File 'lib/daru_lite/vector.rb', line 120

def name
  @name
end

#nm_dtype ⇒ Object (readonly)

Returns the value of attribute nm_dtype.



# File 'lib/daru_lite/vector.rb', line 125

def nm_dtype
  @nm_dtype
end

Class Method Details

.[](*indexes) ⇒ Object

Create a vector using (almost) any object

  • Array: flattened

  • Range: transformed using to_a

  • DaruLite::Vector

  • Numeric and string values

Description

The `Vector.[]` class method creates a vector from almost any object that has a `#to_a` method defined on it. It is similar to R's `c` function.

Usage

a = DaruLite::Vector[1,2,3,4,6..10]
#=>
# <DaruLite::Vector:99448510 @name = nil @size = 9 >
#   nil
# 0   1
# 1   2
# 2   3
# 3   4
# 4   6
# 5   7
# 6   8
# 7   9
# 8  10


# File 'lib/daru_lite/vector.rb', line 88

def [](*indexes)
  values = indexes.map do |a|
    a.respond_to?(:to_a) ? a.to_a : a
  end.flatten
  DaruLite::Vector.new(values)
end
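
The flattening step in the listing above can be tried without the library. The sketch below is plain Ruby; `flatten_arguments` is a hypothetical name, and it returns the Array that would be handed to `DaruLite::Vector.new`.

```ruby
# Hypothetical helper: expand anything that responds to #to_a (ranges, arrays)
# and flatten the result into a single list of values.
def flatten_arguments(*items)
  items.map { |item| item.respond_to?(:to_a) ? item.to_a : item }.flatten
end

flatten_arguments(1, 2, 3, 4, 6..10)
# => [1, 2, 3, 4, 6, 7, 8, 9, 10]
```

Because nested arrays are flattened as well, `Vector[[1, [2]], 3]` style calls end up as a flat list of values.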

._load(data) ⇒ Object

:nodoc:



# File 'lib/daru_lite/vector.rb', line 95

def _load(data) # :nodoc:
  h = Marshal.load(data)
  DaruLite::Vector.new(h[:data],
                       index: h[:index],
                       name: h[:name],
                       dtype: h[:dtype], missing_values: h[:missing_values])
end

.coerce(data, options = {}) ⇒ Object



# File 'lib/daru_lite/vector.rb', line 103

def coerce(data, options = {})
  case data
  when DaruLite::Vector
    data
  when Array, Hash
    new(data, options)
  else
    raise ArgumentError, "Can't coerce #{data.class} to #{self}"
  end
end

.new_with_size(n, opts = {}, &block) ⇒ Object

Create a new vector by specifying the size and an optional value and block to generate values.

Description

The new_with_size class method lets you create a DaruLite::Vector by specifying the size as the argument. The optional block, if supplied, is run once for populating each element in the Vector.

The result of each run of the block is the value that is ultimately assigned to that position in the Vector.

Options

  • :value - The value assigned to every position when no block is given.

All other options are the same as for .new.



# File 'lib/daru_lite/vector.rb', line 55

def new_with_size(n, opts = {}, &block)
  value = opts.delete :value
  block ||= ->(_) { value }
  DaruLite::Vector.new Array.new(n, &block), opts
end
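
The value-or-block choice made in the listing can be sketched in plain Ruby. `values_with_size` is a hypothetical helper that returns the generated Array rather than a vector.

```ruby
# Hypothetical helper: build n values either from a block (called with the
# position) or from a single fill value when no block is given.
def values_with_size(n, value: nil, &block)
  block ||= ->(_) { value }
  Array.new(n, &block)
end

values_with_size(3, value: 0)      # => [0, 0, 0]
values_with_size(3) { |i| i * 2 }  # => [0, 2, 4]
```

When both a value and a block are supplied, the block wins, since the fallback lambda is only used if no block was passed.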

Instance Method Details

#==(other) ⇒ Object

Two vectors are equal if they have the exact same index values corresponding with the exact same elements. Name is ignored.



# File 'lib/daru_lite/vector.rb', line 176

def ==(other)
  case other
  when DaruLite::Vector
    @index == other.index && size == other.size &&
      each_with_index.with_index.all? do |(e, index), position|
        e == other.at(position) && index == other.index.to_a[position]
      end
  else
    super
  end
end

#_dump ⇒ Object

:nodoc:



# File 'lib/daru_lite/vector.rb', line 536

def _dump(*) # :nodoc:
  Marshal.dump(
    data: @data.to_a,
    dtype: @dtype,
    name: @name,
    index: @index
  )
end

#bootstrap(estimators, nr, s = nil) ⇒ Object

Bootstrap

Generate nr resamples (with replacement) of size s from the vector, computing each estimate from estimators over each resample. estimators can be:

a) A Hash with variable names as keys and lambdas as values

a.bootstrap({ log_s2: ->(v) { Math.log(v.variance) } }, 1000)

b) An Array with the names of the methods to bootstrap

a.bootstrap([:mean, :sd], 1000)

c) A single method to bootstrap

a.bootstrap(:mean, 1000)

If s is nil, it defaults to the vector size.

Returns a DataFrame in which each vector has length nr and contains the computed resample estimates.



# File 'lib/daru_lite/vector.rb', line 449

def bootstrap(estimators, nr, s = nil)
  s ||= size
  h_est, es, bss = prepare_bootstrap(estimators)

  nr.times do
    bs = sample_with_replacement(s)
    es.each do |estimator|
      bss[estimator].push(h_est[estimator].call(bs))
    end
  end

  es.each do |est|
    bss[est] = DaruLite::Vector.new bss[est]
  end

  DaruLite::DataFrame.new bss
end
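
The resampling loop can be illustrated on a plain Array. This is a sketch under assumptions, not the library's implementation: `bootstrap_estimates` is a hypothetical helper, estimators are passed only as a Hash of lambdas, and a seeded `Random` is used so runs are repeatable.

```ruby
# Hypothetical helper: draw nr resamples of size s with replacement and
# apply every estimator to each resample.
def bootstrap_estimates(data, estimators, nr, s = data.size, rng: Random.new(42))
  results = estimators.keys.to_h { |name| [name, []] }
  nr.times do
    resample = Array.new(s) { data.sample(random: rng) }
    estimators.each { |name, fn| results[name] << fn.call(resample) }
  end
  results
end

mean = ->(xs) { xs.sum.to_f / xs.size }
estimates = bootstrap_estimates([1, 2, 3, 4, 5], { mean: mean }, 100)
estimates[:mean].size # one estimate per resample
```

Each key of the returned Hash plays the role of one column of the resulting DataFrame.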

#cast(opts = {}) ⇒ Object

Cast a vector to a new data type.

Options

  • :dtype - :array for Ruby Array.

Raises:

  • (ArgumentError)


# File 'lib/daru_lite/vector.rb', line 297

def cast(opts = {})
  dt = opts[:dtype]
  raise ArgumentError, "Unsupported dtype #{opts[:dtype]}" unless dt == :array

  @data = cast_vector_to dt unless @dtype == dt
end

#category? ⇒ true, false

Tells if vector is categorical or not.

Examples:

dv = DaruLite::Vector.new [1, 2, 3], type: :category
dv.category?
# => true

Returns:

  • (true, false)

    true if vector is of type category, false otherwise



# File 'lib/daru_lite/vector.rb', line 350

def category?
  type == :category
end

#daru_lite_vector ⇒ Object Also known as: dv

:nocov:



# File 'lib/daru_lite/vector.rb', line 546

def daru_lite_vector(*)
  self
end

#db_type ⇒ Object

Returns the database type for the vector, according to its content



# File 'lib/daru_lite/vector.rb', line 514

def db_type
  # first, detect any character not number
  if @data.any? { |v| v.to_s =~ DATE_REGEXP }
    'DATE'
  elsif @data.any? { |v| v.to_s =~ /[^0-9e.-]/ }
    'VARCHAR (255)'
  elsif @data.any? { |v| v.to_s.include?('.') }
    'DOUBLE'
  else
    'INTEGER'
  end
end
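
The order of the checks matters, which is easiest to see by running them on plain Arrays. The sketch below repeats the checks from the listing in a hypothetical stand-alone helper, `guess_db_type`, that takes an Array instead of reading a vector's data.

```ruby
DATE_REGEXP = /^(\d{2}-\d{2}-\d{4}|\d{4}-\d{2}-\d{2})$/

# Hypothetical helper: same checks, same order, applied to a plain Array.
def guess_db_type(values)
  strings = values.map(&:to_s)
  if strings.any? { |s| s =~ DATE_REGEXP }
    'DATE'
  elsif strings.any? { |s| s =~ /[^0-9e.-]/ } # any character that cannot be part of a number
    'VARCHAR (255)'
  elsif strings.any? { |s| s.include?('.') }
    'DOUBLE'
  else
    'INTEGER'
  end
end

guess_db_type(['2020-01-31', 'x']) # => "DATE"
guess_db_type(['a', 1])            # => "VARCHAR (255)"
guess_db_type([1.5, 2])            # => "DOUBLE"
guess_db_type([1, 2, 3])           # => "INTEGER"
```

Because every branch uses `any?`, a single date-like value is enough for the whole collection to be reported as DATE, even when other values are not dates.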

#delete(element) ⇒ Object

Delete an element by value



# File 'lib/daru_lite/vector.rb', line 305

def delete(element)
  delete_at index_of(element)
end

#delete_at(index) ⇒ Object

Delete element by index



# File 'lib/daru_lite/vector.rb', line 310

def delete_at(index)
  @data.delete_at @index[index]
  @index = DaruLite::Index.new(@index.to_a - [index])

  update_position_cache
end

#delete_at_position(position) ⇒ Object

Delete element by position



# File 'lib/daru_lite/vector.rb', line 318

def delete_at_position(position)
  @data.delete_at(position)
  @index = @index.delete_at(position)

  update_position_cache
end

#in(other) ⇒ Object

Comparator for checking if any of the elements in other exist in self.

Examples:

Usage of `in`.

vector = DaruLite::Vector.new([1,2,3,4,5])
vector.where(vector.in([3,5]))
#=>
##<DaruLite::Vector:82215960 @name = nil @size = 2 >
#    nil
#  2   3
#  4   5

Parameters:

  • other (Array, DaruLite::Vector)

    A collection which has elements that need to be checked for in self.



# File 'lib/daru_lite/vector.rb', line 255

def in(other)
  other = other.zip(Array.new(other.size, 0)).to_h
  DaruLite::Core::Query::BoolArray.new(
    @data.each_with_object([]) do |d, memo|
      memo << (other.key?(d))
    end
  )
end
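
The boolean mask that `in` builds can be reproduced on plain Arrays. In this sketch `membership_mask` is a hypothetical helper that returns a plain Array of booleans instead of a `BoolArray`; like the listing, it uses a Hash for the membership lookup.

```ruby
# Hypothetical helper: one boolean per element of data, true when the element
# also appears in other.
def membership_mask(data, other)
  lookup = other.to_h { |value| [value, true] }
  data.map { |d| lookup.key?(d) }
end

membership_mask([1, 2, 3, 4, 5], [3, 5])
# => [false, false, true, false, true]
```

Passing such a mask to `where` is what selects positions 2 and 4 in the example above.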

#inspect(spacing = 20, threshold = 15) ⇒ Object

Overrides the original inspect for pretty printing in irb.



# File 'lib/daru_lite/vector.rb', line 411

def inspect(spacing = 20, threshold = 15)
  row_headers = index.is_a?(MultiIndex) ? index.sparse_tuples : index.to_a

  "#<#{self.class}(#{size})#{':category' if category?}>\n" +
    Formatters::Table.format(
      to_a.lazy.zip,
      headers: @name && [@name],
      row_headers: row_headers,
      threshold: threshold,
      spacing: spacing
    )
end

#is_values(*values) ⇒ DaruLite::Vector

Note:

Do not use it to check for Float::NAN, because Float::NAN == Float::NAN is false.

Returns a vector of booleans in which the value at position i is true if the element at position i equals any of the values passed as arguments, and false otherwise.

Examples:

dv = DaruLite::Vector.new [1, 2, 3, 2, 1]
dv.is_values 1, 2
# => #<DaruLite::Vector(5)>
#     0  true
#     1  true
#     2 false
#     3  true
#     4  true

Parameters:

  • values (Array)

    values to equate with

Returns:

  • (DaruLite::Vector)



# File 'lib/daru_lite/vector.rb', line 288

def is_values(*values)
  DaruLite::Vector.new values.map { |v| eq(v) }.inject(:|)
end
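
The listing combines one equality mask per value with a logical OR. The same idea on plain Arrays, as a sketch with a hypothetical helper name (`matches_any`), also shows why the NaN caveat applies.

```ruby
# Hypothetical helper: build an equality mask per value, then OR the masks
# together position by position.
def matches_any(data, *values)
  masks = values.map { |v| data.map { |d| d == v } }
  masks.reduce { |a, b| a.zip(b).map { |x, y| x | y } }
end

matches_any([1, 2, 3, 2, 1], 1, 2)
# => [true, true, false, true, true]

# NaN never compares equal to itself, so it is never matched.
matches_any([Float::NAN], Float::NAN)
# => [false]
```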

#jackknife(estimators, k = 1) ⇒ Object

Jackknife

Returns a dataset with jackknife delete-k estimators. estimators can be:

a) A Hash with variable names as keys and lambdas as values

a.jackknife({ log_s2: ->(v) { Math.log(v.variance) } })

b) An Array with the method names to jackknife

a.jackknife([:mean, :sd])

c) A single method to jackknife

a.jackknife(:mean)

k represents the block size for the block jackknife. It defaults to 1, the classic delete-one jackknife. The vector size must be divisible by k.

Returns a dataset in which each vector has length cases/k and contains the computed jackknife estimates.

Reference:

  • Sawyer, S. (2005). Resampling Data: Using a Statistical Jackknife.



# File 'lib/daru_lite/vector.rb', line 484

def jackknife(estimators, k = 1) # rubocop:disable Metrics/MethodLength
  raise "n should be divisible by k:#{k}" unless (size % k).zero?

  nb = (size / k).to_i
  h_est, es, ps = prepare_bootstrap(estimators)

  est_n = es.to_h { |v| [v, h_est[v].call(self)] }

  nb.times do |i|
    other = @data.dup
    other.slice!(i * k, k)
    other = DaruLite::Vector.new other

    es.each do |estimator|
      # Add pseudovalue
      ps[estimator].push(
        (nb * est_n[estimator]) - ((nb - 1) * h_est[estimator].call(other))
      )
    end
  end

  es.each do |est|
    ps[est] = DaruLite::Vector.new ps[est]
  end
  DaruLite::DataFrame.new ps
end
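
The pseudovalue formula used in the listing is nb * θ(all) − (nb − 1) * θ(all without block i), where nb is the number of blocks. A minimal plain-Ruby sketch, using a hypothetical helper `jackknife_pseudovalues` that works on an Array and a single estimator block, makes it easy to check by hand.

```ruby
# Hypothetical helper: delete-k jackknife pseudovalues for one estimator.
def jackknife_pseudovalues(data, k = 1, &estimator)
  raise ArgumentError, "size should be divisible by k: #{k}" unless (data.size % k).zero?

  nb = data.size / k             # number of blocks
  full = estimator.call(data)    # estimate on the complete data
  Array.new(nb) do |i|
    rest = data.dup
    rest.slice!(i * k, k)        # drop block i
    (nb * full) - ((nb - 1) * estimator.call(rest))
  end
end

mean = ->(xs) { xs.sum.to_f / xs.size }
jackknife_pseudovalues([1, 2, 3, 4], &mean)
jackknife_pseudovalues([1, 2, 3, 4], 2, &mean)
```

For the mean with k = 1, each pseudovalue equals the deleted observation (up to floating-point rounding), which is a handy sanity check for the formula.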

#lag(k = 1) ⇒ DaruLite::Vector

Lags the series by `k` periods.

“Shifts” the data and inserts `nil`s at the beginning or end of the vector, while preserving the original vector's size.

`k` can be a positive or negative integer. If `k` is positive, `nil`s are inserted at the beginning of the vector; otherwise they are inserted at the end.

Examples:

Lag a vector with different periods `k`


ts = DaruLite::Vector.new(1..5)
            # => [1, 2, 3, 4, 5]

ts.lag      # => [nil, 1, 2, 3, 4]
ts.lag(1)   # => [nil, 1, 2, 3, 4]
ts.lag(2)   # => [nil, nil, 1, 2, 3]
ts.lag(-1)  # => [2, 3, 4, 5, nil]

Parameters:

  • k (Integer) (defaults to: 1)

    “shift” the series by `k` periods. `k` can be positive or negative.

Returns:

  • (DaruLite::Vector)

    a new vector with “shifted” initial values and `nil` values inserted. The returned vector is the same length as the original vector.



# File 'lib/daru_lite/vector.rb', line 398

def lag(k = 1)
  case k
  when 0 then dup
  when 1...size
    copy(([nil] * k) + data.to_a)
  when -size..-1
    copy(data.to_a[k.abs...size])
  else
    copy([])
  end
end
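
The documented shifting behavior can be sketched on a plain Array. `lag_values` is a hypothetical helper, not the library code; it assumes that a shift of the full size or more yields only `nil`s, which is how the fallback branch in the listing is read here.

```ruby
# Hypothetical helper: shift values by k positions, padding with nil and
# keeping the original length.
def lag_values(data, k = 1)
  size = data.size
  return Array.new(size) if k.abs >= size # nothing of the original survives

  if k >= 0
    Array.new(k) + data.first(size - k)   # nils at the beginning
  else
    data.drop(-k) + Array.new(-k)         # nils at the end
  end
end

lag_values([1, 2, 3, 4, 5])      # => [nil, 1, 2, 3, 4]
lag_values([1, 2, 3, 4, 5], 2)   # => [nil, nil, 1, 2, 3]
lag_values([1, 2, 3, 4, 5], -1)  # => [2, 3, 4, 5, nil]
```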

#numeric? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/daru_lite/vector.rb', line 264

def numeric?
  type == :numeric
end

#object? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/daru_lite/vector.rb', line 268

def object?
  type == :object
end

#rename(new_name) ⇒ Object Also known as: name=

Give the vector a new name

Parameters:

  • new_name (Symbol)

    The new name.



# File 'lib/daru_lite/vector.rb', line 427

def rename(new_name)
  @name = new_name
  self
end

#respond_to_missing?(name, include_private = false) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/daru_lite/vector.rb', line 576

def respond_to_missing?(name, include_private = false)
  name.to_s.end_with?('=') || has_index?(name) || super
end

#save(filename) ⇒ Object

Save the vector to a file

Arguments

  • filename - Path of file where the vector is to be saved



# File 'lib/daru_lite/vector.rb', line 532

def save(filename)
  DaruLite::IO.save self, filename
end

#size ⇒ Object



# File 'lib/daru_lite/vector.rb', line 115

def size
  @data.size
end

#splitted(sep = ',') ⇒ Object

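Return an Array with the data split by a separator. nil elements stay nil, and elements that do not respond to split are wrapped in a one-element Array.

a = DaruLite::Vector.new(["a,b", "c,d", "a,b", "d"])
a.splitted
# => [["a", "b"], ["c", "d"], ["a", "b"], ["d"]]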


# File 'lib/daru_lite/vector.rb', line 359

def splitted(sep = ',')
  @data.map do |s|
    if s.nil?
      nil
    elsif s.respond_to? :split
      s.split sep
    else
      [s]
    end
  end
end
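
The three branches in the listing (nil, splittable, anything else) can be exercised on a plain Array. `split_values` below is a hypothetical stand-alone copy of that logic, shown only to make the handling of non-String elements visible.

```ruby
# Hypothetical helper: split each element by sep, keeping nil as nil and
# wrapping values that cannot be split.
def split_values(data, sep = ',')
  data.map do |s|
    if s.nil?
      nil
    elsif s.respond_to?(:split)
      s.split(sep)
    else
      [s]
    end
  end
end

split_values(['a,b', 'c,d', 'a,b', 'd'])
# => [["a", "b"], ["c", "d"], ["a", "b"], ["d"]]
split_values(['a;b', nil, 3], ';')
# => [["a", "b"], nil, [3]]
```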

#to_category(opts = {}) ⇒ DaruLite::Vector

Converts a non category type vector to category type vector.

Parameters:

  • opts (Hash) (defaults to: {})

    options to convert to category

Options Hash (opts):

  • :ordered (true, false)

    Specify if vector is ordered or not. If it is ordered, it can be sorted and min, max like functions would work

  • :categories (Array)

    set categories in the specified order

Returns:

  • (DaruLite::Vector)



# File 'lib/daru_lite/vector.rb', line 559

def to_category(opts = {})
  dv = DaruLite::Vector.new to_a, type: :category, name: @name, index: @index
  dv.ordered = opts[:ordered] || false
  dv.categories = opts[:categories] if opts[:categories]
  dv
end

#type ⇒ Object

The type of data contained in the vector. Can be :numeric or :object for a non-category vector.

Running through the data to figure out the kind of data is delayed to the last possible moment.



# File 'lib/daru_lite/vector.rb', line 329

def type
  if @type.nil? || @possibly_changed_type
    @type = :numeric
    each do |e|
      next if e.nil? || e.is_a?(Numeric)

      @type = :object
      break
    end
    @possibly_changed_type = false
  end

  @type
end
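
The classification rule in the listing is short enough to restate on a plain Array. `infer_type` is a hypothetical helper that leaves out the caching (`@type`, `@possibly_changed_type`) and keeps only the rule itself.

```ruby
# Hypothetical helper: :numeric when every element is nil or Numeric,
# :object as soon as one element is anything else.
def infer_type(values)
  values.all? { |e| e.nil? || e.is_a?(Numeric) } ? :numeric : :object
end

infer_type([1, 2.5, nil])  # => :numeric
infer_type([1, 'two', 3])  # => :object
```

nil values do not affect the result, so a vector with missing data can still be numeric.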