Class: Daru::Vector

Inherits:
Object
Includes:
Maths::Arithmetic::Vector, Maths::Statistics::Vector, Plotting::Vector, Enumerable
Defined in:
lib/daru/vector.rb,
lib/daru/extensions/rserve.rb

Methods included from Plotting::Vector

#plot

Methods included from Maths::Statistics::Vector

#acf, #acvf, #average_deviation_population, #box_cox_transformation, #center, #coefficient_of_variation, #count, #covariance_population, #covariance_sample, #cumsum, #describe, #dichotomize, #diff, #ema, #emsd, #emv, #factors, #freqs, #frequencies, #kurtosis, #macd, #max, #max_index, #mean, #median, #median_absolute_deviation, #min, #mode, #percent_change, #percentile, #product, #proportion, #proportions, #range, #ranked, #rolling, #rolling_count, #rolling_max, #rolling_mean, #rolling_median, #rolling_min, #rolling_std, #rolling_sum, #rolling_variance, #sample_with_replacement, #sample_without_replacement, #skew, #standard_deviation_population, #standard_deviation_sample, #standard_error, #standardize, #sum, #sum_of_squared_deviation, #sum_of_squares, #value_counts, #variance_population, #variance_sample, #vector_centered_compute, #vector_percentile, #vector_standardized_compute

Methods included from Maths::Arithmetic::Vector

#%, #*, #**, #+, #-, #/, #abs, #exp, #round, #sqrt

Constructor Details

#initialize(source, opts = {}) ⇒ Vector

Create a Vector object.

Arguments

  • source - Supply elements in the form of an Array or a Hash. If an Array, a numeric index will be created if one is not supplied in the options. Specifying more index elements than actual values in source will insert nil into the surplus index elements. When a Hash is specified, the keys of the Hash are taken as the index elements and the corresponding values as the values that populate the vector.

Options

  • :name - Name of the vector

  • :index - Index of the vector

  • :dtype - The underlying data type. Can be :array, :nmatrix or :gsl. Default :array.

  • :nm_dtype - For NMatrix, the data type of the numbers. See the NMatrix docs for further information on supported data types.

  • :missing_values - An Array of the values that are to be treated as 'missing'. nil is the default missing value.

Usage

vecarr = Daru::Vector.new [1,2,3,4], index: [:a, :e, :i, :o]
vechsh = Daru::Vector.new({a: 1, e: 2, i: 3, o: 4})

Parameters:

  • source (Array, Hash)
    • Supply elements in the form of an Array or a Hash.



# File 'lib/daru/vector.rb', line 95

def initialize source, opts={}
  index = nil
  if source.is_a?(Hash)
    index  = source.keys
    source = source.values
  else
    index  = opts[:index]
    source ||= []
  end
  name = opts[:name]
  set_name name

  @metadata = opts[:metadata] || {}

  @data  = cast_vector_to(opts[:dtype] || :array, source, opts[:nm_dtype])
  @index = try_create_index(index || @data.size)

  if @index.size > @data.size
    cast(dtype: :array) # NM with nils seg faults
    (@index.size - @data.size).times { @data << nil }
  elsif @index.size < @data.size
    raise IndexError, "Expected index size >= vector size. Index size : #{@index.size}, vector size : #{@data.size}"
  end

  @possibly_changed_type = true
  set_missing_values opts[:missing_values]
  set_missing_positions
  set_size
end
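
The size check at the end of the constructor can be illustrated without the library. The following is a minimal sketch in plain Ruby; `align_index_and_data` is a hypothetical helper written for this example, not part of daru. It pads surplus index labels with nil and raises when the index is shorter than the data, as the listing above does.

```ruby
# Hypothetical helper (not part of daru): mirrors the index/data size check
# performed in Daru::Vector#initialize, using plain Arrays.
def align_index_and_data(values, index = nil)
  data  = values.dup
  index = index ? index.dup : (0...data.size).to_a

  if index.size > data.size
    # Surplus index labels receive nil values.
    data.concat(Array.new(index.size - data.size))
  elsif index.size < data.size
    raise IndexError,
          "Expected index size >= vector size. Index size : #{index.size}, " \
          "vector size : #{data.size}"
  end

  index.zip(data)
end

p align_index_and_data([1, 2], [:a, :b, :c])
# => [[:a, 1], [:b, 2], [:c, nil]]
```

With no index supplied, the sketch falls back to the positions 0, 1, 2 and so on, which corresponds to the numeric index described under Arguments.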

Dynamic Method Handling

This class handles dynamic methods through the method_missing method

#method_missing(name, *args, &block) ⇒ Object



# File 'lib/daru/vector.rb', line 1152

def method_missing(name, *args, &block)
  if name =~ /(.+)\=/
    self[name] = args[0]
  elsif has_index?(name)
    self[name]
  else
    super(name, *args, &block)
  end
end
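
To show the general technique, here is a small self-contained sketch of label access through method_missing. `LabelledValues` is a hypothetical stand-in class, not a daru class. Unlike the listing above, the sketch strips the trailing `=` from the setter name before assigning and also defines respond_to_missing?; both are choices made for this example only.

```ruby
# Hypothetical stand-in for a labelled container (not a daru class).
class LabelledValues
  def initialize(hash)
    @values = hash.dup
  end

  def method_missing(name, *args, &block)
    label = name.to_s.chomp('=').to_sym
    if name.to_s.end_with?('=') && @values.key?(label)
      @values[label] = args[0]   # v.b = 20 assigns to label :b
    elsif @values.key?(name)
      @values[name]              # v.a reads label :a
    else
      super                      # unknown names still raise NoMethodError
    end
  end

  def respond_to_missing?(name, include_private = false)
    @values.key?(name.to_s.chomp('=').to_sym) || super
  end
end

v = LabelledValues.new(a: 1, b: 2)
v.b = 20
p v.a # => 1
p v.b # => 20
```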

Instance Attribute Details

#data ⇒ Object (readonly)

Store vector data in an array

# File 'lib/daru/vector.rb', line 61

def data
  @data
end

#dtype ⇒ Object (readonly)

The underlying dtype of the Vector. Can be either :array, :nmatrix or :gsl.

# File 'lib/daru/vector.rb', line 50

def dtype
  @dtype
end

#index ⇒ Object

The row index. Can be either Daru::Index or Daru::MultiIndex.

# File 'lib/daru/vector.rb', line 46

def index
  @index
end

#labels ⇒ Object

Store a hash of labels for values. Supplementary only. Recommend using index for proper usage.

# File 'lib/daru/vector.rb', line 59

def labels
  @labels
end

#metadata ⇒ Object

Attach arbitrary metadata to the vector (usually a Hash).

# File 'lib/daru/vector.rb', line 63

def metadata
  @metadata
end

#missing_positions ⇒ Object (readonly)

An Array of the positions in the vector that are being treated as 'missing'.

# File 'lib/daru/vector.rb', line 56

def missing_positions
  @missing_positions
end

#name ⇒ Object (readonly)

The name of the Daru::Vector. String.

# File 'lib/daru/vector.rb', line 44

def name
  @name
end

#nm_dtype ⇒ Object (readonly)

If the dtype is :nmatrix, this attribute represents the data type of the underlying NMatrix object. See NMatrix docs for more details on NMatrix data types.

# File 'lib/daru/vector.rb', line 54

def nm_dtype
  @nm_dtype
end

#size ⇒ Object (readonly)

The total number of elements of the vector.

# File 'lib/daru/vector.rb', line 48

def size
  @size
end

Class Method Details

.[](*args) ⇒ Object

Create a vector using (almost) any object

  • Array: flattened

  • Range: transformed using to_a

  • Daru::Vector

  • Numeric and string values

Description

The `Vector.[]` class method creates a vector from almost any object that has a `#to_a` method defined on it. It is similar to R's `c` method.

Usage

a = Daru::Vector[1,2,3,4,6..10]
#=>
# <Daru::Vector:99448510 @name = nil @size = 9 >
#   nil
# 0   1
# 1   2
# 2   3
# 3   4
# 4   6
# 5   7
# 6   8
# 7   9
# 8  10


# File 'lib/daru/vector.rb', line 177

def self.[](*args)
  values = []
  args.each do |a|
    case a
    when Array
      values.concat a.flatten
    when Daru::Vector
      values.concat a.to_a
    when Range
      values.concat a.to_a
    else
      values << a
    end
  end
  Daru::Vector.new(values)
end
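
The argument handling above can be tried on plain Arrays. This is a minimal sketch, assuming a hypothetical helper `collect_values` that returns the flattened values instead of wrapping them in a Daru::Vector (the Daru::Vector branch is left out so the example needs no gems).

```ruby
# Hypothetical helper (not part of daru): gathers values the way
# Daru::Vector.[] does, but returns a plain Array.
def collect_values(*args)
  args.each_with_object([]) do |arg, values|
    case arg
    when Array then values.concat(arg.flatten) # nested arrays are flattened
    when Range then values.concat(arg.to_a)    # ranges are expanded
    else values << arg                         # anything else is appended as-is
    end
  end
end

p collect_values(1, 2, 3, 4, 6..10)
# => [1, 2, 3, 4, 6, 7, 8, 9, 10]
```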

._load(data) ⇒ Object

:nodoc:



# File 'lib/daru/vector.rb', line 1138

def self._load(data) # :nodoc:
  h = Marshal.load(data)
  Daru::Vector.new(h[:data],
    index: h[:index],
    name: h[:name], metadata: h[:metadata],
    dtype: h[:dtype], missing_values: h[:missing_values])
end

.new_with_size(n, opts = {}, &block) ⇒ Object

Create a new vector by specifying the size and an optional value and block to generate values.

Description

The new_with_size class method lets you create a Daru::Vector by specifying the size as the argument. The optional block, if supplied, is run once for populating each element in the Vector.

The result of each run of the block is the value that is ultimately assigned to that position in the Vector.

Options

  • :value - The value assigned to every element when no block is given. All other options are the same as for .new.



# File 'lib/daru/vector.rb', line 140

def self.new_with_size n, opts={}, &block
  value = opts[:value]
  opts.delete :value
  if block
    Daru::Vector.new Array.new(n) { |i| block.call(i) }, opts
  else
    Daru::Vector.new Array.new(n) { value }, opts
  end
end
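
The two ways of filling the values (a fixed value or a block called with each position) come down to Array.new with a block. A minimal sketch, assuming a hypothetical helper `values_with_size` that returns a plain Array rather than a Daru::Vector:

```ruby
# Hypothetical helper (not part of daru): builds the values that
# new_with_size would pass on to Daru::Vector.new.
def values_with_size(n, value: nil, &block)
  if block
    Array.new(n) { |i| block.call(i) } # block receives the position
  else
    Array.new(n) { value }             # same value at every position
  end
end

p values_with_size(3, value: 0)     # => [0, 0, 0]
p values_with_size(4) { |i| i * i } # => [0, 1, 4, 9]
```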

Instance Method Details

#==(other) ⇒ Object

Two vectors are equal if they have the exact same index values corresponding with the exact same elements. Name is ignored.



# File 'lib/daru/vector.rb', line 287

def == other
  case other
  when Daru::Vector
    @index == other.index && @size == other.size &&
      @index.all? { |index| self[index] == other[index] }
  else
    super
  end
end

#[](*input_indexes) ⇒ Object

Get one or more elements with specified index or a range.

Usage

# For vectors employing single layer Index

v[:one, :two] # => Daru::Vector with indexes :one and :two
v[:one]       # => Single element
v[:one..:three] # => Daru::Vector with indexes :one, :two and :three

# For vectors employing hierarchical multi index


# File 'lib/daru/vector.rb', line 205

def [](*input_indexes)
  # Get a proper index object
  indexes = @index[*input_indexes]

  # If one object is asked return it
  return @data[indexes] if indexes.is_a? Numeric

  # Form a new Vector using indexes and return it
  Daru::Vector.new(
    indexes.map { |loc| @data[@index[loc]] },
    name: @name, metadata: @metadata.dup, index: indexes.conform(input_indexes), dtype: @dtype
  )
end

#[]=(*location, value) ⇒ Object

Just like in Hashes, you can specify the index label of the Daru::Vector and assign an element at that place in the Daru::Vector.

Usage

v = Daru::Vector.new([1,2,3], index: [:a, :b, :c])
v[:a] = 999
#=>
##<Daru::Vector:90257920 @name = nil @size = 3 >
#    nil
#  a 999
#  b   2
#  c   3


# File 'lib/daru/vector.rb', line 232

def []=(*location, value)
  cast(dtype: :array) if value.nil? && dtype != :array

  @possibly_changed_type = true if @type == :object  && (value.nil? ||
    value.is_a?(Numeric))
  @possibly_changed_type = true if @type == :numeric && (!value.is_a?(Numeric) &&
    !value.nil?)

  pos = @index[*location]

  if pos.is_a?(Numeric)
    @data[pos] = value
  else
    begin
      pos.each { |tuple| self[tuple] = value }
    rescue NoMethodError
      raise IndexError, "Specified index #{pos.inspect} does not exist."
    end
  end

  set_size
  set_missing_positions unless Daru.lazy_update
end

#_dump ⇒ Object

:nodoc:

# File 'lib/daru/vector.rb', line 1127

def _dump(*) # :nodoc:
  Marshal.dump(
    data:           @data.to_a,
    dtype:          @dtype,
    name:           @name,
    metadata:       @metadata,
    index:          @index,
    missing_values: @missing_values
  )
end

#all?(&block) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/daru/vector.rb', line 514

def all? &block
  @data.data.all?(&block)
end

#any?(&block) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/daru/vector.rb', line 510

def any? &block
  @data.data.any?(&block)
end

#bootstrap(estimators, nr, s = nil) ⇒ Object

Bootstrap

Generate nr resamples (with replacement) of size s from the vector, computing each estimate from estimators over each resample. estimators can be:

a) A Hash with variable names as keys and lambdas as values

a.bootstrap({log_s2: lambda { |v| Math.log(v.variance) }}, 1000)

b) An Array with the names of the methods to bootstrap

a.bootstrap([:mean, :sd], 1000)

c) A single method to bootstrap

a.bootstrap(:mean, 1000)

If s is nil, it defaults to the vector size.

Returns a DataFrame where each vector is a vector of length nr containing the computed resample estimates.



# File 'lib/daru/vector.rb', line 987

def bootstrap(estimators, nr, s=nil)
  s ||= size
  h_est, es, bss = prepare_bootstrap(estimators)

  nr.times do
    bs = sample_with_replacement(s)
    es.each do |estimator|
      bss[estimator].push(h_est[estimator].call(bs))
    end
  end

  es.each do |est|
    bss[est] = Daru::Vector.new bss[est]
  end

  Daru::DataFrame.new bss
end
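
The resampling loop can be sketched on plain Arrays. `bootstrap_estimates` below is a hypothetical helper, not the library method: it accepts only the Hash form of estimators, returns a Hash of Arrays rather than a DataFrame, and takes an `rng:` keyword (an addition for this example) so that runs can be made repeatable.

```ruby
# Hypothetical helper (not part of daru): nr resamples of size s, drawn with
# replacement, with every estimator applied to every resample.
def bootstrap_estimates(data, estimators, nr, s = data.size, rng: Random.new)
  results = estimators.keys.map { |name| [name, []] }.to_h

  nr.times do
    # Sampling with replacement: each draw is independent of the others.
    resample = Array.new(s) { data.sample(random: rng) }
    estimators.each { |name, fn| results[name] << fn.call(resample) }
  end

  results
end

mean = ->(xs) { xs.sum.to_f / xs.size }
est  = bootstrap_estimates([1, 2, 3, 4, 5], { mean: mean }, 200, rng: Random.new(42))
puts est[:mean].size # one estimate per resample
```

The spread of `est[:mean]` then serves as an empirical approximation of the sampling variability of the mean.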

#cast(opts = {}) ⇒ Object

Cast a vector to a new data type.

Options

  • :dtype - :array for Ruby Array, :nmatrix for NMatrix, :gsl for GSL::Vector.

Raises:

  • (ArgumentError)


# File 'lib/daru/vector.rb', line 448

def cast opts={}
  dt = opts[:dtype]
  raise ArgumentError, "Unsupported dtype #{opts[:dtype]}" unless
    dt == :array || dt == :nmatrix || dt == :gsl

  @data = cast_vector_to dt unless @dtype == dt
end

#clone_structure ⇒ Object

Copies the structure of the vector (i.e. the index, size, etc.) and fills all values with nils.

# File 'lib/daru/vector.rb', line 1114

def clone_structure
  Daru::Vector.new(([nil]*@size), name: @name, metadata: @metadata.dup, index: @index.dup)
end

#concat(element, index) ⇒ Object Also known as: push, <<

Append an element to the vector by specifying the element and index

Raises:

  • (IndexError)


# File 'lib/daru/vector.rb', line 431

def concat element, index
  raise IndexError, 'Expected new unique index' if @index.include? index

  @index |= [index]
  @data[@index[index]] = element

  set_size
  set_missing_positions unless Daru.lazy_update
end

#daru_vector ⇒ Object Also known as: dv

# File 'lib/daru/vector.rb', line 1146

def daru_vector(*)
  self
end

#db_type ⇒ Object

Returns the database type for the vector, according to its content.

# File 'lib/daru/vector.rb', line 1097

def db_type
  # first, detect any character not number
  if @data.find { |v| v.to_s=~/\d{2,2}-\d{2,2}-\d{4,4}/ } ||
     @data.find { |v| v.to_s=~/\d{4,4}-\d{2,2}-\d{2,2}/ }

    return 'DATE'
  elsif @data.find { |v| v.to_s=~/[^0-9e.-]/ }
    return 'VARCHAR (255)'
  elsif @data.find { |v| v.to_s=~/\./ }
    return 'DOUBLE'
  else
    return 'INTEGER'
  end
end
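
The order of the checks matters: dates first, then any character outside digits, `e`, `.` and `-`, then a decimal point. The following sketch reproduces those checks on a plain Array. `guess_db_type` is a hypothetical name, and the sketch uses `any?` where the listing uses `find`.

```ruby
# Hypothetical helper (not part of daru): same regular-expression checks as
# #db_type, applied to the string form of each value.
def guess_db_type(values)
  strings = values.map(&:to_s)

  if strings.any? { |s| s =~ /\d{2}-\d{2}-\d{4}/ || s =~ /\d{4}-\d{2}-\d{2}/ }
    'DATE'
  elsif strings.any? { |s| s =~ /[^0-9e.-]/ } # any non-numeric character
    'VARCHAR (255)'
  elsif strings.any? { |s| s =~ /\./ }        # a decimal point
    'DOUBLE'
  else
    'INTEGER'
  end
end

p guess_db_type([1, 2, 3])          # => "INTEGER"
p guess_db_type([1.5, 2])           # => "DOUBLE"
p guess_db_type(['a', 1])           # => "VARCHAR (255)"
p guess_db_type(['2015-01-31', 3])  # => "DATE"
```

Note that a single date-like string is enough to classify the whole collection as DATE, because the checks look for any match rather than requiring every value to match.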

#delete(element) ⇒ Object

Delete an element by value



# File 'lib/daru/vector.rb', line 457

def delete element
  delete_at index_of(element)
end

#delete_at(index) ⇒ Object

Delete element by index



# File 'lib/daru/vector.rb', line 462

def delete_at index
  @data.delete_at @index[index]
  @index = Daru::Index.new(@index.to_a - [index])

  set_size
  set_missing_positions unless Daru.lazy_update
end

#delete_if ⇒ Object

Delete an element if block returns true. Destructive.

# File 'lib/daru/vector.rb', line 593

def delete_if
  return to_enum(:delete_if) unless block_given?

  keep_e = []
  keep_i = []
  each_with_index do |n, i|
    unless yield(n)
      keep_e << n
      keep_i << i
    end
  end

  @data = cast_vector_to @dtype, keep_e
  @index = Daru::Index.new(keep_i)
  set_missing_positions unless Daru.lazy_update
  set_size

  self
end

#detach_index ⇒ Object

# File 'lib/daru/vector.rb', line 777

def detach_index
  Daru::DataFrame.new(
    index: @index.to_a,
    values: @data.to_a
  )
end

#dup ⇒ Object

Duplicate elements and indexes

# File 'lib/daru/vector.rb', line 968

def dup
  Daru::Vector.new @data.dup, name: @name, metadata: @metadata.dup, index: @index.dup
end

#each(&block) ⇒ Object



# File 'lib/daru/vector.rb', line 15

def each(&block)
  return to_enum(:each) unless block_given?

  @data.each(&block)
  self
end

#each_index(&block) ⇒ Object



# File 'lib/daru/vector.rb', line 22

def each_index(&block)
  return to_enum(:each_index) unless block_given?

  @index.each(&block)
  self
end

#each_with_index ⇒ Object

# File 'lib/daru/vector.rb', line 29

def each_with_index
  return to_enum(:each_with_index) unless block_given?

  @index.each { |i| yield(self[i], i) }
  self
end

#empty? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/daru/vector.rb', line 420

def empty?
  @index.empty?
end

#exists?(value) ⇒ Boolean

Returns true if the value passed actually exists and is not marked as a missing value.

Returns:

  • (Boolean)


# File 'lib/daru/vector.rb', line 572

def exists? value
  !@missing_values.key?(self[index_of(value)])
end

#has_index?(index) ⇒ Boolean

Returns true if an index exists

Returns:

  • (Boolean)


# File 'lib/daru/vector.rb', line 795

def has_index? index
  @index.include? index
end

#has_missing_data? ⇒ Boolean Also known as: flawed?

Reports whether missing data is present in the Vector.

Returns:

  • (Boolean)

# File 'lib/daru/vector.rb', line 425

def has_missing_data?
  !missing_positions.empty?
end

#head(q = 10) ⇒ Object



# File 'lib/daru/vector.rb', line 412

def head q=10
  self[0..(q-1)]
end

#in(other) ⇒ Object

Comparator for checking if any of the elements in other exist in self.

Examples:

Usage of `in`.

vector = Daru::Vector.new([1,2,3,4,5])
vector.where(vector.in([3,5]))
#=>
##<Daru::Vector:82215960 @name = nil @size = 2 >
#    nil
#  2   3
#  4   5

Parameters:

  • other (Array, Daru::Vector)

    A collection which has elements that need to be checked for in self.



# File 'lib/daru/vector.rb', line 363

def in other
  other = Hash[other.zip(Array.new(other.size, 0))]
  Daru::Core::Query::BoolArray.new(
    @data.each_with_object([]) do |d, memo|
      memo << (other.key?(d) ? true : false)
    end
  )
end
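
The method builds a Hash from other so that each membership test is a constant-time key lookup, then produces one boolean per element. A minimal sketch of the same idea on plain Arrays follows; `membership_mask` is a hypothetical helper, and the selection step stands in for what `where` does with the resulting BoolArray.

```ruby
# Hypothetical helper (not part of daru): one boolean per element of data,
# true where the element also occurs in other.
def membership_mask(data, other)
  lookup = other.each_with_object({}) { |item, seen| seen[item] = true }
  data.map { |d| lookup.key?(d) }
end

data = [1, 2, 3, 4, 5]
mask = membership_mask(data, [3, 5])
p mask # => [false, false, true, false, true]

# Keep only the flagged elements, remembering their original positions.
selected = data.each_with_index
               .select { |_value, pos| mask[pos] }
               .map { |value, pos| [pos, value] }
               .to_h
# selected maps position 2 to 3 and position 4 to 5
```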

#index_of(element) ⇒ Object

Get index of element



# File 'lib/daru/vector.rb', line 493

def index_of element
  case dtype
  when :array then @index.key @data.index { |x| x.eql? element }
  else @index.key @data.index(element)
  end
end

#inspect(spacing = 20, threshold = 15) ⇒ Object

Overrides the original inspect for pretty printing in irb.



# File 'lib/daru/vector.rb', line 903

def inspect spacing=20, threshold=15
  longest =
    [
      @name.to_s.size,
      (@index.to_a.map(&:to_s).map(&:size).max || 0),
      (@data.map(&:to_s).map(&:size).max || 0),
      3 # 'nil'.size
    ].max

  content   = ''
  longest   = spacing if longest > spacing
  name      = @name || 'nil'
  metadata  = @metadata || 'nil'
  formatter = "\n%#{longest}.#{longest}s %#{longest}.#{longest}s"
  content  += "\n#<#{self.class}:#{object_id} @name = #{name} @metadata = #{metadata} @size = #{size} >"

  content += formatter % ['', name]
  @index.each_with_index do |index, num|
    content += formatter % [index.to_s, (self[*index] || 'nil').to_s]
    if num > threshold
      content += formatter % ['...', '...']
      break
    end
  end
  content += "\n"

  content
end

#is_nil? ⇒ Boolean

Returns a vector which has true in the position where the element in self is nil, and false otherwise.

Usage

v = Daru::Vector.new([1,2,4,nil])
v.is_nil?
# =>
#<Daru::Vector:89421000 @name = nil @size = 4 >
#      nil
#  0  false
#  1  false
#  2  false
#  3  true

Returns:

  • (Boolean)


# File 'lib/daru/vector.rb', line 721

def is_nil?
  nil_truth_vector = clone_structure
  @index.each do |idx|
    nil_truth_vector[idx] = self[idx].nil? ? true : false
  end

  nil_truth_vector
end

#jackknife(estimators, k = 1) ⇒ Object

Jackknife

Returns a dataset with jackknife delete-k estimators. estimators can be:

a) A Hash with variable names as keys and lambdas as values

a.jackknife({log_s2: lambda { |v| Math.log(v.variance) }})

b) An Array with the names of the methods to jackknife

a.jackknife([:mean, :sd])

c) A single method to jackknife

a.jackknife(:mean)

k represents the block size for the block jackknife. It defaults to 1, the classic delete-one jackknife.

Returns a dataset where each vector is a vector of length size/k containing the computed jackknife estimates.

Reference:

  • Sawyer, S. (2005). Resampling Data: Using a Statistical Jackknife.



# File 'lib/daru/vector.rb', line 1022

def jackknife(estimators, k=1)
  raise "n should be divisible by k:#{k}" unless size % k==0

  nb = (size / k).to_i
  h_est, es, ps = prepare_bootstrap(estimators)

  est_n = es.map { |v| [v, h_est[v].call(self)] }.to_h

  nb.times do |i|
    other = @data.dup
    other.slice!(i*k, k)
    other = Daru::Vector.new other

    es.each do |estimator|
      # Add pseudovalue
      ps[estimator].push(
        nb * est_n[estimator] - (nb-1) * h_est[estimator].call(other)
      )
    end
  end

  es.each do |est|
    ps[est] = Daru::Vector.new ps[est]
  end
  Daru::DataFrame.new ps
end
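
The pseudovalue computed in the loop above is nb * (estimate on all data) - (nb - 1) * (estimate with block i removed). A minimal sketch on plain Arrays, assuming a hypothetical helper `jackknife_pseudovalues` that takes a single estimator as a lambda and returns an Array instead of a DataFrame:

```ruby
# Hypothetical helper (not part of daru): delete-k jackknife pseudovalues
# for one estimator.
def jackknife_pseudovalues(data, estimator, k = 1)
  raise ArgumentError, "n should be divisible by k:#{k}" unless (data.size % k).zero?

  nb   = data.size / k          # number of blocks
  full = estimator.call(data)   # estimate on the complete data

  Array.new(nb) do |i|
    rest = data.dup
    rest.slice!(i * k, k)       # drop block i
    nb * full - (nb - 1) * estimator.call(rest)
  end
end

mean = ->(xs) { xs.sum.to_f / xs.size }
p jackknife_pseudovalues([2, 4, 6, 8], mean)
# for the mean with k = 1, each pseudovalue is (up to rounding) the
# corresponding original observation
p jackknife_pseudovalues([2, 4, 6, 8], mean, 2)
```

The mean is a useful sanity check because its delete-one pseudovalues reduce algebraically to the observations themselves.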

#keep_if ⇒ Object

Keep an element if block returns true. Destructive.

# File 'lib/daru/vector.rb', line 614

def keep_if
  return to_enum(:keep_if) unless block_given?

  keep_e = []
  keep_i = []
  each_with_index do |n, i|
    if yield(n)
      keep_e << n
      keep_i << i
    end
  end

  @data = cast_vector_to @dtype, keep_e
  @index = Daru::Index.new(keep_i)
  set_missing_positions unless Daru.lazy_update
  set_size

  self
end

#lag(k = 1) ⇒ Object

Lags the series by k periods.

The convention is to set the oldest observations (the first ones in the series) to nil so that the size of the lagged series is the same as the original.

Usage:

ts = Daru::Vector.new((1..10).map { rand })
        # => [0.69, 0.23, 0.44, 0.71, ...]

ts.lag   # => [nil, 0.69, 0.23, 0.44, ...]
ts.lag(2) # => [nil, nil, 0.69, 0.23, ...]


# File 'lib/daru/vector.rb', line 767

def lag k=1
  return dup if k == 0

  dat = @data.to_a.dup
  (dat.size - 1).downto(k) { |i| dat[i] = dat[i - k] }
  (0...k).each { |i| dat[i] = nil }

  Daru::Vector.new(dat, index: @index, name: @name, metadata: @metadata.dup)
end
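
For a deterministic illustration of the shift, here is a minimal sketch on a plain Array. `lag_values` is a hypothetical helper; unlike the listing above, it clamps k to the length of the data, which is a simplification made for this example.

```ruby
# Hypothetical helper (not part of daru): shift values k places towards the
# end, filling the first k places with nil and keeping the length unchanged.
def lag_values(values, k = 1)
  return values.dup if k.zero?

  shift = [k, values.size].min
  Array.new(shift) + values[0, values.size - shift]
end

p lag_values([10, 20, 30, 40])    # => [nil, 10, 20, 30]
p lag_values([10, 20, 30, 40], 2) # => [nil, nil, 10, 20]
```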

#map!(&block) ⇒ Object



# File 'lib/daru/vector.rb', line 36

def map!(&block)
  return to_enum(:map!) unless block_given?
  @data.map!(&block)
  update
  self
end

#missing_values ⇒ Object

The values to be treated as 'missing'. nil is the default missing type. To set missing values see the missing_values= method.

# File 'lib/daru/vector.rb', line 258

def missing_values
  @missing_values.keys
end

#missing_values=(values) ⇒ Object

Assign an Array to treat certain values as 'missing'.

Usage

v = Daru::Vector.new [1,2,3,4,5]
v.missing_values = [3]
v.update
v.missing_positions
#=> [2]


# File 'lib/daru/vector.rb', line 271

def missing_values= values
  set_missing_values values
  set_missing_positions unless Daru.lazy_update
end
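
How the missing positions follow from the missing values can be sketched on plain Arrays. `missing_positions_for` is a hypothetical helper; the private methods set_missing_values and set_missing_positions are not shown on this page, so the sketch assumes that nil always counts as missing in addition to the values supplied.

```ruby
# Hypothetical helper (not part of daru). Assumption: nil is always treated
# as missing, on top of whatever values are passed in.
def missing_positions_for(data, index, missing_values = [])
  lookup = ([nil] + missing_values).each_with_object({}) { |v, h| h[v] = true }
  index.zip(data).select { |_label, value| lookup.key?(value) }.map(&:first)
end

p missing_positions_for([1, 2, 3, 4, 5], [0, 1, 2, 3, 4], [3]) # => [2]
p missing_positions_for([1, nil, 3], [:a, :b, :c])             # => [:b]
```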

#n_valid ⇒ Object

Number of non-missing elements.

# File 'lib/daru/vector.rb', line 790

def n_valid
  @size - missing_positions.size
end

#not_nil? ⇒ Boolean

Opposite of #is_nil?

Returns:

  • (Boolean)

# File 'lib/daru/vector.rb', line 731

def not_nil?
  nil_truth_vector = clone_structure
  @index.each do |idx|
    nil_truth_vector[idx] = self[idx].nil? ? false : true
  end

  nil_truth_vector
end

#only_missing(as_a = :vector) ⇒ Object

Returns a Vector containing only missing data (preserves indexes).



# File 'lib/daru/vector.rb', line 1076

def only_missing as_a=:vector
  if as_a == :vector
    self[*missing_positions]
  elsif as_a == :array
    self[*missing_positions].to_a
  end
end

#only_numerics ⇒ Object

Returns a Vector with only numerical data. Missing data is included but non-Numeric objects are excluded. Preserves index.

# File 'lib/daru/vector.rb', line 1086

def only_numerics
  numeric_indexes = []

  each_with_index do |v, i|
    numeric_indexes << i if v.is_a?(Numeric) || @missing_values.key?(v)
  end

  self[*numeric_indexes]
end

#only_valid(as_a = :vector, duplicate = true) ⇒ Object

Creates a new vector consisting only of non-nil data

Arguments

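  • as_a - Passing :array will return only the elements as an Array. Otherwise will return a Daru::Vector.

  • duplicate - If no missing data is found in the vector, setting this to false will return the same vector. Otherwise, a duplicate will be returned irrespective of the presence of missing data.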



# File 'lib/daru/vector.rb', line 1060

def only_valid as_a=:vector, duplicate=true
  return dup if !has_missing_data? && as_a == :vector && duplicate
  return self if !has_missing_data? && as_a == :vector && !duplicate
  return to_a if !has_missing_data? && as_a != :vector

  new_index = @index.to_a - missing_positions
  new_vector = new_index.map do |idx|
    self[idx]
  end

  return new_vector if as_a != :vector

  Daru::Vector.new new_vector, index: new_index, name: @name, metadata: @metadata.dup, dtype: dtype
end

#recode(dt = nil, &block) ⇒ Object

Like map, but returns a Daru::Vector with the returned values.



# File 'lib/daru/vector.rb', line 577

def recode dt=nil, &block
  return to_enum(:recode) unless block_given?

  dup.recode! dt, &block
end

#recode!(dt = nil, &block) ⇒ Object

Destructive version of #recode.



# File 'lib/daru/vector.rb', line 584

def recode! dt=nil, &block
  return to_enum(:recode!) unless block_given?

  @data.map!(&block).data
  @data = cast_vector_to(dt || @dtype)
  self
end

#reindex(new_index) ⇒ Object

Create a new vector with a different index, and preserve the indexing of current elements.



# File 'lib/daru/vector.rb', line 934

def reindex new_index
  vector = Daru::Vector.new([], index: new_index, name: @name, metadata: @metadata.dup)

  new_index.each do |idx|
    vector[idx] = @index.include?(idx) ? self[idx] : nil
  end

  vector
end
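
The effect of reindexing (existing labels keep their values, new labels receive nil, and the result follows the order of the new index) can be sketched with a plain Hash standing in for the vector. `reindex_hash` is a hypothetical helper, not part of daru.

```ruby
# Hypothetical helper (not part of daru): a Hash of label => value stands in
# for the vector.
def reindex_hash(labelled, new_index)
  new_index.map { |label| [label, labelled.fetch(label, nil)] }.to_h
end

result = reindex_hash({ a: 1, b: 2, c: 3 }, [:c, :a, :z])
# result maps :c to 3, :a to 1 and :z to nil; :b is dropped
p result.keys   # => [:c, :a, :z]
p result.values # => [3, 1, nil]
```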

#rename(new_name) ⇒ Object

Give the vector a new name

Parameters:

  • new_name (Symbol)

    The new name.



# File 'lib/daru/vector.rb', line 958

def rename new_name
  if new_name.is_a?(Numeric)
    @name = new_name
    return
  end

  @name = new_name
end

#replace_nils(replacement) ⇒ Object

Non-destructive version of #replace_nils!



# File 'lib/daru/vector.rb', line 785

def replace_nils replacement
  dup.replace_nils!(replacement)
end

#replace_nils!(replacement) ⇒ Object

Replace all nils in the vector with the value passed as an argument. Destructive. See #replace_nils for non-destructive version

Arguments

  • replacement - The value which should replace all nils



# File 'lib/daru/vector.rb', line 746

def replace_nils! replacement
  missing_positions.each do |idx|
    self[idx] = replacement
  end

  self
end

#report_building(b) ⇒ Object



# File 'lib/daru/vector.rb', line 873

def report_building b
  b.section(name: name) do |s|
    s.text "n :#{size}"
    s.text "n valid:#{n_valid}"
    if @type == :object
      s.text  "factors: #{factors.to_a.join(',')}"
      s.text  "mode: #{mode}"

      s.table(name: 'Distribution') do |t|
        frequencies.sort_by(&:to_s).each do |k,v|
          key = @index.include?(k) ? @index[k] : k
          t.row [key, v, ('%0.2f%%' % (v.quo(n_valid)*100))]
        end
      end
    end

    s.text "median: #{median}" if @type==:numeric || @type==:numeric
    if @type==:numeric
      s.text 'mean: %0.4f' % mean
      if sd
        s.text 'std.dev.: %0.4f' % sd
        s.text 'std.err.: %0.4f' % se
        s.text 'skew: %0.4f' % skew
        s.text 'kurtosis: %0.4f' % kurtosis
      end
    end
  end
end

#reset_index! ⇒ Object

# File 'lib/daru/vector.rb', line 702

def reset_index!
  @index = Daru::Index.new(Array.new(size) { |i| i })
  self
end

#save(filename) ⇒ Object

Save the vector to a file

Arguments

  • filename - Path of file where the vector is to be saved



# File 'lib/daru/vector.rb', line 1123

def save filename
  Daru::IO.save self, filename
end

#sort(opts = {}) ⇒ Object

Sorts a vector according to its values and returns a new vector, preserving the index label of each element. Defaults to ascending order. If a block is given, it is used as the comparator: it receives two values and should return -1, 0 or 1, as in the usage below. Missing values are grouped together at one end of the vector (the default comparator in the source below ranks nil before non-nil values). Default sort algorithm is quick sort.

Options

  • :ascending - if false, will sort in descending order. Defaults to true.

  • :type - Specify the sorting algorithm. Only supports quick_sort for now.

Usage

v = Daru::Vector.new ["My first guitar", "jazz", "guitar"]
# Say you want to sort these strings by length.
v.sort(ascending: false) { |a,b| a.length <=> b.length }


# File 'lib/daru/vector.rb', line 534

def sort opts={}
  opts = {
    ascending: true
  }.merge(opts)

  vector_index = @data.each_with_index
  vector_index =
    if block_given?
      vector_index.sort { |a,b| yield(a[0], b[0]) }
    else
      vector_index.sort { |(av, ai), (bv, bi)|
        if !av.nil? && !bv.nil?
          av <=> bv
        elsif av.nil? && bv.nil?
          ai <=> bi
        elsif av.nil?
          opts[:ascending] ? -1 : 1
        else
          opts[:ascending] ? 1 : -1
        end
      }
    end
  vector_index.reverse! unless opts[:ascending]
  vector, index = vector_index.transpose
  old_index = @index.to_a
  index = index.map { |i| old_index[i] }

  Daru::Vector.new(vector, index: index, name: @name, metadata: @metadata.dup, dtype: @dtype)
end
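
The default comparator and the way index labels travel with their values can be followed on plain Arrays. The sketch below copies the comparator from the listing above into a hypothetical helper, `sort_with_index`, which returns label/value pairs instead of a Daru::Vector; the block form is omitted.

```ruby
# Hypothetical helper (not part of daru): sorts values together with their
# positions, then maps positions back to index labels.
def sort_with_index(values, index, ascending: true)
  pairs = values.each_with_index.sort do |(av, ai), (bv, bi)|
    if !av.nil? && !bv.nil?
      av <=> bv                 # ordinary comparison
    elsif av.nil? && bv.nil?
      ai <=> bi                 # two nils: keep positional order
    elsif av.nil?
      ascending ? -1 : 1
    else
      ascending ? 1 : -1
    end
  end
  pairs.reverse! unless ascending
  pairs.map { |value, pos| [index[pos], value] }
end

p sort_with_index([22, 4, nil, 111, nil, 2], [:a, :b, :c, :d, :e, :f])
# => [[:c, nil], [:e, nil], [:f, 2], [:b, 4], [:a, 22], [:d, 111]]
```

With this comparator the nil entries come out first; passing `ascending: false` reverses the non-nil values and still leaves the nils at the front.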

#sorted_data(&block) ⇒ Object

Just sort the data and get an Array in return using Enumerable#sort. Non-destructive.



# File 'lib/daru/vector.rb', line 566

def sorted_data &block
  @data.to_a.sort(&block)
end

#split_by_separator(sep = ',') ⇒ Object

Returns a Hash of Vectors, one per distinct value found after splitting each element on the separator. Each Vector holds 1 where the element contains that value and 0 where it does not. Example:

a = Daru::Vector.new(["a,b","c,d","a,b"])
a.split_by_separator
# => {"a" => Daru::Vector with data [1, 0, 1],
#     "b" => Daru::Vector with data [1, 0, 1],
#     "c" => Daru::Vector with data [0, 1, 0],
#     "d" => Daru::Vector with data [0, 1, 0]}


# File 'lib/daru/vector.rb', line 675

def split_by_separator sep=','
  split_data = splitted sep
  factors = split_data.flatten.uniq.compact

  out = factors.map { |x| [x, []] }.to_h

  split_data.each do |r|
    if r.nil?
      factors.each do |f|
        out[f].push(nil)
      end
    else
      factors.each do |f|
        out[f].push(r.include?(f) ? 1 : 0)
      end
    end
  end

  out.map { |k, v| [k, Daru::Vector.new(v)] }.to_h
end
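
The indicator coding can be reproduced on plain Arrays. `split_to_indicators` is a hypothetical helper that folds the splitting step and the 0/1 coding into one function and returns Arrays rather than Daru::Vector objects; for brevity it assumes every non-nil element is a String.

```ruby
# Hypothetical helper (not part of daru): one 0/1 indicator Array per
# distinct token; nil elements stay nil in every indicator.
def split_to_indicators(strings, sep = ',')
  split   = strings.map { |s| s.nil? ? nil : s.split(sep) }
  factors = split.flatten.uniq.compact

  factors.map do |factor|
    column = split.map do |row|
      next nil if row.nil?
      row.include?(factor) ? 1 : 0
    end
    [factor, column]
  end.to_h
end

indicators = split_to_indicators(["a,b", "c,d", "a,b"])
p indicators["a"] # => [1, 0, 1]
p indicators["c"] # => [0, 1, 0]
```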

#split_by_separator_freq(sep = ',') ⇒ Object



# File 'lib/daru/vector.rb', line 696

def split_by_separator_freq(sep=',')
  split_by_separator(sep).map do |k, v|
    [k, v.inject { |s,x| s+x.to_i }]
  end.to_h
end

#splitted(sep = ',') ⇒ Object

Returns an Array with each element split by a separator.

a = Daru::Vector.new(["a,b","c,d","a,b","d"])
a.splitted
# => [["a","b"],["c","d"],["a","b"],["d"]]


# File 'lib/daru/vector.rb', line 650

def splitted sep=','
  @data.map do |s|
    if s.nil?
      nil
    elsif s.respond_to? :split
      s.split sep
    else
      [s]
    end
  end
end

#summary(method = :to_text) ⇒ Object

Create a summary of the Vector using Report Builder.



# File 'lib/daru/vector.rb', line 869

def summary(method=:to_text)
  ReportBuilder.new(no_title: true).add(self).send(method)
end

#tail(q = 10) ⇒ Object



# File 'lib/daru/vector.rb', line 416

def tail q=10
  self[(@size - q)..(@size-1)]
end

#to_a ⇒ Object

Return an array

# File 'lib/daru/vector.rb', line 827

def to_a
  @data.to_a
end

#to_gsl ⇒ Object

If dtype != gsl, will convert data to GSL::Vector with to_a. Otherwise returns the stored GSL::Vector object.

Raises:

  • (NoMethodError)

# File 'lib/daru/vector.rb', line 816

def to_gsl
  raise NoMethodError, 'Install gsl-nmatrix for access to this functionality.' unless Daru.has_gsl?
  dtype == :gsl ? @data.data : GSL::Vector.alloc(only_valid(:array).to_a)
end

#to_h ⇒ Object

Convert to hash (explicit). Hash keys are indexes and values are the corresponding elements.

# File 'lib/daru/vector.rb', line 822

def to_h
  @index.map { |index| [index, self[index]] }.to_h
end

#to_html(threshold = 30) ⇒ Object

Convert to html for iruby



# File 'lib/daru/vector.rb', line 837

def to_html threshold=30
  name = @name || 'nil'
  html = '<table>' \
    '<tr>' \
      '<th colspan="2">' \
        "Daru::Vector:#{object_id} " + " size: #{size}" \
      '</th>' \
    '</tr>'
  html += '<tr><th> </th><th>' + name.to_s + '</th></tr>'
  @index.each_with_index do |index, num|
    html += '<tr><td>' + index.to_s + '</td>' + '<td>' + self[index].to_s + '</td></tr>'

    next if num <= threshold
    html += '<tr><td>...</td><td>...</td></tr>'

    last_index = @index.to_a.last
    html += '<tr>' \
              '<td>' + last_index.to_s       + '</td>' \
              '<td>' + self[last_index].to_s + '</td>' \
            '</tr>'
    break
  end
  html += '</table>'

  html
end
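
The truncation loop in the listing above can be isolated from the HTML string building. The sketch below copies only the row-selection logic onto an Array of [label, value] pairs; `truncated_rows` is a hypothetical helper name. Note that, as written, the loop emits rows up to and including position threshold + 1 before adding the ellipsis and the final row.

```ruby
# Hypothetical stand-in for the row selection inside #to_html.
def truncated_rows(pairs, threshold = 30)
  rows = []
  pairs.each_with_index do |pair, num|
    rows << pair
    next if num <= threshold # keep going while within the threshold

    rows << ['...', '...']   # ellipsis marker row
    rows << pairs.last       # always finish with the last row
    break
  end
  rows
end

pairs = (0...10).map { |i| [i, i * i] }
rows  = truncated_rows(pairs, 2)
# rows holds positions 0 to 3, the ellipsis row, then [9, 81]
```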

#to_json ⇒ Object

Converts the Hash returned by to_h to JSON.



# File 'lib/daru/vector.rb', line 832

def to_json(*)
  to_h.to_json
end
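
Because the JSON is produced from a Hash, index labels become JSON object keys. The sketch below uses Ruby's standard json library on a plain Hash (a stand-in for the result of to_h, with made-up values) to show that Symbol labels come back as Strings after a round trip.

```ruby
require 'json'

# Stand-in for the Hash that to_h would return (hypothetical values).
as_hash = { a: 1, b: nil, 'c' => 2.5 }

json       = as_hash.to_json   # JSON object keys are always strings
round_trip = JSON.parse(json)  # so Symbol keys return as Strings

puts json # prints {"a":1,"b":null,"c":2.5}
```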

#to_matrix(axis = :horizontal) ⇒ Object

Convert Vector to a horizontal or vertical Ruby Matrix.

Arguments

  • axis - Specify whether you want a :horizontal or a :vertical matrix.



# File 'lib/daru/vector.rb', line 804

def to_matrix axis=:horizontal
  if axis == :horizontal
    Matrix[to_a]
  elsif axis == :vertical
    Matrix.columns([to_a])
  else
    raise ArgumentError, "axis should be either :horizontal or :vertical, not #{axis}"
  end
end
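
The two orientations correspond to two constructors of Ruby's Matrix class, the same calls used in the listing above. The sketch applies them to a plain Array in place of the vector's to_a; the sample values are assumptions.

```ruby
require 'matrix'

values = [1, 2, 3] # stand-in for the result of to_a

horizontal = Matrix[values]           # one row, three columns
vertical   = Matrix.columns([values]) # three rows, one column

puts "#{horizontal.row_count}x#{horizontal.column_count}" # prints 1x3
puts "#{vertical.row_count}x#{vertical.column_count}"     # prints 3x1
```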

#to_REXP ⇒ Object

Wraps the vector's data (via to_a) in an Rserve REXP object.



# File 'lib/daru/extensions/rserve.rb', line 17

def to_REXP # rubocop:disable Style/MethodName
  Rserve::REXP::Wrapper.wrap(to_a)
end

#to_s ⇒ Object



# File 'lib/daru/vector.rb', line 864

def to_s
  to_html
end

#type ⇒ Object

The type of data contained in the vector. Can be :object or :numeric. If the underlying dtype is an NMatrix, this method will return the data type of the NMatrix object.

The scan that determines the type is deferred until the type is requested. The result is then cached and recomputed only if the data may have changed since the last scan.



# File 'lib/daru/vector.rb', line 476

def type
  return @data.nm_dtype if dtype == :nmatrix

  if @type.nil? || @possibly_changed_type
    @type = :numeric
    each do |e|
      next if e.nil? || e.is_a?(Numeric)
      @type = :object
      break
    end
    @possibly_changed_type = false
  end

  @type
end
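
The classification rule in the listing above (numeric unless some element is neither nil nor a Numeric) can be written as a one-line check. `infer_type` is a hypothetical helper name; the sketch leaves out the caching and the NMatrix branch.

```ruby
# Hypothetical stand-in for the scan performed by #type.
def infer_type(values)
  # nil counts as compatible with :numeric, as in the original loop
  values.all? { |e| e.nil? || e.is_a?(Numeric) } ? :numeric : :object
end

infer_type([1, 2.5, nil]) # returns :numeric
infer_type([1, 'a', 3])   # returns :object
```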

#uniq ⇒ Object

Keep only the unique elements of the vector, along with their indexes.



# File 'lib/daru/vector.rb', line 501

def uniq
  uniq_vector = @data.uniq
  new_index   = uniq_vector.each_with_object([]) do |element, acc|
    acc << index_of(element)
  end

  Daru::Vector.new uniq_vector, name: @name, metadata: @metadata.dup, index: new_index, dtype: @dtype
end
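
A plain-Ruby sketch of the idea: take the unique values, then look up the index label of each one. It assumes that index_of returns the label of the first occurrence of a value, which the listing above does not confirm; the Arrays and sample values are stand-ins.

```ruby
# Stand-ins for a vector's data and index (hypothetical values).
data  = [1, 2, 2, 3, 1]
index = [:a, :b, :c, :d, :e]

unique_values = data.uniq
# Assumption: the kept label is that of the first occurrence of each value.
unique_labels = unique_values.map { |value| index[data.index(value)] }

# unique_values is [1, 2, 3] and unique_labels is [:a, :b, :d]
```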

#update ⇒ Object

Updates the metadata of the vector (i.e. the missing value positions) after assignment, deletion, etc. are complete. This is provided so that time is not wasted rebuilding the metadata each time an element is assigned or deleted. Updating data this way is called lazy loading. To set or unset lazy loading, see the .lazy_update= method.



# File 'lib/daru/vector.rb', line 281

def update
  Daru.lazy_update and set_missing_positions
end

#verify ⇒ Object

Reports all values that don't comply with a condition, given as a block. Returns a Hash that maps the position of each invalid value to the value itself.



# File 'lib/daru/vector.rb', line 636

def verify
  h = {}
  (0...size).each do |i|
    h[i] = @data[i] unless yield(@data[i])
  end

  h
end
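
The same check can be tried on a plain Array. `verify_values` is a hypothetical helper that mirrors the loop above: the block is the condition, and every value for which it returns false or nil is collected under its position.

```ruby
# Hypothetical stand-in for #verify operating on a plain Array.
def verify_values(values)
  invalid = {}
  values.each_with_index do |value, i|
    invalid[i] = value unless yield(value) # keep only failing values
  end
  invalid
end

invalid = verify_values([1, -2, 3, -4]) { |x| x > 0 }
# invalid maps position 1 to -2 and position 3 to -4
```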

#where(bool_arry) ⇒ Object

Return a new vector based on the contents of a boolean array. Use with the comparator methods to obtain meaningful results. See this notebook for a good overview of using #where.

Parameters:

  • bool_arry (Daru::Core::Query::BoolArray, Array<TrueClass, FalseClass>)

    The collection containing the true or false values. Each element in the Vector corresponding to a `true` in the bool_arry will be returned along with its index.



# File 'lib/daru/vector.rb', line 408

def where bool_arry
  Daru::Core::Query.vector_where @data.to_a, @index.to_a, bool_arry, dtype
end
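
The selection that #where describes amounts to boolean masking. The sketch below shows the idea with plain Arrays standing in for the data, the index and the boolean array; it does not call Daru::Core::Query, and all names and values are assumptions for illustration.

```ruby
# Stand-ins for a vector's data and index (hypothetical values).
data  = [2, 4, 6, 8]
index = [:a, :b, :c, :d]

# A comparison produces one true/false entry per element.
mask = data.map { |x| x > 4 }

# Keep each [label, value] pair whose mask entry is true.
selected = index.zip(data, mask)
                .select { |_label, _value, keep| keep }
                .map { |label, value, _keep| [label, value] }
                .to_h
# selected maps :c to 6 and :d to 8
```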