Class: Daru::Vector
- Inherits:
-
Object
- Object
- Daru::Vector
- Includes:
- Maths::Arithmetic::Vector, Maths::Statistics::Vector, Plotting::Vector, Enumerable
- Defined in:
- lib/daru/vector.rb,
lib/daru/extensions/rserve.rb
Instance Attribute Summary collapse
-
#dtype ⇒ Object
readonly
The underlying dtype of the Vector.
-
#index ⇒ Object
The row index.
-
#labels ⇒ Object
Store a hash of labels for values.
-
#missing_positions ⇒ Object
readonly
An Array of the positions in the vector that are being treated as ‘missing’.
-
#name ⇒ Object
readonly
The name of the Daru::Vector.
-
#nm_dtype ⇒ Object
readonly
If the dtype is :nmatrix, this attribute represents the data type of the underlying NMatrix object.
-
#size ⇒ Object
readonly
The total number of elements of the vector.
Class Method Summary collapse
-
.[](*args) ⇒ Object
Create a vector using (almost) any object: Array (flattened), Range (transformed using to_a), Daru::Vector, or Numeric and String values.
-
._load(data) ⇒ Object
:nodoc:.
-
.new_with_size(n, opts = {}, &block) ⇒ Object
Create a new vector by specifying the size and an optional value and block to generate values.
Instance Method Summary collapse
-
#==(other) ⇒ Object
Two vectors are equal if they have the exact same index values corresponding with the exact same elements.
-
#[](*indexes) ⇒ Object
Get one or more elements with specified index or a range.
-
#[]=(*location, value) ⇒ Object
Just like in Hashes, you can specify the index label of the Daru::Vector and assign an element at that place in the Daru::Vector.
-
#_dump(depth) ⇒ Object
:nodoc:.
- #all?(&block) ⇒ Boolean
- #any?(&block) ⇒ Boolean
-
#bootstrap(estimators, nr, s = nil) ⇒ Object
Bootstrap: generates nr resamples (with replacement) of size s from the vector, computing each estimate from estimators over each resample.
-
#cast(opts = {}) ⇒ Object
Cast a vector to a new data type.
-
#clone_structure ⇒ Object
Copies the structure of the vector (i.e. the index, size, etc.) and fills all values with nils.
-
#concat(element, index) ⇒ Object
(also: #push, #<<)
Append an element to the vector by specifying the element and index.
- #daru_vector(*name) ⇒ Object (also: #dv)
-
#db_type(dbs = :mysql) ⇒ Object
Returns the database type for the vector, according to its content.
-
#delete(element) ⇒ Object
Delete an element by value.
-
#delete_at(index) ⇒ Object
Delete element by index.
-
#delete_if(&block) ⇒ Object
Delete an element if block returns true.
- #detach_index ⇒ Object
-
#dup ⇒ Object
Duplicate elements and indexes.
- #each(&block) ⇒ Object
- #each_index(&block) ⇒ Object
- #each_with_index(&block) ⇒ Object
-
#exists?(value) ⇒ Boolean
Returns true if the value passed actually exists or is not marked as a *missing value*.
-
#has_index?(index) ⇒ Boolean
Returns true if an index exists.
-
#has_missing_data? ⇒ Boolean
(also: #flawed?)
Reports whether missing data is present in the Vector.
- #head(q = 10) ⇒ Object
-
#in(other) ⇒ Object
Comparator for checking if any of the elements in other exist in self.
-
#index_of(element) ⇒ Object
Get index of element.
-
#initialize(source, opts = {}) ⇒ Vector
constructor
Create a Vector object.
-
#inspect(spacing = 20, threshold = 15) ⇒ Object
Overrides the original inspect for pretty printing in irb.
-
#is_nil? ⇒ Boolean
Returns a vector which has true in the position where the element in self is nil, and false otherwise.
-
#jackknife(estimators, k = 1) ⇒ Object
Jackknife: returns a dataset with jackknife delete-k estimators. estimators could be: a) a Hash with variable names as keys and lambdas as values, a.jackknife(:log_s2=>lambda {|v| Math.log(v.variance)}); b) an Array with method names to jackknife, a.jackknife([:mean, :sd]); c) a single method to jackknife, a.jackknife(:mean). k represents the block size for block jackknife.
-
#keep_if(&block) ⇒ Object
Keep an element if block returns true.
-
#lag(k = 1) ⇒ Object
Lags the series by k periods.
- #map!(&block) ⇒ Object
- #method_missing(name, *args, &block) ⇒ Object
-
#missing_values ⇒ Object
The values to be treated as ‘missing’.
-
#missing_values=(values) ⇒ Object
Assign an Array to treat certain values as ‘missing’.
-
#n_valid ⇒ Object
Number of non-missing elements.
-
#not_nil? ⇒ Boolean
Opposite of #is_nil?.
-
#only_missing(as_a = :vector) ⇒ Object
Returns a Vector containing only missing data (preserves indexes).
-
#only_numerics ⇒ Object
Returns a Vector with only numerical data.
-
#only_valid(as_a = :vector, duplicate = true) ⇒ Object
Creates a new vector consisting only of non-nil data.
-
#recode(dt = nil, &block) ⇒ Object
Like map, but returns a Daru::Vector with the returned values.
-
#recode!(dt = nil, &block) ⇒ Object
Destructive version of #recode.
-
#reindex(new_index) ⇒ Object
Create a new vector with a different index, and preserve the indexing of current elements.
-
#rename(new_name) ⇒ Object
Give the vector a new name.
-
#replace_nils(replacement) ⇒ Object
Non-destructive version of #replace_nils!.
-
#replace_nils!(replacement) ⇒ Object
Replace all nils in the vector with the value passed as an argument.
- #report_building(b) ⇒ Object
- #reset_index! ⇒ Object
-
#save(filename) ⇒ Object
Save the vector to a file.
-
#sort(opts = {}, &block) ⇒ Object
Sorts a vector according to its values.
-
#sorted_data(&block) ⇒ Object
Just sort the data and get an Array in return using Enumerable#sort.
-
#split_by_separator(sep = ",") ⇒ Object
Returns a hash of Vectors, defined by the different values defined on the fields.
- #split_by_separator_freq(sep = ",") ⇒ Object
-
#splitted(sep = ",") ⇒ Object
Return an Array with the data split by a separator.
-
#summary(method = :to_text) ⇒ Object
Create a summary of the Vector using Report Builder.
- #tail(q = 10) ⇒ Object
-
#to_a ⇒ Object
Return an array.
-
#to_gsl ⇒ Object
If dtype != gsl, will convert data to GSL::Vector with to_a.
-
#to_hash ⇒ Object
Convert to hash.
-
#to_html(threshold = 30) ⇒ Object
Convert to html for iruby.
-
#to_json(*args) ⇒ Object
Convert the hash from to_hash to json.
-
#to_matrix(axis = :horizontal) ⇒ Object
Convert Vector to a horizontal or vertical Ruby Matrix.
- #to_REXP ⇒ Object
- #to_s ⇒ Object
-
#type ⇒ Object
The type of data contained in the vector.
-
#uniq ⇒ Object
Keep only unique elements of the vector along with their indexes.
-
#update ⇒ Object
Method for updating the metadata (i.e. missing value positions) of the vector after assignment/deletion etc.
-
#verify(&block) ⇒ Object
Reports all values that don’t comply with a condition.
-
#where(bool_arry) ⇒ Object
Return a new vector based on the contents of a boolean array.
Methods included from Plotting::Vector
Methods included from Maths::Statistics::Vector
#acf, #acvf, #average_deviation_population, #box_cox_transformation, #center, #coefficient_of_variation, #count, #cumsum, #dichotomize, #diff, #ema, #factors, #freqs, #frequencies, #kurtosis, #macd, #max, #max_index, #mean, #median, #median_absolute_deviation, #min, #mode, #percentile, #product, #proportion, #proportions, #range, #ranked, #rolling, #rolling_count, #rolling_max, #rolling_mean, #rolling_median, #rolling_min, #rolling_std, #rolling_sum, #rolling_variance, #sample_with_replacement, #sample_without_replacement, #skew, #standard_deviation_population, #standard_deviation_sample, #standard_error, #standardize, #sum, #sum_of_squared_deviation, #sum_of_squares, #value_counts, #variance_population, #variance_sample, #vector_centered_compute, #vector_percentile, #vector_standardized_compute
Methods included from Maths::Arithmetic::Vector
#%, #*, #**, #+, #-, #/, #abs, #exp, #round, #sqrt
Constructor Details
#initialize(source, opts = {}) ⇒ Vector
Create a Vector object.
Arguments
- source - The elements of the vector, supplied as an Array or a Hash. If Array, a numeric index will be created if not supplied in the options. Specifying more index elements than actual values in source will insert nil into the surplus index elements. When a Hash is specified, the keys of the Hash are taken as the index elements and the corresponding values as the values that populate the vector.
Options
- :name - Name of the vector
- :index - Index of the vector
- :dtype - The underlying data type. Can be :array, :nmatrix or :gsl. Default :array.
- :nm_dtype - For NMatrix, the data type of the numbers. See the NMatrix docs for further information on supported data types.
- :missing_values - An Array of the values that are to be treated as ‘missing’. nil is the default missing value.
Usage
vecarr = Daru::Vector.new [1,2,3,4], index: [:a, :e, :i, :o]
vechsh = Daru::Vector.new({a: 1, e: 2, i: 3, o: 4})
# File 'lib/daru/vector.rb', line 93
def initialize source, opts={}
  index = nil
  if source.is_a?(Hash)
    index = source.keys
    source = source.values
  else
    index = opts[:index]
    source = source || []
  end
  name = opts[:name]
  set_name name

  @data = cast_vector_to(opts[:dtype] || :array, source, opts[:nm_dtype])
  @index = try_create_index(index || @data.size)

  if @index.size > @data.size
    cast(dtype: :array) # NM with nils seg faults
    (@index.size - @data.size).times { @data << nil }
  elsif @index.size < @data.size
    raise IndexError, "Expected index size >= vector size. Index size : #{@index.size}, vector size : #{@data.size}"
  end

  @possibly_changed_type = true
  set_missing_values opts[:missing_values]
  set_missing_positions
  set_size
end
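The size check between index and data can be sketched in plain Ruby, without Daru. The helper name `align_to_index` is hypothetical and not part of the library; it only mirrors the rule stated in the arguments above: a longer index pads the data with nil, a shorter one raises IndexError.

```ruby
# Hypothetical helper (not part of Daru): pads data with nil up to the
# index length, or raises when the index has fewer labels than values.
def align_to_index(data, index)
  data = data.dup
  if index.size > data.size
    (index.size - data.size).times { data << nil }
  elsif index.size < data.size
    raise IndexError, "Expected index size >= vector size. " \
      "Index size : #{index.size}, vector size : #{data.size}"
  end
  index.zip(data).to_h
end

align_to_index([1, 2], [:a, :b, :c]) # the surplus label :c maps to nil
```

Returning a Hash here is a simplification for the sketch; the real class keeps the data and the index as separate objects.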
Dynamic Method Handling
This class handles dynamic methods through the method_missing method
#method_missing(name, *args, &block) ⇒ Object
# File 'lib/daru/vector.rb', line 1202
def method_missing(name, *args, &block)
  if name.match(/(.+)\=/)
    self[name] = args[0]
  elsif has_index?(name)
    self[name]
  else
    super(name, *args, &block)
  end
end
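The dispatch pattern used here (setter-style names assign, known index labels read, everything else falls through to `super`) can be sketched with a small stand-in class. `LabelledValues` is hypothetical and is not Daru's implementation; unlike the listing above, the sketch strips the trailing `=` before assigning, which is a choice made for the illustration.

```ruby
# Hypothetical stand-in for a labelled container; not part of Daru.
class LabelledValues
  def initialize(values)
    @values = values # Hash of label => element
  end

  def method_missing(name, *args, &block)
    if name.to_s.end_with?("=")
      # v.a = 10 is routed to an assignment under label :a
      @values[name.to_s.chomp("=").to_sym] = args[0]
    elsif @values.key?(name)
      # v.a is routed to a read of label :a
      @values[name]
    else
      super
    end
  end

  # Keeps respond_to? consistent with the dynamic readers and setters.
  def respond_to_missing?(name, include_private = false)
    @values.key?(name.to_s.chomp("=").to_sym) || super
  end
end

v = LabelledValues.new(a: 1, b: 2)
v.a = 10
v.a # reads back the value stored under :a
```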
Instance Attribute Details
#dtype ⇒ Object (readonly)
The underlying dtype of the Vector. Can be either :array, :nmatrix or :gsl.
# File 'lib/daru/vector.rb', line 52
def dtype
  @dtype
end
#index ⇒ Object
The row index. Can be either Daru::Index or Daru::MultiIndex.
# File 'lib/daru/vector.rb', line 48
def index
  @index
end
#labels ⇒ Object
Store a hash of labels for values. Supplementary only. Recommend using index for proper usage.
# File 'lib/daru/vector.rb', line 61
def labels
  @labels
end
#missing_positions ⇒ Object (readonly)
An Array of the positions in the vector that are being treated as ‘missing’.
# File 'lib/daru/vector.rb', line 58
def missing_positions
  @missing_positions
end
#name ⇒ Object (readonly)
The name of the Daru::Vector. String.
# File 'lib/daru/vector.rb', line 46
def name
  @name
end
#nm_dtype ⇒ Object (readonly)
If the dtype is :nmatrix, this attribute represents the data type of the underlying NMatrix object. See NMatrix docs for more details on NMatrix data types.
# File 'lib/daru/vector.rb', line 56
def nm_dtype
  @nm_dtype
end
#size ⇒ Object (readonly)
The total number of elements of the vector.
# File 'lib/daru/vector.rb', line 50
def size
  @size
end
Class Method Details
.[](*args) ⇒ Object
Create a vector using (almost) any object
- Array: flattened
- Range: transformed using to_a
- Daru::Vector
- Numeric and string values
Description
The `Vector.[]` class method creates a vector from almost any object that has a `#to_a` method defined on it. It is similar to R’s `c` function.
Usage
a = Daru::Vector[1,2,3,4,6..10]
#=>
# <Daru::Vector:99448510 @name = nil @size = 9 >
# nil
# 0 1
# 1 2
# 2 3
# 3 4
# 4 6
# 5 7
# 6 8
# 7 9
# 8 10
# File 'lib/daru/vector.rb', line 174
def self.[](*args)
  values = []
  args.each do |a|
    case a
    when Array
      values.concat a.flatten
    when Daru::Vector
      values.concat a.to_a
    when Range
      values.concat a.to_a
    else
      values << a
    end
  end
  Daru::Vector.new(values)
end
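The way mixed arguments are collected into one flat list can be tried in plain Ruby. `collect_values` is a hypothetical helper, not a Daru method; it handles Arrays, Ranges and scalars and leaves out the Daru::Vector branch so that it runs without the gem.

```ruby
# Hypothetical helper (not part of Daru): gathers heterogeneous
# arguments into a single flat Array of values.
def collect_values(*args)
  args.each_with_object([]) do |a, values|
    case a
    when Array then values.concat(a.flatten) # nested arrays are flattened
    when Range then values.concat(a.to_a)    # ranges are expanded
    else values << a                         # scalars are appended as-is
    end
  end
end

collect_values(1, 2, [3, [4]], 6..10)
# => [1, 2, 3, 4, 6, 7, 8, 9, 10]
```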
._load(data) ⇒ Object
:nodoc:
# File 'lib/daru/vector.rb', line 1190
def self._load(data) # :nodoc:
  h = Marshal.load(data)
  Daru::Vector.new(h[:data], index: h[:index],
    name: h[:name], dtype: h[:dtype], missing_values: h[:missing_values])
end
.new_with_size(n, opts = {}, &block) ⇒ Object
Create a new vector by specifying the size and an optional value and block to generate values.
Description
The new_with_size class method lets you create a Daru::Vector by specifying the size as the argument. The optional block, if supplied, is run once for populating each element in the Vector.
The result of each run of the block is the value that is ultimately assigned to that position in the Vector.
Options
- :value - The value assigned to every position when no block is given. All other options are the same as for .new.
# File 'lib/daru/vector.rb', line 136
def self.new_with_size n, opts={}, &block
  value = opts[:value]
  opts.delete :value
  if block
    vector = Daru::Vector.new n.times.map { |i| block.call(i) }, opts
  else
    vector = Daru::Vector.new n.times.map { value }, opts
  end
  vector
end
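The two ways of populating the elements (a block called once per position, or a single fill value) look like this in plain Ruby. `values_with_size` is a hypothetical name used only for this sketch and returns an Array rather than a Daru::Vector.

```ruby
# Hypothetical helper (not part of Daru): builds n values either from
# a block that receives the position, or from one repeated fill value.
def values_with_size(n, value: nil, &block)
  if block
    Array.new(n) { |i| block.call(i) }
  else
    Array.new(n, value)
  end
end

values_with_size(3) { |i| i * 2 }  # => [0, 2, 4]
values_with_size(2, value: 5)      # => [5, 5]
```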
Instance Method Details
#==(other) ⇒ Object
Two vectors are equal if they have the exact same index values corresponding with the exact same elements. Name is ignored.
# File 'lib/daru/vector.rb', line 325
def == other
  case other
  when Daru::Vector
    @index == other.index and @size == other.size and
    @index.all? do |index|
      self[index] == other[index]
    end
  else
    super
  end
end
#[](*indexes) ⇒ Object
Get one or more elements with specified index or a range.
Usage
# For vectors employing single layer Index
v[:one, :two] # => Daru::Vector with indexes :one and :two
v[:one] # => Single element
v[:one..:three] # => Daru::Vector with indexes :one, :two and :three
# For vectors employing hierarchical multi index
# File 'lib/daru/vector.rb', line 202
def [](*indexes)
  location = indexes[0]
  if @index.is_a?(MultiIndex)
    sub_index = @index[indexes]
    result =
    if sub_index.is_a?(Integer)
      @data[sub_index]
    else
      elements = sub_index.map do |tuple|
        @data[@index[tuple]]
      end

      if !indexes[0].is_a?(Range) and indexes.size < @index.width
        sub_index = sub_index.drop_left_level indexes.size
      end
      Daru::Vector.new(
        elements, index: sub_index, name: @name, dtype: @dtype)
    end

    return result
  else
    raise TypeError, "Invalid index type #{location.inspect}.\
      \nUsage: v[:a, :b] gives elements with keys :a and :b for vector v." if location.is_a? Array

    unless indexes[1]
      case location
      when Range
        first = location.first
        last = location.last
        indexes = @index.slice first, last
      else
        pos = @index[location]
        if pos.is_a?(Numeric)
          return @data[pos]
        else
          indexes = pos
        end
      end
    else
      indexes = indexes.map { |e| named_index_for(e) }
    end

    begin
      Daru::Vector.new(
        indexes.map { |loc| @data[@index[loc]] },
        name: @name, index: indexes, dtype: @dtype)
    rescue NoMethodError
      raise IndexError, "Specified index #{pos.inspect} does not exist."
    end
  end
end
#[]=(*location, value) ⇒ Object
# File 'lib/daru/vector.rb', line 267
def []=(*location, value)
  cast(dtype: :array) if value.nil? and dtype != :array

  @possibly_changed_type = true if @type == :object and (value.nil? or
    value.is_a?(Numeric))
  @possibly_changed_type = true if @type == :numeric and (!value.is_a?(Numeric) and
    !value.nil?)

  location = location[0] unless @index.is_a?(MultiIndex)
  pos = @index[location]

  if pos.is_a?(Numeric)
    @data[pos] = value
  else
    begin
      pos.each { |tuple| self[tuple] = value }
    rescue NoMethodError
      raise IndexError, "Specified index #{pos.inspect} does not exist."
    end
  end

  set_size
  set_missing_positions unless Daru.lazy_update
end
#_dump(depth) ⇒ Object
:nodoc:
# File 'lib/daru/vector.rb', line 1181
def _dump(depth) # :nodoc:
  Marshal.dump({
    data: @data.to_a,
    dtype: @dtype,
    name: @name,
    index: @index,
    missing_values: @missing_values})
end
#all?(&block) ⇒ Boolean
# File 'lib/daru/vector.rb', line 553
def all? &block
  @data.data.all?(&block)
end
#any?(&block) ⇒ Boolean
# File 'lib/daru/vector.rb', line 549
def any? &block
  @data.data.any?(&block)
end
#bootstrap(estimators, nr, s = nil) ⇒ Object
Bootstrap
Generate nr resamples (with replacement) of size s from vector, computing each estimate from estimators over each resample. estimators could be:
a) Hash with variable names as keys and lambdas as values
a.bootstrap(:log_s2=>lambda {|v| Math.log(v.variance)},1000)
b) Array with names of method to bootstrap
a.bootstrap([:mean, :sd],1000)
c) A single method to bootstrap
a.bootstrap(:mean, 1000)
If s is nil, it is set to the vector size by default.
Returns a DataFrame where each vector is a vector of length nr containing the computed resample estimates.
# File 'lib/daru/vector.rb', line 1041
def bootstrap(estimators, nr, s=nil)
  s ||= size
  h_est, es, bss = prepare_bootstrap(estimators)

  nr.times do |i|
    bs = sample_with_replacement(s)
    es.each do |estimator|
      bss[estimator].push(h_est[estimator].call(bs))
    end
  end

  es.each do |est|
    bss[est] = Daru::Vector.new bss[est]
  end

  Daru::DataFrame.new bss
end
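The resampling loop can be illustrated on a plain Array. This is a minimal sketch under stated assumptions: `bootstrap_estimates` and its `rng:` keyword are hypothetical names, estimators are given only in the Hash-of-lambdas form, and the result is a Hash of Arrays rather than a DataFrame.

```ruby
# Hypothetical helper (not part of Daru): draws nr resamples of size s
# with replacement and records every estimator's value per resample.
def bootstrap_estimates(data, estimators, nr, s = data.size, rng: Random.new)
  columns = estimators.keys.to_h { |name| [name, []] }
  nr.times do
    # Sampling with replacement: each draw picks any position again.
    resample = Array.new(s) { data[rng.rand(data.size)] }
    estimators.each { |name, fn| columns[name] << fn.call(resample) }
  end
  columns
end

mean = ->(xs) { xs.sum.to_f / xs.size }
result = bootstrap_estimates([2, 4, 6, 8], { mean: mean }, 1000, rng: Random.new(42))
result[:mean].size # one estimate per resample
```

Passing a seeded `Random` makes a run repeatable, which is convenient when comparing estimators.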
#cast(opts = {}) ⇒ Object
Cast a vector to a new data type.
Options
- :dtype - :array for Ruby Array. :nmatrix for NMatrix. :gsl for GSL::Vector.
# File 'lib/daru/vector.rb', line 486
def cast opts={}
  dt = opts[:dtype]
  raise ArgumentError, "Unsupported dtype #{opts[:dtype]}" unless
    dt == :array or dt == :nmatrix or dt == :gsl

  @data = cast_vector_to dt unless @dtype == dt
end
#clone_structure ⇒ Object
Copies the structure of the vector (i.e. the index, size, etc.) and fills all values with nils.
# File 'lib/daru/vector.rb', line 1168
def clone_structure
  Daru::Vector.new(([nil]*@size), name: @name, index: @index.dup)
end
#concat(element, index) ⇒ Object Also known as: push, <<
Append an element to the vector by specifying the element and index
# File 'lib/daru/vector.rb', line 469
def concat element, index
  raise IndexError, "Expected new unique index" if @index.include? index

  @index = @index | [index]
  @data[@index[index]] = element

  set_size
  set_missing_positions unless Daru.lazy_update
end
#daru_vector(*name) ⇒ Object Also known as: dv
# File 'lib/daru/vector.rb', line 1196
def daru_vector *name
  self
end
#db_type(dbs = :mysql) ⇒ Object
Returns the database type for the vector, according to its content
# File 'lib/daru/vector.rb', line 1153
def db_type(dbs=:mysql)
  # first, detect any character not number
  if @data.find {|v| v.to_s=~/\d{2,2}-\d{2,2}-\d{4,4}/} or @data.find {|v| v.to_s=~/\d{4,4}-\d{2,2}-\d{2,2}/}
    return "DATE"
  elsif @data.find {|v| v.to_s=~/[^0-9e.-]/ }
    return "VARCHAR (255)"
  elsif @data.find {|v| v.to_s=~/\./}
    return "DOUBLE"
  else
    return "INTEGER"
  end
end
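The order of the checks matters: date-like strings are tested first, then any non-numeric character, then a decimal point. A plain-Ruby sketch of the same cascade follows; `column_type` is a hypothetical helper that works on an Array and is not Daru's method.

```ruby
# Hypothetical helper (not part of Daru): guesses a SQL column type
# from the string form of the values, using the same check order.
def column_type(values)
  strings = values.map(&:to_s)
  if strings.any? { |s| s =~ /\d{2}-\d{2}-\d{4}/ || s =~ /\d{4}-\d{2}-\d{2}/ }
    "DATE"
  elsif strings.any? { |s| s =~ /[^0-9e.-]/ } # anything that is not number-like
    "VARCHAR (255)"
  elsif strings.any? { |s| s =~ /\./ }        # a decimal point implies a float
    "DOUBLE"
  else
    "INTEGER"
  end
end

column_type([1, 2, 3])        # => "INTEGER"
column_type([1.5, 2])         # => "DOUBLE"
column_type(["2015-01-31"])   # => "DATE"
column_type(["abc", 1])       # => "VARCHAR (255)"
```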
#delete(element) ⇒ Object
Delete an element by value
# File 'lib/daru/vector.rb', line 495
def delete element
  self.delete_at index_of(element)
end
#delete_at(index) ⇒ Object
Delete element by index
# File 'lib/daru/vector.rb', line 500
def delete_at index
  @data.delete_at @index[index]
  @index = Daru::Index.new(@index.to_a - [index])

  set_size
  set_missing_positions unless Daru.lazy_update
end
#delete_if(&block) ⇒ Object
Delete an element if block returns true. Destructive.
# File 'lib/daru/vector.rb', line 627
def delete_if &block
  return to_enum(:delete_if) unless block_given?

  keep_e = []
  keep_i = []
  each_with_index do |n, i|
    unless yield(n)
      keep_e << n
      keep_i << i
    end
  end

  @data = cast_vector_to @dtype, keep_e
  @index = Daru::Index.new(keep_i)
  set_missing_positions unless Daru.lazy_update
  set_size

  self
end
#detach_index ⇒ Object
# File 'lib/daru/vector.rb', line 820
def detach_index
  Daru::DataFrame.new({
    index: @index.to_a,
    values: @data.to_a
  })
end
#dup ⇒ Object
Duplicate elements and indexes
# File 'lib/daru/vector.rb', line 1022
def dup
  Daru::Vector.new @data.dup, name: @name, index: @index.dup
end
#each(&block) ⇒ Object
# File 'lib/daru/vector.rb', line 17
def each(&block)
  return to_enum(:each) unless block_given?
  @data.each(&block)
  self
end
#each_index(&block) ⇒ Object
# File 'lib/daru/vector.rb', line 24
def each_index(&block)
  return to_enum(:each_index) unless block_given?
  @index.each(&block)
  self
end
#each_with_index(&block) ⇒ Object
# File 'lib/daru/vector.rb', line 31
def each_with_index(&block)
  return to_enum(:each_with_index) unless block_given?
  @index.each { |i| yield(self[i], i) }
  self
end
#exists?(value) ⇒ Boolean
Returns true if the value passed actually exists or is not marked as a *missing value*.
# File 'lib/daru/vector.rb', line 606
def exists? value
  !@missing_values.has_key?(self[index_of(value)])
end
#has_index?(index) ⇒ Boolean
Returns true if an index exists
# File 'lib/daru/vector.rb', line 838
def has_index? index
  @index.include? index
end
#has_missing_data? ⇒ Boolean Also known as: flawed?
Reports whether missing data is present in the Vector.
# File 'lib/daru/vector.rb', line 462
def has_missing_data?
  !missing_positions.empty?
end
#head(q = 10) ⇒ Object
# File 'lib/daru/vector.rb', line 453
def head q=10
  self[0..(q-1)]
end
#in(other) ⇒ Object
Comparator for checking if any of the elements in other exist in self.
# File 'lib/daru/vector.rb', line 403
def in other
  other = Hash[other.zip(Array.new(other.size, 0))]
  Daru::Core::Query::BoolArray.new(
    @data.inject([]) do |memo, d|
      memo << (other.has_key?(d) ? true : false)
      memo
    end
  )
end
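The listing turns `other` into a Hash so that each membership test is a constant-time lookup. The same idea is sketched below with a Set from the standard library; `membership_mask` is a hypothetical helper that returns a plain Array of booleans instead of a BoolArray.

```ruby
require "set"

# Hypothetical helper (not part of Daru): marks, position by position,
# whether each element of data also occurs in other.
def membership_mask(data, other)
  lookup = other.to_set # hashed lookup instead of scanning other each time
  data.map { |d| lookup.include?(d) }
end

membership_mask([1, 2, 3, 4], [2, 4, 9])
# => [false, true, false, true]
```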
#index_of(element) ⇒ Object
Get index of element
# File 'lib/daru/vector.rb', line 534
def index_of element
  @index.key @data.index(element)
end
#inspect(spacing = 20, threshold = 15) ⇒ Object
Overrides the original inspect for pretty printing in irb
# File 'lib/daru/vector.rb', line 957
def inspect spacing=20, threshold=15
  longest = [@name.to_s.size,
             (@index.to_a.map(&:to_s).map(&:size).max || 0),
             (@data .map(&:to_s).map(&:size).max || 0),
             'nil'.size].max

  content = ""
  longest = spacing if longest > spacing
  name = @name || 'nil'
  formatter = "\n%#{longest}.#{longest}s %#{longest}.#{longest}s"
  content += "\n#<" + self.class.to_s + ":" + self.object_id.to_s + " @name = " + name.to_s + " @size = " + size.to_s + " >"

  content += sprintf formatter, "", name
  @index.each_with_index do |index, num|
    content += sprintf formatter, index.to_s, (self[*index] || 'nil').to_s
    if num > threshold
      content += sprintf formatter, '...', '...'
      break
    end
  end
  content += "\n"

  content
end
#is_nil? ⇒ Boolean
# File 'lib/daru/vector.rb', line 764
def is_nil?
  nil_truth_vector = clone_structure
  @index.each do |idx|
    nil_truth_vector[idx] = self[idx].nil? ? true : false
  end

  nil_truth_vector
end
#jackknife(estimators, k = 1) ⇒ Object
Jackknife
Returns a dataset with jackknife delete-k estimators. estimators could be:
a) Hash with variable names as keys and lambdas as values
a.jackknife(:log_s2=>lambda {|v| Math.log(v.variance)})
b) Array with method names to jackknife
a.jackknife([:mean, :sd])
c) A single method to jackknife
a.jackknife(:mean)
k represents the block size for block jackknife. By default it is set to 1, for classic delete-one jackknife.
Returns a dataset where each vector is a vector of length cases/k containing the computed jackknife estimates.
Reference:
- Sawyer, S. (2005). Resampling Data: Using a Statistical Jackknife.
# File 'lib/daru/vector.rb', line 1076
def jackknife(estimators, k=1)
  raise "n should be divisible by k:#{k}" unless size % k==0

  nb = (size / k).to_i
  h_est, es, ps = prepare_bootstrap(estimators)

  est_n = es.inject({}) do |h,v|
    h[v] = h_est[v].call(self)
    h
  end

  nb.times do |i|
    other = @data.dup
    other.slice!(i*k, k)
    other = Daru::Vector.new other

    es.each do |estimator|
      # Add pseudovalue
      ps[estimator].push(
        nb * est_n[estimator] - (nb-1) * h_est[estimator].call(other))
    end
  end

  es.each do |est|
    ps[est] = Daru::Vector.new ps[est]
  end
  Daru::DataFrame.new ps
end
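The pseudovalue formula in the loop, nb * estimate(all) - (nb - 1) * estimate(all without block i), can be checked on a plain Array. `jackknife_pseudovalues` is a hypothetical helper for a single estimator passed as a block; it is a sketch of the formula, not Daru's method.

```ruby
# Hypothetical helper (not part of Daru): delete-k jackknife
# pseudovalues for one estimator given as a block.
def jackknife_pseudovalues(data, k = 1, &estimator)
  raise ArgumentError, "n should be divisible by k:#{k}" unless (data.size % k).zero?

  nb = data.size / k             # number of blocks
  full = estimator.call(data)    # estimate on the complete data
  Array.new(nb) do |i|
    rest = data.dup
    rest.slice!(i * k, k)        # drop the i-th block of k elements
    nb * full - (nb - 1) * estimator.call(rest)
  end
end

mean = ->(xs) { xs.sum.to_f / xs.size }
jackknife_pseudovalues([1, 2, 3, 4], &mean)
```

For the mean with k = 1, each pseudovalue works out algebraically to the deleted observation itself, which makes the mean a convenient sanity check (up to floating-point rounding).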
#keep_if(&block) ⇒ Object
Keep an element if block returns true. Destructive.
# File 'lib/daru/vector.rb', line 648
def keep_if &block
  return to_enum(:keep_if) unless block_given?

  keep_e = []
  keep_i = []
  each_with_index do |n, i|
    if yield(n)
      keep_e << n
      keep_i << i
    end
  end

  @data = cast_vector_to @dtype, keep_e
  @index = Daru::Index.new(keep_i)
  set_missing_positions unless Daru.lazy_update
  set_size

  self
end
#lag(k = 1) ⇒ Object
Lags the series by k periods.
The convention is to set the oldest observations (the first ones in the series) to nil so that the size of the lagged series is the same as the original.
Usage:
ts = Daru::Vector.new((1..10).map { rand })
# => [0.69, 0.23, 0.44, 0.71, ...]
ts.lag # => [nil, 0.69, 0.23, 0.44, ...]
ts.lag(2) # => [nil, nil, 0.69, 0.23, ...]
# File 'lib/daru/vector.rb', line 810
def lag k=1
  return self.dup if k == 0

  dat = @data.to_a.dup
  (dat.size - 1).downto(k) { |i| dat[i] = dat[i - k] }
  (0...k).each { |i| dat[i] = nil }

  Daru::Vector.new(dat, index: @index, name: @name)
end
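The shifting convention (the first k positions become nil and the length stays the same) can be reproduced on a plain Array. `lag_values` is a hypothetical helper and assumes 0 <= k <= data.size.

```ruby
# Hypothetical helper (not part of Daru): shifts values k positions
# towards the end, filling the first k positions with nil.
# Assumes 0 <= k <= data.size.
def lag_values(data, k = 1)
  return data.dup if k.zero?

  Array.new(k) + data[0, data.size - k]
end

lag_values([1, 2, 3, 4])     # => [nil, 1, 2, 3]
lag_values([1, 2, 3, 4], 2)  # => [nil, nil, 1, 2]
```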
#map!(&block) ⇒ Object
# File 'lib/daru/vector.rb', line 38
def map!(&block)
  return to_enum(:map!) unless block_given?
  @data.map!(&block)
  update
  self
end
#missing_values ⇒ Object
The values to be treated as ‘missing’. nil is the default missing type. To set missing values see the missing_values= method.
# File 'lib/daru/vector.rb', line 294
def missing_values
  @missing_values.keys
end
#missing_values=(values) ⇒ Object
# File 'lib/daru/vector.rb', line 307
def missing_values= values
  set_missing_values values
  set_missing_positions unless Daru.lazy_update
end
#n_valid ⇒ Object
Number of non-missing elements
# File 'lib/daru/vector.rb', line 833
def n_valid
  @size - missing_positions.size
end
#not_nil? ⇒ Boolean
Opposite of #is_nil?
# File 'lib/daru/vector.rb', line 774
def not_nil?
  nil_truth_vector = clone_structure
  @index.each do |idx|
    nil_truth_vector[idx] = self[idx].nil? ? false : true
  end

  nil_truth_vector
end
#only_missing(as_a = :vector) ⇒ Object
Returns a Vector containing only missing data (preserves indexes).
# File 'lib/daru/vector.rb', line 1132
def only_missing as_a=:vector
  if as_a == :vector
    self[*missing_positions]
  elsif as_a == :array
    self[*missing_positions].to_a
  end
end
#only_numerics ⇒ Object
Returns a Vector with only numerical data. Missing data is included but non-Numeric objects are excluded. Preserves index.
# File 'lib/daru/vector.rb', line 1142
def only_numerics
  numeric_indexes = []

  each_with_index do |v, i|
    numeric_indexes << i if(v.kind_of?(Numeric) or @missing_values.has_key?(v))
  end

  self[*numeric_indexes]
end
#only_valid(as_a = :vector, duplicate = true) ⇒ Object
Creates a new vector consisting only of non-nil data
Arguments
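- as_a - Passing :array will return only the elements as an Array. Otherwise will return a Daru::Vector.
- duplicate - If no missing data is found in the vector, setting this to false will return the same vector. Otherwise, a duplicate will be returned irrespective of presence of missing data.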
# File 'lib/daru/vector.rb', line 1116
def only_valid as_a=:vector, duplicate=true
  return self.dup if !has_missing_data? and as_a == :vector and duplicate
  return self if !has_missing_data? and as_a == :vector and !duplicate
  return self.to_a if !has_missing_data? and as_a != :vector

  new_index = @index.to_a - missing_positions
  new_vector = new_index.map do |idx|
    self[idx]
  end

  return new_vector if as_a != :vector

  Daru::Vector.new new_vector, index: new_index, name: @name, dtype: dtype
end
#recode(dt = nil, &block) ⇒ Object
Like map, but returns a Daru::Vector with the returned values.
# File 'lib/daru/vector.rb', line 611
def recode dt=nil, &block
  return to_enum(:recode) unless block_given?

  dup.recode! dt, &block
end
#recode!(dt = nil, &block) ⇒ Object
Destructive version of #recode
# File 'lib/daru/vector.rb', line 618
def recode! dt=nil, &block
  return to_enum(:recode!) unless block_given?

  @data.map!(&block).data
  @data = cast_vector_to(dt || @dtype)
  self
end
#reindex(new_index) ⇒ Object
Create a new vector with a different index, and preserve the indexing of current elements.
# File 'lib/daru/vector.rb', line 984
def reindex new_index
  vector = Daru::Vector.new([], index: new_index, name: @name)

  new_index.each do |idx|
    if @index.include?(idx)
      vector[idx] = self[idx]
    else
      vector[idx] = nil
    end
  end

  vector
end
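Reindexing keeps the value of every label that already exists and fills new labels with nil. With a Hash standing in for the vector, the behaviour can be sketched as follows; `reindex_values` is a hypothetical helper, not a Daru method, and assumes the Hash has no default value.

```ruby
# Hypothetical helper (not part of Daru): rebuilds a label => value
# mapping for new_index; labels that were absent map to nil.
def reindex_values(labelled, new_index)
  new_index.to_h { |label| [label, labelled[label]] }
end

reindex_values({ a: 1, b: 2 }, [:b, :c])
# => {:b=>2, :c=>nil}
```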
#rename(new_name) ⇒ Object
Give the vector a new name
# File 'lib/daru/vector.rb', line 1012
def rename new_name
  if new_name.is_a?(Numeric)
    @name = new_name
    return
  end

  @name = new_name
end
#replace_nils(replacement) ⇒ Object
Non-destructive version of #replace_nils!
# File 'lib/daru/vector.rb', line 828
def replace_nils replacement
  self.dup.replace_nils!(replacement)
end
#replace_nils!(replacement) ⇒ Object
Replace all nils in the vector with the value passed as an argument. Destructive. See #replace_nils for non-destructive version
Arguments
- replacement - The value which should replace all nils
# File 'lib/daru/vector.rb', line 789
def replace_nils! replacement
  missing_positions.each do |idx|
    self[idx] = replacement
  end

  self
end
#report_building(b) ⇒ Object
# File 'lib/daru/vector.rb', line 927
def report_building b
  b.section(:name => name) do |s|
    s.text "n :#{size}"
    s.text "n valid:#{n_valid}"
    if @type == :object
      s.text "factors: #{factors.to_a.join(',')}"
      s.text "mode: #{mode}"

      s.table(:name => "Distribution") do |t|
        frequencies.sort_by { |a| a.to_s }.each do |k,v|
          key = @index.include?(k) ? @index[k] : k
          t.row [key, v , ("%0.2f%%" % (v.quo(n_valid)*100))]
        end
      end
    end

    s.text "median: #{median.to_s}" if (@type==:numeric or @type==:numeric)
    if @type==:numeric
      s.text "mean: %0.4f" % mean
      if sd
        s.text "std.dev.: %0.4f" % sd
        s.text "std.err.: %0.4f" % se
        s.text "skew: %0.4f" % skew
        s.text "kurtosis: %0.4f" % kurtosis
      end
    end
  end
end
#reset_index! ⇒ Object
# File 'lib/daru/vector.rb', line 745
def reset_index!
  @index = Daru::Index.new(Array.new(size) { |i| i })
  self
end
#save(filename) ⇒ Object
Save the vector to a file
Arguments
- filename - Path of file where the vector is to be saved
# File 'lib/daru/vector.rb', line 1177
def save filename
  Daru::IO.save self, filename
end
#sort(opts = {}, &block) ⇒ Object
Sorts a vector according to its values. If a block is specified, the contents will be evaluated and data will be swapped whenever the block evaluates to true. Defaults to ascending order sorting. Any missing values will be put at the end of the vector. Preserves indexing. Default sort algorithm is quick sort.
Options
- :ascending - if false, will sort in descending order. Defaults to true.
- :type - Specify the sorting algorithm. Only supports quick_sort for now.
Usage
v = Daru::Vector.new ["My first guitar", "jazz", "guitar"]
# Say you want to sort these strings by length.
v.sort(ascending: false) { |a,b| a.length <=> b.length }
# File 'lib/daru/vector.rb', line 573
def sort opts={}, &block
  opts = {
    ascending: true,
    type: :quick_sort
  }.merge(opts)

  block = lambda { |a,b|
    return a <=> b if !(a.nil? || b.nil?)

    if a.nil? && b.nil?
      0
    elsif a.nil?
      -1
    else
      1
    end
  } unless block

  order = opts[:ascending] ? :ascending : :descending
  vector, index = send(opts[:type], @data.to_a.dup, @index.to_a, order, &block)
  index = Daru::Index.new index

  Daru::Vector.new(vector, index: index, name: @name, dtype: @dtype)
end
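The default comparator in the listing above can be tried on a plain Array, without Daru. This is only a minimal sketch: it assumes Ruby's built-in Array#sort in place of Daru's quick_sort routine, and the constant NIL_AWARE is a hypothetical name, not part of the library.

```ruby
# Comparator mirroring the default block in Daru::Vector#sort:
# two non-nil values compare with <=>, and nil compares as smaller
# than any non-nil value.
NIL_AWARE = lambda do |a, b|
  next a <=> b unless a.nil? || b.nil?

  if a.nil? && b.nil?
    0
  elsif a.nil?
    -1
  else
    1
  end
end

# Assumption: Array#sort stands in for Daru's quick_sort here.
data = [3, nil, 1, 2]
sorted = data.sort(&NIL_AWARE)
p sorted

# A custom comparator, as in the Usage example: longest string first.
words = ["My first guitar", "jazz", "guitar"]
by_length_desc = words.sort { |a, b| b.length <=> a.length }
p by_length_desc
```

With Array#sort this comparator places nil before every other value. Where Daru's own quick_sort finally places missing values depends on that routine, which is not shown here.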
#sorted_data(&block) ⇒ Object
Just sort the data and get an Array in return using Enumerable#sort. Non-destructive.
# File 'lib/daru/vector.rb', line 600
def sorted_data &block
  @data.to_a.sort(&block)
end
#split_by_separator(sep = ",") ⇒ Object
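Returns a hash of Vectors, one for each distinct value found after splitting the elements on the separator. Each Vector holds 1 where the corresponding element contains that value and 0 where it does not.
Example:
a=Daru::Vector.new(["a,b","b,c","a,c"])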
a.split_by_separator
=> {"a"=>#<Daru::Vector:0x7f2dbcc09d88
@data=[1, 0, 1]>,
"b"=>#<Daru::Vector:0x7f2dbcc09c48
@data=[1, 1, 0]>,
"c"=>#<Daru::Vector:0x7f2dbcc09b08
@data=[0, 1, 1]>}
# File 'lib/daru/vector.rb', line 711
def split_by_separator sep=","
  split_data = splitted sep
  factors = split_data.flatten.uniq.compact

  out = factors.inject({}) do |h,x|
    h[x] = []
    h
  end

  split_data.each do |r|
    if r.nil?
      factors.each do |f|
        out[f].push(nil)
      end
    else
      factors.each do |f|
        out[f].push(r.include?(f) ? 1:0)
      end
    end
  end

  out.inject({}) do |s,v|
    s[v[0]] = Daru::Vector.new v[1]
    s
  end
end
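The indicator encoding performed by split_by_separator can be reproduced on plain Arrays. The sketch below is not Daru code: indicator_columns is a hypothetical helper that returns Arrays rather than Daru::Vector objects, and it assumes every non-nil element is a String.

```ruby
# Plain-Array sketch of the 0/1 encoding done by split_by_separator.
# `indicator_columns` is a hypothetical helper, not part of Daru.
def indicator_columns(values, sep = ",")
  # Split each element; a missing element stays nil.
  split_data = values.map { |s| s.nil? ? nil : s.split(sep) }
  factors = split_data.flatten.uniq.compact

  factors.each_with_object({}) do |factor, out|
    out[factor] = split_data.map do |row|
      next nil if row.nil?           # missing input stays missing
      row.include?(factor) ? 1 : 0   # 1 if the row contains this factor
    end
  end
end

columns = indicator_columns(["a,b", "c,d", "a,b", nil])
p columns
```

Each key of the result is one distinct value, and each Array has the same length as the input, with nil carried through for missing elements.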
#split_by_separator_freq(sep = ",") ⇒ Object
# File 'lib/daru/vector.rb', line 738
def split_by_separator_freq(sep=",")
  split_by_separator(sep).inject({}) do |a,v|
    a[v[0]] = v[1].inject { |s,x| s+x.to_i }
    a
  end
end
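Summing each indicator column amounts to counting how many elements contain each value. A rough stand-alone sketch of that count, assuming String elements and skipping nil entries up front; separator_frequencies is a hypothetical name and this is not how Daru computes the result internally:

```ruby
# Count, for each distinct value, how many elements contain it.
# `separator_frequencies` is a hypothetical helper, not part of Daru.
def separator_frequencies(values, sep = ",")
  values.compact.each_with_object(Hash.new(0)) do |element, counts|
    # uniq: an element counts at most once per value, like a 0/1 indicator
    element.split(sep).uniq.each { |value| counts[value] += 1 }
  end
end

freq = separator_frequencies(["a,b", "c,d", "a,b"])
p freq
```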
#splitted(sep = ",") ⇒ Object
Return an Array with the data split by a separator.
a=Daru::Vector.new(["a,b","c,d","a,b","d"])
a.splitted
=> [["a","b"],["c","d"],["a","b"],["d"]]
# File 'lib/daru/vector.rb', line 686
def splitted sep=","
  @data.map do |s|
    if s.nil?
      nil
    elsif s.respond_to? :split
      s.split sep
    else
      [s]
    end
  end
end
#summary(method = :to_text) ⇒ Object
Create a summary of the Vector using Report Builder.
# File 'lib/daru/vector.rb', line 923
def summary(method = :to_text)
  ReportBuilder.new(no_title: true).add(self).send(method)
end
#tail(q = 10) ⇒ Object
# File 'lib/daru/vector.rb', line 457
def tail q=10
  self[(@size - q)..(@size-1)]
end
#to_a ⇒ Object
Return an array
# File 'lib/daru/vector.rb', line 880
def to_a
  @data.to_a
end
#to_gsl ⇒ Object
If dtype != gsl, will convert data to GSL::Vector with to_a. Otherwise returns the stored GSL::Vector object.
# File 'lib/daru/vector.rb', line 859
def to_gsl
  if Daru.has_gsl?
    if dtype == :gsl
      return @data.data
    else
      GSL::Vector.alloc only_valid(:array).to_a
    end
  else
    raise NoMethodError, "Install gsl-nmatrix for access to this functionality."
  end
end
#to_hash ⇒ Object
Convert to hash. Hash keys are the index labels and values are the corresponding elements.
# File 'lib/daru/vector.rb', line 872
def to_hash
  @index.inject({}) do |hsh, index|
    hsh[index] = self[index]
    hsh
  end
end
#to_html(threshold = 30) ⇒ Object
Convert to HTML for IRuby.
# File 'lib/daru/vector.rb', line 890
def to_html threshold=30
  name = @name || 'nil'
  html = "<table>" +
    "<tr>" +
      "<th colspan=\"2\">" +
        "Daru::Vector:#{self.object_id} " + " size: #{size}" +
      "</th>" +
    "</tr>"
  html += '<tr><th> </th><th>' + name.to_s + '</th></tr>'
  @index.each_with_index do |index, num|
    html += '<tr><td>' + index.to_s + '</td>' + '<td>' + self[index].to_s + '</td></tr>'

    if num > threshold
      html += '<tr><td>...</td><td>...</td></tr>'

      last_index = @index.to_a.last
      html += '<tr>' +
                '<td>' + last_index.to_s + '</td>' +
                '<td>' + self[last_index].to_s + '</td>' +
              '</tr>'
      break
    end
  end
  html += '</table>'

  html
end
#to_json(*args) ⇒ Object
Convert the hash returned by to_hash to JSON.
# File 'lib/daru/vector.rb', line 885
def to_json *args
  self.to_hash.to_json
end
#to_matrix(axis = :horizontal) ⇒ Object
Convert Vector to a horizontal or vertical Ruby Matrix.
Arguments
- axis - Specify whether you want a :horizontal or a :vertical matrix.
# File 'lib/daru/vector.rb', line 847
def to_matrix axis=:horizontal
  if axis == :horizontal
    Matrix[to_a]
  elsif axis == :vertical
    Matrix.columns([to_a])
  else
    raise ArgumentError, "axis should be either :horizontal or :vertical, not #{axis}"
  end
end
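The two Matrix constructors used above differ only in orientation. A small sketch with Ruby's own matrix library and a plain Array standing in for the vector's data (the variable names are illustrative):

```ruby
require 'matrix'

values = [1, 2, 3]

# :horizontal case: the values become the single row of the matrix.
row = Matrix[values]

# :vertical case: the values become the single column of the matrix.
col = Matrix.columns([values])

puts "horizontal: #{row.row_count} x #{row.column_count}"
puts "vertical:   #{col.row_count} x #{col.column_count}"
```

The vertical form is the transpose of the horizontal one, so either can be derived from the other with Matrix#transpose.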
#to_REXP ⇒ Object
# File 'lib/daru/extensions/rserve.rb', line 17
def to_REXP
  Rserve::REXP::Wrapper.wrap(self.to_a)
end
#to_s ⇒ Object
# File 'lib/daru/vector.rb', line 918
def to_s
  to_html
end
#type ⇒ Object
The type of data contained in the vector. Can be :object or :numeric. If the underlying dtype is an NMatrix, this method will return the data type of the NMatrix object.
Running through the data to figure out the kind of data is delayed to the last possible moment.
# File 'lib/daru/vector.rb', line 514
def type
  return @data.nm_dtype if dtype == :nmatrix

  if @type.nil? or @possibly_changed_type
    @type = :numeric
    self.each do |e|
      unless e.nil?
        unless e.is_a?(Numeric)
          @type = :object
          break
        end
      end
    end
    @possibly_changed_type = false
  end

  @type
end
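The scan in the listing above treats a vector as :numeric unless it finds a non-nil element that is not a Numeric. A stand-alone sketch of that rule on plain Arrays, leaving out the caching and the NMatrix branch; infer_type is a hypothetical name:

```ruby
# :numeric if every non-nil element is a Numeric, otherwise :object.
# `infer_type` is a hypothetical helper, not part of Daru.
def infer_type(values)
  values.all? { |e| e.nil? || e.is_a?(Numeric) } ? :numeric : :object
end

p infer_type([1, 2.5, nil])   # nil entries are ignored
p infer_type([1, "two", 3])   # one String is enough to make it :object
p infer_type([])              # nothing contradicts :numeric
```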
#uniq ⇒ Object
Keep only the unique elements of the vector, along with their indexes.
# File 'lib/daru/vector.rb', line 539
def uniq
  uniq_vector = @data.uniq
  new_index = uniq_vector.inject([]) do |acc, element|
    acc << index_of(element)
    acc
  end

  Daru::Vector.new uniq_vector, name: @name, index: new_index, dtype: @dtype
end
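A plain-Array sketch of the same idea, pairing each unique value with an index label. It assumes that index_of returns the label of the first occurrence of a value, which the listing above does not confirm; uniq_with_index is a hypothetical helper:

```ruby
# Keep each distinct value once, together with the label of its
# first occurrence (an assumption about how index_of behaves).
# `uniq_with_index` is a hypothetical helper, not part of Daru.
def uniq_with_index(values, labels)
  uniq_values = values.uniq
  new_labels = uniq_values.map { |v| labels[values.index(v)] }
  [uniq_values, new_labels]
end

vals, idx = uniq_with_index([10, 20, 10, 30], [:a, :b, :c, :d])
p vals
p idx
```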
#update ⇒ Object
Updates the metadata (i.e. missing value positions) of the vector after assignment, deletion, etc. are complete. This is provided so that time is not wasted rebuilding the metadata each time an element is assigned or deleted. Updating data this way is called lazy loading. To set or unset lazy loading, see the .lazy_update= method.
# File 'lib/daru/vector.rb', line 317
def update
  if Daru.lazy_update
    set_missing_positions
  end
end
#verify(&block) ⇒ Object
Reports all values that don't comply with a condition. Returns a hash mapping the position of each invalid value to the value itself.
# File 'lib/daru/vector.rb', line 670
def verify &block
  h = {}
  (0...size).each do |i|
    if !(yield @data[i])
      h[i] = @data[i]
    end
  end

  h
end
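The same check can be sketched on a plain Array: the block states the condition, and the result maps each failing position to its value. failing_values is a hypothetical helper used only for illustration:

```ruby
# Collect every value for which the block returns a falsy result,
# keyed by its integer position.
# `failing_values` is a hypothetical helper, not part of Daru.
def failing_values(values)
  failures = {}
  values.each_with_index do |value, position|
    failures[position] = value unless yield(value)
  end
  failures
end

invalid = failing_values([1, -2, 3, -4]) { |x| x > 0 }
p invalid
```

An empty hash means every value satisfied the condition.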
#where(bool_arry) ⇒ Object
Return a new vector based on the contents of a boolean array. Use with the comparator methods to obtain meaningful results. See this notebook for a good overview of using #where.
# File 'lib/daru/vector.rb', line 449
def where bool_arry
  Daru::Core::Query.vector_where @data.to_a, @index.to_a, bool_arry, self.dtype
end