Class: Statsample::Vector
- Includes:
- Enumerable, Summarizable, VectorShorthands, Writable
- Defined in:
- lib/statsample/vector.rb,
lib/statsample/vector/gsl.rb,
lib/statsample/rserve_extension.rb
Overview
Collection of values in one dimension. Works like a column in a spreadsheet.
Usage
The quickest way to create a vector is with Array#to_vector or Array#to_numeric:
v=[1,2,3,4].to_vector(:numeric)
v=[1,2,3,4].to_numeric
Defined Under Namespace
Modules: GSL_
Instance Attribute Summary
-
#data ⇒ Object
readonly
Original data.
-
#data_with_nils ⇒ Object
readonly
Original data, with all missing values replaced by nils.
-
#date_data_with_nils ⇒ Object
readonly
Date data, with all missing values replaced by nils.
-
#labels ⇒ Object
Labels for specific values (a Hash mapping each value to its label).
-
#missing_data ⇒ Object
readonly
Missing values array.
-
#missing_values ⇒ Object
Array of values considered as missing.
-
#name ⇒ Object
Name of vector.
-
#today_values ⇒ Object
Array of values considered as “Today”, with date type.
-
#type ⇒ Object
Level of measurement.
-
#valid_data ⇒ Object
readonly
Valid data.
Class Method Summary
-
.[](*args) ⇒ Object
Create a vector using (almost) any object: Array (flattened), Range (transformed using to_a), Statsample::Vector, and Numeric and String values.
-
._load(data) ⇒ Object
:nodoc:.
-
.new_numeric(n, val = nil, &block) ⇒ Object
Creates a new vector of type :numeric. Parameters: n (size), val (value assigned to every element), &block (if given, used to compute the value of each element).
-
.new_scale(n, val = nil, &block) ⇒ Object
Deprecated.
Instance Method Summary
- #*(v) ⇒ Object
-
#+(v) ⇒ Object
Vector sum.
-
#-(v) ⇒ Object
Vector subtraction.
-
#==(v2) ⇒ Object
Vector equality.
-
#[](i) ⇒ Object
Retrieves the element at index i.
-
#[]=(i, v) ⇒ Object
Sets the element at index i.
-
#_check_type(t) ⇒ Object
:nodoc:.
-
#_dump(i) ⇒ Object
:nodoc:.
-
#_frequencies ⇒ Object
:nodoc:.
-
#_set_valid_data_intern ⇒ Object
:nodoc:.
-
#_vector_ari(method, v) ⇒ Object
:nodoc:.
-
#add(v, update_valid = true) ⇒ Object
Add a value at the end of the vector.
-
#average_deviation_population(m = nil) ⇒ Object
(also: #adp)
Population average deviation (denominator N). Author: Al Chou.
-
#bootstrap(estimators, nr, s = nil) ⇒ Object
Bootstrap. Generates nr resamples (with replacement) of size s from the vector, computing each estimate from estimators over each resample.
-
#box_cox_transformation(lambda) ⇒ Object
:nodoc:.
-
#can_be_date? ⇒ Boolean
Return true if all data is Date, “today” values or nil.
-
#can_be_numeric? ⇒ Boolean
Returns true if all data is Numeric, nil or a declared missing value.
-
#check_type(t) ⇒ Object
:nodoc:.
-
#coefficient_of_variation ⇒ Object
(also: #cov)
Coefficient of variation, calculated with the sample standard deviation.
-
#count(x = false) ⇒ Object
Retrieves the number of cases that satisfy a condition.
-
#db_type(dbs = 'mysql') ⇒ Object
Returns the database type for the vector, according to its content.
-
#dichotomize(low = nil) ⇒ Object
Dichotomizes the vector into 0 and 1, based on the lowest value. If the parameter is defined, this value and lower values become 0 and higher values become 1.
-
#dup ⇒ Object
Creates a duplicate of the Vector.
-
#dup_empty ⇒ Object
Returns an empty duplicate of the vector.
-
#each ⇒ Object
Iterate on each item.
-
#each_index ⇒ Object
Iterate on each item, retrieving index.
-
#factors ⇒ Object
Retrieves the unique values of the data.
-
#frequencies ⇒ Object
:nodoc:.
-
#has_missing_data? ⇒ Boolean
(also: #flawed?)
Returns true if the data has one or more missing values.
-
#histogram(bins = 10) ⇒ Object
With an Integer, creates that many bins within the range of the data. With an Array, each value is a cut point.
-
#initialize(data = [], type = :object, opts = Hash.new) ⇒ Vector
constructor
Creates a new Vector object.
- #inspect ⇒ Object
-
#is_valid?(x) ⇒ Boolean
Returns true if a value is valid (not nil and not included in the missing values).
-
#jacknife(estimators, k = 1) ⇒ Object
Jacknife. Returns a dataset with jacknife delete-k estimators. estimators can be a) a Hash with variable names as keys and lambdas as values, e.g. a.jacknife(:log_s2=>lambda {|v| Math.log(v.variance)}); b) an Array with method names to jacknife, e.g. a.jacknife([:mean, :sd]); c) a single method to jacknife, e.g. a.jacknife(:mean). k represents the block size for block jacknife.
-
#kurtosis(m = nil) ⇒ Object
Kurtosis of the sample.
-
#labeling(x) ⇒ Object
(also: #label)
Retrieves label for value x.
-
#max ⇒ Object
Maximum value.
-
#mean ⇒ Object
The arithmetical mean of data.
-
#median ⇒ Object
Returns the median (50th percentile).
- #median_absolute_deviation ⇒ Object (also: #mad)
-
#min ⇒ Object
Minimum value.
-
#mode ⇒ Object
Returns the most frequent item.
-
#n_valid ⇒ Object
The number of items with valid data.
-
#percentil(q, strategy = :midpoint) ⇒ Object
Percentil. Returns the value of the percentile q.
-
#product ⇒ Object
Product of all values on the sample.
-
#proportion(v = 1) ⇒ Object
Proportion of a given value.
- #proportion_confidence_interval_t(n_poblation, margin = 0.95, v = 1) ⇒ Object
- #proportion_confidence_interval_z(n_poblation, margin = 0.95, v = 1) ⇒ Object
-
#proportions ⇒ Object
Returns a hash with the distribution of proportions of the sample.
- #push(v) ⇒ Object
-
#range ⇒ Object
The range of the data (max - min).
-
#ranked(type = :numeric) ⇒ Object
Returns a ranked vector.
-
#recode(type = nil) ⇒ Object
Returns a new vector, with data modified by block.
-
#recode! ⇒ Object
Modifies current vector, with data modified by block.
- #report_building(b) ⇒ Object
-
#sample_with_replacement(sample = 1) ⇒ Object
Returns a random sample of size n, drawn with replacement from the valid data only.
-
#sample_without_replacement(sample = 1) ⇒ Object
Returns a random sample of size n, drawn without replacement from the valid data only.
-
#set_valid_data ⇒ Object
Update valid_data, missing_data, data_with_nils and gsl at the end of an insertion.
-
#set_valid_data_intern ⇒ Object
:nodoc:.
-
#size ⇒ Object
(also: #n)
Total number of elements, including missing values.
-
#skew(m = nil) ⇒ Object
Skewness of the sample.
-
#split_by_separator(sep = Statsample::SPLIT_TOKEN) ⇒ Object
Returns a hash of Vectors, one for each distinct value found in the separated fields.
- #split_by_separator_freq(sep = Statsample::SPLIT_TOKEN) ⇒ Object
-
#splitted(sep = Statsample::SPLIT_TOKEN) ⇒ Object
Returns an array with the data split by a separator.
-
#standard_deviation_population(m = nil) ⇒ Object
(also: #sdp)
Population Standard deviation (denominator N).
-
#standard_deviation_sample(m = nil) ⇒ Object
(also: #sds, #sd)
Sample Standard deviation (denominator n-1).
-
#standard_error ⇒ Object
(also: #se)
Standard error of the mean, calculated as sd/sqrt(n).
-
#sum ⇒ Object
The sum of values for the data.
-
#sum_of_squared_deviation ⇒ Object
Sum of squared deviation.
-
#sum_of_squares(m = nil) ⇒ Object
(also: #ss)
Sum of squares for the data around a value.
- #to_a ⇒ Object (also: #to_ary)
-
#to_matrix(dir = :horizontal) ⇒ Object
Ugly name.
- #to_REXP ⇒ Object
- #to_s ⇒ Object
-
#variance_population(m = nil) ⇒ Object
Population variance (denominator N).
-
#variance_proportion(n_poblation, v = 1) ⇒ Object
Variance of p, according to population size.
-
#variance_sample(m = nil) ⇒ Object
(also: #variance)
Sample Variance (denominator n-1).
-
#variance_total(n_poblation, v = 1) ⇒ Object
Variance of p, according to population size.
-
#vector_centered ⇒ Object
(also: #centered)
Return a centered vector.
-
#vector_centered_compute(m) ⇒ Object
:nodoc:.
-
#vector_labeled ⇒ Object
Returns a Vector in which each labeled value is replaced by its label.
-
#vector_percentil ⇒ Object
Returns a vector in which each value is replaced by its percentile.
-
#vector_standarized(use_population = false) ⇒ Object
(also: #standarized)
Returns a vector of the standardized values of the data, using the sd with denominator n-1.
-
#vector_standarized_compute(m, sd) ⇒ Object
:nodoc:.
-
#verify ⇒ Object
Reports all values that do not satisfy a condition.
Methods included from VectorShorthands
#to_numeric, #to_scale, #to_vector
Methods included from Summarizable
Methods included from Writable
Constructor Details
#initialize(data = [], type = :object, opts = Hash.new) ⇒ Vector
Creates a new Vector object.
- data: Any data that can be converted to an Array
- type: Level of measurement. See Vector#type
- opts: Hash of options
  - :missing_values: Array of missing values. See Vector#missing_values
  - :today_values: Array of ‘today’ values. See Vector#today_values
  - :labels: Labels for data values
  - :name: Name of vector
# File 'lib/statsample/vector.rb', line 80
def initialize(data=[], type=:object, opts=Hash.new)
  if type == :ordinal or type == :scale
    $stderr.puts "WARNING: #{type} has been deprecated. Use :numeric instead."
    type = :numeric
  end
  if type == :nominal
    $stderr.puts "WARNING: nominal has been deprecated. Use :object instead."
    type = :object
  end
  @data=data.is_a?(Array) ? data : data.to_a
  @type=type
  opts_default={
    :missing_values=>[],
    :today_values=>['NOW','TODAY', :NOW, :TODAY],
    :labels=>{},
    :name=>nil
  }
  @opts=opts_default.merge(opts)
  if @opts[:name].nil?
    @@n_table||=0
    @@n_table+=1
    @opts[:name]="Vector #{@@n_table}"
  end
  @missing_values=@opts[:missing_values]
  @labels=@opts[:labels]
  @today_values=@opts[:today_values]
  @name=@opts[:name]
  @valid_data=[]
  @data_with_nils=[]
  @date_data_with_nils=[]
  @missing_data=[]
  @has_missing_data=nil
  @numeric_data=nil
  set_valid_data
  self.type=type
end
Instance Attribute Details
#data ⇒ Object (readonly)
Original data.
# File 'lib/statsample/vector.rb', line 54
def data
  @data
end
#data_with_nils ⇒ Object (readonly)
Original data, with all missing values replaced by nils
# File 'lib/statsample/vector.rb', line 64
def data_with_nils
  @data_with_nils
end
#date_data_with_nils ⇒ Object (readonly)
Date data, with all missing values replaced by nils
# File 'lib/statsample/vector.rb', line 66
def date_data_with_nils
  @date_data_with_nils
end
#labels ⇒ Object
Labels for specific values (a Hash mapping each value to its label)
# File 'lib/statsample/vector.rb', line 68
def labels
  @labels
end
#missing_data ⇒ Object (readonly)
Missing values array
# File 'lib/statsample/vector.rb', line 62
def missing_data
  @missing_data
end
#missing_values ⇒ Object
Array of values considered as missing. nil is always treated as missing, in addition to these values
# File 'lib/statsample/vector.rb', line 58
def missing_values
  @missing_values
end
#name ⇒ Object
Name of the vector. Used for output by many classes
# File 'lib/statsample/vector.rb', line 70
def name
  @name
end
#today_values ⇒ Object
Array of values considered as “Today”, with date type. “NOW”, “TODAY”, :NOW and :TODAY are ‘today’ values, by default
# File 'lib/statsample/vector.rb', line 60
def today_values
  @today_values
end
#type ⇒ Object
Level of measurement. Can be :object, :numeric or :date
# File 'lib/statsample/vector.rb', line 52
def type
  @type
end
#valid_data ⇒ Object (readonly)
Valid data. Equal to data, minus values assigned as missing values
# File 'lib/statsample/vector.rb', line 56
def valid_data
  @valid_data
end
Class Method Details
.[](*args) ⇒ Object
Create a vector using (almost) any object
-
Array: flattened
-
Range: transformed using to_a
-
Statsample::Vector
-
Numeric and string values
# File 'lib/statsample/vector.rb', line 123
def self.[](*args)
  values=[]
  args.each do |a|
    case a
    when Array
      values.concat a.flatten
    when Statsample::Vector
      values.concat a.to_a
    when Range
      values.concat a.to_a
    else
      values << a
    end
  end
  vector=new(values)
  vector.type=:numeric if vector.can_be_numeric?
  vector
end
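The case analysis in Vector.[] can be tried without the gem. The sketch below is an assumption-labeled illustration: collect_vector_values and all_numeric_or_nil? are hypothetical helpers, the Statsample::Vector branch is omitted, and, unlike can_be_numeric?, declared missing values are not taken into account.

```ruby
# Hypothetical helper (not part of Statsample) mirroring the case analysis in
# Vector.[]: Arrays are flattened, Ranges expanded with to_a, other objects
# appended unchanged.
def collect_vector_values(*args)
  values = []
  args.each do |a|
    case a
    when Array then values.concat(a.flatten)
    when Range then values.concat(a.to_a)
    else values << a
    end
  end
  values
end

# Hypothetical check approximating can_be_numeric? (ignores missing values).
def all_numeric_or_nil?(values)
  values.all? { |v| v.nil? || v.is_a?(Numeric) }
end

vals = collect_vector_values([1, [2, 3]], 4..5, 6)
p vals                       # => [1, 2, 3, 4, 5, 6]
p all_numeric_or_nil?(vals)  # => true
```

When the check succeeds, Vector.[] switches the new vector to the :numeric type.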
._load(data) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 256
def self._load(data) # :nodoc:
  h=Marshal.load(data)
  Vector.new(h['data'], h['type'], :missing_values=> h['missing_values'], :labels=>h['labels'], :name=>h['name'])
end
.new_numeric(n, val = nil, &block) ⇒ Object
Creates a new vector of type :numeric.
Parameters:
- n: Size
- val: Value assigned to every element
- &block: If given, called with each index (0 to n-1) to compute the value at that position
# File 'lib/statsample/vector.rb', line 146
def self.new_numeric(n,val=nil, &block)
  if block
    vector=n.times.map {|i| block.call(i)}.to_numeric
  else
    vector=n.times.map { val}.to_numeric
  end
  vector.type=:numeric
  vector
end
.new_scale(n, val = nil, &block) ⇒ Object
Deprecated. Use new_numeric instead.
# File 'lib/statsample/vector.rb', line 157
def self.new_scale(n, val=nil,&block)
  $stderr.puts "WARNING: .new_scale has been deprecated. Use .new_numeric instead."
  new_numeric n, val, &block
end
Instance Method Details
#*(v) ⇒ Object
# File 'lib/statsample/vector.rb', line 451
def *(v)
  _vector_ari("*",v)
end
#+(v) ⇒ Object
Vector sum.
-
If v is a scalar, adds this value to all elements
-
If v is an Array or a Vector, it should be the same size as this vector; every item of this vector is added to the item at the same position in the other vector. Positions where either value is missing yield nil.
# File 'lib/statsample/vector.rb', line 437
def +(v)
  _vector_ari("+",v)
end
#-(v) ⇒ Object
Vector subtraction.
-
If v is a scalar, subtracts this value from all elements
-
If v is an Array or a Vector, it should be the same size as this vector; the item at the same position in the other vector is subtracted from every item of this vector. Positions where either value is missing yield nil.
# File 'lib/statsample/vector.rb', line 447
def -(v)
  _vector_ari("-",v)
end
#==(v2) ⇒ Object
Vector equality. Two vectors are equal if their data, missing values, type and labels are equal
# File 'lib/statsample/vector.rb', line 247
def ==(v2)
  return false unless v2.instance_of? Statsample::Vector
  @data==v2.data and @missing_values==v2.missing_values and @type==v2.type and @labels==v2.labels
end
#[](i) ⇒ Object
Retrieves the element at index i
# File 'lib/statsample/vector.rb', line 394
def [](i)
  @data[i]
end
#[]=(i, v) ⇒ Object
Sets the element at index i. Note: this only changes data; call set_valid_data afterwards to update valid_data, missing_data and data_with_nils, especially if you assign missing values
# File 'lib/statsample/vector.rb', line 399
def []=(i,v)
  @data[i]=v
end
#_check_type(t) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 185
def _check_type(t) #:nodoc:
  raise NoMethodError if (t == :numeric and @type == :object) or (t == :date) or (:date == @type)
end
#_dump(i) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 252
def _dump(i) # :nodoc:
  Marshal.dump({'data'=>@data,'missing_values'=>@missing_values, 'labels'=>@labels, 'type'=>@type,'name'=>@name})
end
#_frequencies ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 775
def _frequencies #:nodoc:
  @valid_data.inject(Hash.new) {|a,x|
    a[x]||=0
    a[x]=a[x]+1
    a
  }
end
#_set_valid_data_intern ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 351
def _set_valid_data_intern #:nodoc:
  @data.each do |n|
    if is_valid? n
      @valid_data.push(n)
      @data_with_nils.push(n)
    else
      @data_with_nils.push(nil)
      @missing_data.push(n)
    end
  end
  @has_missing_data=@missing_data.size>0
end
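The partition performed by _set_valid_data_intern can be reproduced on plain Arrays. This is a hedged, standalone sketch: partition_valid is a hypothetical name, and it returns a Hash instead of filling instance variables.

```ruby
# Hypothetical standalone version of the valid/missing partition.
# A value is valid unless it is nil or listed in missing_values.
def partition_valid(data, missing_values = [])
  valid = []
  with_nils = []
  missing = []
  data.each do |x|
    if !x.nil? && !missing_values.include?(x)
      valid << x
      with_nils << x
    else
      with_nils << nil   # keep positions aligned with the original data
      missing << x
    end
  end
  { valid: valid, with_nils: with_nils, missing: missing }
end

r = partition_valid([1, nil, 99, 3], [99])
p r[:valid]      # => [1, 3]
p r[:with_nils]  # => [1, nil, nil, 3]
p r[:missing]    # => [nil, 99]
```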
#_vector_ari(method, v) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 465
def _vector_ari(method,v) # :nodoc:
  if(v.is_a? Vector or v.is_a? Array)
    raise ArgumentError, "The array/vector parameter (#{v.size}) should be of the same size of the original vector (#{@data.size})" unless v.size==@data.size
    sum=[]
    v.size.times {|i|
      if((v.is_a? Vector and v.is_valid?(v[i]) and is_valid?(@data[i])) or (v.is_a? Array and !v[i].nil? and !data[i].nil?))
        sum.push(@data[i].send(method,v[i]))
      else
        sum.push(nil)
      end
    }
    Statsample::Vector.new(sum, :numeric)
  elsif(v.respond_to? method )
    Statsample::Vector.new( @data.collect {|x|
      if(!x.nil?)
        x.send(method,v)
      else
        nil
      end
    } , :numeric)
  else
    raise TypeError,"You should pass a scalar or a array/vector"
  end
end
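The element-wise rule used by +, - and * can be sketched with plain Arrays. This is a simplified, hypothetical helper (vector_ari), not the library method: it handles only Arrays and scalars, treats only nil as missing, and returns an Array rather than a Statsample::Vector.

```ruby
# Hypothetical sketch of the _vector_ari rule on plain Arrays.
# Arrays must have the same size; a nil on either side gives nil.
# A scalar is applied to every non-nil element.
def vector_ari(data, method, other)
  if other.is_a?(Array)
    unless other.size == data.size
      raise ArgumentError, "sizes differ (#{other.size} vs #{data.size})"
    end
    data.each_index.map do |i|
      a = data[i]
      b = other[i]
      (a.nil? || b.nil?) ? nil : a.send(method, b)
    end
  elsif other.respond_to?(method)
    data.map { |x| x.nil? ? nil : x.send(method, other) }
  else
    raise TypeError, "You should pass a scalar or an array"
  end
end

p vector_ari([1, 2, nil], "+", [10, 20, 30])  # => [11, 22, nil]
p vector_ari([1, 2, nil], "*", 3)             # => [3, 6, nil]
```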
#add(v, update_valid = true) ⇒ Object
Adds a value at the end of the vector. If the second argument is false, you should update the Vector using Vector#set_valid_data at the end of your insertion cycle
# File 'lib/statsample/vector.rb', line 314
def add(v,update_valid=true)
  @data.push(v)
  set_valid_data if update_valid
end
#average_deviation_population(m = nil) ⇒ Object Also known as: adp
Population average deviation (denominator N). Author: Al Chou
# File 'lib/statsample/vector.rb', line 1003
def average_deviation_population( m = nil )
  check_type :numeric
  m ||= mean
  ( @numeric_data.inject( 0 ) { |a, x| ( x - m ).abs + a } ).quo( n_valid )
end
#bootstrap(estimators, nr, s = nil) ⇒ Object
Bootstrap
Generates nr resamples (with replacement) of size s from the vector, computing each estimate from estimators over each resample. estimators can be a) a Hash with variable names as keys and lambdas as values
a.bootstrap(:log_s2=>lambda {|v| Math.log(v.variance)},1000)
b) an Array with the names of methods to bootstrap
a.bootstrap([:mean, :sd],1000)
c) a single method to bootstrap
a.bootstrap(:mean, 1000)
If s is nil, it defaults to the vector size.
Returns a dataset where each vector has length nr and contains the computed resample estimates.
# File 'lib/statsample/vector.rb', line 565
def bootstrap(estimators, nr, s=nil)
  s||=n
  h_est, es, bss= prepare_bootstrap(estimators)
  nr.times do |i|
    bs=sample_with_replacement(s)
    es.each do |estimator|
      # Add bootstrap
      bss[estimator].push(h_est[estimator].call(bs))
    end
  end
  es.each do |est|
    bss[est]=bss[est].to_numeric
    bss[est].type=:numeric
  end
  bss.to_dataset
end
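The resampling loop in bootstrap can be illustrated without Statsample. The sketch below is an assumption-labeled stand-in: bootstrap_estimates is a hypothetical name, estimators must be lambdas, a seeded Random is passed for reproducibility, and a Hash of Arrays replaces the returned Dataset.

```ruby
# Hypothetical standalone bootstrap: nr resamples (with replacement) of
# size s, applying every estimator lambda to each resample.
def bootstrap_estimates(values, estimators, nr, s = nil, rng: Random.new(42))
  s ||= values.size
  out = {}
  estimators.each_key { |name| out[name] = [] }
  nr.times do
    resample = Array.new(s) { values[rng.rand(values.size)] }
    estimators.each { |name, f| out[name] << f.call(resample) }
  end
  out
end

mean = ->(a) { a.sum.to_f / a.size }
est = bootstrap_estimates([2, 4, 4, 4, 5, 5, 7, 9], { mean: mean }, 1000)[:mean]

# The spread of the bootstrap means approximates the standard error of the mean.
avg = est.sum / est.size
se = Math.sqrt(est.sum { |m| (m - avg)**2 } / (est.size - 1))
puts "bootstrap estimate of the standard error: #{se.round(3)}"
```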
#box_cox_transformation(lambda) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 230
def box_cox_transformation(lambda) # :nodoc:
  raise "Should be a numeric" unless @type==:numeric
  @data_with_nils.collect{|x|
    if !x.nil?
      if(lambda==0)
        Math.log(x)
      else
        (x**lambda-1).quo(lambda)
      end
    else
      nil
    end
  }.to_vector(:numeric)
end
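The Box-Cox formula above, log(x) when lambda is 0 and (x**lambda - 1)/lambda otherwise, can be checked on a plain Array. box_cox is a hypothetical helper; the parameter is named lam to avoid shadowing Kernel#lambda, and inputs are assumed to be positive.

```ruby
# Hypothetical standalone Box-Cox transform; nils pass through unchanged.
# Assumes every non-nil value is positive.
def box_cox(values, lam)
  values.map do |x|
    next nil if x.nil?
    lam == 0 ? Math.log(x) : (x**lam - 1).quo(lam)
  end
end

p box_cox([1.0, 4.0, nil], 0.5)  # => [0.0, 2.0, nil]
```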
#can_be_date? ⇒ Boolean
Return true if all data is Date, “today” values or nil
# File 'lib/statsample/vector.rb', line 719
def can_be_date?
  if @data.find {|v| !v.nil? and !v.is_a? Date and !v.is_a? Time and (v.is_a? String and !@today_values.include? v) and (v.is_a? String and !(v=~/\d{4,4}[-\/]\d{1,2}[-\/]\d{1,2}/))}
    false
  else
    true
  end
end
#can_be_numeric? ⇒ Boolean
Returns true if all data is Numeric, nil or a declared missing value
# File 'lib/statsample/vector.rb', line 728
def can_be_numeric?
  if @data.find {|v| !v.nil? and !v.is_a? Numeric and !@missing_values.include? v}
    false
  else
    true
  end
end
#check_type(t) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 175
def check_type(t)
  Statsample::STATSAMPLE__.check_type(self,t)
end
#coefficient_of_variation ⇒ Object Also known as: cov
Coefficient of variation, calculated with the sample standard deviation
# File 'lib/statsample/vector.rb', line 1075
def coefficient_of_variation
  check_type :numeric
  standard_deviation_sample.quo(mean)
end
#count(x = false) ⇒ Object
Retrieves the number of cases that satisfy a condition. If a block is given, returns the number of items for which the block returns true. Otherwise, returns the frequency of the value x.
# File 'lib/statsample/vector.rb', line 692
def count(x=false)
  if block_given?
    r=@data.inject(0) {|s, i|
      r=yield i
      s+(r ? 1 : 0)
    }
    r.nil? ? 0 : r
  else
    frequencies[x].nil? ? 0 : frequencies[x]
  end
end
#db_type(dbs = 'mysql') ⇒ Object
Returns the database type for the vector, according to its content
# File 'lib/statsample/vector.rb', line 706
def db_type(dbs='mysql')
  # first, detect any character not number
  if @data.find {|v| v.to_s=~/\d{2,2}-\d{2,2}-\d{4,4}/} or @data.find {|v| v.to_s=~/\d{4,4}-\d{2,2}-\d{2,2}/}
    return "DATE"
  elsif @data.find {|v| v.to_s=~/[^0-9e.-]/ }
    return "VARCHAR (255)"
  elsif @data.find {|v| v.to_s=~/\./}
    return "DOUBLE"
  else
    return "INTEGER"
  end
end
#dichotomize(low = nil) ⇒ Object
Dichotomizes the vector into 0 and 1, based on the lowest value. If the parameter is defined, this value and lower values become 0 and higher values become 1
# File 'lib/statsample/vector.rb', line 284
def dichotomize(low = nil)
  low ||= factors.min

  @data_with_nils.collect do |x|
    if x.nil?
      nil
    elsif x > low
      1
    else
      0
    end
  end.to_numeric
end
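The thresholding rule in dichotomize is easy to see on a plain Array. In this hypothetical sketch the default threshold is the smallest non-nil value, standing in for factors.min (which in the library also excludes declared missing values).

```ruby
# Hypothetical standalone dichotomize: values above low become 1,
# values at or below low become 0, nils stay nil.
def dichotomize(values, low = nil)
  low ||= values.compact.min
  values.map { |x| x.nil? ? nil : (x > low ? 1 : 0) }
end

p dichotomize([3, 1, nil, 2])     # => [1, 0, nil, 1]
p dichotomize([3, 1, nil, 2], 2)  # => [1, 0, nil, 0]
```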
#dup ⇒ Object
Creates a duplicate of the Vector. Note: data, missing_values and labels are duplicated, so changes to the original vector don't propagate to copies.
# File 'lib/statsample/vector.rb', line 164
def dup
  Vector.new(@data.dup,@type, :missing_values => @missing_values.dup, :labels => @labels.dup, :name=>@name)
end
#dup_empty ⇒ Object
Returns an empty duplicate of the vector. Maintains the type, missing values and labels.
# File 'lib/statsample/vector.rb', line 169
def dup_empty
  Vector.new([],@type, :missing_values => @missing_values.dup, :labels => @labels.dup, :name=> @name)
end
#each ⇒ Object
Iterate on each item. Equivalent to
@data.each{|x| yield x}
# File 'lib/statsample/vector.rb', line 300
def each
  @data.each{|x| yield(x) }
end
#each_index ⇒ Object
Iterate on each item, retrieving index
# File 'lib/statsample/vector.rb', line 305
def each_index
  (0...@data.size).each {|i|
    yield(i)
  }
end
#factors ⇒ Object
Retrieves the unique values of the data.
# File 'lib/statsample/vector.rb', line 753
def factors
  if @type==:numeric
    @numeric_data.uniq.sort
  elsif @type==:date
    @date_data_with_nils.uniq.sort
  else
    @valid_data.uniq.sort
  end
end
#frequencies ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 765
def frequencies
  Statsample::STATSAMPLE__.frequencies(@valid_data)
end
#has_missing_data? ⇒ Boolean Also known as: flawed?
Returns true if the data has one or more missing values
# File 'lib/statsample/vector.rb', line 365
def has_missing_data?
  @has_missing_data
end
#histogram(bins = 10) ⇒ Object
With an Integer, creates that many bins within the range of the data. With an Array, each value is a cut point
# File 'lib/statsample/vector.rb', line 1050
def histogram(bins=10)
  check_type :numeric

  if bins.is_a? Array
    #h=Statsample::Histogram.new(self, bins)
    h=Statsample::Histogram.alloc(bins)
  else
    # ugly patch. The upper limit for a bin has the form
    # x < range
    #h=Statsample::Histogram.new(self, bins)
    min,max=Statsample::Util.nice(@valid_data.min,@valid_data.max)
    # fix last data
    if max==@valid_data.max
      max+=1e-10
    end
    h=Statsample::Histogram.alloc(bins,[min,max])
    # Fix last bin
  end
  h.increment(@valid_data)
  h
end
#inspect ⇒ Object
# File 'lib/statsample/vector.rb', line 749
def inspect
  self.to_s
end
#is_valid?(x) ⇒ Boolean
Returns true if a value is valid (not nil and not included in the missing values)
# File 'lib/statsample/vector.rb', line 403
def is_valid?(x)
  !(x.nil? or @missing_values.include? x)
end
#jacknife(estimators, k = 1) ⇒ Object
Jacknife
Returns a dataset with jacknife delete-k estimators. estimators can be a) a Hash with variable names as keys and lambdas as values
a.jacknife(:log_s2=>lambda {|v| Math.log(v.variance)})
b) an Array with method names to jacknife
a.jacknife([:mean, :sd])
c) a single method to jacknife
a.jacknife(:mean)
k represents the block size for block jacknife. It defaults to 1, the classic delete-one jacknife; the vector size must be divisible by k.
Returns a dataset where each vector has length n/k and contains the computed jacknife pseudovalues.
Reference:
-
Sawyer, S. (2005). Resampling Data: Using a Statistical Jacknife.
# File 'lib/statsample/vector.rb', line 604
def jacknife(estimators, k=1)
  raise "n should be divisible by k:#{k}" unless n%k==0

  nb=(n / k).to_i

  h_est, es, ps= prepare_bootstrap(estimators)

  est_n=es.inject({}) {|h,v|
    h[v]=h_est[v].call(self)
    h
  }

  nb.times do |i|
    other=@data_with_nils.dup
    other.slice!(i*k,k)
    other=other.to_numeric
    es.each do |estimator|
      # Add pseudovalue
      ps[estimator].push( nb * est_n[estimator] - (nb-1) * h_est[estimator].call(other))
    end
  end

  es.each do |est|
    ps[est]=ps[est].to_numeric
    ps[est].type=:numeric
  end
  ps.to_dataset
end
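The pseudovalue formula used by jacknife, nb * est(full) - (nb - 1) * est(without block), can be checked exactly with Rational arithmetic. jackknife_pseudovalues is a hypothetical standalone helper that takes a single estimator lambda and assumes there are no missing values.

```ruby
# Hypothetical delete-k jackknife on a plain Array, returning pseudovalues.
def jackknife_pseudovalues(values, estimator, k = 1)
  n = values.size
  raise ArgumentError, "n should be divisible by k: #{k}" unless n % k == 0
  nb = n / k
  full = estimator.call(values)
  (0...nb).map do |i|
    rest = values.dup
    rest.slice!(i * k, k)   # drop the i-th block of k values
    nb * full - (nb - 1) * estimator.call(rest)
  end
end

mean = ->(a) { a.sum.quo(a.size) }  # quo keeps the arithmetic exact
ps = jackknife_pseudovalues([1, 2, 3, 4], mean)
# For the mean, the delete-one pseudovalues reproduce the data: 1, 2, 3, 4.
```

The average of the pseudovalues is the jackknife estimate; for the mean it equals the ordinary sample mean.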
#kurtosis(m = nil) ⇒ Object
Kurtosis of the sample
# File 'lib/statsample/vector.rb', line 1034
def kurtosis(m=nil)
  check_type :numeric
  m||=mean
  fo=@numeric_data.inject(0){|a,x| a+((x-m)**4)}
  fo.quo((@numeric_data.size)*sd(m)**4)-3
end
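The kurtosis formula above is sum((x - m)**4) / (n * sd**4) - 3, where sd is the sample standard deviation (denominator n - 1), as sd is an alias of standard_deviation_sample. A minimal standalone sketch with hypothetical helper names:

```ruby
# Hypothetical sample standard deviation (denominator n - 1) around m.
def sample_sd(values, m)
  Math.sqrt(values.sum { |x| (x - m)**2 }.to_f / (values.size - 1))
end

# Hypothetical standalone excess kurtosis following the formula above.
def kurtosis(values, m = nil)
  m ||= values.sum.to_f / values.size
  fo = values.sum { |x| (x - m)**4 }
  fo / (values.size * sample_sd(values, m)**4) - 3
end

p kurtosis([1, 1, 1, 5])  # => -1.6875
```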
#labeling(x) ⇒ Object Also known as: label
Retrieves label for value x. Retrieves x if no label defined.
# File 'lib/statsample/vector.rb', line 372
def labeling(x)
  @labels.has_key?(x) ? @labels[x].to_s : x.to_s
end
#max ⇒ Object
Maximum value
# File 'lib/statsample/vector.rb', line 920
def max
  check_type :numeric
  @valid_data.max
end
#mean ⇒ Object
The arithmetical mean of data
# File 'lib/statsample/vector.rb', line 966
def mean
  check_type :numeric
  sum.to_f.quo(n_valid)
end
#median ⇒ Object
Returns the median (50th percentile)
# File 'lib/statsample/vector.rb', line 910
def median
  check_type :numeric
  percentil(50)
end
#median_absolute_deviation ⇒ Object Also known as: mad
# File 'lib/statsample/vector.rb', line 1008
def median_absolute_deviation
  med=median
  recode {|x| (x-med).abs}.median
end
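median_absolute_deviation takes the median of the absolute deviations from the median, which makes it robust to outliers. The hypothetical helpers below work on plain Arrays; for even sizes the median averages the two middle values, which agrees with percentil(50) under the :midpoint strategy (the library returns a Float there, this sketch a Rational).

```ruby
# Hypothetical standalone median.
def median(values)
  s = values.sort
  mid = s.size / 2
  s.size.odd? ? s[mid] : (s[mid - 1] + s[mid]).quo(2)
end

# Hypothetical standalone MAD: median of |x - median|.
def median_absolute_deviation(values)
  med = median(values)
  median(values.map { |x| (x - med).abs })
end

p median_absolute_deviation([1, 2, 3, 4, 100])  # => 1
```

The outlier 100 barely affects the result, whereas it would inflate the standard deviation considerably.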
#min ⇒ Object
Minimum value
# File 'lib/statsample/vector.rb', line 915
def min
  check_type :numeric
  @valid_data.min
end
#mode ⇒ Object
Returns the most frequent item.
# File 'lib/statsample/vector.rb', line 784
def mode
  frequencies.max{|a,b| a[1]<=>b[1]}.first
end
#n_valid ⇒ Object
The number of items with valid data.
# File 'lib/statsample/vector.rb', line 788
def n_valid
  @valid_data.size
end
#percentil(q, strategy = :midpoint) ⇒ Object
Percentil
Returns the value of the percentile q
Accepts an optional second argument specifying the interpolation strategy to use when the requested percentile lies between two data points a and b. Valid strategies are:
-
:midpoint (Default): (a + b) / 2
-
:linear : a + (b - a) * d where d is the decimal part of the index between a and b.
This is the NIST recommended method (en.wikipedia.org/wiki/Percentile#NIST_method)
# File 'lib/statsample/vector.rb', line 868
def percentil(q, strategy = :midpoint)
  check_type :numeric
  sorted=@valid_data.sort

  case strategy
  when :midpoint
    v = (n_valid * q).quo(100)
    if(v.to_i!=v)
      sorted[v.to_i]
    else
      (sorted[(v-0.5).to_i].to_f + sorted[(v+0.5).to_i]).quo(2)
    end
  when :linear
    index = (q / 100.0) * (n_valid + 1)

    k = index.truncate
    d = index % 1

    if k == 0
      sorted[0]
    elsif k >= sorted.size
      sorted[-1]
    else
      sorted[k - 1] + d * (sorted[k] - sorted[k - 1])
    end
  else
    raise NotImplementedError.new "Unknown strategy #{strategy.to_s}"
  end
end
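The two interpolation strategies can be compared on a plain Array. This standalone sketch copies the logic of percentil into a hypothetical percentile helper; it assumes the values are already free of nils and missing values and that 0 < q < 100.

```ruby
# Hypothetical standalone copy of the :midpoint and :linear strategies.
def percentile(values, q, strategy = :midpoint)
  sorted = values.sort
  n = sorted.size
  case strategy
  when :midpoint
    v = (n * q).quo(100)
    if v.to_i != v
      sorted[v.to_i]                 # position falls inside one value
    else
      (sorted[(v - 0.5).to_i].to_f + sorted[(v + 0.5).to_i]).quo(2)
    end
  when :linear                       # NIST method: rank q/100 * (n + 1)
    index = (q / 100.0) * (n + 1)
    k = index.truncate
    d = index % 1
    if k == 0
      sorted[0]
    elsif k >= n
      sorted[-1]
    else
      sorted[k - 1] + d * (sorted[k] - sorted[k - 1])
    end
  else
    raise NotImplementedError, "Unknown strategy #{strategy}"
  end
end

p percentile([1, 2, 3, 4], 25)           # => 1.5
p percentile([1, 2, 3, 4], 25, :linear)  # => 1.25
```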
#product ⇒ Object
Product of all values on the sample
# File 'lib/statsample/vector.rb', line 1043
def product
  check_type :numeric
  @numeric_data.inject(1){|a,x| a*x }
end
#proportion(v = 1) ⇒ Object
Proportion of a given value.
# File 'lib/statsample/vector.rb', line 800
def proportion(v=1)
  frequencies[v].quo(@valid_data.size)
end
#proportion_confidence_interval_t(n_poblation, margin = 0.95, v = 1) ⇒ Object
# File 'lib/statsample/vector.rb', line 840
def proportion_confidence_interval_t(n_poblation,margin=0.95,v=1)
  Statsample::proportion_confidence_interval_t(proportion(v), @valid_data.size, n_poblation, margin)
end
#proportion_confidence_interval_z(n_poblation, margin = 0.95, v = 1) ⇒ Object
# File 'lib/statsample/vector.rb', line 843
def proportion_confidence_interval_z(n_poblation,margin=0.95,v=1)
  Statsample::proportion_confidence_interval_z(proportion(v), @valid_data.size, n_poblation, margin)
end
#proportions ⇒ Object
Returns a hash with the distribution of proportions of the sample.
# File 'lib/statsample/vector.rb', line 793
def proportions
  frequencies.inject({}){|a,v|
    a[v[0]] = v[1].quo(n_valid)
    a
  }
end
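Frequencies, proportions and the mode are all computed over the valid data. The hypothetical helpers below reproduce that on plain Arrays, dividing with quo so the proportions stay exact Rationals; they are illustrations, not the library's STATSAMPLE__ implementation.

```ruby
# Hypothetical counts of each valid value (nil and missing values excluded).
def frequencies(values, missing_values = [])
  valid = values.reject { |x| x.nil? || missing_values.include?(x) }
  valid.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 }
end

# Hypothetical proportions: each count divided by the number of valid items.
def proportions(values, missing_values = [])
  freq = frequencies(values, missing_values)
  n_valid = freq.values.sum
  freq.transform_values { |count| count.quo(n_valid) }
end

# Hypothetical mode: the value with the largest count.
def mode(values)
  frequencies(values).max_by { |_value, count| count }.first
end

p proportions(["a", "b", "a", nil])["a"]  # => (2/3)
```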
#push(v) ⇒ Object
# File 'lib/statsample/vector.rb', line 276
def push(v)
  @data.push(v)
  set_valid_data
end
#range ⇒ Object
The range of the data (max - min)
# File 'lib/statsample/vector.rb', line 956
def range;
  check_type :numeric
  @numeric_data.max - @numeric_data.min
end
#ranked(type = :numeric) ⇒ Object
Returns a ranked vector.
# File 'lib/statsample/vector.rb', line 899
def ranked(type=:numeric)
  check_type :numeric
  i=0
  r=frequencies.sort.inject({}){|a,v|
    a[v[0]]=(i+1 + i+v[1]).quo(2)
    i+=v[1]
    a
  }
  @data.collect {|c| r[c] }.to_vector(type)
end
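ranked assigns ascending ranks and gives tied values the average of the ranks they span: a group of count ties starting after i smaller values gets (i + 1 + i + count) / 2. A hypothetical standalone version on plain Arrays:

```ruby
# Hypothetical standalone average-rank computation.
def ranked(values)
  freq = values.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 }
  i = 0
  rank = {}
  freq.sort.each do |value, count|
    rank[value] = (i + 1 + i + count).quo(2)  # mean of ranks i+1 .. i+count
    i += count
  end
  values.map { |x| rank[x] }
end

p ranked([10, 20, 20, 30])  # => [(1/1), (5/2), (5/2), (4/1)]
```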
#recode(type = nil) ⇒ Object
Returns a new vector with the data modified by the block. Equivalent to creating a Vector from #collect on data
# File 'lib/statsample/vector.rb', line 262
def recode(type=nil)
  type||=@type
  @data.collect{|x|
    yield x
  }.to_vector(type)
end
#recode! ⇒ Object
Modifies the current vector, with the data modified by the block. Equivalent to #collect! on @data
# File 'lib/statsample/vector.rb', line 270
def recode!
  @data.collect!{|x|
    yield x
  }
  set_valid_data
end
#report_building(b) ⇒ Object
# File 'lib/statsample/vector.rb', line 803
def report_building(b)
  b.section(:name=>name) do |s|
    s.text _("n :%d") % n
    s.text _("n valid:%d") % n_valid
    if @type==:object
      s.text _("factors:%s") % factors.join(",")
      s.text _("mode: %s") % mode

      s.table(:name=>_("Distribution")) do |t|
        frequencies.sort.each do |k,v|
          key=labels.has_key?(k) ? labels[k]:k
          t.row [key, v , ("%0.2f%%" % (v.quo(n_valid)*100))]
        end
      end
    end

    s.text _("median: %s") % median.to_s if(@type==:numeric or @type==:numeric)
    if(@type==:numeric)
      s.text _("mean: %0.4f") % mean
      if sd
        s.text _("std.dev.: %0.4f") % sd
        s.text _("std.err.: %0.4f") % se
        s.text _("skew: %0.4f") % skew
        s.text _("kurtosis: %0.4f") % kurtosis
      end
    end
  end
end
#sample_with_replacement(sample = 1) ⇒ Object
Returns a random sample of size n, drawn with replacement from the valid data only.
In every draw, each item has the same probability of being selected.
# File 'lib/statsample/vector.rb', line 666
def sample_with_replacement(sample=1)
  vds=@valid_data.size
  (0...sample).collect{ @valid_data[rand(vds)] }
end
#sample_without_replacement(sample = 1) ⇒ Object
Returns a random sample of size n, drawn without replacement from the valid data only.
Every element can be selected at most once.
A sample of the same size as the vector contains the whole vector, in random order.
# File 'lib/statsample/vector.rb', line 677
def sample_without_replacement(sample=1)
  raise ArgumentError, "Sample size couldn't be greater than n" if sample>@valid_data.size
  out=[]
  size=@valid_data.size
  while out.size<sample
    value=rand(size)
    out.push(value) if !out.include?value
  end
  out.collect{|i| @data[i]}
end
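Both sampling modes can be sketched on plain Arrays. This hypothetical version samples directly from the valid values and, for the without-replacement case, swaps in Ruby's Array#sample instead of the rejection loop shown above; a seeded Random can be passed for reproducibility.

```ruby
# Hypothetical helper: drop nil and declared missing values.
def valid_values(values, missing_values = [])
  values.reject { |x| x.nil? || missing_values.include?(x) }
end

# Every draw picks uniformly among all valid values.
def sample_with_replacement(values, size = 1, rng: Random.new)
  valid = valid_values(values)
  Array.new(size) { valid[rng.rand(valid.size)] }
end

# Each valid value can be drawn at most once.
def sample_without_replacement(values, size = 1, rng: Random.new)
  valid = valid_values(values)
  raise ArgumentError, "Sample size couldn't be greater than n" if size > valid.size
  valid.sample(size, random: rng)
end

p sample_without_replacement([1, 2, nil, 3, 4], 4, rng: Random.new(1)).sort  # => [1, 2, 3, 4]
```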
#set_valid_data ⇒ Object
Updates valid_data, missing_data, data_with_nils and gsl at the end of an insertion.
Use after Vector#add(v,false). Usage:
v=Statsample::Vector.new
v.add(2,false)
v.add(4,false)
v.data
=> [2,4]
v.valid_data
=> []
v.set_valid_data
v.valid_data
=> [2,4]
333 334 335 336 337 338 339 340 341 |
# File 'lib/statsample/vector.rb', line 333 def set_valid_data @valid_data.clear @missing_data.clear @data_with_nils.clear @date_data_with_nils.clear set_valid_data_intern set_numeric_data if(@type==:numeric) set_date_data if(@type==:date) end |
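The deferred-update pattern behind add(v, false) followed by set_valid_data can be sketched with a tiny class. MiniVector is hypothetical and far simpler than Statsample::Vector; it only shows why batching insertions and rebuilding the derived arrays once is cheaper than rebuilding after every add.

```ruby
# Hypothetical MiniVector (not Statsample's class): derived arrays are
# rebuilt only when set_valid_data is called.
class MiniVector
  attr_reader :data, :valid_data, :missing_data

  def initialize(missing_values = [])
    @data = []
    @missing_values = missing_values # values treated as missing, besides nil
    @valid_data = []
    @missing_data = []
  end

  # Append a value; rebuild the derived arrays only if update_valid is true.
  def add(v, update_valid = true)
    @data.push(v)
    set_valid_data if update_valid
  end

  # Rebuild valid and missing values from @data in a single pass.
  def set_valid_data
    @valid_data, @missing_data =
      @data.partition { |x| !x.nil? && !@missing_values.include?(x) }
  end
end

v = MiniVector.new([-99])
v.add(2, false)
v.add(4, false)
v.add(-99, false)
p v.valid_data   # => [] (not rebuilt yet)
v.set_valid_data
p v.valid_data   # => [2, 4]
p v.missing_data # => [-99]
```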
#set_valid_data_intern ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 343
def set_valid_data_intern #:nodoc:
  Statsample::STATSAMPLE__.set_valid_data_intern(self)
end
#size ⇒ Object Also known as: n
Total number of elements, including missing values.
# File 'lib/statsample/vector.rb', line 388
def size
  @data.size
end
#skew(m = nil) ⇒ Object
Skewness of the sample: the mean cubed deviation from m (the mean by default), divided by the cube of the sample standard deviation.
# File 'lib/statsample/vector.rb', line 1027
def skew(m=nil)
  check_type :numeric
  m||=mean
  th=@numeric_data.inject(0){|a,x| a+((x-m)**3)}
  th.quo((@numeric_data.size)*sd(m)**3)
end
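A standalone sketch of the same formula, skew = sum((x - m)^3) / (n * sd^3) with sd the sample standard deviation (denominator n - 1). It works on a plain Array rather than a Vector and assumes there are no missing values.

```ruby
# Sketch of the skewness formula used by Vector#skew, on a plain Array.
def skew(xs)
  n = xs.size
  m = xs.sum.to_f / n
  sd = Math.sqrt(xs.sum { |x| (x - m)**2 } / (n - 1)) # sample SD
  xs.sum { |x| (x - m)**3 } / (n * sd**3)
end

p skew([1, 2, 3, 4, 10]) # positive: the long tail is on the right
p skew([1, 2, 3])        # symmetric data gives 0.0
```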
#split_by_separator(sep = Statsample::SPLIT_TOKEN) ⇒ Object
Returns a hash of Vectors, one for each distinct value found after splitting every field with the separator. Each Vector (of type :object) holds 1 where the field contains that value, 0 where it doesn't, and nil where the field is missing. Example:
a=Vector.new(["a,b","c,d","a,b"])
a.split_by_separator
=> a hash with keys "a", "b", "c" and "d", whose Vectors hold:
"a" => [1, 0, 1]
"b" => [1, 0, 1]
"c" => [0, 1, 0]
"d" => [0, 1, 0]
# File 'lib/statsample/vector.rb', line 520
def split_by_separator(sep=Statsample::SPLIT_TOKEN)
  split_data=splitted(sep)
  factors=split_data.flatten.uniq.compact
  out=factors.inject({}) {|a,x|
    a[x]=[]
    a
  }
  split_data.each do |r|
    if r.nil?
      factors.each do |f|
        out[f].push(nil)
      end
    else
      factors.each do |f|
        out[f].push(r.include?(f) ? 1 : 0)
      end
    end
  end
  out.inject({}){|s,v|
    s[v[0]]=Vector.new(v[1],:object)
    s
  }
end
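The dummy coding above can be sketched on plain Arrays. This sketch assumes "," as the separator (the examples on this page imply that Statsample::SPLIT_TOKEN is ",") and assumes every non-nil field is a String; the helper name is hypothetical. It also computes the per-value counts that split_by_separator_freq reports.

```ruby
# Hypothetical standalone helper mirroring Vector#split_by_separator.
def split_by_separator(values, sep = ",")
  split = values.map { |x| x.nil? ? nil : x.split(sep) }
  factors = split.flatten.uniq.compact
  factors.each_with_object({}) do |f, out|
    # 1 if the field contains f, 0 if it doesn't, nil if the field is missing
    out[f] = split.map { |r| r.nil? ? nil : (r.include?(f) ? 1 : 0) }
  end
end

dummies = split_by_separator(["a,b", "c,d", "a,b", nil])
# Count of fields containing each value, ignoring missing fields
freq = dummies.transform_values { |col| col.compact.sum }
p dummies
p freq
```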
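#split_by_separator_freq(sep = Statsample::SPLIT_TOKEN) ⇒ Object
Returns a hash with the number of fields that contain each value found by split_by_separator. Missing fields are not counted.
# File 'lib/statsample/vector.rb', line 543
def split_by_separator_freq(sep=Statsample::SPLIT_TOKEN)
  split_by_separator(sep).inject({}) {|a,v|
    # Start from 0 so a leading nil (missing field) is counted as 0
    a[v[0]]=v[1].inject(0) {|s,x| s+x.to_i}
    a
  }
end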
#splitted(sep = Statsample::SPLIT_TOKEN) ⇒ Object
Returns an array with each value split by a separator. nil values stay nil, and values that don't respond to split are wrapped in a one-element array.
a=Vector.new(["a,b","c,d","a,b","d"])
a.splitted
=> [["a","b"],["c","d"],["a","b"],["d"]]
# File 'lib/statsample/vector.rb', line 496
def splitted(sep=Statsample::SPLIT_TOKEN)
  @data.collect{|x|
    if x.nil?
      nil
    elsif (x.respond_to? :split)
      x.split(sep)
    else
      [x]
    end
  }
end
#standard_deviation_population(m = nil) ⇒ Object Also known as: sdp
Population standard deviation (denominator N): the square root of variance_population.
# File 'lib/statsample/vector.rb', line 995
def standard_deviation_population(m=nil)
  check_type :numeric
  Math::sqrt( variance_population(m) )
end
#standard_deviation_sample(m = nil) ⇒ Object Also known as: sds, sd
Sample standard deviation (denominator n-1): the square root of variance_sample.
# File 'lib/statsample/vector.rb', line 1021
def standard_deviation_sample(m=nil)
  check_type :numeric
  m||=mean
  Math::sqrt(variance_sample(m))
end
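The only difference between sdp and sds is the denominator. A plain-Ruby sketch on a small Array (no missing values assumed) makes the contrast concrete; the helper names are illustrative.

```ruby
# Sketch contrasting the denominators of Vector#sdp (N) and Vector#sds (n - 1).
def sd_population(xs)
  m = xs.sum.to_f / xs.size
  Math.sqrt(xs.sum { |x| (x - m)**2 } / xs.size)
end

def sd_sample(xs)
  m = xs.sum.to_f / xs.size
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

xs = [2, 4, 4, 4, 5, 5, 7, 9] # mean 5, sum of squared deviations 32
p sd_population(xs) # => 2.0
p sd_sample(xs)     # square root of 32/7
```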
#standard_error ⇒ Object Also known as: se
Standard error of the mean, calculated as sd/sqrt(n), where sd is the sample standard deviation and n is the number of valid values.
# File 'lib/statsample/vector.rb', line 1081
def standard_error
  standard_deviation_sample.quo(Math.sqrt(valid_data.size))
end
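A sketch of the same calculation on a plain Array, assuming nil marks a missing value so that n counts only valid entries (the helper name is hypothetical).

```ruby
# Sketch of the standard error of the mean: sample SD / sqrt(n valid).
def standard_error(values)
  xs = values.compact # assumption: nil marks a missing value
  m = xs.sum.to_f / xs.size
  sd = Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
  sd / Math.sqrt(xs.size)
end

p standard_error([2, 4, 4, 4, 5, 5, 7, 9, nil]) # sqrt(32/7) / sqrt(8)
```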
#sum ⇒ Object
Sum of the numeric values of the vector.
# File 'lib/statsample/vector.rb', line 961
def sum
  check_type :numeric
  @numeric_data.inject(0){|a,x|x+a}
end
#sum_of_squared_deviation ⇒ Object
Sum of squared deviations from the mean, computed with the shortcut formula sum(x^2) - (sum(x))^2 / n_valid.
# File 'lib/statsample/vector.rb', line 980
def sum_of_squared_deviation
  check_type :numeric
  @numeric_data.inject(0) {|a,x| x.square+a} - (sum.square.quo(n_valid))
end
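The shortcut formula gives the same result as summing squared deviations from the mean directly. This sketch checks that with exact arithmetic: Integer#quo returns a Rational for Integer operands, which is also what the library's quo calls do on integer data.

```ruby
xs = [2, 4, 4, 4, 5, 5, 7, 9]
n = xs.size
# Shortcut formula: sum(x^2) - (sum x)^2 / n
shortcut = xs.sum { |x| x * x } - (xs.sum**2).quo(n)
# Definitional formula: sum((x - mean)^2)
mean = xs.sum.quo(n)
definitional = xs.sum { |x| (x - mean)**2 }
p shortcut     # => (32/1)
p definitional # => (32/1)
```

With floating-point data the shortcut can lose precision when the mean is large relative to the spread, so the definitional form is numerically safer.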
#sum_of_squares(m = nil) ⇒ Object Also known as: ss
Sum of squares of the data around a value m. By default, m is the mean:
ss = sum((x_i - m)^2)
# File 'lib/statsample/vector.rb', line 974
def sum_of_squares(m=nil)
  check_type :numeric
  m||=mean
  @numeric_data.inject(0){|a,x| a+(x-m).square}
end
#to_a ⇒ Object Also known as: to_ary
Returns a copy of the data as an Array.
# File 'lib/statsample/vector.rb', line 423
def to_a
  if @data.is_a? Array
    @data.dup
  else
    @data.to_a
  end
end
#to_matrix(dir = :horizontal) ⇒ Object
Ugly name: it actually creates a Matrix from Ruby's standard 'matrix' library, with a single row when dir is :horizontal (the default) or a single column when dir is :vertical. Any other dir returns nil.
# File 'lib/statsample/vector.rb', line 741
def to_matrix(dir=:horizontal)
  case dir
  when :horizontal
    Matrix[@data]
  when :vertical
    Matrix.columns([@data])
  end
end
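The two shapes come straight from the standard Matrix constructors that to_matrix calls. This sketch uses a plain Array in place of @data; on Ruby 3.1 and later, `matrix` ships as a bundled gem rather than a default library.

```ruby
require "matrix"

data = [1, 2, 3]
# dir = :horizontal uses Matrix[@data]: one row, n columns
row = Matrix[data]
# dir = :vertical uses Matrix.columns([@data]): n rows, one column
col = Matrix.columns([data])

p [row.row_count, row.column_count] # => [1, 3]
p [col.row_count, col.column_count] # => [3, 1]
```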
#to_REXP ⇒ Object
Wraps data_with_nils (the data with missing values replaced by nil) as an Rserve REXP.
# File 'lib/statsample/rserve_extension.rb', line 6
def to_REXP
  Rserve::REXP::Wrapper.wrap(data_with_nils)
end
#to_s ⇒ Object
Returns a string with the type, the size and the data, printing nil values as "nil".
# File 'lib/statsample/vector.rb', line 736
def to_s
  sprintf("Vector(type:%s, n:%d)[%s]",@type.to_s,@data.size, @data.collect{|d| d.nil? ? "nil":d}.join(","))
end
#variance_population(m = nil) ⇒ Object
Population variance (denominator N). By default m is the mean.
# File 'lib/statsample/vector.rb', line 986
def variance_population(m=nil)
  check_type :numeric
  m||=mean
  # Sum of squared deviations around m, divided by N (n_valid).
  # The shortcut sum(x^2)/N - m^2 is only correct when m is the mean.
  sum_of_squares(m).quo(n_valid)
end
#variance_proportion(n_poblation, v = 1) ⇒ Object
Variance of the proportion p of value v (default 1), estimated from the valid data, given a population of size n_poblation.
# File 'lib/statsample/vector.rb', line 833
def variance_proportion(n_poblation, v=1)
  Statsample::proportion_variance_sample(self.proportion(v), @valid_data.size, n_poblation)
end
#variance_sample(m = nil) ⇒ Object Also known as: variance
Sample variance (denominator n-1).
# File 'lib/statsample/vector.rb', line 1014
def variance_sample(m=nil)
  check_type :numeric
  m||=mean
  sum_of_squares(m).quo(n_valid - 1)
end
#variance_total(n_poblation, v = 1) ⇒ Object
Variance of the estimated total for value v (default 1), based on the proportion p in the valid data, given a population of size n_poblation.
# File 'lib/statsample/vector.rb', line 837
def variance_total(n_poblation, v=1)
  Statsample::total_variance_sample(self.proportion(v), @valid_data.size, n_poblation)
end
#vector_centered ⇒ Object Also known as: centered
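Returns a centered vector: the mean is subtracted from every value, and missing values stay nil. The new vector is named "<name>(centered)". If the mean is nil, returns a vector of the same size full of nils.
# File 'lib/statsample/vector.rb', line 210
def vector_centered
  check_type :numeric
  m=mean
  return ([nil]*size).to_numeric if mean.nil?
  vector=vector_centered_compute(m)
  vector.name=_("%s(centered)") % @name
  vector
end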
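A plain-Ruby sketch of centering, assuming nil marks a missing value: the mean of the valid values is subtracted from each value, and nils pass through. The helper name is hypothetical.

```ruby
# Sketch of centering as in Vector#vector_centered, on a plain Array.
def centered(values)
  valid = values.compact
  return Array.new(values.size) if valid.empty? # no mean: all nils
  m = valid.sum.to_f / valid.size
  values.map { |x| x.nil? ? nil : x.to_f - m }
end

p centered([1, 2, nil, 3]) # => [-1.0, 0.0, nil, 1.0]
```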
#vector_centered_compute(m) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 206
def vector_centered_compute(m) #:nodoc:
  @data_with_nils.collect {|x| x.nil? ? nil : x.to_f-m }.to_numeric
end
#vector_labeled ⇒ Object
Returns a Vector of the same type in which every value that has a label is replaced by its label.
# File 'lib/statsample/vector.rb', line 377
def vector_labeled
  d=@data.collect{|x|
    if @labels.has_key? x
      @labels[x]
    else
      x
    end
  }
  Vector.new(d,@type)
end
#vector_percentil ⇒ Object
Returns a vector in which each value is replaced by its percentile: its rank divided by the number of valid values, times 100. Missing values stay nil.
# File 'lib/statsample/vector.rb', line 223
def vector_percentil
  check_type :numeric
  c=@valid_data.size
  vector=ranked.map {|i| i.nil? ? nil : (i.quo(c)*100).to_f }.to_vector(@type)
  vector.name=_("%s(percentil)") % @name
  vector
end
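A sketch of the percentile computation on a plain Array. The ranked method is not shown on this page, so this sketch assumes that tied values receive the average of their ranks; nil marks a missing value. Both helper names are hypothetical.

```ruby
# Assumption: ties get the average of their 1-based ranks.
def ranks(values)
  sorted = values.compact.sort
  positions = {}
  sorted.each_with_index { |x, i| (positions[x] ||= []) << i + 1 }
  avg = positions.transform_values { |rs| rs.sum.to_f / rs.size }
  values.map { |x| x.nil? ? nil : avg[x] }
end

# Percentile = rank / number of valid values * 100, as in Vector#vector_percentil.
def percentiles(values)
  n = values.compact.size
  ranks(values).map { |r| r.nil? ? nil : r / n * 100 }
end

p percentiles([10, 20, 20, nil, 40]) # => [25.0, 62.5, 62.5, nil, 100.0]
```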
#vector_standarized(use_population = false) ⇒ Object Also known as: standarized
Returns a vector of standardized values (z-scores), (x - mean) / sd, where sd uses denominator n-1 by default, or denominator N when use_population is true. If the mean is nil or the sd is 0, returns a vector of the same size full of nils. The new vector is named "<name>(standarized)".
# File 'lib/statsample/vector.rb', line 197
def vector_standarized(use_population=false)
  check_type :numeric
  m=mean
  sd=use_population ? sdp : sds
  return ([nil]*size).to_numeric if mean.nil? or sd==0.0
  vector=vector_standarized_compute(m,sd)
  vector.name=_("%s(standarized)") % @name
  vector
end
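A plain-Ruby sketch of the same standardization, with the same fall-back to all nils when the spread is zero. It assumes nil marks a missing value; the helper name and the `use_population:` keyword are illustrative.

```ruby
# Sketch of z-scores as in Vector#vector_standarized, on a plain Array.
def standardized(values, use_population: false)
  xs = values.compact
  denom = use_population ? xs.size : xs.size - 1
  # No valid data, or too little for the chosen denominator: all nils
  return Array.new(values.size) if xs.empty? || denom <= 0
  m = xs.sum.to_f / xs.size
  sd = Math.sqrt(xs.sum { |x| (x - m)**2 } / denom)
  return Array.new(values.size) if sd == 0.0 # zero variance: all nils
  values.map { |x| x.nil? ? nil : (x - m) / sd }
end

p standardized([2, 4, 4, 4, 5, 5, 7, 9], use_population: true)
# => [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```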
#vector_standarized_compute(m, sd) ⇒ Object
:nodoc:
# File 'lib/statsample/vector.rb', line 190
def vector_standarized_compute(m,sd) # :nodoc:
  @data_with_nils.collect{|x| x.nil? ? nil : (x.to_f - m).quo(sd) }.to_vector(:numeric)
end
#verify ⇒ Object
Reports all values that don't satisfy the condition given in the block. Returns a hash mapping the index of each failing value to the value itself.
# File 'lib/statsample/vector.rb', line 456
def verify
  h={}
  (0...@data.size).to_a.each{|i|
    if !(yield @data[i])
      h[i]=@data[i]
    end
  }
  h
end
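The same check on a plain Array: every element for which the block returns false (or nil) is recorded under its index. The helper name is hypothetical.

```ruby
# Sketch of Vector#verify: collect index => value for elements failing the block.
def verify(data)
  data.each_with_index.with_object({}) do |(x, i), h|
    h[i] = x unless yield(x)
  end
end

bad = verify([1, -2, 3, nil]) { |x| !x.nil? && x > 0 }
p bad # index 1 maps to -2 and index 3 maps to nil
```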