Module: Statsample::Bivariate
- Defined in:
- lib/statsample/bivariate.rb,
lib/statsample/bivariate/polychoric.rb,
lib/statsample/bivariate/tetrachoric.rb
Overview
Diverse bivariate methods, including #covariance, #pearson correlation ®, #spearman ranked correlation (rho), #tetrachoric correlation and #polychoric correlation.
Defined Under Namespace
Classes: Polychoric, Tetrachoric
Class Method Summary collapse
-
.correlation_matrix(ds) ⇒ Object
Correlation matrix.
-
.correlation_probability_matrix(ds, tails = :both) ⇒ Object
Matrix of correlation probabilities.
-
.covariance(v1, v2) ⇒ Object
Covariance between two vectors.
-
.covariance_matrix(ds) ⇒ Object
Covariance matrix.
-
.covariance_slow(v1, v2) ⇒ Object
:nodoc:.
-
.gamma(matrix) ⇒ Object
Calculates Goodman and Kruskal’s gamma.
-
.maximum_likehood_dichotomic(pred, real) ⇒ Object
Estimate the ML between two dichotomic vectors.
-
.min_n_valid(ds) ⇒ Object
Report the minimum number of cases valid of a covariate matrix based on a dataset.
-
.n_valid_matrix(ds) ⇒ Object
Retrieves the n valid pairwise.
- .ordered_pairs(vector) ⇒ Object
-
.pairs(matrix) ⇒ Object
Calculate indexes for a matrix the rows and cols has to be ordered.
-
.partial_correlation(v1, v2, control) ⇒ Object
Correlation between v1 and v2, controling the effect of control on both.
-
.pearson(v1, v2) ⇒ Object
Calculate Pearson correlation coefficient ® between 2 vectors.
-
.pearson_slow(v1, v2) ⇒ Object
:nodoc:.
-
.point_biserial(dichotomous, continous) ⇒ Object
Calculate Point biserial correlation.
-
.polychoric(v1, v2) ⇒ Object
Calculate Polychoric correlation for two vectors.
-
.polychoric_correlation_matrix(ds) ⇒ Object
Polychoric correlation matrix.
-
.prop_pearson(t, size, tails = :both) ⇒ Object
Retrieves the probability value (a la SPSS) for a given t, size and number of tails.
-
.residuals(from, del) ⇒ Object
Returns residual score after delete variance from another variable.
-
.spearman(v1, v2) ⇒ Object
Spearman ranked correlation coefficient (rho) between 2 vectors.
- .sum_of_squares(v1, v2) ⇒ Object
-
.t_pearson(v1, v2) ⇒ Object
Retrieves the value for t test for a pearson correlation between two vectors to test the null hipothesis of r=0.
-
.t_r(r, size) ⇒ Object
Retrieves the value for t test for a pearson correlation giving r and vector size Source : faculty.chass.ncsu.edu/garson/PA765/correl.htm.
-
.tau_a(v1, v2) ⇒ Object
Kendall Rank Correlation Coefficient.
-
.tau_b(matrix) ⇒ Object
Calculates Tau b correlation.
-
.tetrachoric(v1, v2) ⇒ Object
Calculate Tetrachoric correlation for two vectors.
-
.tetrachoric_correlation_matrix(ds) ⇒ Object
Tetrachoric correlation matrix.
Class Method Details
.correlation_matrix(ds) ⇒ Object
Correlation matrix. Order of rows and columns depends on Dataset#fields order
149 150 151 152 153 154 155 156 157 158 159 160 161 162 |
# File 'lib/statsample/bivariate.rb', line 149 def correlation_matrix(ds) cm=ds.collect_matrix do |row,col| if row==col 1.0 elsif (ds[row].type!=:scale or ds[col].type!=:scale) nil else pearson(ds[row],ds[col]) end end cm.extend(Statsample::CovariateMatrix) cm.fields=ds.fields cm end |
.correlation_probability_matrix(ds, tails = :both) ⇒ Object
Matrix of correlation probabilities. Order of rows and columns depends on Dataset#fields order
179 180 181 182 183 184 185 186 187 |
# File 'lib/statsample/bivariate.rb', line 179 def correlation_probability_matrix(ds, tails=:both) rows=ds.fields.collect do |row| ds.fields.collect do |col| v1a,v2a=Statsample.only_valid(ds[row],ds[col]) (row==col or ds[row].type!=:scale or ds[col].type!=:scale) ? nil : prop_pearson(t_pearson(ds[row],ds[col]), v1a.size, tails) end end Matrix.rows(rows) end |
.covariance(v1, v2) ⇒ Object
Covariance between two vectors
8 9 10 11 12 13 14 15 16 |
# File 'lib/statsample/bivariate.rb', line 8 def covariance(v1,v2) v1a,v2a=Statsample.only_valid(v1,v2) return nil if v1a.size==0 if Statsample.has_gsl? GSL::Stats::covariance(v1a.gsl, v2a.gsl) else covariance_slow(v1a,v2a) end end |
.covariance_matrix(ds) ⇒ Object
Covariance matrix. Order of rows and columns depends on Dataset#fields order
131 132 133 134 135 136 137 138 139 140 141 142 143 144 |
# File 'lib/statsample/bivariate.rb', line 131 def covariance_matrix(ds) matrix=ds.collect_matrix do |row,col| if (ds[row].type!=:scale or ds[col].type!=:scale) nil elsif row==col ds[row].variance else covariance(ds[row], ds[col]) end end matrix.extend CovariateMatrix matrix.fields=ds.fields matrix end |
.covariance_slow(v1, v2) ⇒ Object
:nodoc:
27 28 29 30 |
# File 'lib/statsample/bivariate.rb', line 27 def covariance_slow(v1,v2) # :nodoc: v1a,v2a=Statsample.only_valid(v1,v2) sum_of_squares(v1a,v2a) / (v1a.size-1) end |
.gamma(matrix) ⇒ Object
Calculates Goodman and Kruskal’s gamma.
Gamma is the surplus of concordant pairs over discordant pairs, as a percentage of all pairs ignoring ties.
Source: faculty.chass.ncsu.edu/garson/PA765/assocordinal.htm
234 235 236 237 |
# File 'lib/statsample/bivariate.rb', line 234 def gamma(matrix) v=pairs(matrix) (v['P']-v['Q']).to_f / (v['P']+v['Q']).to_f end |
.maximum_likehood_dichotomic(pred, real) ⇒ Object
Estimate the ML between two dichotomic vectors
18 19 20 21 22 23 24 25 |
# File 'lib/statsample/bivariate.rb', line 18 def maximum_likehood_dichotomic(pred,real) preda,reala=Statsample.only_valid(pred,real) sum=0 pred.each_index{|i| sum+=(real[i]*Math::log(pred[i])) + ((1-real[i])*Math::log(1-pred[i])) } sum end |
.min_n_valid(ds) ⇒ Object
Report the minimum number of cases valid of a covariate matrix based on a dataset
302 303 304 305 306 307 308 309 310 311 |
# File 'lib/statsample/bivariate.rb', line 302 def min_n_valid(ds) min=ds.cases m=n_valid_matrix(ds) for x in 0...m.row_size for y in 0...m.column_size min=m[x,y] if m[x,y] < min end end min end |
.n_valid_matrix(ds) ⇒ Object
Retrieves the n valid pairwise.
165 166 167 168 169 170 171 172 173 174 |
# File 'lib/statsample/bivariate.rb', line 165 def n_valid_matrix(ds) ds.collect_matrix do |row,col| if row==col ds[row].valid_data.size else rowa,rowb=Statsample.only_valid(ds[row],ds[col]) rowa.size end end end |
.ordered_pairs(vector) ⇒ Object
280 281 282 283 284 285 286 287 288 289 |
# File 'lib/statsample/bivariate.rb', line 280 def ordered_pairs(vector) d=vector.data a=[] (0...(d.size-1)).each{|i| ((i+1)...(d.size)).each {|j| a.push([d[i],d[j]]) } } a end |
.pairs(matrix) ⇒ Object
Calculate indexes for a matrix the rows and cols has to be ordered
239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 |
# File 'lib/statsample/bivariate.rb', line 239 def pairs(matrix) # calculate concordant #p matrix rs=matrix.row_size cs=matrix.column_size conc=disc=ties_x=ties_y=0 (0...(rs-1)).each do |x| (0...(cs-1)).each do |y| ((x+1)...rs).each do |x2| ((y+1)...cs).each do |y2| # #p sprintf("%d:%d,%d:%d",x,y,x2,y2) conc+=matrix[x,y]*matrix[x2,y2] end end end end (0...(rs-1)).each {|x| (1...(cs)).each{|y| ((x+1)...rs).each{|x2| (0...y).each{|y2| # #p sprintf("%d:%d,%d:%d",x,y,x2,y2) disc+=matrix[x,y]*matrix[x2,y2] } } } } (0...(rs-1)).each {|x| (0...(cs)).each{|y| ((x+1)...(rs)).each{|x2| ties_x+=matrix[x,y]*matrix[x2,y] } } } (0...rs).each {|x| (0...(cs-1)).each{|y| ((y+1)...(cs)).each{|y2| ties_y+=matrix[x,y]*matrix[x,y2] } } } {'P'=>conc,'Q'=>disc,'Y'=>ties_y,'X'=>ties_x} end |
.partial_correlation(v1, v2, control) ⇒ Object
Correlation between v1 and v2, controling the effect of control on both.
119 120 121 122 123 124 125 126 |
# File 'lib/statsample/bivariate.rb', line 119 def partial_correlation(v1,v2,control) v1a,v2a,cona=Statsample.only_valid(v1,v2,control) rv1v2=pearson(v1a,v2a) rv1con=pearson(v1a,cona) rv2con=pearson(v2a,cona) (rv1v2-(rv1con*rv2con)).quo(Math::sqrt(1-rv1con**2) * Math::sqrt(1-rv2con**2)) end |
.pearson(v1, v2) ⇒ Object
Calculate Pearson correlation coefficient ® between 2 vectors
38 39 40 41 42 43 44 45 46 |
# File 'lib/statsample/bivariate.rb', line 38 def pearson(v1,v2) v1a,v2a=Statsample.only_valid(v1,v2) return nil if v1a.size ==0 if Statsample.has_gsl? GSL::Stats::correlation(v1a.gsl, v2a.gsl) else pearson_slow(v1a,v2a) end end |
.pearson_slow(v1, v2) ⇒ Object
:nodoc:
47 48 49 50 51 52 53 54 55 56 57 58 59 |
# File 'lib/statsample/bivariate.rb', line 47 def pearson_slow(v1,v2) # :nodoc: v1a,v2a=Statsample.only_valid(v1,v2) # Calculate sum of squares ss=sum_of_squares(v1a,v2a) ss.quo(Math::sqrt(v1a.sum_of_squares) * Math::sqrt(v2a.sum_of_squares)) =begin v1s,v2s=v1a.vector_standarized,v2a.vector_standarized t=0 siz=v1s.size (0...v1s.size).each {|i| t+=(v1s[i]*v2s[i]) } t.quo(v2s.size-1) =end end |
.point_biserial(dichotomous, continous) ⇒ Object
Calculate Point biserial correlation. Equal to Pearson correlation, with one dichotomous value replaced by “0” and the other by “1”
197 198 199 200 201 202 203 204 205 |
# File 'lib/statsample/bivariate.rb', line 197 def point_biserial(dichotomous,continous) ds={'d'=>dichotomous,'c'=>continous}.to_dataset.dup_only_valid raise(TypeError, "First vector should be dichotomous") if ds['d'].factors.size!=2 raise(TypeError, "Second vector should be continous") if ds['c'].type!=:scale f0=ds['d'].factors.sort[0] m0=ds.filter_field('c') {|c| c['d']==f0} m1=ds.filter_field('c') {|c| c['d']!=f0} ((m1.mean-m0.mean).to_f / ds['c'].sdp) * Math::sqrt(m0.size*m1.size.to_f / ds.cases**2) end |
.polychoric(v1, v2) ⇒ Object
Calculate Polychoric correlation for two vectors.
5 6 7 8 |
# File 'lib/statsample/bivariate/polychoric.rb', line 5 def self.polychoric(v1,v2) pc=Polychoric.new_with_vectors(v1,v2) pc.r end |
.polychoric_correlation_matrix(ds) ⇒ Object
Polychoric correlation matrix. Order of rows and columns depends on Dataset#fields order
12 13 14 15 16 17 18 19 20 21 22 23 24 |
# File 'lib/statsample/bivariate/polychoric.rb', line 12 def self.polychoric_correlation_matrix(ds) ds.collect_matrix do |row,col| if row==col 1.0 else begin polychoric(ds[row],ds[col]) rescue RuntimeError nil end end end end |
.prop_pearson(t, size, tails = :both) ⇒ Object
Retrieves the probability value (a la SPSS) for a given t, size and number of tails. Uses a second parameter
-
:both or 2 : for r!=0 (default)
-
:right, :positive or 1 : for r > 0
-
:left, :negative : for r < 0
84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 |
# File 'lib/statsample/bivariate.rb', line 84 def prop_pearson(t, size, tails=:both) tails=:both if tails==2 tails=:right if tails==1 or tails==:positive tails=:left if tails==:negative n_tails=case tails when :both then 2 else 1 end t=-t if t>0 and (tails==:both) cdf=Distribution::T.cdf(t, size-2) if(tails==:right) 1.0-(cdf*n_tails) else cdf*n_tails end end |
.residuals(from, del) ⇒ Object
Returns residual score after delete variance from another variable
104 105 106 107 108 109 110 111 112 113 114 115 116 |
# File 'lib/statsample/bivariate.rb', line 104 def residuals(from,del) r=Statsample::Bivariate.pearson(from,del) froms, dels = from.vector_standarized, del.vector_standarized nv=[] froms.data_with_nils.each_index do |i| if froms[i].nil? or dels[i].nil? nv.push(nil) else nv.push(froms[i]-r*dels[i]) end end nv.to_vector(:scale) end |
.spearman(v1, v2) ⇒ Object
Spearman ranked correlation coefficient (rho) between 2 vectors
190 191 192 193 194 |
# File 'lib/statsample/bivariate.rb', line 190 def spearman(v1,v2) v1a,v2a=Statsample.only_valid(v1,v2) v1r,v2r=v1a.ranked(:scale),v2a.ranked(:scale) pearson(v1r,v2r) end |
.sum_of_squares(v1, v2) ⇒ Object
31 32 33 34 35 36 |
# File 'lib/statsample/bivariate.rb', line 31 def sum_of_squares(v1,v2) v1a,v2a=Statsample.only_valid(v1,v2) m1=v1a.mean m2=v2a.mean (v1a.size).times.inject(0) {|ac,i| ac+(v1a[i]-m1)*(v2a[i]-m2)} end |
.t_pearson(v1, v2) ⇒ Object
Retrieves the value for t test for a pearson correlation between two vectors to test the null hipothesis of r=0
62 63 64 65 66 67 68 69 70 |
# File 'lib/statsample/bivariate.rb', line 62 def t_pearson(v1,v2) v1a,v2a=Statsample.only_valid(v1,v2) r=pearson(v1a,v2a) if(r==1.0) 0 else t_r(r,v1a.size) end end |
.t_r(r, size) ⇒ Object
Retrieves the value for t test for a pearson correlation giving r and vector size Source : faculty.chass.ncsu.edu/garson/PA765/correl.htm
74 75 76 |
# File 'lib/statsample/bivariate.rb', line 74 def t_r(r,size) r * Math::sqrt(((size)-2).to_f / (1 - r**2)) end |
.tau_a(v1, v2) ⇒ Object
Kendall Rank Correlation Coefficient. Based on Hervé Adbi article
208 209 210 211 212 213 214 215 216 |
# File 'lib/statsample/bivariate.rb', line 208 def tau_a(v1,v2) v1a,v2a=Statsample.only_valid(v1,v2) n=v1.size v1r,v2r=v1a.ranked(:scale),v2a.ranked(:scale) o1=ordered_pairs(v1r) o2=ordered_pairs(v2r) delta= o1.size*2-(o2 & o1).size*2 1-(delta * 2 / (n*(n-1)).to_f) end |
.tau_b(matrix) ⇒ Object
Calculates Tau b correlation.
Tau-b defines perfect association as strict monotonicity. Although it requires strict monotonicity to reach 1.0, it does not penalize ties as much as some other measures.
Source: faculty.chass.ncsu.edu/garson/PA765/assocordinal.htm
224 225 226 227 |
# File 'lib/statsample/bivariate.rb', line 224 def tau_b(matrix) v=pairs(matrix) ((v['P']-v['Q']).to_f / Math::sqrt((v['P']+v['Q']+v['Y'])*(v['P']+v['Q']+v['X'])).to_f) end |
.tetrachoric(v1, v2) ⇒ Object
Calculate Tetrachoric correlation for two vectors.
4 5 6 7 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 4 def self.tetrachoric(v1,v2) tc=Tetrachoric.new_with_vectors(v1,v2) tc.r end |
.tetrachoric_correlation_matrix(ds) ⇒ Object
Tetrachoric correlation matrix. Order of rows and columns depends on Dataset#fields order
11 12 13 14 15 16 17 18 19 20 21 22 23 |
# File 'lib/statsample/bivariate/tetrachoric.rb', line 11 def self.tetrachoric_correlation_matrix(ds) ds.collect_matrix do |row,col| if row==col 1.0 else begin tetrachoric(ds[row],ds[col]) rescue RuntimeError nil end end end end |