Class: Statsample::Test::UMannWhitney
- Inherits:
-
Object
- Object
- Statsample::Test::UMannWhitney
- Defined in:
- lib/statsample/test/umannwhitney.rb
Overview
U Mann-Whitney test
Non-parametric test for assessing whether two independent samples of observations come from the same distribution.
Assumptions
- The two samples under investigation in the test are independent of each other and the observations within each sample are independent.
- The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).
- The variances in the two groups are approximately equal.
Larger differences between the distributions correspond to lower values of U.
Constant Summary
- MAX_MN_EXACT = 10000
Max for m*n allowed for exact calculation of probability
Instance Attribute Summary
- #r1 ⇒ Object (readonly)
Sample 1 Rank sum.
- #r2 ⇒ Object (readonly)
Sample 2 Rank sum.
- #t ⇒ Object (readonly)
Value of compensation for ties (useful for demonstration).
- #u ⇒ Object (readonly)
U Value.
- #u1 ⇒ Object (readonly)
Sample 1 U (useful for demonstration).
- #u2 ⇒ Object (readonly)
Sample 2 U (useful for demonstration).
Class Method Summary
- .distribution_permutations(n1, n2) ⇒ Object
Generate distribution for permutations.
- .u_sampling_distribution_as62(n1, n2) ⇒ Object
U sampling distribution, based on Dinneen & Blakesley (1973) algorithm.
Instance Method Summary
- #exact_probability ⇒ Object
Exact probability of finding values of U lower than or equal to the sample's U on the U distribution.
- #initialize(v1, v2) ⇒ UMannWhitney (constructor)
Create a new U Mann-Whitney test. Params: two Statsample::Vectors.
- #report_building(generator) ⇒ Object
:nodoc:
- #summary ⇒ Object
Report results.
- #z ⇒ Object
Z value for U, with adjustment for ties.
- #z_probability ⇒ Object
Assuming H_0, the proportion of the cdf with values of U lower than the sample.
Constructor Details
#initialize(v1, v2) ⇒ UMannWhitney
Create a new U Mann-Whitney test. Params: two Statsample::Vectors.
# File 'lib/statsample/test/umannwhitney.rb', line 114
def initialize(v1,v2)
  @n1=v1.valid_data.size
  @n2=v2.valid_data.size
  data=(v1.valid_data+v2.valid_data).to_scale
  groups=(([0]*@n1)+([1]*@n2)).to_vector
  ds={'g'=>groups, 'data'=>data}.to_dataset
  @t=nil
  @ties=data.data.size!=data.data.uniq.size
  if(@ties)
    adjust_for_ties(ds['data'])
  end
  ds['ranked']=ds['data'].ranked(:scale)
  @n=ds.cases
  @r1=ds.filter{|r| r['g']==0}['ranked'].sum
  @r2=((ds.cases*(ds.cases+1)).quo(2))-r1
  @u1=r1-((@n1*(@n1+1)).quo(2))
  @u2=r2-((@n2*(@n2+1)).quo(2))
  @u=(u1<u2) ? u1 : u2
end
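The constructor's arithmetic can be followed without the Statsample data structures. The sketch below is a plain-Ruby illustration, not the library's code: `average_ranks` and `mann_whitney_u` are hypothetical helper names, and ties are assumed to receive the mean of the positions they occupy.

```ruby
# Hypothetical helper: 1-based ranks of the pooled values.
# Tied values share the mean of the positions they occupy (an assumption
# about how the library's ranked(:scale) treats ties).
def average_ranks(values)
  sorted = values.each_with_index.sort_by { |v, _| v }
  ranks = Array.new(values.size)
  i = 0
  while i < sorted.size
    j = i
    j += 1 while j + 1 < sorted.size && sorted[j + 1][0] == sorted[i][0]
    avg = (i + 1 + j + 1).quo(2) # mean of positions i+1 .. j+1
    (i..j).each { |k| ranks[sorted[k][1]] = avg }
    i = j + 1
  end
  ranks
end

# Hypothetical helper: rank sums and U statistics for two plain arrays,
# using the same formulas as the constructor above.
def mann_whitney_u(sample1, sample2)
  n1 = sample1.size
  n2 = sample2.size
  n = n1 + n2
  ranks = average_ranks(sample1 + sample2)
  r1 = ranks.first(n1).sum
  r2 = (n * (n + 1)).quo(2) - r1
  u1 = r1 - (n1 * (n1 + 1)).quo(2)
  u2 = r2 - (n2 * (n2 + 1)).quo(2)
  { r1: r1, r2: r2, u1: u1, u2: u2, u: [u1, u2].min }
end

separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
tied = mann_whitney_u([1, 2, 2], [2, 3])
```

In both calls `u1 + u2` equals `n1 * n2`, which is why the constructor can derive the second rank sum from the first.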
Instance Attribute Details
#r1 ⇒ Object (readonly)
Sample 1 Rank sum
# File 'lib/statsample/test/umannwhitney.rb', line 99
def r1
  @r1
end
#r2 ⇒ Object (readonly)
Sample 2 Rank sum
# File 'lib/statsample/test/umannwhitney.rb', line 101
def r2
  @r2
end
#t ⇒ Object (readonly)
Value of compensation for ties (useful for demonstration)
# File 'lib/statsample/test/umannwhitney.rb', line 109
def t
  @t
end
#u ⇒ Object (readonly)
U Value
# File 'lib/statsample/test/umannwhitney.rb', line 107
def u
  @u
end
#u1 ⇒ Object (readonly)
Sample 1 U (useful for demonstration)
# File 'lib/statsample/test/umannwhitney.rb', line 103
def u1
  @u1
end
#u2 ⇒ Object (readonly)
Sample 2 U (useful for demonstration)
# File 'lib/statsample/test/umannwhitney.rb', line 105
def u2
  @u2
end
Class Method Details
.distribution_permutations(n1, n2) ⇒ Object
Generate distribution for permutations. Very expensive, but useful for demonstrations.
# File 'lib/statsample/test/umannwhitney.rb', line 77
def self.distribution_permutations(n1,n2)
  base=[0]*n1+[1]*n2
  po=Statsample::Permutation.new(base)
  upper=0
  total=n1*n2
  req={}
  po.each do |perm|
    r0,s0=0,0
    perm.each_index {|c_i|
      if perm[c_i]==0
        r0+=c_i+1
        s0+=1
      end
    }
    u1=r0-((s0*(s0+1)).quo(2))
    u2=total-u1
    temp_u= (u1 <= u2) ? u1 : u2
    req[perm]=temp_u
  end
  req
end
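The same brute-force idea can be sketched with Ruby's built-in `Array#combination`, assuming that each distinct arrangement of group labels corresponds to one choice of rank positions for group 1. The helper name `u_frequencies_by_enumeration` is hypothetical, and the result is a tally per U value rather than the per-permutation hash returned above.

```ruby
# Hypothetical helper: count how often each value of min(U1, U2) occurs
# over every way of assigning n1 of the ranks 1..n1+n2 to group 1.
# Only practical for small samples: the number of combinations grows quickly.
def u_frequencies_by_enumeration(n1, n2)
  total = n1 * n2
  counts = Hash.new(0)
  (1..(n1 + n2)).to_a.combination(n1) do |ranks_group1|
    u1 = ranks_group1.sum - n1 * (n1 + 1) / 2 # rank sum minus its minimum
    u2 = total - u1
    counts[[u1, u2].min] += 1
  end
  counts
end

freqs = u_frequencies_by_enumeration(2, 2)
# With n1 = n2 = 2 there are 6 assignments; min(U1, U2) is 0, 1 or 2, twice each.
```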
.u_sampling_distribution_as62(n1, n2) ⇒ Object
U sampling distribution, based on Dinneen & Blakesley (1973) algorithm. This is the algorithm used in SPSS.
Parameters:
- n1: group 1 size
- n2: group 2 size
Reference:
- Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society, 22(2), 269-273
# File 'lib/statsample/test/umannwhitney.rb', line 30
def self.u_sampling_distribution_as62(n1,n2)
  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end
  # Generate percentages for normal U
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
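For comparison, the null distribution of U can also be built with the textbook counting recurrence N(n1, n2, u) = N(n1-1, n2, u-n2) + N(n1, n2-1, u). The sketch below uses that recurrence, not AS 62 itself, and then folds the counts into probabilities for min(U1, U2) in the same way as the final step of the method above. The names `u_counts` and `min_u_distribution` are hypothetical.

```ruby
# Hypothetical helper: number of rank arrangements giving each U in 0..n1*n2.
# The largest pooled observation belongs either to group 1 (adding n2 to U)
# or to group 2 (adding nothing), which yields the recurrence.
def u_counts(n1, n2, memo = {})
  key = [n1, n2]
  return memo[key] if memo.key?(key)

  counts =
    if n1.zero? || n2.zero?
      [1] # one arrangement, U = 0
    else
      from_group1 = u_counts(n1 - 1, n2, memo)
      from_group2 = u_counts(n1, n2 - 1, memo)
      (0..n1 * n2).map do |u|
        shifted = u >= n2 ? from_group1[u - n2].to_i : 0
        shifted + from_group2[u].to_i # nil.to_i is 0 past the array end
      end
    end
  memo[key] = counts
end

# Hypothetical helper: probabilities of min(U1, U2) = 0, 1, ..., n1*n2/2.
# Each value below the midpoint is reached from both tails, hence the doubling.
def min_u_distribution(n1, n2)
  counts = u_counts(n1, n2)
  max_u = n1 * n2
  total = counts.sum
  (0..max_u / 2).map do |u|
    folded = u == max_u - u ? counts[u] : counts[u] * 2
    folded.quo(total)
  end
end

counts = u_counts(2, 2)
dist = min_u_distribution(2, 2)
```

The recursion is easy to verify by hand on small cases, which makes it a convenient cross-check for the output of a faster generator.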
Instance Method Details
#exact_probability ⇒ Object
Exact probability of finding values of U lower than or equal to the sample's U on the U distribution. Use with caution when m*n > 100000. Uses u_sampling_distribution_as62.
# File 'lib/statsample/test/umannwhitney.rb', line 155
def exact_probability
  dist=UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum=0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
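The accumulation step is a lower-tail sum over the distribution of min(U1, U2). A minimal sketch, assuming the distribution is available as an array indexed by U (the helper name `cumulative_u_probability` is hypothetical, and the example distribution is worked out by hand rather than produced by the library):

```ruby
# Hypothetical helper: P(min(U1, U2) <= u) for a distribution indexed by U.
def cumulative_u_probability(dist, u)
  dist.first(u.to_i + 1).sum
end

# Distribution for n1 = n2 = 2, derived by hand from the 6 possible rank
# assignments: min(U1, U2) takes the values 0, 1 and 2 twice each.
dist = [Rational(1, 3), Rational(1, 3), Rational(1, 3)]
p_value = cumulative_u_probability(dist, 1)
```

Because the distribution is that of the smaller of the two U values, the sum already counts extreme outcomes in both directions.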
#report_building(generator) ⇒ Object
:nodoc:
# File 'lib/statsample/test/umannwhitney.rb', line 150
def report_building(generator) # :nodoc:
  generator.text(summary)
end
#summary ⇒ Object
Report results.
# File 'lib/statsample/test/umannwhitney.rb', line 137
def summary
  out=<<-HEREDOC
Mann-Whitney U
Sum of ranks v1: #{@r1.to_f}
Sum of ranks v2: #{@r2.to_f}
U Value: #{@u.to_f}
Z: #{sprintf("%0.3f",z)} (p: #{sprintf("%0.3f",z_probability)})
  HEREDOC
  if @n1*@n2<MAX_MN_EXACT
    out+="Exact p (Dinneen & Blakesley): #{sprintf("%0.3f",exact_probability)}"
  end
  out
end
#z ⇒ Object
Z value for U, with adjustment for ties. For large samples, U is approximately normally distributed. In that case, you can use z to obtain the probability for U.
Reference:
- SPSS Manual
# File 'lib/statsample/test/umannwhitney.rb', line 180
def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
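The two branches of `z` are one formula: with a tie term of 0, the tie-corrected standard deviation reduces to sqrt(n1*n2*(n1+n2+1)/12). The sketch below illustrates this in plain Ruby. The tie term is assumed to be the usual sum of (c^3 - c)/12 over groups of c tied values, since `adjust_for_ties` is not shown on this page; `tie_term` and `z_for_u` are hypothetical names.

```ruby
# Hypothetical helper: tie compensation for the pooled data, assumed to be
# the standard sum of (c**3 - c) / 12 over every group of c equal values.
def tie_term(pooled)
  group_sizes = pooled.group_by(&:itself).values.map(&:size)
  group_sizes.select { |c| c > 1 }.sum { |c| (c**3 - c).quo(12) }
end

# Hypothetical helper: normal approximation for U with optional tie term t.
def z_for_u(u, n1, n2, t = 0)
  mu = (n1 * n2).quo(2)
  n = n1 + n2
  variance = (n1 * n2).quo(n * (n - 1)) * ((n**3 - n).quo(12) - t)
  (u - mu) / Math.sqrt(variance)
end

z_untied = z_for_u(0, 3, 3)                        # no ties: t defaults to 0
t = tie_term([1, 2, 2, 2, 3])                      # one group of three equal values
z_tied = z_for_u(1, 3, 2, t)
```

A positive tie term shrinks the variance, so for the same U the tie-corrected z is further from zero than the uncorrected one.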
#z_probability ⇒ Object
Assuming H_0, the two-tailed probability, under the normal approximation, of a z at least as extreme as the sample's. Use with more than 30 cases per group.
# File 'lib/statsample/test/umannwhitney.rb', line 195
def z_probability
  (1-Distribution::Normal.cdf(z.abs()))*2
end
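The same two-tailed calculation can be reproduced with the standard library alone. This sketch swaps `Distribution::Normal.cdf` for a CDF built on `Math.erf`; `normal_cdf` and `two_tailed_p` are hypothetical names.

```ruby
# Standard normal CDF expressed through the error function.
def normal_cdf(x)
  0.5 * (1 + Math.erf(x / Math.sqrt(2)))
end

# Hypothetical helper: probability of a z at least as far from 0 as the one given.
def two_tailed_p(z)
  2 * (1 - normal_cdf(z.abs))
end

p_at_zero = two_tailed_p(0)
p_at_196 = two_tailed_p(1.96)
```

Taking `z.abs` first makes the result independent of which sample has the smaller rank sum.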