Class: Statsample::Test::UMannWhitney
- Includes: Summarizable
- Defined in: lib/statsample/test/umannwhitney.rb
Overview
Mann-Whitney U test.
A non-parametric test for assessing whether two independent samples of observations come from the same distribution.
Assumptions
- The two samples under investigation in the test are independent of each other and the observations within each sample are independent.
- The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).
- The variances in the two groups are approximately equal.
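The larger the difference between the two distributions, the lower the value of U. U is 0 when every observation in one sample is smaller than every observation in the other.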
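To make the link between U and the separation of the samples concrete, here is a stdlib-only Ruby sketch that computes U by pair counting instead of rank sums: u1 counts the (x, y) pairs in which the sample-1 value exceeds the sample-2 value, with ties scoring one half, and U is the smaller of u1 and n1*n2 - u1. Pair counting is a standard equivalent definition of U, not the approach this class uses, and the helper name `u_by_pair_counting` is hypothetical.

```ruby
# Hypothetical helper (not part of Statsample): computes Mann-Whitney U
# by counting pairwise "wins" instead of summing ranks.
def u_by_pair_counting(sample1, sample2)
  u1 = 0r # Rational zero keeps the half-credits for ties exact
  sample1.each do |x|
    sample2.each do |y|
      if x > y
        u1 += 1    # the sample-1 value beats the sample-2 value
      elsif x == y
        u1 += 1/2r # a tie counts as half a win
      end
    end
  end
  u2 = sample1.size * sample2.size - u1
  [u1, u2].min # U is the smaller of the two counts
end

separated   = u_by_pair_counting([1, 2, 3], [4, 5, 6])
interleaved = u_by_pair_counting([1, 4, 5], [2, 3, 6])
puts separated.to_f   # prints 0.0: no overlap, so U is at its minimum
puts interleaved.to_f # prints 4.0
```

The fully separated samples give the lowest possible U, while the interleaved ones give a U close to its maximum of n1*n2/2 = 4.5.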
Constant Summary
- MAX_MN_EXACT = 10000
  Upper limit on m*n (the product of the two sample sizes) for calculating the exact probability.
Instance Attribute Summary
- #name ⇒ Object
  Name of the test.
- #r1 ⇒ Object (readonly)
  Rank sum of sample 1.
- #r2 ⇒ Object (readonly)
  Rank sum of sample 2.
- #t ⇒ Object (readonly)
  Tie-correction term used by #z (useful for demonstration).
- #u ⇒ Object (readonly)
  U value: the smaller of u1 and u2.
- #u1 ⇒ Object (readonly)
  U for sample 1 (useful for demonstration).
- #u2 ⇒ Object (readonly)
  U for sample 2 (useful for demonstration).
Class Method Summary
- .distribution_permutations(n1, n2) ⇒ Object
  Generates the distribution of U by enumerating permutations.
- .u_sampling_distribution_as62(n1, n2) ⇒ Object
  U sampling distribution, based on the Dinneen & Blakesley (1973) algorithm.
Instance Method Summary
- #initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney (constructor)
  Creates a new Mann-Whitney U test from two Daru::Vectors.
- #probability_exact ⇒ Object
  Exact probability of obtaining a U lower than or equal to the sample U, from the exact U distribution.
- #probability_z ⇒ Object
  Assuming H_0, the two-tailed probability of a U at least as extreme as the sample U, using the normal approximation.
- #report_building(generator) ⇒ Object
  :nodoc:
- #z ⇒ Object
  Z value for U, adjusted for ties.
Methods included from Summarizable
Constructor Details
#initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney
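Creates a new Mann-Whitney U test. Parameters: two Daru::Vectors, v1 and v2, and an optional opts hash (only :name has a default). Only the valid values of each vector are used.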
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 118
def initialize(v1,v2, opts=Hash.new)
  @v1 = v1
  @v2 = v2
  # Keep only valid values and pool both samples into one vector
  v1_valid = v1.only_valid.reset_index!
  v2_valid = v2.only_valid.reset_index!
  @n1 = v1_valid.size
  @n2 = v2_valid.size
  data = Daru::Vector.new(v1_valid.to_a + v2_valid.to_a)
  # Group labels: 0 for sample 1, 1 for sample 2
  groups = Daru::Vector.new(([0] * @n1) + ([1] * @n2))
  ds = Daru::DataFrame.new({:g => groups, :data => data})
  @t = nil
  # Ties exist when the pooled data contains repeated values
  @ties = data.to_a.size != data.to_a.uniq.size
  if @ties
    adjust_for_ties(ds[:data])
  end
  ds[:ranked] = ds[:data].ranked
  @n = ds.nrows

  # Rank sums, then U for each sample; U is the smaller of the two
  @r1 = ds.filter_rows { |r| r[:g] == 0}[:ranked].sum
  @r2 = ((ds.nrows * (ds.nrows + 1)).quo(2)) - r1
  @u1 = r1 - ((@n1 * (@n1 + 1)).quo(2))
  @u2 = r2 - ((@n2 * (@n2 + 1)).quo(2))
  @u = (u1 < u2) ? u1 : u2
  opts_default = { :name=>_("Mann-Whitney's U") }
  @opts = opts_default.merge(opts)
  opts_default.keys.each {|k|
    send("#{k}=", @opts[k])
  }
end
```
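The constructor's arithmetic can be reproduced without Daru. The sketch below assumes plain Ruby arrays with no missing values and assigns tied values their average rank (the usual midrank convention; that Daru's `ranked` behaves the same way is an assumption, not something this page states). It then derives r1, r2, u1, u2 and u with the same formulas as the constructor. The names `midranks` and `u_from_rank_sums` are hypothetical.

```ruby
# Hypothetical helper: average ("mid") ranks, so tied values share a rank.
def midranks(values)
  order = values.each_with_index.sort_by { |value, _index| value }
  ranks = Array.new(values.size)
  i = 0
  while i < order.size
    j = i
    # extend j over the run of values equal to order[i]
    j += 1 while j + 1 < order.size && order[j + 1][0] == order[i][0]
    shared = Rational(i + j + 2, 2) # mean of the 1-based ranks i+1 .. j+1
    (i..j).each { |k| ranks[order[k][1]] = shared }
    i = j + 1
  end
  ranks
end

# Hypothetical helper mirroring the constructor's arithmetic (no Daru, no missing values).
def u_from_rank_sums(v1, v2)
  n1, n2 = v1.size, v2.size
  n = n1 + n2
  ranks = midranks(v1 + v2)          # pooled ranks; the first n1 belong to v1
  r1 = ranks.first(n1).sum
  r2 = Rational(n * (n + 1), 2) - r1 # all ranks together sum to n(n+1)/2
  u1 = r1 - Rational(n1 * (n1 + 1), 2)
  u2 = r2 - Rational(n2 * (n2 + 1), 2)
  { r1: r1, r2: r2, u1: u1, u2: u2, u: [u1, u2].min }
end

result = u_from_rank_sums([1, 2, 2], [2, 3])
puts result[:u1].to_f # prints 1.0
puts result[:u2].to_f # prints 5.0 (u1 + u2 == n1 * n2 == 6)
```

Because r1 + r2 is fixed at n(n+1)/2, only one rank sum needs to be computed directly, which is also why the constructor derives @r2 by subtraction.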
Instance Attribute Details
#name ⇒ Object
Name of test
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 112
def name
  @name
end
```
#r1 ⇒ Object (readonly)
Rank sum of sample 1
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 100
def r1
  @r1
end
```
#r2 ⇒ Object (readonly)
Rank sum of sample 2
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 102
def r2
  @r2
end
```
#t ⇒ Object (readonly)
Tie-correction term used by #z (useful for demonstration)
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 110
def t
  @t
end
```
#u ⇒ Object (readonly)
U value: the smaller of u1 and u2
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 108
def u
  @u
end
```
#u1 ⇒ Object (readonly)
U for sample 1 (useful for demonstration)
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 104
def u1
  @u1
end
```
#u2 ⇒ Object (readonly)
U for sample 2 (useful for demonstration)
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 106
def u2
  @u2
end
```
Class Method Details
.distribution_permutations(n1, n2) ⇒ Object
Generates the distribution of U by enumerating permutations of the group labels. Very expensive, but useful for demonstrations.
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 78
def self.distribution_permutations(n1,n2)
  base=[0]*n1+[1]*n2
  po=Statsample::Permutation.new(base)

  total=n1*n2
  req={}
  po.each do |perm|
    r0,s0=0,0
    perm.each_index {|c_i|
      if perm[c_i]==0
        r0+=c_i+1
        s0+=1
      end
    }
    u1=r0-((s0*(s0+1)).quo(2))
    u2=total-u1
    temp_u= (u1 <= u2) ? u1 : u2
    req[perm]=temp_u
  end
  req
end
```
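For small samples the same distribution can be tabulated with the standard library alone. Rather than permuting the 0/1 group labels, this sketch enumerates every distinct set of ranks that sample 1 could receive with Array#combination, each of which is equally likely under H_0 when there are no ties, and tallies the smaller U. Unlike distribution_permutations, which returns a hash keyed by arrangement, it returns counts keyed by U. The name `u_distribution_by_enumeration` is hypothetical.

```ruby
# Hypothetical helper: exact distribution of min(U1, U2) by brute force.
# Each combination of n1 ranks out of 1..(n1 + n2) is one equally likely
# assignment of ranks to sample 1 under H_0 (no ties assumed).
def u_distribution_by_enumeration(n1, n2)
  total = n1 * n2
  tally = Hash.new(0)
  (1..(n1 + n2)).to_a.combination(n1) do |ranks|
    u1 = ranks.sum - n1 * (n1 + 1) / 2 # same formula as u1 in the class
    tally[[u1, total - u1].min] += 1
  end
  tally
end

counts = u_distribution_by_enumeration(2, 2)
counts.sort.each { |u, count| puts "U = #{u}: #{count} arrangement(s)" }
```

For n1 = n2 = 2 the six equally likely arrangements give U = 0, 1 and 2 twice each. The number of combinations grows as C(n1 + n2, n1), so this is only practical for demonstrations, like the original method.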
.u_sampling_distribution_as62(n1, n2) ⇒ Object
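U sampling distribution, based on the Dinneen & Blakesley (1973) algorithm. This is the algorithm used by SPSS. Returns, for each i from 0 to floor(n1*n2/2), the probability that the smaller of U1 and U2 equals i.
Parameters:
- n1: group 1 size
- n2: group 2 size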
Reference:
- Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society, Series C (Applied Statistics), 22(2), 269-273.
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 31
def self.u_sampling_distribution_as62(n1,n2)
  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  # Start from the distribution for a smaller sample of size 1:
  # freq[1..maxmn+1] = 1, remaining entries up to mn1 = 0
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  # Build successively higher-order distributions, filling each
  # symmetric pair freq[j] / freq[n1] from the outside inwards
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end

  # freq[u + 1] now holds the number of arrangements giving U1 = u.
  # Generate percentages for normal U, i.e. P(min(U1, U2) = i)
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
```
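As an independent cross-check of the AS 62 port, the sketch below builds the same frequencies with the textbook recurrence instead (a swapped-in technique, not Dinneen & Blakesley's algorithm). The largest of the m + n observations belongs either to sample 1, adding n to U1, or to sample 2, adding nothing, so c(m, n, u) = c(m - 1, n, u - n) + c(m, n - 1, u). The frequencies are then folded into probabilities for min(U1, U2) the same way the final block of u_sampling_distribution_as62 does it. The method names are hypothetical and ties are not handled.

```ruby
# Hypothetical helper: frequencies of U1 = 0..m*n for sample sizes m and n,
# via c(m, n, u) = c(m - 1, n, u - n) + c(m, n - 1, u), memoized on [m, n].
def u_frequencies(m, n, memo = {})
  key = [m, n]
  return memo[key] if memo.key?(key)
  result =
    if m.zero? || n.zero?
      [1] # only U1 = 0 is possible
    else
      freq = Array.new(m * n + 1, 0)
      # largest observation in sample 1: it beats all n values of sample 2
      u_frequencies(m - 1, n, memo).each_with_index { |c, u| freq[u + n] += c }
      # largest observation in sample 2: it adds nothing to U1
      u_frequencies(m, n - 1, memo).each_with_index { |c, u| freq[u] += c }
      freq
    end
  memo[key] = result
end

# Hypothetical helper: P(min(U1, U2) = i) for i = 0..floor(n1*n2/2),
# folding symmetric frequencies as the class method does.
def min_u_distribution(n1, n2)
  freq = u_frequencies(n1, n2)
  max_u = n1 * n2
  total = freq.sum
  (0..max_u / 2).map do |i|
    count = i == max_u - i ? freq[i] : freq[i] * 2
    Rational(count, total)
  end
end

p min_u_distribution(2, 2) # prints [(1/3), (1/3), (1/3)]
```

Comparing this output with u_sampling_distribution_as62 for a few small sizes is a cheap way to gain confidence in the port.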
Instance Method Details
#probability_exact ⇒ Object
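Exact probability of obtaining a U lower than or equal to the sample U, computed by summing the distribution returned by u_sampling_distribution_as62. A fractional U (possible when there are ties) is truncated to an integer first. Use with caution when m*n > 100000.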
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 162
def probability_exact
  dist = UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum = 0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
```
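probability_exact accumulates the min(U1, U2) distribution from 0 up to the observed U. The sketch below mirrors that summation on a hard-coded distribution for n1 = n2 = 3 (counts 2, 2, 4, 6, 6 out of the 20 equally likely arrangements), including the to_i truncation of a fractional U. The name `exact_p_from` is hypothetical.

```ruby
# Hypothetical helper mirroring probability_exact:
# sum P(min U = i) for i = 0..floor(u).
def exact_p_from(dist, u)
  dist.first(u.to_i + 1).sum # to_i truncates a tied (fractional) U, as the class does
end

# P(min(U1, U2) = i) for i = 0..4 when n1 = n2 = 3
dist_3_3 = [2, 2, 4, 6, 6].map { |count| Rational(count, 20) }

puts exact_p_from(dist_3_3, 1)   # prints 1/5
puts exact_p_from(dist_3_3, 1.5) # prints 1/5 as well: 1.5 is truncated to 1
```

Summing up to the largest possible U returns 1, which is a quick sanity check on any distribution passed in.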
#probability_z ⇒ Object
Assuming H_0, the two-tailed probability of obtaining a U at least as extreme as the sample U, computed with the normal approximation as 2 * (1 - Phi(|z|)), where Phi is the standard normal cdf. Use with more than 30 cases per group.
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 202
def probability_z
  (1-Distribution::Normal.cdf(z.abs()))*2
end
```
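probability_z relies on Distribution::Normal.cdf from the distribution gem. If only the standard library is available, the same two-tailed p-value 2 * (1 - Phi(|z|)) can be written with Math.erfc, because 1 - Phi(x) = erfc(x / sqrt(2)) / 2. This is a substitute sketch, not the class's code, and `two_tailed_p` is a hypothetical name.

```ruby
# Two-tailed normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
def two_tailed_p(z)
  Math.erfc(z.abs / Math.sqrt(2))
end

puts two_tailed_p(1.96).round(4) # prints 0.05
puts two_tailed_p(0).round(4)    # prints 1.0
```

Using erfc directly also avoids the loss of precision from computing 1 - cdf when |z| is large.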
#report_building(generator) ⇒ Object
:nodoc:
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 147
def report_building(generator) # :nodoc:
  generator.section(:name=>@name) do |s|
    s.table(:name=>_("%s results") % @name) do |t|
      t.row([_("Sum of ranks %s") % @v1.name, "%0.3f" % @r1])
      t.row([_("Sum of ranks %s") % @v2.name, "%0.3f" % @r2])
      t.row([_("U Value"), "%0.3f" % @u])
      t.row([_("Z"), "%0.3f (p: %0.3f)" % [z, probability_z]])
      if @n1*@n2<MAX_MN_EXACT
        t.row([_("Exact p (Dinneen & Blakesley, 1973):"), "%0.3f" % probability_exact])
      end
    end
  end
end
```
#z ⇒ Object
Z value for U, adjusted for ties. For large samples U is approximately normally distributed, so z can be used to obtain the probability of U (see #probability_z).
Reference:
- SPSS Manual
```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 187
def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
```
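The z method uses mean n1*n2/2 and, when there are ties, variance n1*n2/(N(N - 1)) * ((N^3 - N)/12 - T), where N = n1 + n2. The sketch below reproduces it with plain arrays, assuming that @t (set by adjust_for_ties, which is not shown on this page) holds T = sum of (t^3 - t)/12 over each group of t tied values; that is the standard tie correction and what the formula needs, but this page does not confirm it. With T = 0 the expression reduces to n1*n2*(N + 1)/12, the untied variance, so one formula covers both branches of z. The method names are hypothetical.

```ruby
# Hypothetical helpers reproducing UMannWhitney#z with plain Ruby arrays.

# Tie correction T: sum over groups of tied values of (t^3 - t) / 12.
# Assumed (not confirmed) to match what adjust_for_ties stores in @t.
def tie_term(values)
  values.tally.values.sum { |t| Rational(t**3 - t, 12) }
end

# z = (U - mu) / sigma for the normal approximation of U.
def mann_whitney_z(u, n1, n2, ties = 0)
  n = n1 + n2
  mu = Rational(n1 * n2, 2)
  # With ties = 0 this equals n1 * n2 * (n + 1) / 12, the untied variance.
  variance = Rational(n1 * n2, n * (n - 1)) * (Rational(n**3 - n, 12) - ties)
  (u - mu) / Math.sqrt(variance)
end

# Samples [1, 2, 2] and [2, 3]: U = 1, and one group of three tied 2s.
correction = tie_term([1, 2, 2, 2, 3])   # (27 - 3) / 12 = 2
puts mann_whitney_z(1, 3, 2, correction) # about -1.291
puts mann_whitney_z(1, 3, 2)             # about -1.155 without the correction
```

The tie correction shrinks the variance, so ignoring ties makes |z| smaller and the resulting p-value larger.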