Class: Statsample::Test::UMannWhitney

Inherits:
Object
Includes:
Summarizable
Defined in:
lib/statsample/test/umannwhitney.rb

U Mann-Whitney test

Non-parametric test for assessing whether two independent samples of observations come from the same distribution.

Assumptions

• The two samples under investigation in the test are independent of each other and the observations within each sample are independent.

• The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).

• The variances in the two groups are approximately equal.

Larger differences between the two distributions correspond to lower values of U.

Constant Summary

MAX_MN_EXACT =

Maximum value of m*n for which the exact probability is calculated

`10000`

Instance Attribute Summary

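• #name ⇒ Object

Name of test.

• #r1 ⇒ Object

Sample 1 rank sum.

• #r2 ⇒ Object

Sample 2 rank sum.

• #t ⇒ Object

Value of the compensation for ties (useful for demonstration).

• #u ⇒ Object

U value.

• #u1 ⇒ Object

Sample 1 U (useful for demonstration).

• #u2 ⇒ Object

Sample 2 U (useful for demonstration).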

Class Method Summary

• .distribution_permutations(n1, n2) ⇒ Object

Generate distribution for permutations.

• .u_sampling_distribution_as62(n1, n2) ⇒ Object

U sampling distribution, based on Dinneen & Blakesley (1973) algorithm.

Instance Method Summary

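• #initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney (constructor)

Creates a new Mann-Whitney U test. Params: two Daru::Vectors.

• #probability_exact ⇒ Object

Exact probability of obtaining a value of U lower than or equal to the sample's under the U sampling distribution.

• #probability_z ⇒ Object

Assuming H_0, the two-tailed probability of a U at least as extreme as the sample's, using the normal approximation.

• #report_building(generator) ⇒ Object

:nodoc:.

• #z ⇒ Object

Z value for U, adjusted for ties.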

Methods included from Summarizable

#summary

Constructor Details

#initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney

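Creates a new Mann-Whitney U test. Params: two Daru::Vectors (`v1`, `v2`) and an optional `opts` hash, whose only recognized key in the listing below is `:name`.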

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 118

def initialize(v1,v2, opts=Hash.new)
  @v1 = v1
  @v2 = v2
  v1_valid = v1.only_valid.reset_index!
  v2_valid = v2.only_valid.reset_index!
  @n1 = v1_valid.size
  @n2 = v2_valid.size
  data = Daru::Vector.new(v1_valid.to_a + v2_valid.to_a)
  groups = Daru::Vector.new(([0] * @n1) + ([1] * @n2))
  ds = Daru::DataFrame.new({:g => groups, :data => data})
  @t = nil
  @ties = data.to_a.size != data.to_a.uniq.size
  if @ties
    adjust_for_ties(ds[:data])
  end
  ds[:ranked] = ds[:data].ranked
  @n = ds.nrows
  @r1 = ds.filter_rows { |r| r[:g] == 0}[:ranked].sum
  @r2 = ((ds.nrows * (ds.nrows + 1)).quo(2)) - r1
  @u1 = r1 - ((@n1 * (@n1 + 1)).quo(2))
  @u2 = r2 - ((@n2 * (@n2 + 1)).quo(2))
  @u = (u1 < u2) ? u1 : u2
  opts_default = { :name=>_("Mann-Whitney's U") }
  @opts = opts_default.merge(opts)
  opts_default.keys.each {|k| send("#{k}=", @opts[k]) }
end
```
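The rank-sum arithmetic in the constructor can be followed without Daru. The sketch below is a plain-Ruby illustration, not the library's code: `average_ranks` and `mann_whitney_u` are hypothetical helper names, and it assumes that tied values share the mean of their rank positions, which is what `Daru::Vector#ranked` is expected to do here.

```ruby
# Average ranks (1-based); tied values share the mean of their positions.
# Assumption: this mirrors the ranking used by the library for ties.
def average_ranks(values)
  sorted = values.each_with_index.sort_by { |v, _| v }
  ranks = Array.new(values.size)
  i = 0
  while i < sorted.size
    j = i
    # Extend j to the last element of the current run of equal values.
    j += 1 while j + 1 < sorted.size && sorted[j + 1][0] == sorted[i][0]
    mean_rank = Rational(i + 1 + j + 1, 2)
    (i..j).each { |k| ranks[sorted[k][1]] = mean_rank }
    i = j + 1
  end
  ranks
end

# Rank sums and U statistics for two plain arrays (hypothetical helper).
def mann_whitney_u(sample1, sample2)
  n1 = sample1.size
  n2 = sample2.size
  n  = n1 + n2
  ranks = average_ranks(sample1 + sample2)
  r1 = ranks.first(n1).sum                # rank sum of sample 1
  r2 = Rational(n * (n + 1), 2) - r1      # all ranks sum to n(n+1)/2
  u1 = r1 - Rational(n1 * (n1 + 1), 2)
  u2 = r2 - Rational(n2 * (n2 + 1), 2)
  { r1: r1, r2: r2, u1: u1, u2: u2, u: [u1, u2].min }
end

result = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
puts result[:u]
```

With completely separated samples, as in this call, the smaller of the two U values is 0, and `u1 + u2` always equals `n1 * n2`.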

Instance Attribute Details

#name ⇒ Object

Name of test

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 112

def name
  @name
end
```

#r1 ⇒ Object

Sample 1 rank sum

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 100

def r1
  @r1
end
```

#r2 ⇒ Object

Sample 2 rank sum

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 102

def r2
  @r2
end
```

#t ⇒ Object

Value of the compensation for ties (useful for demonstration)

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 110

def t
  @t
end
```

#u ⇒ Object

U value

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 108

def u
  @u
end
```

#u1 ⇒ Object

Sample 1 U (useful for demonstration)

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 104

def u1
  @u1
end
```

#u2 ⇒ Object

Sample 2 U (useful for demonstration)

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 106

def u2
  @u2
end
```

Class Method Details

.distribution_permutations(n1, n2) ⇒ Object

Generates the distribution of U over all permutations of the group labels, returning a hash that maps each permutation to its U value. Very expensive, but useful for demonstrations.

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 78

def self.distribution_permutations(n1,n2)
  base=[0]*n1+[1]*n2
  po=Statsample::Permutation.new(base)
  total=n1*n2
  req={}
  po.each do |perm|
    r0,s0=0,0
    perm.each_index {|c_i|
      if perm[c_i]==0
        r0+=c_i+1
        s0+=1
      end
    }
    u1=r0-((s0*(s0+1)).quo(2))
    u2=total-u1
    temp_u= (u1 <= u2) ? u1 : u2
    req[perm]=temp_u
  end
  req
end
```
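The same brute-force idea can be sketched in plain Ruby without `Statsample::Permutation`. This version is an illustration under a different enumeration: instead of permuting group labels, it uses `Array#combination` to choose which ranks belong to group 1, and it tallies probabilities per U value rather than returning one entry per permutation. The name `u_distribution_by_enumeration` is hypothetical.

```ruby
# Tally min(U1, U2) over every way of assigning n1 of the n1 + n2
# untied ranks to group 1, then convert the tallies to probabilities.
def u_distribution_by_enumeration(n1, n2)
  total = n1 * n2
  counts = Hash.new(0)
  (1..(n1 + n2)).to_a.combination(n1) do |group1_ranks|
    u1 = group1_ranks.sum - n1 * (n1 + 1) / 2  # rank sum minus its minimum
    u2 = total - u1
    counts[[u1, u2].min] += 1
  end
  arrangements = counts.values.sum
  counts.sort.to_h.transform_values { |c| Rational(c, arrangements) }
end

dist = u_distribution_by_enumeration(2, 2)
dist.each { |u, p| puts "U=#{u}: #{p}" }
```

The number of arrangements grows as the binomial coefficient C(n1 + n2, n1), which is why this approach is only practical for very small samples.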

.u_sampling_distribution_as62(n1, n2) ⇒ Object

U sampling distribution, based on the Dinneen & Blakesley (1973) algorithm. This is the algorithm used by SPSS.

Parameters:

• `n1`: group 1 size

• `n2`: group 2 size

Reference:

• Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society, 22(2), 269-273.

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 31

def self.u_sampling_distribution_as62(n1,n2)
  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end

  # Generate percentages for normal U
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
```
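For cross-checking small cases, the same distribution can be generated with the textbook counting recurrence for U. The sketch below is not Algorithm AS 62: it counts arrangements with c(u; n1, n2) = c(u - n2; n1 - 1, n2) + c(u; n1, n2 - 1), which follows from asking which group holds the largest observation. The names `u_count` and `u_sampling_distribution` are hypothetical, and the folding step copies the final loop of the listing above.

```ruby
# Number of arrangements of n1 + n2 untied ranks whose U1 equals u.
# If the largest observation is in group 1 it adds n2 to U1; otherwise 0.
def u_count(u, n1, n2, memo)
  return 0 if u < 0
  return (u.zero? ? 1 : 0) if n1.zero? || n2.zero?
  memo[[u, n1, n2]] ||=
    u_count(u - n2, n1 - 1, n2, memo) + u_count(u, n1, n2 - 1, memo)
end

# Probabilities of min(U1, U2) = 0, 1, ..., floor(n1 * n2 / 2).
def u_sampling_distribution(n1, n2)
  max_u = n1 * n2
  memo = {}
  counts = (0..max_u).map { |u| u_count(u, n1, n2, memo) }
  total = counts.sum
  (0..(max_u / 2)).map do |u|
    # u and max_u - u are mirror images; the midpoint is counted once.
    folded = u == max_u - u ? counts[u] : counts[u] * 2
    Rational(folded, total)
  end
end

dist = u_sampling_distribution(3, 4)
puts dist.inspect
puts dist[0..2].sum # cumulative probability of U <= 2
```

Summing the first `u + 1` entries, as in the last line, is the same cumulative sum that `#probability_exact` performs over the AS 62 output.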

Instance Method Details

#probability_exact ⇒ Object

Exact probability of obtaining a value of U lower than or equal to the sample's under the U sampling distribution. Use with caution when m*n > 100000. Uses u_sampling_distribution_as62.

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 162

def probability_exact
  dist = UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum = 0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
```

#probability_z ⇒ Object

Assuming H_0, the two-tailed probability of a U at least as extreme as the sample's, using the normal approximation. Use with more than 30 cases per group.

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 202

def probability_z
  (1-Distribution::Normal.cdf(z.abs()))*2
end
```
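The expression above is a two-tailed p-value: it doubles the upper-tail area beyond |z|. A minimal standalone sketch follows, assuming the standard normal CDF is computed from Ruby's `Math.erf` as a stand-in for `Distribution::Normal.cdf`; `normal_cdf` and `two_tailed_p` are hypothetical names.

```ruby
# Standard normal CDF expressed through the error function.
def normal_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2.0)))
end

# Two-tailed p-value: twice the area in the tail beyond |z|.
def two_tailed_p(z)
  2.0 * (1.0 - normal_cdf(z.abs))
end

puts two_tailed_p(1.96)
puts two_tailed_p(-1.96)
```

Because of the `abs`, the sign of z does not affect the result, and z = 0 gives a p-value of 1.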

#report_building(generator) ⇒ Object

:nodoc:

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 147

def report_building(generator) # :nodoc:
  generator.section(:name=>@name) do |s|
    s.table(:name=>_("%s results") % @name) do |t|
      t.row([_("Sum of ranks %s") % @v1.name, "%0.3f" % @r1])
      t.row([_("Sum of ranks %s") % @v2.name, "%0.3f" % @r2])
      t.row([_("U Value"), "%0.3f" % @u])
      t.row([_("Z"), "%0.3f (p: %0.3f)" % [z, probability_z]])
      if @n1*@n2
```

#z ⇒ Object

Z value for U, adjusted for ties. For large samples, U is approximately normally distributed. In that case, you can use z to obtain the probability for U.

Reference:

• SPSS Manual

```ruby
# File 'lib/statsample/test/umannwhitney.rb', line 187

def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
```
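The two branches of `#z` can be compared in a standalone sketch. The tie term `@t` is computed by `adjust_for_ties`, which is not shown on this page; the code below assumes the usual correction, the sum of (t_i^3 - t_i) / 12 over each group of t_i tied values. The function name `u_z_score` and the `tie_sizes` parameter are hypothetical.

```ruby
# z score for U with an optional tie correction.
# tie_sizes lists how many observations share each tied value, e.g. [3, 2].
# Assumption: the tie term is sum((size**3 - size) / 12) over tied groups.
def u_z_score(u, n1, n2, tie_sizes = [])
  mu = n1 * n2 / 2.0
  n = n1 + n2
  sigma =
    if tie_sizes.empty?
      Math.sqrt(n1 * n2 * (n + 1) / 12.0)
    else
      t = tie_sizes.sum { |size| (size**3 - size) / 12.0 }
      Math.sqrt(n1 * n2 / (n * (n - 1)).to_f * ((n**3 - n) / 12.0 - t))
    end
  (u - mu) / sigma
end

puts u_z_score(0, 5, 5)       # no ties
puts u_z_score(0, 5, 5, [3])  # one group of three tied values
```

With a zero tie term the corrected formula reduces to the uncorrected one, since n1*n2 / (n(n-1)) * (n^3 - n) / 12 equals n1*n2*(n+1) / 12. Ties shrink the standard deviation, so the corrected |z| is never smaller than the uncorrected one.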