Class: Statsample::Test::UMannWhitney

Inherits:
Object
Includes:
Summarizable
Defined in:
lib/statsample/test/umannwhitney.rb

Overview

Mann-Whitney U test

Non-parametric test for assessing whether two independent samples of observations come from the same distribution.

Assumptions

  • The two samples under investigation in the test are independent of each other and the observations within each sample are independent.

  • The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).

  • The variances in the two groups are approximately equal.

Larger differences between the two distributions correspond to lower values of U.
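To see why larger separation lowers U: U1 can be computed equivalently by counting, over all pairs (x from sample 1, y from sample 2), how often x > y, with ties counting one half. The following is a minimal sketch on plain Ruby arrays (not Daru::Vector); `pairwise_u` is a hypothetical name, not part of Statsample.

```ruby
# Hypothetical helper (not part of Statsample): computes U1, U2 and
# U = min(U1, U2) by pairwise comparison instead of rank sums.
def pairwise_u(sample1, sample2)
  u1 = 0r # Rational, so half-counts for ties stay exact
  sample1.each do |x|
    sample2.each do |y|
      if x > y
        u1 += 1
      elsif x == y
        u1 += 1/2r
      end
    end
  end
  u2 = sample1.size * sample2.size - u1
  [u1, u2, [u1, u2].min]
end

p pairwise_u([1, 2, 3], [4, 5, 6]) # completely separated samples
p pairwise_u([1, 4, 5], [2, 3, 6]) # interleaved samples
```

The completely separated samples give U = 0, while the interleaved ones give U = 4, close to the largest possible value of min(U1, U2) for 3 x 3 observations (4.5).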

Constant Summary

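MAX_MN_EXACT = 10000

Limit on m*n (the product of the two sample sizes) for exact calculation of the probability: report_building includes the exact p only when n1*n2 < MAX_MN_EXACT.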

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Summarizable

#summary

Constructor Details

#initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney

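Creates a new Mann-Whitney U test. Parameters: two Daru::Vector objects, v1 and v2; missing values are dropped before ranking. The optional opts hash accepts :name (default "Mann-Whitney's U").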


# File 'lib/statsample/test/umannwhitney.rb', line 118

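def initialize(v1,v2, opts=Hash.new)
  @v1      = v1
  @v2      = v2
  # Drop missing values and reset the indexes
  v1_valid = v1.only_valid.reset_index!
  v2_valid = v2.only_valid.reset_index!
  @n1      = v1_valid.size
  @n2      = v2_valid.size
  # Pool both samples; g labels each observation with its group (0 or 1)
  data     = Daru::Vector.new(v1_valid.to_a + v2_valid.to_a)
  groups   = Daru::Vector.new(([0] * @n1) + ([1] * @n2))
  ds       = Daru::DataFrame.new({:g => groups, :data => data})
  # Tie correction term; stays nil unless adjust_for_ties runs (ties present)
  @t       = nil
  @ties    = data.to_a.size != data.to_a.uniq.size
  if @ties
    adjust_for_ties(ds[:data])
  end
  # Rank the pooled observations
  ds[:ranked] = ds[:data].ranked
  @n = ds.nrows

  # Rank sum of sample 1; r2 follows from the total rank sum N(N+1)/2
  @r1 = ds.filter_rows { |r| r[:g] == 0}[:ranked].sum
  @r2 = ((ds.nrows * (ds.nrows + 1)).quo(2)) - r1
  # U for each sample; the reported U is the smaller of the two
  @u1 = r1 - ((@n1 * (@n1 + 1)).quo(2))
  @u2 = r2 - ((@n2 * (@n2 + 1)).quo(2))
  @u  = (u1 < u2) ? u1 : u2
  # Apply the options (currently only :name) through their writers
  opts_default = { :name=>_("Mann-Whitney's U") }
  @opts = opts_default.merge(opts)
  opts_default.keys.each {|k|
    send("#{k}=", @opts[k])
  }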
end
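The constructor's arithmetic can be reproduced with plain Ruby arrays. This minimal sketch assumes that tied values receive average (mid) ranks and treats nil as a missing value; `mid_ranks` and `rank_sums` are hypothetical names, not Statsample or Daru methods.

```ruby
# Hypothetical helper: rank values 1..N, giving tied values the average
# of the ranks they would occupy.
def mid_ranks(values)
  sorted = values.each_with_index.sort_by { |v, _i| v }
  ranks = Array.new(values.size)
  pos = 0
  while pos < sorted.size
    last = pos
    last += 1 while last + 1 < sorted.size && sorted[last + 1][0] == sorted[pos][0]
    avg = Rational(pos + 1 + last + 1, 2) # mean of ranks pos+1 .. last+1
    (pos..last).each { |k| ranks[sorted[k][1]] = avg }
    pos = last + 1
  end
  ranks
end

# Hypothetical helper mirroring the constructor: drop missing values,
# rank the pooled data, then derive r1, r2, u1, u2 and u.
def rank_sums(v1, v2)
  a = v1.compact
  b = v2.compact
  n1, n2 = a.size, b.size
  n = n1 + n2
  ranks = mid_ranks(a + b)
  r1 = ranks.first(n1).sum
  r2 = Rational(n * (n + 1), 2) - r1
  u1 = r1 - Rational(n1 * (n1 + 1), 2)
  u2 = r2 - Rational(n2 * (n2 + 1), 2)
  { r1: r1, r2: r2, u1: u1, u2: u2, u: [u1, u2].min }
end

p rank_sums([1, 4, nil, 5], [2, 3, 6])
```

With these inputs the nil is dropped, giving r1 = 10, r2 = 11, u1 = 4, u2 = 5 and u = 4.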

Instance Attribute Details

#name ⇒ Object

Name of the test.


# File 'lib/statsample/test/umannwhitney.rb', line 112

def name
  @name
end

#r1 ⇒ Object (readonly)

Sum of ranks for sample 1.


# File 'lib/statsample/test/umannwhitney.rb', line 100

def r1
  @r1
end

#r2 ⇒ Object (readonly)

Sum of ranks for sample 2.


# File 'lib/statsample/test/umannwhitney.rb', line 102

def r2
  @r2
end

#t ⇒ Object (readonly)

Tie correction term used by z (useful for demonstration); nil when the pooled data contain no ties.


# File 'lib/statsample/test/umannwhitney.rb', line 110

def t
  @t
end
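adjust_for_ties is not shown on this page. Assuming it follows the standard correction implied by the formula in z, t = Σ (t_j³ − t_j)/12 summed over the groups of t_j tied values, the term can be sketched on a plain array of pooled values (`tie_correction` is a hypothetical name):

```ruby
# Hypothetical helper: standard tie correction term
# t = sum over tie groups of (t_j**3 - t_j) / 12, where t_j is the group size.
def tie_correction(pooled)
  pooled.tally.values
        .select { |count| count > 1 }                  # only actual ties contribute
        .sum { |count| Rational(count**3 - count, 12) }
end

p tie_correction([1, 2, 2, 3, 3, 3])
```

With no ties this sketch returns 0, whereas the class leaves t as nil and z then uses the untied formula.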

#u ⇒ Object (readonly)

U value: the smaller of u1 and u2.


# File 'lib/statsample/test/umannwhitney.rb', line 108

def u
  @u
end

#u1 ⇒ Object (readonly)

U for sample 1 (useful for demonstration).


# File 'lib/statsample/test/umannwhitney.rb', line 104

def u1
  @u1
end

#u2 ⇒ Object (readonly)

U for sample 2 (useful for demonstration).


# File 'lib/statsample/test/umannwhitney.rb', line 106

def u2
  @u2
end
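The attributes above are related by the following identities, which follow directly from the constructor (N = n_1 + n_2 is the number of valid observations in both samples):

```latex
\begin{aligned}
r_2 &= \frac{N(N+1)}{2} - r_1 \\
u_1 &= r_1 - \frac{n_1(n_1+1)}{2}, \qquad u_2 = r_2 - \frac{n_2(n_2+1)}{2} \\
u_1 + u_2 &= n_1 n_2, \qquad u = \min(u_1, u_2)
\end{aligned}
```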

Class Method Details

.distribution_permutations(n1, n2) ⇒ Object

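Generates the distribution of U over every permutation of the group labels. Very expensive, but useful for demonstrations. Returns a hash mapping each permutation to its U value (the smaller of U1 and U2).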


# File 'lib/statsample/test/umannwhitney.rb', line 78

def self.distribution_permutations(n1,n2)
  base=[0]*n1+[1]*n2
  po=Statsample::Permutation.new(base)
  
  total=n1*n2
  req={}
  po.each do |perm|
    r0,s0=0,0
    perm.each_index {|c_i|
      if perm[c_i]==0
        r0+=c_i+1
        s0+=1
      end
    }
    u1=r0-((s0*(s0+1)).quo(2))
    u2=total-u1
    temp_u= (u1 <= u2) ? u1 : u2
    req[perm]=temp_u
  end
  req
end
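A minimal sketch of the same idea without Statsample::Permutation: choose which of the ranks 1..N belong to group 1 with Array#combination, compute min(U1, U2) for each choice, and tally the results. It assumes no ties, and the hypothetical `permutation_u_counts` returns counts per U value rather than a hash keyed by permutation.

```ruby
# Hypothetical helper: frequency of U = min(U1, U2) over all ways of
# assigning n1 of the ranks 1..N to group 1 (no ties).
def permutation_u_counts(n1, n2)
  total = n1 * n2
  counts = Hash.new(0)
  (1..(n1 + n2)).to_a.combination(n1) do |group1_ranks|
    u1 = group1_ranks.sum - n1 * (n1 + 1) / 2
    counts[[u1, total - u1].min] += 1
  end
  counts.sort.to_h
end

p permutation_u_counts(2, 2)
```

For n1 = n2 = 2, each of U = 0, 1 and 2 occurs in 2 of the 6 possible arrangements. Enumerating C(N, n1) subsets keeps the loop to one iteration per distinct assignment of ranks to group 1.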

.u_sampling_distribution_as62(n1, n2) ⇒ Object

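U sampling distribution, based on the Dinneen & Blakesley (1973) algorithm AS 62. This is the algorithm used in SPSS. Returns an array whose element i is the probability, under H_0, that U (the smaller of U1 and U2) equals i, for i from 0 to floor(n1*n2/2).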

Parameters:

  • n1: group 1 size

  • n2: group 2 size

Reference:

  • Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society. Series C (Applied Statistics), 22(2), 269-273.


# File 'lib/statsample/test/umannwhitney.rb', line 31

def self.u_sampling_distribution_as62(n1,n2)

  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end
  
  # Generate percentages for normal U
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
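For cross-checking small cases, the same output layout can be produced with the classic counting recurrence f(m, n, u) = f(m−1, n, u−n) + f(m, n−1, u), where f(m, n, u) is the number of arrangements of m group-1 and n group-2 observations with U1 = u (it conditions on which group holds the largest observation). This is a different, slower technique than AS 62, sketched under the no-ties assumption; `u_count` and `u_distribution_recurrence` are hypothetical names.

```ruby
# Hypothetical helper: number of arrangements of m group-1 and n group-2
# observations (no ties) whose U1 equals u, via the standard recurrence
# on where the largest observation falls. Memoized in a Hash.
def u_count(m, n, u, memo = {})
  return 0 if u < 0
  return (u.zero? ? 1 : 0) if m.zero? || n.zero?
  memo[[m, n, u]] ||= u_count(m - 1, n, u - n, memo) + u_count(m, n - 1, u, memo)
end

# Same layout as u_sampling_distribution_as62: element i is P(min(U1, U2) = i)
# for i = 0..floor(n1*n2/2), with both tails folded together.
def u_distribution_recurrence(n1, n2)
  max_u = n1 * n2
  freq = (0..max_u).map { |u| u_count(n1, n2, u) }
  total = freq.sum
  (0..max_u / 2).map do |i|
    ues = i == max_u - i ? freq[i] : freq[i] * 2
    Rational(ues, total)
  end
end

p u_distribution_recurrence(2, 2)
```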

Instance Method Details

#probability_exact ⇒ Object

Exact probability of obtaining a value of U lower than or equal to the sample U under the U sampling distribution. Use with caution when m*n > 100000. Uses u_sampling_distribution_as62.


# File 'lib/statsample/test/umannwhitney.rb', line 162

def probability_exact
  dist = UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum = 0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
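Because each entry of u_sampling_distribution_as62 already folds both tails together, the sum computed by probability_exact is a two-tailed probability, P(min(U1, U2) ≤ u). A brute-force sketch for tiny samples without ties, counting qualifying rank assignments directly (`exact_p_brute_force` is a hypothetical name):

```ruby
# Hypothetical helper: exact two-tailed P(min(U1, U2) <= u) under H_0,
# enumerating every assignment of n1 of the ranks 1..N to group 1 (no ties).
def exact_p_brute_force(n1, n2, u)
  hits = 0
  total = 0
  (1..(n1 + n2)).to_a.combination(n1) do |ranks|
    u1 = ranks.sum - n1 * (n1 + 1) / 2
    hits += 1 if [u1, n1 * n2 - u1].min <= u
    total += 1
  end
  Rational(hits, total)
end

p exact_p_brute_force(3, 3, 0)
```

For n1 = n2 = 3 and u = 0, only the two completely separated arrangements qualify, giving 2/20 = 1/10.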

#probability_z ⇒ Object

Two-tailed probability, assuming H_0, of a U at least as extreme as the sample value, using the normal approximation: 2 * (1 - Φ(|z|)). Use with more than 30 cases per group.


# File 'lib/statsample/test/umannwhitney.rb', line 202

def probability_z
  (1-Distribution::Normal.cdf(z.abs()))*2
end
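The expression 2 * (1 − Φ(|z|)) can also be evaluated with Ruby's built-in Math.erfc, since 1 − Φ(x) = erfc(x/√2)/2. This sketch uses only the standard library in place of the Distribution::Normal.cdf call above; `two_tailed_p` is a hypothetical name.

```ruby
# Hypothetical helper: two-tailed p-value for a standard normal z,
# 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
def two_tailed_p(z)
  Math.erfc(z.abs / Math.sqrt(2))
end

p two_tailed_p(0.0)
p two_tailed_p(-1.96)
```

A z of 0 gives p = 1, and |z| = 1.96 gives p close to 0.05.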

#report_building(generator) ⇒ Object

:nodoc:


# File 'lib/statsample/test/umannwhitney.rb', line 147

def report_building(generator) # :nodoc:
  generator.section(:name=>@name) do |s|
    s.table(:name=>_("%s results") % @name) do |t|
      t.row([_("Sum of ranks %s") % @v1.name, "%0.3f" % @r1])
      t.row([_("Sum of ranks %s") % @v2.name, "%0.3f" % @r2])
      t.row([_("U Value"), "%0.3f" % @u])
      t.row([_("Z"), "%0.3f (p: %0.3f)" % [z, probability_z]])
      if @n1*@n2<MAX_MN_EXACT
        t.row([_("Exact p (Dinneen & Blakesley, 1973):"), "%0.3f" % probability_exact])
      end
    end
  end
end

#z ⇒ Object

Z value for U, with adjustment for ties. For large samples, U is approximately normally distributed, so z can be used to obtain the probability of U.

Reference:

  • SPSS Manual
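Written out, the computation performed by z is as follows, where t is the tie correction term stored in the t attribute:

```latex
\mu_U = \frac{n_1 n_2}{2}, \qquad
z = \frac{U - \mu_U}{\sigma_U}, \qquad
\sigma_U =
\begin{cases}
\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}} & \text{without ties} \\[1ex]
\sqrt{\dfrac{n_1 n_2}{n(n-1)} \left( \dfrac{n^3 - n}{12} - t \right)} & \text{with ties, } n = n_1 + n_2
\end{cases}
```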


# File 'lib/statsample/test/umannwhitney.rb', line 187

def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
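A plain-Ruby sketch of the same calculation, taking the sample sizes, U and the tie term as arguments. `u_z_score` is a hypothetical name; pass t = nil when there are no ties, as the class does.

```ruby
# Hypothetical helper mirroring UMannWhitney#z: normal approximation for U,
# with the tie-corrected standard deviation when t is given.
def u_z_score(n1, n2, u, t = nil)
  mu = Rational(n1 * n2, 2)
  sigma =
    if t.nil?
      Math.sqrt(Rational(n1 * n2 * (n1 + n2 + 1), 12))
    else
      n = n1 + n2
      Math.sqrt(Rational(n1 * n2, n * (n - 1)) * (Rational(n**3 - n, 12) - t))
    end
  (u - mu) / sigma
end

p u_z_score(10, 10, 23)
p u_z_score(10, 10, 23, 0)
```

With t = 0 the tie-corrected formula reduces algebraically to the untied one, so both calls agree up to floating-point rounding; a positive t shrinks sigma and therefore increases |z|.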