Class: UMannWhitney
- Inherits:
- Object
  - Object
    - UMannWhitney
- Defined in:
- lib/u_mann_whitney.rb, lib/u_mann_whitney/version.rb
Overview
U Mann-Whitney test
Non-parametric test for assessing whether two independent samples of observations come from the same distribution.
Assumptions
-
The two samples under investigation in the test are independent of each other and the observations within each sample are independent.
-
The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).
-
The variances in the two groups are approximately equal.
Larger differences between the two distributions correspond to lower values of U.
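The link between rank sums and U can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the gem's implementation: the helper names `average_ranks` and `mann_whitney_u` are hypothetical, plain arrays stand in for Daru::Vectors, and tied values are assumed to receive the average of the ranks they span.

```ruby
# Hypothetical helper: assign 1-based ranks, averaging ranks across ties.
def average_ranks(values)
  sorted = values.each_with_index.sort_by { |v, _| v }
  ranks = Array.new(values.size)
  i = 0
  while i < sorted.size
    j = i
    # Extend j to the last element tied with sorted[i].
    j += 1 while j + 1 < sorted.size && sorted[j + 1][0] == sorted[i][0]
    avg = (i + 1 + j + 1).quo(2) # mean of ranks (i + 1)..(j + 1)
    (i..j).each { |k| ranks[sorted[k][1]] = avg }
    i = j + 1
  end
  ranks
end

# Hypothetical helper: rank sums and U statistics for two samples.
def mann_whitney_u(sample1, sample2)
  n1 = sample1.size
  n2 = sample2.size
  ranks = average_ranks(sample1 + sample2)
  r1 = ranks.first(n1).sum
  r2 = ranks.last(n2).sum
  u1 = r1 - (n1 * (n1 + 1)).quo(2)
  u2 = r2 - (n2 * (n2 + 1)).quo(2)
  { r1: r1, r2: r2, u1: u1, u2: u2, u: [u1, u2].min }
end

result = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

With completely separated samples such as these, `result[:u]` is 0, and `u1 + u2` always equals `n1 * n2`.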
Constant Summary collapse
- MAX_MN_EXACT =
Max for m*n allowed for exact calculation of probability
10000
- VERSION =
"0.2.0"
Instance Attribute Summary collapse
-
#name ⇒ Object
Name of test.
-
#r1 ⇒ Object
readonly
Sample 1 Rank sum.
-
#r2 ⇒ Object
readonly
Sample 2 Rank sum.
-
#t ⇒ Object
readonly
Value of compensation for ties (useful for demonstration).
-
#u ⇒ Object
readonly
U Value.
-
#u1 ⇒ Object
readonly
Sample 1 U (useful for demonstration).
-
#u2 ⇒ Object
readonly
Sample 2 U (useful for demonstration).
Class Method Summary collapse
- .u_mannwhitney(v1, v2) ⇒ Object
-
.u_sampling_distribution_as62(n1, n2) ⇒ Object
U sampling distribution, based on Dinneen & Blakesley (1973) algorithm.
Instance Method Summary collapse
-
#_(t) ⇒ Object
Shim for gettext.
-
#initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney
constructor
Create a new U Mann-Whitney test. Params: two Daru::Vectors.
-
#probability_exact ⇒ Object
Exact probability of finding values of U lower than or equal to the sample value on the U distribution.
-
#probability_z ⇒ Object
Assuming H_0, the proportion of cdf with values of U lower than the sample, using normal approximation.
-
#report_building(generator) ⇒ Object
:nodoc:.
-
#z ⇒ Object
Z value for U, adjusted for ties.
Constructor Details
#initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney
Create a new U Mann-Whitney test. Params: two Daru::Vectors.
# File 'lib/u_mann_whitney.rb', line 99
def initialize(v1,v2, opts=Hash.new)
  @v1 = v1
  @v2 = v2
  v1_valid = v1.reject_values(*Daru::MISSING_VALUES).reset_index!
  v2_valid = v2.reject_values(*Daru::MISSING_VALUES).reset_index!
  @n1 = v1_valid.size
  @n2 = v2_valid.size
  data = Daru::Vector.new(v1_valid.to_a + v2_valid.to_a)
  groups = Daru::Vector.new(([0] * @n1) + ([1] * @n2))
  ds = Daru::DataFrame.new({:g => groups, :data => data})
  @t = nil
  @ties = data.to_a.size != data.to_a.uniq.size
  if @ties
    adjust_for_ties(ds[:data])
  end
  ds[:ranked] = ds[:data].ranked
  @n = ds.nrows
  @r1 = ds.filter_rows { |r| r[:g] == 0}[:ranked].sum || 0
  @r2 = ((ds.nrows * (ds.nrows + 1)).quo(2)) - r1
  @u1 = r1 - ((@n1 * (@n1 + 1)).quo(2))
  @u2 = r2 - ((@n2 * (@n2 + 1)).quo(2))
  @u = (u1 < u2) ? u1 : u2
  opts_default = { :name=>_("Mann-Whitney's U") }
  @opts = opts_default.merge(opts)
  opts_default.keys.each {|k| send("#{k}=", @opts[k]) }
end
Instance Attribute Details
#name ⇒ Object
Name of test
# File 'lib/u_mann_whitney.rb', line 94
def name
  @name
end
#r1 ⇒ Object (readonly)
Sample 1 Rank sum
# File 'lib/u_mann_whitney.rb', line 82
def r1
  @r1
end
#r2 ⇒ Object (readonly)
Sample 2 Rank sum
# File 'lib/u_mann_whitney.rb', line 84
def r2
  @r2
end
#t ⇒ Object (readonly)
Value of compensation for ties (useful for demonstration)
# File 'lib/u_mann_whitney.rb', line 92
def t
  @t
end
#u ⇒ Object (readonly)
U Value
# File 'lib/u_mann_whitney.rb', line 90
def u
  @u
end
#u1 ⇒ Object (readonly)
Sample 1 U (useful for demonstration)
# File 'lib/u_mann_whitney.rb', line 86
def u1
  @u1
end
#u2 ⇒ Object (readonly)
Sample 2 U (useful for demonstration)
# File 'lib/u_mann_whitney.rb', line 88
def u2
  @u2
end
Class Method Details
.u_mannwhitney(v1, v2) ⇒ Object
# File 'lib/u_mann_whitney.rb', line 24
def self.u_mannwhitney(v1, v2)
  new(v1,v2)
end
.u_sampling_distribution_as62(n1, n2) ⇒ Object
U sampling distribution, based on the Dinneen & Blakesley (1973) algorithm. This is the algorithm used by SPSS.
Parameters:
-
n1: group 1 size
-
n2: group 2 size
Reference:
-
Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society, 22(2), 269-273.
# File 'lib/u_mann_whitney.rb', line 37
def self.u_sampling_distribution_as62(n1,n2)
  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end

  # Generate percentages for normal U
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
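For very small samples, the null distribution of U can also be obtained by brute force, which may help when sanity-checking a generator such as AS 62. The sketch below is not the AS 62 algorithm: the helper name `u_counts_brute_force` is hypothetical, and it simply enumerates every way of assigning ranks 1..(n1 + n2) to group 1. Judging from the listing above, the gem's method appears to fold the two tails together (frequencies are doubled except at the midpoint), whereas this sketch returns unfolded counts for every U from 0 to n1*n2.

```ruby
# Hypothetical cross-check: tally U1 = R1 - n1(n1 + 1)/2 over every possible
# set of ranks for group 1. Only feasible for small n1 and n2.
def u_counts_brute_force(n1, n2)
  counts = Array.new(n1 * n2 + 1, 0)
  (1..(n1 + n2)).to_a.combination(n1) do |ranks|
    u1 = ranks.sum - n1 * (n1 + 1) / 2
    counts[u1] += 1
  end
  counts
end

counts = u_counts_brute_force(2, 3)
total = counts.sum
probs = counts.map { |c| Rational(c, total) }
```

For n1 = 2 and n2 = 3 the counts are symmetric around n1*n2/2 and sum to the number of possible rank assignments, 10.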
Instance Method Details
#_(t) ⇒ Object
Shim for gettext
142 143 144 |
# File 'lib/u_mann_whitney.rb', line 142 def _(t) t end |
#probability_exact ⇒ Object
Exact probability of finding values of U lower than or equal to the sample value on the U distribution. Use with caution with m*n>100000. Uses u_sampling_distribution_as62.
# File 'lib/u_mann_whitney.rb', line 147
def probability_exact
  dist = UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum = 0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
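The idea of summing the null distribution up to the observed U can be illustrated independently of the gem. The sketch below uses the classical counting recurrence for the Mann-Whitney statistic rather than AS 62, and the helper names `u_count` and `exact_lower_tail` are hypothetical. It returns a one-sided probability P(U <= u); because the distribution listing above appears to double the frequencies, the gem's value may be about twice this figure.

```ruby
# Hypothetical helper: number of rank arrangements that give statistic u for
# group sizes m and n, using the recurrence
#   c(u; m, n) = c(u - n; m - 1, n) + c(u; m, n - 1)
def u_count(u, m, n, memo = {})
  return 0 if u < 0
  return (u.zero? ? 1 : 0) if m.zero? || n.zero?
  memo[[u, m, n]] ||= u_count(u - n, m - 1, n, memo) + u_count(u, m, n - 1, memo)
end

# One-sided exact probability P(U <= u_obs) under the null hypothesis.
def exact_lower_tail(u_obs, m, n)
  memo = {}
  total = (0..(m * n)).sum { |u| u_count(u, m, n, memo) }
  tail = (0..u_obs.floor).sum { |u| u_count(u, m, n, memo) }
  Rational(tail, total)
end

p_lower = exact_lower_tail(1, 2, 3)
```

Here `p_lower` is 1/5: of the 10 equally likely rank arrangements for groups of size 2 and 3, two give U <= 1.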
#probability_z ⇒ Object
Assuming H_0, the proportion of cdf with values of U lower than the sample, using normal approximation. Use with more than 30 cases per group.
# File 'lib/u_mann_whitney.rb', line 187
def probability_z
  (1-Distribution::Normal.cdf(z.abs()))*2
end
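The two-tailed calculation in the listing can be reproduced with the Ruby standard library alone. This is a sketch under the assumption that `Distribution::Normal.cdf` is the standard normal CDF; the helper names `normal_cdf` and `two_tailed_p` are hypothetical, and `Math.erfc` stands in for the distribution library.

```ruby
# Standard normal CDF expressed through the complementary error function.
def normal_cdf(x)
  0.5 * Math.erfc(-x / Math.sqrt(2))
end

# Two-tailed p-value for a z score, mirroring (1 - cdf(|z|)) * 2.
def two_tailed_p(z)
  (1 - normal_cdf(z.abs)) * 2
end

p_value = two_tailed_p(-1.96)
```

Because the absolute value of z is taken first, the sign of z does not affect the result; `p_value` here is approximately 0.05.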
#report_building(generator) ⇒ Object
:nodoc:
# File 'lib/u_mann_whitney.rb', line 128
def report_building(generator) # :nodoc:
  generator.section(:name=>@name) do |s|
    s.table(:name=>_("%s results") % @name) do |t|
      t.row([_("Sum of ranks %s") % @v1.name, "%0.3f" % @r1])
      t.row([_("Sum of ranks %s") % @v2.name, "%0.3f" % @r2])
      t.row([_("U Value"), "%0.3f" % @u])
      t.row([_("Z"), "%0.3f (p: %0.3f)" % [z, probability_z]])
      if @n1*@n2<MAX_MN_EXACT
        t.row([_("Exact p (Dinneen & Blakesley, 1973):"), "%0.3f" % probability_exact])
      end
    end
  end
end
#z ⇒ Object
Z value for U, adjusted for ties. For large samples, U is approximately normally distributed. In that case, you can use z to obtain the probability for U.
Reference:
-
SPSS Manual
# File 'lib/u_mann_whitney.rb', line 172
def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
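The z computation can be sketched as a standalone function. The listing does not show how the tie term `@t` is obtained, so the sketch assumes the usual correction, the sum of (t^3 - t)/12 over each group of t tied values in the pooled data; that assumption and the helper names `tie_correction` and `z_for_u` are not taken from the gem.

```ruby
# Hypothetical helper: tie term, ASSUMED to be the sum of (t^3 - t) / 12 over
# every group of t tied values in the pooled data.
def tie_correction(pooled)
  sizes = pooled.group_by { |v| v }.values.map(&:size)
  sizes.select { |c| c > 1 }.sum { |c| (c**3 - c).quo(12) }
end

# z score for an observed U, given both group sizes and the pooled values.
def z_for_u(u, n1, n2, pooled)
  mu = (n1 * n2).quo(2)
  t = tie_correction(pooled)
  sigma =
    if t.zero?
      Math.sqrt((n1 * n2 * (n1 + n2 + 1)).quo(12))
    else
      n = n1 + n2
      Math.sqrt((n1 * n2).quo(n * (n - 1)) * ((n**3 - n).quo(12) - t))
    end
  (u - mu) / sigma
end

z_no_ties = z_for_u(0, 3, 3, [1, 2, 3, 4, 5, 6])
z_ties = z_for_u(Rational(1, 2), 2, 2, [1, 2, 2, 3])
```

When there are no ties the two variance formulas coincide, since n^3 - n = n(n - 1)(n + 1), so the branch only matters when the tie term is non-zero.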