Class: UMannWhitney

Inherits:

Object

Defined in: lib/u_mann_whitney.rb,
lib/u_mann_whitney/version.rb

Overview

U Mann-Whitney test

Non-parametric test for assessing whether two independent samples of observations come from the same distribution.

Assumptions

The two samples under investigation in the test are independent of each other and the observations within each sample are independent.
The observations are comparable (i.e., for any two observations, one can assess whether they are equal or, if not, which one is greater).
The variances in the two groups are approximately equal.

Larger differences between the distributions correspond to lower values of U.

Constant Summary

MAX_MN_EXACT = Max for m*n allowed for exact calculation of probability

VERSION = "0.2.0"

Instance Attribute Summary

#name ⇒ Object

Name of test.
#r1 ⇒ Object readonly

Sample 1 Rank sum.
#r2 ⇒ Object readonly

Sample 2 Rank sum.
#t ⇒ Object readonly

Value of compensation for ties (useful for demonstration).
#u ⇒ Object readonly

U Value.
#u1 ⇒ Object readonly

Sample 1 U (useful for demonstration).
#u2 ⇒ Object readonly

Sample 2 U (useful for demonstration).

Class Method Summary

.u_mannwhitney(v1, v2) ⇒ Object
.u_sampling_distribution_as62(n1, n2) ⇒ Object

U sampling distribution, based on Dinneen & Blakesley (1973) algorithm.

Instance Method Summary

#_(t) ⇒ Object

Shim for gettext.
#initialize(v1, v2, opts = Hash.new) ⇒ UMannWhitney constructor

Create a new U Mann-Whitney test. Params: two Daru::Vectors.
#probability_exact ⇒ Object

Exact probability of finding values of U lower than or equal to the sample's on the U distribution.
#probability_z ⇒ Object

Assuming H_0, the two-tailed probability of a U at least as extreme as the sample's, using the normal approximation.
#report_building(generator) ⇒ Object

:nodoc:.
#z ⇒ Object

Z value for U, adjusted for ties.

Constructor Details

#initialize(v1, v2, opts = Hash.new) ⇒ `UMannWhitney`

Create a new U Mann-Whitney test. Params: two Daru::Vectors.

# File 'lib/u_mann_whitney.rb', line 99

def initialize(v1,v2, opts=Hash.new)
  @v1      = v1
  @v2      = v2
  v1_valid = v1.reject_values(*Daru::MISSING_VALUES).reset_index!
  v2_valid = v2.reject_values(*Daru::MISSING_VALUES).reset_index!
  @n1      = v1_valid.size
  @n2      = v2_valid.size
  data     = Daru::Vector.new(v1_valid.to_a + v2_valid.to_a)
  groups   = Daru::Vector.new(([0] * @n1) + ([1] * @n2))
  ds       = Daru::DataFrame.new({:g => groups, :data => data})
  @t       = nil
  @ties    = data.to_a.size != data.to_a.uniq.size
  if @ties
    adjust_for_ties(ds[:data])
  end
  ds[:ranked] = ds[:data].ranked
  @n = ds.nrows

  @r1 = ds.filter_rows { |r| r[:g] == 0}[:ranked].sum || 0
  @r2 = ((ds.nrows * (ds.nrows + 1)).quo(2)) - r1
  @u1 = r1 - ((@n1 * (@n1 + 1)).quo(2))
  @u2 = r2 - ((@n2 * (@n2 + 1)).quo(2))
  @u  = (u1 < u2) ? u1 : u2
  opts_default = { :name=>_("Mann-Whitney's U") }
  @opts = opts_default.merge(opts)
  opts_default.keys.each {|k|
    send("#{k}=", @opts[k])
  }
end
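
The rank-sum arithmetic in the constructor can be followed without Daru. The sketch below is a plain-Ruby illustration, not the library's code: `midranks` and `mann_whitney_u` are hypothetical helper names, and it assumes that tied values receive the average of the ranks they span, which is what the constructor's `ranked` call is assumed to do.

```ruby
# Assign 1-based ranks to values; tied values share the average of their ranks.
# (Hypothetical helper, assumed to mirror what `ranked` does in the constructor.)
def midranks(values)
  sorted = values.sort
  rank_of = sorted.each_with_index
                  .group_by { |v, _| v }
                  .transform_values { |pairs| pairs.sum { |_, i| i + 1 }.quo(pairs.size) }
  values.map { |v| rank_of[v] }
end

# Rank sums and U statistics for two samples, using the same formulas as the
# constructor: U1 = R1 - n1(n1+1)/2, U2 = R2 - n2(n2+1)/2, U = min(U1, U2).
def mann_whitney_u(sample1, sample2)
  n1 = sample1.size
  n2 = sample2.size
  n  = n1 + n2
  ranks = midranks(sample1 + sample2)
  r1 = ranks.first(n1).sum
  r2 = (n * (n + 1)).quo(2) - r1
  u1 = r1 - (n1 * (n1 + 1)).quo(2)
  u2 = r2 - (n2 * (n2 + 1)).quo(2)
  { r1: r1, r2: r2, u1: u1, u2: u2, u: [u1, u2].min }
end

result = mann_whitney_u([3, 5, 8], [4, 6, 9, 10])
# Ranks of sample 1 are 1, 3 and 5, so r1 == 9 and r2 == 28 - 9 == 19;
# u1 == 9 - 6 == 3, u2 == 19 - 10 == 9, and u == 3.
```

Note that u1 + u2 always equals n1 * n2 (here 12), which is a quick sanity check on any hand computation.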

Instance Attribute Details

#name ⇒ `Object`

Name of test



# File 'lib/u_mann_whitney.rb', line 94

def name
  @name
end

#r1 ⇒ `Object` (readonly)

Sample 1 Rank sum



# File 'lib/u_mann_whitney.rb', line 82

def r1
  @r1
end

#r2 ⇒ `Object` (readonly)

Sample 2 Rank sum



# File 'lib/u_mann_whitney.rb', line 84

def r2
  @r2
end

#t ⇒ `Object` (readonly)

Value of compensation for ties (useful for demonstration)



# File 'lib/u_mann_whitney.rb', line 92

def t
  @t
end

#u ⇒ `Object` (readonly)

U Value



# File 'lib/u_mann_whitney.rb', line 90

def u
  @u
end

#u1 ⇒ `Object` (readonly)

Sample 1 U (useful for demonstration)



# File 'lib/u_mann_whitney.rb', line 86

def u1
  @u1
end

#u2 ⇒ `Object` (readonly)

Sample 2 U (useful for demonstration)



# File 'lib/u_mann_whitney.rb', line 88

def u2
  @u2
end

Class Method Details

.u_mannwhitney(v1, v2) ⇒ `Object`



# File 'lib/u_mann_whitney.rb', line 24

def self.u_mannwhitney(v1, v2)
  new(v1,v2)
end

.u_sampling_distribution_as62(n1, n2) ⇒ `Object`

U sampling distribution, based on the Dinneen & Blakesley (1973) algorithm. This is the algorithm used in SPSS.

Parameters:

n1: group 1 size
n2: group 2 size

Reference:

Dinneen, L., & Blakesley, B. (1973). Algorithm AS 62: A Generator for the Sampling Distribution of the Mann-Whitney U Statistic. Journal of the Royal Statistical Society, Series C (Applied Statistics), 22(2), 269-273.

# File 'lib/u_mann_whitney.rb', line 37

def self.u_sampling_distribution_as62(n1,n2)

  freq=[]
  work=[]
  mn1=n1*n2+1
  max_u=n1*n2
  minmn=n1<n2 ? n1 : n2
  maxmn=n1>n2 ? n1 : n2
  n1=maxmn+1
  (1..n1).each{|i| freq[i]=1}
  n1+=1
  (n1..mn1).each{|i| freq[i]=0}
  work[1]=0
  xin=maxmn
  (2..minmn).each do |i|
    work[i]=0
    xin=xin+maxmn
    n1=xin+2
    l=1+xin.quo(2)
    k=i
    (1..l).each do |j|
      k=k+1
      n1=n1-1
      sum=freq[j]+work[j]
      freq[j]=sum
      work[k]=sum-freq[n1]
      freq[n1]=sum
    end
  end

  # Generate percentages for normal U
  dist=(1+max_u/2).to_i
  freq.shift
  total=freq.inject(0) {|a,v| a+v }
  (0...dist).collect {|i|
    if i!=max_u-i
      ues=freq[i]*2
    else
      ues=freq[i]
    end
    ues.quo(total)
  }
end
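
For very small groups, the shape of this distribution can be cross-checked by brute force. The sketch below is not AS 62: it simply enumerates every way of assigning ranks to group 1, assuming no ties, and tabulates U = min(U1, U2). The name `u_null_distribution` is hypothetical. The returned array is laid out like the method's return value as read from the code above: element i is the probability that U equals i, for i from 0 to n1*n2/2 (integer division).

```ruby
# Null distribution of U = min(U1, U2) by exhaustive enumeration (no ties).
# Only practical for small n1 + n2, since it visits C(n1 + n2, n1) arrangements.
def u_null_distribution(n1, n2)
  n = n1 + n2
  max_u = n1 * n2
  counts = Array.new(max_u / 2 + 1, 0)
  # Each choice of n1 ranks out of 1..n is equally likely under H_0.
  (1..n).to_a.combination(n1) do |ranks|
    u1 = ranks.sum - n1 * (n1 + 1) / 2
    counts[[u1, max_u - u1].min] += 1
  end
  total = counts.sum
  counts.map { |c| Rational(c, total) }
end

dist = u_null_distribution(2, 2)
# The 6 arrangements give U = 0, 1, 2 twice each, so every entry is 1/3.
```

Because the enumeration grows combinatorially, it is useful only as a check on tiny inputs; the recurrence in AS 62 is what makes larger m*n feasible.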

Instance Method Details

#_(t) ⇒ `Object`

Shim for gettext



# File 'lib/u_mann_whitney.rb', line 142

def _(t)
  t
end

#probability_exact ⇒ `Object`

Exact probability of finding values of U lower than or equal to the sample's on the U distribution. Use with caution when m*n > 100000. Uses u_sampling_distribution_as62.

# File 'lib/u_mann_whitney.rb', line 147

def probability_exact
  dist = UMannWhitney.u_sampling_distribution_as62(@n1,@n2)
  sum = 0
  (0..@u.to_i).each {|i|
    sum+=dist[i]
  }
  sum
end
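
The summation is a cumulative probability over the distribution array. As a small worked case, assume n1 = n2 = 2 with no ties: the 6 equally likely rank arrangements give U = 0, 1 and 2 twice each, so each value has probability 1/3. The sketch below hard-codes that distribution rather than calling the library, and `cumulative_probability` is a hypothetical name.

```ruby
# P(U <= u) given dist, where dist[i] is the probability that U equals i.
# u.to_i truncates a fractional U (possible with ties), as in probability_exact.
def cumulative_probability(dist, u)
  dist.first(u.to_i + 1).sum
end

# Hand-derived distribution for n1 = n2 = 2 (assumption stated above).
dist_2x2 = [Rational(1, 3), Rational(1, 3), Rational(1, 3)]
p_value = cumulative_probability(dist_2x2, 1)
# p_value == 2/3: the probabilities of U = 0 and U = 1 added together.
```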

#probability_z ⇒ `Object`

Assuming H_0, the two-tailed probability of a U at least as extreme as the sample's, using the normal approximation. Use with more than 30 cases per group.



# File 'lib/u_mann_whitney.rb', line 187

def probability_z
  (1-Distribution::Normal.cdf(z.abs()))*2
end
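
The expression `(1 - cdf(|z|)) * 2` is the two-tailed tail area of the standard normal. A minimal sketch of the same quantity using only Ruby's standard library is shown below; it swaps `Distribution::Normal.cdf` for `Math.erfc` via the identity 2(1 - Φ(|z|)) = erfc(|z| / √2). The name `two_tailed_p` is hypothetical.

```ruby
# Two-tailed p-value for a standard normal z, via the complementary error function.
def two_tailed_p(z)
  Math.erfc(z.abs / Math.sqrt(2))
end

p_at_196 = two_tailed_p(1.96)   # close to the conventional 0.05 threshold
p_at_zero = two_tailed_p(0)     # z of 0 gives a p-value of 1.0
```

The result depends only on the magnitude of z, so `two_tailed_p(-1.96)` and `two_tailed_p(1.96)` are equal.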

#report_building(generator) ⇒ `Object`

:nodoc:

# File 'lib/u_mann_whitney.rb', line 128

def report_building(generator) # :nodoc:
  generator.section(:name=>@name) do |s|
    s.table(:name=>_("%s results") % @name) do |t|
      t.row([_("Sum of ranks %s") % @v1.name, "%0.3f" % @r1])
      t.row([_("Sum of ranks %s") % @v2.name, "%0.3f" % @r2])
      t.row([_("U Value"), "%0.3f" % @u])
      t.row([_("Z"), "%0.3f (p: %0.3f)" % [z, probability_z]])
      if @n1*@n2<MAX_MN_EXACT
        t.row([_("Exact p (Dinneen & Blakesley, 1973):"), "%0.3f" % probability_exact])
      end
    end
  end
end

#z ⇒ `Object`

Z value for U, adjusted for ties. For large samples, U is approximately normally distributed. In that case, you can use z to obtain the probability for U.

Reference:

SPSS Manual

# File 'lib/u_mann_whitney.rb', line 172

def z
  mu=(@n1*@n2).quo(2)
  if(!@ties)
    ou=Math::sqrt(((@n1*@n2)*(@n1+@n2+1)).quo(12))
  else
    n=@n1+@n2
    first=(@n1*@n2).quo(n*(n-1))
    second=((n**3-n).quo(12))-@t
    ou=Math::sqrt(first*second)
  end
  (@u-mu).quo(ou)
end
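
The two branches of `z` are the same formula: with a tie term of zero, the tie-adjusted variance reduces to n1*n2*(n1+n2+1)/12. The sketch below illustrates this in plain Ruby. `u_z_score` and `tie_term` are hypothetical names, and the tie term is assumed to be the sum over groups of tied values of (t**3 - t)/12, the usual correction; the library's own `adjust_for_ties` is not shown on this page, so treat that definition as an assumption.

```ruby
# Assumed tie correction: sum of (t**3 - t) / 12 over each group of t equal values.
def tie_term(values)
  values.tally.values.sum { |t| (t**3 - t).quo(12) }
end

# z for U under the normal approximation, following the formulas in #z.
# With tie_term = 0 the variance equals n1 * n2 * (n1 + n2 + 1) / 12.
def u_z_score(u, n1, n2, tie_term = 0)
  n = n1 + n2
  mu = (n1 * n2).quo(2)
  variance = (n1 * n2).quo(n * (n - 1)) * ((n**3 - n).quo(12) - tie_term)
  (u - mu) / Math.sqrt(variance)
end

z_untied = u_z_score(3, 3, 4)
# mu = 6 and variance = 8, so z_untied is -3 / sqrt(8), roughly -1.061.
```

Ties reduce the variance, so for the same U the tie-adjusted z is larger in magnitude than the unadjusted one.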
