Class: Statsample::Bivariate::Tetrachoric

Inherits:
Object
  • Object
show all
Includes:
Summarizable
Defined in:
lib/statsample/bivariate/tetrachoric.rb

Overview

Compute tetrachoric correlation.

The tetrachoric correlation is a measure of bivariate association arising when both observed variates are categorical variables that result from dichotomizing the two undelying continuous variables (Drasgow, 2006). The tetrachoric correlation is a good way to measure rater agreement (Uebersax, 2006)

This class uses Brown (1977) algorithm. You can see FORTRAN code on lib.stat.cmu.edu/apstat/116

Usage

With two variables x and y on a crosstab like this:

      -------------
      | y=0 | y=1 |
      -------------
x = 0 |  a  |  b  |
      -------------
x = 1 |  c  |  d  |
      -------------

The code will be

tc=Statsample::Bivariate::Tetrachoric.new(a,b,c,d)
tc.r # correlation
tc.se # standard error
tc.threshold_y # threshold for y variable
tc.threshold_x # threshold for x variable

Reference:

  • Brown, MB. (1977) Algorithm AS 116: the tetrachoric correlation and its standard error. Applied Statistics, 26, 343-351.

  • Drasgow F. (2006). Polychoric and polyserial correlations. In Kotz L, Johnson NL (Eds.), Encyclopedia of statistical sciences. Vol. 7 (pp. 69-74). New York: Wiley.

  • Uebersax, J.S. (2006). The tetrachoric and polychoric correlation coefficients. Statistical Methods for Rater Agreement web site. 2006. Available at: john-uebersax.com/stat/tetra.htm . Accessed February, 11, 2010

Constant Summary collapse

RequerimentNotMeet =
Class.new(Exception)
TWOPI =
Math::PI*2
SQT2PI =
2.50662827
RLIMIT =
0.9999
RCUT =
0.95
UPLIM =
5.0
CONST =
1E-36
CHALF =
1E-18
CONV =
1E-8
CITER =
1E-6
NITER =
25
X =
[0,0.9972638618,  0.9856115115,  0.9647622556, 0.9349060759,  0.8963211558, 0.8493676137, 0.7944837960, 0.7321821187, 0.6630442669, 0.5877157572, 0.5068999089, 0.4213512761, 0.3318686023, 0.2392873623, 0.1444719616, 0.0483076657]
W =
[0, 0.0070186100,  0.0162743947,  0.0253920653, 0.0342738629,  0.0428358980,  0.0509980593, 0.0586840935,  0.0658222228,  0.0723457941, 0.0781938958, 0.0833119242, 0.0876520930, 0.0911738787, 0.0938443991, 0.0956387201, 0.0965400885]

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(a, b, c, d, opts = Hash.new) ⇒ Tetrachoric

Creates a new tetrachoric object for analysis


146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
# File 'lib/statsample/bivariate/tetrachoric.rb', line 146

def initialize(a,b,c,d, opts=Hash.new)
  @a,@b,@c,@d=a,b,c,d
  
  opts_default={
    :name=>_("Tetrachoric correlation"),
    :ruby_engine=>false
  }
  
  @opts=opts_default.merge opts
  @opts.each{|k,v| self.send("#{k}=",v) if self.respond_to? k}
  #
  #       CHECK IF ANY CELL FREQUENCY IS NEGATIVE
  #
  raise "All frequencies should be positive" if  (@a < 0 or @b < 0 or @c < 0  or @d < 0)
  compute
end

Instance Attribute Details

#nameObject

Name on the analysis


75
76
77
# File 'lib/statsample/bivariate/tetrachoric.rb', line 75

def name
  @name
end

#rObject (readonly)

Value for tethrachoric correlation


73
74
75
# File 'lib/statsample/bivariate/tetrachoric.rb', line 73

def r
  @r
end

#ruby_engineObject

Use ruby version of algorithm. By default, this attribute is set to false, and C version of algorithm is used


79
80
81
# File 'lib/statsample/bivariate/tetrachoric.rb', line 79

def ruby_engine
  @ruby_engine
end

Class Method Details

.new_with_matrix(m, opts = Hash.new) ⇒ Object

Creates a Tetrachoric object based on a 2x2 Matrix.


93
94
95
# File 'lib/statsample/bivariate/tetrachoric.rb', line 93

def self.new_with_matrix(m, opts=Hash.new)
  Tetrachoric.new(m[0,0], m[0,1], m[1,0],m[1,1], opts)
end

.new_with_vectors(v1, v2, opts = Hash.new) ⇒ Object

Creates a Tetrachoric object based on two vectors. The vectors are dichotomized previously.

Raises:


98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
# File 'lib/statsample/bivariate/tetrachoric.rb', line 98

def self.new_with_vectors(v1,v2, opts=Hash.new)
  v1a, v2a=Statsample.only_valid(v1,v2)
  v1a=v1a.dichotomize
  v2a=v2a.dichotomize
  raise RequerimentNotMeet, "v1 have only 0" if v1a.factors==[0]
  raise RequerimentNotMeet, "v2 have only 0" if v2a.factors==[0]
  a,b,c,d = 0,0,0,0
  v1a.each_index{|i|
    x,y=v1a[i],v2a[i]
    a+=1 if x==0 and y==0
    b+=1 if x==0 and y==1
    c+=1 if x==1 and y==0
    d+=1 if x==1 and y==1
  }
  Tetrachoric.new(a,b,c,d, opts)
end

Instance Method Details

#check_frequenciesObject

Raises:


176
177
178
179
180
181
182
183
184
185
186
187
188
189
# File 'lib/statsample/bivariate/tetrachoric.rb', line 176

def check_frequencies
  #
  #       CHECK IF ANY FREQUENCY IS 0.0 AND SET kdelta
  #
  @kdelta = 1
 
  @kdelta  = 2 if (@a == 0 or @d == 0)
  @kdelta += 2 if (@b == 0 or @c == 0)
  #
  #        kdelta=4 MEANS TABLE HAS 0.0 ROW OR COLUMN, RUN IS TERMINATED
  #

  raise RequerimentNotMeet, "Rows and columns should have more than 0 items" if @kdelta==4
end

#compute_optimizedObject

Compute the tetrachoric correlation


170
171
172
173
174
175
# File 'lib/statsample/bivariate/tetrachoric.rb', line 170

def compute_optimized
  check_frequencies        
  t=Statsample::STATSAMPLE__.tetrachoric(@a,@b,@c,@d)
  raise "Error on calculation of tetrachoric correlation: #{t[:ifault]}" if t[:ifault]>0
  @r,@sdr,@itype,@ifault,@zab, @zac = t[:r],t[:sdr],t[:itype],t[:ifault], t[:threshold_x], t[:threshold_y]
end

#compute_rubyObject

Compute the tetrachoric correlation using ruby Called on object creation.


193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
# File 'lib/statsample/bivariate/tetrachoric.rb', line 193

def compute_ruby
  check_frequencies
  #
  # INITIALIZATION
  #
  @r = 0
  sdzero = 0
  @sdr = 0
  @itype = 0
  @ifault = 0
  delta  = 0
  

  #      GOTO (4, 1, 2 , 92), kdelta
  #
  #        delta IS 0.0, 0.5 OR -0.5 ACCORDING TO WHICH CELL IS 0.0
  #

  if(@kdelta==2)
    # 1
    delta=0.5
    @r=-1 if (@a==0 and @d==0)
  elsif(@kdelta==3)
    # 2
    delta=-0.5
    @r=1 if (@b==0 and @c==0)
  end
  # 4
  if @r!=0
    @itype=3
  end

  #
  #        STORE FREQUENCIES IN  AA, BB, CC AND DD
  #
  @aa = @a + delta
  @bb = @b - delta
  @cc = @c - delta
  @dd = @d + delta
  @tot = @aa+@bb+@cc+@dd
  #
  #        CHECK IF CORRELATION IS NEGATIVE, 0.0, POSITIVE
  #        IF (AA * DD - BB * CC) 7, 5, 6

  corr_dir=@aa * @dd - @bb * @cc
  if(corr_dir < 0)
    # 7
    @probaa = @bb.quo(@tot)
    @probac = (@bb + @dd).quo(@tot)
    @ksign = 2
    # ->  8
  else
    if (corr_dir==0)
      # 5
      @itype=4
    end
    # 6
    #
    #        COMPUTE PROBABILITIES OF QUADRANT AND OF MARGINALS
    #        PROBAA AND PROBAC CHOSEN SO THAT CORRELATION IS POSITIVE.
    #        KSIGN INDICATES WHETHER QUADRANTS HAVE BEEN SWITCHED
    #

    @probaa = @aa.quo(@tot)
    @probac = (@aa+@cc).quo(@tot)
    @ksign=1
  end
  # 8

  @probab = (@aa+@bb).quo(@tot)

  #
  #        COMPUTE NORMAL DEVIATES FOR THE MARGINAL FREQUENCIES
  #        SINCE NO MARGINAL CAN BE 0.0, IE IS NOT CHECKED
  #
  @zac = Distribution::Normal.p_value(@probac.to_f)
  @zab = Distribution::Normal.p_value(@probab.to_f)
  @ss = Math::exp(-0.5 * (@zac ** 2 + @zab ** 2)).quo(TWOPI)
  #
  #        WHEN R IS 0.0, 1.0 OR -1.0, TRANSFER TO COMPUTE SDZERO
  #
  if (@r != 0 or @itype > 0)
    compute_sdzero
    return true
  end
  #
  #        WHEN MARGINALS ARE EQUAL, COSINE EVALUATION IS USED
  #
  if (@a == @b and @b == @c)
    calculate_cosine
    return true
  end
  #
  #        INITIAL ESTIMATE OF CORRELATION IS YULES Y
  #
  @rr = ((Math::sqrt(@aa * @dd) - Math::sqrt(@bb * @cc)) ** 2)  / (@aa * @dd - @bb * @cc).abs
  @iter = 0
  begin
    #
    #        IF RR EXCEEDS RCUT, GAUSSIAN QUADRATURE IS USED
    #
    #10
    if @rr>RCUT
      gaussian_quadrature
      return true
    end
    #
    #        TETRACHORIC SERIES IS COMPUTED
    #
    #        INITIALIZATION
    #
    va=1.0
    vb=@zac.to_f
    wa=1.0
    wb=@zab.to_f
    term = 1.0
    iterm = 0.0
    @sum = @probab * @probac
    deriv = 0.0
    sr = @ss
    #15
    begin
      if(sr.abs<=CONST)
        #
        #        RESCALE TERMS TO AVOID OVERFLOWS AND UNDERFLOWS
        #
        sr = sr  / CONST
        va = va * CHALF
        vb = vb * CHALF
        wa = wa * CHALF
        wb = wb * CHALF
      end
      #
      #        FORM SUM AND DERIVATIVE OF SERIES
      #
      #  20
      dr = sr * va * wa
      sr = sr * @rr / term
      cof = sr * va * wa
      #
      #        ITERM COUNTS NO. OF CONSECUTIVE TERMS  <  CONV
      #
      iterm+=  1
      iterm=0 if (cof.abs > CONV)
      @sum = @sum + cof
      deriv += dr
      vaa = va
      waa = wa
      va = vb
      wa = wb
      vb = @zac * va - term * vaa
      wb = @zab * wa - term * waa
      term += 1
    end while (iterm < 2 or term < 6)
    #
    #        CHECK IF ITERATION CONVERGED
    #
    if((@sum-@probaa).abs <= CITER)
      @itype=term
      calculate_sdr
      return true
    end
    #
    #        CALCULATE NEXT ESTIMATE OF CORRELATION
    #
    #25
    @iter += 1
    #
    #        IF TOO MANY ITERATlONS, RUN IS TERMINATED
    #
    delta = (@sum - @probaa) /  deriv
    @rrprev = @rr
    @rr = @rr - delta
    @rr += 0.5 * delta if(@iter == 1)
    @rr= RLIMIT if (@rr > RLIMIT)
    @rr =0 if (@rr  <  0.0)
  end while @iter < NITER
  raise "Too many iteration"
  #  GOTO 10
end

#report_building(generator) ⇒ Object

:nodoc:


130
131
132
133
134
135
136
137
138
139
140
141
142
143
# File 'lib/statsample/bivariate/tetrachoric.rb', line 130

def report_building(generator) # :nodoc:
  generator.section(:name=>@name) do |s|
    s.table(:name=>_("Contingence Table"),:header=>["","Y=0","Y=1", "T"]) do |t|
      t.row(["X=0", @a,@b,@a+@b])
      t.row(["X=1", @c,@d,@c+@d])
      t.hr
      t.row(["T", @a+@c,@b+@d,@a+@b+@c+@d])
    end
    s.text(sprintf("r: %0.3f",r))
    s.text(_("SE: %0.3f") % se)
    s.text(_("Threshold X: %0.3f ") % threshold_x)
    s.text(_("Threshold Y: %0.3f ") % threshold_y )
  end
end

#seObject

Standard error


115
116
117
# File 'lib/statsample/bivariate/tetrachoric.rb', line 115

def se
  @sdr
end

#threshold_xObject

Threshold for variable x (rows) Point on gauss curve under X rater select cases


120
121
122
# File 'lib/statsample/bivariate/tetrachoric.rb', line 120

def threshold_x
  @zab
end

#threshold_yObject

Threshold for variable y (columns) Point on gauss curve under Y rater select cases


127
128
129
# File 'lib/statsample/bivariate/tetrachoric.rb', line 127

def threshold_y
  @zac
end