Class: Statsample::DominanceAnalysis

Inherits:
Object
Includes:
Summarizable
Defined in:
lib/statsample/dominanceanalysis.rb,
lib/statsample/dominanceanalysis/bootstrap.rb

Overview

Dominance Analysis is a procedure that examines the R^2 values of all possible subset models to identify the relevance of one or more predictors in the prediction of a criterion.

See Budescu (1993) and Azen & Budescu (2003, 2006) for more information.

Use

require 'statsample'
a = Daru::Vector.new(1000.times.collect { rand })
b = Daru::Vector.new(1000.times.collect { rand })
c = Daru::Vector.new(1000.times.collect { rand })
ds = Daru::DataFrame.new({:a => a, :b => b, :c => c})
# The criterion y is a weighted sum of the predictors plus random noise
ds[:y] = ds.collect_rows { |row| row[:a]*5 + row[:b]*3 + row[:c]*2 + rand() }
da = Statsample::DominanceAnalysis.new(ds, :y)
puts da.summary

Output:

Report: Report 2010-02-08 19:10:11 -0300
Table: Dominance Analysis result
------------------------------------------------------------
|                  | r2    | sign  | a     | b     | c     |
------------------------------------------------------------
| Model 0          |       |       | 0.648 | 0.265 | 0.109 |
------------------------------------------------------------
| a                | 0.648 | 0.000 | --    | 0.229 | 0.104 |
| b                | 0.265 | 0.000 | 0.612 | --    | 0.104 |
| c                | 0.109 | 0.000 | 0.643 | 0.260 | --    |
------------------------------------------------------------
| k=1 Average      |       |       | 0.627 | 0.244 | 0.104 |
------------------------------------------------------------
| a*b              | 0.877 | 0.000 | --    | --    | 0.099 |
| a*c              | 0.752 | 0.000 | --    | 0.224 | --    |
| b*c              | 0.369 | 0.000 | 0.607 | --    | --    |
------------------------------------------------------------
| k=2 Average      |       |       | 0.607 | 0.224 | 0.099 |
------------------------------------------------------------
| a*b*c            | 0.976 | 0.000 | --    | --    | --    |
------------------------------------------------------------
| Overall averages |       |       | 0.628 | 0.245 | 0.104 |
------------------------------------------------------------

Table: Pairwise dominance
-----------------------------------------
| Pairs | Total | Conditional | General |
-----------------------------------------
| a - b | 1.0   | 1.0         | 1.0     |
| a - c | 1.0   | 1.0         | 1.0     |
| b - c | 1.0   | 1.0         | 1.0     |
-----------------------------------------
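
The contribution and average columns of the first table can be reproduced by hand from the r2 column alone. The sketch below is plain Ruby and does not use Statsample; the names R2, PREDICTORS, contribution, average_contribution and overall_average are illustrative assumptions, not part of the library. Because it starts from the already rounded R^2 values, its results can differ from the table in the third decimal.

```ruby
# R^2 of every subset model, copied from the "r2" column of the table above.
# The empty model explains nothing, so its R^2 is 0.
R2 = {
  []           => 0.0,
  [:a]         => 0.648,
  [:b]         => 0.265,
  [:c]         => 0.109,
  [:a, :b]     => 0.877,
  [:a, :c]     => 0.752,
  [:b, :c]     => 0.369,
  [:a, :b, :c] => 0.976
}.freeze
PREDICTORS = [:a, :b, :c].freeze

# Additional contribution of +predictor+ when it is added to +subset+:
# the gain in R^2 over the model that contains only +subset+.
def contribution(predictor, subset)
  R2[(subset + [predictor]).sort] - R2[subset.sort]
end

# Average contribution of +predictor+ over all subsets of size k
# that do not already contain it (one row of "k=... Average").
def average_contribution(predictor, k)
  others = PREDICTORS - [predictor]
  values = others.combination(k).map { |subset| contribution(predictor, subset) }
  values.sum / values.size
end

# Mean of the per-size averages, including size 0 ("Overall averages").
def overall_average(predictor)
  sizes = (0...PREDICTORS.size)
  sizes.sum { |k| average_contribution(predictor, k) } / sizes.size
end

PREDICTORS.each do |predictor|
  puts format("%s: %0.3f", predictor, overall_average(predictor))
end
```

For instance, contribution(:b, [:a]) is the difference between the R^2 of the a*b model and the R^2 of the a model, which is the value shown in the b column of row a.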

Reference:

  • Budescu, D. V. (1993). Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114, 542-551.

  • Azen, R. & Budescu, D.V. (2003). The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8(2), 129-148.

  • Azen, R. & Budescu, D.V. (2006). Comparing predictors in Multivariate Regression Models: An extension of Dominance Analysis. Journal of Educational and Behavioral Statistics, 31(2), 157-180.

Defined Under Namespace

Classes: Bootstrap, ModelData

Constant Summary

UNIVARIATE_REGRESSION_CLASS =
Statsample::Regression::Multiple::MatrixEngine
MULTIVARIATE_REGRESSION_CLASS =
Statsample::Regression::Multiple::MultipleDependent

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from Summarizable

#summary

Constructor Details

#initialize(input, dependent, opts = Hash.new) ⇒ DominanceAnalysis

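Creates a new DominanceAnalysis object.

Parameters:

  • input: A Matrix or Daru::DataFrame (dataset) object.

  • dependent: Name of the dependent variable. Pass an array to run a multivariate regression analysis.

  • opts: Hash of attribute values to set on the new object, such as :name, :cases or :predictors. If predictors is left nil, it is set to all fields of the input except the dependent variable(s).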

# File 'lib/statsample/dominanceanalysis.rb', line 102

def initialize(input, dependent, opts=Hash.new)
  @build_from_dataset=false
  if dependent.is_a? Array
    @regression_class= MULTIVARIATE_REGRESSION_CLASS
    @method_association=:r2yx
  else
    @regression_class= UNIVARIATE_REGRESSION_CLASS
    @method_association=:r2
  end
  
  @name=nil
  opts.each{|k,v|
    self.send("#{k}=",v) if self.respond_to? k
  }
  @dependent=dependent
  @dependent=[@dependent] unless @dependent.is_a? Array

  if input.kind_of? Daru::DataFrame
    @predictors ||= input.vectors.to_a - @dependent
    @ds=input
    @matrix=Statsample::Bivariate.correlation_matrix(input)
    @cases=Statsample::Bivariate.min_n_valid(input)
  elsif input.is_a? ::Matrix
    @predictors ||= input.fields-@dependent
    @ds=nil
    @matrix=input
  else
    raise ArgumentError.new("You should use a Matrix or a Dataset")
  end

  @name=_("Dominance Analysis:  %s over %s") % [ @predictors.flatten.join(",") , @dependent.join(",")] if @name.nil?
  @models=nil
  @models_data=nil
  @general_averages=nil
end
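
The constructor applies opts by calling the matching attribute writer for every key the object responds to, and silently skips the rest. A minimal sketch of that pattern follows; AnalysisSketch is a hypothetical stand-in class, and only the option-handling loop mirrors the constructor above.

```ruby
# Hypothetical stand-in: not part of Statsample.
class AnalysisSketch
  attr_accessor :name, :cases

  def initialize(opts = {})
    @name = nil
    opts.each do |key, value|
      # Same guard as the constructor above: assign only when the
      # object responds to the key, otherwise ignore the option.
      send("#{key}=", value) if respond_to?(key)
    end
  end
end

sketch = AnalysisSketch.new(name: "My analysis", cases: 200, unknown: 1)
puts sketch.name
puts sketch.cases
```

Unknown keys such as :unknown are dropped without an error, so a misspelled option name goes unnoticed.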

Instance Attribute Details

#build_from_dataset ⇒ Object

Set to true if you want to build from dataset, not correlation matrix


# File 'lib/statsample/dominanceanalysis.rb', line 65

def build_from_dataset
  @build_from_dataset
end

#cases ⇒ Object

If you provide a matrix as input, you should set the number of cases to define significance of R^2


# File 'lib/statsample/dominanceanalysis.rb', line 71

def cases
  @cases
end

#dependent ⇒ Object (readonly)

Returns the value of attribute dependent


# File 'lib/statsample/dominanceanalysis.rb', line 83

def dependent
  @dependent
end

#method_association ⇒ Object

Method of :regression_class used to measure association.

You only need to change it if you have multiple dependent variables.

  • :r2yx (R^2_yx), the default for a multivariate dependent, is appropriate when the distinction between independent and dependent variables is arbitrary.

  • :p2yx is appropriate when the distinction between independent and dependent variables is real.


# File 'lib/statsample/dominanceanalysis.rb', line 80

def method_association
  @method_association
end

#name ⇒ Object

Name of analysis


# File 'lib/statsample/dominanceanalysis.rb', line 63

def name
  @name
end

#predictors ⇒ Object

Array with independent variables. You can nest subarrays to test groups of predictors as blocks.

# File 'lib/statsample/dominanceanalysis.rb', line 68

def predictors
  @predictors
end

#regression_class ⇒ Object

Class used to generate the regressions. Defaults to Statsample::Regression::Multiple::MatrixEngine.


# File 'lib/statsample/dominanceanalysis.rb', line 61

def regression_class
  @regression_class
end

Class Method Details

.predictor_name(variable) ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 88

def self.predictor_name(variable)
  if variable.is_a? Array
    sprintf("(%s)", variable.join(","))
  else
    variable
  end
end
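
When predictors contains subarrays (blocks), this method decides how each entry is labelled in the report header. A standalone sketch of the same logic, with predictor_label as a hypothetical copy of the method so it runs without Statsample:

```ruby
# Hypothetical standalone copy of the labelling logic shown above.
def predictor_label(variable)
  if variable.is_a?(Array)
    format("(%s)", variable.join(","))
  else
    variable
  end
end

# :a is tested on its own, :b and :c are tested together as one block.
predictors = [:a, [:b, :c]]
labels = predictors.map { |predictor| predictor_label(predictor) }
p labels
```

A single predictor is returned unchanged, while a block becomes one parenthesised string, so labels holds :a followed by "(b,c)".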

Instance Method Details

#average_k(k) ⇒ Object

Hash with the average contribution of each predictor across all models of size k. Returns nil when k equals the number of predictors.


# File 'lib/statsample/dominanceanalysis.rb', line 285

def average_k(k)
  return nil if k==@predictors.size
  models=md_k(k)
  averages=@predictors.inject({}) {|a,v| a[v]=[];a}
  models.each do |m|
    @predictors.each do |f|
      averages[f].push(m.contributions[f]) unless m.contributions[f].nil?
    end
  end
  get_averages(averages)
end

#compute ⇒ Object

Compute models.


# File 'lib/statsample/dominanceanalysis.rb', line 138

def compute
  create_models
  fill_models
end

#conditional_dominance ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 255

def conditional_dominance
  pairs.inject({}){|a,pair| a[pair]=conditional_dominance_pairwise(pair[0], pair[1])
  a
  }
end

#conditional_dominance_pairwise(i, j) ⇒ Object

Returns 1 if i cD j (i conditionally dominates j), 0 if j cD i, and 0.5 if undetermined.


# File 'lib/statsample/dominanceanalysis.rb', line 218

def conditional_dominance_pairwise(i,j)
  dm=dominance_for_nil_model(i,j)
  return 0.5 if dm==0.5
  dominances=[dm]
  for k in 1...@predictors.size
    a=average_k(k)
    if a[i]>a[j]
        dominances.push(1)
    elsif a[i]<a[j]
        dominances.push(0)
    else
      return 0.5
        #dominances.push(0.5)
    end                 
  end
  final=dominances.uniq
  final.size>1 ? 0.5 : final[0]            
end
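
The rule implemented above is: i conditionally dominates j only if its average contribution is larger at every model size, and any tie or reversal makes the result undetermined. A plain-Ruby sketch of that rule, using the "Model 0", "k=1 Average" and "k=2 Average" rows of the sample output as input. The names AVERAGES_BY_SIZE and conditional_dominance_sketch, and the mixed data set, are assumptions made for illustration.

```ruby
# Average contributions per model size, copied from the sample output.
AVERAGES_BY_SIZE = [
  { a: 0.648, b: 0.265, c: 0.109 }, # Model 0
  { a: 0.627, b: 0.244, c: 0.104 }, # k=1 Average
  { a: 0.607, b: 0.224, c: 0.099 }  # k=2 Average
].freeze

# 1 if i wins at every size, 0 if j wins at every size, 0.5 otherwise.
def conditional_dominance_sketch(averages_by_size, i, j)
  verdicts = averages_by_size.map do |averages|
    return 0.5 if averages[i] == averages[j] # a tie is undetermined
    averages[i] > averages[j] ? 1 : 0
  end
  verdicts.uniq.size == 1 ? verdicts.first : 0.5
end

a_over_b = conditional_dominance_sketch(AVERAGES_BY_SIZE, :a, :b)

# Hypothetical data where the ordering flips between sizes.
mixed = [{ x: 0.30, y: 0.20 }, { x: 0.10, y: 0.15 }]
undetermined = conditional_dominance_sketch(mixed, :x, :y)

puts a_over_b
puts undetermined
```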

#dominance_for_nil_model(i, j) ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 187

def dominance_for_nil_model(i,j)
  if md([i]).r2>md([j]).r2
    1
  elsif md([i]).r2<md([j]).r2
    0
  else
    0.5
  end           
end

#general_averages ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 296

def general_averages
  if @general_averages.nil?
    averages=@predictors.inject({}) {|a,v| a[v]=[md([v]).r2];a}
    for k in 1...@predictors.size
      ak=average_k(k)
      @predictors.each do |f|
        averages[f].push(ak[f])
      end
    end
    @general_averages=get_averages(averages)
  end
  @general_averages
end

#general_dominance ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 260

def general_dominance
  pairs.inject({}){|a,pair| a[pair]=general_dominance_pairwise(pair[0], pair[1])
  a
  }
end

#general_dominance_pairwise(i, j) ⇒ Object

Returns 1 if i gD j (i generally dominates j), 0 if j gD i, and 0.5 if undetermined.


# File 'lib/statsample/dominanceanalysis.rb', line 237

def general_dominance_pairwise(i,j)
  ga=general_averages
  if ga[i]>ga[j]
    1
  elsif ga[i]<ga[j]
    0
  else
    0.5
  end                 
end

#get_averages(averages) ⇒ Object

Takes a hash whose values are arrays of numbers and returns a hash with the same keys, where each value is the mean of the corresponding array.


# File 'lib/statsample/dominanceanalysis.rb', line 279

def get_averages(averages)
  out={}
  averages.each{ |key,val| out[key] = Daru::Vector.new(val).mean }
  out
end

#md(m) ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 266

def md(m)
  models_data[m.sort {|a,b| a.to_s <=> b.to_s}]
end
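
The lookup sorts the model by the string form of each predictor, so the same set of predictors always maps to the same key, whatever order it is given in. Comparing by to_s also lets single predictors and blocks (subarrays) be ordered together, which a plain sort cannot do. A small sketch with model_key as a hypothetical helper:

```ruby
# Hypothetical helper reproducing the key normalisation used by md.
def model_key(predictors)
  predictors.sort { |x, y| x.to_s <=> y.to_s }
end

# Different orderings of the same model give the same key.
key_one = model_key([:c, :a])
key_two = model_key([:a, :c])

# A model that mixes a single predictor with a block of predictors.
block_one = model_key([:a, [:b, :c]])
block_two = model_key([[:b, :c], :a])

puts key_one == key_two
puts block_one == block_two
```

Calling [:a, [:b, :c]].sort without the block raises an ArgumentError, because a Symbol and an Array cannot be compared directly.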

#md_k(k) ⇒ Object

Gets all models of size k.


# File 'lib/statsample/dominanceanalysis.rb', line 270

def md_k(k)
  out=[]
  @models.each{ |m| out.push(md(m)) if m.size==k }
  out
end

#models ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 142

def models
  if @models.nil?
    compute
  end
  @models
end

#models_data ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 149

def models_data
  if @models_data.nil?
    compute
  end
  @models_data
end

#pairs ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 247

def pairs
  models.find_all{|m| m.size==2}
end

#report_building(g) ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 311

def report_building(g)
  compute if @models.nil?
  g.section(:name=>@name) do |generator|
    header=["","r2",_("sign")]+@predictors.collect {|c| DominanceAnalysis.predictor_name(c) }
    
    generator.table(:name=>_("Dominance Analysis result"), :header=>header) do |t|
      row=[_("Model 0"),"",""]+@predictors.collect{|f|
        sprintf("%0.3f",md([f]).r2)
      }
      
      t.row(row)
      t.hr
      for i in 1..@predictors.size
        mk=md_k(i)
        mk.each{|m|
          t.row(m.add_table_row)
        }
        # Report averages
        a=average_k(i)
        if !a.nil?
            t.hr
            row=[_("k=%d Average") % i,"",""] + @predictors.collect{|f|
                sprintf("%0.3f",a[f])
            }
            t.row(row)
            t.hr
            
        end
      end
      
      g=general_averages
      t.hr
      
      row=[_("Overall averages"),"",""]+@predictors.collect{|f|
                sprintf("%0.3f",g[f])
      }
      t.row(row)
    end
    
    td=total_dominance
    cd=conditional_dominance
    gd=general_dominance
    generator.table(:name=>_("Pairwise dominance"), :header=>[_("Pairs"),_("Total"),_("Conditional"),_("General")]) do |t|
      pairs.each{|pair|
        name=pair.map{|v| v.is_a?(Array) ? "("+v.join("-")+")" : v}.join(" - ")
        row=[name, sprintf("%0.1f",td[pair]), sprintf("%0.1f",cd[pair]), sprintf("%0.1f",gd[pair])]
        t.row(row)
      }
    end
  end
end

#total_dominance ⇒ Object


# File 'lib/statsample/dominanceanalysis.rb', line 250

def total_dominance
  pairs.inject({}){|a,pair| a[pair]=total_dominance_pairwise(pair[0], pair[1])
  a
  }
end

#total_dominance_pairwise(i, j) ⇒ Object

Returns 1 if i D j (i totally dominates j), 0 if j D i, and 0.5 if undetermined.


# File 'lib/statsample/dominanceanalysis.rb', line 197

def total_dominance_pairwise(i,j)
  dm=dominance_for_nil_model(i,j)
  return 0.5 if dm==0.5
  dominances=[dm]
  models_data.each do |k,m|
    if !m.contributions[i].nil? and !m.contributions[j].nil?
      if m.contributions[i]>m.contributions[j]
          dominances.push(1)
      elsif m.contributions[i]<m.contributions[j]
          dominances.push(0)
      else
        return 0.5
          #dominances.push(0.5)
      end
    end
  end
  final=dominances.uniq
  final.size>1 ? 0.5 : final[0]
end
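
Total dominance is the strictest of the three criteria: i must contribute more than j in the null model and in every subset model that contains neither of them. The sketch below applies that rule to a table of subset R^2 values (the ones from the sample output). It is plain Ruby under assumed names (r2, total_dominance_sketch) and is not the library's implementation, which reads the contributions from its ModelData objects instead.

```ruby
# R^2 per subset model, copied from the sample output; the empty model is 0.
r2 = {
  [] => 0.0,
  [:a] => 0.648, [:b] => 0.265, [:c] => 0.109,
  [:a, :b] => 0.877, [:a, :c] => 0.752, [:b, :c] => 0.369,
  [:a, :b, :c] => 0.976
}
predictors = [:a, :b, :c]

# 1 if i gains more than j on every subset without i and j,
# 0 if j always gains more, 0.5 otherwise.
def total_dominance_sketch(r2, predictors, i, j)
  others = predictors - [i, j]
  verdicts = (0..others.size).flat_map do |k|
    others.combination(k).map do |subset|
      gain_i = r2[(subset + [i]).sort] - r2[subset.sort]
      gain_j = r2[(subset + [j]).sort] - r2[subset.sort]
      gain_i <=> gain_j # 1, 0 or -1
    end
  end.uniq
  return 0.5 unless verdicts.size == 1
  { 1 => 1, -1 => 0, 0 => 0.5 }[verdicts.first]
end

predictors.combination(2).each do |i, j|
  puts format("%s - %s: %0.1f", i, j, total_dominance_sketch(r2, predictors, i, j))
end
```

With three predictors each pair is compared on two models only: the null model and the model containing the remaining predictor.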