Class: Statsample::Factor::ParallelAnalysis

Inherits:
Object
Includes:
DirtyMemoize, Summarizable
Defined in:
lib/statsample/factor/parallelanalysis.rb

Overview

Performs Horn's 'parallel analysis' on a principal components analysis to adjust for sample bias in the retention of components. The bootstrap samples can be created from random data (using only the number of cases and variables), from parameters of the actual data (the mean and standard deviation of each variable), or by bootstrap sampling of the actual data.

Description

“PA involves the construction of a number of correlation matrices of random variables based on the same sample size and number of variables in the real data set. The average eigenvalues from the random correlation matrices are then compared to the eigenvalues from the real data correlation matrix, such that the first observed eigenvalue is compared to the first random eigenvalue, the second observed eigenvalue is compared to the second random eigenvalue, and so on.” (Hayton, Allen & Scarpello, 2004, p.194)
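
The comparison described in the quotation can be sketched in plain Ruby. This is an illustration only, not the library's implementation: it uses the standard `matrix` library instead of Statsample, and the helper names (`normal_draw`, `correlation_matrix`, `sorted_eigenvalues`, `mean_random_eigenvalues`) and the `observed` eigenvalues are hypothetical.

```ruby
require 'matrix'

# Draw one standard-normal value with the Box-Muller transform.
def normal_draw(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# Pearson correlation matrix of a list of equally sized columns.
def correlation_matrix(columns)
  n = columns.first.size
  centered = columns.map do |col|
    mean = col.sum / n
    col.map { |x| x - mean }
  end
  norms = centered.map { |col| Math.sqrt(col.sum { |x| x * x }) }
  Matrix.build(columns.size) do |i, j|
    i == j ? 1.0 : centered[i].zip(centered[j]).sum { |a, b| a * b } / (norms[i] * norms[j])
  end
end

# Eigenvalues sorted from largest to smallest.
def sorted_eigenvalues(matrix)
  matrix.eigensystem.eigenvalues.map(&:to_f).sort.reverse
end

# Average eigenvalues over `iterations` random data sets of the given shape.
def mean_random_eigenvalues(n_cases, n_vars, iterations, rng)
  sums = Array.new(n_vars, 0.0)
  iterations.times do
    columns = Array.new(n_vars) { Array.new(n_cases) { normal_draw(rng) } }
    sorted_eigenvalues(correlation_matrix(columns)).each_with_index { |ev, i| sums[i] += ev }
  end
  sums.map { |s| s / iterations }
end

rng = Random.new(42)
random_means = mean_random_eigenvalues(100, 4, 20, rng)

# Hypothetical eigenvalues from a real data set with 4 variables.
observed = [2.1, 0.9, 0.6, 0.4]

# Compare position by position and stop at the first observed eigenvalue
# that does not exceed its random counterpart.
retained = observed.zip(random_means).take_while { |obs, rnd| obs > rnd }.size
```

The sketch compares against mean eigenvalues, as in the quotation; the class documented here compares against a percentile of the simulated eigenvalues instead (see #percentil).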

Usage

*With real dataset*

# ds should be any valid dataset
pa=Statsample::Factor::ParallelAnalysis.new(ds, :iterations=>100, :bootstrap_method=>:data)

*With number of cases and variables*

pa=Statsample::Factor::ParallelAnalysis.with_random_data(100,8)

Reference

  • Hayton, J., Allen, D., & Scarpello, V. (2004). Factor Retention Decisions in Exploratory Factor Analysis: a Tutorial on Parallel Analysis. Organizational Research Methods, 7(2), 191-205.

  • O'Connor, B. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396-402.

  • Liu, O., & Rijmen, F. (2008). A modified procedure for parallel analysis of ordered categorical data. Behavior Research Methods, 40(2), 556-562.

Methods included from Summarizable

#summary

Constructor Details

#initialize(ds, opts = Hash.new) ⇒ ParallelAnalysis

Returns a new instance of ParallelAnalysis


# File 'lib/statsample/factor/parallelanalysis.rb', line 62

def initialize(ds, opts=Hash.new)
  @ds=ds
  @fields=@ds.vectors.to_a
  @n_variables=@fields.size
  @n_cases=ds.nrows
  opts_default={
    :name=>_("Parallel Analysis"),
    :iterations=>50, # See Liu and Rijmen (2008)
    :bootstrap_method => :random,
    :smc=>false,
    :percentil=>95, 
    :debug=>false,
    :no_data=>false,
    :matrix_method=>:correlation_matrix
  }
  @use_gsl=Statsample.has_gsl?
  @opts=opts_default.merge(opts)
  # Parameter-based bootstrapping always uses the correlation matrix
  @opts[:matrix_method]=:correlation_matrix if @opts[:bootstrap_method]==:parameters
  opts_default.keys.each {|k| send("#{k}=", @opts[k]) }
end
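
The option handling in the constructor follows the usual defaults-then-merge pattern: keys passed by the caller override the defaults, and every other key keeps its default value. A minimal sketch of that pattern, using a hypothetical subset of the defaults and a hypothetical helper name (`resolve_options`):

```ruby
# Hypothetical subset of the defaults listed in #initialize.
DEFAULTS = { iterations: 50, bootstrap_method: :random, smc: false, percentil: 95 }.freeze

# Caller-supplied keys win; everything else keeps its default.
def resolve_options(opts = {})
  DEFAULTS.merge(opts)
end

resolved = resolve_options(iterations: 100, bootstrap_method: :data)
```

Here `resolved` carries the caller's :iterations and :bootstrap_method, while :smc and :percentil fall back to their defaults.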

Instance Attribute Details

#bootstrap_method ⇒ Object

Bootstrap method. :random is used by default.

  • :random: generates random data using only the number of variables and cases of the dataset.

  • :data: samples with replacement from the actual data.


# File 'lib/statsample/factor/parallelanalysis.rb', line 43

def bootstrap_method
  @bootstrap_method
end
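
The difference between the two bootstrap methods can be sketched without the library. The helper names (`normal_draw`, `random_column`, `resample_column`) and the `observed` values are hypothetical; the sketch only mirrors what #compute does per variable, where each variable is generated or resampled separately.

```ruby
# Draw one standard-normal value with the Box-Muller transform.
def normal_draw(rng)
  Math.sqrt(-2.0 * Math.log(1.0 - rng.rand)) * Math.cos(2.0 * Math::PI * rng.rand)
end

# :random ignores the observed values; only the number of cases matters.
def random_column(n_cases, rng)
  Array.new(n_cases) { normal_draw(rng) }
end

# :data resamples the observed values with replacement, so a value may
# appear several times and others not at all.
def resample_column(values, rng)
  Array.new(values.size) { values[rng.rand(values.size)] }
end

rng = Random.new(7)
observed = [4.0, 7.5, 1.2, 9.9, 3.3] # hypothetical variable with 5 cases

simulated = random_column(observed.size, rng)
resampled = resample_column(observed, rng)
```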

#debug ⇒ Object

Shows extra information if true.


# File 'lib/statsample/factor/parallelanalysis.rb', line 60

def debug
  @debug
end

#ds ⇒ Object (readonly)

Dataset. Mock vectors can be used with the :random bootstrap method, which only needs the number of cases and variables.


# File 'lib/statsample/factor/parallelanalysis.rb', line 39

def ds
  @ds
end

#ds_eigenvalues ⇒ Object (readonly)

Dataset with bootstrapped eigenvalues


# File 'lib/statsample/factor/parallelanalysis.rb', line 56

def ds_eigenvalues
  @ds_eigenvalues
end

#iterations ⇒ Object

Number of random sets to produce. 50 by default


# File 'lib/statsample/factor/parallelanalysis.rb', line 35

def iterations
  @iterations
end

#matrix_method ⇒ Object

Correlation matrix method used with :raw_data. :correlation_matrix is used by default.


# File 'lib/statsample/factor/parallelanalysis.rb', line 51

def matrix_method
  @matrix_method
end

#n_variables ⇒ Object

Number of eigenvalues to calculate. Should be set for Principal Axis Analysis.


# File 'lib/statsample/factor/parallelanalysis.rb', line 54

def n_variables
  @n_variables
end

#name ⇒ Object

Name of analysis


# File 'lib/statsample/factor/parallelanalysis.rb', line 37

def name
  @name
end

#no_data ⇒ Object

Perform analysis without actual data.


# File 'lib/statsample/factor/parallelanalysis.rb', line 58

def no_data
  @no_data
end

#percentil ⇒ Object

Percentile of the bootstrap eigenvalues that a data eigenvalue must exceed to be accepted. 95 by default.


# File 'lib/statsample/factor/parallelanalysis.rb', line 49

def percentil
  @percentil
end

#smc ⇒ Object

Uses SMC (squared multiple correlations) on the diagonal of the matrices to simulate a Principal Axis analysis. False by default.


# File 'lib/statsample/factor/parallelanalysis.rb', line 47

def smc
  @smc
end
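
The SMC substitution can be sketched with Ruby's standard `matrix` library. The formula matches the one used in #compute, where each diagonal entry becomes 1 - 1/d and d is the corresponding diagonal entry of the inverse matrix. The helper name `with_smc_diagonal` and the example matrix are hypothetical.

```ruby
require 'matrix'

# Replace the diagonal of a correlation matrix with squared multiple
# correlations: smc_i = 1 - 1 / (R^-1)_ii. Off-diagonal entries are kept.
def with_smc_diagonal(r)
  inv = r.inverse
  Matrix.build(r.row_count) do |i, j|
    i == j ? 1 - 1.quo(inv[i, i]) : r[i, j]
  end
end

# Hypothetical correlation matrix of two variables correlated at 0.5.
r = Matrix[[1.0, 0.5], [0.5, 1.0]]
reduced = with_smc_diagonal(r)
```

With only two variables, the SMC of each variable equals the squared correlation between them, so the diagonal of `reduced` is approximately 0.25.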

#use_gsl ⇒ Object

Returns the value of attribute use_gsl


# File 'lib/statsample/factor/parallelanalysis.rb', line 61

def use_gsl
  @use_gsl
end

Class Method Details

.with_random_data(cases, vars, opts = Hash.new) ⇒ Object


# File 'lib/statsample/factor/parallelanalysis.rb', line 24

def self.with_random_data(cases,vars,opts=Hash.new)
  ds= Daru::DataFrame.new({}, 
    order: vars.times.map {|i| "v#{i+1}".to_sym},
    index: cases )
  opts=opts.merge({:bootstrap_method=> :random, :no_data=>true})
  new(ds, opts)
end

Instance Method Details

#compute ⇒ Object

Performs the calculation. Should not be called directly by the user.


# File 'lib/statsample/factor/parallelanalysis.rb', line 122

def compute
  @original=Statsample::Bivariate.send(matrix_method, @ds).eigenvalues unless no_data
  @ds_eigenvalues=Daru::DataFrame.new({}, order: (1..@n_variables).map{|v| ("ev_%05d" % v).to_sym})
  
  if bootstrap_method==:parameter or bootstrap_method==:random
    rng = Distribution::Normal.rng
  end
  
  @iterations.times do |i|
    begin
      puts "#{@name}: Iteration #{i}" if $DEBUG or debug
      # Create a dataset of dummy values
      ds_bootstrap = Daru::DataFrame.new({}, order: @ds.vectors, index: @n_cases)
      
      @fields.each do |f|
        if bootstrap_method==:random
          ds_bootstrap[f] = Daru::Vector.new(@n_cases.times.map {|c| rng.call})
        elsif bootstrap_method==:data
          ds_bootstrap[f] = ds[f].sample_with_replacement(@n_cases)
        else
          raise "bootstrap_method not recognized"
        end
      end
      ds_bootstrap.update
      
      matrix=Statsample::Bivariate.send(matrix_method, ds_bootstrap)
      matrix=matrix.to_gsl if @use_gsl
      if smc
          smc_v=matrix.inverse.diagonal.map{|ii| 1-(1.quo(ii))}
          smc_v.each_with_index do |v,ii| 
            matrix[ii,ii]=v
          end
      end
      ev=matrix.eigenvalues
      @ds_eigenvalues.add_row(ev)
    rescue Statsample::Bivariate::Tetrachoric::RequerimentNotMeet => e
      puts "Error: #{e}" if $DEBUG
      redo
    end
  end
  @ds_eigenvalues.update
end

#number_of_factors ⇒ Object

Number of factors to retain.


# File 'lib/statsample/factor/parallelanalysis.rb', line 83

def number_of_factors
  total=0
  ds_eigenvalues.vectors.to_a.each_with_index do |f,i|
    if (@original[i]>0 and @original[i]>ds_eigenvalues[f].percentil(percentil))
      total+=1
    else
      break
    end
  end
  total
end
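
The retention rule above can be sketched on plain arrays. The `percentile` helper uses the nearest-rank definition, which is an assumption made for this sketch; the library's own percentil routine may interpolate differently. The helper names and all eigenvalues below are hypothetical.

```ruby
# Nearest-rank percentile. An assumption for this sketch: the library's
# own percentile routine may use a different definition.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct / 100.0 * sorted.size).ceil
  sorted[[rank, 1].max - 1]
end

# Count the leading eigenvalues that are positive and exceed the chosen
# percentile of their simulated counterparts; stop at the first failure.
def factors_to_retain(observed, simulated, pct = 95)
  observed.each_with_index.take_while do |ev, i|
    ev > 0 && ev > percentile(simulated[i], pct)
  end.size
end

# Hypothetical data eigenvalues.
observed = [2.8, 1.3, 0.6]

# Hypothetical simulated eigenvalues, one list per eigenvalue position.
simulated = [
  [1.2, 1.3, 1.4, 1.5],
  [1.0, 1.05, 1.1, 1.15],
  [0.8, 0.85, 0.9, 0.95]
]

retained = factors_to_retain(observed, simulated)
```

Because counting stops at the first eigenvalue that fails the test, a later eigenvalue that happens to exceed its threshold is never counted.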

#report_building(g) ⇒ Object

:nodoc:


# File 'lib/statsample/factor/parallelanalysis.rb', line 94

def report_building(g) #:nodoc:
  g.section(:name=>@name) do |s|
    s.text _("Bootstrap Method: %s") % bootstrap_method
    s.text _("Uses SMC: %s") % (smc ? _("Yes") : _("No"))
    s.text _("Correlation Matrix type : %s") % matrix_method
    s.text _("Number of variables: %d") % @n_variables
    s.text _("Number of cases: %d") % @n_cases
    s.text _("Number of iterations: %d") % @iterations
    if @no_data
      s.table(:name=>_("Eigenvalues"), :header=>[_("n"), _("generated eigenvalue"), "p.#{percentil}"]) do |t|
        ds_eigenvalues.vectors.to_a.each_with_index do |f,i|
          v=ds_eigenvalues[f]
          t.row [i+1, "%0.4f" %  v.mean, "%0.4f" %  v.percentil(percentil), ]
        end
      end
    else
      s.text _("Number or factors to preserve: %d") % number_of_factors 
      s.table(:name=>_("Eigenvalues"), :header=>[_("n"), _("data eigenvalue"), _("generated eigenvalue"),"p.#{percentil}",_("preserve?")]) do |t|
        ds_eigenvalues.vectors.to_a.each_with_index do |f,i|
          v=ds_eigenvalues[f]
          t.row [i+1, "%0.4f" % @original[i], "%0.4f" %  v.mean, "%0.4f" %  v.percentil(percentil), (v.percentil(percentil)>0 and @original[i] > v.percentil(percentil)) ? "Yes":""]
        end
      end
    end
    
  end
end