Class: RailsDataExplorer
- Inherits: Object
  - Object
    - RailsDataExplorer
- Includes: ActionView::Helpers::TagHelper
- Defined in:
- lib/rails-data-explorer/data_set.rb,
lib/rails_data_explorer.rb,
lib/rails-data-explorer/chart.rb,
lib/rails-data-explorer/engine.rb,
lib/rails-data-explorer/constants.rb,
lib/rails-data-explorer/data_type.rb,
lib/rails-data-explorer/data_series.rb,
lib/rails-data-explorer/exploration.rb,
lib/rails-data-explorer/chart/box_plot.rb,
lib/rails-data-explorer/chart/pie_chart.rb,
lib/rails-data-explorer/utils/rde_table.rb,
lib/rails-data-explorer/chart/scatterplot.rb,
lib/rails-data-explorer/utils/color_scale.rb,
lib/rails-data-explorer/utils/data_binner.rb,
lib/rails-data-explorer/chart/parallel_set.rb,
lib/rails-data-explorer/chart/box_plot_group.rb,
lib/rails-data-explorer/utils/data_quantizer.rb,
lib/rails-data-explorer/action_view_extension.rb,
lib/rails-data-explorer/data_type/categorical.rb,
lib/rails-data-explorer/utils/value_formatter.rb,
lib/rails-data-explorer/data_type/quantitative.rb,
lib/rails-data-explorer/active_record_extension.rb,
lib/rails-data-explorer/chart/contingency_table.rb,
lib/rails-data-explorer/statistics/rng_category.rb,
lib/rails-data-explorer/statistics/rng_gaussian.rb,
lib/rails-data-explorer/chart/histogram_temporal.rb,
lib/rails-data-explorer/statistics/rng_power_law.rb,
lib/rails-data-explorer/chart/parallel_coordinates.rb,
lib/rails-data-explorer/chart/histogram_categorical.rb,
lib/rails-data-explorer/chart/histogram_quantitative.rb,
lib/rails-data-explorer/data_type/quantitative/decimal.rb,
lib/rails-data-explorer/data_type/quantitative/integer.rb,
lib/rails-data-explorer/data_type/quantitative/temporal.rb,
lib/rails-data-explorer/chart/descriptive_statistics_table.rb,
lib/rails-data-explorer/chart/stacked_bar_chart_categorical_percent.rb,
lib/rails-data-explorer/statistics/pearsons_chi_squared_independence_test.rb
Overview
From en.wikipedia.org/wiki/Pearson’s_chi-squared_test
Pearson’s chi-squared test is used to assess whether paired observations on two variables, expressed in a contingency table, are independent of each other.
An “observation” consists of the values of two outcomes and the null hypothesis is that the occurrence of these outcomes is statistically independent. Each observation is allocated to one cell of a two-dimensional array of cells (called a contingency table) according to the values of the two outcomes.
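As a rough illustration of the description above, the statistic can be computed directly from a contingency table. This is a minimal sketch, not the library's implementation: the helper names `chi_squared_statistic` and `degrees_of_freedom` are hypothetical, and the table is assumed to be an array of equally sized rows of counts with non-zero row and column totals.

```ruby
# Pearson's chi-squared statistic for a contingency table.
# `observed` is an array of rows, each row an array of counts.
# Hypothetical helper, not part of RailsDataExplorer.
def chi_squared_statistic(observed)
  row_totals = observed.map(&:sum)
  col_totals = observed.transpose.map(&:sum)
  grand_total = row_totals.sum.to_f

  statistic = 0.0
  observed.each_with_index do |row, i|
    row.each_with_index do |count, j|
      # Expected count under independence of the two outcomes
      expected = row_totals[i] * col_totals[j] / grand_total
      statistic += (count - expected)**2 / expected
    end
  end
  statistic
end

# Degrees of freedom: (rows - 1) * (columns - 1)
def degrees_of_freedom(observed)
  (observed.size - 1) * (observed.first.size - 1)
end

table = [[10, 20], [20, 10]]
puts chi_squared_statistic(table).round(4)
puts degrees_of_freedom(table)
```

The statistic is then compared against a chi-squared distribution with the given degrees of freedom to obtain a p-value; that step is left out of the sketch.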
Assumptions
The chi-squared test, when used with the standard approximation that a chi-squared distribution is applicable, has the following assumptions:
- Simple random sample – The sample data is a random sampling from a fixed distribution or population where every collection of members of the population of the given sample size has an equal probability of selection. Variants of the test have been developed for complex samples, such as where the data is weighted. Other forms can be used such as purposive sampling.
- Sample size (whole table) – A sufficiently large sample is assumed. If a chi-squared test is conducted on a smaller sample, it will yield an inaccurate inference. A researcher who uses the chi-squared test on small samples might end up committing a Type II error.
- Expected cell count – Adequate expected cell counts. Some require 5 or more, and others require 10 or more. A common rule is 5 or more in all cells of a 2-by-2 table, and 5 or more in 80% of cells in larger tables, but no cells with zero expected count. When this assumption is not met, Yates’s Correction is applied.
- Independence – The observations are always assumed to be independent of each other. This means chi-squared cannot be used to test correlated data (like matched pairs or panel data). In those cases you might want to turn to McNemar’s test.
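The expected cell count rule from the list above can be checked before running the test. The following is a sketch under stated assumptions: `expected_counts` and `expected_counts_adequate?` are hypothetical helpers, the threshold of 5 is the common rule quoted above (some authors require 10), and the input is an array of rows of counts.

```ruby
# Expected counts under independence for a contingency table
# given as an array of rows. Hypothetical helper.
def expected_counts(observed)
  row_totals = observed.map(&:sum)
  col_totals = observed.transpose.map(&:sum)
  grand_total = row_totals.sum.to_f
  row_totals.map { |r| col_totals.map { |c| r * c / grand_total } }
end

# Common rule of thumb: all cells >= min_count in a 2-by-2 table,
# at least 80% of cells >= min_count in larger tables, and no cell
# with an expected count of zero.
def expected_counts_adequate?(observed, min_count: 5)
  cells = expected_counts(observed).flatten
  return false if cells.any?(&:zero?)

  two_by_two = observed.size == 2 && observed.first.size == 2
  required_share = two_by_two ? 1.0 : 0.8
  cells.count { |e| e >= min_count } >= required_share * cells.size
end

puts expected_counts_adequate?([[10, 20], [20, 10]])
puts expected_counts_adequate?([[1, 2], [2, 1]])
```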
Problems
The approximation to the chi-squared distribution breaks down if expected frequencies are too low. It will normally be acceptable so long as no more than 20% of the events have expected frequencies below 5. Where there is only 1 degree of freedom, the approximation is not reliable if expected frequencies are below 10. In this case, a better approximation can be obtained by reducing the absolute value of each difference between observed and expected frequencies by 0.5 before squaring; this is called Yates’s correction for continuity.
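Yates’s correction as described above changes only the numerator of each term. A minimal sketch, assuming a contingency table given as an array of rows; `yates_chi_squared` is a hypothetical helper name and not something the library is known to expose:

```ruby
# Chi-squared statistic with Yates's continuity correction:
# each |observed - expected| is reduced by 0.5 before squaring.
# Hypothetical helper, not part of RailsDataExplorer.
def yates_chi_squared(observed)
  row_totals = observed.map(&:sum)
  col_totals = observed.transpose.map(&:sum)
  grand_total = row_totals.sum.to_f

  statistic = 0.0
  observed.each_with_index do |row, i|
    row.each_with_index do |count, j|
      expected = row_totals[i] * col_totals[j] / grand_total
      statistic += ((count - expected).abs - 0.5)**2 / expected
    end
  end
  statistic
end

puts yates_chi_squared([[10, 20], [20, 10]]).round(4)
```

For this table every expected count is 15 and every absolute difference is 5, so each of the four terms is 4.5² / 15 = 1.35.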
In cases where the expected value, E, is found to be small (indicating a small underlying population probability, and/or a small number of observations), the normal approximation of the multinomial distribution can fail, and in such cases it is found to be more appropriate to use the G-test, a likelihood ratio-based test statistic. Where the total sample size is small, it is necessary to use an appropriate exact test, typically either the binomial test or (for contingency tables) Fisher’s exact test. This test uses the conditional distribution of the test statistic given the marginal totals; however, it does not assume that the data were generated from an experiment in which the marginal totals are fixed and is valid whether or not that is the case.
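For comparison, the G-test mentioned above uses the statistic G = 2 Σ O · ln(O / E) over all cells. The sketch below is illustrative only: `g_statistic` is a hypothetical helper, the table is an array of rows of counts, and cells with an observed count of zero are skipped because their term is conventionally taken as zero.

```ruby
# G-test (likelihood ratio) statistic for a contingency table.
# Hypothetical helper, not part of RailsDataExplorer.
def g_statistic(observed)
  row_totals = observed.map(&:sum)
  col_totals = observed.transpose.map(&:sum)
  grand_total = row_totals.sum.to_f

  sum = 0.0
  observed.each_with_index do |row, i|
    row.each_with_index do |count, j|
      next if count.zero? # the term 0 * ln(0) is taken as 0
      expected = row_totals[i] * col_totals[j] / grand_total
      sum += count * Math.log(count / expected)
    end
  end
  2.0 * sum
end

puts g_statistic([[10, 20], [20, 10]]).round(4)
```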
Defined Under Namespace
Modules: ActionViewExtension, ActiveRecordExtension, Statistics, Utils
Classes: Chart, DataSeries, DataSet, DataType, Engine, Exploration
Constant Summary
- GREATER_ZERO = 1.0 / 1_000_000
  The smallest value to use if we have to avoid zero (division by zero).
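One way such a constant might be used is as a stand-in for a zero denominator. This is a sketch only: the page does not show where the library uses GREATER_ZERO, `safe_ratio` is a hypothetical helper, and the constant is redefined at the top level here so the example runs on its own (in the library it lives under RailsDataExplorer).

```ruby
# Same value as RailsDataExplorer::GREATER_ZERO, redefined here
# so the example is self-contained.
GREATER_ZERO = 1.0 / 1_000_000

# Divide, substituting GREATER_ZERO for a zero denominator.
# Hypothetical helper, not part of RailsDataExplorer.
def safe_ratio(numerator, denominator)
  denominator = GREATER_ZERO if denominator.zero?
  numerator / denominator.to_f
end

puts safe_ratio(3, 4)
puts safe_ratio(1, 0).round
```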
Instance Attribute Summary
- #explorations ⇒ Object (readonly)
  Returns the value of attribute explorations.
- #output_buffer ⇒ Object
  Required for content_tag.
Instance Method Summary
- #initialize(data_collection, data_series_specs) ⇒ RailsDataExplorer (constructor)
  A new instance of RailsDataExplorer.
- #number_of_values ⇒ Object
- #render ⇒ Object
Constructor Details
#initialize(data_collection, data_series_specs) ⇒ RailsDataExplorer
Returns a new instance of RailsDataExplorer.
# File 'lib/rails_data_explorer.rb', line 8

def initialize(data_collection, data_series_specs)
  @explorations = []
  univariate = []
  bivariate = {}
  multivariate = {}
  data_series_specs.each do |data_series_spec|
    ds_spec = {
      univariate: true,
      bivariate: true,
    }.merge(data_series_spec)
    univariate << ds_spec.dup if ds_spec[:univariate]

    if ds_spec[:bivariate]
      [*ds_spec[:bivariate]].each { |group_key|
        group_key = group_key.to_s
        bivariate[group_key] ||= []
        bivariate[group_key] << ds_spec.dup
      }
    end

    if ds_spec[:multivariate]
      [*ds_spec[:multivariate]].each { |group_key|
        group_key = group_key.to_s
        multivariate[group_key] ||= []
        multivariate[group_key] << ds_spec.dup
      }
    end
  end

  univariate.uniq.compact.each { |data_series_spec|
    @explorations << Exploration.new(
      data_series_spec[:name],
      data_collection.map(&data_series_spec[:data_method]),
    )
  }

  bivariate.each { |group_key, bv_data_series_specs|
    next unless group_key # skip if key is falsey
    bv_data_series_specs.uniq.compact.combination(2) { |ds_specs_pair|
      @explorations << build_exploration_from_data_series_specs(
        data_collection,
        ds_specs_pair
      )
    }
  }

  multivariate.each { |group_key, mv_data_series_specs|
    next unless group_key # skip key `false` or `nil`
    ds_specs = mv_data_series_specs.uniq.compact
    @explorations << build_exploration_from_data_series_specs(
      data_collection,
      ds_specs
    )
  }
end
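To make the grouping step in the constructor easier to follow, here is a standalone sketch that mirrors only the bucketing of data series specs. The function `group_specs` and the sample specs (including the `:wealth` group key and the attribute names) are hypothetical; the spec keys `:name`, `:data_method`, `:univariate`, `:bivariate` and `:multivariate` are taken from the source above. No Exploration objects are built.

```ruby
# Mirrors how the constructor sorts specs into univariate,
# bivariate and multivariate buckets. Hypothetical stand-in,
# not part of RailsDataExplorer.
def group_specs(data_series_specs)
  univariate = []
  bivariate = {}
  multivariate = {}

  data_series_specs.each do |spec|
    # Univariate and bivariate explorations are on by default
    ds_spec = { univariate: true, bivariate: true }.merge(spec)
    univariate << ds_spec if ds_spec[:univariate]

    if ds_spec[:bivariate]
      [*ds_spec[:bivariate]].each do |group_key|
        (bivariate[group_key.to_s] ||= []) << ds_spec
      end
    end

    if ds_spec[:multivariate]
      [*ds_spec[:multivariate]].each do |group_key|
        (multivariate[group_key.to_s] ||= []) << ds_spec
      end
    end
  end

  { univariate: univariate, bivariate: bivariate, multivariate: multivariate }
end

specs = [
  { name: "Age",    data_method: :age },
  { name: "Gender", data_method: :gender },
  { name: "Income", data_method: :income, univariate: false, multivariate: :wealth },
]

groups = group_specs(specs)
p groups[:univariate].map { |s| s[:name] }
p groups[:bivariate].keys
p groups[:bivariate]["true"].map { |s| s[:name] }.combination(2).to_a
p groups[:multivariate].keys
```

With the default `bivariate: true`, every series lands in a single group keyed `"true"`, and the constructor then builds one exploration per pair within that group.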
Instance Attribute Details
#explorations ⇒ Object (readonly)
Returns the value of attribute explorations.
# File 'lib/rails_data_explorer.rb', line 6

def explorations
  @explorations
end
#output_buffer ⇒ Object
required for content_tag
# File 'lib/rails_data_explorer.rb', line 3

def output_buffer
  @output_buffer
end
Instance Method Details
#number_of_values ⇒ Object
# File 'lib/rails_data_explorer.rb', line 73

def number_of_values
  explorations.first.number_of_values
end
#render ⇒ Object
# File 'lib/rails_data_explorer.rb', line 65

def render
  expls = separate_explorations_with_and_without_charts
  r = render_toc(expls[:with])
  r << render_charts(expls[:with])
  r << render_explorations_without_charts(expls[:without])
  r
end