Class: Statsample::FormulaWrapper

Inherits: Object

Defined in: lib/statsample/formula/formula.rb

Overview

This class recognizes which terms are numeric and groups the terms accordingly. Each group is fed to Formula, and once the groups have been parsed by Formula, they are combined back together.
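The grouping described above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the library's code. The helpers `group_by_numeric` and `strip_numeric_factors` are hypothetical; the real class uses private methods such as `split_to_groups` and `strip_numeric`, whose exact behavior is not shown on this page. The sketch also assumes that terms are plain strings, that `:` marks an interaction, and that the set of numeric vector names is passed in directly rather than read from a Daru::DataFrame.

```ruby
require 'set'

# Hypothetical helper (not part of Statsample): group RHS terms by the
# numeric factors they contain. `numeric` is the set of numeric vector
# names, which the real class derives from the Daru::DataFrame.
def group_by_numeric(terms, numeric)
  terms.group_by do |term|
    # "a:c" -> ["a", "c"]; keep only the numeric factors as the group key
    term.split(':').select { |f| numeric.include?(f) }.sort.join(':')
  end
end

# Hypothetical helper: drop the numeric factors from each term so the
# categorical remainder can be simplified on its own. A term with no
# categorical factor left becomes the constant "1".
def strip_numeric_factors(terms, numeric)
  terms.map do |term|
    rest = term.split(':').reject { |f| numeric.include?(f) }
    rest.empty? ? '1' : rest.join(':')
  end
end

numeric  = Set['a']
groups   = group_by_numeric(%w[1 a c d:c a:c], numeric)
stripped = groups.transform_values { |terms| strip_numeric_factors(terms, numeric) }
```

With `a` numeric, the terms fall into two groups: the empty key holds `["1", "c", "d:c"]` and the key `"a"` holds `["a", "a:c"]`. After stripping, the `"a"` group becomes `["1", "c"]`, a purely categorical list that can be simplified independently before the numeric factor is put back.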

Instance Attribute Summary

#canonical_tokens ⇒ Object readonly
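Returns the non-redundant tokens computed by #non_redundant_tokens during initialization.
#tokens ⇒ Object readonly
Returns the right-hand-side (RHS) terms of the formula.
#y ⇒ Object readonly
Returns the left-hand-side (LHS) term of the formula, i.e. the name of the vector to be predicted.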

Instance Method Summary

#canonical_to_s ⇒ String
Returns canonical tokens in a readable form.
#initialize(formula, df) ⇒ FormulaWrapper constructor
Initializes a formula wrapper object that parses the given formula into tokens that do not overlap one another.
#non_redundant_tokens ⇒ Array
Returns tokens that produce a non-redundant design matrix.

Constructor Details

#initialize(formula, df) ⇒ `FormulaWrapper`

Note:

Specify 0 as a term in the formula (e.g. 'y~0+a') if you do not want a constant term to be included in the parsed formula.

Initializes a formula wrapper object that parses the given formula into tokens that do not overlap one another.

Examples:

df = Daru::DataFrame.from_csv 'spec/data/df.csv'
df.to_category 'c', 'd', 'e'
formula = Statsample::FormulaWrapper.new 'y~a+d:c', df
formula.canonical_to_s
#=> "1+c(-)+d(-):c+a"

Parameters:

formula (String) —
the formula to parse
df (Daru::DataFrame) —
data frame required to determine which vectors are numeric

# File 'lib/statsample/formula/formula.rb', line 21

def initialize(formula, df)
  @df = df
  # @y store the LHS term that is name of vector to be predicted
  # @tokens store the RHS terms of the formula
  @y, *@tokens = split_to_tokens(formula)
  @tokens = @tokens.uniq.sort
  manage_constant_term
  @canonical_tokens = non_redundant_tokens
end
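The constructor's first steps (split the formula at `~`, de-duplicate and sort the RHS terms, then apply the constant-term rule from the note above) can be approximated as follows. `split_formula` and `with_constant` are hypothetical stand-ins for the private `split_to_tokens` and `manage_constant_term`. They assume terms are separated by `+` with no parentheses or other operators, and the constant handling (drop the intercept when "0" is present, otherwise add "1") is inferred from the documented behavior, not copied from the source.

```ruby
# Hypothetical stand-in for split_to_tokens: split "y~a+d:c" into the
# response name followed by the RHS terms, ignoring spaces.
def split_formula(formula)
  lhs, rhs = formula.delete(' ').split('~', 2)
  [lhs, *rhs.split('+')]
end

# Hypothetical stand-in for manage_constant_term (assumed behavior):
# "0" suppresses the constant; otherwise "1" is added to the terms.
def with_constant(tokens)
  if tokens.include?('0')
    tokens - %w[0 1]
  else
    (tokens | ['1']).sort
  end
end

# Mirrors `@y, *@tokens = split_to_tokens(formula)` and `.uniq.sort`
y, *tokens = split_formula('y ~ a + d:c + a')
tokens = with_constant(tokens.uniq.sort)
```

Here `y` is `"y"` and `tokens` becomes `["1", "a", "d:c"]`: the duplicate `a` is removed and the constant is added because no `0` term was given.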

Instance Attribute Details

#canonical_tokens ⇒ `Object` (readonly)

Returns the non-redundant tokens computed by #non_redundant_tokens during initialization.

# File 'lib/statsample/formula/formula.rb', line 6

def canonical_tokens
  @canonical_tokens
end

#tokens ⇒ `Object` (readonly)

Returns the right-hand-side (RHS) terms of the formula.

# File 'lib/statsample/formula/formula.rb', line 6

def tokens
  @tokens
end

#y ⇒ `Object` (readonly)

Returns the left-hand-side (LHS) term of the formula, i.e. the name of the vector to be predicted.

# File 'lib/statsample/formula/formula.rb', line 6

def y
  @y
end

Instance Method Details

#canonical_to_s ⇒ `String`

Note:

In 'y~a+b(-)', 'a' appears in its full-rank expansion and 'b(-)' appears in its reduced-rank expansion.

Returns canonical tokens in a readable form.

Examples:

df = Daru::DataFrame.from_csv 'spec/data/df.csv'
df.to_category 'c', 'd', 'e'
formula = Statsample::FormulaWrapper.new 'y~a+d:c', df
formula.canonical_to_s
#=> "1+c(-)+d(-):c+a"

Returns:

(String) —
canonical tokens in a readable form.

# File 'lib/statsample/formula/formula.rb', line 41

def canonical_to_s
  canonical_tokens.join '+'
end
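To make the "(-)" notation from the note above concrete, the sketch below expands a categorical vector into 0/1 indicator columns in both ways. It is plain Ruby with a hypothetical `dummy_columns` helper, not Daru's coding API, and it assumes the reduced-rank version drops the first level in sorted order; the base level Daru actually drops may differ.

```ruby
# Hypothetical illustration of full-rank vs reduced-rank ("(-)")
# expansion of a categorical vector into 0/1 indicator columns.
# Assumption: the reduced expansion drops the first level in sorted order.
def dummy_columns(values, reduced:)
  levels = values.uniq.sort
  levels = levels.drop(1) if reduced
  # One indicator column per remaining level, keyed by level name
  levels.map { |level| [level, values.map { |v| v == level ? 1 : 0 }] }.to_h
end

values  = %w[x y x z]
full    = dummy_columns(values, reduced: false) # columns for x, y, z
reduced = dummy_columns(values, reduced: true)  # columns for y, z only
```

A three-level vector yields three columns at full rank and two at reduced rank; the dropped level is represented implicitly by a row of zeros.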

#non_redundant_tokens ⇒ `Array`

Returns tokens that produce a non-redundant design matrix.

Returns:

(Array) —
array of tokens that do not produce a redundant design matrix

# File 'lib/statsample/formula/formula.rb', line 47

def non_redundant_tokens
  groups = split_to_groups
  # TODO: An enhancement
  # Right now x:c appears as c:x
  groups.each { |k, v| groups[k] = strip_numeric v, k }
  groups.each { |k, v| groups[k] = Formula.new(v).canonical_tokens }
  groups.flat_map { |k, v| add_numeric v, k }
end
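The purpose of #non_redundant_tokens is to avoid linearly dependent columns in the design matrix. The hypothetical check below (plain Ruby, no Statsample or Daru calls) shows one such dependency: when the formula keeps the constant "1" and also expands a categorical term at full rank, the indicator columns sum to the intercept column. It only illustrates the problem; per the source above, the method itself resolves redundancy by passing each group to Formula.

```ruby
# Hypothetical helper: one 0/1 indicator column per given level.
def indicator_columns(values, levels)
  levels.map { |level| values.map { |v| v == level ? 1 : 0 } }
end

# True when the row-wise sum of the columns equals a column of ones,
# i.e. the columns reproduce the intercept exactly (a linear dependency).
def sums_to_intercept?(columns, n)
  columns.transpose.map(&:sum) == Array.new(n, 1)
end

values  = %w[x y x z]
levels  = values.uniq.sort
full    = indicator_columns(values, levels)          # x, y, z
reduced = indicator_columns(values, levels.drop(1))  # y, z

full_redundant    = sums_to_intercept?(full, values.size)
reduced_redundant = sums_to_intercept?(reduced, values.size)
```

`full_redundant` is `true` and `reduced_redundant` is `false`: dropping one level removes this particular dependency on the constant term, which is why "(-)" terms appear alongside "1" in canonical output.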
