Class: Remi::Transform::DataFrameSieve
- Inherits: Remi::Transform
  - Object
  - Remi::Transform
  - Remi::Transform::DataFrameSieve
- Defined in: lib/remi/transform.rb
Overview
Public: Applies a DataFrame grouping sieve.
The DataFrame sieve can be used to simplify very complex nested if-then logic that groups data into buckets. Given a sieve DataFrame with N columns, the first N-1 columns hold the values of the variables used for grouping, and the last column holds the resulting group. For each input record, the sieve is scanned from the top row down, comparing the input values with the values in the sieve columns. A nil in the sieve is a wildcard that matches anything, and a Regexp in the sieve is matched against the input value. The first matching row wins and the scan stops.
sieve_df - The sieve, defined as a DataFrame. The arguments to the transform must appear in the same order as the first N-1 columns of the sieve.
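The first-match-wins scan described above can be sketched in plain Ruby, without Daru or Remi. This is only an illustration under the assumption that the sieve is an array of arrays whose last element is the group; `SIEVE` and `sieve_group` are hypothetical names and not part of Remi's API.

```ruby
# Hypothetical plain-Ruby sketch of the sieve idea; not Remi's implementation.
# Each row lists the match conditions followed by the resulting group.
SIEVE = [
  ['Undergrad', true, :intensive],
  ['Undergrad', nil,  :base],
  [nil,         nil,  :other] # catch-all row: nil matches anything
].freeze

# Returns the group of the first sieve row whose conditions all match.
# A nil condition is a wildcard. Returns nil when no row matches.
def sieve_group(sieve, *values)
  match = sieve.find do |row|
    conditions = row[0...-1]
    conditions.each_with_index.all? do |cond, idx|
      cond.nil? || cond == values[idx]
    end
  end
  match&.last
end

p sieve_group(SIEVE, 'Undergrad', true)  # => :intensive
p sieve_group(SIEVE, 'Undergrad', false) # => :base
p sieve_group(SIEVE, 'Grad', true)       # => :other
```

Row order matters in this sketch: if the catch-all row were listed first, every input would land in `:other`.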
Examples:
# This sieve captures the following business logic:
# 1 - All Undergraduate Nursing programs, regardless of contact, are assigned to the :intensive group.
# 2 - All Undergraduate programs with a contact are assigned to the :intensive group.
# 3 - All Undergraduate programs without a contact are assigned to the :base group.
# 4 - All Graduate Engineering programs with a contact are assigned to the :intensive group.
# 5 - All other programs are assigned to the :base group.
sieve_df = Daru::DataFrame.new([
    [ 'Undergrad' , 'NURS' , nil   , :intensive ],
    [ 'Undergrad' , nil    , true  , :intensive ],
    [ 'Undergrad' , nil    , false , :base      ],
    [ 'Grad'      , 'ENG'  , true  , :intensive ],
    [ nil         , nil    , nil   , :base      ],
  ].transpose,
  order: [:level, :program, :contact, :group]
)
test_df = Daru::DataFrame.new([
    ['Undergrad' , 'CHEM' , false],
    ['Undergrad' , 'CHEM' , true],
    ['Grad'      , 'CHEM' , true],
    ['Undergrad' , 'NURS' , false],
    ['Unknown'   , 'CHEM' , true],
  ].transpose,
  order: [:level, :program, :contact]
)
Remi::SourceToTargetMap.apply(test_df) do
  map source(:level, :program, :contact).target(:group)
    .transform(Remi::Transform::DataFrameSieve.new(sieve_df))
end
test_df
# => #<Daru::DataFrame:70099624408400 @name = d30888fd-6ca8-48dd-9be3-558f81ae1015 @size = 5>
     level      program  contact  group
  0  Undergrad  CHEM     nil      base
  1  Undergrad  CHEM     true     intensive
  2  Grad       CHEM     true     base
  3  Undergrad  NURS     nil      intensive
  4  Unknown    CHEM     true     base
Instance Attribute Summary
Attributes inherited from Remi::Transform
#multi_arg, #source_metadata, #target_metadata
Instance Method Summary
- #initialize(sieve_df, *args, **kargs, &block) ⇒ DataFrameSieve (constructor)
  A new instance of DataFrameSieve.
- #transform(*values) ⇒ Object
Methods inherited from Remi::Transform
Constructor Details
#initialize(sieve_df, *args, **kargs, &block) ⇒ DataFrameSieve
Returns a new instance of DataFrameSieve.
# File 'lib/remi/transform.rb', line 613

def initialize(sieve_df, *args, **kargs, &block)
  super
  @sieve_df = sieve_df.transpose.to_h.values
end
Instance Method Details
#transform(*values) ⇒ Object
# File 'lib/remi/transform.rb', line 618

def transform(*values)
  sieve_keys = @sieve_df.first.index.to_a
  sieve_result_key = sieve_keys.pop

  @sieve_df.each.find do |sieve_row|
    match_row = true
    sieve_keys.each_with_index do |key,idx|
      match_value = if sieve_row[key].is_a?(Regexp)
        !!sieve_row[key].match(values[idx])
      else
        sieve_row[key] == values[idx]
      end

      match_row &&= sieve_row[key].nil? || match_value
    end
    match_row
  end[sieve_result_key]
end
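The method body above treats Regexp cells specially and indexes straight into the result of `find`. The following is a hedged plain-Ruby sketch of those matching rules, using an array of hashes in place of the Daru row vectors; `ROWS` and `sieve_lookup` are hypothetical names for illustration and not part of Remi. Unlike the original, the sketch returns nil when no row matches.

```ruby
# Hypothetical stand-in for the sieve rows: one Hash per row, with the
# result column (:group) as the last key. Not Remi's internal representation.
ROWS = [
  { program: /\ANURS/, contact: nil,  group: :intensive },
  { program: 'CHEM',   contact: true, group: :intensive },
  { program: nil,      contact: nil,  group: :base }
].freeze

# Follows the matching rules of #transform: nil cells match anything,
# Regexp cells use Regexp#match, everything else is compared with ==.
def sieve_lookup(rows, *values)
  keys = rows.first.keys
  result_key = keys.pop
  found = rows.find do |row|
    keys.each_with_index.all? do |key, idx|
      cell = row[key]
      next true if cell.nil?
      cell.is_a?(Regexp) ? !!cell.match(values[idx]) : cell == values[idx]
    end
  end
  # Assumption of this sketch: return nil instead of raising on no match.
  found && found[result_key]
end

p sieve_lookup(ROWS, 'NURS-RN', false)       # => :intensive
p sieve_lookup(ROWS, 'CHEM', false)          # => :base
p sieve_lookup(ROWS.first(2), 'ENG', true)   # => nil
```

In the method as listed, `find` returns nil when no sieve row matches, so the trailing `[sieve_result_key]` would raise a NoMethodError. Ending the sieve with an all-nil catch-all row, as the example sieve does, avoids that case.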