Class: Remi::Transform::DataFrameSieve
- Inherits: Remi::Transform
  - Object
    - Remi::Transform
      - Remi::Transform::DataFrameSieve
- Defined in: lib/remi/transform.rb
Overview
Public: Applies a DataFrame grouping sieve.
The DataFrame sieve replaces deeply nested if-then logic for grouping data into buckets. Given a sieve DataFrame with N columns, the first N-1 columns hold the variables used to assign a group, and the last column holds the resulting group. For each input row, the sieve is scanned from the top row down, comparing the input values against the values in each sieve row. A nil in the sieve is a wildcard and matches anything; a Regexp in the sieve is matched against the input value. The first sieve row that matches wins, and the scan stops there.
sieve_df - The sieve, defined as a DataFrame. The names of the sieve vectors must match the names of the source vectors in the source-to-target map. The last vector in sieve_df is used as the result of the sieve.
Examples:
# This sieve captures the following business logic
# 1 - All Non-Graduate Nursing, regardless of contact, gets assigned to the :intensive group.
# 2 - All Undergraduate programs with contact get assigned to the :intensive group.
# 3 - All Undergraduate programs without a contact get assigned to the :base group.
# 4 - All Graduate engineering programs with a contact get assigned to the :intensive group.
# 5 - All other programs get assigned to the :base group
sieve_df = Daru::DataFrame.new([
  [ 'Undergrad' , 'NURS' , nil   , :intensive ],
  [ 'Undergrad' , nil    , true  , :intensive ],
  [ 'Undergrad' , nil    , false , :base      ],
  [ 'Grad'      , 'ENG'  , true  , :intensive ],
  [ nil         , nil    , nil   , :base      ],
].transpose,
  order: [:level, :program, :contact, :group]
)

test_df = Daru::DataFrame.new([
  ['Undergrad' , 'CHEM' , false],
  ['Undergrad' , 'CHEM' , true],
  ['Grad'      , 'CHEM' , true],
  ['Undergrad' , 'NURS' , false],
  ['Unknown'   , 'CHEM' , true],
].transpose,
  order: [:level, :program, :contact]
)

Remi::SourceToTargetMap.apply(test_df) do
  map source(:level, :program, :contact)
    .target(:group)
    .transform(Remi::Transform::DataFrameSieve.new(sieve_df))
end

test_df
# => #
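To see which group each test row should land in, the sieve walk can be traced by hand. The following is a minimal plain-Ruby sketch of the first-match-wins rule described in the Overview, using arrays of hashes in place of Daru DataFrames. The names SIEVE_ORDER, SIEVE_ROWS, TEST_ROWS and sieve_group are hypothetical and are not part of Remi; the sketch covers only nil wildcards and equality, and it is not the library's implementation.

```ruby
# Plain-Ruby sketch of the sieve walk (no Remi or Daru required).
# All names below are hypothetical and exist only for illustration.
SIEVE_ORDER = [:level, :program, :contact, :group].freeze

# Same sieve as in the example above, one hash per sieve row.
SIEVE_ROWS = [
  ['Undergrad', 'NURS', nil,   :intensive],
  ['Undergrad', nil,    true,  :intensive],
  ['Undergrad', nil,    false, :base],
  ['Grad',      'ENG',  true,  :intensive],
  [nil,         nil,    nil,   :base]
].map { |values| SIEVE_ORDER.zip(values).to_h }.freeze

# Same input rows as test_df in the example above.
TEST_ROWS = [
  { level: 'Undergrad', program: 'CHEM', contact: false },
  { level: 'Undergrad', program: 'CHEM', contact: true },
  { level: 'Grad',      program: 'CHEM', contact: true },
  { level: 'Undergrad', program: 'NURS', contact: false },
  { level: 'Unknown',   program: 'CHEM', contact: true }
].freeze

# Returns the group of the first sieve row whose non-nil cells all equal
# the corresponding input values, or nil if no sieve row matches.
def sieve_group(input)
  keys = SIEVE_ORDER[0...-1]      # first N-1 columns are the match variables
  result_key = SIEVE_ORDER.last   # last column is the resulting group
  hit = SIEVE_ROWS.find do |sieve_row|
    keys.all? { |key| sieve_row[key].nil? || sieve_row[key] == input[key] }
  end
  hit && hit[result_key]
end

TEST_ROWS.each do |row|
  puts "#{row.values.inspect} -> #{sieve_group(row).inspect}"
end
```

Under these assumptions the five test rows map to :base, :intensive, :base, :intensive and :base, in that order. The last two cases show why row order matters: the Undergrad NURS row is caught by the first sieve row before the contact check is ever reached, and the Unknown level only matches the all-nil catch-all row.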
Instance Attribute Summary
Attributes inherited from Remi::Transform
#multi_args, #source_metadata, #target_metadata
Instance Method Summary
- #initialize(sieve_df, *args, **kargs, &block) ⇒ DataFrameSieve (constructor)
  A new instance of DataFrameSieve.
- #transform(row) ⇒ Object
Methods inherited from Remi::Transform
Constructor Details
#initialize(sieve_df, *args, **kargs, &block) ⇒ DataFrameSieve
Returns a new instance of DataFrameSieve.
# File 'lib/remi/transform.rb', line 674
def initialize(sieve_df, *args, **kargs, &block)
  super
  @sieve_table = sieve_df.transpose.to_h.values
end
Instance Method Details
#transform(row) ⇒ Object
# File 'lib/remi/transform.rb', line 680
def transform(row)
  sieve_keys = @sieve_table.first.index.to_a
  sieve_result_key = sieve_keys.pop

  raise ArgumentError, "#{sieve_keys - row.source_keys} not found in row" unless (sieve_keys - row.source_keys).size == 0

  @sieve_table.each.find do |sieve_row|
    match_row = true
    sieve_keys.each do |sieve_key|
      match_value = if sieve_row[sieve_key].is_a?(Regexp)
        !!sieve_row[sieve_key].match(row[sieve_key])
      else
        sieve_row[sieve_key] == row[sieve_key]
      end
      match_row &&= sieve_row[sieve_key].nil? || match_value
    end
    match_row
  end[sieve_result_key]
end
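The per-cell comparison in the listing above has three cases: a nil sieve value matches anything, a Regexp sieve value is tested with Regexp#match, and any other value is compared with ==. The sketch below isolates that rule in plain Ruby. The names sieve_cell_match?, PATTERN_SIEVE and pattern_group are hypothetical and not part of Remi. One deliberate difference: the sketch returns nil when no sieve row matches, whereas the listing indexes the result of find directly, so an input that matches no sieve row would raise NoMethodError on nil there. A trailing all-nil catch-all row, as in the class example, avoids that case.

```ruby
# Hypothetical helper mirroring the per-cell comparison in #transform.
def sieve_cell_match?(sieve_value, input_value)
  return true if sieve_value.nil?   # nil is a wildcard
  # Regexp cells are pattern-matched against the input value.
  return !!sieve_value.match(input_value) if sieve_value.is_a?(Regexp)
  sieve_value == input_value        # everything else is compared literally
end

# A small single-variable sieve mixing a Regexp cell and a literal cell.
# It has no catch-all row on purpose, to show the no-match case.
PATTERN_SIEVE = [
  { program: /\ANURS/, group: :nursing },
  { program: 'CHEM',   group: :science }
].freeze

# Returns the group of the first matching sieve row.
# Assumption of this sketch: return nil (rather than raise) when nothing matches.
def pattern_group(program)
  hit = PATTERN_SIEVE.find do |sieve_row|
    sieve_cell_match?(sieve_row[:program], program)
  end
  hit && hit[:group]
end

%w[NURS-RN CHEM HIST].each do |program|
  puts "#{program} -> #{pattern_group(program).inspect}"
end
```

Here 'NURS-RN' is caught by the Regexp row, 'CHEM' falls through to the literal row, and 'HIST' matches nothing.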