Module: DaruLite::DataFrame::Aggregatable

Included in:
DaruLite::DataFrame
Defined in:
lib/daru_lite/data_frame/aggregatable.rb

Instance Method Summary

Instance Method Details

#aggregate(options = {}, multi_index_level = -1) ⇒ DaruLite::DataFrame

Aggregates the data in the dataframe, combining rows that share an index label.

Note: the `aggregate` method of the `GroupBy` class uses this `aggregate` method internally.

Examples:

df = DaruLite::DataFrame.new(
   {col: [:a, :b, :c, :d, :e], num: [52, 12, 7, 17, 1]})
=> #<DaruLite::DataFrame(5x2)>
     col num
   0   a  52
   1   b  12
   2   c   7
   3   d  17
   4   e   1

 df.aggregate(num_100_times: ->(df) { (df.num*100).first })
=> #<DaruLite::DataFrame(5x1)>
            num_100_ti
          0       5200
          1       1200
          2        700
          3       1700
          4        100

When the index contains duplicate labels:

idx = DaruLite::CategoricalIndex.new [:a, :b, :a, :a, :c]
df = DaruLite::DataFrame.new({num: [52, 12, 7, 17, 1]}, index: idx)
=> #<DaruLite::DataFrame(5x1)>
     num
   a  52
   b  12
   a   7
   a  17
   c   1

df.aggregate(num: :mean)
=> #<DaruLite::DataFrame(3x1)>
                   num
          a 25.3333333
          b         12
          c          1

Parameters:

  • options (Hash) (defaults to: {})

    maps each column wanted in the resulting dataframe to the aggregation that produces it (a Symbol such as :mean, or a lambda, as in the examples above)

Returns:

  • (DaruLite::DataFrame)

# File 'lib/daru_lite/data_frame/aggregatable.rb', line 85

def aggregate(options = {}, multi_index_level = -1)
  if block_given?
    positions_tuples, new_index = yield(@index) # NOTE: use of yield is private for now
  else
    positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)
  end

  colmn_value = aggregate_by_positions_tuples(options, positions_tuples)

  DaruLite::DataFrame.new(colmn_value, index: new_index, order: options.keys)
end
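
The source above splits the index into "position tuples" plus a new index, then aggregates each column over those tuples. The private helper `group_index_for_aggregation` is not shown on this page, so the following is only a plain-Ruby sketch of that idea, assuming rows with the same index label are collected together. The names `positions_by_label`, `mean`, and `aggregated` are illustrative and not part of DaruLite.

```ruby
# Plain-Ruby model of aggregating over a duplicated index.
# This is an assumption about the behaviour, not DaruLite's implementation.
index  = [:a, :b, :a, :a, :c]
column = { num: [52, 12, 7, 17, 1] }

# Collect the row positions that share each index label:
# :a -> [0, 2, 3], :b -> [1], :c -> [4]
positions_by_label = index.each_with_index
                          .group_by { |label, _pos| label }
                          .transform_values { |pairs| pairs.map(&:last) }

new_index        = positions_by_label.keys   # one label per output row
positions_tuples = positions_by_label.values # rows feeding each output row

# Hypothetical stand-in for the :mean aggregation.
mean = ->(values) { values.sum.to_f / values.size }

# Apply the aggregation to the rows of every tuple.
aggregated = {
  num: positions_tuples.map do |positions|
    mean.call(column[:num].values_at(*positions))
  end
}
```

Under these assumptions the label :a averages 52, 7 and 17, which agrees with the 25.3333333 shown in the duplicate-index example.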

#group_by(*vectors) ⇒ Object

Groups elements by vector so that operations can be performed on each group. Returns a DaruLite::Core::GroupBy object. See the DaruLite::Core::GroupBy docs for a detailed list of possible operations.

Arguments

  • vectors - An Array containing the names of the vectors to group by.

Usage

df = DaruLite::DataFrame.new({
  a: %w{foo bar foo bar   foo bar foo foo},
  b: %w{one one two three two two one three},
  c:   [1  ,2  ,3  ,1    ,3  ,6  ,3  ,8],
  d:   [11 ,22 ,33 ,44   ,55 ,66 ,77 ,88]
})
df.group_by([:a,:b,:c]).groups
#=> {["bar", "one", 2]=>[1],
# ["bar", "three", 1]=>[3],
# ["bar", "two", 6]=>[5],
# ["foo", "one", 1]=>[0],
# ["foo", "one", 3]=>[6],
# ["foo", "three", 8]=>[7],
# ["foo", "two", 3]=>[2, 4]}
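
To make the shape of `groups` concrete, here is a minimal plain-Ruby sketch that rebuilds the same mapping from tuples of values to row positions. It does not use DaruLite, and the `columns` hash is a hypothetical stand-in for the dataframe. The final sort is there only to match the ordering printed above.

```ruby
# Plain-Ruby model of the groups hash (not DaruLite's implementation).
columns = {
  a: %w[foo bar foo bar foo bar foo foo],
  b: %w[one one two three two two one three],
  c: [1, 2, 3, 1, 3, 6, 3, 8]
}
vectors = %i[a b c]

row_count = columns[vectors.first].size

# Key each row position by the tuple of its values in the grouping vectors.
groups = (0...row_count).group_by do |pos|
  vectors.map { |name| columns[name][pos] }
end

# Sort by key so the order matches the output shown above.
groups = groups.sort.to_h
```

Rows 2 and 4 both hold foo, two and 3, so they end up in the same group, as in the last line of the output above.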

Raises:

  • (ArgumentError) if any of the given vectors does not exist in the dataframe


# File 'lib/daru_lite/data_frame/aggregatable.rb', line 28

def group_by(*vectors)
  vectors.flatten!
  missing = vectors - @vectors.to_a
  raise(ArgumentError, "Vector(s) missing: #{missing.join(', ')}") unless missing.empty?

  vectors = [@vectors.first] if vectors.empty?

  DaruLite::Core::GroupBy.new(self, vectors)
end

#group_by_and_aggregate(*group_by_keys, **aggregation_map) ⇒ Object

Groups the dataframe by the given vectors and aggregates each group in a single call. Equivalent to group_by(*group_by_keys).aggregate(aggregation_map).



# File 'lib/daru_lite/data_frame/aggregatable.rb', line 97

def group_by_and_aggregate(*group_by_keys, **aggregation_map)
  group_by(*group_by_keys).aggregate(aggregation_map)
end
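
The calling convention (positional grouping keys followed by keyword aggregations) can be illustrated with a small plain-Ruby stand-in. This is a sketch only: the function below works on an array of row hashes rather than a DaruLite::DataFrame, and it assumes each reducer is the name of an Array method such as :sum or :max, which is narrower than what DaruLite may accept.

```ruby
# Hypothetical stand-in that mirrors the signature shape; not DaruLite's code.
def group_by_and_aggregate(rows, *group_by_keys, **aggregation_map)
  # Group rows by the tuple of values found under the grouping keys.
  grouped = rows.group_by { |row| row.values_at(*group_by_keys) }

  # For every group, reduce each requested column with its reducer.
  grouped.transform_values do |group|
    aggregation_map.to_h do |column, reducer|
      [column, group.map { |row| row[column] }.public_send(reducer)]
    end
  end
end

rows = [
  { a: 'foo', d: 11 }, { a: 'bar', d: 22 },
  { a: 'foo', d: 33 }, { a: 'bar', d: 44 }
]

# Positional argument: what to group by. Keyword argument: column => reducer.
result = group_by_and_aggregate(rows, :a, d: :sum)
# The 'foo' group sums d to 44 and the 'bar' group sums d to 66.
```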