Module: DaruLite::DataFrame::Pivotable

Included in:
DaruLite::DataFrame
Defined in:
lib/daru_lite/data_frame/pivotable.rb

Instance Method Summary

Instance Method Details

#pivot_table(opts = {}) ⇒ Object

Pivots a data frame on specified vectors and applies an aggregate function to quickly generate a summary.

Options

:index - Keys to group by on the pivot table row index. Pass vector names in an Array. Required; an ArgumentError is raised if it is missing or empty.

:vectors - Keys to group by on the pivot table column index. Pass vector names in an Array. Optional; when omitted, the result is the aggregate of the data frame grouped by :index.

:agg - Function used to aggregate the grouped values. Defaults to :mean. Accepts any of the statistics functions applicable to vectors, as defined in the DaruLite::Statistics::Vector module.

:values - Columns to aggregate. Optional; defaults to all numeric columns not named in :index or :vectors.
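
The default for :values can be pictured with a small plain-Ruby sketch. It assumes a hash of column name to values as a stand-in for a data frame, and default_pivot_values is a hypothetical helper written for illustration, not a DaruLite method. DaruLite's own test for "numeric" may differ, for example in how it treats nil.

```ruby
# Hypothetical stand-in for a data frame: column name => values.
columns = {
  a: %w[foo foo bar],
  b: %w[one two one],
  d: [1, 2, 3],
  e: [2, 4, 6]
}

# Illustrative helper (not part of DaruLite): keep the columns that are
# not used for grouping and whose values are all numeric.
def default_pivot_values(columns, index:, vectors: [])
  grouping = Array(index) + Array(vectors)
  columns.select { |name, values|
    !grouping.include?(name) && values.all? { |v| v.is_a?(Numeric) }
  }.keys
end

default_pivot_values(columns, index: [:a], vectors: [:b]) # => [:d, :e]
```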

Usage

df = DaruLite::DataFrame.new({
  a: ['foo'  ,  'foo',  'foo',  'foo',  'foo',  'bar',  'bar',  'bar',  'bar'],
  b: ['one'  ,  'one',  'one',  'two',  'two',  'one',  'one',  'two',  'two'],
  c: ['small','large','large','small','small','large','small','large','small'],
  d: [1,2,2,3,3,4,5,6,7],
  e: [2,4,4,6,6,8,10,12,14]
})
df.pivot_table(index: [:a], vectors: [:b], agg: :sum, values: :e)

#=>
# #<DaruLite::DataFrame:88342020 @name = 08cdaf4e-b154-4186-9084-e76dd191b2c9 @size = 2>
#            [:e, :one] [:e, :two]
#     [:bar]         18         26
#     [:foo]         10         12
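
To see where the numbers in that output come from, the same aggregation can be sketched in plain Ruby without DaruLite. This is only a conceptual model, not the library's implementation: pivot_sum is a hypothetical helper, the rows array copies columns a, b, and e from the data frame above, and only the :sum aggregate is covered.

```ruby
# Columns a, b, and e of the data frame above, one hash per row.
rows = [
  { a: 'foo', b: 'one', e: 2 },
  { a: 'foo', b: 'one', e: 4 },
  { a: 'foo', b: 'one', e: 4 },
  { a: 'foo', b: 'two', e: 6 },
  { a: 'foo', b: 'two', e: 6 },
  { a: 'bar', b: 'one', e: 8 },
  { a: 'bar', b: 'one', e: 10 },
  { a: 'bar', b: 'two', e: 12 },
  { a: 'bar', b: 'two', e: 14 }
]

# Hypothetical helper (not part of DaruLite): sum `value` for every
# combination of row key (from `index`) and column key (from `vectors`).
def pivot_sum(rows, index:, vectors:, value:)
  table = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  rows.each do |row|
    row_key = row.values_at(*index)              # e.g. ['bar']
    col_key = [value, *row.values_at(*vectors)]  # e.g. [:e, 'one']
    table[row_key][col_key] += row[value]
  end
  table
end

pivot = pivot_sum(rows, index: [:a], vectors: [:b], value: :e)
pivot[['bar']][[:e, 'one']] # => 18
pivot[['foo']][[:e, 'two']] # => 12
```

Each row key corresponds to a row label in the pivot table and each column key to a column label, which is why the labels in the output above appear as arrays.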

Raises:

  • (ArgumentError) if :index is missing or empty
  • (IndexError) if there are no numeric vectors to aggregate


# File 'lib/daru_lite/data_frame/pivotable.rb', line 38

def pivot_table(opts = {})
  raise ArgumentError, 'Specify grouping index' if Array(opts[:index]).empty?

  index               = opts[:index]
  vectors             = opts[:vectors] || []
  aggregate_function  = opts[:agg] || :mean
  values              = prepare_pivot_values index, vectors, opts
  raise IndexError, 'No numeric vectors to aggregate' if values.empty?

  grouped = group_by(index)
  return grouped.send(aggregate_function) if vectors.empty?

  super_hash = make_pivot_hash grouped, vectors, values, aggregate_function

  pivot_dataframe super_hash
end