Class: Polars::GroupBy
- Inherits: Object (Object > Polars::GroupBy)
- Defined in: lib/polars/group_by.rb
Overview
Starts a new GroupBy operation.
Instance Method Summary
- #agg(aggs) ⇒ DataFrame
  Use multiple aggregations on columns.
- #count ⇒ DataFrame
  Count the number of values in each group.
- #each ⇒ Object
  Allows iteration over the groups of the group by operation.
- #first ⇒ DataFrame
  Aggregate the first values in the group.
- #head(n = 5) ⇒ DataFrame
  Get the first n rows of each group.
- #last ⇒ DataFrame
  Aggregate the last values in the group.
- #max ⇒ DataFrame
  Reduce the groups to the maximal value.
- #mean ⇒ DataFrame
  Reduce the groups to the mean values.
- #median ⇒ DataFrame
  Return the median per group.
- #min ⇒ DataFrame
  Reduce the groups to the minimal value.
- #n_unique ⇒ DataFrame
  Count the unique values per group.
- #plot(*args, **options) ⇒ Vega::LiteChart
  Plot data.
- #quantile(quantile, interpolation: "nearest") ⇒ DataFrame
  Compute the quantile per group.
- #sum ⇒ DataFrame
  Reduce the groups to the sum.
- #tail(n = 5) ⇒ DataFrame
  Get the last n rows of each group.
Instance Method Details
#agg(aggs) ⇒ DataFrame
Use multiple aggregations on columns.
This can be combined with the complete lazy API and is considered idiomatic Polars.
    # File 'lib/polars/group_by.rb', line 138
    def agg(aggs)
      @df.lazy
        .group_by(@by, maintain_order: @maintain_order)
        .agg(aggs)
        .collect(no_optimization: true)
    end
#count ⇒ DataFrame
Count the number of values in each group.
    # File 'lib/polars/group_by.rb', line 417
    def count
      agg(Polars.len.alias("count"))
    end
#each ⇒ Object
Allows iteration over the groups of the group by operation.
    # File 'lib/polars/group_by.rb', line 35
    def each
      return to_enum(:each) unless block_given?

      temp_col = "__POLARS_GB_GROUP_INDICES"
      groups_df =
        @df.lazy
          .with_row_index(name: temp_col)
          .group_by(@by, maintain_order: @maintain_order)
          .agg(Polars.col(temp_col))
          .collect(no_optimization: true)

      group_names = groups_df.select(Polars.all.exclude(temp_col))

      # When grouping by a single column, group name is a single value
      # When grouping by multiple columns, group name is a tuple of values
      if @by.is_a?(::String) || @by.is_a?(Expr)
        _group_names = group_names.to_series.each
      else
        _group_names = group_names.iter_rows
      end

      _group_indices = groups_df.select(temp_col).to_series
      _current_index = 0

      while _current_index < _group_indices.length
        group_name = _group_names.next
        group_data = @df[_group_indices[_current_index]]
        _current_index += 1

        yield group_name, group_data
      end
    end
#first ⇒ DataFrame
Aggregate the first values in the group.
    # File 'lib/polars/group_by.rb', line 272
    def first
      agg(Polars.all.first)
    end
#head(n = 5) ⇒ DataFrame
Get the first n rows of each group.
    # File 'lib/polars/group_by.rb', line 189
    def head(n = 5)
      @df.lazy
        .group_by(@by, maintain_order: @maintain_order)
        .head(n)
        .collect(no_optimization: true)
    end
#last ⇒ DataFrame
Aggregate the last values in the group.
    # File 'lib/polars/group_by.rb', line 301
    def last
      agg(Polars.all.last)
    end
#max ⇒ DataFrame
Reduce the groups to the maximal value.
    # File 'lib/polars/group_by.rb', line 388
    def max
      agg(Polars.all.max)
    end
#mean ⇒ DataFrame
Reduce the groups to the mean values.
    # File 'lib/polars/group_by.rb', line 446
    def mean
      agg(Polars.all.mean)
    end
#median ⇒ DataFrame
Return the median per group.
    # File 'lib/polars/group_by.rb', line 533
    def median
      agg(Polars.all.median)
    end
#min ⇒ DataFrame
Reduce the groups to the minimal value.
    # File 'lib/polars/group_by.rb', line 359
    def min
      agg(Polars.all.min)
    end
#n_unique ⇒ DataFrame
Count the unique values per group.
    # File 'lib/polars/group_by.rb', line 473
    def n_unique
      agg(Polars.all.n_unique)
    end
#plot(*args, **options) ⇒ Vega::LiteChart
Plot data.
    # File 'lib/polars/group_by.rb', line 540
    def plot(*args, **options)
      raise ArgumentError, "Multiple groups not supported" if @by.is_a?(::Array) && @by.size > 1
      # same message as Ruby
      raise ArgumentError, "unknown keyword: :group" if options.key?(:group)

      @df.plot(*args, **options, group: @by)
    end
#quantile(quantile, interpolation: "nearest") ⇒ DataFrame
Compute the quantile per group.
    # File 'lib/polars/group_by.rb', line 506
    def quantile(quantile, interpolation: "nearest")
      agg(Polars.all.quantile(quantile, interpolation: interpolation))
    end
#sum ⇒ DataFrame
Reduce the groups to the sum.
    # File 'lib/polars/group_by.rb', line 330
    def sum
      agg(Polars.all.sum)
    end
#tail(n = 5) ⇒ DataFrame
Get the last n rows of each group.
    # File 'lib/polars/group_by.rb', line 240
    def tail(n = 5)
      @df.lazy
        .group_by(@by, maintain_order: @maintain_order)
        .tail(n)
        .collect(no_optimization: true)
    end