Class: Polars::GroupBy

Inherits: Object

Defined in: lib/polars/group_by.rb

Overview

Starts a new GroupBy operation.

Instance Method Summary

#agg(aggs) ⇒ DataFrame
Use multiple aggregations on columns.
#agg_list ⇒ DataFrame
Aggregate the groups into Series.
#count ⇒ DataFrame
Count the number of values in each group.
#first ⇒ DataFrame
Aggregate the first values in the group.
#head(n = 5) ⇒ DataFrame
Get the first n rows of each group.
#last ⇒ DataFrame
Aggregate the last values in the group.
#max ⇒ DataFrame
Reduce the groups to the maximal value.
#mean ⇒ DataFrame
Reduce the groups to the mean values.
#median ⇒ DataFrame
Return the median per group.
#min ⇒ DataFrame
Reduce the groups to the minimal value.
#n_unique ⇒ DataFrame
Count the unique values per group.
#quantile(quantile, interpolation: "nearest") ⇒ DataFrame
Compute the quantile per group.
#sum ⇒ DataFrame
Reduce the groups to the sum.
#tail(n = 5) ⇒ DataFrame
Get the last n rows of each group.

Instance Method Details

#agg(aggs) ⇒ `DataFrame`

Use multiple aggregations on columns.

This can be combined with the complete lazy API and is considered idiomatic Polars.

Examples:

df = Polars::DataFrame.new(
  {"foo" => ["one", "two", "two", "one", "two"], "bar" => [5, 3, 2, 4, 1]}
)
df.groupby("foo", maintain_order: true).agg(
  [
    Polars.sum("bar").suffix("_sum"),
    Polars.col("bar").sort.tail(2).sum.suffix("_tail_sum")
  ]
)
# =>
# shape: (2, 3)
# ┌─────┬─────────┬──────────────┐
# │ foo ┆ bar_sum ┆ bar_tail_sum │
# │ --- ┆ ---     ┆ ---          │
# │ str ┆ i64     ┆ i64          │
# ╞═════╪═════════╪══════════════╡
# │ one ┆ 9       ┆ 9            │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ two ┆ 6       ┆ 5            │
# └─────┴─────────┴──────────────┘

Parameters:

aggs (Object) —
Single / multiple aggregation expression(s).

Returns:

(DataFrame)

# File 'lib/polars/group_by.rb', line 89

def agg(aggs)
  df = Utils.wrap_df(_df)
    .lazy
    .groupby(by, maintain_order: maintain_order)
    .agg(aggs)
    .collect(no_optimization: true, string_cache: false)
  _dataframe_class._from_rbdf(df._df)
end
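
The per-group arithmetic in the example above can be checked without Polars. The following plain-Ruby sketch reproduces the `bar_sum` and `bar_tail_sum` columns with `Enumerable#group_by`. The helper name `grouped_bar_sums` is hypothetical, and the code only illustrates what the two expressions compute, not how Polars evaluates them.

```ruby
# Plain-Ruby illustration only; grouped_bar_sums is a hypothetical helper,
# not part of polars-df.
def grouped_bar_sums(rows)
  # Hash keys keep first-seen order, mirroring maintain_order: true.
  rows.group_by { |foo, _bar| foo }.map do |foo, group|
    bars = group.map { |_foo, bar| bar }
    {
      "foo" => foo,
      "bar_sum" => bars.sum,                   # Polars.sum("bar")
      "bar_tail_sum" => bars.sort.last(2).sum  # Polars.col("bar").sort.tail(2).sum
    }
  end
end

rows = [["one", 5], ["two", 3], ["two", 2], ["one", 4], ["two", 1]]
grouped_bar_sums(rows).each { |row| puts row.values.join(" | ") }
# prints:
# one | 9 | 9
# two | 6 | 5
```

For group "two" the sorted values are 1, 2, 3, so the last two sum to 5, which matches the `bar_tail_sum` column in the table.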

#agg_list ⇒ `DataFrame`

Aggregate the groups into Series.

Examples:

df = Polars::DataFrame.new({"a" => ["one", "two", "one", "two"], "b" => [1, 2, 3, 4]})
df.groupby("a", maintain_order: true).agg_list
# =>
# shape: (2, 2)
# ┌─────┬───────────┐
# │ a   ┆ b         │
# │ --- ┆ ---       │
# │ str ┆ list[i64] │
# ╞═════╪═══════════╡
# │ one ┆ [1, 3]    │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
# │ two ┆ [2, 4]    │
# └─────┴───────────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 554

def agg_list
  agg(Polars.all.list)
end

#count ⇒ `DataFrame`

Count the number of values in each group.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "c" => [true, true, true, false, false, true],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).count
# =>
# shape: (3, 2)
# ┌────────┬───────┐
# │ d      ┆ count │
# │ ---    ┆ ---   │
# │ str    ┆ u32   │
# ╞════════╪═══════╡
# │ Apple  ┆ 3     │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Orange ┆ 1     │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Banana ┆ 2     │
# └────────┴───────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 410

def count
  agg(Polars.count)
end

#first ⇒ `DataFrame`

Aggregate the first values in the group.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "c" => [true, true, true, false, false, true],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).first
# =>
# shape: (3, 4)
# ┌────────┬─────┬──────┬───────┐
# │ d      ┆ a   ┆ b    ┆ c     │
# │ ---    ┆ --- ┆ ---  ┆ ---   │
# │ str    ┆ i64 ┆ f64  ┆ bool  │
# ╞════════╪═════╪══════╪═══════╡
# │ Apple  ┆ 1   ┆ 0.5  ┆ true  │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Orange ┆ 2   ┆ 0.5  ┆ true  │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Banana ┆ 4   ┆ 13.0 ┆ false │
# └────────┴─────┴──────┴───────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 255

def first
  agg(Polars.all.first)
end

#head(n = 5) ⇒ `DataFrame`

Get the first n rows of each group.

Examples:

df = Polars::DataFrame.new(
  {
    "letters" => ["c", "c", "a", "c", "a", "b"],
    "nrs" => [1, 2, 3, 4, 5, 6]
  }
)
# =>
# shape: (6, 2)
# ┌─────────┬─────┐
# │ letters ┆ nrs │
# │ ---     ┆ --- │
# │ str     ┆ i64 │
# ╞═════════╪═════╡
# │ c       ┆ 1   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 2   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ a       ┆ 3   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 4   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ a       ┆ 5   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ b       ┆ 6   │
# └─────────┴─────┘

df.groupby("letters").head(2).sort("letters")
# =>
# shape: (5, 2)
# ┌─────────┬─────┐
# │ letters ┆ nrs │
# │ ---     ┆ --- │
# │ str     ┆ i64 │
# ╞═════════╪═════╡
# │ a       ┆ 3   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ a       ┆ 5   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ b       ┆ 6   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 1   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 2   │
# └─────────┴─────┘

Parameters:

n (Integer) (defaults to: 5) —
Number of rows to return.

Returns:

(DataFrame)

# File 'lib/polars/group_by.rb', line 151

def head(n = 5)
  df = (
    Utils.wrap_df(_df)
      .lazy
      .groupby(by, maintain_order: maintain_order)
      .head(n)
      .collect(no_optimization: true, string_cache: false)
  )
  _dataframe_class._from_rbdf(df._df)
end
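
As a rough model of the per-group behaviour, the sketch below takes at most the first n rows of each group in plain Ruby and reproduces the five-row result shown above. `head_per_group` is a hypothetical helper and not part of polars-df. It describes the observable result only, not the lazy query that the method actually runs.

```ruby
# Plain-Ruby illustration only; head_per_group is a hypothetical helper.
def head_per_group(rows, n = 5)
  # Keep at most the first n rows of every group, in original row order.
  rows.group_by { |letter, _nr| letter }
      .flat_map { |_letter, group| group.first(n) }
end

rows = [["c", 1], ["c", 2], ["a", 3], ["c", 4], ["a", 5], ["b", 6]]
# Array#sort compares the [letter, nr] pairs element by element.
head_per_group(rows, 2).sort.each { |letter, nr| puts "#{letter} #{nr}" }
# prints:
# a 3
# a 5
# b 6
# c 1
# c 2
```

Group "b" has a single row, so it contributes one row even though n is 2.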

#last ⇒ `DataFrame`

Aggregate the last values in the group.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "c" => [true, true, true, false, false, true],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).last
# =>
# shape: (3, 4)
# ┌────────┬─────┬──────┬───────┐
# │ d      ┆ a   ┆ b    ┆ c     │
# │ ---    ┆ --- ┆ ---  ┆ ---   │
# │ str    ┆ i64 ┆ f64  ┆ bool  │
# ╞════════╪═════╪══════╪═══════╡
# │ Apple  ┆ 3   ┆ 10.0 ┆ false │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Orange ┆ 2   ┆ 0.5  ┆ true  │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Banana ┆ 5   ┆ 14.0 ┆ true  │
# └────────┴─────┴──────┴───────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 286

def last
  agg(Polars.all.last)
end

#max ⇒ `DataFrame`

Reduce the groups to the maximal value.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "c" => [true, true, true, false, false, true],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).max
# =>
# shape: (3, 4)
# ┌────────┬─────┬──────┬──────┐
# │ d      ┆ a   ┆ b    ┆ c    │
# │ ---    ┆ --- ┆ ---  ┆ ---  │
# │ str    ┆ i64 ┆ f64  ┆ bool │
# ╞════════╪═════╪══════╪══════╡
# │ Apple  ┆ 3   ┆ 10.0 ┆ true │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ Orange ┆ 2   ┆ 0.5  ┆ true │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ Banana ┆ 5   ┆ 14.0 ┆ true │
# └────────┴─────┴──────┴──────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 379

def max
  agg(Polars.all.max)
end

#mean ⇒ `DataFrame`

Reduce the groups to the mean values.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "c" => [true, true, true, false, false, true],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).mean
# =>
# shape: (3, 4)
# ┌────────┬─────┬──────────┬──────────┐
# │ d      ┆ a   ┆ b        ┆ c        │
# │ ---    ┆ --- ┆ ---      ┆ ---      │
# │ str    ┆ f64 ┆ f64      ┆ f64      │
# ╞════════╪═════╪══════════╪══════════╡
# │ Apple  ┆ 2.0 ┆ 4.833333 ┆ 0.666667 │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
# │ Orange ┆ 2.0 ┆ 0.5      ┆ 1.0      │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
# │ Banana ┆ 4.5 ┆ 13.5     ┆ 0.5      │
# └────────┴─────┴──────────┴──────────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 441

def mean
  agg(Polars.all.mean)
end
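
The `c` column in the example is boolean, yet its mean is a float (0.666667 for Apple). That output is consistent with treating true as 1 and false as 0 before averaging. The sketch below shows that reading in plain Ruby. `mean_of` is a hypothetical helper, and the boolean-to-number mapping is an assumption inferred from the table, not a statement about Polars internals.

```ruby
# Plain-Ruby illustration only; mean_of is a hypothetical helper.
def mean_of(values)
  # Map booleans to 1/0 so they can be averaged like numbers.
  numbers = values.map do |v|
    case v
    when true then 1
    when false then 0
    else v
    end
  end
  numbers.sum.to_f / numbers.length
end

apple_c = [true, true, false]   # "c" values of the Apple rows
puts mean_of(apple_c).round(6)  # prints 0.666667
```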

#median ⇒ `DataFrame`

Return the median per group.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "d" => ["Apple", "Banana", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).median
# =>
# shape: (2, 3)
# ┌────────┬─────┬──────┐
# │ d      ┆ a   ┆ b    │
# │ ---    ┆ --- ┆ ---  │
# │ str    ┆ f64 ┆ f64  │
# ╞════════╪═════╪══════╡
# │ Apple  ┆ 2.0 ┆ 4.0  │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ Banana ┆ 4.0 ┆ 13.0 │
# └────────┴─────┴──────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 532

def median
  agg(Polars.all.median)
end

#min ⇒ `DataFrame`

Reduce the groups to the minimal value.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "c" => [true, true, true, false, false, true],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"],
  }
)
df.groupby("d", maintain_order: true).min
# =>
# shape: (3, 4)
# ┌────────┬─────┬──────┬───────┐
# │ d      ┆ a   ┆ b    ┆ c     │
# │ ---    ┆ --- ┆ ---  ┆ ---   │
# │ str    ┆ i64 ┆ f64  ┆ bool  │
# ╞════════╪═════╪══════╪═══════╡
# │ Apple  ┆ 1   ┆ 0.5  ┆ false │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Orange ┆ 2   ┆ 0.5  ┆ true  │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ Banana ┆ 4   ┆ 13.0 ┆ false │
# └────────┴─────┴──────┴───────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 348

def min
  agg(Polars.all.min)
end

#n_unique ⇒ `DataFrame`

Count the unique values per group.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 1, 3, 4, 5],
    "b" => [0.5, 0.5, 0.5, 10, 13, 14],
    "d" => ["Apple", "Banana", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).n_unique
# =>
# shape: (2, 3)
# ┌────────┬─────┬─────┐
# │ d      ┆ a   ┆ b   │
# │ ---    ┆ --- ┆ --- │
# │ str    ┆ u32 ┆ u32 │
# ╞════════╪═════╪═════╡
# │ Apple  ┆ 2   ┆ 2   │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ Banana ┆ 3   ┆ 3   │
# └────────┴─────┴─────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 469

def n_unique
  agg(Polars.all.n_unique)
end

#quantile(quantile, interpolation: "nearest") ⇒ `DataFrame`

Compute the quantile per group.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).quantile(1)
# =>
# shape: (3, 3)
# ┌────────┬─────┬──────┐
# │ d      ┆ a   ┆ b    │
# │ ---    ┆ --- ┆ ---  │
# │ str    ┆ f64 ┆ f64  │
# ╞════════╪═════╪══════╡
# │ Apple  ┆ 3.0 ┆ 10.0 │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ Orange ┆ 2.0 ┆ 0.5  │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ Banana ┆ 5.0 ┆ 14.0 │
# └────────┴─────┴──────┘

Parameters:

quantile (Float) —
Quantile between 0.0 and 1.0.
interpolation ("nearest", "higher", "lower", "midpoint", "linear") (defaults to: "nearest") —
Interpolation method.

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 504

def quantile(quantile, interpolation: "nearest")
  agg(Polars.all.quantile(quantile, interpolation: interpolation))
end
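
To give a feel for the five interpolation names, here is a plain-Ruby sketch for a single group. It assumes the common textbook definition, where the quantile sits at fractional position `q * (n - 1)` in the sorted values. `quantile_of` is a hypothetical helper, and Polars' exact index and tie-breaking rules may differ between versions, so treat this as an approximation of the semantics and not as the library's algorithm.

```ruby
# Plain-Ruby illustration only; quantile_of is a hypothetical helper and
# the position formula q * (n - 1) is an assumption, not taken from Polars.
def quantile_of(values, q, interpolation: "nearest")
  raise ArgumentError, "values must not be empty" if values.empty?
  raise ArgumentError, "quantile must be between 0.0 and 1.0" unless (0.0..1.0).cover?(q)

  sorted = values.sort
  pos = q.to_f * (sorted.length - 1) # fractional index into sorted values
  lo = sorted[pos.floor]
  hi = sorted[pos.ceil]

  case interpolation
  when "lower"    then lo.to_f
  when "higher"   then hi.to_f
  when "nearest"  then sorted[pos.round].to_f
  when "midpoint" then (lo + hi) / 2.0
  when "linear"   then lo + (hi - lo) * (pos - pos.floor)
  else raise ArgumentError, "unknown interpolation: #{interpolation}"
  end
end

puts quantile_of([1, 2, 3], 1)                                # prints 3.0
puts quantile_of([1, 2, 3, 4], 0.5, interpolation: "linear")  # prints 2.5
```

With a quantile of 1 every interpolation method selects the largest value, which is why the example above returns each group's maximum as a float.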

#sum ⇒ `DataFrame`

Reduce the groups to the sum.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 2, 2, 3, 4, 5],
    "b" => [0.5, 0.5, 4, 10, 13, 14],
    "c" => [true, true, true, false, false, true],
    "d" => ["Apple", "Orange", "Apple", "Apple", "Banana", "Banana"]
  }
)
df.groupby("d", maintain_order: true).sum
# =>
# shape: (3, 4)
# ┌────────┬─────┬──────┬─────┐
# │ d      ┆ a   ┆ b    ┆ c   │
# │ ---    ┆ --- ┆ ---  ┆ --- │
# │ str    ┆ i64 ┆ f64  ┆ u32 │
# ╞════════╪═════╪══════╪═════╡
# │ Apple  ┆ 6   ┆ 14.5 ┆ 2   │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ Orange ┆ 2   ┆ 0.5  ┆ 1   │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ Banana ┆ 9   ┆ 27.0 ┆ 1   │
# └────────┴─────┴──────┴─────┘

Returns:

(DataFrame)



# File 'lib/polars/group_by.rb', line 317

def sum
  agg(Polars.all.sum)
end

#tail(n = 5) ⇒ `DataFrame`

Get the last n rows of each group.

Examples:

df = Polars::DataFrame.new(
  {
    "letters" => ["c", "c", "a", "c", "a", "b"],
    "nrs" => [1, 2, 3, 4, 5, 6]
  }
)
# =>
# shape: (6, 2)
# ┌─────────┬─────┐
# │ letters ┆ nrs │
# │ ---     ┆ --- │
# │ str     ┆ i64 │
# ╞═════════╪═════╡
# │ c       ┆ 1   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 2   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ a       ┆ 3   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 4   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ a       ┆ 5   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ b       ┆ 6   │
# └─────────┴─────┘

df.groupby("letters").tail(2).sort("letters")
# =>
# shape: (5, 2)
# ┌─────────┬─────┐
# │ letters ┆ nrs │
# │ ---     ┆ --- │
# │ str     ┆ i64 │
# ╞═════════╪═════╡
# │ a       ┆ 3   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ a       ┆ 5   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ b       ┆ 6   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 2   │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ c       ┆ 4   │
# └─────────┴─────┘

Parameters:

n (Integer) (defaults to: 5) —
Number of rows to return.

Returns:

(DataFrame)

# File 'lib/polars/group_by.rb', line 215

def tail(n = 5)
  df = (
    Utils.wrap_df(_df)
      .lazy
      .groupby(by, maintain_order: maintain_order)
      .tail(n)
      .collect(no_optimization: true, string_cache: false)
  )
  _dataframe_class._from_rbdf(df._df)
end
