Class: Polars::LazyFrame

Inherits: Object

Defined in: lib/polars/lazy_frame.rb

Overview

Representation of a lazy computation graph/query against a DataFrame.

Class Method Summary

.read_json(file) ⇒ LazyFrame
Read a logical plan from a JSON file to construct a LazyFrame.

Instance Method Summary

#cache ⇒ LazyFrame
Cache the result once the execution of the physical plan hits this node.
#cleared ⇒ LazyFrame
Create an empty copy of the current LazyFrame.
#collect(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect into a DataFrame.
#columns ⇒ Array
Get column names.
#describe_optimized_plan(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ String
Create a string representation of the optimized query plan.
#describe_plan ⇒ String
Create a string representation of the unoptimized query plan.
#drop(columns) ⇒ LazyFrame
Remove one or multiple columns from a DataFrame.
#drop_nulls(subset: nil) ⇒ LazyFrame
Drop rows with null values from this LazyFrame.
#dtypes ⇒ Array
Get dtypes of columns in LazyFrame.
#explode(columns) ⇒ LazyFrame
Explode lists to long format.
#fetch(n_rows = 500, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ DataFrame
Collect a small number of rows for debugging purposes.
#fill_nan(fill_value) ⇒ LazyFrame
Fill floating point NaN values.
#fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil) ⇒ LazyFrame
Fill null values using the specified value or strategy.
#filter(predicate) ⇒ LazyFrame
Filter the rows in the DataFrame based on a predicate expression.
#first ⇒ LazyFrame
Get the first row of the DataFrame.
#groupby(by, maintain_order: false) ⇒ LazyGroupBy
Start a groupby operation.
#groupby_dynamic(index_column, every:, period: nil, offset: nil, truncate: true, include_boundaries: false, closed: "left", by: nil, start_by: "window") ⇒ LazyGroupBy
Group based on a time value (or index value of type :i32, :i64).
#groupby_rolling(index_column:, period:, offset: nil, closed: "right", by: nil) ⇒ LazyGroupBy
Create rolling groups based on a time column.
#head(n = 5) ⇒ LazyFrame
Get the first n rows.
#include?(key) ⇒ Boolean
Check if LazyFrame includes key.
#interpolate ⇒ LazyFrame
Interpolate intermediate values.
#join(other, left_on: nil, right_on: nil, on: nil, how: "inner", suffix: "_right", allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Add a join operation to the Logical Plan.
#join_asof(other, left_on: nil, right_on: nil, on: nil, by_left: nil, by_right: nil, by: nil, strategy: "backward", suffix: "_right", tolerance: nil, allow_parallel: true, force_parallel: false) ⇒ LazyFrame
Perform an asof join.
#last ⇒ LazyFrame
Get the last row of the DataFrame.
#lazy ⇒ LazyFrame
Return lazy representation, i.e.
#limit(n = 5) ⇒ LazyFrame
Get the first n rows.
#max ⇒ LazyFrame
Aggregate the columns in the DataFrame to their maximum value.
#mean ⇒ LazyFrame
Aggregate the columns in the DataFrame to their mean value.
#median ⇒ LazyFrame
Aggregate the columns in the DataFrame to their median value.
#melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil) ⇒ LazyFrame
Unpivot a DataFrame from wide to long format.
#min ⇒ LazyFrame
Aggregate the columns in the DataFrame to their minimum value.
#pipe(func, *args, **kwargs, &block) ⇒ LazyFrame
Offers a structured way to apply a sequence of user-defined functions (UDFs).
#quantile(quantile, interpolation: "nearest") ⇒ LazyFrame
Aggregate the columns in the DataFrame to their quantile value.
#rename(mapping) ⇒ LazyFrame
Rename column names.
#reverse ⇒ LazyFrame
Reverse the DataFrame.
#schema ⇒ Hash
Get the schema.
#select(exprs) ⇒ LazyFrame
Select columns from this DataFrame.
#shift(periods) ⇒ LazyFrame
Shift the values by a given period.
#shift_and_fill(periods, fill_value) ⇒ LazyFrame
Shift the values by a given period and fill the resulting null values.
#slice(offset, length = nil) ⇒ LazyFrame
Get a slice of this DataFrame.
#sort(by, reverse: false, nulls_last: false) ⇒ LazyFrame
Sort the DataFrame.
#std(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their standard deviation value.
#sum ⇒ LazyFrame
Aggregate the columns in the DataFrame to their sum value.
#tail(n = 5) ⇒ LazyFrame
Get the last n rows.
#take_every(n) ⇒ LazyFrame
Take every nth row in the LazyFrame and return as a new LazyFrame.
#to_s ⇒ String
Returns a string representing the LazyFrame.
#unique(maintain_order: true, subset: nil, keep: "first") ⇒ LazyFrame
Drop duplicate rows from this DataFrame.
#unnest(names) ⇒ LazyFrame
Decompose a struct into its fields.
#var(ddof: 1) ⇒ LazyFrame
Aggregate the columns in the DataFrame to their variance value.
#width ⇒ Integer
Get the width of the LazyFrame.
#with_column(column) ⇒ LazyFrame
Add or overwrite column in a DataFrame.
#with_columns(exprs) ⇒ LazyFrame
Add or overwrite multiple columns in a DataFrame.
#with_context(other) ⇒ LazyFrame
Add an external context to the computation graph.
#with_row_count(name: "row_nr", offset: 0) ⇒ LazyFrame
Add a column at index 0 that counts the rows.
#write_json(file) ⇒ nil
Write the logical plan of this LazyFrame to a file or string in JSON format.

Class Method Details

.read_json(file) ⇒ `LazyFrame`

Read a logical plan from a JSON file to construct a LazyFrame.

# File 'lib/polars/lazy_frame.rb', line 158

def self.read_json(file)
  if file.is_a?(String) || (defined?(Pathname) && file.is_a?(Pathname))
    file = Utils.format_path(file)
  end

  Utils.wrap_ldf(RbLazyFrame.read_json(file))
end

Instance Method Details

#cache ⇒ `LazyFrame`

Cache the result once the execution of the physical plan hits this node.




# File 'lib/polars/lazy_frame.rb', line 591

def cache
  _from_rbldf(_ldf.cache)
end

#cleared ⇒ `LazyFrame`

Create an empty copy of the current LazyFrame.

The copy has an identical schema but no data.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [nil, 2, 3, 4],
    "b" => [0.5, nil, 2.5, 13],
    "c" => [true, true, false, nil],
  }
).lazy
df.cleared.fetch
# =>
# shape: (0, 3)
# ┌─────┬─────┬──────┐
# │ a   ┆ b   ┆ c    │
# │ --- ┆ --- ┆ ---  │
# │ i64 ┆ f64 ┆ bool │
# ╞═════╪═════╪══════╡
# └─────┴─────┴──────┘




# File 'lib/polars/lazy_frame.rb', line 618

def cleared
  DataFrame.new(columns: schema).lazy
end

#collect(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ `DataFrame`

Collect into a DataFrame.

Note: use #fetch if you want to run your query on the first n rows only. This can be a huge time saver in debugging queries.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => ["a", "b", "a", "b", "b", "c"],
    "b" => [1, 2, 3, 4, 5, 6],
    "c" => [6, 5, 4, 3, 2, 1]
  }
).lazy
df.groupby("a", maintain_order: true).agg(Polars.all.sum).collect
# =>
# shape: (3, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ str ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ a   ┆ 4   ┆ 10  │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ b   ┆ 11  ┆ 10  │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ c   ┆ 6   ┆ 1   │
# └─────┴─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 449

def collect(
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  string_cache: false,
  no_optimization: false,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
    common_subplan_elimination = false
  end

  if allow_streaming
    common_subplan_elimination = false
  end

  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming
  )
  Utils.wrap_df(ldf.collect)
end
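
The flag handling in the listing above can be modelled in plain Ruby. The following is a minimal sketch, not part of the library: `resolve_optimizations` is a hypothetical helper that mirrors only the two overrides shown (`no_optimization` and `allow_streaming`) and never calls into Polars.

```ruby
# Hypothetical helper: models how #collect adjusts its optimization flags
# before passing them on. It only returns the resolved flags as a Hash.
def resolve_optimizations(no_optimization: false, allow_streaming: false, **flags)
  defaults = {
    type_coercion: true,
    predicate_pushdown: true,
    projection_pushdown: true,
    simplify_expression: true,
    slice_pushdown: true,
    common_subplan_elimination: true
  }
  resolved = defaults.merge(flags)

  # no_optimization switches off the pushdowns and subplan elimination only.
  if no_optimization
    %i[predicate_pushdown projection_pushdown slice_pushdown common_subplan_elimination].each do |key|
      resolved[key] = false
    end
  end

  # Streaming also disables common subplan elimination.
  resolved[:common_subplan_elimination] = false if allow_streaming

  resolved.merge(allow_streaming: allow_streaming)
end

unoptimized = resolve_optimizations(no_optimization: true)
streaming = resolve_optimizations(allow_streaming: true)
```

As in the listing, `no_optimization` leaves `type_coercion` and `simplify_expression` at whatever value the caller passed.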

#columns ⇒ `Array`

Get column names.

Examples:

df = (
   Polars::DataFrame.new(
     {
       "foo" => [1, 2, 3],
       "bar" => [6, 7, 8],
       "ham" => ["a", "b", "c"]
     }
   )
   .lazy
   .select(["foo", "bar"])
)
df.columns
# => ["foo", "bar"]




# File 'lib/polars/lazy_frame.rb', line 184

def columns
  _ldf.columns
end

#describe_optimized_plan(type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ `String`

Create a string representation of the optimized query plan.

# File 'lib/polars/lazy_frame.rb', line 321

def describe_optimized_plan(
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming,
  )

  ldf.describe_optimized_plan
end

#describe_plan ⇒ `String`

Create a string representation of the unoptimized query plan.




# File 'lib/polars/lazy_frame.rb', line 314

def describe_plan
  _ldf.describe_plan
end

#drop(columns) ⇒ `LazyFrame`

Remove one or multiple columns from a DataFrame.

# File 'lib/polars/lazy_frame.rb', line 1656

def drop(columns)
  if columns.is_a?(String)
    columns = [columns]
  end
  _from_rbldf(_ldf.drop_columns(columns))
end

#drop_nulls(subset: nil) ⇒ `LazyFrame`

Drop rows with null values from this LazyFrame.

Examples:

df = Polars::DataFrame.new(
  {
    "foo" => [1, 2, 3],
    "bar" => [6, nil, 8],
    "ham" => ["a", "b", "c"]
  }
)
df.lazy.drop_nulls.collect
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ foo ┆ bar ┆ ham │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ str │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 6   ┆ a   │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 3   ┆ 8   ┆ c   │
# └─────┴─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 2262

def drop_nulls(subset: nil)
  if !subset.nil? && !subset.is_a?(Array)
    subset = [subset]
  end
  _from_rbldf(_ldf.drop_nulls(subset))
end

#dtypes ⇒ `Array`

Get dtypes of columns in LazyFrame.

Examples:

lf = Polars::DataFrame.new(
  {
    "foo" => [1, 2, 3],
    "bar" => [6.0, 7.0, 8.0],
    "ham" => ["a", "b", "c"]
  }
).lazy
lf.dtypes
# => [Polars::Int64, Polars::Float64, Polars::Utf8]




# File 'lib/polars/lazy_frame.rb', line 202

def dtypes
  _ldf.dtypes
end

#explode(columns) ⇒ `LazyFrame`

Explode lists to long format.

Examples:

df = Polars::DataFrame.new(
  {
    "letters" => ["a", "a", "b", "c"],
    "numbers" => [[1], [2, 3], [4, 5], [6, 7, 8]],
  }
).lazy
df.explode("numbers").collect
# =>
# shape: (8, 2)
# ┌─────────┬─────────┐
# │ letters ┆ numbers │
# │ ---     ┆ ---     │
# │ str     ┆ i64     │
# ╞═════════╪═════════╡
# │ a       ┆ 1       │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ a       ┆ 2       │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ a       ┆ 3       │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ b       ┆ 4       │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ b       ┆ 5       │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ c       ┆ 6       │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ c       ┆ 7       │
# ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
# │ c       ┆ 8       │
# └─────────┴─────────┘

# File 'lib/polars/lazy_frame.rb', line 2209

def explode(columns)
  columns = Utils.selection_to_rbexpr_list(columns)
  _from_rbldf(_ldf.explode(columns))
end

#fetch(n_rows = 500, type_coercion: true, predicate_pushdown: true, projection_pushdown: true, simplify_expression: true, string_cache: false, no_optimization: false, slice_pushdown: true, common_subplan_elimination: true, allow_streaming: false) ⇒ `DataFrame`

Collect a small number of rows for debugging purposes.

Fetch is like a #collect operation, but it overwrites the number of rows read by every scan operation. This is a utility that helps debug a query on a smaller number of rows.

Note that the fetch does not guarantee the final number of rows in the DataFrame. Filter, join operations and a lower number of rows available in the scanned file influence the final number of rows.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => ["a", "b", "a", "b", "b", "c"],
    "b" => [1, 2, 3, 4, 5, 6],
    "c" => [6, 5, 4, 3, 2, 1]
  }
).lazy
df.groupby("a", maintain_order: true).agg(Polars.all.sum).fetch(2)
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ str ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ a   ┆ 1   ┆ 6   │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ b   ┆ 2   ┆ 5   │
# └─────┴─────┴─────┘
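
The output above can be reproduced with a plain-Ruby model: limiting the rows at the scan, before aggregating, gives a different result from aggregating everything and trimming afterwards. This is only an illustration of that ordering, assuming rows stored as arrays; `sum_by_key` is a hypothetical helper and nothing here uses Polars.

```ruby
# Rows are [a, b, c], matching the example frame above.
rows = [
  ["a", 1, 6], ["b", 2, 5], ["a", 3, 4],
  ["b", 4, 3], ["b", 5, 2], ["c", 6, 1]
]

# Hypothetical helper: group by the first field and sum the other two,
# keeping groups in first-seen order.
def sum_by_key(rows)
  rows.group_by(&:first).map do |key, group|
    [key, group.sum { |r| r[1] }, group.sum { |r| r[2] }]
  end
end

# Limit applied at the scan, then aggregate (the fetch-like ordering).
fetch_like = sum_by_key(rows.first(2))
# Aggregate everything, then keep the first two groups (the head-like ordering).
head_like = sum_by_key(rows).first(2)
```

Here `fetch_like` holds the two rows shown in the output above, while `head_like` holds the fully aggregated sums for the same two keys.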

# File 'lib/polars/lazy_frame.rb', line 537

def fetch(
  n_rows = 500,
  type_coercion: true,
  predicate_pushdown: true,
  projection_pushdown: true,
  simplify_expression: true,
  string_cache: false,
  no_optimization: false,
  slice_pushdown: true,
  common_subplan_elimination: true,
  allow_streaming: false
)
  if no_optimization
    predicate_pushdown = false
    projection_pushdown = false
    slice_pushdown = false
    common_subplan_elimination = false
  end

  ldf = _ldf.optimization_toggle(
    type_coercion,
    predicate_pushdown,
    projection_pushdown,
    simplify_expression,
    slice_pushdown,
    common_subplan_elimination,
    allow_streaming
  )
  Utils.wrap_df(ldf.fetch(n_rows))
end

#fill_nan(fill_value) ⇒ `LazyFrame`

Note:

Floating point NaN (Not a Number) values are not missing values. To replace missing values, use #fill_null instead.

Fill floating point NaN values.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1.5, 2, Float::NAN, 4],
    "b" => [0.5, 4, Float::NAN, 13],
  }
).lazy
df.fill_nan(99).collect
# =>
# shape: (4, 2)
# ┌──────┬──────┐
# │ a    ┆ b    │
# │ ---  ┆ ---  │
# │ f64  ┆ f64  │
# ╞══════╪══════╡
# │ 1.5  ┆ 0.5  │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 2.0  ┆ 4.0  │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 99.0 ┆ 99.0 │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 4.0  ┆ 13.0 │
# └──────┴──────┘
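
The distinction between NaN and a missing value also exists in plain Ruby, where `Float::NAN` and `nil` are different objects. The sketch below is an analogy on an ordinary Array, not library code; the fill values 99.0 and 0.0 are arbitrary.

```ruby
values = [1.5, 2.0, Float::NAN, nil]

# Replace NaN only; nil (the "missing" value) is left alone.
nan_filled = values.map { |v| v.is_a?(Float) && v.nan? ? 99.0 : v }

# Replace nil only; NaN is left alone.
null_filled = values.map { |v| v.nil? ? 0.0 : v }
```

After this, `nan_filled` still ends in `nil`, and `null_filled` still contains NaN in the third position.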

# File 'lib/polars/lazy_frame.rb', line 1977

def fill_nan(fill_value)
  if !fill_value.is_a?(Expr)
    fill_value = Utils.lit(fill_value)
  end
  _from_rbldf(_ldf.fill_nan(fill_value._rbexpr))
end

#fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil) ⇒ `LazyFrame`

Fill null values using the specified value or strategy.




# File 'lib/polars/lazy_frame.rb', line 1939

def fill_null(value = nil, strategy: nil, limit: nil, matches_supertype: nil)
  select(Polars.all.fill_null(value, strategy: strategy, limit: limit))
end

#filter(predicate) ⇒ `LazyFrame`

Filter the rows in the DataFrame based on a predicate expression.

Examples:

Filter on one condition:

lf = Polars::DataFrame.new(
  {
    "foo" => [1, 2, 3],
    "bar" => [6, 7, 8],
    "ham" => ["a", "b", "c"]
  }
).lazy
lf.filter(Polars.col("foo") < 3).collect
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ foo ┆ bar ┆ ham │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ str │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 6   ┆ a   │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2   ┆ 7   ┆ b   │
# └─────┴─────┴─────┘

Filter on multiple conditions:

lf.filter((Polars.col("foo") < 3) & (Polars.col("ham") == "a")).collect
# =>
# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ foo ┆ bar ┆ ham │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ str │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 6   ┆ a   │
# └─────┴─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 661

def filter(predicate)
  _from_rbldf(
    _ldf.filter(
      Utils.expr_to_lit_or_expr(predicate, str_to_lit: false)._rbexpr
    )
  )
end

#first ⇒ `LazyFrame`

Get the first row of the DataFrame.




# File 'lib/polars/lazy_frame.rb', line 1872

def first
  slice(0, 1)
end

#groupby(by, maintain_order: false) ⇒ `LazyGroupBy`

Start a groupby operation.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => ["a", "b", "a", "b", "b", "c"],
    "b" => [1, 2, 3, 4, 5, 6],
    "c" => [6, 5, 4, 3, 2, 1]
  }
).lazy
df.groupby("a", maintain_order: true).agg(Polars.col("b").sum).collect
# =>
# shape: (3, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ str ┆ i64 │
# ╞═════╪═════╡
# │ a   ┆ 4   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ b   ┆ 11  │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ c   ┆ 6   │
# └─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 799

def groupby(by, maintain_order: false)
  rbexprs_by = Utils.selection_to_rbexpr_list(by)
  lgb = _ldf.groupby(rbexprs_by, maintain_order)
  LazyGroupBy.new(lgb, self.class)
end

#groupby_dynamic(index_column, every:, period: nil, offset: nil, truncate: true, include_boundaries: false, closed: "left", by: nil, start_by: "window") ⇒ `LazyGroupBy`

Group based on a time value (or index value of type :i32, :i64).

Time windows are calculated and rows are assigned to windows. Unlike a normal groupby, a row can be a member of multiple groups. The time/index window can be seen as a rolling window whose size is determined by dates/times/values instead of slots in the DataFrame.

A window is defined by:

every: interval of the window
period: length of the window
offset: offset of the window

The every, period and offset arguments are created with the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 day)
1w (1 week)
1mo (1 calendar month)
1y (1 calendar year)
1i (1 index count)

Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds

In case of a groupby_dynamic on an integer column, the windows are defined by:

"1i" # length 1
"10i" # length 10
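
As a rough illustration of how such strings decompose, the sketch below splits a duration string into count and unit pairs in plain Ruby. `DURATION_UNITS`, `DURATION_PATTERN` and `parse_duration` are hypothetical names, not library API, and negative durations are left out of this sketch.

```ruby
# Units from the list above. Two-letter units come first so that
# "ms" and "mo" are not read as "m" followed by stray characters.
DURATION_UNITS = %w[ns us ms mo s m h d w y i].freeze
DURATION_PATTERN = /(\d+)(#{DURATION_UNITS.join("|")})/

# Hypothetical helper: returns [[count, unit], ...] or raises if the
# string contains anything other than count/unit pairs.
def parse_duration(str)
  pairs = str.scan(DURATION_PATTERN)
  unless pairs.map(&:join).join == str
    raise ArgumentError, "unrecognised duration: #{str.inspect}"
  end
  pairs.map { |count, unit| [count.to_i, unit] }
end

combined = parse_duration("3d12h4m25s")
month = parse_duration("1mo")
index = parse_duration("10i")
```

For the combined string this yields 3 days, 12 hours, 4 minutes and 25 seconds as separate pairs.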

Examples:

df = Polars::DataFrame.new(
  {
    "time" => Polars.date_range(
      DateTime.new(2021, 12, 16),
      DateTime.new(2021, 12, 16, 3),
      "30m"
    ),
    "n" => 0..6
  }
)
# =>
# shape: (7, 2)
# ┌─────────────────────┬─────┐
# │ time                ┆ n   │
# │ ---                 ┆ --- │
# │ datetime[μs]        ┆ i64 │
# ╞═════════════════════╪═════╡
# │ 2021-12-16 00:00:00 ┆ 0   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2021-12-16 00:30:00 ┆ 1   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2021-12-16 01:00:00 ┆ 2   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2021-12-16 01:30:00 ┆ 3   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2021-12-16 02:00:00 ┆ 4   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2021-12-16 02:30:00 ┆ 5   │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2021-12-16 03:00:00 ┆ 6   │
# └─────────────────────┴─────┘

Group by windows of 1 hour starting at 2021-12-16 00:00:00.

df.groupby_dynamic("time", every: "1h", closed: "right").agg(
  [
    Polars.col("time").min.alias("time_min"),
    Polars.col("time").max.alias("time_max")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────────────┬─────────────────────┬─────────────────────┐
# │ time                ┆ time_min            ┆ time_max            │
# │ ---                 ┆ ---                 ┆ ---                 │
# │ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        │
# ╞═════════════════════╪═════════════════════╪═════════════════════╡
# │ 2021-12-15 23:00:00 ┆ 2021-12-16 00:00:00 ┆ 2021-12-16 00:00:00 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 00:00:00 ┆ 2021-12-16 00:30:00 ┆ 2021-12-16 01:00:00 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 01:00:00 ┆ 2021-12-16 01:30:00 ┆ 2021-12-16 02:00:00 │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 02:00:00 ┆ 2021-12-16 02:30:00 ┆ 2021-12-16 03:00:00 │
# └─────────────────────┴─────────────────────┴─────────────────────┘

The window boundaries can also be added to the aggregation result.

df.groupby_dynamic(
  "time", every: "1h", include_boundaries: true, closed: "right"
).agg([Polars.col("time").count.alias("time_count")])
# =>
# shape: (4, 4)
# ┌─────────────────────┬─────────────────────┬─────────────────────┬────────────┐
# │ _lower_boundary     ┆ _upper_boundary     ┆ time                ┆ time_count │
# │ ---                 ┆ ---                 ┆ ---                 ┆ ---        │
# │ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ u32        │
# ╞═════════════════════╪═════════════════════╪═════════════════════╪════════════╡
# │ 2021-12-15 23:00:00 ┆ 2021-12-16 00:00:00 ┆ 2021-12-15 23:00:00 ┆ 1          │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 00:00:00 ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 00:00:00 ┆ 2          │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 2          │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 2          │
# └─────────────────────┴─────────────────────┴─────────────────────┴────────────┘

When closed: "left", the right end of each interval is not included.

df.groupby_dynamic("time", every: "1h", closed: "left").agg(
  [
    Polars.col("time").count.alias("time_count"),
    Polars.col("time").list.alias("time_agg_list")
  ]
)
# =>
# shape: (4, 3)
# ┌─────────────────────┬────────────┬─────────────────────────────────────┐
# │ time                ┆ time_count ┆ time_agg_list                       │
# │ ---                 ┆ ---        ┆ ---                                 │
# │ datetime[μs]        ┆ u32        ┆ list[datetime[μs]]                  │
# ╞═════════════════════╪════════════╪═════════════════════════════════════╡
# │ 2021-12-16 00:00:00 ┆ 2          ┆ [2021-12-16 00:00:00, 2021-12-16... │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 01:00:00 ┆ 2          ┆ [2021-12-16 01:00:00, 2021-12-16... │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 02:00:00 ┆ 2          ┆ [2021-12-16 02:00:00, 2021-12-16... │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 03:00:00 ┆ 1          ┆ [2021-12-16 03:00:00]               │
# └─────────────────────┴────────────┴─────────────────────────────────────┘

When closed: "both", the time values at the window boundaries belong to two groups.

df.groupby_dynamic("time", every: "1h", closed: "both").agg(
  [Polars.col("time").count.alias("time_count")]
)
# =>
# shape: (5, 2)
# ┌─────────────────────┬────────────┐
# │ time                ┆ time_count │
# │ ---                 ┆ ---        │
# │ datetime[μs]        ┆ u32        │
# ╞═════════════════════╪════════════╡
# │ 2021-12-15 23:00:00 ┆ 1          │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 00:00:00 ┆ 3          │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 01:00:00 ┆ 3          │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 02:00:00 ┆ 3          │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2021-12-16 03:00:00 ┆ 1          │
# └─────────────────────┴────────────┘
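
The row counts in the three examples above (closed "right", "left" and "both") can be reproduced with a small plain-Ruby model. It is a simplified sketch under stated assumptions, not the library's algorithm: timestamps are integers (minutes past midnight), the period equals `every`, the offset is `-every`, and `window_counts` is a hypothetical helper.

```ruby
# Hypothetical helper: returns [lower_boundary, row_count] for every
# non-empty window of width `every`, stepping by `every`.
def window_counts(times, every:, closed:)
  period = every   # assumed default: period equals every
  offset = -every  # assumed default: windows start one step early
  lower = (times.min / every) * every + offset
  result = []
  while lower <= times.max
    upper = lower + period
    members = times.select do |t|
      case closed
      when "left"  then t >= lower && t < upper
      when "right" then t > lower && t <= upper
      when "both"  then t >= lower && t <= upper
      else raise ArgumentError, "unsupported closed: #{closed.inspect}"
      end
    end
    result << [lower, members.size] unless members.empty?
    lower += every
  end
  result
end

# 00:00, 00:30, ..., 03:00 expressed as minutes past midnight.
times = (0..6).map { |i| i * 30 }

right_counts = window_counts(times, every: 60, closed: "right").map(&:last)
left_counts  = window_counts(times, every: 60, closed: "left").map(&:last)
both_counts  = window_counts(times, every: 60, closed: "both").map(&:last)
```

With closed "both", each boundary timestamp is counted in two neighbouring windows, which is why the inner windows hold three rows instead of two.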

Dynamic groupbys can also be combined with grouping on normal keys.

df = Polars::DataFrame.new(
  {
    "time" => Polars.date_range(
      DateTime.new(2021, 12, 16),
      DateTime.new(2021, 12, 16, 3),
      "30m"
    ),
    "groups" => ["a", "a", "a", "b", "b", "a", "a"]
  }
)
df.groupby_dynamic(
  "time",
  every: "1h",
  closed: "both",
  by: "groups",
  include_boundaries: true
).agg([Polars.col("time").count.alias("time_count")])
# =>
# shape: (7, 5)
# ┌────────┬─────────────────────┬─────────────────────┬─────────────────────┬────────────┐
# │ groups ┆ _lower_boundary     ┆ _upper_boundary     ┆ time                ┆ time_count │
# │ ---    ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---        │
# │ str    ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ u32        │
# ╞════════╪═════════════════════╪═════════════════════╪═════════════════════╪════════════╡
# │ a      ┆ 2021-12-15 23:00:00 ┆ 2021-12-16 00:00:00 ┆ 2021-12-15 23:00:00 ┆ 1          │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ a      ┆ 2021-12-16 00:00:00 ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 00:00:00 ┆ 3          │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ a      ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 1          │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ a      ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 2          │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ a      ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 04:00:00 ┆ 2021-12-16 03:00:00 ┆ 1          │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ b      ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 2          │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ b      ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 1          │
# └────────┴─────────────────────┴─────────────────────┴─────────────────────┴────────────┘

Dynamic groupby on an index column.

df = Polars::DataFrame.new(
  {
    "idx" => Polars.arange(0, 6, eager: true),
    "A" => ["A", "A", "B", "B", "B", "C"]
  }
)
df.groupby_dynamic(
  "idx",
  every: "2i",
  period: "3i",
  include_boundaries: true,
  closed: "right"
).agg(Polars.col("A").list.alias("A_agg_list"))
# =>
# shape: (3, 4)
# ┌─────────────────┬─────────────────┬─────┬─────────────────┐
# │ _lower_boundary ┆ _upper_boundary ┆ idx ┆ A_agg_list      │
# │ ---             ┆ ---             ┆ --- ┆ ---             │
# │ i64             ┆ i64             ┆ i64 ┆ list[str]       │
# ╞═════════════════╪═════════════════╪═════╪═════════════════╡
# │ 0               ┆ 3               ┆ 0   ┆ ["A", "B", "B"] │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 2               ┆ 5               ┆ 2   ┆ ["B", "B", "C"] │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
# │ 4               ┆ 7               ┆ 4   ┆ ["C"]           │
# └─────────────────┴─────────────────┴─────┴─────────────────┘

# File 'lib/polars/lazy_frame.rb', line 1168

def groupby_dynamic(
  index_column,
  every:,
  period: nil,
  offset: nil,
  truncate: true,
  include_boundaries: false,
  closed: "left",
  by: nil,
  start_by: "window"
)
  if offset.nil?
    if period.nil?
      offset = "-#{every}"
    else
      offset = "0ns"
    end
  end

  if period.nil?
    period = every
  end

  period = Utils._timedelta_to_pl_duration(period)
  offset = Utils._timedelta_to_pl_duration(offset)
  every = Utils._timedelta_to_pl_duration(every)

  rbexprs_by = by.nil? ? [] : Utils.selection_to_rbexpr_list(by)
  lgb = _ldf.groupby_dynamic(
    index_column,
    every,
    period,
    offset,
    truncate,
    include_boundaries,
    closed,
    rbexprs_by,
    start_by
  )
  LazyGroupBy.new(lgb, self.class)
end
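
The defaulting of `period` and `offset` in the listing above can be isolated as follows. This is a minimal sketch; `resolve_window` is a hypothetical helper that copies only the two `nil` checks and skips the duration conversion.

```ruby
# Hypothetical helper: returns [every, period, offset] after applying
# the same defaults as the listing above.
def resolve_window(every:, period: nil, offset: nil)
  # The offset default depends on whether a period was given.
  offset = period.nil? ? "-#{every}" : "0ns" if offset.nil?
  # A missing period falls back to the step size.
  period = every if period.nil?
  [every, period, offset]
end

hourly = resolve_window(every: "1h")               # => ["1h", "1h", "-1h"]
indexed = resolve_window(every: "2i", period: "3i") # => ["2i", "3i", "0ns"]
```

The second call matches the index-column example, where the first window starts at 0 instead of one step earlier.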

#groupby_rolling(index_column:, period:, offset: nil, closed: "right", by: nil) ⇒ `LazyGroupBy`

Create rolling groups based on a time column.

Also works for index values of type :i32 or :i64.

Unlike groupby_dynamic, the windows are determined by the individual values and are not of constant intervals. For constant intervals, use groupby_dynamic.

The period and offset arguments are created either from a timedelta, or by using the following string language:

1ns (1 nanosecond)
1us (1 microsecond)
1ms (1 millisecond)
1s (1 second)
1m (1 minute)
1h (1 hour)
1d (1 day)
1w (1 week)
1mo (1 calendar month)
1y (1 calendar year)
1i (1 index count)

Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds

In case of a groupby_rolling on an integer column, the windows are defined by:

"1i" # length 1
"10i" # length 10

Examples:

dates = [
  "2020-01-01 13:45:48",
  "2020-01-01 16:42:13",
  "2020-01-01 16:45:09",
  "2020-01-02 18:12:48",
  "2020-01-03 19:45:32",
  "2020-01-08 23:16:43"
]
df = Polars::DataFrame.new({"dt" => dates, "a" => [3, 7, 5, 9, 2, 1]}).with_column(
  Polars.col("dt").str.strptime(:datetime)
)
df.groupby_rolling(index_column: "dt", period: "2d").agg(
  [
    Polars.sum("a").alias("sum_a"),
    Polars.min("a").alias("min_a"),
    Polars.max("a").alias("max_a")
  ]
)
# =>
# shape: (6, 4)
# ┌─────────────────────┬───────┬───────┬───────┐
# │ dt                  ┆ sum_a ┆ min_a ┆ max_a │
# │ ---                 ┆ ---   ┆ ---   ┆ ---   │
# │ datetime[μs]        ┆ i64   ┆ i64   ┆ i64   │
# ╞═════════════════════╪═══════╪═══════╪═══════╡
# │ 2020-01-01 13:45:48 ┆ 3     ┆ 3     ┆ 3     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2020-01-01 16:42:13 ┆ 10    ┆ 3     ┆ 7     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2020-01-01 16:45:09 ┆ 15    ┆ 3     ┆ 7     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2020-01-02 18:12:48 ┆ 24    ┆ 3     ┆ 9     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2020-01-03 19:45:32 ┆ 11    ┆ 2     ┆ 9     │
# ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2020-01-08 23:16:43 ┆ 1     ┆ 1     ┆ 1     │
# └─────────────────────┴───────┴───────┴───────┘
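
The figures in the table above can be checked with a plain-Ruby model of a rolling window. It assumes that each row's window covers the two days up to and including its own timestamp (closed on the right, open on the left); `rolling_stats` is a hypothetical helper and does not use Polars.

```ruby
DAY = 24 * 60 * 60 # seconds

# [timestamp, a] pairs from the example above, in UTC.
rows = [
  [Time.utc(2020, 1, 1, 13, 45, 48), 3],
  [Time.utc(2020, 1, 1, 16, 42, 13), 7],
  [Time.utc(2020, 1, 1, 16, 45, 9), 5],
  [Time.utc(2020, 1, 2, 18, 12, 48), 9],
  [Time.utc(2020, 1, 3, 19, 45, 32), 2],
  [Time.utc(2020, 1, 8, 23, 16, 43), 1]
]

# Hypothetical helper: for each row, aggregate `a` over the rows whose
# timestamp lies in (t - period, t].
def rolling_stats(rows, period)
  rows.map do |t, _|
    window = rows.select { |u, _| u > t - period && u <= t }.map(&:last)
    [window.sum, window.min, window.max]
  end
end

stats = rolling_stats(rows, 2 * DAY)
```

Each entry of `stats` is a `[sum_a, min_a, max_a]` triple for the corresponding row.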

# File 'lib/polars/lazy_frame.rb', line 894

def groupby_rolling(
  index_column:,
  period:,
  offset: nil,
  closed: "right",
  by: nil
)
  if offset.nil?
    offset = "-#{period}"
  end

  rbexprs_by = by.nil? ? [] : Utils.selection_to_rbexpr_list(by)
  period = Utils._timedelta_to_pl_duration(period)
  offset = Utils._timedelta_to_pl_duration(offset)

  lgb = _ldf.groupby_rolling(
    index_column, period, offset, closed, rbexprs_by
  )
  LazyGroupBy.new(lgb, self.class)
end

#head(n = 5) ⇒ `LazyFrame`

Note:

Consider using #fetch if you only want to test your query. #fetch loads the first n rows at the scan level, whereas #head and #limit are applied at the end.

Get the first n rows.




# File 'lib/polars/lazy_frame.rb', line 1848

def head(n = 5)
  slice(0, n)
end

#include?(key) ⇒ `Boolean`

Check if LazyFrame includes key.




# File 'lib/polars/lazy_frame.rb', line 239

def include?(key)
  columns.include?(key)
end

#interpolate ⇒ `LazyFrame`

Interpolate intermediate values. The interpolation method is linear.

Examples:

df = Polars::DataFrame.new(
  {
    "foo" => [1, nil, 9, 10],
    "bar" => [6, 7, 9, nil],
    "baz" => [1, nil, nil, 9]
  }
).lazy
df.interpolate.collect
# =>
# shape: (4, 3)
# ┌─────┬──────┬─────┐
# │ foo ┆ bar  ┆ baz │
# │ --- ┆ ---  ┆ --- │
# │ i64 ┆ i64  ┆ i64 │
# ╞═════╪══════╪═════╡
# │ 1   ┆ 6    ┆ 1   │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 5   ┆ 7    ┆ 3   │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 9   ┆ 9    ┆ 6   │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 10  ┆ null ┆ 9   │
# └─────┴──────┴─────┘
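
Linear interpolation over interior gaps can be sketched on a plain Ruby Array. This is an illustration only: `interpolate_linear` is a hypothetical helper, and its use of integer division is an assumption that happens to reproduce the i64 values shown above. How the library rounds is not stated here.

```ruby
# Hypothetical helper: fills nil values that lie between two known
# values; leading and trailing nils are left untouched.
def interpolate_linear(values)
  result = values.dup
  known = values.each_index.reject { |i| values[i].nil? }
  known.each_cons(2) do |lo, hi|
    gap = hi - lo
    (1...gap).each do |step|
      # Integer inputs use integer division here (an assumption of this sketch).
      result[lo + step] = values[lo] + (values[hi] - values[lo]) * step / gap
    end
  end
  result
end

foo = interpolate_linear([1, nil, 9, 10])
bar = interpolate_linear([6, 7, 9, nil])
baz = interpolate_linear([1, nil, nil, 9])
```

The trailing `nil` in the second call stays in place because there is no later known value to interpolate towards.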




# File 'lib/polars/lazy_frame.rb', line 2367

def interpolate
  select(Utils.col("*").interpolate)
end

#join(other, left_on: nil, right_on: nil, on: nil, how: "inner", suffix: "_right", allow_parallel: true, force_parallel: false) ⇒ `LazyFrame`

Add a join operation to the Logical Plan.

Examples:

df = Polars::DataFrame.new(
  {
    "foo" => [1, 2, 3],
    "bar" => [6.0, 7.0, 8.0],
    "ham" => ["a", "b", "c"]
  }
).lazy
other_df = Polars::DataFrame.new(
  {
    "apple" => ["x", "y", "z"],
    "ham" => ["a", "b", "d"]
  }
).lazy
df.join(other_df, on: "ham").collect
# =>
# shape: (2, 4)
# ┌─────┬─────┬─────┬───────┐
# │ foo ┆ bar ┆ ham ┆ apple │
# │ --- ┆ --- ┆ --- ┆ ---   │
# │ i64 ┆ f64 ┆ str ┆ str   │
# ╞═════╪═════╪═════╪═══════╡
# │ 1   ┆ 6.0 ┆ a   ┆ x     │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2   ┆ 7.0 ┆ b   ┆ y     │
# └─────┴─────┴─────┴───────┘

df.join(other_df, on: "ham", how: "outer").collect
# =>
# shape: (4, 4)
# ┌──────┬──────┬─────┬───────┐
# │ foo  ┆ bar  ┆ ham ┆ apple │
# │ ---  ┆ ---  ┆ --- ┆ ---   │
# │ i64  ┆ f64  ┆ str ┆ str   │
# ╞══════╪══════╪═════╪═══════╡
# │ 1    ┆ 6.0  ┆ a   ┆ x     │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2    ┆ 7.0  ┆ b   ┆ y     │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ null ┆ null ┆ d   ┆ z     │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 3    ┆ 8.0  ┆ c   ┆ null  │
# └──────┴──────┴─────┴───────┘

df.join(other_df, on: "ham", how: "left").collect
# =>
# shape: (3, 4)
# ┌─────┬─────┬─────┬───────┐
# │ foo ┆ bar ┆ ham ┆ apple │
# │ --- ┆ --- ┆ --- ┆ ---   │
# │ i64 ┆ f64 ┆ str ┆ str   │
# ╞═════╪═════╪═════╪═══════╡
# │ 1   ┆ 6.0 ┆ a   ┆ x     │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2   ┆ 7.0 ┆ b   ┆ y     │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 3   ┆ 8.0 ┆ c   ┆ null  │
# └─────┴─────┴─────┴───────┘

df.join(other_df, on: "ham", how: "semi").collect
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ foo ┆ bar ┆ ham │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ f64 ┆ str │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 6.0 ┆ a   │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2   ┆ 7.0 ┆ b   │
# └─────┴─────┴─────┘

df.join(other_df, on: "ham", how: "anti").collect
# =>
# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ foo ┆ bar ┆ ham │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ f64 ┆ str │
# ╞═════╪═════╪═════╡
# │ 3   ┆ 8.0 ┆ c   │
# └─────┴─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 1454

def join(
  other,
  left_on: nil,
  right_on: nil,
  on: nil,
  how: "inner",
  suffix: "_right",
  allow_parallel: true,
  force_parallel: false
)
  if !other.is_a?(LazyFrame)
    raise ArgumentError, "Expected a `LazyFrame` as join table, got #{other.class.name}"
  end

  if how == "cross"
    return _from_rbldf(
      _ldf.join(
        other._ldf, [], [], allow_parallel, force_parallel, how, suffix
      )
    )
  end

  if !on.nil?
    rbexprs = Utils.selection_to_rbexpr_list(on)
    rbexprs_left = rbexprs
    rbexprs_right = rbexprs
  elsif !left_on.nil? && !right_on.nil?
    rbexprs_left = Utils.selection_to_rbexpr_list(left_on)
    rbexprs_right = Utils.selection_to_rbexpr_list(right_on)
  else
    raise ArgumentError, "must specify `on` OR `left_on` and `right_on`"
  end

  _from_rbldf(
    self._ldf.join(
      other._ldf,
      rbexprs_left,
      rbexprs_right,
      allow_parallel,
      force_parallel,
      how,
      suffix,
    )
  )
end
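
The key-resolution branch of the listing above can be tried in isolation. The sketch below is plain Ruby and does not call Polars: `resolve_join_keys` is a hypothetical helper, and `Array(...)` is an assumed stand-in for `Utils.selection_to_rbexpr_list`.

```ruby
# Sketch of how `on` versus `left_on`/`right_on` is resolved in #join.
# Returns a pair: [left keys, right keys].
def resolve_join_keys(on: nil, left_on: nil, right_on: nil)
  if !on.nil?
    # `on` applies the same keys to both sides.
    keys = Array(on)
    [keys, keys]
  elsif !left_on.nil? && !right_on.nil?
    [Array(left_on), Array(right_on)]
  else
    raise ArgumentError, "must specify `on` OR `left_on` and `right_on`"
  end
end

p resolve_join_keys(on: "ham")
# => [["ham"], ["ham"]]
p resolve_join_keys(left_on: "id", right_on: "user_id")
# => [["id"], ["user_id"]]
```

Passing only one of `left_on` or `right_on` raises `ArgumentError`, as in the listing.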

#join_asof(other, left_on: nil, right_on: nil, on: nil, by_left: nil, by_right: nil, by: nil, strategy: "backward", suffix: "_right", tolerance: nil, allow_parallel: true, force_parallel: false) ⇒ `LazyFrame`

Perform an asof join.

This is similar to a left join, except that rows are matched on the nearest key rather than on equal keys.

Both DataFrames must be sorted by the join_asof key.

For each row in the left DataFrame:

A "backward" search selects the last row in the right DataFrame whose 'on' key is less than or equal to the left's key.
A "forward" search selects the first row in the right DataFrame whose 'on' key is greater than or equal to the left's key.

The default is "backward".
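
The two search strategies can be illustrated on plain arrays. This is a minimal sketch of the matching rule described above, not Polars' implementation; `asof_match` is a hypothetical helper and the right-hand keys are assumed to be sorted in ascending order.

```ruby
# Plain-Ruby sketch of the asof matching rule for a single left key.
# `right_keys` must be sorted ascending, as join_asof requires.
def asof_match(left_key, right_keys, strategy: "backward")
  case strategy
  when "backward"
    # Last right key that is <= the left key.
    right_keys.select { |k| k <= left_key }.last
  when "forward"
    # First right key that is >= the left key.
    right_keys.find { |k| k >= left_key }
  else
    raise ArgumentError, "unknown strategy: #{strategy}"
  end
end

right = [1, 5, 10]
p asof_match(6, right)                      # => 5
p asof_match(6, right, strategy: "forward") # => 10
p asof_match(0, right)                      # => nil
```

A `nil` result corresponds to a left row with no match, which a left-style join keeps with null values on the right side.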

# File 'lib/polars/lazy_frame.rb', line 1272

def join_asof(
  other,
  left_on: nil,
  right_on: nil,
  on: nil,
  by_left: nil,
  by_right: nil,
  by: nil,
  strategy: "backward",
  suffix: "_right",
  tolerance: nil,
  allow_parallel: true,
  force_parallel: false
)
  if !other.is_a?(LazyFrame)
    raise ArgumentError, "Expected a `LazyFrame` as join table, got #{other.class.name}"
  end

  if on.is_a?(String)
    left_on = on
    right_on = on
  end

  if left_on.nil? || right_on.nil?
    raise ArgumentError, "You should pass the column to join on as an argument."
  end

  if by_left.is_a?(String) || by_left.is_a?(Expr)
    by_left_ = [by_left]
  else
    by_left_ = by_left
  end

  if by_right.is_a?(String) || by_right.is_a?(Expr)
    by_right_ = [by_right]
  else
    by_right_ = by_right
  end

  if by.is_a?(String)
    by_left_ = [by]
    by_right_ = [by]
  elsif by.is_a?(Array)
    by_left_ = by
    by_right_ = by
  end

  tolerance_str = nil
  tolerance_num = nil
  if tolerance.is_a?(String)
    tolerance_str = tolerance
  else
    tolerance_num = tolerance
  end

  _from_rbldf(
    _ldf.join_asof(
      other._ldf,
      Polars.col(left_on)._rbexpr,
      Polars.col(right_on)._rbexpr,
      by_left_,
      by_right_,
      allow_parallel,
      force_parallel,
      suffix,
      strategy,
      tolerance_num,
      tolerance_str
    )
  )
end

#last ⇒ `LazyFrame`

Get the last row of the DataFrame.



# File 'lib/polars/lazy_frame.rb', line 1865

def last
  tail(1)
end

#lazy ⇒ `LazyFrame`

Return lazy representation, i.e. itself.

Useful for writing code that expects either a DataFrame or LazyFrame.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [nil, 2, 3, 4],
    "b" => [0.5, nil, 2.5, 13],
    "c" => [true, true, false, nil]
  }
)
df.lazy



# File 'lib/polars/lazy_frame.rb', line 584

def lazy
  self
end

#limit(n = 5) ⇒ `LazyFrame`

Note:

Consider using #fetch if you only want to test your query. #fetch loads the first n rows at the scan level, whereas #head and #limit are applied at the end.

Get the first n rows.

Alias for #head.



# File 'lib/polars/lazy_frame.rb', line 1833

def limit(n = 5)
  head(n)
end

#max ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their maximum value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.max.collect
# =>
# shape: (1, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 4   ┆ 2   │
# └─────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 2064

def max
  _from_rbldf(_ldf.max)
end

#mean ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their mean value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.mean.collect
# =>
# shape: (1, 2)
# ┌─────┬──────┐
# │ a   ┆ b    │
# │ --- ┆ ---  │
# │ f64 ┆ f64  │
# ╞═════╪══════╡
# │ 2.5 ┆ 1.25 │
# └─────┴──────┘



# File 'lib/polars/lazy_frame.rb', line 2124

def mean
  _from_rbldf(_ldf.mean)
end

#median ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their median value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.median.collect
# =>
# shape: (1, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ f64 ┆ f64 │
# ╞═════╪═════╡
# │ 2.5 ┆ 1.0 │
# └─────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 2144

def median
  _from_rbldf(_ldf.median)
end

#melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil) ⇒ `LazyFrame`

Unpivot a DataFrame from wide to long format.

Optionally leaves identifiers set.

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are "unpivoted" to the row axis, leaving just two non-identifier columns, 'variable' and 'value'.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => ["x", "y", "z"],
    "b" => [1, 3, 5],
    "c" => [2, 4, 6]
  }
).lazy
df.melt(id_vars: "a", value_vars: ["b", "c"]).collect
# =>
# shape: (6, 3)
# ┌─────┬──────────┬───────┐
# │ a   ┆ variable ┆ value │
# │ --- ┆ ---      ┆ ---   │
# │ str ┆ str      ┆ i64   │
# ╞═════╪══════════╪═══════╡
# │ x   ┆ b        ┆ 1     │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ y   ┆ b        ┆ 3     │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ z   ┆ b        ┆ 5     │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ x   ┆ c        ┆ 2     │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ y   ┆ c        ┆ 4     │
# ├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ z   ┆ c        ┆ 6     │
# └─────┴──────────┴───────┘

# File 'lib/polars/lazy_frame.rb', line 2318

def melt(id_vars: nil, value_vars: nil, variable_name: nil, value_name: nil)
  if value_vars.is_a?(String)
    value_vars = [value_vars]
  end
  if id_vars.is_a?(String)
    id_vars = [id_vars]
  end
  if value_vars.nil?
    value_vars = []
  end
  if id_vars.nil?
    id_vars = []
  end
  _from_rbldf(
    _ldf.melt(id_vars, value_vars, value_name, variable_name)
  )
end
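
To make the wide-to-long reshaping concrete, here is a plain-Ruby sketch over an array of hashes. It is an illustration only: `melt_rows` is a hypothetical helper, and the default output names "variable" and "value" are taken from the description above.

```ruby
# Plain-Ruby sketch of melt: each value column becomes a block of rows
# that keeps the identifier columns and adds a (variable, value) pair.
def melt_rows(rows, id_vars:, value_vars:, variable_name: "variable", value_name: "value")
  value_vars.flat_map do |var|
    rows.map do |row|
      row.slice(*id_vars).merge(variable_name => var, value_name => row[var])
    end
  end
end

rows = [
  {"a" => "x", "b" => 1, "c" => 2},
  {"a" => "y", "b" => 3, "c" => 4}
]
melted = melt_rows(rows, id_vars: ["a"], value_vars: ["b", "c"])
p melted.map(&:values)
# => [["x", "b", 1], ["y", "b", 3], ["x", "c", 2], ["y", "c", 4]]
```

As in the example above, all rows for "b" come first, followed by all rows for "c".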

#min ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their minimum value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.min.collect
# =>
# shape: (1, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 1   │
# └─────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 2084

def min
  _from_rbldf(_ldf.min)
end

#pipe(func, *args, **kwargs, &block) ⇒ `LazyFrame`

Offers a structured way to apply a sequence of user-defined functions (UDFs).

Examples:

cast_str_to_int = lambda do |data, col_name:|
  data.with_column(Polars.col(col_name).cast(:i64))
end

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => ["10", "20", "30", "40"]}).lazy
df.pipe(cast_str_to_int, col_name: "b").collect()
# =>
# shape: (4, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 10  │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2   ┆ 20  │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 3   ┆ 30  │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 4   ┆ 40  │
# └─────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 307

def pipe(func, *args, **kwargs, &block)
  func.call(self, *args, **kwargs, &block)
end
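
Because `pipe` returns whatever the function returns, calls can be chained. The sketch below shows this without Polars: `Pipeline` is a hypothetical stand-in class that copies the `pipe` definition above, and `add_step` is an illustrative lambda.

```ruby
# Minimal stand-in with the same `pipe` definition as the listing above,
# so the chaining pattern can be run without Polars.
class Pipeline
  attr_reader :steps

  def initialize(steps = [])
    @steps = steps
  end

  def pipe(func, *args, **kwargs, &block)
    func.call(self, *args, **kwargs, &block)
  end
end

# Each step receives the current object first, then its own arguments,
# and returns a new object for the next step.
add_step = lambda do |pipeline, name:|
  Pipeline.new(pipeline.steps + [name])
end

result = Pipeline.new.pipe(add_step, name: "cast").pipe(add_step, name: "filter")
p result.steps # => ["cast", "filter"]
```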

#quantile(quantile, interpolation: "nearest") ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their quantile value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.quantile(0.7).collect
# =>
# shape: (1, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ f64 ┆ f64 │
# ╞═════╪═════╡
# │ 3.0 ┆ 1.0 │
# └─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 2169

def quantile(quantile, interpolation: "nearest")
  quantile = Utils.expr_to_lit_or_expr(quantile, str_to_lit: false)
  _from_rbldf(_ldf.quantile(quantile._rbexpr, interpolation))
end

#rename(mapping) ⇒ `LazyFrame`

Rename column names.

# File 'lib/polars/lazy_frame.rb', line 1669

def rename(mapping)
  existing = mapping.keys
  _new = mapping.values
  _from_rbldf(_ldf.rename(existing, _new))
end
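
The `mapping` argument is a hash from existing names to new names. As a rough plain-Ruby illustration of the effect on the column list (`rename_columns` is a hypothetical helper, not part of the library), columns that are missing from the mapping keep their names:

```ruby
# Plain-Ruby sketch: apply a rename mapping to a list of column names.
def rename_columns(columns, mapping)
  columns.map { |name| mapping.fetch(name, name) }
end

p rename_columns(["foo", "bar", "ham"], {"foo" => "apple"})
# => ["apple", "bar", "ham"]
```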

#reverse ⇒ `LazyFrame`

Reverse the DataFrame.



# File 'lib/polars/lazy_frame.rb', line 1678

def reverse
  _from_rbldf(_ldf.reverse)
end

#schema ⇒ `Hash`

Get the schema.

Examples:

lf = Polars::DataFrame.new(
  {
    "foo" => [1, 2, 3],
    "bar" => [6.0, 7.0, 8.0],
    "ham" => ["a", "b", "c"]
  }
).lazy
lf.schema
# => {"foo"=>Polars::Int64, "bar"=>Polars::Float64, "ham"=>Polars::Utf8}



# File 'lib/polars/lazy_frame.rb', line 220

def schema
  _ldf.schema
end

#select(exprs) ⇒ `LazyFrame`

Select columns from this DataFrame.

Examples:

df = Polars::DataFrame.new(
  {
    "foo" => [1, 2, 3],
    "bar" => [6, 7, 8],
    "ham" => ["a", "b", "c"],
  }
).lazy
df.select("foo").collect
# =>
# shape: (3, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# ├╌╌╌╌╌┤
# │ 2   │
# ├╌╌╌╌╌┤
# │ 3   │
# └─────┘

df.select(["foo", "bar"]).collect
# =>
# shape: (3, 2)
# ┌─────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 6   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2   ┆ 7   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 3   ┆ 8   │
# └─────┴─────┘

df.select(Polars.col("foo") + 1).collect
# =>
# shape: (3, 1)
# ┌─────┐
# │ foo │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 2   │
# ├╌╌╌╌╌┤
# │ 3   │
# ├╌╌╌╌╌┤
# │ 4   │
# └─────┘

df.select([Polars.col("foo") + 1, Polars.col("bar") + 1]).collect
# =>
# shape: (3, 2)
# ┌─────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 2   ┆ 7   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 3   ┆ 8   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 4   ┆ 9   │
# └─────┴─────┘

df.select(Polars.when(Polars.col("foo") > 2).then(10).otherwise(0)).collect
# =>
# shape: (3, 1)
# ┌─────────┐
# │ literal │
# │ ---     │
# │ i64     │
# ╞═════════╡
# │ 0       │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 0       │
# ├╌╌╌╌╌╌╌╌╌┤
# │ 10      │
# └─────────┘

# File 'lib/polars/lazy_frame.rb', line 762

def select(exprs)
  exprs = Utils.selection_to_rbexpr_list(exprs)
  _from_rbldf(_ldf.select(exprs))
end

#shift(periods) ⇒ `LazyFrame`

Shift the values by a given period.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 3, 5],
    "b" => [2, 4, 6]
  }
).lazy
df.shift(1).collect
# =>
# shape: (3, 2)
# ┌──────┬──────┐
# │ a    ┆ b    │
# │ ---  ┆ ---  │
# │ i64  ┆ i64  │
# ╞══════╪══════╡
# │ null ┆ null │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 1    ┆ 2    │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 3    ┆ 4    │
# └──────┴──────┘

df.shift(-1).collect
# =>
# shape: (3, 2)
# ┌──────┬──────┐
# │ a    ┆ b    │
# │ ---  ┆ ---  │
# │ i64  ┆ i64  │
# ╞══════╪══════╡
# │ 3    ┆ 4    │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ 5    ┆ 6    │
# ├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
# │ null ┆ null │
# └──────┴──────┘



# File 'lib/polars/lazy_frame.rb', line 1726

def shift(periods)
  _from_rbldf(_ldf.shift(periods))
end

#shift_and_fill(periods, fill_value) ⇒ `LazyFrame`

Shift the values by a given period and fill the resulting null values.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 3, 5],
    "b" => [2, 4, 6]
  }
).lazy
df.shift_and_fill(1, 0).collect
# =>
# shape: (3, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 0   ┆ 0   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 1   ┆ 2   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 3   ┆ 4   │
# └─────┴─────┘

df.shift_and_fill(-1, 0).collect
# =>
# shape: (3, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 3   ┆ 4   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 5   ┆ 6   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 0   ┆ 0   │
# └─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 1776

def shift_and_fill(periods, fill_value)
  if !fill_value.is_a?(Expr)
    fill_value = Polars.lit(fill_value)
  end
  _from_rbldf(_ldf.shift_and_fill(periods, fill_value._rbexpr))
end

#slice(offset, length = nil) ⇒ `LazyFrame`

Get a slice of this DataFrame.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => ["x", "y", "z"],
    "b" => [1, 3, 5],
    "c" => [2, 4, 6]
  }
).lazy
df.slice(1, 2).collect
# =>
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ str ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ y   ┆ 3   ┆ 4   │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ z   ┆ 5   ┆ 6   │
# └─────┴─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 1813

def slice(offset, length = nil)
  if length && length < 0
    raise ArgumentError, "Negative slice lengths (#{length}) are invalid for LazyFrame"
  end
  _from_rbldf(_ldf.slice(offset, length))
end

#sort(by, reverse: false, nulls_last: false) ⇒ `LazyFrame`

Sort the DataFrame.

Sorting can be done by:

A single column name
An expression
Multiple expressions

Examples:

df = Polars::DataFrame.new(
  {
    "foo" => [1, 2, 3],
    "bar" => [6.0, 7.0, 8.0],
    "ham" => ["a", "b", "c"]
  }
).lazy
df.sort("foo", reverse: true).collect
# =>
# shape: (3, 3)
# ┌─────┬─────┬─────┐
# │ foo ┆ bar ┆ ham │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ f64 ┆ str │
# ╞═════╪═════╪═════╡
# │ 3   ┆ 8.0 ┆ c   │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2   ┆ 7.0 ┆ b   │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 1   ┆ 6.0 ┆ a   │
# └─────┴─────┴─────┘

# File 'lib/polars/lazy_frame.rb', line 385

def sort(by, reverse: false, nulls_last: false)
  if by.is_a?(String)
    return _from_rbldf(_ldf.sort(by, reverse, nulls_last))
  end
  if Utils.bool?(reverse)
    reverse = [reverse]
  end

  by = Utils.selection_to_rbexpr_list(by)
  _from_rbldf(_ldf.sort_by_exprs(by, reverse, nulls_last))
end

#std(ddof: 1) ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their standard deviation value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.std.collect
# =>
# shape: (1, 2)
# ┌──────────┬─────┐
# │ a        ┆ b   │
# │ ---      ┆ --- │
# │ f64      ┆ f64 │
# ╞══════════╪═════╡
# │ 1.290994 ┆ 0.5 │
# └──────────┴─────┘

df.std(ddof: 0).collect
# =>
# shape: (1, 2)
# ┌──────────┬──────────┐
# │ a        ┆ b        │
# │ ---      ┆ ---      │
# │ f64      ┆ f64      │
# ╞══════════╪══════════╡
# │ 1.118034 ┆ 0.433013 │
# └──────────┴──────────┘



# File 'lib/polars/lazy_frame.rb', line 2012

def std(ddof: 1)
  _from_rbldf(_ldf.std(ddof))
end
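
The effect of `ddof` can be checked by hand. The sketch below is plain Ruby, assuming the usual definition in which the sum of squared deviations is divided by `n - ddof`; the helper names `variance` and `std` are illustrative and do not call Polars.

```ruby
# Plain-Ruby sketch of variance and standard deviation with a `ddof` divisor.
def variance(values, ddof: 1)
  mean = values.sum.to_f / values.size
  squared_deviations = values.sum { |v| (v - mean)**2 }
  # ddof: 1 gives the sample estimate, ddof: 0 the population value.
  squared_deviations / (values.size - ddof)
end

def std(values, ddof: 1)
  Math.sqrt(variance(values, ddof: ddof))
end

p std([1, 2, 3, 4]).round(6)          # => 1.290994
p std([1, 2, 3, 4], ddof: 0).round(6) # => 1.118034
p variance([1, 2, 1, 1])              # => 0.25
```

These values agree with column "a" in the #std examples above and with column "b" in the #var example.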

#sum ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their sum value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.sum.collect
# =>
# shape: (1, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 10  ┆ 5   │
# └─────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 2104

def sum
  _from_rbldf(_ldf.sum)
end

#tail(n = 5) ⇒ `LazyFrame`

Get the last n rows.



# File 'lib/polars/lazy_frame.rb', line 1858

def tail(n = 5)
  _from_rbldf(_ldf.tail(n))
end

#take_every(n) ⇒ `LazyFrame`

Take every nth row in the LazyFrame and return as a new LazyFrame.

Examples:

s = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [5, 6, 7, 8]}).lazy
s.take_every(2).collect
# =>
# shape: (2, 2)
# ┌─────┬─────┐
# │ a   ┆ b   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 5   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 3   ┆ 7   │
# └─────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 1932

def take_every(n)
  select(Utils.col("*").take_every(n))
end

#to_s ⇒ `String`

Returns a string representing the LazyFrame.

# File 'lib/polars/lazy_frame.rb', line 251

def to_s
  <<~EOS
    naive plan: (run LazyFrame#describe_optimized_plan to see the optimized plan)

    #{describe_plan}
  EOS
end

#unique(maintain_order: true, subset: nil, keep: "first") ⇒ `LazyFrame`

Drop duplicate rows from this DataFrame.

Note that this fails if there is a column of type List in the DataFrame or subset.

# File 'lib/polars/lazy_frame.rb', line 2228

def unique(maintain_order: true, subset: nil, keep: "first")
  if !subset.nil? && !subset.is_a?(Array)
    subset = [subset]
  end
  _from_rbldf(_ldf.unique(maintain_order, subset, keep))
end
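
No example accompanies this method, so here is a plain-Ruby sketch of the intended semantics on an array of hashes. It makes two assumptions that the listing does not confirm: that `keep` accepts "last" in addition to the default "first", and that the surviving rows stay in input order. `unique_rows` is a hypothetical helper.

```ruby
# Plain-Ruby sketch of dropping duplicate rows, optionally by a column subset.
def unique_rows(rows, subset: nil, keep: "first")
  # Like the listing above, accept a single column name or an array.
  subset = Array(subset) unless subset.nil?
  key = ->(row) { subset ? row.values_at(*subset) : row }

  case keep
  when "first"
    rows.uniq(&key)
  when "last"
    # Assumption: "last" keeps the final occurrence of each key.
    rows.reverse.uniq(&key).reverse
  else
    raise ArgumentError, "unsupported keep: #{keep}"
  end
end

rows = [
  {"a" => 1, "b" => "x"},
  {"a" => 1, "b" => "y"},
  {"a" => 2, "b" => "z"}
]
p unique_rows(rows, subset: "a").map(&:values)               # => [[1, "x"], [2, "z"]]
p unique_rows(rows, subset: "a", keep: "last").map(&:values) # => [[1, "y"], [2, "z"]]
```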

#unnest(names) ⇒ `LazyFrame`

Decompose a struct into its fields.

The fields will be inserted into the DataFrame on the location of the struct type.

Examples:

df = (
  Polars::DataFrame.new(
    {
      "before" => ["foo", "bar"],
      "t_a" => [1, 2],
      "t_b" => ["a", "b"],
      "t_c" => [true, nil],
      "t_d" => [[1, 2], [3]],
      "after" => ["baz", "womp"]
    }
  )
  .lazy
  .select(
    ["before", Polars.struct(Polars.col("^t_.$")).alias("t_struct"), "after"]
  )
)
df.fetch
# =>
# shape: (2, 3)
# ┌────────┬─────────────────────┬───────┐
# │ before ┆ t_struct            ┆ after │
# │ ---    ┆ ---                 ┆ ---   │
# │ str    ┆ struct[4]           ┆ str   │
# ╞════════╪═════════════════════╪═══════╡
# │ foo    ┆ {1,"a",true,[1, 2]} ┆ baz   │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ bar    ┆ {2,"b",null,[3]}    ┆ womp  │
# └────────┴─────────────────────┴───────┘

df.unnest("t_struct").fetch
# =>
# shape: (2, 6)
# ┌────────┬─────┬─────┬──────┬───────────┬───────┐
# │ before ┆ t_a ┆ t_b ┆ t_c  ┆ t_d       ┆ after │
# │ ---    ┆ --- ┆ --- ┆ ---  ┆ ---       ┆ ---   │
# │ str    ┆ i64 ┆ str ┆ bool ┆ list[i64] ┆ str   │
# ╞════════╪═════╪═════╪══════╪═══════════╪═══════╡
# │ foo    ┆ 1   ┆ a   ┆ true ┆ [1, 2]    ┆ baz   │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ bar    ┆ 2   ┆ b   ┆ null ┆ [3]       ┆ womp  │
# └────────┴─────┴─────┴──────┴───────────┴───────┘

# File 'lib/polars/lazy_frame.rb', line 2424

def unnest(names)
  if names.is_a?(String)
    names = [names]
  end
  _from_rbldf(_ldf.unnest(names))
end

#var(ddof: 1) ⇒ `LazyFrame`

Aggregate the columns in the DataFrame to their variance value.

Examples:

df = Polars::DataFrame.new({"a" => [1, 2, 3, 4], "b" => [1, 2, 1, 1]}).lazy
df.var.collect
# =>
# shape: (1, 2)
# ┌──────────┬──────┐
# │ a        ┆ b    │
# │ ---      ┆ ---  │
# │ f64      ┆ f64  │
# ╞══════════╪══════╡
# │ 1.666667 ┆ 0.25 │
# └──────────┴──────┘

df.var(ddof: 0).collect
# =>
# shape: (1, 2)
# ┌──────┬────────┐
# │ a    ┆ b      │
# │ ---  ┆ ---    │
# │ f64  ┆ f64    │
# ╞══════╪════════╡
# │ 1.25 ┆ 0.1875 │
# └──────┴────────┘



# File 'lib/polars/lazy_frame.rb', line 2044

def var(ddof: 1)
  _from_rbldf(_ldf.var(ddof))
end

#width ⇒ `Integer`

Get the width of the LazyFrame.

Examples:

lf = Polars::DataFrame.new({"foo" => [1, 2, 3], "bar" => [4, 5, 6]}).lazy
lf.width
# => 2



# File 'lib/polars/lazy_frame.rb', line 232

def width
  _ldf.width
end

#with_column(column) ⇒ `LazyFrame`

Add or overwrite column in a DataFrame.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 3, 5],
    "b" => [2, 4, 6]
  }
).lazy
df.with_column((Polars.col("b") ** 2).alias("b_squared")).collect
# =>
# shape: (3, 3)
# ┌─────┬─────┬───────────┐
# │ a   ┆ b   ┆ b_squared │
# │ --- ┆ --- ┆ ---       │
# │ i64 ┆ i64 ┆ f64       │
# ╞═════╪═════╪═══════════╡
# │ 1   ┆ 2   ┆ 4.0       │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
# │ 3   ┆ 4   ┆ 16.0      │
# ├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
# │ 5   ┆ 6   ┆ 36.0      │
# └─────┴─────┴───────────┘

df.with_column(Polars.col("a") ** 2).collect
# =>
# shape: (3, 2)
# ┌──────┬─────┐
# │ a    ┆ b   │
# │ ---  ┆ --- │
# │ f64  ┆ i64 │
# ╞══════╪═════╡
# │ 1.0  ┆ 2   │
# ├╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 9.0  ┆ 4   │
# ├╌╌╌╌╌╌┼╌╌╌╌╌┤
# │ 25.0 ┆ 6   │
# └──────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 1645

def with_column(column)
  with_columns([column])
end

#with_columns(exprs) ⇒ `LazyFrame`

Add or overwrite multiple columns in a DataFrame.

Examples:

ldf = Polars::DataFrame.new(
  {
    "a" => [1, 2, 3, 4],
    "b" => [0.5, 4, 10, 13],
    "c" => [true, true, false, true]
  }
).lazy
ldf.with_columns(
  [
    (Polars.col("a") ** 2).alias("a^2"),
    (Polars.col("b") / 2).alias("b/2"),
    (Polars.col("c").is_not).alias("not c")
  ]
).collect
# =>
# shape: (4, 6)
# ┌─────┬──────┬───────┬──────┬──────┬───────┐
# │ a   ┆ b    ┆ c     ┆ a^2  ┆ b/2  ┆ not c │
# │ --- ┆ ---  ┆ ---   ┆ ---  ┆ ---  ┆ ---   │
# │ i64 ┆ f64  ┆ bool  ┆ f64  ┆ f64  ┆ bool  │
# ╞═════╪══════╪═══════╪══════╪══════╪═══════╡
# │ 1   ┆ 0.5  ┆ true  ┆ 1.0  ┆ 0.25 ┆ false │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 2   ┆ 4.0  ┆ true  ┆ 4.0  ┆ 2.0  ┆ false │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 3   ┆ 10.0 ┆ false ┆ 9.0  ┆ 5.0  ┆ true  │
# ├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
# │ 4   ┆ 13.0 ┆ true  ┆ 16.0 ┆ 6.5  ┆ false │
# └─────┴──────┴───────┴──────┴──────┴───────┘

# File 'lib/polars/lazy_frame.rb', line 1537

def with_columns(exprs)
  exprs =
    if exprs.nil?
      []
    elsif exprs.is_a?(Expr)
      [exprs]
    else
      exprs.to_a
    end

  rbexprs = []
  exprs.each do |e|
    case e
    when Expr
      rbexprs << e._rbexpr
    when Series
      rbexprs << Utils.lit(e)._rbexpr
    else
      raise ArgumentError, "Expected an expression, got #{e}"
    end
  end

  _from_rbldf(_ldf.with_columns(rbexprs))
end

#with_context(other) ⇒ `LazyFrame`

Add an external context to the computation graph.

This allows expressions to also access columns from DataFrames that are not part of this one.

Examples:

df_a = Polars::DataFrame.new({"a" => [1, 2, 3], "b" => ["a", "c", nil]}).lazy
df_other = Polars::DataFrame.new({"c" => ["foo", "ham"]})
(
  df_a.with_context(df_other.lazy).select(
    [Polars.col("b") + Polars.col("c").first]
  )
).collect
# =>
# shape: (3, 1)
# ┌──────┐
# │ b    │
# │ ---  │
# │ str  │
# ╞══════╡
# │ afoo │
# ├╌╌╌╌╌╌┤
# │ cfoo │
# ├╌╌╌╌╌╌┤
# │ null │
# └──────┘

# File 'lib/polars/lazy_frame.rb', line 1593

def with_context(other)
  if !other.is_a?(Array)
    other = [other]
  end

  _from_rbldf(_ldf.with_context(other.map(&:_ldf)))
end

#with_row_count(name: "row_nr", offset: 0) ⇒ `LazyFrame`

Note:

This can have a negative effect on query performance. This may, for instance, block predicate pushdown optimization.

Add a column at index 0 that counts the rows.

Examples:

df = Polars::DataFrame.new(
  {
    "a" => [1, 3, 5],
    "b" => [2, 4, 6]
  }
).lazy
df.with_row_count.collect
# =>
# shape: (3, 3)
# ┌────────┬─────┬─────┐
# │ row_nr ┆ a   ┆ b   │
# │ ---    ┆ --- ┆ --- │
# │ u32    ┆ i64 ┆ i64 │
# ╞════════╪═════╪═════╡
# │ 0      ┆ 1   ┆ 2   │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 1      ┆ 3   ┆ 4   │
# ├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2      ┆ 5   ┆ 6   │
# └────────┴─────┴─────┘



# File 'lib/polars/lazy_frame.rb', line 1910

def with_row_count(name: "row_nr", offset: 0)
  _from_rbldf(_ldf.with_row_count(name, offset))
end

#write_json(file) ⇒ `nil`

Write the logical plan of this LazyFrame to a file or string in JSON format.

# File 'lib/polars/lazy_frame.rb', line 265

def write_json(file)
  if file.is_a?(String) || (defined?(Pathname) && file.is_a?(Pathname))
    file = Utils.format_path(file)
  end
  _ldf.write_json(file)
  nil
end
