Method: Polars::DataFrame#map_rows

Defined in:
lib/polars/data_frame.rb

#map_rows(return_dtype: nil, inference_size: 256, &f) ⇒ Object

Also known as: apply

Note:

The frame-level apply cannot track column names (as the UDF is a black-box that may arbitrarily drop, rearrange, transform, or add new columns); if you want to apply a UDF such that column names are preserved, you should use the expression-level apply syntax instead.

Apply a custom/user-defined function (UDF) over the rows of the DataFrame.

The UDF receives each row as a tuple of values (a Ruby Array): udf(row).

Implementing logic in a Ruby function is almost always significantly slower and more memory-intensive than implementing the same logic with the native expression API because:

  • The native expression engine runs in Rust; UDFs run in Ruby.
  • Use of Ruby UDFs forces the DataFrame to be materialized in memory.
  • Polars-native expressions can be parallelised (UDFs cannot).
  • Polars-native expressions can be logically optimised (UDFs cannot).

Wherever possible you should strongly prefer the native expression API to achieve the best performance.

Examples:

df = Polars::DataFrame.new({"foo" => [1, 2, 3], "bar" => [-1, 5, 8]})

Return a DataFrame by mapping each row to a tuple:

df.map_rows { |t| [t[0] * 2, t[1] * 3] }
# =>
# shape: (3, 2)
# ┌──────────┬──────────┐
# │ column_0 ┆ column_1 │
# │ ---      ┆ ---      │
# │ i64      ┆ i64      │
# ╞══════════╪══════════╡
# │ 2        ┆ -3       │
# │ 4        ┆ 15       │
# │ 6        ┆ 24       │
# └──────────┴──────────┘

Return a single-column DataFrame by mapping each row to a scalar (the resulting Series is wrapped in a frame):

df.map_rows { |t| t[0] * 2 + t[1] }
# =>
# shape: (3, 1)
# ┌─────┐
# │ map │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 9   │
# │ 14  │
# └─────┘

Parameters:

  • return_dtype (Symbol) (defaults to: nil)

    Output type of the operation. If none is given, Polars tries to infer the type.

  • inference_size (Integer) (defaults to: 256)

    Only used when the custom function returns rows. The first inference_size rows are used to determine the output schema.

Returns:

  • (Object)

# File 'lib/polars/data_frame.rb', line 3413

def map_rows(return_dtype: nil, inference_size: 256, &f)
  out, is_df = _df.map_rows(f, return_dtype, inference_size)
  if is_df
    _from_rbdf(out)
  else
    _from_rbdf(Utils.wrap_s(out).to_frame._df)
  end
end
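
The is_df branch in the source above can be modelled in plain Ruby to show why the two examples produce differently named columns. This is a hypothetical model, not the library's implementation: model_map_rows is an invented name, plain hashes stand in for DataFrames, and deciding the output shape from the first result is an assumption the source does not confirm. The column names column_0, column_1, and map are taken from the example output on this page.

```ruby
# Hypothetical pure-Ruby model of the two result shapes of map_rows.
# A Hash of column name => values stands in for a DataFrame.
def model_map_rows(rows)
  results = rows.map { |row| yield(row) }

  if results.first.is_a?(Array)
    # Block returned rows: one positional column per element.
    width = results.first.length
    (0...width).to_h { |i| ["column_#{i}", results.map { |r| r[i] }] }
  else
    # Block returned scalars: a single column named "map".
    {"map" => results}
  end
end

rows = [[1, -1], [2, 5], [3, 8]]

as_rows = model_map_rows(rows) { |t| [t[0] * 2, t[1] * 3] }
as_scalar = model_map_rows(rows) { |t| t[0] * 2 + t[1] }

p as_rows   # {"column_0"=>[2, 4, 6], "column_1"=>[-3, 15, 24]}
p as_scalar # {"map"=>[1, 9, 14]}
```

In both branches the caller gets a frame-like result back, which mirrors how the real method wraps the scalar case with to_frame before returning.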